Motivation: MEDLINE®/PubMed® currently indexes over 18 million biomedical articles, presenting unprecedented opportunities and challenges for text analysis. Medical Subject Heading Overrepresentation Profiles (MeSHOPs) robustly summarize an entity of interest, quantitatively identifying associated biomedical terms and suggesting indirect associations.
Results: A procedure is introduced for quantitative representation of the MeSH annotations assigned to any group of articles (e.g., the articles for a specific gene). Similarity scores comparing the MeSHOPs of genes and diseases successfully infer associations of novel disease terms with genes, as validated by future publications. Results indicate that the number of papers for a gene or disease has a strong influence on predicted associations. Up to 16% improvement in predictive performance over baselines was obtained using MeSHOP comparisons.
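As a rough illustration of what an overrepresentation profile could look like, the sketch below scores each MeSH term attached to an entity's articles against a background collection. This page does not state which statistic is used, so the one-sided hypergeometric tail test here is an assumption, and the function names (`hypergeom_tail`, `meshop`) and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric(N articles, K with the term, n drawn).

    Assumed statistic: the source page does not specify the test used.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def meshop(entity_counts, n_entity, background_counts, n_background):
    """Map each MeSH term seen for the entity to an overrepresentation p-value.

    entity_counts:     term -> number of the entity's articles with that term
    background_counts: term -> number of background articles with that term
    """
    return {
        term: hypergeom_tail(k, background_counts[term], n_entity, n_background)
        for term, k in entity_counts.items()
    }


# Toy data (made up): 100 background articles, 10 of them about the entity.
background = {"Rare Term": 10, "Common Term": 50}
entity = {"Rare Term": 5, "Common Term": 5}
profile = meshop(entity, 10, background, 100)

# Lower p-values mean stronger overrepresentation.
for term, p in sorted(profile.items(), key=lambda item: item[1]):
    print(term, p)
```

With these toy counts, the rare term receives a much smaller p-value than the common one, even though both appear in five of the entity's ten articles. That is the sense in which a profile separates specific terms from ubiquitous ones.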
Read the preliminary paper (PDF).
See the research webpage for more details.
Online Resources
Please bear with some instability as we migrate our database servers and update the datasets.
Examine Gene-Disease Profile MeSH Term Overlap
Browse Predicted Indirect Gene-Disease Predictions and Validation Sets
Browse Predicted Indirect Drug-Disease Predictions
Fetch all PubMed articles on a MeSH term topic associated with a list of genes
Download Results
All Human Gene-Disease Associations predicted via literature profiles: WARNING - Very Large File (8.4G, 2010-07-22)
Gene-Disease Co-Occurrence in MEDLINE Validation Set: New relationships established between genes and diseases via gene2pubmed after 2008 (8.3M, Relationships 2008-2010)
Drug-Disease Gold Standard: Validation set taken from PREDICT (Gottlieb, Stein, Ruppin, Sharan 2011)
- DrugBank Drugs mapped to MeSH
- OMIM diseases mapped to MeSH
Source code for computing direct associations and profile-based predictions
Source code for computing validation statistics
PubMed Baseline 2013, MeSH 2013, Entrez Gene 2013-02