Motivation: MEDLINE®/PubMed® currently indexes over 18 million biomedical articles, presenting unprecedented opportunities and challenges for text analysis. Using Medical Subject Heading Overrepresentation Profiles (MeSHOPs), an entity of interest can be robustly summarized, quantitatively identifying associated biomedical terms and suggesting indirect associations.

Results: A procedure is introduced for quantitative representation of MeSH annotations assigned to any group of articles (e.g., articles for a specific gene). Similarity scores comparing MeSHOPs of genes and diseases successfully infer association of novel disease terms to genes, validated by future publications. Results indicate that the number of papers for a gene or disease has a strong influence on predicted associations. Up to 16% improvement in predictive performance over baselines was obtained using MeSHOP comparisons.
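To make the idea of an over-representation profile concrete, here is a minimal sketch in Python. It scores each MeSH term by how surprising its frequency is in an entity's articles relative to a background collection. The abstract above does not name the statistic used, so the hypergeometric tail test here is an assumption, and the function names (`hypergeom_tail`, `term_counts`, `overrepresentation_profile`) are hypothetical and not taken from the released source code.

```python
from collections import Counter
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric(N articles, K with the term, n drawn).

    Exact integer arithmetic; suitable for toy-sized inputs only.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def term_counts(articles):
    """Count, for each MeSH term, the number of articles annotated with it."""
    counts = Counter()
    for terms in articles:
        counts.update(set(terms))  # each article counts at most once per term
    return counts


def overrepresentation_profile(entity_articles, background_articles):
    """Map each term seen in the entity's articles to a p-value.

    Assumption: background_articles is the full collection and already
    contains the entity's articles. Each article is an iterable of terms.
    """
    n = len(entity_articles)
    N = len(background_articles)
    background = term_counts(background_articles)
    foreground = term_counts(entity_articles)
    return {
        term: hypergeom_tail(k, background[term], n, N)
        for term, k in foreground.items()
    }
```

A small p-value marks a term that appears in the entity's articles more often than the background frequency would suggest. At the scale of MEDLINE, exact binomial coefficients become impractical, so a log-space or library-based implementation would be needed.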

Read the preliminary paper (PDF).

See the research webpage for more details.

Online Resources

Please bear with some instability as we migrate our database servers and update the datasets.
Examine all terms associated with a gene, or all genes associated with a term.
Examine all terms associated with a disease, or all diseases associated with a term.
Examine all terms associated with a chemical compound, or all chemical compounds associated with a term.
Examine all terms over-represented in the top 50 results for a PubMed query.
Examine all terms over-represented in a specified set of PubMed articles.

Examine Gene-Disease Profile MeSH Term Overlap
Browse Predicted Indirect Gene-Disease Predictions and Validation Sets

Browse Predicted Indirect Drug-Disease Predictions

Fetch all PubMed articles on a MeSH term topic associated with a list of genes

Download Results

All Human Gene-Disease Associations predicted via literature profiles: WARNING - Very Large File (8.4G, 2010-07-22)
Gene-Disease Co-Occurrence in MEDLINE Validation Set: New relationships established between genes and diseases via gene2pubmed after 2008 (8.3M, Relationships 2008-2010)
Drug-Disease Gold Standard: Validation set taken from PREDICT (Gottlieb, Stein, Ruppin, Sharan 2011) - DrugBank Drugs mapped to MeSH - OMIM diseases mapped to MeSH
Source code for computing direct associations and profile-based predictions
Source code for computing validation statistics
PubMed Baseline 2013, MeSH 2013, Entrez Gene 2013-02