HumanBase's source data is refreshed periodically from NCBI, Gene Ontology, MSigDB and other external repositories. The tabs below describe the data in effect during each era.
Tissue and cell-lineage interaction networks built from integrated genomics data.
| Name | Detail |
|---|---|
| MAGE | 289 tissue/cell-type-specific functional networks plus one global functional network, built from 7,463 studies using a two-stage framework that combines a masked graph autoencoder with tissue-specific gradient-boosting integration |
| GIANT | 144 tissue/cell-lineage-specific functional networks plus one global functional network (Greene et al. 2015), built from 987 studies |
Per-gene identifiers, symbols, types, descriptions, and cross-references used across HumanBase tools.
| Name | Version | Used in |
|---|---|---|
| Homo sapiens gene records (NCBI Gene) | 2022-05-30 | Foundational human gene records (Entrez IDs, symbols, descriptions, cross-references, aliases) used across all HumanBase tools, including the gene metadata shown for entries in the GIANT human compendium |
| MAGE-curated gene records (NCBI Gene) | 2026-01-08 | Gene index for the MAGE human compendium (36,077 genes including ncRNA, pseudo, snoRNA, and other types not present in GIANT) |
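The gene records above come from NCBI Gene, which distributes per-species `gene_info` tables. The Python below is a minimal sketch, not a HumanBase loader. It assumes NCBI's tab-separated `gene_info` layout, builds an Entrez ID → symbol/alias/type index, and tallies gene types such as `ncRNA` or `pseudo`. The function name `load_gene_info` and the two abridged sample rows are illustrative only.

```python
import csv
import io
from collections import Counter


def load_gene_info(handle):
    """Parse an NCBI gene_info-style TSV into {Entrez GeneID: record}.

    Assumes the NCBI layout: a header row whose first field is '#tax_id',
    '-' for empty fields, and '|' between multiple synonyms.
    """
    # QUOTE_NONE: free-text descriptions may contain stray quote characters
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    genes = {}
    for row in reader:
        synonyms = row["Synonyms"]
        genes[int(row["GeneID"])] = {
            "symbol": row["Symbol"],
            "aliases": [] if synonyms == "-" else synonyms.split("|"),
            "type": row["type_of_gene"],
        }
    return genes


# Abridged, illustrative rows: only the first ten gene_info columns,
# with dbXrefs blanked to '-'.
HEADER = ["#tax_id", "GeneID", "Symbol", "LocusTag", "Synonyms", "dbXrefs",
          "chromosome", "map_location", "description", "type_of_gene"]
ROWS = [
    ["9606", "7157", "TP53", "-", "P53|LFS1", "-", "17", "17p13.1",
     "tumor protein p53", "protein-coding"],
    ["9606", "378938", "MALAT1", "-", "NEAT2", "-", "11", "11q13.1",
     "metastasis associated lung adenocarcinoma transcript 1", "ncRNA"],
]
SAMPLE = "\n".join("\t".join(r) for r in [HEADER, *ROWS]) + "\n"

genes = load_gene_info(io.StringIO(SAMPLE))
type_counts = Counter(g["type"] for g in genes.values())
print(genes[7157]["symbol"], genes[7157]["aliases"])  # TP53 ['P53', 'LFS1']
print(type_counts)
```

Reading with `quoting=csv.QUOTE_NONE` keeps any quote characters inside description fields from being treated as CSV quoting.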
Hierarchies of terms used in HumanBase enrichment analyses and predictions.
| Name | Version | Used in |
|---|---|---|
| Gene Ontology (Biological Process) | releases/2018-05-14 | FMD enrichment (GIANT + MAGE), Networks: 'Process' panel and gene predictions, SEEK (beta) |
| Disease Ontology | releases/2017-06-13 | FMD enrichment (GIANT + MAGE), Networks: 'Process' panel and gene predictions |
| Uberon | uberon/releases/2022-06-30/ext.owl | Dataset tissue tags in SEEK (beta) metadata (not a term-enrichment source) |
| BRENDA Ontology | releases/2021-10-26 | Networks: 'Tissue' panel |
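Ontology hierarchies matter for enrichment because annotations are commonly propagated upward. Under the Gene Ontology's true path rule, a gene annotated to a term also counts as annotated to all of that term's ancestors. This page does not say exactly how HumanBase propagates terms, so the following is only a minimal sketch. It assumes `is_a` parent links stored in a dict, and the term and gene names are hypothetical placeholders.

```python
from collections import defaultdict


def ancestors(term, parents):
    """All ancestors of `term` reachable through is_a links (term excluded)."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagate(annotations, parents):
    """Map each term to every gene annotated to it or to any descendant."""
    term_genes = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for t in {term} | ancestors(term, parents):
                term_genes[t].add(gene)
    return dict(term_genes)


# Hypothetical toy hierarchy (a DAG: "leaf_b" has two parents)
parents = {
    "leaf_a": ["mid_1"],
    "leaf_b": ["mid_1", "mid_2"],
    "mid_1": ["root"],
    "mid_2": ["root"],
}
annotations = {"GENE1": ["leaf_a"], "GENE2": ["leaf_b"], "GENE3": ["mid_2"]}

term_genes = propagate(annotations, parents)
print(sorted(term_genes["root"]))   # ['GENE1', 'GENE2', 'GENE3']
print(sorted(term_genes["mid_1"]))  # ['GENE1', 'GENE2']
```

The `seen` set keeps the traversal from revisiting shared ancestors, which matters because ontologies such as GO are DAGs rather than trees.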
Gene-to-term mappings used in enrichment analyses, term predictions, and curated gene scores.
| Name | Version | Used in |
|---|---|---|
| GO BP annotations: Homo sapiens (NCBI gene2go) | 2026-03-16 | FMD enrichment (MAGE) |
| GO BP annotations: Homo sapiens (UniProt-GOA) | 2018-03-26 | FMD enrichment (GIANT), Networks: 'Process' panel and gene predictions, SEEK (beta) |
| GO BP annotations: Mus musculus, Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae (GAF) | 2022-05-17 | SEEK (beta) |
| MSigDB Hallmark (H) | MSigDB_2023.2.Hs | FMD enrichment (GIANT + MAGE) |
| MSigDB Canonical Pathways (C2-CP) | MSigDB_2023.2.Hs | FMD enrichment (GIANT + MAGE) |
| SFARI gene scores | SFARI-Gene_human-gene-scores_release_04-17-2018 | Networks: 'Process' panel |
| OMIM | 2018-05-17 | Folded into Disease Ontology terms (not a separate annotation source) |
| HPRD tissue annotations | Release 9 | Source for BRENDA tissue mapping at load time (not directly visible as terms) |
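The NCBI `gene2go` file listed above covers all GO aspects and many species in a single tab-separated table, so it must be filtered to human Biological Process rows before it can serve as a BP annotation source. The following is a minimal sketch that assumes NCBI's published `gene2go` columns. The function `load_gene2go_bp`, the choice to drop `NOT` qualifiers, and the sample rows are illustrative assumptions, not HumanBase's loader.

```python
import csv
import io
from collections import defaultdict


def load_gene2go_bp(handle, tax_id="9606"):
    """Build {Entrez GeneID: set of GO BP IDs} from a gene2go-style TSV.

    Assumes NCBI's gene2go columns (#tax_id, GeneID, GO_ID, Evidence,
    Qualifier, GO_term, PubMed, Category), where Category is
    'Process', 'Function' or 'Component'.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    gene_terms = defaultdict(set)
    for row in reader:
        # Keep only the requested species and the Biological Process aspect
        if row["#tax_id"] != tax_id or row["Category"] != "Process":
            continue
        # Assumption: negated annotations ("NOT ...") are dropped
        if row["Qualifier"].startswith("NOT"):
            continue
        gene_terms[int(row["GeneID"])].add(row["GO_ID"])
    return dict(gene_terms)


# Illustrative rows shaped like gene2go, not copied from a real release
HEADER = ["#tax_id", "GeneID", "GO_ID", "Evidence", "Qualifier",
          "GO_term", "PubMed", "Category"]
ROWS = [
    ["9606", "7157", "GO:0006915", "IDA", "involved_in",
     "apoptotic process", "-", "Process"],
    ["9606", "7157", "GO:0003677", "IDA", "enables",
     "DNA binding", "-", "Function"],
    ["10090", "22059", "GO:0006915", "IDA", "involved_in",
     "apoptotic process", "-", "Process"],
]
SAMPLE = "\n".join("\t".join(r) for r in [HEADER, *ROWS]) + "\n"

human_bp = load_gene2go_bp(io.StringIO(SAMPLE))
print(human_bp)  # {7157: {'GO:0006915'}}
```

Passing a different `tax_id` (for example `"10090"` for mouse) reuses the same filter for other species in the file.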
Genomics studies used to build the networks.
| Name | Studies | Download |
|---|---|---|
| Human Compendium (MAGE) | 7,463 | mage_human_compendium.tsv |
| Human Compendium (GIANT) | 990 | human_compendium.tsv |
Genomics studies used by the SEEK (beta) gene-coexpression search engine.
| Name | Studies | Download |
|---|---|---|
| modSeek homo-sapiens | 12,842 | modseek_human.tsv |
| modSeek mus-musculus | 12,056 | modseek_mouse.tsv |
| modSeek caenorhabditis-elegans | 398 | modseek_worm.tsv |
| modSeek drosophila-melanogaster | 786 | modseek_fly.tsv |
| modSeek danio-rerio | 343 | modseek_zebrafish.tsv |
| modSeek saccharomyces-cerevisiae | 400 | modseek_yeast.tsv |
Training data sources for deep learning frameworks.
| Model | Training Data Sources |
|---|---|
| DeepSEA (Beluga) | Trained on 2,002 chromatin profile features, including DNase hypersensitivity and ChIP-seq profiles. The complete list of features used to train Beluga is available as a CSV file. |
| Sei | Trained on 21,907 chromatin profiles, including DNase hypersensitivity, ChIP-seq, and ATAC-seq profiles. The list of profiles incorporated in the Sei model can be found in Supplementary Table 1 of Chen et al. 2022. |
| ExPecto | Incorporated the Beluga chromatin profile model as well as 218 tissue expression profiles from GTEx, Roadmap Epigenomics and ENCODE (Zhou et al. 2018). |
| ExPectoSC | Incorporated the Beluga chromatin profile model as well as single-cell gene expression profiles for over 100 cell types from seven organ systems (listed in Supplementary Table 1 of Sokolova et al. 2023). |
| Seqweaver | Trained using data from 232 cross-linking immunoprecipitation (CLIP) based RNA binding protein datasets (Supplementary Table 1 of Park et al. 2021). |