HumanBase Data Sources

HumanBase's source data is refreshed periodically from NCBI, Gene Ontology, MSigDB and other external repositories. The tabs below describe the data in effect during each era.

Networks

Tissue and cell-lineage interaction networks built from integrated genomics data.

Name | Detail
MAGE | 289 tissue/cell-type-specific functional networks plus one global functional network, built from 7,463 studies using a two-stage masked graph autoencoder and tissue-specific gradient-boosting integration framework
GIANT | 144 tissue/cell-lineage-specific functional networks plus one global functional network (Greene et al. 2015), built from 987 studies

Gene records

Per-gene identifiers, symbols, types, descriptions, and cross-references used across HumanBase tools.

Name | Version | Used in
Homo sapiens gene records (NCBI Gene) | 2022-05-30 | Foundational human gene records (Entrez IDs, symbols, descriptions, cross-references, aliases) used across all HumanBase tools, including the gene metadata shown for entries in the GIANT human compendium
MAGE-curated gene records (NCBI Gene) | 2026-01-08 | Gene index for the MAGE human compendium (36,077 genes including ncRNA, pseudo, snoRNA, and other types not present in GIANT)

Ontologies

Hierarchies of terms used in HumanBase enrichment analyses and predictions.

Name | Version | Used in
Gene Ontology (Biological Process) | releases/2018-05-14 | FMD enrichment (GIANT + MAGE), Networks: 'Process' panel and gene predictions, SEEK (beta)
Disease Ontology | releases/2017-06-13 | FMD enrichment (GIANT + MAGE), Networks: 'Process' panel and gene predictions
Uberon | uberon/releases/2022-06-30/ext.owl | Dataset tissue tags in SEEK (beta) metadata (not a term-enrichment source)
BRENDA Ontology | releases/2021-10-26 | Networks: 'Tissue' panel

Gene annotations

Gene-to-term mappings used in enrichment analyses, term predictions, and curated gene scores.

Name | Version | Used in
GO BP annotations: Homo sapiens (NCBI gene2go) | 2026-03-16 | FMD enrichment (MAGE)
GO BP annotations: Homo sapiens (UniProt-GOA) | 2018-03-26 | FMD enrichment (GIANT), Networks: 'Process' panel and gene predictions, SEEK (beta)
GO BP annotations: Mus musculus, Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae (GAF) | 2022-05-17 | SEEK (beta)
MSigDB Hallmark (H) | MSigDB_2023.2.Hs | FMD enrichment (GIANT + MAGE)
MSigDB Canonical Pathways (C2-CP) | MSigDB_2023.2.Hs | FMD enrichment (GIANT + MAGE)
SFARI gene scores | SFARI-Gene_human-gene-scores_release_04-17-2018 | Networks: 'Process' panel
OMIM | 2018-05-17 | Folded into Disease Ontology terms (not a separate annotation source)
HPRD tissue annotations | Release 9 | Source for BRENDA tissue mapping at load time (not directly visible as terms)

Network compendia

Genomics studies used to build the networks.

Name | Studies | Download
Human Compendium (MAGE) | 7,463 | mage_human_compendium.tsv
Human Compendium (GIANT) | 990 | human_compendium.tsv
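
After downloading a compendium file, a quick sanity check is to compare its row count with the study count listed above. The following is a minimal sketch, not an official HumanBase utility. It assumes that each file is tab-separated, starts with one header row, and holds one study per row; the actual layout is not documented here, so adjust `has_header` if the files differ. The names `count_studies`, `check_download`, and `LISTED_COUNTS` are hypothetical.

```python
import csv
import os
from typing import Iterable

# Study counts as listed for each download on this page.
LISTED_COUNTS = {
    "mage_human_compendium.tsv": 7463,
    "human_compendium.tsv": 990,
}


def count_studies(lines: Iterable[str], has_header: bool = True) -> int:
    """Count non-empty data rows in tab-separated text.

    Assumption: one study per row. Blank lines are ignored.
    """
    rows = [
        row
        for row in csv.reader(lines, delimiter="\t")
        if any(cell.strip() for cell in row)
    ]
    if has_header and rows:
        rows = rows[1:]  # drop the assumed header row
    return len(rows)


def check_download(path: str) -> bool:
    """Return True if the row count in `path` matches the listed count."""
    with open(path, newline="", encoding="utf-8") as handle:
        found = count_studies(handle)
    expected = LISTED_COUNTS[os.path.basename(path)]
    print(f"{path}: found {found} rows, page lists {expected} studies")
    return found == expected
```

Calling `check_download("mage_human_compendium.tsv")` on a downloaded copy prints both numbers and reports whether they agree. A mismatch may simply mean that the layout assumptions above do not hold for that file.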

SEEK (beta) data compendia

Genomics studies used by the SEEK (beta) gene-coexpression search engine.

Name | Studies | Download
modSeek homo-sapiens | 12,842 | modseek_human.tsv
modSeek mus-musculus | 12,056 | modseek_mouse.tsv
modSeek caenorhabditis-elegans | 398 | modseek_worm.tsv
modSeek drosophila-melanogaster | 786 | modseek_fly.tsv
modSeek danio-rerio | 343 | modseek_zebrafish.tsv
modSeek saccharomyces-cerevisiae | 400 | modseek_yeast.tsv

Variant effect prediction models

Training data sources for deep learning frameworks.

Model | Training Data Sources
DeepSEA (Beluga) | Trained on 2,002 chromatin profile features, including DNase hypersensitivity and ChIP-seq profiles. The complete list of features used to train Beluga is available as a CSV file.
Sei | Trained on 21,907 chromatin profiles, including DNase hypersensitivity, ChIP-seq, and ATAC-seq profiles. The list of profiles incorporated in the Sei model can be found in Supplementary Table 1 of Chen et al. 2022.
ExPecto | Incorporates the Beluga chromatin profile model as well as 218 tissue expression profiles from GTEx, Roadmap Epigenomics and ENCODE (Zhou et al. 2018).
ExPectoSC | Incorporates the Beluga chromatin profile model as well as single-cell gene expression profiles for over 100 cell types from seven organ systems (listed in Supplementary Table 1 of Sokolova et al. 2023).
Seqweaver | Trained on 232 cross-linking immunoprecipitation (CLIP) based RNA-binding protein datasets (Supplementary Table 1 of Park et al. 2021).