About ExPecto

Introduction

ExPecto is a framework for ab initio, sequence-based prediction of the effects of mutations on gene expression and disease risk. This web interface provides an explorer of tissue-specific expression effect predictions. The current release contains all single nucleotide substitutions within 1 kb of a transcription start site (TSS) and all 1000 Genomes variants that passed a minimum predicted effect threshold (>0.3 log fold-change in any tissue).

The code for predicting expression effects of human genome variants and for training new expression models is available at this GitHub repository.

The ExPecto framework is described in the following manuscript:

Jian Zhou, Chandra L. Theesfeld, Kevin Yao, Kathleen M. Chen, Aaron K. Wong, and Olga G. Troyanskaya, Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk, Nature Genetics, 2018 (in press)

Download

Predicted expression effects

This is the bulk download link for all mutation predictions.

Variation potential directionality scores

The variation potential of a gene in a tissue or cell type can reflect the evolutionary constraint on its expression level. Specifically, we compute the variation potential directionality score as the sum of all predicted mutation effects within 1 kb of the TSS. A negative variation potential indicates active expression and constraint toward a higher expression level, and vice versa. The variation potential directionality scores and the inferred evolutionary constraint probabilities can be downloaded here.
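
As a rough illustration of the sum described above, the following Python sketch computes a directionality score for one gene in one tissue. The function name, the (position, effect) input layout, and the inclusive 1 kb window are assumptions made for this example; they are not taken from the ExPecto code.

```python
from typing import Iterable, Tuple


def variation_potential_directionality(
    effects: Iterable[Tuple[int, float]],
    tss: int,
    window: int = 1000,
) -> float:
    """Sum predicted mutation effects within `window` bp of the TSS.

    `effects` holds (genomic position, predicted log fold-change) pairs for
    one gene in one tissue. This layout is hypothetical.
    """
    return sum(effect for pos, effect in effects if abs(pos - tss) <= window)


# Made-up effects around a hypothetical TSS at position 10,000.
toy_effects = [(9_500, -0.5), (10_200, -0.25), (12_000, 1.0)]
score = variation_potential_directionality(toy_effects, tss=10_000)
print(score)  # -0.75: the mutation at 12,000 lies outside the window
```

A negative result such as this one would correspond, in the terms used above, to constraint toward a higher expression level.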

The full predictions for all 140 million mutations will be made available soon (large files).

More information

ExPecto applies exponential basis function-based linear modeling on top of deep convolutional network predictions of chromatin effects. ExPecto predicts expression levels directly from sequence and is capable of predicting the effects of arbitrary sequence variations (only small variants have been rigorously evaluated).

The chromatin predictions, made by the DeepSEA "Beluga" model, were computed per 200 bp bin, and 200 bins centered at the TSS (a 40 kb region) were used as input to predict expression effects. To reduce the dimensionality for ExPecto model training, the predicted chromatin spatial patterns were summarized into spatial features by 10 exponential basis functions. These summarized spatial features and the gene expression levels were used to train the regularized linear models that make the final step of the predictions.
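
The spatial summarization and the final linear step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the released implementation: the decay rates, the split into five upstream and five downstream functions, the use of ridge regression as the regularized linear model, and all function names are choices made for this example. Only the 200 bins and the count of 10 basis functions come from the description above.

```python
import numpy as np

N_BINS = 200  # 200 bp bins centered at the TSS, as described in the text


def exponential_basis_features(chromatin, decay_rates=(0.01, 0.02, 0.05, 0.1, 0.2)):
    """Summarize per-bin chromatin predictions with exponential basis functions.

    chromatin: array of shape (n_bins, n_chromatin_features).
    Returns a vector of length 2 * len(decay_rates) * n_chromatin_features.
    The decay rates and the upstream/downstream split are assumptions.
    """
    n_bins = chromatin.shape[0]
    # Signed offset of each bin center from the TSS, in units of bins.
    offsets = np.arange(n_bins) - (n_bins - 1) / 2.0
    upstream = offsets < 0
    weights = []
    for rate in decay_rates:
        decay = np.exp(-rate * np.abs(offsets))
        weights.append(decay * upstream)   # weights for bins before the TSS
        weights.append(decay * ~upstream)  # weights for bins after the TSS
    weights = np.stack(weights)            # shape (10, n_bins)
    return (weights @ chromatin).ravel()


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression, used here as a stand-in regularized model."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


# Toy data: 50 hypothetical genes with 4 chromatin features per bin.
rng = np.random.default_rng(0)
X = np.stack([exponential_basis_features(rng.random((N_BINS, 4))) for _ in range(50)])
y = rng.normal(size=50)  # made-up expression levels
coef, intercept = fit_ridge(X, y)
print(X.shape, coef.shape)  # (50, 40) (40,)
```

In this toy setup each gene's 200 x 4 matrix of chromatin predictions is reduced to 40 summarized features, which shows the dimensionality reduction the paragraph above refers to.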

The ExPecto framework also proposes a path toward ab initio disease risk prediction by combining predicted expression effects with estimates of the evolutionary constraints on expression levels, which can in turn be estimated by systematically profiling mutation effects through in silico mutagenesis. For example, mutations predicted to have strong negative expression effects on a positively constrained gene are predicted to be deleterious. Strictly speaking, this approach predicts fitness risks, which we expect to correlate highly with disease risks. We showed proof-of-principle evidence for this approach on both curated HGMD disease mutation data and disease GWAS.
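
The sign logic in the example above can be written as a short Python sketch. It is a deliberate simplification: the framework works with inferred constraint probabilities, whereas this sketch uses only the sign of the variation potential. The function names and the 0.3 cutoff for a "strong" effect are assumptions made for illustration.

```python
def constraint_direction(variation_potential: float) -> int:
    """Return +1 for constraint toward higher expression, -1 toward lower.

    Follows the convention stated earlier: a negative variation potential
    indicates constraint toward a higher expression level. Returns 0 when
    the direction is undetermined.
    """
    if variation_potential < 0:
        return 1
    if variation_potential > 0:
        return -1
    return 0


def is_predicted_deleterious(
    effect: float,
    variation_potential: float,
    min_abs_effect: float = 0.3,  # hypothetical cutoff for a "strong" effect
) -> bool:
    """Flag a strong effect that pushes expression against the constraint."""
    direction = constraint_direction(variation_potential)
    strong = abs(effect) >= min_abs_effect
    return strong and effect * direction < 0


# Strong negative effect on a positively constrained gene.
print(is_predicted_deleterious(effect=-1.2, variation_potential=-35.0))  # True
# The same gene with a strong positive effect follows the constraint.
print(is_predicted_deleterious(effect=1.2, variation_potential=-35.0))   # False
```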