Explaining Missing Heritability Using Gaussian Process Regression (Reader’s Digest)
Summary
The paper “Explaining Missing Heritability Using Gaussian Process Regression” by Sharp et al. tackles the problem of missing heritability, and the detection of higher-order interaction effects, through Gaussian process regression, a technique widely used in the machine learning community. Using an RBF kernel that models higher-order interactions, the authors estimated the broad-sense heritability of a number of mouse and yeast phenotypes and found these estimates to be significantly larger than the narrow-sense heritability of the same phenotypes. The authors also detected several loci displaying interaction effects.
Background
Heritability
In genetics, phenotypes are modeled by the following equation:
$$y_i = f(\mathbf{x}_i) + u_i + \epsilon_i,$$
where $y_i$ is the phenotype measurement of the $i$th individual, $\mathbf{x}_i$ the genotype vector, $u_i$ a random effect term that captures relatedness among individuals, and $\epsilon_i$ the environmental noise. Here, $f(\cdot)$ is a function that maps the genotype vector to a real number. Under this model, heritability is defined as the proportion of variance in $y$ that is due to variation in $f(\mathbf{x})$:
$$\text{heritability} = \frac{\mathrm{Var}[f(\mathbf{x})]}{\mathrm{Var}[y]}.$$
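To make the definition concrete, the following Python sketch simulates a phenotype as the sum of a genetic function `f`, a random effect `u`, and noise `eps`. It then computes heritability as the share of phenotypic variance due to `f`. All numbers are hypothetical: the allele frequency, the effect sizes, the single pairwise interaction, and the variance of each term are invented for illustration. The relatedness term is drawn as independent noise for simplicity, which ignores real relatedness structure.

```python
import numpy as np

# Hypothetical simulation: n individuals, m biallelic SNPs coded as 0/1/2 allele counts.
rng = np.random.default_rng(0)
n, m = 5000, 20
X = rng.binomial(2, 0.3, size=(n, m)).astype(float)

# Made-up genetic function f(x): additive effects plus one pairwise interaction.
beta = rng.normal(0, 0.3, size=m)
f = X @ beta + 0.5 * X[:, 0] * X[:, 1]

# Assumption: the relatedness effect u is simplified to i.i.d. noise here.
u = rng.normal(0, 0.5, size=n)
eps = rng.normal(0, 1.0, size=n)   # environmental noise
y = f + u + eps

# Heritability under this model: proportion of Var(y) due to variation in f(x).
H2 = np.var(f) / np.var(y)
print(f"heritability = {H2:.3f}")
```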
Different flavors of heritability exist, depending on the complexity of the function $f(\cdot)$ and the inputs that go into it. Geneticists generally work with the four types of heritability listed below.
 Broad-sense heritability ($H^2$): the proportion of phenotypic variance that is due to all genetic variation, including both additive and epistatic effects. For $H^2$, $f(\cdot)$ can be any function, incorporating interactions of any order between genetic variants. This is the most general definition of heritability.
 Narrow-sense heritability ($h^2$): the proportion of phenotypic variance that is due to all additive genetic effects. For $h^2$, $f(\cdot)$ is a linear function of first-order (additive) terms.
 SNP heritability ($h^2_{\mathrm{SNP}}$): the proportion of phenotypic variance that is due to the additive effects of a given set of SNPs. For $h^2_{\mathrm{SNP}}$, $f(\cdot)$ is a linear function of a fixed set of SNPs.
 GWAS heritability ($h^2_{\mathrm{GWAS}}$): the proportion of phenotypic variance that is due to the additive effects of GWAS hits. For $h^2_{\mathrm{GWAS}}$, $f(\cdot)$ is a linear function of the GWAS hits only.
Based on these definitions, it follows that $h^2_{\mathrm{GWAS}} \le h^2_{\mathrm{SNP}} \le h^2 \le H^2$. The missing heritability problem usually refers to the gap between $h^2_{\mathrm{GWAS}}$ and the narrow- or broad-sense heritability.
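The ordering of the four flavors can be checked numerically. This sketch uses the standard quantitative-genetics definition of additive variance: the variance of the least-squares linear fit of the genetic value on allele counts. Projecting onto nested sets of SNPs can only lose variance, so the estimates come out ordered. The effect sizes, the centered (purely epistatic) interaction term, and the "SNP" and "GWAS" column subsets are all hypothetical choices. The relatedness term is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 20000, 10
X = rng.binomial(2, 0.5, size=(n, m)).astype(float)

# Hypothetical trait: additive effects on every SNP plus a centered SNP1 x SNP2 interaction.
beta = np.linspace(0.1, 0.5, m)
f = X @ beta + 0.8 * (X[:, 0] - 1) * (X[:, 1] - 1)
y = f + rng.normal(0, 1.0, size=n)      # relatedness term omitted for simplicity

def additive_variance(f, G):
    """Variance of the least-squares linear fit (with intercept) of f on the columns of G."""
    A = np.column_stack([np.ones(len(f)), G])
    coef, *_ = np.linalg.lstsq(A, f, rcond=None)
    return np.var(A @ coef)

var_y = np.var(y)
H2 = np.var(f) / var_y                                  # broad sense: all of f
h2 = additive_variance(f, X) / var_y                    # narrow sense: all SNPs, additive only
h2_snp = additive_variance(f, X[:, :6]) / var_y         # hypothetical genotyped subset
h2_gwas = additive_variance(f, X[:, [4, 5]]) / var_y    # hypothetical "GWAS hits" (nested subset)
print(H2, h2, h2_snp, h2_gwas)
```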
Gaussian Process Regression
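Parametric regression problems often involve a function $f(\cdot)$, governed by a set of parameters, that maps each input to a response. For example, in Poisson regression, the distribution of the response variable is fully characterized by the Poisson density and its mean parameter $\lambda$, which is modeled as $\log \lambda = \boldsymbol{\beta}^\top \mathbf{x}$ for a fixed vector of regression coefficients $\boldsymbol{\beta}$.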
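As a point of contrast, here is a minimal parametric fit: Poisson regression with a log link, estimated by Newton-Raphson on simulated data. The coefficients and sample size are made up. Once $\boldsymbol{\beta}$ is estimated, the regression function is completely determined.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
beta_true = np.array([0.5, 0.8])                        # hypothetical true coefficients
y = rng.poisson(np.exp(X @ beta_true))                  # log link: log(lambda) = X @ beta

# Newton-Raphson on the (concave) Poisson log-likelihood,
# starting from an intercept-only guess.
beta = np.array([np.log(y.mean()), 0.0])
for _ in range(25):
    lam = np.exp(X @ beta)
    grad = X.T @ (y - lam)                 # score vector
    hess = X.T @ (X * lam[:, None])        # negative Hessian (Fisher information)
    beta = beta + np.linalg.solve(hess, grad)

print(beta)   # close to beta_true; these two numbers are the whole model
```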
Gaussian Process Regression differs from parametric regression in that it does not assume any parametric form for $f(\cdot)$. Instead, a Gaussian Process prior assumes that the function values $f(\mathbf{x}_1), \ldots, f(\mathbf{x}_n)$ at any finite set of inputs $\mathbf{x}_1, \ldots, \mathbf{x}_n$ follow a multivariate normal distribution,
$$\big(f(\mathbf{x}_1), \ldots, f(\mathbf{x}_n)\big)^\top \sim \mathcal{N}(\mathbf{0}, \mathbf{K}), \qquad K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j),$$
where $\mathbf{K}$ is the kernel matrix. It measures the similarity between samples and constrains the space of possible functions $f(\cdot)$. The only requirement on the kernel function $k$ is that every kernel matrix it produces is positive semi-definite. This weak requirement allows Gaussian Process Regression to model a broad range of functions.
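The sketch below draws function values from a zero-mean Gaussian Process prior with a squared-exponential (RBF) kernel on one-dimensional inputs. Each column of `samples` is one random function evaluated on the grid. The length scale, the grid, and the small diagonal "jitter" added for numerical stability are illustrative choices, not values from the post.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix between the rows of X1 and X2."""
    sq_dist = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

rng = np.random.default_rng(3)
x = np.linspace(0, 5, 50)[:, None]           # 50 one-dimensional inputs
K = rbf_kernel(x, x, lengthscale=1.0)

# (f(x_1), ..., f(x_n)) ~ N(0, K): sample via the Cholesky factor of K.
# The jitter term keeps the factorization numerically stable.
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))
samples = L @ rng.standard_normal((len(x), 3))   # three draws of the function values
```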
[Figure: a list of widely used kernel functions (credit to Wikipedia)]
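For reference, a few widely used kernels can be written in a few lines of Python. The parameterizations below are common textbook forms chosen for this sketch. Parameter names such as `c`, `d`, `ell`, and `period` are our own. The last lines check numerically that an RBF Gram matrix is positive semi-definite.

```python
import numpy as np

def linear(x, z):
    return float(np.dot(x, z))

def polynomial(x, z, c=1.0, d=2):
    return float((np.dot(x, z) + c) ** d)

def rbf(x, z, ell=1.0):
    """Squared-exponential kernel."""
    return float(np.exp(-np.sum((x - z) ** 2) / (2 * ell**2)))

def exponential(x, z, ell=1.0):
    """Exponential (Ornstein-Uhlenbeck) kernel."""
    return float(np.exp(-np.linalg.norm(x - z) / ell))

def periodic(x, z, ell=1.0, period=1.0):
    r = np.linalg.norm(x - z)
    return float(np.exp(-2 * np.sin(np.pi * r / period) ** 2 / ell**2))

# Sanity check: an RBF Gram matrix on random points should be positive semi-definite.
rng = np.random.default_rng(7)
pts = rng.normal(size=(20, 3))
gram = np.array([[rbf(a, b) for b in pts] for a in pts])
min_eig = np.linalg.eigvalsh(gram).min()
```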
Applying Gaussian Process Regression
The Kernel Function
Specifying the kernel function is a fundamental step of Gaussian Process Regression: an appropriate kernel allows one to model interactions of any order among genetic variants. In the Sharp et al. paper, the authors proposed a generalized version of the RBF kernel to measure the similarity between two individuals, $\mathbf{x}_i$ and $\mathbf{x}_j$, across the genotypes of $P$ SNPs:
$$k(\mathbf{x}_i, \mathbf{x}_j) = \eta \exp\left( -\sum_{p=1}^{P} \frac{(x_{ip} - x_{jp})^2}{\ell_p^2} \right),$$
where $\eta$ is a parameter that governs the overall similarity between $\mathbf{x}_i$ and $\mathbf{x}_j$, and the length scale $\ell_p$ governs the contribution of SNP $p$ to the variation of the phenotype. A large $\ell_p$ suggests that SNP $p$ contributes little to the variation of the phenotype, while a small $\ell_p$ implies a significant contribution. By examining the magnitudes of these hyperparameters, one can infer whether a genetic locus contributes significantly to the trait.
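The sketch below implements an RBF kernel with one length scale per SNP, also known as an automatic relevance determination (ARD) kernel. It assumes the form $\eta \exp(-\sum_p (x_{ip}-x_{jp})^2/\ell_p^2)$; the exact scaling constants in the paper may differ. The example shows two things. First, a large $\ell_p$ makes the kernel nearly blind to SNP $p$. Second, the kernel factorizes into a product of per-SNP terms, which is why it can capture interactions among SNPs. The genotypes and length scales are hypothetical.

```python
import numpy as np

def ard_rbf(xi, xj, eta, ell):
    """eta * exp(-sum_p (x_ip - x_jp)^2 / ell_p^2) for two genotype vectors."""
    return eta * np.exp(-np.sum((xi - xj) ** 2 / ell**2))

eta = 1.0
ell = np.array([1.0, 1.0, 100.0])       # hypothetical: SNP 3 has a very large length scale

xi = np.array([0.0, 1.0, 0.0])
xj_snp1 = np.array([2.0, 1.0, 0.0])     # differs from xi only at SNP 1
xj_snp3 = np.array([0.0, 1.0, 2.0])     # differs from xi only at SNP 3
xj_both = np.array([2.0, 1.0, 2.0])     # differs at SNPs 1 and 3

k1 = ard_rbf(xi, xj_snp1, eta, ell)     # exp(-4): similarity drops sharply
k3 = ard_rbf(xi, xj_snp3, eta, ell)     # exp(-4e-4): SNP 3 barely matters
k_both = ard_rbf(xi, xj_both, eta, ell) # equals k1 * k3 / eta: product over SNPs
```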
Sparsity-Inducing Priors
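Overfitting may occur when the number of parameters to estimate is larger than the amount of available data. To avoid overfitting and improve the parsimony of the model, the authors imposed a Gamma prior over the inverse length scale, $\rho_p = 1/\ell_p^2$. The Gamma prior with shape $a$ and rate $b$ has density function
$$p(\rho_p; a, b) = \frac{b^{a}}{\Gamma(a)}\, \rho_p^{a-1} e^{-b \rho_p}, \qquad \rho_p > 0.$$
Setting the shape $a < 1$ removes any mode from the interior of the density. The result is a monotonically decreasing function with most of its mass concentrated near 0 and a heavy right tail (see figure below). This pushes most of the $\rho_p$ toward zero, so that most SNPs are modeled as contributing little to the phenotype.
[Figure: density of the Gamma prior with shape parameter below 1]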
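To see the effect of the shape parameter, the sketch compares a Gamma prior with shape 0.5 against one with shape 2. Both use rate 1, and these values are chosen purely for illustration. The density is evaluated in log space. The draws come from NumPy's `Generator.gamma`, which is parameterized by shape and scale (scale = 1/rate).

```python
import math
import numpy as np

def gamma_pdf(rho, shape, rate):
    """Gamma density b^a rho^(a-1) exp(-b rho) / Gamma(a), computed in log space."""
    log_p = shape * math.log(rate) + (shape - 1) * np.log(rho) - rate * rho - math.lgamma(shape)
    return np.exp(log_p)

shape, rate = 0.5, 1.0                        # hypothetical sparsity-inducing setting (shape < 1)
grid = np.linspace(0.01, 5, 200)
dens = gamma_pdf(grid, shape, rate)           # monotonically decreasing on the grid

rng = np.random.default_rng(4)
draws = rng.gamma(shape, 1.0 / rate, size=100_000)
draws_shape2 = rng.gamma(2.0, 1.0 / rate, size=100_000)   # has an interior mode at 1

# Fraction of prior mass near zero under each setting.
frac_small = np.mean(draws < 0.1)
frac_small_shape2 = np.mean(draws_shape2 < 0.1)
```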
Posterior Distribution of the Parameters
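A Gaussian Process prior allows one to integrate over the space of $f(\cdot)$ analytically. The result is a posterior for the hyperparameters $\theta$ (the kernel parameters and the noise variances):
$$p(\theta \mid \mathbf{y}, \mathbf{X}) \propto p(\theta) \int p(\mathbf{y} \mid \mathbf{f}, \theta)\, p(\mathbf{f} \mid \mathbf{X}, \theta)\, d\mathbf{f},$$
where $p(\theta)$ incorporates the sparsity-inducing priors. With Gaussian noise, the integral has a closed form: a multivariate normal density in $\mathbf{y}$ whose covariance is the kernel matrix plus the noise covariance. The integration step effectively averages over all possible $f(\cdot)$, removing the need to estimate any particular instance of $f(\cdot)$ explicitly. This step also increases power to detect loci that contribute to phenotypes.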
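A minimal sketch of the unnormalized log posterior follows. It assumes Gaussian noise, so that integrating out $f$ gives $\log \mathcal{N}(\mathbf{y}; \mathbf{0}, \mathbf{K}_\theta + \sigma^2 \mathbf{I})$. It also assumes an ARD RBF kernel written with $\rho_p = 1/\ell_p^2$, independent Gamma priors on the $\rho_p$, and flat priors on $\eta$ and $\sigma^2$. The relatedness random effect is left out. The function names are our own and do not come from the authors' code.

```python
import math
import numpy as np

def ard_kernel(X, eta, rho):
    """K_ij = eta * exp(-sum_p rho_p (x_ip - x_jp)^2), with rho_p = 1 / ell_p^2."""
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2
    return eta * np.exp(-diff2 @ rho)

def log_marginal_likelihood(y, X, eta, rho, noise_var):
    """log N(y; 0, K + noise_var * I): f has been integrated out analytically."""
    n = len(y)
    C = ard_kernel(X, eta, rho) + noise_var * np.eye(n)
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = C^{-1} y
    return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * math.log(2 * math.pi)

def log_gamma_prior(rho, shape, rate):
    """Sum of independent Gamma(shape, rate) log densities over the rho_p."""
    return np.sum(shape * math.log(rate) + (shape - 1) * np.log(rho)
                  - rate * rho - math.lgamma(shape))

def log_posterior(y, X, eta, rho, noise_var, shape=0.5, rate=1.0):
    """Unnormalized log p(theta | y); flat priors on eta and noise_var are assumed."""
    return log_marginal_likelihood(y, X, eta, rho, noise_var) + log_gamma_prior(rho, shape, rate)

# Hypothetical toy data: 30 individuals, 4 SNPs.
rng = np.random.default_rng(5)
X = rng.binomial(2, 0.4, size=(30, 4)).astype(float)
y = rng.normal(size=30)
lp = log_posterior(y, X, eta=1.0, rho=np.full(4, 0.2), noise_var=0.5)
```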
There is no analytical solution for the posterior mode or mean of θ. However, sampling-based approaches such as Markov chain Monte Carlo (MCMC) can start from an initial value and draw samples from the posterior. These samples can then be used to approximate the posterior mean or mode. In the Sharp et al. paper, Hybrid Monte Carlo, which simulates the trajectory of a particle moving under Hamiltonian dynamics, was used to make inferences about θ.
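The following is a generic Hybrid (Hamiltonian) Monte Carlo sampler with a leapfrog integrator. It is shown on a toy correlated 2-D Gaussian rather than on the GP posterior, because the latter would also need gradients of the log posterior with respect to θ. The step size, the randomized number of leapfrog steps, and the sample count are illustrative and would need tuning for a real problem.

```python
import numpy as np

def hmc(log_prob, grad_log_prob, x0, n_samples=2000, step=0.1, max_leapfrog=20, seed=0):
    """Basic Hybrid (Hamiltonian) Monte Carlo with a leapfrog integrator."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    for s in range(n_samples):
        p = rng.standard_normal(x.shape)            # fresh Gaussian momentum
        n_leapfrog = rng.integers(5, max_leapfrog + 1)   # random path length avoids periodicity
        x_new, p_new = x.copy(), p.copy()
        # Leapfrog: simulate the particle's trajectory under potential energy -log_prob.
        p_new += 0.5 * step * grad_log_prob(x_new)
        for i in range(n_leapfrog):
            x_new += step * p_new
            if i < n_leapfrog - 1:
                p_new += step * grad_log_prob(x_new)
        p_new += 0.5 * step * grad_log_prob(x_new)
        # Metropolis correction based on the change in total energy (the Hamiltonian).
        h_old = -log_prob(x) + 0.5 * p @ p
        h_new = -log_prob(x_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:
            x = x_new
        samples[s] = x
    return samples

# Toy target (not the GP posterior): a correlated 2-D Gaussian.
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
prec = np.linalg.inv(cov)

def log_prob(x):
    return -0.5 * x @ prec @ x

def grad_log_prob(x):
    return -prec @ x

draws = hmc(log_prob, grad_log_prob, x0=np.zeros(2), n_samples=3000)
```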
Estimating Broad-Sense Heritability
Once the parameters are estimated, one can use these estimates to quantify broad-sense heritability from the Gaussian Process Regression model. The basic idea is as follows:
 For each sample in the training data, one first predicts its phenotype using the estimated parameters.
 The variance of the predicted phenotype can be found analytically using the conditional distribution of the multivariate normal.
 The ratio between the sum of the individuals' predicted variances and the total phenotypic variance gives the broad-sense heritability.
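The post does not give the exact formula, so the sketch below is one plausible reading of these three steps, not the authors' estimator. It assumes known hyperparameters (in practice these would come from the posterior samples), a zero-mean GP, and no relatedness term. It computes the posterior mean and covariance of $f$ at the training points from the multivariate normal conditional. It then takes the posterior expectation of the sample variance of $f$ across individuals and divides by the sample variance of the phenotype. `broad_sense_h2` is a hypothetical helper.

```python
import numpy as np

def rbf_gram(X, eta, ell):
    """Gram matrix of eta * exp(-sum_p (x_ip - x_jp)^2 / ell_p^2)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2 / ell**2, axis=-1)
    return eta * np.exp(-d2)

def broad_sense_h2(y, X, eta, ell, noise_var):
    """Hypothetical estimator: posterior expected variance of f across samples / var(y)."""
    y = y - y.mean()                         # zero-mean GP prior assumed
    n = len(y)
    K = rbf_gram(X, eta, ell)
    C = K + noise_var * np.eye(n)
    # Conditional (posterior) distribution of f at the training points given y.
    mean_f = K @ np.linalg.solve(C, y)
    cov_f = K - K @ np.linalg.solve(C, K)
    # E[sample variance of f] = spread of the posterior means
    #   + average posterior variance - posterior variance of the sample mean of f.
    expected_var_f = np.var(mean_f) + np.trace(cov_f) / n - cov_f.sum() / n**2
    return expected_var_f / np.var(y)

# Simulated data with known hyperparameters (all values hypothetical).
rng = np.random.default_rng(6)
n, P = 300, 5
X = rng.binomial(2, 0.5, size=(n, P)).astype(float)
eta, ell = 1.0, np.full(P, 2.0)
L = np.linalg.cholesky(rbf_gram(X, eta, ell) + 1e-6 * np.eye(n))
f = L @ rng.standard_normal(n)               # one draw of genetic values from the GP prior

h_low = broad_sense_h2(f + rng.normal(0, 0.5, n), X, eta, ell, noise_var=0.25)
h_high = broad_sense_h2(f + rng.normal(0, 2.0, n), X, eta, ell, noise_var=4.0)
```

On this simulated data, the noisier phenotype should yield the lower estimate.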