GAMuT by epstein-software

An example GAMuT analysis

In the code that follows, we use notation consistent with the corresponding AJHG paper by Broadaway et al. (2016).

The non-printing lines of R code shown in this page are contained in the file 'GAMuT.R' found within the archive file GAMuT.zip.

Preliminary code dependencies

For this example, we will be using the davies() function defined in the R package CompQuadForm. This package is loaded with the library command:

library(CompQuadForm)

The functions used are defined and written in the file GAMuT-functions.R found in the zip archive. For the functions to be recognized, this needs to be sourced:

source("GAMuT-functions.R")

All of these functions have the substring 'GAMuT' in their name. Thus, to check the availability of these functions a filtered list can be generated:

ls(pattern="GAMuT")

## [1] "linear_GAMuT_geno"  "linear_GAMuT_pheno" "proj_GAMuT_pheno"  
## [4] "TestGAMuT"

Step 1: Read in `PHENO`, `GENO`, and `COVAR` matrices

The sample phenotype, genotype, and covariate data are provided in the respective files, phenotypes.csv, variants.csv, and covariates.csv. These files are also provided within the zip archive file. We read this data into R with the following commands:

GENO  <- read.csv("variants.csv")
PHENO <- read.csv("phenotypes.csv")
COVAR <- read.csv("covariates.csv")

There are 368 columns in GENO, so here we print only the first eight columns and four rows using head():

head(GENO[,1:8], n=4)

##   V1 V2 V3 V4 V5 V6 V7 V8
## 1  0  0  0  0  0  0  0  0
## 2  0  0  0  0  0  0  0  0
## 3  0  0  0  0  0  0  0  0
## 4  0  0  0  0  0  0  0  0

The first four rows and all columns of the other two arrays are similarly checked:

head(PHENO, n=4)

##   Phenotype1 Phenotype2 Phenotype3 Phenotype4 Phenotype5 Phenotype6
## 1    -1.5761    -0.5599   -0.96585    -1.1779    -0.7366    -0.8885
## 2     0.5411     1.3688   -0.01711     0.7002     2.0259     1.9142
## 3    -0.5115    -0.2910   -0.03845    -0.7068    -0.0155     0.3282
## 4    -0.5073     0.2174    0.49525     0.3535    -0.7514     1.0904

head(COVAR, n=4)

##     COVAR1 COVAR2
## 1 -0.01153 0.2955
## 2  1.38593 0.2443
## 3  0.08433 0.0934
## 4  0.49490 2.7669

Step 2: Residualize the phenotypes in `PHENO` on the covariates in `COVAR`

Note: Each phenotype in PHENO will be residualized on each covariate in COVAR. If you wish to only residualize phenotypes on a subset of covariates in COVAR, make sure to make appropriate adjustments to this block of code:

# Matrix of residualized phenotypes:
residuals = matrix(NA, nrow=nrow(PHENO), ncol=ncol(PHENO))

for(l in 1:ncol(PHENO)){
  model.residual = lm(PHENO[,l] ~ as.matrix(COVAR))
  residuals[,l] = resid(model.residual)
}

The first six rows of the 1000x6 residuals matrix are:

head(residuals, n=4)

##         [,1]     [,2]     [,3]     [,4]    [,5]    [,6]
## [1,] -1.5964 -0.53340 -1.01969 -1.21835 -0.8288 -0.9447
## [2,]  0.3916  1.22380 -0.31997  0.46899  1.6950  1.5994
## [3,] -0.5360 -0.26521 -0.09175 -0.74692 -0.1035  0.2680
## [4,] -0.6326  0.04302  0.13016  0.07602 -1.1876  0.7673

Step 3: Form the phenotypic similarity matrix and corresponding eigenvalues based on residualized phenotypes from Step 2

## centered & scaled matrix of residualized phenotypes:
P0 <- apply(residuals, 2, scale, scale=T, center=T)
P  <- as.matrix(P0) # convert dataframes into matrix

For this example, we construct Yc using the projection matrix: The function proj_GAMuT_pheno() constructs the projection matrix and corresponding eigenvalues:

proj_pheno = proj_GAMuT_pheno(P)
Yc = proj_pheno$Kc                # projection matrix
lambda_Y = proj_pheno$ev_Kc       # eigenvalues of Yc

If one wishes to construct Yc using the linear kernel instead of projection matrix, then use the linear_GAMuT_pheno() function instead. Uncommenting the following code would produce the linear kernel and the corresponding eigenvalues:

## linear_pheno <- linear_GAMuT_pheno(P) 
## Yc = linear_pheno$Kc           # linear kernel similarity matrix
## lambda_Y = linear_pheno$ev_Kc  # eigenvalues of Yc

Step 4: Form the genotypic similarity matrix and corresponding eigenvalues

We assume weighted linear kernel for genotypes where weights are function of minor-allele frequency (MAF). We apply the beta-distribution MAF weights of Wu et al. (2011) in the SKAT paper:

MAF = colMeans(GENO)/2                      # sample MAF of each variant in the sample
beta_weight = dbeta(MAF,1,25)/dbeta(0,1,25) # assume beta-distribution weights

Note: one can use other weight functions by simply recoding beta_weight as desired.
The first eight (out of 368) components for the MAF array are:

MAF[1:8]

##     V1     V2     V3     V4     V5     V6     V7     V8 
## 0.0010 0.0005 0.0010 0.0005 0.0005 0.0010 0.0010 0.0015

Then the weighted rare variants and the centered genotype matrix are calculated by

G0 = as.matrix(GENO)%*%diag(beta_weight) # Weighted rare variants
G  = as.matrix(scale(G0,center=T,scale=F))  # centered genotype matrix

Next we call the function linear_GAMuT_geno() to construct the weighted linear kernel matrix for genotypes along with the eigenvalues:

linear_geno <- linear_GAMuT_geno(G)
Xc <- linear_geno$Lc                        # linear kernel similarity matrix
lambda_X <- linear_geno$ev_Lc               # eigenvalues of Xc

Step 5: Construct the GAMuT and obtain the p-value

Finally, the pvalue is calculated using the function TestGAMuT():

GAMuT_pvalue <- TestGAMuT(Yc,lambda_Y,Xc,lambda_X)
GAMuT_pvalue

## [1] 0.02144

GAMuT