apex

Toolkit for QTL mapping and meta-analysis.

APEX: cis-xQTL analysis guide

This page describes cis-xQTL analysis using APEX. Once installed, you can quickly get started by running ./apex cis --help.

Overview

cis-xQTL analysis in APEX uses either a) ordinary least squares (OLS) or b) a linear mixed model (LMM) fit by restricted maximum likelihood (REML). For OLS, APEX requires 3 input files: molecular trait data, technical covariate data, and genotype data. An LMM can be used either to account for cryptic or familial relatedness using a kinship or genetic relatedness matrix (GRM), or to account for technical and biological variation using a low-rank matrix of random-effect covariates. For detailed descriptions of input file formats, please see the input file documentation page.

Table of Contents
  1. OLS cis-xQTL analysis
  2. LMM cis-xQTL analysis with a GRM
  3. Command line arguments

OLS cis-xQTL analysis

Example command:
./apex cis --vcf {vcf} --bed {trait-file} --cov {covariate-file} --prefix {out-name} --long

Output files. The above command generates 3 output files, {out-name}.cis_sumstats.tsv.gz, {out-name}.cis_gene_table.tsv.gz, {out-name}.cis_long_table.tsv.gz. The cis_sumstats output file contains association score statistics in a condensed format, which can be used for downstream analysis with the command ./apex meta. Human-readable output files are described below:

*.cis_long_table.tsv.gz (flag --long) columns:

  1. #chrom : Variant chromosome.
  2. pos : Variant chromosomal position (basepairs).
  3. ref : Variant reference allele (A, C, T, or G).
  4. alt : Variant alternate allele.
  5. gene : Molecular trait identifier (as specified in --bed {trait-file}).
  6. beta : OLS regression slope for variant on trait.
  7. se : Standard error of regression slope.
  8. pval : Single-variant association nominal p-value.
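A common next step is to pull out the most significant (lead) variant for each trait from the long table. The sketch below uses only the Python standard library and assumes the file begins with a tab-separated header line whose names match the columns listed above (including the leading #chrom); the function names read_long_table and lead_variants are hypothetical.

```python
import csv
import gzip


def read_long_table(path):
    """Yield rows of a *.cis_long_table.tsv.gz file as dicts keyed by column name.

    Assumes a tab-separated header line (#chrom, pos, ref, alt, gene, beta, se, pval).
    """
    with gzip.open(path, "rt", newline="") as fh:
        yield from csv.DictReader(fh, delimiter="\t")


def lead_variants(rows):
    """Return {gene: row}, keeping the row with the smallest pval for each gene."""
    best = {}
    for row in rows:
        gene = row["gene"]
        if gene not in best or float(row["pval"]) < float(best[gene]["pval"]):
            best[gene] = row
    return best
```

For example, lead_variants(read_long_table("out.cis_long_table.tsv.gz")) returns one row per trait, where out stands in for your --prefix value. Values are kept as strings, so convert pos, beta, se, and pval as needed.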

*.cis_gene_table.tsv.gz columns:

  1. #chrom : Molecular trait chromosome.
  2. start : Molecular trait start position.
  3. end : Molecular trait end position.
  4. gene : Molecular trait identifier.
  5. gene_pval : Trait-level p-value calculated across all variants in the cis region using the Cauchy combination test, comparable to beta-approximated permutation p-values.
  6. n_samples : Number of samples included in analysis.
  7. n_covar : Number of covariates included in analysis, including intercept.
  8. resid_sd : Square root of regression mean squared error under the null model.
  9. n_cis_variants : Number of variants in the cis region (which were used to calculate gene_pval).
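The Cauchy combination test behind gene_pval can be sketched as follows. This page states only that the Cauchy combination test is used; the equal default weights and the approximations for tiny p-values and very large statistics below follow the standard published form of the test and are assumptions about APEX, not a description of its code. The function name cauchy_combination is hypothetical.

```python
import math


def cauchy_combination(pvals, weights=None):
    """Combine p-values with the Cauchy combination test.

    Equal weights are used unless `weights` is given; weights are normalized
    to sum to 1.
    """
    if not pvals:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * len(pvals)
    total = float(sum(weights))
    stat = 0.0
    for p, w in zip(pvals, weights):
        if not 0.0 < p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
        p = min(p, 1.0 - 1e-15)  # keep tan() finite at p == 1
        if p < 1e-16:
            # tan((0.5 - p) * pi) ~ 1 / (p * pi) for tiny p; avoids precision loss
            term = 1.0 / (p * math.pi)
        else:
            term = math.tan((0.5 - p) * math.pi)
        stat += (w / total) * term
    if stat > 1e15:
        # 0.5 - atan(T) / pi ~ 1 / (T * pi) for large T
        return 1.0 / (stat * math.pi)
    return 0.5 - math.atan(stat) / math.pi
```

A single p-value passes through unchanged, and a set of identical p-values combines to that same value. The tail of the combined statistic is approximately standard Cauchy even when the input p-values are correlated, which is what makes this kind of test usable across variants in linkage disequilibrium without permutation.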

QTL software concordance. When no GRM is specified, APEX single-variant output is numerically equivalent to the R regression model lm(traits[,j] ~ covariates + genotype[,k]) for each trait j and genotype k. APEX output is also equivalent to FastQTL single-variant output. Some tools, such as QTLtools, instead fit the model lm(residuals[,j] ~ genotype[,k]), where residuals[,j] = resid(lm(traits[,j] ~ covariates)). APEX can mimic this model if the flag --no-resid-geno is specified. This approach is slightly faster than standard OLS, but because the genotype is not adjusted for the covariates, effect estimates are attenuated toward zero when genotypes are correlated with covariates, which can produce conservative p-values (loss of statistical power).
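The two model forms described above can be compared in a few lines of plain Python. This is an illustrative sketch, not APEX's implementation: the helper names (residualize, ols_genotype_effect, residual_trait_effect, null_resid_sd) are hypothetical. The full-model slope is computed through the Frisch-Waugh-Lovell theorem (regressing the covariate-residualized trait on the covariate-residualized genotype), which gives the same slope as lm(y ~ covariates + g). null_resid_sd mirrors the resid_sd column definition, counting the intercept among the covariates as n_covar does.

```python
import math


def _dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def residualize(v, covariates):
    """Residuals of v after least-squares projection onto the covariate columns.

    `covariates` is a list of columns (lists of numbers) that should include an
    intercept column of ones; the columns must be linearly independent.
    Uses modified Gram-Schmidt to build an orthonormal basis.
    """
    basis = []
    for col in covariates:
        u = [float(a) for a in col]
        for q in basis:
            proj = _dot(q, u)
            u = [a - proj * b for a, b in zip(u, q)]
        norm = math.sqrt(_dot(u, u))
        basis.append([a / norm for a in u])
    r = [float(a) for a in v]
    for q in basis:
        proj = _dot(q, r)
        r = [a - proj * b for a, b in zip(r, q)]
    return r


def ols_genotype_effect(y, g, covariates):
    """(beta, se) for g in lm(y ~ covariates + g), via Frisch-Waugh-Lovell."""
    n, k = len(y), len(covariates)
    ry = residualize(y, covariates)
    rg = residualize(g, covariates)
    ss_g = _dot(rg, rg)
    beta = _dot(rg, ry) / ss_g
    resid = [a - beta * b for a, b in zip(ry, rg)]
    sigma2 = _dot(resid, resid) / (n - k - 1)  # one extra df for the genotype
    return beta, math.sqrt(sigma2 / ss_g)


def residual_trait_effect(y, g, covariates):
    """Slope of lm(resid(lm(y ~ covariates)) ~ g); g itself is not residualized."""
    ry = residualize(y, covariates)
    g_mean = sum(g) / len(g)
    gc = [x - g_mean for x in g]
    return _dot(gc, ry) / _dot(gc, gc)


def null_resid_sd(y, covariates):
    """sqrt(RSS / (n - n_covar)) for the null model y ~ covariates (intercept included)."""
    ry = residualize(y, covariates)
    return math.sqrt(_dot(ry, ry) / (len(y) - len(covariates)))
```

With six samples, covariates [1]*6 and c = [0, 1, 2, 3, 4, 5], genotype g = [0, 0, 1, 1, 2, 2] (correlated with c), and a noise-free trait y = 1 + 3c + 2g, ols_genotype_effect recovers a slope of 2, while residual_trait_effect returns 6/35 (about 0.17). When g is uncorrelated with the covariates, the two slopes coincide.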

LMM cis-xQTL analysis

Example command:
./apex cis --vcf {vcf} --bed {trait-file} --cov {covariate-file} --grm {grm-file} --prefix {out-name}

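Here, APEX uses a linear mixed model (LMM) to account for cryptic or familial relatedness in cis-xQTL analysis of the form $y = X\beta + g\gamma + u + \varepsilon$, where $y$ is the molecular trait, $X$ the covariate matrix, $g$ the genotype vector, $u \sim N(0, \sigma^2_g K)$ a random effect whose covariance is proportional to the GRM $K$, and $\varepsilon \sim N(0, \sigma^2_e I)$ independent error. To use this feature, specify a genetic relatedness matrix (GRM) file to APEX using --grm {grm-file}. Output files and options are otherwise similar to those from OLS cis-xQTL analysis (when --grm is not specified). See the input file documentation page for accepted GRM file formats.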
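Writing out the marginal distribution of the trait shows how the variance components enter the association test. This is a sketch under the assumption that the REML variance estimates are plugged into a generalized least squares (GLS) fit, which is the standard approach for this model; APEX's exact computations are not described on this page. Here $y$ is the trait, $X$ the covariates, $g$ the genotype, $K$ the GRM, $W = [X, g]$ stacks covariates and genotype, and $\delta$ is the residual-to-genetic variance ratio.

```latex
\begin{aligned}
y &= X\beta + g\gamma + u + \varepsilon, \qquad u \sim N(0,\ \sigma^2_g K), \quad \varepsilon \sim N(0,\ \sigma^2_e I) \\
\operatorname{Var}(y) &= \sigma^2_g K + \sigma^2_e I = \sigma^2_g\,(K + \delta I), \qquad \delta = \sigma^2_e / \sigma^2_g \\
\begin{pmatrix}\hat\beta \\ \hat\gamma\end{pmatrix} &= \left(W^\top V^{-1} W\right)^{-1} W^\top V^{-1} y, \qquad V = K + \delta I \\
\operatorname{Var}\begin{pmatrix}\hat\beta \\ \hat\gamma\end{pmatrix} &= \sigma^2_g \left(W^\top V^{-1} W\right)^{-1}
\end{aligned}
```

Because rescaling $V$ by a constant leaves the GLS estimate unchanged, the point estimates depend on the variance components only through the ratio $\delta$; the overall scale $\sigma^2_g$ enters through the standard errors.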

Output files. Output files from LMM analysis are broadly similar to those from OLS. One additional output file, {out-name}.theta.gz, contains variance component parameter estimates from the LMM. The first 4 columns of this file list the trait chromosomal position and identifier, and columns 5-7 list:

  5. Residual variance component estimate $\hat\sigma^2_e$ (independent error variance).
  6. Genetic (heritable) variance component estimate $\hat\sigma^2_g$ (due to GRM).
  7. Residual-genetic variance ratio $\hat\sigma^2_e / \hat\sigma^2_g$.

LMM software concordance. APEX’s LMM estimates are consistent (nearly numerically equivalent) with the R packages GMMAT and GENESIS using AI-REML.
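To inspect the variance components in {out-name}.theta.gz, a small stdlib reader is enough. The column layout follows the description above; the assumptions that the file is tab-separated, that column 4 holds the trait identifier, and that any header line either starts with # or has non-numeric values in columns 5-7 are unverified, and read_theta is a hypothetical name.

```python
import gzip


def read_theta(path):
    """Parse a *.theta.gz file into {trait_id: (sigma2_e, sigma2_g, ratio)}.

    Assumed layout: columns 1-4 give trait chromosome, start, end, and
    identifier; columns 5-7 give the residual variance, genetic variance,
    and their ratio.
    """
    out = {}
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and '#'-prefixed headers
            fields = line.rstrip("\n").split("\t")
            try:
                values = tuple(float(x) for x in fields[4:7])
            except ValueError:
                continue  # header line without a leading '#'
            out[fields[3]] = values
    return out
```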

Command line arguments

A partial list of options is given below. Please run ./apex cis --help to see a complete list of command line flags and options.