APEX: cis-xQTL analysis guide
This page describes cis-xQTL analysis using APEX. Once installed, you can quickly get started by running ./apex cis --help.
Overview
cis-xQTL analysis in APEX uses either a) ordinary least squares (OLS) or b) a linear mixed model (LMM) fit by restricted maximum likelihood (REML). For OLS, APEX requires 3 input files: molecular trait data, technical covariate data, and genotype data. The LMM can account either for cryptic or familial relatedness, using a kinship or genetic relatedness matrix (GRM), or for technical and biological variation, using a low-rank matrix of random-effect covariates. For detailed descriptions of input file formats, please see the input file documentation page.
OLS cis-xQTL analysis
Example command:
./apex cis --vcf {vcf} --bed {trait-file} --cov {covariate-file} --prefix {out-name} --long
Output files. The above command generates 3 output files: {out-name}.cis_sumstats.tsv.gz, {out-name}.cis_gene_table.tsv.gz, and {out-name}.cis_long_table.tsv.gz. The cis_sumstats output file contains association score statistics in a condensed format, which can be used for downstream analysis with the command ./apex meta. Human-readable output files are described below:
*.cis_long_table.tsv.gz (flag --long) columns:
- #chrom: Variant chromosome.
- pos: Variant chromosomal position (base pairs).
- ref: Variant reference allele (A, C, T, or G).
- alt: Variant alternate allele.
- gene: Molecular trait identifier (as specified in --bed {trait-file}).
- beta: OLS regression slope for variant on trait.
- se: Standard error of regression slope.
- pval: Single-variant association nominal p-value.
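As a rough illustration of working with the long-table columns listed above, the sketch below parses such a table and keeps the smallest-p-value variant per trait. It assumes the file is tab-delimited with a header row carrying exactly these column names; the helper names (read_long_table, top_hit_per_gene) and the three data rows are hypothetical and made up for the example, not real APEX output.

```python
import csv
import gzip
import io

# Column names as listed for *.cis_long_table.tsv.gz.
LONG_COLUMNS = ["#chrom", "pos", "ref", "alt", "gene", "beta", "se", "pval"]


def read_long_table(handle):
    """Yield one typed dict per variant-trait row from an open text handle.

    Assumes a tab-delimited table whose header row matches LONG_COLUMNS.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        yield {
            "chrom": row["#chrom"],
            "pos": int(row["pos"]),
            "ref": row["ref"],
            "alt": row["alt"],
            "gene": row["gene"],
            "beta": float(row["beta"]),
            "se": float(row["se"]),
            "pval": float(row["pval"]),
        }


def top_hit_per_gene(rows):
    """Return {gene: row} keeping the row with the smallest nominal p-value."""
    best = {}
    for r in rows:
        if r["gene"] not in best or r["pval"] < best[r["gene"]]["pval"]:
            best[r["gene"]] = r
    return best


# Synthetic rows (made-up values) compressed in memory to stand in for a file.
SYNTHETIC = "\n".join([
    "\t".join(LONG_COLUMNS),
    "chr1\t1000\tA\tG\tgeneA\t0.10\t0.05\t0.04",
    "chr1\t1200\tC\tT\tgeneA\t-0.30\t0.06\t1e-06",
    "chr1\t5000\tG\tA\tgeneB\t0.02\t0.05\t0.70",
]) + "\n"

with gzip.open(io.BytesIO(gzip.compress(SYNTHETIC.encode())), "rt", newline="") as fh:
    top = top_hit_per_gene(read_long_table(fh))
```

For a real run, the in-memory buffer would be replaced by gzip.open("{out-name}.cis_long_table.tsv.gz", "rt").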
*.cis_gene_table.tsv.gz columns:
- #chrom: Molecular trait chromosome.
- start: Molecular trait start position.
- end: Molecular trait end position.
- gene: Molecular trait identifier.
- gene_pval: Trait-level p-value calculated across all variants in the cis region using the Cauchy combination test, comparable to beta-approximated permutation p-values.
- n_samples: Number of samples included in analysis.
- n_covar: Number of covariates included in analysis, including intercept.
- resid_sd: Square root of regression mean squared error under the null model.
- n_cis_variants: Number of variants in the cis region (which were used to calculate gene_pval).
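To make the gene_pval column more concrete, here is a minimal sketch of the Cauchy combination test in its standard form: each p-value is mapped to a Cauchy-distributed score, the scores are averaged, and the average is mapped back to a p-value. Equal weights are an assumption for this example; whether and how APEX weights variants is not stated on this page. The function name cauchy_combine is hypothetical.

```python
import math


def cauchy_combine(pvals, weights=None):
    """Combine p-values with the Cauchy combination test.

    pvals: p-values strictly between 0 and 1.
    weights: non-negative weights summing to 1 (defaults to equal weights).
    """
    n = len(pvals)
    if n == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0 / n] * n
    # Weighted sum of Cauchy scores tan((0.5 - p) * pi).
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvals))
    if t > 1.0:
        # Same value as 0.5 - atan(t)/pi, but avoids cancellation for large t.
        return math.atan(1.0 / t) / math.pi
    return 0.5 - math.atan(t) / math.pi


# Made-up single-variant p-values for one trait's cis region.
combined = cauchy_combine([0.001, 0.5, 0.9])
```

A single p-value is returned unchanged, and the combined value is driven mostly by the smallest inputs, which is why the test suits a cis region containing many null variants and a few strong signals.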
QTL software concordance. When no GRM is specified, APEX single-variant output is numerically equivalent to the R regression model lm(traits[,j] ~ covariates + genotype[,k]) for each trait j and genotype k. APEX output is additionally equivalent to FastQTL single-variant output. Note that some tools, such as QTLtools, instead fit the model lm(residuals[,j] ~ genotype[,k]) where residuals[,j] = resid(lm(traits[,j] ~ covariates)). APEX can mimic this model if the flag --no-resid-geno is specified. This approach is slightly faster than standard OLS, but can cause conservative p-values (loss of statistical power).
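The difference between the two models can be seen in a small numerical sketch. With one covariate and simulated data (all values below are synthetic, and the helper names are hypothetical), the genotype slope from the full model equals the slope obtained after residualizing both trait and genotype on the covariate, while regressing the residualized trait on the raw genotype gives a slope that is no larger in magnitude. This is plain least-squares arithmetic, not APEX's implementation.

```python
import random


def mean(v):
    return sum(v) / len(v)


def simple_fit(x, y):
    """Return (intercept, slope) of the least-squares line y ~ 1 + x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residualize(v, cov):
    """Residuals of v after regressing on an intercept and one covariate."""
    a, b = simple_fit(cov, v)
    return [vi - (a + b * ci) for vi, ci in zip(v, cov)]


def full_model_slope(y, cov, g):
    """Genotype slope from y ~ 1 + cov + g via the centered normal equations."""
    my, mc, mg = mean(y), mean(cov), mean(g)
    scc = sum((c - mc) ** 2 for c in cov)
    sgg = sum((x - mg) ** 2 for x in g)
    scg = sum((c - mc) * (x - mg) for c, x in zip(cov, g))
    scy = sum((c - mc) * (t - my) for c, t in zip(cov, y))
    sgy = sum((x - mg) * (t - my) for x, t in zip(g, y))
    return (scc * sgy - scg * scy) / (scc * sgg - scg ** 2)


# Synthetic data: the genotype is correlated with the covariate on purpose.
rng = random.Random(0)
n = 200
cov = [rng.gauss(0.0, 1.0) for _ in range(n)]
g = [sum(rng.random() < (0.5 if c > 0 else 0.3) for _ in range(2)) for c in cov]
y = [0.5 * gi + 1.0 * ci + rng.gauss(0.0, 1.0) for gi, ci in zip(g, cov)]

beta_full = full_model_slope(y, cov, g)          # lm(trait ~ covariate + genotype)
ry, rg = residualize(y, cov), residualize(g, cov)
beta_both_resid = simple_fit(rg, ry)[1]          # both sides residualized
beta_no_resid_geno = simple_fit(g, ry)[1]        # lm(residuals ~ genotype)
```

The gap between beta_full and beta_no_resid_geno grows with the correlation between genotype and covariates; when they are uncorrelated in the sample, the two slopes coincide.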
LMM cis-xQTL analysis
Example command:
./apex cis --vcf {vcf} --bed {expression-file} --cov {covariate-file} --grm {grm-file} --prefix {out-name}
Here, APEX uses a linear mixed model (LMM) to account for cryptic or familial relatedness in cis-eQTL analysis. The model includes a heritable random effect whose covariance is proportional to the GRM, in addition to independent residual error. To use this feature, specify a genetic relatedness matrix (GRM) file to APEX using --grm {grm-file}. Output files and options are otherwise similar to those from OLS cis-xQTL analysis (when --grm is not specified). See the input file documentation page for accepted input file formats.
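For orientation, a standard single-variant LMM with a GRM-based random effect can be written as below. The notation is an assumption chosen for illustration and may differ from APEX's own: y is the trait vector, X the covariate matrix (including intercept), g the genotype vector with effect \gamma, K the GRM, \tau^2 the genetic variance component, and \sigma^2 the residual variance component.

```latex
y = X\beta + g\,\gamma + u + \varepsilon,
\qquad u \sim \mathcal{N}\!\left(0,\ \tau^{2} K\right),
\qquad \varepsilon \sim \mathcal{N}\!\left(0,\ \sigma^{2} I\right)
```

Under this form, the marginal covariance of y is \tau^2 K + \sigma^2 I, and REML estimates the two variance components under the null model before variants are tested.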
Example command:
./apex cis --vcf {vcf} --bed {trait-file} --cov {covariate-file} --grm {grm-file} --prefix {out-name} --long
Output files. Output files from LMM analysis are broadly similar to OLS. One additional output file, {out-name}.theta.gz, contains variance component parameter estimates from the LMM. The first 4 columns of this file list trait chromosomal position and identifier, and columns 5-7 list:
- Residual variance component estimate (independent error variance).
- Genetic (heritable) variance component estimate (due to GRM).
- Residual-genetic variance ratio.
LMM software concordance. APEX’s LMM estimates are consistent (nearly numerically equivalent) with the R packages GMMAT and GENESIS using AI-REML.
Command line arguments
A partial list of options is given below. Please run ./apex cis --help to see a complete list of command line flags and options.
- General options
  - --window {BP}, -w {BP}: Window size in base pairs for cis-xQTL analysis. Only variant-trait pairs within BP upstream or downstream of trait TSS will be analyzed (default: 1Mb, or 1000000).
- Output options
  - --prefix, -o: Output file prefix.
  - --long, -l: Write cis-eQTL results in long-table format.
- Scale and transform options
  - --rankNormal: Apply rank normal transform to trait values.
  - --rankNormal-resid: Apply rank normal transform to residuals (can be used with --rankNormal). Not compatible with LMM.
  - --no-resid-geno: Do not residualize genotypes (not recommended). Output using this flag is concordant with QTLtools and some other tools.
- Computational resources
  - --threads {N}: Number of threads to use (not to exceed the number of available cores).
  - --low-mem: Reduce memory usage by reading and processing genotypes in chunks.
- Subsetting samples
  - --exclude-iids {LIST}: Comma-delimited list of sample IDs to exclude.
  - --include-iids {LIST}: Only include the specified comma-delimited sample IDs.
- Filtering regions and variants
  - --region {chr:start-end}: Only analyze variants and traits within the specified region.
  - --gene {LIST}: Only analyze the specified comma-delimited molecular trait IDs.
  - --exclude-snps {LIST}: Comma-delimited list of SNPs to exclude.
  - --include-snps {LIST}: Only include the specified comma-delimited SNPs.
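To illustrate what a rank normal transform (as in the --rankNormal option above) does to trait values, the sketch below replaces each value by the standard normal quantile of its rank. The rank offset (0.5 here) and the tie handling (average ranks) are assumptions for this example; the exact conventions used by APEX are not stated on this page. The function name rank_normal is hypothetical.

```python
from statistics import NormalDist


def rank_normal(values, offset=0.5):
    """Map values to standard normal quantiles of their ranks.

    Uses the quantile (rank - offset) / (n - 2*offset + 1); offset=0.5 gives
    (rank - 0.5) / n. Tied values receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank across the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


# Made-up trait values with one strong outlier.
z = rank_normal([3.2, -1.0, 10.5, 250.0, 3.9])
```

Because only ranks are used, the outlier at 250.0 ends up no further from the center than the largest quantile for n = 5, which is the main reason the transform is applied before association testing.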