APEX: cis-xQTL analysis guide
This page describes cis-xQTL analysis using APEX. Once installed, you can quickly get started by running ./apex cis --help.
Overview
cis-xQTL analysis in APEX uses either a) ordinary least squares (OLS) or b) a linear mixed model (LMM) fit by restricted maximum likelihood (REML). For OLS, APEX requires 3 input files: molecular trait data, technical covariate data, and genotype data. LMM can be used either to account for cryptic or familial relatedness using a kinship or genetic relatedness matrix (GRM), or to account for technical and biological variation using a low-rank matrix of random-effect covariates. For detailed descriptions of input file formats, please see the input file documentation page.
OLS cis-xQTL analysis
Example command:
./apex cis --vcf {vcf} --bed {trait-file} --cov {covariate-file} --prefix {out-name} --long
Output files. The above command generates 3 output files: {out-name}.cis_sumstats.tsv.gz, {out-name}.cis_gene_table.tsv.gz, and {out-name}.cis_long_table.tsv.gz. The cis_sumstats output file contains association score statistics in a condensed format, which can be used for downstream analysis with the command ./apex meta. The two human-readable output files are described below:
*.cis_long_table.tsv.gz (flag --long) columns:
- #chrom: Variant chromosome.
- pos: Variant chromosomal position (base pairs).
- ref: Variant reference allele (A, C, T, or G).
- alt: Variant alternate allele.
- gene: Molecular trait identifier (as specified in --bed {trait-file}).
- beta: OLS regression slope for variant on trait.
- se: Standard error of regression slope.
- pval: Single-variant association nominal p-value.
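The column layout above can be read with a few lines of Python. The following is a minimal sketch, assuming the file is tab-delimited with a header row whose names match the columns listed above; the helper names and the example rows are hypothetical and are not APEX output.

```python
import csv
import gzip
import io


def open_long_table(path):
    """Open a gzip-compressed long table as text (path is user-supplied)."""
    return gzip.open(path, "rt")


def read_long_table(handle):
    """Yield one typed dict per variant-trait pair from an open text handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield {
            "chrom": row["#chrom"],
            "pos": int(row["pos"]),
            "ref": row["ref"],
            "alt": row["alt"],
            "gene": row["gene"],
            "beta": float(row["beta"]),
            "se": float(row["se"]),
            "pval": float(row["pval"]),
        }


def top_variant_per_gene(rows):
    """Return the row with the smallest nominal p-value for each trait."""
    best = {}
    for r in rows:
        if r["gene"] not in best or r["pval"] < best[r["gene"]]["pval"]:
            best[r["gene"]] = r
    return best


# Made-up rows for illustration only (assumed header names, not real results).
EXAMPLE_TSV = (
    "#chrom\tpos\tref\talt\tgene\tbeta\tse\tpval\n"
    "1\t100\tA\tG\tgeneA\t0.10\t0.05\t0.04\n"
    "1\t200\tC\tT\tgeneA\t0.30\t0.06\t0.0001\n"
    "1\t300\tG\tA\tgeneB\t-0.20\t0.10\t0.05\n"
)

rows = list(read_long_table(io.StringIO(EXAMPLE_TSV)))
best = top_variant_per_gene(rows)
```

For a real file, pass the handle returned by open_long_table("{out-name}.cis_long_table.tsv.gz") to read_long_table instead of the in-memory example.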
*.cis_gene_table.tsv.gz columns:
- #chrom: Molecular trait chromosome.
- start: Molecular trait start position.
- end: Molecular trait end position.
- gene: Molecular trait identifier.
- gene_pval: Trait-level p-value calculated across all variants in the cis region using the Cauchy combination test, comparable to beta-approximated permutation p-values.
- n_samples: Number of samples included in analysis.
- n_covar: Number of covariates included in analysis, including intercept.
- resid_sd: Square root of regression mean squared error under the null model.
- n_cis_variants: Number of variants in the cis region (which were used to calculate gene_pval).
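To illustrate how a trait-level p-value can be obtained from single-variant p-values, here is a minimal sketch of the equal-weight Cauchy combination test. It is an illustration of the general technique only: the function name is hypothetical, and the weights and numerical safeguards that APEX uses internally are not described on this page and may differ.

```python
import math


def cauchy_combination(pvals):
    """Combine p-values with an equal-weight Cauchy combination test."""
    if not pvals:
        raise ValueError("need at least one p-value")
    # Map each p-value to a standard Cauchy quantile and average them.
    t = sum(math.tan((0.5 - p) * math.pi) for p in pvals) / len(pvals)
    # Convert the averaged statistic back to a p-value via the Cauchy tail.
    return 0.5 - math.atan(t) / math.pi


# Made-up nominal p-values for the variants in one cis region.
combined = cauchy_combination([0.01, 0.9, 0.4])
```

A single p-value is returned unchanged, and one strong signal among many null variants still produces a small combined p-value, which is why this test is a fast alternative to permutation-based trait-level p-values.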
QTL software concordance. When no GRM is specified, APEX single-variant output is numerically equivalent to the R regression model lm(traits[,j] ~ covariates + genotype[,k]) for each trait j and genotype k. APEX output is additionally equivalent to FastQTL single-variant output. Note that some tools, such as QTLtools, instead fit the model lm(residuals[,j] ~ genotype[,k]) where residuals[,j] = resid(lm(traits[,j] ~ covariates)). APEX can mimic this model if the flag --no-resid-geno is specified. This approach is slightly faster than standard OLS, but can cause conservative p-values (loss of statistical power).
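The difference between the two models can be seen in a small sketch. It uses one covariate and made-up toy data, and the function names are hypothetical. The full OLS slope is computed by residualizing both trait and genotype on the covariate (the Frisch-Waugh-Lovell identity), while the second function residualizes only the trait. Only slopes are compared here; standard errors and p-values are not computed.

```python
def residualize(values, covariate):
    """Residuals of values after regressing on an intercept and one covariate."""
    n = len(values)
    mean_v = sum(values) / n
    mean_c = sum(covariate) / n
    scc = sum((c - mean_c) ** 2 for c in covariate)
    scv = sum((c - mean_c) * (v - mean_v) for c, v in zip(covariate, values))
    slope = scv / scc
    return [(v - mean_v) - slope * (c - mean_c) for c, v in zip(covariate, values)]


def full_ols_slope(trait, covariate, genotype):
    """Genotype slope from lm(trait ~ covariate + genotype)."""
    ry = residualize(trait, covariate)
    rg = residualize(genotype, covariate)
    return sum(a * b for a, b in zip(ry, rg)) / sum(b * b for b in rg)


def trait_only_residual_slope(trait, covariate, genotype):
    """Genotype slope from lm(resid(lm(trait ~ covariate)) ~ genotype)."""
    ry = residualize(trait, covariate)
    mean_g = sum(genotype) / len(genotype)
    gc = [g - mean_g for g in genotype]
    return sum(a * b for a, b in zip(ry, gc)) / sum(b * b for b in gc)


# Toy data (made up): the genotype is correlated with the covariate.
covariate = [0.0, 1.0, 0.0, 1.0, 2.0, 2.0]
genotype = [0.0, 1.0, 1.0, 2.0, 2.0, 1.0]
trait = [1.0, 2.5, 1.8, 3.9, 4.6, 3.1]

full = full_ols_slope(trait, covariate, genotype)
partial = trait_only_residual_slope(trait, covariate, genotype)
```

Both slopes share the same numerator, but the trait-only version divides by the total genotype variance rather than the genotype variance left after adjusting for the covariate. Its slope is therefore shrunk toward zero whenever genotype and covariates are correlated, which is consistent with the loss of power noted above.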
LMM cis-xQTL analysis
Example command:
./apex cis --vcf {vcf} --bed {expression-file} --cov {covariate-file} --grm {grm-file} --prefix {out-name}
Here, APEX uses a linear mixed model (LMM) to account for cryptic or familial relatedness in cis-eQTL analysis, of the form $y = X\beta + g + \varepsilon$, where $g \sim N(0, \tau^2 K)$ and $\varepsilon \sim N(0, \sigma^2 I)$, with $K$ denoting the GRM. To use this feature, specify a genetic relatedness matrix (GRM) file to APEX using --grm {grm-file}. Output files and options are otherwise similar to those from OLS cis-xQTL analysis (when --grm is not specified). See the input file documentation page for accepted input file formats.
Output files. Output files from LMM analysis are broadly similar to those from OLS. One additional output file, {out-name}.theta.gz, contains variance component parameter estimates from the LMM. The first 4 columns of this file list the trait chromosomal position and identifier, and columns 5-7 list:
- Residual variance component estimate (independent error variance).
- Genetic (heritable) variance component estimate (due to GRM).
- Residual-genetic variance ratio.
LMM software concordance. APEX’s LMM estimates are consistent with (nearly numerically equivalent to) those from the R packages GMMAT and GENESIS using AI-REML.
Command line arguments
A partial list of options is given below. Please run ./apex cis --help to see a complete list of command line flags and options.
- General options
  - --window {BP}, -w {BP}: Window size in base pairs for cis-xQTL analysis. Only variant-trait pairs within BP upstream or downstream of the trait TSS will be analyzed (default: 1 Mb, or 1000000).
- Output options
  - --prefix, -o: Output file prefix.
  - --long, -l: Write cis-eQTL results in long-table format.
- Scale and transform options
  - --rankNormal: Apply rank normal transform to trait values.
  - --rankNormal-resid: Apply rank normal transform to residuals (can be used with --rankNormal). [Not compatible with LMM].
  - --no-resid-geno: Do not residualize genotypes (not recommended). Output using this flag is concordant with QTLtools and some other tools.
- Computational resources
  - --threads {N}: Number of threads to be used (not to exceed the number of available cores).
  - --low-mem: Reduce memory usage by reading and processing genotypes in chunks.
- Subsetting samples
  - --exclude-iids {LIST}: Comma-delimited list of sample IDs to exclude.
  - --include-iids {LIST}: Only include the specified comma-delimited sample IDs.
- Filtering regions and variants
  - --region {chr:start-end}: Only analyze variants and traits within the specified region.
  - --gene {LIST}: Only analyze the specified comma-delimited molecular trait IDs.
  - --exclude-snps {LIST}: Comma-delimited list of SNPs to exclude.
  - --include-snps {LIST}: Only include the specified comma-delimited SNPs.
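When APEX is driven from a script, the options above can be assembled programmatically. The following is a minimal sketch of a hypothetical Python helper (not part of APEX) that builds an argument list using only flags documented on this page. The file names are placeholders, and the only check it performs is the documented incompatibility between --rankNormal-resid and LMM analysis.

```python
def build_apex_cis_command(vcf, bed, cov, prefix, grm=None, long_table=False,
                           window=None, threads=None, rank_normal=False,
                           rank_normal_resid=False, apex_path="./apex"):
    """Assemble an argument list for ./apex cis from documented flags."""
    if grm is not None and rank_normal_resid:
        # Documented above: --rankNormal-resid is not compatible with LMM.
        raise ValueError("--rankNormal-resid cannot be combined with --grm")
    cmd = [apex_path, "cis", "--vcf", vcf, "--bed", bed,
           "--cov", cov, "--prefix", prefix]
    if grm is not None:
        cmd += ["--grm", grm]
    if window is not None:
        cmd += ["--window", str(window)]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if rank_normal:
        cmd.append("--rankNormal")
    if rank_normal_resid:
        cmd.append("--rankNormal-resid")
    if long_table:
        cmd.append("--long")
    return cmd


# Placeholder file names for illustration only.
cmd = build_apex_cis_command("geno.vcf.gz", "traits.bed.gz", "covar.txt",
                             "out", long_table=True, window=1000000)
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.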