R/STAAR [Manual] [Tutorial]

An R package for performing the variant-Set Test for Association using Annotation infoRmation (STAAR) procedure in whole-genome sequencing (WGS) studies.

STAAR is a general framework that incorporates both qualitative functional categories and quantitative complementary functional annotations using an omnibus multi-dimensional weighting scheme. STAAR accounts for population structure and relatedness, and is scalable for analyzing large WGS studies of continuous and dichotomous traits.
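As a rough illustration of what an omnibus combination across annotation-weighted tests can look like, the sketch below merges several p-values with the Cauchy combination rule. This is a generic, self-contained example and not the STAAR package's API: the function name `cauchy_combine` and the example p-values are hypothetical, and the assumption that each input p-value comes from one annotation weighting of the same variant set is made only for illustration.

```python
import math

def cauchy_combine(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Hypothetical helper for illustration only; it is not part of any
    STAAR package. Inputs are assumed to lie strictly between 0 and 1.
    """
    if weights is None:
        # Equal weights that sum to one.
        weights = [1.0 / len(p_values)] * len(p_values)
    # Map each p-value to a standard Cauchy variate and average them.
    statistic = sum(w * math.tan((0.5 - p) * math.pi)
                    for w, p in zip(weights, p_values))
    # Convert the averaged statistic back to a p-value via the Cauchy tail.
    return 0.5 - math.atan(statistic) / math.pi

# Assumed example: p-values of one variant set under three annotation weightings.
omnibus_p = cauchy_combine([0.01, 0.5, 0.2])
```

The combined p-value is driven mostly by the smallest inputs, which is why a rule of this kind suits settings where only some annotations are informative for a given variant set.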

Version 0.9.8 (February 7, 2025)


R/MetaSTAAR [Manual]

An R package for performing the Meta-analysis of variant-Set Test for Association using Annotation infoRmation (MetaSTAAR) procedure in whole-genome sequencing (WGS) studies.

MetaSTAAR is a powerful and resource-efficient rare variant (RV) meta-analysis framework scalable to large WGS studies. MetaSTAAR accounts for relatedness and population structure for both quantitative and dichotomous traits and boosts the power of RV tests by incorporating multiple variant functional annotations.

Version 0.9.6.3 (February 5, 2024)


R/STAARpipeline and R/STAARpipelineSummary [Manual 1, 2] [Tutorial]

R packages for performing association analysis of whole-genome/whole-exome sequencing (WGS/WES) studies using the STAAR pipeline.

STAARpipeline is a resource-efficient and powerful WGS/WES association analysis pipeline. STAARpipeline provides a streamlined rare variant association-detection framework for sequencing data, including gene-centric analysis and non-gene-centric analysis using a variety of coding and noncoding functional categories (masks), conditional analysis to identify variant set signals independent of nearby common variants, and visualization of analysis results.

Version 0.9.8 (March 23, 2025)


R/MultiSTAAR [Manual] [Tutorial]

An R package for performing the Multi-trait variant-Set Test for Association using Annotation infoRmation (MultiSTAAR) procedure in whole-genome sequencing (WGS) studies.

MultiSTAAR is a general framework that (1) leverages the correlation structure between multiple phenotypes to improve power of multi-trait analysis over single-trait analysis, and (2) incorporates both qualitative functional categories and quantitative complementary functional annotations using an omnibus multi-dimensional weighting scheme. MultiSTAAR accounts for population structure and relatedness, and is scalable for jointly analyzing large WGS studies of multiple correlated traits.

Version 0.9.7.1 (November 14, 2024)


Python/MACIE [Whole-Genome MACIE Scores Part 1, 2, 3, 4]

MACIE (Multi-dimensional Annotation Class Integrative Estimation) is an unsupervised multivariate mixed model framework to assess multi-dimensional functional impacts for both coding and non-coding variants in the human genome. MACIE integrates a variety of functional annotations, including protein function scores, evolutionary conservation scores, and epigenetic annotations from ENCODE and Roadmap Epigenomics, and estimates the joint posterior probabilities of each genetic variant being functional.


dx-toolkit/vcf2agds [vcf_trimmer] [vcf_merger] [vcf2gds] [favorannotator]

An all-in-one toolkit that efficiently converts WGS data from Variant Call Format (VCF) to the annotated Genomic Data Structure (aGDS) format, which significantly reduces data size while supporting seamless integration of genomic and functional data for comprehensive genetic analyses. The toolkit provides a four-step workflow: VCF trimming, VCF merging, format conversion, and variant annotation.

Joint work with Dr. Andrew Wood and Dr. Zilin Li


R/SCANG [Manual] [Tutorial]

An R package for performing the SCAN the Genome (SCANG) procedure in whole-genome sequencing (WGS) studies.

SCANG is a flexible and computationally efficient scan statistic procedure that uses the p-value of a variant set-based test as the scan statistic for each moving window to detect rare variant association regions for both continuous and dichotomous traits.
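The moving-window idea can be sketched in a few lines. The example below is a toy illustration and not the SCANG implementation: it assumes a simple burden score test with a normal approximation as the set-based test, uses made-up genotype and phenotype data, and omits the genome-wide significance thresholds a real scan procedure needs. The names `burden_score_pvalue` and `scan_windows` are hypothetical.

```python
import math

def burden_score_pvalue(genotypes, phenotype):
    """Two-sided p-value of a burden score test (normal approximation).

    genotypes: one list of allele counts per individual, for one window.
    Hypothetical stand-in for a set-based test; assumed for illustration.
    """
    n = len(phenotype)
    burden = [sum(row) for row in genotypes]  # rare-allele count per person
    y_mean = sum(phenotype) / n
    b_mean = sum(burden) / n
    resid_var = sum((y - y_mean) ** 2 for y in phenotype) / (n - 1)
    burden_ss = sum((b - b_mean) ** 2 for b in burden)
    if resid_var == 0 or burden_ss == 0:
        return 1.0  # no variation, so no evidence of association
    score = sum((y - y_mean) * (b - b_mean) for y, b in zip(phenotype, burden))
    z = score / math.sqrt(resid_var * burden_ss)
    return math.erfc(abs(z) / math.sqrt(2))

def scan_windows(genotypes, phenotype, window_sizes):
    """Test every window of every size; return (start, end, p) sorted by p."""
    n_variants = len(genotypes[0])
    results = []
    for size in window_sizes:
        for start in range(n_variants - size + 1):
            window = [row[start:start + size] for row in genotypes]
            p = burden_score_pvalue(window, phenotype)
            results.append((start, start + size, p))
    return sorted(results, key=lambda r: r[2])

# Made-up data: 8 individuals, 6 variants; variants 2 and 3 track the trait.
toy_genotypes = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
toy_phenotype = [0.1, -0.2, 0.0, 0.2, 1.9, 2.1, 4.0, -0.1]

ranked = scan_windows(toy_genotypes, toy_phenotype, window_sizes=[2, 3])
best_start, best_end, best_p = ranked[0]
```

Scanning several window sizes at once lets the data pick the region boundaries, at the cost of many overlapping tests, which is why a practical procedure must also control the error rate across all windows.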

Joint work with Dr. Zilin Li