We will develop statistical methodology for rare variant analysis in whole genome sequencing (WGS) studies that takes the sparseness of the data into account and does not require any asymptotic approximations. We will reanalyze the Cure Alzheimer’s Fund WGS family study with the new methodology and distribute software implementations of the approaches via our Family-Based Association Test (FBAT) package.
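Sparse rare-variant counts are where asymptotic approximations can mislead. A small example makes this concrete. The sketch below is not the methodology proposed here. It compares the exact binomial version of the transmission disequilibrium test (TDT), the trio special case that FBAT generalizes, with the usual chi-square approximation. It assumes affected-offspring trios and assumes each heterozygous parent carries at most one rare allele in the tested region. Under the null hypothesis of no linkage, each heterozygous parent then transmits the rare allele with probability 1/2. The function names are hypothetical.

```python
import math


def exact_tdt_pvalue(transmitted: int, untransmitted: int) -> float:
    """Two-sided exact binomial TDT p-value.

    transmitted / untransmitted: numbers of heterozygous parents that did / did not
    transmit the rare allele to an affected child. Under H0 the transmitted count
    is Binomial(n, 1/2) with n = transmitted + untransmitted.
    """
    n = transmitted + untransmitted
    if n == 0:
        return 1.0  # no informative parents, so no evidence either way
    # Exact tail probabilities from integer binomial coefficients.
    lower = sum(math.comb(n, k) for k in range(0, transmitted + 1)) / 2 ** n
    upper = sum(math.comb(n, k) for k in range(transmitted, n + 1)) / 2 ** n
    # The null distribution is symmetric, so double the smaller tail (capped at 1).
    return min(1.0, 2 * min(lower, upper))


def chisq_tdt_pvalue(transmitted: int, untransmitted: int) -> float:
    """Classical TDT statistic (b - c)^2 / (b + c), referred to a chi-square with 1 df."""
    n = transmitted + untransmitted
    if n == 0:
        return 1.0
    stat = (transmitted - untransmitted) ** 2 / n
    # Upper tail of a 1-df chi-square: P(X > s) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(stat / 2))


# Sparse example: 5 informative parents, all transmitting the rare allele.
print(exact_tdt_pvalue(5, 0))  # 0.0625
print(chisq_tdt_pvalue(5, 0))  # about 0.025
```

With only five informative parents, the chi-square approximation gives p ≈ 0.025 and would be declared significant at the 0.05 level. The exact two-sided p-value is 0.0625 and would not be.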
We seek to develop new analysis tools for the Cure Alzheimer’s Fund WGS data on the National Institute of Mental Health family sample, because recent developments in statistics have provided theoretical insights that enable the construction of substantially more powerful analysis approaches. Re-analyzing our existing dataset with these approaches will provide important additional insights into the genetic architecture of Alzheimer’s disease (AD). We will continue our work on the sex-specific analysis of the WGS data, which has already identified new disease loci that differentiate AD risk by sex. Our findings are currently under review by the journal Nature. We will now focus on decline phenotypes, on analysis tools for such phenotypes, and on the influence of rare variants on the sex-specific genetic risk of AD. Furthermore, because numerous AD loci are now known, we will develop a polygenic risk model for AD that will allow us to identify patients at high risk of developing the disease.
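The additive polygenic risk score commonly used for this purpose is a weighted sum of effect-allele dosages, PRS_i = Σ_j w_j g_ij. The sketch below computes that score and flags the top fraction of a sample. It is not the model the project will build. The per-variant weights (for example, per-allele log odds ratios from published AD association studies), the 10% default cutoff, and the function names are hypothetical choices for illustration.

```python
from typing import List, Sequence


def polygenic_risk_scores(dosages: Sequence[Sequence[float]],
                          weights: Sequence[float]) -> List[float]:
    """Additive PRS: for each person i, sum over variants j of weights[j] * dosages[i][j].

    dosages[i][j] is the number (0, 1, 2, or an imputed dosage) of effect alleles
    person i carries at variant j; weights[j] is a hypothetical per-allele effect size.
    """
    scores = []
    for person in dosages:
        if len(person) != len(weights):
            raise ValueError("each dosage row must have one entry per weight")
        scores.append(sum(w * g for w, g in zip(weights, person)))
    return scores


def flag_high_risk(scores: Sequence[float], top_fraction: float = 0.10) -> List[bool]:
    """Flag individuals whose score lies in the top `top_fraction` of the sample (ties included)."""
    k = max(1, int(top_fraction * len(scores)))
    threshold = sorted(scores, reverse=True)[k - 1]
    return [s >= threshold for s in scores]


# Hypothetical weights for three variants and genotypes for four individuals.
weights = [0.5, -0.25, 1.0]
dosages = [[0, 1, 0], [2, 0, 1], [1, 1, 1], [0, 0, 0]]
scores = polygenic_risk_scores(dosages, weights)
print(scores)                                     # [-0.25, 2.0, 1.25, 0.0]
print(flag_high_risk(scores, top_fraction=0.25))  # [False, True, False, False]
```

In practice the cutoff would be chosen from the score distribution in a reference population rather than within a single small sample.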
2015 to 2018
The availability of next-generation sequencing data in large-scale association studies of Alzheimer’s disease provides a unique research opportunity. These data contain the information required to identify causal disease susceptibility loci (DSLs) for Alzheimer’s disease and many other mental health phenotypes and psychiatric diseases. Translating this wealth of information into DSL discovery for Alzheimer’s disease requires powerful statistical methodology. A large number of rare variant association tests have been proposed, but they do not incorporate all of the important information about the variants; in particular, none of the existing approaches takes the physical location of the variants into account. Under the assumption that deleterious and protective DSLs cluster in different genomic regions, we will develop a general association analysis framework for Alzheimer’s disease that is built on spatial clustering approaches. The framework will handle complex phenotypes (e.g., binary and quantitative traits, among others) and will be applicable to different study designs, i.e., family-based studies and studies of unrelated subjects. If DSLs do cluster, the resulting gain in statistical power will be of practical relevance and will enable the discovery of new DSLs; in the absence of clustering, our approach will achieve power comparable to that of existing methods. Furthermore, to test larger genomic regions for association, we will develop network-based association methodology. The network-based approach will retain sufficient power for larger genomic regions than existing approaches can, and at the same time will provide an intuitive picture of the complex relationships among the variants that drive the association, fostering new biological insights. Like the clustering framework, it can incorporate complex phenotypes and different study designs.
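The sketch below illustrates the intuition behind the clustering assumption. It is not the framework to be developed. It compares a region-wide burden sum with a simple positional scan that takes the maximum |sum| of per-variant scores over fixed-width windows. Several pieces are illustrative assumptions: the signed per-variant scores (positive for risk-increasing, negative for protective variants), the window width, and the sign-flipping null. That null assumes the scores are independent and symmetric about zero under H0. All function names are hypothetical.

```python
import random
from typing import Sequence, Tuple


def burden_statistic(scores: Sequence[float]) -> float:
    """Region-wide burden: |sum of per-variant scores|; opposite-signed effects cancel."""
    return abs(sum(scores))


def window_scan_statistic(positions: Sequence[int], scores: Sequence[float],
                          width: int) -> Tuple[float, int]:
    """Max |sum of scores| over windows [p, p + width) anchored at each variant position p.

    Returns (statistic, start position of the best window).
    """
    if not positions:
        raise ValueError("need at least one variant")
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    pos = [positions[i] for i in order]
    sc = [scores[i] for i in order]
    best, best_start = 0.0, pos[0]
    for i, start in enumerate(pos):
        total, j = 0.0, i
        while j < len(pos) and pos[j] < start + width:
            total += sc[j]
            j += 1
        if abs(total) > best:
            best, best_start = abs(total), start
    return best, best_start


def sign_flip_pvalue(positions: Sequence[int], scores: Sequence[float], width: int,
                     n_resamples: int = 2000, seed: int = 1) -> float:
    """Monte Carlo p-value for the scan statistic obtained by randomly flipping score signs.

    Valid only under the assumption that, under H0, the scores are independent and
    symmetric about zero. Recomputing the maximum over all windows in every resample
    accounts for having scanned many windows.
    """
    rng = random.Random(seed)
    observed, _ = window_scan_statistic(positions, scores, width)
    exceed = 0
    for _ in range(n_resamples):
        flipped = [s if rng.random() < 0.5 else -s for s in scores]
        if window_scan_statistic(positions, flipped, width)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_resamples + 1)


# Toy region: a risk-increasing cluster near position 100, a protective cluster near 5000.
positions = [100, 120, 150, 170, 5000, 5030, 5060, 5080]
scores = [2.0, 1.5, 1.75, 2.25, -2.0, -1.5, -1.75, -2.25]
print(burden_statistic(scores))                        # 0.0
print(window_scan_statistic(positions, scores, 1000))  # (7.5, 100)
```

In this toy region the burden statistic is exactly 0 because the two clusters cancel, while the scan recovers the risk cluster. With only eight variants, `sign_flip_pvalue(positions, scores, 1000)` comes out near 0.23. That matches the 15/64 chance that either cluster is sign-concordant by chance. The example therefore shows the mechanism, not the power.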
All of the proposed methods will be implemented in user-friendly software packages with established user communities, i.e., PBAT, NPBAT, and R. We will test and validate the proposed approaches, and compare them with existing methods, using large-scale simulation studies and applications to the whole genome sequencing family study for Alzheimer’s disease from the Tanzi lab.