Ellen McRae Greytak, Ph.D.
Parabon NanoLabs

25 Aug — by Dr. Ellen McRae Greytak

It's been a while, but exciting things have been happening here at Compute Against Alzheimer's Disease (CAAD) headquarters. Using the preliminary data gathered from our initial analyses of the change in entorhinal cortex (EC) volume over time, we wrote a grant proposal for the National Institute on Aging (NIA)'s Small Business Innovation Research (SBIR) program. We were recently notified that our submission was accepted for funding to study the possibility of predicting Late-Onset Alzheimer's Disease (LOAD) endophenotypes!

In our Phase I project, we will have 6 months to demonstrate the feasibility of our larger goal: developing a lifetime predictive model for LOAD risk that is accurate enough to be used in the clinic. We will be moving on from the high-density genotype data that we've been using thus far and instead focusing on whole-genome sequence (WGS) data from both the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the Alzheimer's Disease Sequencing Project (ADSP).

While genotype data contains hundreds of thousands of previously identified single nucleotide polymorphisms (SNPs) that are common in the human population, WGS opens up a new frontier of possibilities. WGS data contains tens of millions of SNPs, as well as insertion/deletion polymorphisms (indels) and structural variants (SVs). Additionally, WGS captures SNPs that are very rare in the population, occurring in only a few people. These variants have the potential to contribute significantly to our understanding of disease, as they are more likely to have a large effect on a person's phenotype.

However, these new types of variants (as well as their sheer number) require new analytical approaches. Thus, our software engineers have been hard at work adding new capabilities to the Parabon Crush-MDR software package, which scores each variant or group of variants for its association with a phenotype of interest and uses an evolutionary search algorithm to dynamically evolve toward the optimal solution over millions of iterations.
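To give a feel for what "evolving toward the optimal solution" means, here is a minimal toy sketch of an evolutionary search over small groups of variants. It is not the Crush-MDR implementation: the function `evolve`, its parameters, and the made-up `toy_score` are all assumptions for illustration, and a real run would score variants against actual phenotype data.

```python
import random

def evolve(score, n_variants, subset_size=2, pop_size=20, generations=50, seed=0):
    """Toy evolutionary search for a high-scoring subset of variant indices."""
    rng = random.Random(seed)
    # Start from random candidate subsets (each a sorted tuple of variant indices).
    population = [tuple(sorted(rng.sample(range(n_variants), subset_size)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better-scoring half, so the best score never decreases.
        population.sort(key=score, reverse=True)
        survivors = population[:pop_size // 2]
        children = []
        for parent in survivors:
            child = list(parent)
            # Mutation: swap one variant for a variant not already in the subset.
            pos = rng.randrange(subset_size)
            choices = [v for v in range(n_variants) if v not in child]
            child[pos] = rng.choice(choices)
            children.append(tuple(sorted(child)))
        population = survivors + children
    return max(population, key=score)

# Hypothetical scoring function: pretend higher-numbered variants are more
# strongly associated with the phenotype. Real scores would come from data.
def toy_score(subset):
    return sum(subset)

best = evolve(toy_score, n_variants=10)
print(best, toy_score(best))
```

Because the better half of each generation is carried forward unchanged, the best candidate found so far is never lost; mutation is what lets the search explore new combinations.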

The major update that we have made is to add multi-objective optimization (MO), which will allow us to simultaneously optimize a number of important scores, rather than focusing on a single score. MO yields a set of equally optimal results called the Pareto front, in which no result can be improved on one score without doing worse on another. By defining our objectives intelligently, we will be able to discover new and important variants and variant interactions that could not have been found otherwise.
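The idea of a Pareto front can be illustrated with a short sketch. The two objectives and the candidate scores below are hypothetical (the post does not say which scores are being optimized); the point is only that a candidate stays on the front unless some other candidate is at least as good on every objective and strictly better on one.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and strictly
    better on at least one objective (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, scores in candidates.items()
        if not any(dominates(other_scores, scores)
                   for other, other_scores in candidates.items()
                   if other != name)
    ]

# Hypothetical candidates scored on two made-up objectives,
# e.g. (association strength, model simplicity).
candidates = {
    "A": (0.90, 0.20),
    "B": (0.70, 0.60),
    "C": (0.60, 0.50),  # worse than B on both objectives
    "D": (0.30, 0.95),
}

front = pareto_front(candidates)
print(front)  # ['A', 'B', 'D']
```

Candidates A, B, and D each win on some trade-off, so none can be called better than the others; C is dropped because B beats it on both scores.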

We will also be adding the ability to analyze rare variants (RVs), which are so infrequent that they cannot be analyzed with traditional regression statistics. For example, if a variant occurs in only two people, and both of those people are affected by the disease, it is very difficult to determine whether this reflects a genuine association with disease, because the same outcome is also highly likely to occur by chance. To surmount this problem, RVs can be binned according to their functional annotation (e.g., summing the number of RVs occurring in the coding region of each gene and then looking for those genes that tend to have higher numbers of RVs in affected or unaffected subjects). This requires annotating each variant according to many different types of functions, as well as implementing statistical methods to analyze the results.
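Both halves of this argument can be sketched in a few lines. The first function computes how often every carrier of a variant would be affected purely by chance; the second bins rare variants by gene and compares the average count in affected and unaffected subjects. The function names, the input layout, and the tiny data set are all assumptions for illustration, not our actual pipeline, and a real analysis would add a proper statistical test on top of the counts.

```python
from collections import Counter
from math import comb

def chance_all_carriers_affected(n_cases, n_controls, n_carriers):
    """Probability that every carrier is a case if carrying the variant is
    unrelated to disease (a hypergeometric probability)."""
    return comb(n_cases, n_carriers) / comb(n_cases + n_controls, n_carriers)

def gene_burden(rare_calls, affected):
    """Mean rare-variant count per subject for each gene, as (cases, controls).

    rare_calls: list of (subject_id, gene) pairs, one per rare variant seen
    affected:   dict mapping subject_id to True (case) or False (control)
    Assumes at least one case and one control are present.
    """
    counts = Counter()
    for subject, gene in rare_calls:
        counts[(gene, affected[subject])] += 1
    n_cases = sum(affected.values())
    n_controls = len(affected) - n_cases
    genes = sorted({gene for _, gene in rare_calls})
    return {g: (counts[(g, True)] / n_cases, counts[(g, False)] / n_controls)
            for g in genes}

# With 100 cases and 100 controls, two carriers are both cases about
# a quarter of the time by chance alone.
p = chance_all_carriers_affected(100, 100, 2)
print(round(p, 3))  # 0.249

# Made-up subjects and gene labels.
affected = {"s1": True, "s2": True, "s3": False, "s4": False}
rare_calls = [("s1", "GENE_A"), ("s2", "GENE_A"), ("s2", "GENE_A"),
              ("s3", "GENE_B"), ("s4", "GENE_A")]
burden = gene_burden(rare_calls, affected)
print(burden)  # {'GENE_A': (1.5, 0.5), 'GENE_B': (0.0, 0.5)}
```

A single rare variant seen in two affected people is weak evidence, but pooling many rare variants within a gene gives counts large enough to compare between groups.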

These new analyses will be combined with traditional analysis of common variants to determine those variants most associated with each endophenotype. We will then use our machine learning approaches to build those variants into predictive models. We're very excited about the potential of this new work. Thank you to everyone who continues to support CAAD as we work to make a difference in Alzheimer's Disease!