29 August — by Dr. Ellen McRae Greytak
Hello! In this post, I'll give an update on some exciting results we have recently produced under the Compute Against Alzheimer's Disease (CAAD) project.
One of the first phenotypes we've chosen to examine is the change in the size of the entorhinal cortex (EC). The EC is one of the most significantly affected brain regions in Alzheimer's Disease (AD) patients. In fact, functional MRI (fMRI) studies have shown that the neural degeneration seen in AD begins in the lateral EC (yellow) before spreading to other regions of the brain (red). Thus, it is very important to understand which genes affect degeneration in this region and, especially, to predict which individuals are at risk of rapid deterioration there.
We are using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Scientists involved in ADNI measured the volume of the EC by analyzing MRI scans taken each time a patient was seen. Because we are interested in the deterioration of the EC, we chose to use the 24-month change in EC volume as a target phenotype.
Our first task was to remove from the data any non-genetic effects that might confound our search for genes of interest. For example, if a phenotype tends to change with age, then two people with the same genes could show different values for that phenotype simply because they differ in age, which would reduce our ability to find meaningful genetic associations. Thus, the effects of such factors must first be removed using a statistical technique called multiple regression. In this dataset, we found that the change in EC volume over 24 months (EC24) was significantly associated with Alzheimer's diagnostic status, ethnicity, and the protocol under which the data were collected. These three variables were regressed out, and the residuals were used as the phenotype for our search of the genome.
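To make the "regressing out" step concrete, here is a minimal sketch in Python using NumPy's ordinary least squares. It is only an illustration under stated assumptions, not the CAAD pipeline itself: the data are randomly generated, the three covariates are simplified to 0/1 indicators, and the names residualize, ec24, and covariates are made up for this example.

```python
import numpy as np

def residualize(phenotype, covariates):
    """Remove the linear effect of the covariates from the phenotype."""
    y = np.asarray(phenotype, dtype=float)
    X = np.asarray(covariates, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    # Add an intercept column so the residuals are centered on zero.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

# Synthetic, purely illustrative data (NOT ADNI data).
rng = np.random.default_rng(0)
n = 200
diagnosis = rng.integers(0, 2, n)   # hypothetical 0/1 coding
ethnicity = rng.integers(0, 2, n)   # hypothetical 0/1 coding
protocol = rng.integers(0, 2, n)    # hypothetical 0/1 coding
genetic_effect = rng.normal(0.0, 1.0, n)
ec24 = -0.8 * diagnosis + 0.3 * ethnicity + 0.2 * protocol + genetic_effect

covariates = np.column_stack([diagnosis, ethnicity, protocol])
ec24_residual = residualize(ec24, covariates)
```

The residuals have zero mean and no remaining linear relationship with any of the covariates, so whatever signal is left in ec24_residual cannot be a linear by-product of those three variables. Real categorical variables with more than two levels would need one indicator column per level (minus one).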
Rather than simply searching for individual genetic variants (single nucleotide polymorphisms, or SNPs) that associate with the phenotype, we used Parabon's Crush-MDR evolutionary search algorithm to identify significant high-order interactions among SNPs. Using the CAAD network of compute nodes, we evaluated more than 600 million possible genetic interactions. Then, using the SNPs discovered in the highest-scoring interactions, we built a predictive model for EC24 that can be used to evaluate a new individual's risk of showing rapid decline in the EC. For this modeling, we used a slightly different phenotype, this time regressing out the effects of any variables that would not be known early in life (age at baseline, education level, marital status, diagnostic status, and study protocol), while including the other variables (sex, ethnicity, race, and APOE4 genotype) as covariates in the model. Accordingly, the resulting model could be used as a lifetime AD risk indicator for people of any sex, age, or race.
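A quick back-of-the-envelope calculation shows why the space of SNP interactions gets so large and why a guided search and a lot of compute nodes are attractive. The SNP count below is a hypothetical round number chosen purely for illustration; it is not the number of SNPs in the ADNI data.

```python
from math import comb

n_snps = 1000  # hypothetical SNP count, for illustration only

# Number of distinct SNP combinations at each interaction order.
counts = {order: comb(n_snps, order) for order in (2, 3, 4)}

for order, count in counts.items():
    print(f"{order}-way combinations among {n_snps} SNPs: {count:,}")
```

Even with only 1,000 SNPs there are about 166 million possible three-way combinations and more than 41 billion four-way combinations, so the search space grows far faster than the number of SNPs.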
In this initial study, we were delighted to discover that this approach yielded an extremely powerful predictive model, explaining nearly 60% of the variance in this phenotype. By analyzing the contribution of each variable to the model, we discovered that the covariates were actually contributing very little, and most of the model's power truly came from the novel SNPs that were discovered with our unique data mining approach.
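One simple way to ask how much of a model's explanatory power comes from the covariates and how much from the SNPs is to compare the variance explained (R^2) by a covariates-only model with that of the full model. The sketch below does this with ordinary linear regression on simulated data. It is an assumption-laden stand-in: this post does not say which model type or validation scheme was used, and every variable name and effect size here is invented for the example.

```python
import numpy as np

def r_squared(y, predictors):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    total = y - y.mean()
    return 1.0 - (residual @ residual) / (total @ total)

# Simulated, purely illustrative data (NOT ADNI data).
rng = np.random.default_rng(1)
n = 500
covars = rng.integers(0, 2, size=(n, 4)).astype(float)  # stand-ins for 4 covariates
snps = rng.integers(0, 3, size=(n, 5)).astype(float)    # genotypes coded 0/1/2

# Hypothetical phenotype: weak covariate effect, one main SNP effect,
# and one SNP-SNP interaction, plus noise.
interaction = (snps[:, 1] * snps[:, 2])[:, None]
y = (0.1 * covars[:, 3] + 0.6 * snps[:, 0]
     + 0.5 * interaction[:, 0] + rng.normal(0.0, 1.0, n))

r2_covariates = r_squared(y, covars)
r2_full = r_squared(y, np.column_stack([covars, snps, interaction]))
r2_gain_from_snps = r2_full - r2_covariates

print(f"covariates only: {r2_covariates:.3f}")
print(f"full model:      {r2_full:.3f}")
print(f"gain from SNPs:  {r2_gain_from_snps:.3f}")
```

Note that in-sample R^2 can only go up as predictors are added, so in practice the comparison is more convincing when it is made on held-out individuals.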
Stay tuned; we will be posting further results as they become available. We'd like to offer our continued thanks for your support. Please tell a friend about CAAD! This analysis is computationally complex, and we need all the computing power we can get!