Ellen McRae Greytak, Ph.D.
Parabon NanoLabs

20 May — by Dr. Ellen McRae Greytak

Welcome! This is the first installment of the Compute Against Alzheimer's Disease blog. For this first post, I wanted to give you a bit of background on some of the work we're doing and why it's important.

Many previous genetic studies of Alzheimer's Disease (AD) have been stymied by the fact that diagnosis is very inconsistent, and the disease itself is very heterogeneous. This means that two patients who show the same symptoms may have very different underlying pathologies in their brains, while two people with similar pathologies may have very different outcomes. This can make it extremely difficult to detect the genes that predispose people to AD.

To address this, the Alzheimer's Disease Neuroimaging Initiative (ADNI, adni.loni.usc.edu) has focused on measuring the precise changes in the brain that occur during normal aging and in AD, known as "biomarkers" of disease. These measures are more sensitive to disease changes than a single diagnosis, which is why we selected the ADNI dataset for our initial investigations. ADNI's goal is to identify early physiological signs of AD and help develop treatments so that patients can be diagnosed and treated earlier in the progression of the disease. Each of the hundreds of patients has not only been assigned a diagnosis of cognitively normal (CN), significant memory concern (SMC), mild cognitive impairment (MCI), or AD, but has also been intensively studied using cognitive testing, MRI, PET imaging, and cerebrospinal fluid (CSF) measurements. Many of these evaluations have been performed at multiple time points as well, allowing researchers to also study rates of change.

The image below, from the ADNI website, shows how five indicators of disease progression change over time. First, amyloid beta (Aβ) begins to accumulate in plaques in the brain, which can be detected in the CSF and using PET imaging. Second, tau proteins begin to accumulate inside neurons, resulting in synaptic dysfunction, which can also be measured in the CSF and using FDG-PET, which measures the uptake of glucose by neurons. Third, the brain begins to atrophy as neurons are lost, which can be seen as decreased volume or density in MRI scans. Finally, memory begins to decline, followed by more general cognitive decline, as measured in cognitive testing, at which point a diagnosis can be made.

All of the ADNI subjects provided DNA samples, which have been genotyped at hundreds of thousands of sites on the genome where people tend to differ from one another. These single nucleotide polymorphisms (SNPs) are sites where I might have a DNA base of A (adenine), for example, while you have a G (guanine), and this difference may "code" for differences in our appearance or health profile. Using this vast amount of data, researchers can study which of those SNPs are responsible for the differences we see between people, such as predisposition to biomarker changes and disease. This is called a genome-wide association study (GWAS), and it involves testing every one of those hundreds of thousands of SNPs to find those that have the strongest statistical association with a particular trait or health outcome.
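To make the idea of a single-SNP scan concrete, here is a minimal Python sketch. It is only an illustration, not the analysis used on the ADNI data: it assumes each genotype is coded as a count of one allele (0, 1, or 2), uses a plain Pearson correlation as the association measure, and the SNP names are hypothetical. A real GWAS would typically use regression models with covariates and a correction for the very large number of tests.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def scan_snps(genotypes, trait):
    """Score every SNP on its own and rank by strength of association.

    genotypes: dict mapping a SNP name to a list of allele counts (0/1/2),
               one entry per subject (an assumed coding, for illustration).
    trait:     list of trait values, e.g. a biomarker measurement, per subject.
    """
    scores = {snp: abs(pearson(counts, trait)) for snp, counts in genotypes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data: six subjects, three SNPs.
toy_genotypes = {
    "snp_a": [0, 1, 2, 0, 1, 2],
    "snp_b": [1, 1, 0, 2, 0, 2],
    "snp_c": [1, 1, 1, 1, 1, 1],  # no variation, so no signal
}
toy_trait = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
ranked = scan_snps(toy_genotypes, toy_trait)
```

In this toy data the trait tracks snp_a exactly, so it ranks first, while the invariant snp_c ranks last.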

Parabon has launched the Compute Against Alzheimer's Disease campaign in order to take this type of work even further. Rather than looking at how single SNPs associate, we are interested in how combinations of SNPs might be responsible for producing the physical changes we see in AD. Significantly associated combinations are called epistatic interactions. The problem with this kind of work is the sheer number of possible combinations among SNPs. With 500,000 SNPs, a typical number for GWAS, there are nearly 125 billion possible pairs of SNPs and 2 × 10^16 possible three-way combinations, and the truly exciting interactions may contain even more SNPs. Exhaustively searching a space this large is not currently possible on a human timescale, so previous researchers have had to either limit their searches to only pairwise interactions or select a small subset of SNPs as candidates for interactions. Both of these approaches are likely to miss a large proportion of the potentially interesting interactions.
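These counts are simply "n choose k" values, and they are easy to check. The short Python snippet below is only a worked example of the arithmetic; the function name is my own.

```python
import math

def count_combinations(n_snps, size):
    """Number of distinct unordered SNP combinations of a given size."""
    return math.comb(n_snps, size)

n_snps = 500_000
pairs = count_combinations(n_snps, 2)    # 124,999,750,000, i.e. nearly 125 billion
triples = count_combinations(n_snps, 3)  # roughly 2 x 10^16

print(f"pairs:   {pairs:,}")
print(f"triples: {triples:.2e}")
```

Each step up in combination size multiplies the count by roughly another factor of n divided by the new size, which is why exhaustive search quickly becomes impractical.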

Parabon has developed a new approach to interaction analysis. Our software, Parabon Crush, uses an evolutionary search algorithm to intelligently explore the space of possible SNP combinations, "evolving" toward the most significant interactions. Crush begins by distributing populations of random combinations of SNPs to as many compute nodes as are available. Each SNP combination is scored for its association with the trait under study, and the high-scoring combinations are randomly mutated, with SNPs added, subtracted, or swapped out, to generate a new population, and the process then repeats. This approach does not reduce the total number of possible combinations to be considered, but by intelligently evolving the search, we can explore the space efficiently while ignoring those regions with little potential for interesting results.
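For readers who like to see the loop spelled out, here is a toy Python sketch of a generic evolutionary search over SNP combinations. To be clear, this is not Crush's code: it runs on a single machine rather than distributing populations across compute nodes, and the scoring function, the "planted" interaction, the population sizes, and the cap on combination size are all made-up assumptions for illustration. A real score would measure statistical association with the trait.

```python
import random

def mutate(combo, n_snps, rng, max_size=4):
    """Return a copy of combo with one SNP added, removed, or swapped out."""
    snps = set(combo)
    ops = ["swap"]
    if len(snps) < max_size:
        ops.append("add")
    if len(snps) > 2:
        ops.append("remove")
    op = rng.choice(ops)
    if op in ("remove", "swap"):
        snps.remove(rng.choice(sorted(snps)))
    if op in ("add", "swap"):
        # Pick a SNP that was not in the original combination.
        candidates = [s for s in range(n_snps) if s not in combo]
        snps.add(rng.choice(candidates))
    return tuple(sorted(snps))

def evolve(score, n_snps, pop_size=50, generations=30, keep=10, seed=0):
    """Score, select, and mutate SNP combinations; return them best-first."""
    rng = random.Random(seed)
    # Start from random pairs of SNP indices.
    population = [tuple(sorted(rng.sample(range(n_snps), 2)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(set(population), key=score, reverse=True)
        survivors = ranked[:keep]  # high scorers carry over unchanged
        children = [mutate(rng.choice(survivors), n_snps, rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return sorted(set(population), key=score, reverse=True)

# Hypothetical stand-in score: reward overlap with a planted three-SNP
# interaction and lightly penalize larger combinations.
PLANTED = {3, 17, 42}

def toy_score(combo):
    return len(PLANTED & set(combo)) - 0.1 * len(combo)

results = evolve(toy_score, n_snps=200)
best = results[0]
```

The point of the sketch is the shape of the loop: only a few thousand combinations are ever scored, yet the search keeps drifting toward regions of the space where scores are high.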

The goal of this data mining work is to identify new SNPs that are significantly involved in predisposing people to AD and thus enhance our understanding of this devastating disease. Every compute node that contributes power to Compute Against Alzheimer's Disease allows us to explore ever more interactions and discover new significant results. We have already uncovered some interesting SNP interactions that have not been reported in the literature, and I will be reporting on them in upcoming blog posts.