Akbari A, Vitti JJ, Iranmehr A, Bakhtiari M, Sabeti PC, Mirarab S, Bafna V. Identifying the favored mutation in a positive selective sweep. Nat Methods. 2018 Feb 19. doi: 10.1038/nmeth.4606. [Epub ahead of print]
Most approaches that capture signatures of selective sweeps in population genomics data do not identify the specific mutation favored by selection. We present iSAFE (for "integrated selection of allele favored by evolution"), a method that enables researchers to accurately pinpoint the favored mutation in a large region (∼5 Mbp) by using a statistic derived solely from population genetics signals. iSAFE does not require knowledge of demography, the phenotype under selection, or functional annotations of mutations.
In a study published in Nature Methods, scientists describe a new algorithm that pinpoints mutations favored by natural selection in large regions of the human genome. What could this mean for treatment options for genetic disorders?
To study the sequenced genomes of a population sample of 1000 individuals, researchers turned to computational techniques. They created an algorithm called iSAFE, which analyzes a region of the genome and determines which mutation in it is favored by selection. Previous methods could detect which regions of the human genome are evolving under selection pressure, but could not identify the specific mutation responding to that pressure. iSAFE does not need to know the function of the genomic region it is analyzing, nor does it require any demographic information; it works by reading population genetic signals imprinted on the genomes of the sampled individuals. During natural selection, neighboring mutations can “hitchhike” with a mutation that is under positive selection, causing a loss of genetic diversity near that mutation. iSAFE exploits the signals from these neighboring mutations to pinpoint the location of the favored mutation. The algorithm may help researchers understand genetic disorders and pinpoint their underlying causes, which could pave the way to potential therapeutic targets.
A team of scientists has developed an algorithm that can accurately pinpoint, in large regions of the human genome, mutations favored by natural selection. The finding provides deeper insight into how evolution works, and ultimately could lead to better treatments for genetic disorders. For example, adaptation to chronic hypoxia at high altitude can suggest targets for cardiovascular and other ischemic diseases.
The sequenced genome of a single individual yields about half a terabyte of data, about as much information as you'll find on 106 DVDs. A population sample of 1000 individuals contains 1000 times as much. To examine such a massive amount of data, researchers turned to computational techniques.
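The DVD comparison can be checked with a quick back-of-the-envelope calculation. The sketch below assumes decimal units and a single-layer DVD capacity of 4.7 GB; the article states neither, so both are assumptions, and the variable names are illustrative only.

```python
# Rough check of the data-volume figures quoted in the article.
GENOME_BYTES = 0.5e12   # "about half a terabyte" per sequenced genome (from the article)
DVD_BYTES = 4.7e9       # ASSUMPTION: single-layer DVD, 4.7 GB, decimal units
SAMPLE_SIZE = 1000      # population sample size (from the article)


def dvds_needed(n_bytes, dvd_bytes=DVD_BYTES):
    """Return how many DVDs of the given capacity hold n_bytes of data."""
    return n_bytes / dvd_bytes


per_genome_dvds = dvds_needed(GENOME_BYTES)
cohort_terabytes = GENOME_BYTES * SAMPLE_SIZE / 1e12

print(f"DVDs per genome: {per_genome_dvds:.0f}")
print(f"Data for {SAMPLE_SIZE} genomes: {cohort_terabytes:.0f} TB")
```

Under these assumptions, one genome fills roughly 106 DVDs, matching the figure in the text, and a 1000-person sample comes to about 500 terabytes.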
"Computer science and data science are playing a significant role to better understand the code of life and uncover the hidden patterns in our genome," said Ali Akbari, the paper's first author and a Ph.D. student in electrical and computer engineering at the University of California San Diego. "We are analyzing massively large sets of human genomic data to ultimately improve our understanding of genetic basis of diseases."
Researchers detail the algorithm, dubbed iSAFE, in the Feb. 19 issue of Nature Methods.
Many existing genomic analysis approaches can detect which regions of the human genome are evolving under selection pressure. Often, these regions are large, covering millions of base pairs, and the approaches do not shed light on the specific mutations that are responding to the selection pressure. iSAFE doesn't need to know the function of the genomic region it is analyzing or any demographic information for the human population it belongs to. Instead, the researchers used population genetic signals imprinted in the genomes of the sampled individuals and machine learning techniques to reliably identify the mutation favored by selection.
In natural selection, neighboring mutations 'hitchhike' with the mutation that is under positive selection, leading to a loss of genetic diversity near the favored mutation. iSAFE exploits signals in the neighboring sequences, the so-called "shoulder regions," to pinpoint the favored mutation.
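The loss of diversity caused by hitchhiking can be illustrated with a simple windowed measure. The sketch below is not the iSAFE statistic; it only computes the mean number of pairwise differences between haplotypes in consecutive windows. The four haplotypes are made-up toy data (0 = ancestral allele, 1 = derived allele), constructed so that the middle window is identical in every sample, as it might be right around a swept mutation.

```python
from itertools import combinations


def pairwise_diversity(haplotypes, start, end):
    """Mean number of differing sites between all haplotype pairs in [start, end)."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(
        sum(a != b for a, b in zip(h1[start:end], h2[start:end]))
        for h1, h2 in pairs
    )
    return total / len(pairs)


def windowed_diversity(haplotypes, window):
    """Diversity in consecutive, non-overlapping windows along the region."""
    n_sites = len(haplotypes[0])
    return [
        pairwise_diversity(haplotypes, s, min(s + window, n_sites))
        for s in range(0, n_sites, window)
    ]


# HYPOTHETICAL toy haplotypes: variable flanks, identical middle window.
toy_haplotypes = [
    "010111001",
    "100111010",
    "001111100",
    "110111011",
]

diversity = windowed_diversity(toy_haplotypes, window=3)
for i, value in enumerate(diversity):
    print(f"window {i}: {value:.2f}")
```

In this toy example the middle window has zero diversity while both flanking windows retain variation. That dip is the kind of footprint sweep-detection methods look for; locating the favored mutation within it is the harder problem iSAFE addresses.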
"Finding the favored mutation among tens of thousands of other, hitchhiking, mutations was like a needle in a haystack problem," said Akbari, who works in the research group of computer science professor Vineet Bafna at the Jacobs School of Engineering at UC San Diego.
To test the algorithm, researchers ran iSAFE on regions of the genome that are home to known favored mutations. The algorithm ranked the correct mutation as the top one out of more than 21,000 possibilities in 69 percent of cases, as opposed to state-of-the-art methods, which did so in only 10 percent of cases.
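A figure such as "ranked the correct mutation as the top one in 69 percent of cases" is a top-1 rate: the share of test regions in which the known favored mutation received rank 1. The sketch below shows how such a rate could be computed. The ranks are invented for illustration and are not results from the paper, and the function name is hypothetical.

```python
def top_k_rate(true_ranks, k=1):
    """Fraction of test cases where the known favored mutation ranks within the top k."""
    if not true_ranks:
        raise ValueError("need at least one test case")
    return sum(rank <= k for rank in true_ranks) / len(true_ranks)


# HYPOTHETICAL ranks of the true favored mutation in ten test regions
# (1 = ranked first among all candidate mutations in the region).
toy_ranks = [1, 1, 3, 1, 12, 1, 2, 1, 1, 40]

top1 = top_k_rate(toy_ranks, k=1)
top3 = top_k_rate(toy_ranks, k=3)
print(f"top-1 rate: {top1:.0%}, top-3 rate: {top3:.0%}")
```

With more than 21,000 candidates per region, a method that ranked mutations at random would almost never place the correct one first, which gives a sense of scale for both the 69 percent and 10 percent figures.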
The algorithm also identified a host of previously unknown mutations, including five that involve genes related to pigmentation. In these cases, iSAFE identified identical mutations in multiple non-African populations. This suggests an early response to the onset of selection as humans migrated out of Africa.