The pursuit of a high-quality genome begins with this rare bird
The flightless kakapo of New Zealand is in trouble.
The world’s heaviest parrot, which represents one of the most ancestral branches of the parrot family tree, is nearly extinct, with barely 200 adults plodding the underbrush of four small islands. Whether the last of the kakapos have the genetic resilience to survive has long been unknown, a question that only high-quality genomic analysis could answer.
But a high-quality genome assembly does not exist for the kakapo, or for most of the 70,000 vertebrate species alive today. As a result, questions abound about how best to prevent the extinction of species like the flightless kakapo and the adorable vaquita porpoise.
Answers may come from the Vertebrate Genomes Project, which aims to generate high-quality reference genomes for every extant vertebrate species. In a flagship study in the journal Nature, the team presents methods and principles for sequencing and assembling high-quality reference genomes.
The team has applied this approach and these principles to produce 16 high-quality reference genomes, including one for the endangered kakapo, to help reveal whether it is hardy enough to rebuild its population. The researchers found that extremely small populations of the kakapo and vaquita have survived at low numbers since the last ice age, more than 10,000 years ago, by purging the deleterious mutations that cause disease under inbreeding.
As long as humans do not kill off more of the last remaining animals, findings from the high-quality reference genomes give hope that these species could survive even with fewer than 100 individuals each.
“We call it the ‘kitchen sink approach’—combining tools from several biotech companies to make this one high-quality genome assembly pipeline,” says Rockefeller University’s Erich D. Jarvis, chair of the Vertebrate Genomes Project. “Endangered species were the first to benefit from the new technology because, even though conservation is not my area of research, I felt it was a moral duty.”
Genomes full of errors
High-quality reference genomes exist only for the celebrities of laboratory science: mice, fruit flies, zebrafish, and, of course, humans. For less popular species, there is often no reference genome or, perhaps worse, a messy genome stitched together from sequences obtained via quick and dirty methods. Compared with the new Vertebrate Genomes Project (VGP) genomes, up to 60% of the genes in such genomes have missing sequences, are entirely missing, or are incorrectly assembled, the researchers found. It can take years to untangle the thousands of assembly errors per species.
The researchers also found many false gene duplications, most caused by algorithms that fail to separate maternal and paternal chromosome sequences and instead interpret them as two distinct sister genes. “We have thousands of genes in the literature that are false duplications. The genes are not actually there!” Jarvis says. “It is unconscionable to be working with some of these genomes.”
The Vertebrate Genomes Project arose from the frustrations of hundreds of scientists working in its parent organization, the Genome 10K (G10K) consortium, whose mission was to generate genome assemblies of 10,000 vertebrate species. The initial genome assemblies that G10K and other groups generated were based on short reads of 35 to 200 base pairs, and these assemblies were highly incomplete. The VGP’s goal is to build a library of error-free reference genomes for all vertebrate species, which researchers and conservationists will be able to use readily, without dedicating months or years to fixing individual genes.
“We said, let’s do some hard work on the front end, so that we can get high-quality data on the back end,” Jarvis says.
Vertebrate Genomes Project rollout
Many companies approached the Vertebrate Genomes Project, promising a single sequencing technology that would solve every problem with messy reference genomes. The Vertebrate Genomes Project assembly team tested each method on a single hummingbird, chosen both for its relatively small genome and because of Jarvis’s research interests in vocal learning among bird species (“two birds with one stone,” he quips). But every technology fell short. “None had all of the necessary components to make a high-quality assembly,” Jarvis says. “So we combined many tools into one pipeline.”
Their approach works. Organizations including the Earth BioGenome Project, the Darwin Tree of Life Project, and the New Zealand Genome Sequencing Project are already using the most advanced version of the novel pipeline. Reference genomes that once took years to generate are now rolling out in weeks and months, all without the false duplications and other errors endemic to previous assemblies.
Scientists are already using the new data to study genes that render bats immune to COVID-19 and to question long-standing conventions in basic science, such as whether there are meaningful differences among oxytocin and its receptors found in humans, birds, reptiles, and fish.
All told, 20 studies and 25 high-quality vertebrate genomes accompany the rollout of the novel pipeline. “The first high-quality genomes that we sequenced taught us so much about the technology and the biology that we decided to publish in these initial papers,” Jarvis says. But plenty of work still lies ahead. “The next step is to sequence all 1,000 vertebrate [families], and then all 10,000 vertebrate [genera], and eventually every single vertebrate species.”
Source: Rockefeller University
Original Study DOI: 10.1038/s41586-021-03451-0
Reprinted with permission of Futurity.