7 January 2020

Reading the DNA of life on Earth

Advances in genome sequencing have opened up new ways of understanding, using and conserving biodiversity. Kew is playing a key role in this work by participating in a global project to read the DNA of all complex forms of life on Earth, starting with 2,000 species in Britain.


Genome sequencing is the process of determining the order of components that make up all the DNA in a living organism.

The technique has come on in leaps and bounds since 1995, when Haemophilus influenzae - a common cause of pneumonia and other infections - became the first bacterium to have its genome sequenced.

Since then, scientists have gone on to read the genomes of an increasingly wide range of organisms, including that of our own species, Homo sapiens, in 2003.

Now, scientists worldwide have launched an ambitious project to sequence and catalogue the genetic code of all known species of complex life on Earth within the next 10 years.

The Earth BioGenome Project (EBP), of which Kew is a founding member, could have as profound an impact on our understanding of the natural world as the Human Genome Project has had on human biology.

DNA in Kew's Jodrell Laboratory © RBG Kew

Protecting species with genomics

So what could we do if we had complete genome assemblies for every species?

For a start, we would know exactly what is specific to each species. We would be able to identify those unique genomic features (including genome size, genes and their regulatory elements, and chromosomal structure) that are directly responsible for making each species distinct from others.

What's more, we could infer information from absence as well as presence – something that is not possible with incomplete assemblies.
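
As a rough illustration of why completeness matters, here is a minimal Python sketch. The function name and the gene names are made up for the example and do not come from any real pipeline. It shows that a gene missing from a complete assembly can be called absent, while the same gap in an incomplete assembly tells us nothing.

```python
def classify_gene_calls(expected_genes, found_genes, assembly_complete):
    """Label each expected gene as 'present', 'absent' or 'unknown'.

    Hypothetical helper: a missing gene only counts as 'absent' when the
    assembly is complete; otherwise it may simply lie in an unsequenced gap.
    """
    calls = {}
    for gene in sorted(expected_genes):
        if gene in found_genes:
            calls[gene] = "present"
        elif assembly_complete:
            calls[gene] = "absent"
        else:
            calls[gene] = "unknown"
    return calls


# Toy data: gene names are placeholders, not real annotations.
expected = {"geneA", "geneB", "geneC"}
found = {"geneA", "geneC"}

print(classify_gene_calls(expected, found, assembly_complete=True))
print(classify_gene_calls(expected, found, assembly_complete=False))
```

With the complete assembly, geneB is reported as absent; with the incomplete one, it can only be reported as unknown.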

In other words, we could literally observe the full record of the complex evolutionary history of both species and genes, as written in their DNA.

Once we have this treasure trove, we could also use a complete genome assembly as a reference framework to interpret less complete sequence data taken from large samples of natural populations.

And we could look, in any species, for correlations between the occurrence of particular genes or genetic variants and characteristics of interest, then statistically infer the genetic determinants of each trait, potentially identifying useful new biomolecules and processes.
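
To give a flavour of what such a statistical test might look like, the Python sketch below counts how often a variant and a trait occur together in a set of samples, then asks how likely that overlap would be by chance, using a one-sided Fisher's exact test. The sample data and function names are invented for illustration. Real association studies use far larger samples and must correct for relatedness between individuals and for testing many variants at once.

```python
from math import comb


def contingency_table(samples):
    """Count (has_variant, has_trait) pairs into a 2x2 table (a, b, c, d)."""
    a = b = c = d = 0
    for has_variant, has_trait in samples:
        if has_variant and has_trait:
            a += 1  # variant and trait
        elif has_variant:
            b += 1  # variant, no trait
        elif has_trait:
            c += 1  # trait, no variant
        else:
            d += 1  # neither
    return a, b, c, d


def fisher_one_sided(a, b, c, d):
    """Probability of seeing at least `a` variant carriers with the trait
    if variant and trait were unrelated (hypergeometric upper tail)."""
    carriers = a + b
    with_trait = a + c
    total = a + b + c + d
    upper = min(carriers, with_trait)
    favourable = sum(
        comb(carriers, k) * comb(total - carriers, with_trait - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(total, with_trait)


# Invented toy data: four carriers all show the trait, four non-carriers do not.
samples = [(True, True)] * 4 + [(False, False)] * 4
table = contingency_table(samples)
print(table, fisher_one_sided(*table))
```

A small probability suggests the variant and the trait occur together more often than chance would predict, which marks the variant as a candidate for further study rather than as a proven cause.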

We could also look at the genomic repertoire of all the species interacting in an ecosystem.

All this means we could accurately assess the in situ diversity of threatened species, and ensure that any ex situ resources - like our Millennium Seed Bank - contain sufficient molecular diversity to support their reintroduction and protect their futures.

Seeds in the Millennium Seed Bank vaults © Luis Salazar/RBG Kew

Technological advances

Until now, this has just been a pipe dream. But a step change in sequencing technology, driven by miniaturisation and the ability to run processes in parallel, has slashed the cost and increased the speed of reading DNA.

Today, the cost of deciphering a human genome is just one hundred-thousandth of the cost of the original Human Genome Project.

These tremendous advances now make it possible to contemplate producing a high-quality genome sequence for every eukaryotic species (such as animals, plants and fungi) on Earth, particularly if supported by international cooperation.

So far, just a few thousand of an estimated nine million eukaryotic species globally have a sequenced genome.

Many of the sequences that are available are still relatively incomplete and fragmented. As a result, scientists working on many species have to infer genomic information by similarity to better-studied species, or in some cases, have no access to genomic information at all.

Large genomes and many unknowns

There are still many challenges besides the overall size of the endeavour and the development of revised laboratory techniques to extract good-quality DNA.

Some plants, for example, have extraordinarily large or repetitive genomes, making them difficult and expensive to sequence and assemble. The Japanese flowering plant Paris japonica has a genome ~50 times larger than that of humans.

Moreover, in many domains of life, we still don’t know what is out there: many species are not just unsequenced, but undescribed. In the case of fungi, it is likely that the availability of more sequence data will challenge the notion of species as an appropriate way of describing biodiversity.

But once the EBP is completed, we will have a comprehensive description of biodiversity at the molecular level - a catalogue of what is found on our planet and an objective framework in which to assess our taxonomy of life.

Paris japonica has a genome ~50 times larger than that of humans © Alpsake/Wikimedia Commons.

Reading the genomes of species in the British Isles

Excitingly, the Wellcome Trust has recently awarded £9 million to a consortium of 12 institutions, including Kew, to begin work on the Darwin Tree of Life project - in effect the first phase of the EBP in the UK.

The aim is to sequence the genomes of 2,000 of the estimated 66,000 species found in Britain and Ireland.

One of Kew’s roles in the project is to identify material from plants and fungi to be sequenced, and to preserve portions of the sequenced material.

This is crucial as a DNA sequence is only meaningful in the context of the life-form it was extracted from. In maintaining this record, Kew’s scientific collections are acquiring a new purpose.  

In the future, all research across species is likely to be influenced by the availability of high-quality genomic information.

Just as it is foolish to go to the train station without consulting a timetable, so in future scientists may not begin to address a biological problem experimentally without first consulting the genomic data.

Kew will be identifying plants and fungi, like Russula torulosa (pictured), to sequence for the Darwin Tree of Life project © Lee Davies/RBG Kew.

Data for the future

Some scientists are still wary of 'mega-projects' like the EBP, and are sceptical of science that is driven by data rather than hypotheses.

But with international cooperation to spread the cost, and the wealth of potential areas of study that the data will open up, I am confident that, in 50 years' time, the scientific community will look back and thank us rather than accuse us of wasting money.

By then, scientists will have learnt to take these data for granted and use them routinely in their basic research, conservation and bioprospecting activities.
