1001 Genomes Plus
Cataloguing genetic diversity within a single plant species, and building a computational framework to support its analysis and visualisation.
A genome is the entirety of the inheritable material present in a living organism. At its heart are molecules of DNA, which comprise sequences of components (known as nucleotides) that contain information in a manner analogous to sequences of letters in a sentence.
In recent decades, new technologies have been developed that can determine the entire sequence of nucleotides in a genome. Originally, these technologies were used to produce one reference genome per species, but they are now increasingly being applied to sequence many individuals, so that we can understand the genetic causes of intra-specific diversity.
The 1001 Genomes project in the model plant species Arabidopsis thaliana was a ground-breaking example of this approach. However, due to the technologies used, the 1001 Genomes project could only reveal small differences between individual genomes, not large-scale rearrangements or duplications. And while we have good computational models and visualisation tools that allow us to work with a single genome, it is less clear how to represent, and present, the genome of an entire species.
In this project, we will determine the sequence of ~250 specimens of Arabidopsis using the latest technologies, which allow us to identify even large-scale differences between genomes. We will also develop a computational environment and visualisation tools to facilitate the exploration and understanding of these data. Although we are starting with a well-defined model system, these tools should be applicable to any other species for which similar sequence data have been generated.
- Reference-quality genomes for 250 Arabidopsis specimens.
- A computational model allowing the analysis and annotation of the data set.
- A new visualisation tool, allowing scientists to explore the structural variation present in the species.