DNA barcoding

Project Rationale

Although the mitochondrial gene region CO1 (cox1) has already been used with considerable success across a range of animal groups and shows promise in at least some algal groups, it is characterized by relatively low rates of sequence divergence in land plants. Mitochondrial DNA in land plants also undergoes rearrangements, incorporates foreign genes, and frequently transfers some genes to the nuclear genome. It is therefore desirable to find an alternative region or, if necessary, regions from one of the other genomes that would be suitable as a barcode.

Desirable attributes of such a barcode include that it is:

• variable enough to allow identification of species but with a comparatively low level of intraspecific variation,
• universally amplified/sequenced with standardized primers,
• technically simple to sequence (not plagued by regions of long, simple sequence repeats),
• short enough to sequence in one reaction with present technology,
• easily alignable (i.e., have many more single nucleotide polymorphisms than insertions/deletions),
• readily recoverable from herbarium samples and other degraded DNA samples (e.g. forensic material).
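
The first attribute, variation between species that clearly exceeds variation within species, can be illustrated with a simple calculation. The following is a minimal sketch and not part of the project's protocols: it assumes sequences are already aligned, uses an uncorrected p-distance (the proportion of differing sites, skipping gap positions), and runs on a made-up toy alignment. The function names `p_distance` and `barcode_gap` are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ("-") are skipped, so
    insertions/deletions do not contribute to the distance.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differences / compared


def barcode_gap(alignment):
    """Return (max intraspecific distance, min interspecific distance).

    `alignment` maps (species, sample_id) -> aligned sequence.
    Either value is None if no comparison of that kind exists.
    """
    intra, inter = [], []
    for (key_a, seq_a), (key_b, seq_b) in combinations(alignment.items(), 2):
        d = p_distance(seq_a, seq_b)
        if key_a[0] == key_b[0]:
            intra.append(d)
        else:
            inter.append(d)
    return (max(intra) if intra else None, min(inter) if inter else None)


# Made-up toy alignment, for illustration only (not real barcode data).
toy = {
    ("species_A", "s1"): "ATGGCATTAC",
    ("species_A", "s2"): "ATGGCATTAC",
    ("species_B", "s1"): "ATGACATTGC",
    ("species_B", "s2"): "ATGACATTGC",
}
max_intra, min_inter = barcode_gap(toy)
print(max_intra, min_inter)
```

A candidate region looks promising under this crude measure when the minimum interspecific distance is clearly larger than the maximum intraspecific distance.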

Regions such as the internal transcribed spacer regions of nuclear ribosomal DNA (ITS), although often highly variable in angiosperms at the generic and species level, are not a practical option in several groups owing to peculiarities in their evolution and the fact that divergent copies are often present within single individuals. Single or low-copy nuclear regions are technically difficult to sequence and hence often not recoverable from degraded DNA. Phase 1 of this project therefore aimed to assess a large number of plastid regions, both coding and non-coding, for their potential as a land plant DNA barcode, both taking into account the above attributes and striving to overcome some of the limitations inherent in regions such as ITS.

Advantages of plastid DNA include:

• monomorphy - separation of different alleles is not required
• high copy number - successful amplification is often possible even from highly degraded DNA
• highly diagnostic - in spite of a reputation for being relatively conserved, it contains highly variable regions

Phase 1

The strategy used for this phase was to:

• identify suitable loci (>100) on the basis of in silico screens using Nicotiana plastid sequence
• design universal primers (sets of 4 primers/locus)
• perform initial screen for universality
• screen twice for sequence variation using diverse species-pairs
• improve universality (e.g. use all primer combinations)
• use statistical modelling approaches to identify optimal primer sets
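
The universality steps above can be sketched as a simple tally. This is only an illustration under stated assumptions, not the statistical modelling used in the project: it assumes that a set of 4 primers consists of two forward and two reverse primers (giving four combinations), and the primer names, taxa, and amplification outcomes are all hypothetical.

```python
from itertools import product


def primer_combinations(forward, reverse):
    """All forward/reverse pairings for one locus."""
    return list(product(forward, reverse))


def universality(results):
    """Fraction of taxa successfully amplified by each primer pair.

    `results` maps (forward, reverse) -> {taxon: True/False}.
    """
    return {
        pair: sum(outcome.values()) / len(outcome)
        for pair, outcome in results.items()
    }


# Hypothetical amplification outcomes, for illustration only.
taxa = ["moss", "fern", "conifer", "angiosperm"]
outcomes = {
    ("F1", "R1"): [True, False, True, True],
    ("F1", "R2"): [True, True, True, True],
    ("F2", "R1"): [False, False, True, True],
    ("F2", "R2"): [False, True, True, True],
}
results = {pair: dict(zip(taxa, flags)) for pair, flags in outcomes.items()}

scores = universality(results)
best = max(scores, key=scores.get)  # pair amplifying the most taxa
print(best, scores[best])
```

Trying every combination, rather than only the originally designed pairs, is what allows a locus to be rescued when one primer fails in a particular lineage.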

During the initial stages of this phase, it was found that non-coding plastid regions were either not sufficiently variable to distinguish sister species or, if they were variable, were also highly length-variable (and hence less easily alignable). Attention was subsequently restricted to coding regions.

At a meeting of the scientists on the project held in December 2005, the five most promising regions from the phase 1 investigation, based on variability and universality of primers, were selected for more intensive trial in phase 2. The regions and protocols for phase 2, as well as an update of progress during this phase of the project, are now available.

Feedback on successes and problems encountered when trialing any of these regions is welcome: barcoding@kew.org.