GrassBase - The Online World Grass Flora
W.D. Clayton, M.S. Vorontsova, K.T. Harman & H. Williamson
© Copyright The Board of Trustees, Royal Botanic Gardens, Kew.
Use of DELTA files
A DELTA format database has many applications, but the same character set will not do all jobs equally well. It is therefore important to decide on priorities before designing the set.
Data retrieval / Queries
The database provides an open-ended store for taxon descriptions and observed characters. This makes it possible to query the distribution of the various characters among taxa for further data analysis. Such querying, however, requires consistency in terminology and homology, with the characters broken down into two-state or ordered characters where possible. This requirement can get out of hand unless some distinction is drawn between significant characters and descriptive embellishment, which would be better coded as simple comments.
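As a rough illustration of the kind of query described above, the following Python sketch groups taxa by the state they show for one character. It does not read DELTA files; the matrix, the taxon names, the character names and the function distribution are all invented for the example.

```python
from collections import defaultdict

# Hypothetical miniature matrix: taxon -> {character: state}.
# All names and states here are invented for illustration.
MATRIX = {
    "Taxon A": {"habit": "annual", "awn": "present"},
    "Taxon B": {"habit": "perennial", "awn": "present"},
    "Taxon C": {"habit": "perennial", "awn": "absent"},
    "Taxon D": {"habit": "perennial"},  # awn not recorded
}


def distribution(matrix, character):
    """Group taxa by the state they show for one character."""
    groups = defaultdict(list)
    for taxon, states in sorted(matrix.items()):
        # Unrecorded characters are kept visible rather than silently dropped.
        groups[states.get(character, "not recorded")].append(taxon)
    return dict(groups)


print(distribution(MATRIX, "awn"))
```

A query of this kind is only meaningful if every taxon uses the same term for the same structure, which is why consistent terminology and homology matter.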
While a data matrix of characters and states is good for analysis, it is less successful at conveying a clear picture of the overall structure of a taxon. A traditional description does this better, and morphological information can be presented coherently by imitating its concise narrative style.
Careful wording and a homogeneous taxon set will produce very good, though somewhat bland and long-winded, descriptions. Heterogeneous taxon sets inhibit fluency because the wording has to make sense over a wide range of structures and withstand the effect of missing characters in the sequence of phrases. Consequently, using DELTA for direct output to a typesetter is often unsatisfactory. It is better to aim at a consistent, strictly comparative, factual foundation for subsequent word-processing.
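The problem of missing characters in a sequence of phrases can be sketched in a few lines of Python. This is not how the DELTA programs generate text; the templates, character names and the function describe are hypothetical, and the point is only that each phrase must still read sensibly when its neighbours are absent.

```python
# Hypothetical phrase templates, one per character, in output order.
TEMPLATES = [
    ("habit", "{} plants"),
    ("culm_height_cm", "culms {} cm long"),
    ("ligule", "ligule {}"),
]


def describe(taxon, states):
    """Join the phrases for recorded characters, skipping missing ones."""
    phrases = [text.format(states[char]) for char, text in TEMPLATES if char in states]
    if not phrases:
        return f"{taxon}."
    body = "; ".join(phrases)
    # Capitalise only the first letter so the rest of the wording is untouched.
    return f"{taxon}. {body[:1].upper()}{body[1:]}."


# Culm height is not recorded here, so its phrase is simply left out.
print(describe("Taxon A", {"habit": "perennial", "ligule": "a ciliate membrane"}))
```

Because every phrase stands alone, the output is robust but rather flat, which matches the bland style noted above.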
A multi-access identification key has numerous advantages over a traditional binary key (see http://delta-intey.com for a discussion of this). Traditional descriptions should contain all necessary diagnostic information, and a database simulating them can be used for identification. The problem is to distinguish reliably diagnostic characters from the spurious differences that are difficult to eliminate from subjective or incompletely recorded variable characters. An experienced taxonomist has the background knowledge to counter these traps, but a definitive solution is akin to key-writing and requires a separate character set.
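The idea of multi-access identification can be sketched as follows, assuming a toy matrix in place of a real database. The user may supply observations in any order, and a taxon is eliminated only when a recorded state contradicts an observation. The matrix and the function remaining are invented for illustration and do not reproduce any DELTA program.

```python
# Hypothetical miniature matrix: taxon -> {character: state}.
MATRIX = {
    "Taxon A": {"habit": "annual", "awn": "present"},
    "Taxon B": {"habit": "perennial", "awn": "present"},
    "Taxon C": {"habit": "perennial", "awn": "absent"},
    "Taxon D": {"habit": "perennial"},  # awn not recorded
}


def remaining(matrix, observations):
    """Return the taxa not contradicted by the observations made so far."""
    keep = []
    for taxon, states in sorted(matrix.items()):
        # A taxon is eliminated only by a recorded state that differs;
        # an unrecorded character never eliminates it.
        if all(states.get(char, value) == value for char, value in observations.items()):
            keep.append(taxon)
    return keep


print(remaining(MATRIX, {"awn": "present"}))
print(remaining(MATRIX, {"awn": "present", "habit": "perennial"}))
```

Note that the taxon with an unrecorded awn survives both queries. Incomplete recording therefore weakens discrimination, and a wrongly or subjectively recorded state would eliminate a taxon in error, which is the spurious-difference problem described above.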
An open-ended database intended for descriptions is not particularly amenable to the suppression of spurious differences. Nor does it allow any leeway for anticipating a user's misinterpretation of features, since it is based on a critical interpretation of homology. Keys, therefore, are best written for a closed group of taxa, whose diagnostic characters can be fully enumerated and represented in a special character subset.
Multistate and numerical characters must first be converted to two states, and weights allotted according to their reliability. The DELTA program KEY may then be used to find the shortest key and to experiment with different character conversions and weights. In practice the optimum settings often vary in different parts of the key, requiring a cumbersome process of piecemeal construction.
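The conversion and weighting step can be illustrated with a small Python sketch. It is not the algorithm used by the KEY program; the data, thresholds, weights and the scoring rule (weight multiplied by the size of the smaller branch) are assumptions chosen only to show how different conversions of the same character can be compared.

```python
def binarize_numeric(value, threshold):
    """Two-state version of a numeric character: is value >= threshold?"""
    return value >= threshold


def binarize_multistate(state, group):
    """Two-state version of a multistate character: is state in group?"""
    return state in group


# Hypothetical data: spikelet length in mm and inflorescence type.
TAXA = {
    "Taxon A": {"spikelet_mm": 3.0, "inflorescence": "panicle"},
    "Taxon B": {"spikelet_mm": 4.5, "inflorescence": "raceme"},
    "Taxon C": {"spikelet_mm": 8.0, "inflorescence": "spike"},
    "Taxon D": {"spikelet_mm": 9.5, "inflorescence": "panicle"},
}

# Candidate two-state characters: name -> (test, assumed reliability weight).
CANDIDATES = {
    "spikelet >= 6 mm": (lambda t: binarize_numeric(t["spikelet_mm"], 6.0), 1.0),
    "spikelet >= 9 mm": (lambda t: binarize_numeric(t["spikelet_mm"], 9.0), 1.0),
    "inflorescence a panicle": (
        lambda t: binarize_multistate(t["inflorescence"], {"panicle"}),
        0.6,
    ),
}


def score(test, weight, taxa):
    """Weight times the size of the smaller branch; higher is better."""
    yes = sum(1 for t in taxa.values() if test(t))
    return weight * min(yes, len(taxa) - yes)


def best_character(candidates, taxa):
    """Name of the candidate with the highest score for this set of taxa."""
    return max(candidates, key=lambda name: score(*candidates[name], taxa))


print(best_character(CANDIDATES, TAXA))
```

Rerunning best_character on each branch after a split would generally favour different thresholds and weights, which is the piecemeal adjustment the text refers to.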
In a large database, key writing involves a great deal of preliminary manual processing to determine the best settings; in fact the key must almost be written in order to do so, leading to a catch-22 situation. An alternative approach is to write the key manually, using INTKEY to explore the efficiency of proposed dichotomies and keep track of exceptions. Manual writing is in fact the only way to implement two other desirable features of keys: a quasi-taxonomic structure which the experienced user can easily learn to short-cut; and bringing critical groups out together.
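The bookkeeping involved in testing a proposed dichotomy can be sketched in Python. This is not INTKEY, which is an interactive program; the matrix and the function evaluate_dichotomy are hypothetical. The sketch sorts taxa into the two leads of a couplet and reports as exceptions those that fit both leads or neither.

```python
# Hypothetical matrix: taxon -> {character: set of recorded states}.
MATRIX = {
    "Taxon A": {"lemma_awn": {"absent"}},
    "Taxon B": {"lemma_awn": {"present"}},
    "Taxon C": {"lemma_awn": {"absent", "present"}},  # variable
    "Taxon D": {},  # character not recorded
}


def evaluate_dichotomy(matrix, character, lead_a, lead_b):
    """Sort taxa into the two leads of a proposed couplet.

    lead_a and lead_b are sets of states. A taxon whose recorded states
    overlap both leads, or neither, is reported as an exception.
    """
    result = {"a": [], "b": [], "exceptions": []}
    for taxon, chars in sorted(matrix.items()):
        states = chars.get(character, set())
        in_a = bool(states & lead_a)
        in_b = bool(states & lead_b)
        if in_a and not in_b:
            result["a"].append(taxon)
        elif in_b and not in_a:
            result["b"].append(taxon)
        else:
            result["exceptions"].append(taxon)
    return result


print(evaluate_dichotomy(MATRIX, "lemma_awn", {"absent"}, {"present"}))
```

The exceptions list shows which taxa would have to be keyed out under both leads or handled by a different character.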
Distance and cladistic matrices
Descriptive and discriminatory characters are rather erratically related to taxonomic or phylogenetic distance. A different character set is required for anything more than a pilot study.