Economic Botany Data Standard
Implementing the Economic Botany Data Standard
This document describes one way that the Economic Botany Data Collection Standard (Cook 1995) has been implemented. It is aimed at software developers who want to see an example of how the published standard can be translated into a relational data structure. It assumes some familiarity with the standard itself, and follows the same terminology.
The implementation referred to here is that used for the Survey of Economic Plants for Arid and Semi-Arid Lands (SEPASAL). SEPASAL is a database of useful dryland plant species, and is content-managed within the Centre for Economic Botany at Kew.
2. THE DATA STANDARD
The first stage in the process was to create a data structure which could hold the data standard itself. Figure 1 is an Entity-Relationship diagram and shows the 6 tables used in SEPASAL for storing the use data, the data fields they contain, and the relationships between them. Fields, which make up the table primary key, are shown in bold. All of the relationships are 'one-to-many', with the 'many' end indicated by an infinity symbol (¥ ). Each table is described below.
This table stores the level 1 and level 2 states (see Tables 1 and 2 in the published standard), together with the published four-digit codes which identify them.
L12Key: The 4-character code, e.g. '0105'
This table stores the level 3 descriptors, which are the categories of states which may be applied at level 3. See Tables 3,7,11,13,15,17,20,22,25,28,30,36 and 38 in the published standard. The descriptors that may be used vary depending on the level 1 or 2 state in question. It should be noted that some descriptors appear more than once in the standard, where they apply to more than one higher state - 'Plant Parts' is a good example of this.
Descriptors may also have a slightly varying title and varying content depending on the higher context. For instance, see these examples of the plant parts;
However, the published standard records that there is a 'master' list of 'Plant Parts' (see table 6), which incorporates all of the states applied in the table above. This table stores these master lists, so that each descriptor is entered only once, and is named from the master list. It links to all of the level 3 states which make up the master list.
DescriptorKey: A key generated by the software and used to uniquely identify each descriptor/category
This table stores all of the level 3 states, regardless of which descriptor they belong to. The lists of states in the published standard include up to 3 indentations. These are used to signify a 4-level hierarchy of broader through progressively narrower states. See the following example, based on part of the ''Preparations Used In' descriptor, table 8;
We can see, for example, that 'beverages' are divided into 'alcoholic beverages' and 'non-alcoholic beverages', while 'alcoholic beverages' are further divided into 'wines', 'beers' or 'spirits', and so-on. A software implementation must preserve this hierarchy, and in addition will need to preserve the sequence of states since the order they appear in was perceived as significant. To do this, an 8 digit code has been adopted, which is shown to the right of the states. The code consists of 4 pairs of digits, each pair representing a level in the hierarchy. For example, the state 'beverages' is coded 09000000. This indicates that it is the ninth broad state (check table 8 to see!). The state 'alcoholic beverages' has a code of 09010000, and is thus the first narrower state under beverages, while 'non-alcoholic beverages' is 09020000, and so-on.
Note that the coding system applies within a descriptor. In the Level3States table, uniqueness is achieved by taking the descriptor key and the code together.
DescriptorKey: This records the key to the descriptor within which the state belongs.
Each descriptor is stored once in the Descriptors table, and links to all of the states on the master list. However, as noted above, not all of the level 3 states may be recorded for some level 1 or 2 states. In effect, in these situations only a subset of the master list will be valid. To deal with this situation, the concept of a descriptor alias was introduced. Several aliases may exist per descriptor, as many as are necessary to define the subsets that appear in the standard.
AliasKey: A key generated by the software and used to uniquely identify each alias.
This table links Level1and2States and DescriptorAliases. It records which aliases are applicable for a given level 1 or 2 state (or, to look at it another way, which level 1 or 2 states lead to a given alias).
L12Key: The key to the appropriate level 1 or 2 state.
This table links Level3States and DescriptorAliases. It records which level 3 states are applicable for a given alias (or, to look at it another way, which aliases contain a given level 3 state).
DescriptorKey: The key to the appropriate level 3 descriptor.
3. RECORDING DATA
The next stage was to link this static representation of the data standard to taxon records, and to enable data recording. Figure 2 shows this happening, with the new tables created to hold these data outlined in a box. They are described below.
We assumed that data recording might occur at two levels. Firstly, and most simply, individual level 1 or 2 uses might be linked to taxa. Secondly, level 3 uses, grouped into 'phrases', might be added to these links. This concept of 'phrases' requires explanation - a phrase consists of one state from each available level 3 alias. Not all aliases which are available need be included, so a phrase could consist at minimum of a single level 3 state.
For example, you could record that taxon A is used for 'MATERIALS - Fibres', a level 2 use. This would require an entry into table Level1and2StatesLinkToTaxon. You could also amplify that statement by adding one or more level 3 phrases. For example, you could combine the states 'paper' from the 'Specific Material Types' descriptor, and 'leaves' from the Plant Parts' descriptor into a phrase. This means that fibres from the leaves of taxon A are used to make paper. It would require an entry into Level3Phrases and two entries into Level3StatesInPhrase.
This allows a level 1 or 2 use to be linked to a taxon. This is the simplest method of recording use information, but does not of course allow for the richness of the level 3 descriptors.
The table or tables which store the taxon data are not shown in figure 2, but you should assume a 'one-to-many' link extending from the taxon table to field TaxonKey. This is the only avenue by which use data at any level is connected to the taxa it describes.
TaxonKey: The key of a linked taxon
This table stores the level 3 'phrases'. It provides a holder for the individual level 3 states to connect to, and links them in turn to the appropriate level 1 or 2 / taxon statement.
TaxonKey: The key to the linked taxon record, as held in the Level1and2StatesLinkToTaxon table.
This table stores a level 3 state, and links it to the phrase, which encompasses it.
PhraseKey: The key of the phrase record, as held in the Level3Phrase table.