Economic Botany Data Standard

Implementing the Economic Botany Data Standard

1. INTRODUCTION

This document describes one way that the Economic Botany Data Collection Standard (Cook 1995) has been implemented. It is aimed at software developers who want to see an example of how the published standard can be translated into a relational data structure. It assumes some familiarity with the standard itself, and follows the same terminology.

The implementation referred to here is that used for the Survey of Economic Plants for Arid and Semi-Arid Lands (SEPASAL). SEPASAL is a database of useful dryland plant species, and is content-managed within the Centre for Economic Botany at Kew.

2. THE DATA STANDARD

The first stage in the process was to create a data structure which could hold the data standard itself. Figure 1 is an Entity-Relationship diagram and shows the 6 tables used in SEPASAL for storing the use data, the data fields they contain, and the relationships between them. Fields, which make up the table primary key, are shown in bold. All of the relationships are 'one-to-many', with the 'many' end indicated by an infinity symbol (¥ ). Each table is described below.

2.1 Level1and2States

This table stores the level 1 and level 2 states (see Tables 1 and 2 in the published standard), together with the published four-digit codes which identify them.

L12Key: The 4-character code, e.g. '0105'
L12Text: The level 1 or 2 state, e.g. 'FOOD - Galls'

2.2 Descriptors

This table stores the level 3 descriptors, which are the categories of states which may be applied at level 3. See Tables 3,7,11,13,15,17,20,22,25,28,30,36 and 38 in the published standard. The descriptors that may be used vary depending on the level 1 or 2 state in question. It should be noted that some descriptors appear more than once in the standard, where they apply to more than one higher state - 'Plant Parts' is a good example of this.

Descriptors may also have a slightly varying title and varying content depending on the higher context. For instance, see these examples of the plant parts;

Level 1 or 2

Descriptor title

Content

0105 FOOD - Galls

Specific Plant Parts Used

Parts which further define galls

0106 FOOD - Stems

Specific Plant Parts Used

Parts which further define stems

0500 INVERTEBRATE FOOD

Plant Parts Used by Invertebrates

A selection of plant parts

0601 MATERIALS - Fibres

Plant Parts Used

All plant parts

However, the published standard records that there is a 'master' list of 'Plant Parts' (see table 6), which incorporates all of the states applied in the table above. This table stores these master lists, so that each descriptor is entered only once, and is named from the master list. It links to all of the level 3 states which make up the master list.

DescriptorKey: A key generated by the software and used to uniquely identify each descriptor/category
DescriptorText: Is the name of the descriptor, e.g. 'Plant Parts' or 'Food Types'

2.3 Level3States

This table stores all of the level 3 states, regardless of which descriptor they belong to. The lists of states in the published standard include up to 3 indentations. These are used to signify a 4-level hierarchy of broader through progressively narrower states. See the following example, based on part of the ''Preparations Used In' descriptor, table 8;

Broad State

---through---

---to---

Narrowest State

Code

beverages

09000000

Alcoholic beverages

09010000

Wines

09010100

Beers

09010200

Spirits

09010300

Non-alcoholic beverages

09020000

Potable water

09020100

Juices

09020200

Cordials/squashes

09020300

Infusions/tisanes

09020400

coffee substitutes

09020401

tea substitutes

09020402

Milk substitutes

09020500

We can see, for example, that 'beverages' are divided into 'alcoholic beverages' and 'non-alcoholic beverages', while 'alcoholic beverages' are further divided into 'wines', 'beers' or 'spirits', and so-on. A software implementation must preserve this hierarchy, and in addition will need to preserve the sequence of states since the order they appear in was perceived as significant. To do this, an 8 digit code has been adopted, which is shown to the right of the states. The code consists of 4 pairs of digits, each pair representing a level in the hierarchy. For example, the state 'beverages' is coded 09000000. This indicates that it is the ninth broad state (check table 8 to see!). The state 'alcoholic beverages' has a code of 09010000, and is thus the first narrower state under beverages, while 'non-alcoholic beverages' is 09020000, and so-on.

Note that the coding system applies within a descriptor. In the Level3States table, uniqueness is achieved by taking the descriptor key and the code together.

DescriptorKey: This records the key to the descriptor within which the state belongs.
L3Code: The state code, as described in the text above. For example, 09010100 or 09020500.
L3Text: This is the state itself. For example, 'wines' or 'tea substitutes'.

2.4 DescriptorAliases

Each descriptor is stored once in the Descriptors table, and links to all of the states on the master list. However, as noted above, not all of the level 3 states may be recorded for some level 1 or 2 states. In effect, in these situations only a subset of the master list will be valid. To deal with this situation, the concept of a descriptor alias was introduced. Several aliases may exist per descriptor, as many as are necessary to define the subsets that appear in the standard.

AliasKey: A key generated by the software and used to uniquely identify each alias.
AliasText: A descriptor name which conveys the context of the published standard, e.g. 'Specific Plant Parts' rather than 'Plant Parts' where appropriate.
DescriptorKey: The key to the appropriate descriptor record.

2.5 DescriptorsAvailable

This table links Level1and2States and DescriptorAliases. It records which aliases are applicable for a given level 1 or 2 state (or, to look at it another way, which level 1 or 2 states lead to a given alias).

L12Key: The key to the appropriate level 1 or 2 state.
AliasKey: The key to the appropriate alias.

2.6 Level3StatesAvailable

This table links Level3States and DescriptorAliases. It records which level 3 states are applicable for a given alias (or, to look at it another way, which aliases contain a given level 3 state).

DescriptorKey: The key to the appropriate level 3 descriptor.
L3Code: The key to the appropriate level 3 state.
AliasKey: The key to the appropriate alias.

3. RECORDING DATA

The next stage was to link this static representation of the data standard to taxon records, and to enable data recording. Figure 2 shows this happening, with the new tables created to hold these data outlined in a box. They are described below.

We assumed that data recording might occur at two levels. Firstly, and most simply, individual level 1 or 2 uses might be linked to taxa. Secondly, level 3 uses, grouped into 'phrases', might be added to these links. This concept of 'phrases' requires explanation - a phrase consists of one state from each available level 3 alias. Not all aliases which are available need be included, so a phrase could consist at minimum of a single level 3 state.

For example, you could record that taxon A is used for 'MATERIALS - Fibres', a level 2 use. This would require an entry into table Level1and2StatesLinkToTaxon. You could also amplify that statement by adding one or more level 3 phrases. For example, you could combine the states 'paper' from the 'Specific Material Types' descriptor, and 'leaves' from the Plant Parts' descriptor into a phrase. This means that fibres from the leaves of taxon A are used to make paper. It would require an entry into Level3Phrases and two entries into Level3StatesInPhrase.

3.1 Level1and2StatesLinkToTaxon

This allows a level 1 or 2 use to be linked to a taxon. This is the simplest method of recording use information, but does not of course allow for the richness of the level 3 descriptors.

The table or tables which store the taxon data are not shown in figure 2, but you should assume a 'one-to-many' link extending from the taxon table to field TaxonKey. This is the only avenue by which use data at any level is connected to the taxa it describes.

TaxonKey: The key of a linked taxon
L12Key: The key to the appropriate level 1 or 2 use

3.2 Level3Phrases

This table stores the level 3 'phrases'. It provides a holder for the individual level 3 states to connect to, and links them in turn to the appropriate level 1 or 2 / taxon statement.

TaxonKey: The key to the linked taxon record, as held in the Level1and2StatesLinkToTaxon table.
L12Key: The key of the appropriate level 1 or 2 use, as held in the Level1and2StatesLinkToTaxon table.
PhraseKey: A sequential computer-generated key to uniquely identify a level 3 phrase

3.3. Level3StatesInPhrase

This table stores a level 3 state, and links it to the phrase, which encompasses it.

PhraseKey: The key of the phrase record, as held in the Level3Phrase table.
DescriptorKey: The level 3 descriptor, as held in the Level3States table.
L3Code: The code for the level 3 state, as held in the Level3States table.