Bringing Kew's Archive Alive

The Trading Consequences Team
2 May 2013

In this guest blog from the Trading Consequences team, find out how digital data produced by Kew's Directors' Correspondence team is being brought to life and can be used to visualise the British Empire's 19th-century trade networks.

Trading Consequences

TRADING CONSEQUENCES is a Digging into Data project that examines how automatic text mining of large quantities of historical text can assist environmental historians in researching the effects of 19th-century trade in the British Empire. The text mining technology recognises mentions of commodities, locations, diseases, disasters and dates in historical text. It also enriches this information, for example by geo-referencing the extracted locations and identifying which commodity mentions are related to which location mentions. By visualising the mined information in different ways, we can provide interesting views of historical collections which so far have tended to be accessible to historians only through keyword search.

System Architecture of Trading Consequences

 System architecture of TRADING CONSEQUENCES 

Using Kew's Data

One of the collections we are processing in TRADING CONSEQUENCES is the Directors’ Correspondence Collection from the Archives at Kew Gardens. It contains hand-written scientific letters and memoranda received by Kew’s Directors and senior staff from the 1840s to 1928, as well as correspondence received by Sir William Jackson Hooker prior to 1841. It provides first-hand accounts and observations on botany, ethnobotany, history, natural history, science and politics around the world. In Trading Consequences, we are working with letters specifically relevant to Africa, Asia and Latin America. We are not processing the letters themselves but the metadata attached to each document, in particular a written summary of the content of each piece of correspondence.

This collection contains metadata files for more than 24,000 letters and is accessible via JSTOR Global Plants. Other historical text collections that we process in TRADING CONSEQUENCES include the House of Commons Parliamentary Papers from ProQuest, the Early Canadiana Online data archive, Adam Matthew’s Confidential Print collections, a sub-part of the Foreign and Commonwealth Office Collection from JSTOR, and a number of books relevant to trading in the 19th century.

Text Mining

The text mining is developed by computer scientists at the School of Informatics at the University of Edinburgh. We first convert the metadata from Excel into an in-house XML format, creating one XML file per letter. We treat the title and description of each letter as textual information and retain all other information, including creator (i.e. the author of the letter) and date of creation (i.e. when the letter was written), as metadata. Each file is then processed in a series of steps. First, the stream of text is automatically split into words and sentences. Then several syntactic processing steps are carried out, for example to determine the lexical category of each word (noun for cinnamon, verb for imported, preposition for through, adjective for fresh, etc.) or the canonical form of each word (e.g. export for exported or exports). Subsequently, we extract all commodity, location, date, disease and disaster mentions from the text. This is done in various ways, depending on the type of entity mention. In the case of commodities, we use a manually created commodity ontology and combine it with an automated bootstrapping technique to identify other commodity mentions in the text. We also geo-reference each extracted location mention with an adapted version of the Edinburgh Geoparser, which links it to a latitude and longitude. Finally, we extract commodity-location relations whenever a commodity is associated in some way with a location. All this information is stored in the Trading Consequences database.
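To give a feel for these steps, here is a deliberately simplified Python sketch. It is not the project's code: the tiny word lists, the approximate coordinates, the function names and the rule "a commodity and a location in the same sentence are related" are all assumptions made for illustration. The real pipeline relies on a commodity ontology, bootstrapping and the Edinburgh Geoparser rather than plain lookups.

```python
import re

# Hypothetical mini-lexicons standing in for the commodity ontology and
# the gazetteer. Coordinates are approximate and for illustration only.
COMMODITIES = {"rubber", "coffee", "liberian coffee", "cinchona"}
GAZETTEER = {"ceylon": (7.87, 80.77), "seychelles": (-4.68, 55.49)}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def find_mentions(sentence, lexicon):
    """Return lexicon terms found in the sentence, preferring longer terms
    so that 'liberian coffee' is not also counted as plain 'coffee'."""
    lowered = sentence.lower()
    found = []
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, lowered):
            found.append(term)
            # Blank out the match so shorter terms cannot reuse it.
            lowered = re.sub(pattern, " ", lowered)
    return found


def extract_relations(text):
    """Pair every commodity with every location in the same sentence
    (an assumed, very crude notion of 'associated in some way')."""
    relations = []
    for sentence in split_sentences(text):
        commodities = find_mentions(sentence, COMMODITIES)
        locations = find_mentions(sentence, GAZETTEER)
        for commodity in commodities:
            for location in locations:
                relations.append((commodity, location, GAZETTEER[location]))
    return relations


# An invented letter summary, not taken from the collection.
summary = "Liberian coffee is doing well in Ceylon. Rubber seeds were sent on."
print(extract_relations(summary))
```

The second sentence yields no relation because it mentions a commodity but no location, which is the behaviour we would want from even a crude relation extractor.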

Visualising the data

The database allows us to query for all commodities that were associated with different locations in the historical collections analysed. We can also search for a particular commodity with respect to dates or locations, or for all commodities mentioned in relation to a specific location. For the following analysis, we extracted all commodities mentioned in the Directors’ Correspondence Collection and identified a subset of frequently mentioned ones (rubber, palm, coffee, cotton, bamboo, Liberian coffee). For each commodity in this subset, we extracted all commodity-location relations along with the year of the letter they occur in and the latitude and longitude of each location. The result is a list of “year,commodity,location[lat,long]” triples which can be visualised on a timeline or map. We identified 360 triples for rubber, 276 for coffee, 176 for palm, 164 for cotton, 63 for Liberian coffee and 51 for bamboo. A further step counts identical triples, allowing us to display the more frequent occurrences with larger symbols.
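The counting step can be sketched in a few lines of Python. The rows below are invented for illustration and are not drawn from the project database, and the square-root scaling in the hypothetical `symbol_radius` helper is just one common way of sizing map symbols so that their area grows with the count; we are not claiming it is the scaling used in the videos.

```python
from collections import Counter

# Invented example rows in the form (year, commodity, location, lat, long).
rows = [
    (1875, "liberian coffee", "ceylon", 7.87, 80.77),
    (1875, "liberian coffee", "ceylon", 7.87, 80.77),
    (1880, "rubber", "ceylon", 7.87, 80.77),
]

# Count how often each identical row occurs.
counts = Counter(rows)


def symbol_radius(count, base=3.0):
    """Assumed scaling: radius grows with the square root of the count,
    so the symbol's area is proportional to the number of occurrences."""
    return base * count ** 0.5


for (year, commodity, place, lat, lon), n in sorted(counts.items()):
    print(year, commodity, place, (lat, lon), n, round(symbol_radius(n), 2))
```

Rows that occur more often simply receive a larger radius, which is all the map or timeline needs in order to draw bigger dots for the more frequent occurrences.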

The following video shows all the locations that each of the six commodities is associated with in the Directors’ Correspondence Collection over time. The yellow dots represent all locations mentioned in this collection over time, irrespective of whether they are related to any commodity. These yellow dots provide an interesting mapping of the British Empire during the 19th century and show how the reach of Kew Gardens expanded well beyond the formal empire. Note, for example, the particular interest in South America during the first few decades. We know economic botanists helped identify and transfer numerous South American plants, such as cinchona and rubber, so they could be grown on British plantations in places like Sri Lanka (Ceylon). Visualising locations from 24,000 letters, however, provides new insights into the scale of this project. (It will look best if you expand the video.)

Bringing Kew's Archive alive from Jim Clifford on Vimeo.

Liberian Coffee

The second video focuses on coffee and Liberian coffee. When coffee rust disease started to spread between the world's coffee-growing regions during the second half of the 19th century, economic botanists worked to find alternative crops. In this video we see letters mentioning Liberian coffee appear frequently after 1873, following the identification of this alternative type of coffee. While this example only confirms the history of coffee production we already know, it does demonstrate the potential of using text mining to explore large collections of documents.

Two Coffees from Jim Clifford on Vimeo

Future developments

In the near future, historians and interested members of the public will be able to explore the TRADING CONSEQUENCES database through a dynamic visualisation website. The following screenshot is a sneak preview of this website, which is currently being developed by visualisation experts at the University of St Andrews. In TRADING CONSEQUENCES, we process a number of different historical collections. The visualisation shown in the image below is limited to the Kew Gardens’ Directors’ Correspondence Collection. The image shows a map with bubbles in locations associated with the commodity Liberian coffee. The Seychelles and Sri Lanka are the most significant locations for this commodity. A timeline with the distribution of relevant documents per decade is shown underneath the map.

Liberian coffee data from Trading Consequences

Locating Liberian coffee and related commodities in the Directors’ Correspondence Collection


As in the video, the commodity Liberian coffee appears around 1870. Any commodities related to Liberian coffee, i.e. ones that appear in the same summary of the original letter, are listed on the right-hand side of the page. The titles of the top 50 most relevant documents containing mentions of the commodity Liberian coffee are listed in order of relevance at the bottom of the screen. Each document title links back to the original images on JSTOR Global Plants.

- The Trading Consequences team: Bea Alex, Jim Clifford and Uta Hinrichs -    

 


 


Comments

24 September 2013
Comment: 
Thank you for your comment Janet. As you can imagine, the digitisation process has uncovered many 'small nuggets of information' within the Directors' Correspondence archive. It is a great pleasure for us to be able to contribute our digital data to the Trading Consequences Team: they have, indeed, developed an excellent data mining tool.
24 September 2013
Comment: 
Thank you both, I will PM Bea to continue the conversation. Looking at photos of the new library facilities, I truly appreciate the opportunity to have seen it before the renovation. I hope the new surroundings will encourage more people to look at RBG Kew as a national treasure and understand the importance of digitisation projects.
24 September 2013
Comment: 
Dear Janet Wilford, Thank you very much for your interest in our post on the Trading Consequences project. Our project is made up of four key partners: the historians at York University in Toronto (Clifford and Coates), ourselves who are responsible for the text mining at the University of Edinburgh (Klein and Alex), the visualisation team at St Andrews (Quigley and Hinrichs) and database specialists at EDINA (Reid and Osborne). We were really pleased that Kew Gardens' Digitisation team allowed us access to their data. Our web-based user interface will be launched at the end of this year. Our language processing technology primarily focusses on identifying commodities and relevant locations and dates, but we can also extract person names. So I got curious about Charles Wilford and conducted a person name search in our output. The person name Wilford appears 48 times in the Letters of Correspondence dating between June 18 1857 and Aug 21 1861. It does sound like he was a troublesome character: "Veitch seems to think Wilford may be on board some of the ships in the harbour, but Robinson fears he is seldom fit for work or to be seen." There is mention in 1860 that he had gone to Japan: "In reply, he has heard that Wilford has gone to Japan either to buy horses or to purchase food for them." The letters are all accessible on JSTOR Global Plants (http://plants.jstor.org). Best, Bea (balex at staffmail.ed.ac.uk)
24 September 2013
Comment: 
The idea of bringing Kew Archives alive is a favourite of mine. I am delighted that the digitisation project has made such dramatic progress in the way information can be shared. When I first visited the archives about five years ago, I was a total neophyte with a particular interest in one of the last plant collectors, Charles Wilford. It struck me after my second visit about three years ago that there was a large political and trade subtext to what I was reading about in his letters and those about him. He was sent to China about 1857 and disappeared for a time, eventually becoming much reviled for his poor efforts, and probably came home in disgrace. The period when he disappeared fascinates me, and it strikes me that Jim and his team have hit on an ideal tool to visualise all the small nuggets of information available.
