CALLING IT WHAT IT IS. THESAURI IN THE FLANDERS HERITAGE AGENCY: HISTORY, IMPORTANCE, USE AND TECHNOLOGICAL ADVANCES.

Heritage organizations in Flanders started using thesauri fairly recently compared to other countries. This paper starts with examining the historical use of thesauri and controlled vocabularies in computer systems by the Flemish Government dealing with immovable cultural heritage. Their evolution from simple, flat, controlled lists to actual thesauri with scope notes, hierarchical and equivalence relations and links to other thesauri will be discussed. An explanation will be provided for the evolution in our approach to controlled vocabularies, and how they radically changed querying and the way data is indexed in our systems. Technical challenges inherent to complex thesauri and how to overcome them will be outlined. These issues being solved, thesauri have become an essential feature of the Flanders Heritage inventory management system. The number of vocabularies rose over the years and became an essential tool for integrating heritage from different disciplines. As a final improvement, thesauri went from being a core part of one application (the inventory management system) to forming an essential part of a new general resource oriented system architecture for Flanders Heritage influenced by Linked Data. For this purpose, a generic SKOS based editor was created. Due to the SKOS model being generic enough to be used outside of Flanders Heritage, the decision was made early on to develop this editor as an open source project called Atramhasis and share it with the larger heritage world.


INTRODUCTION
The digital era, hardly a new concept anymore, has brought the possibility of gathering and disseminating more data than ever imagined. The more data can be gathered, the larger the need to create some kind of structure to find direction in this overwhelming amount of information. In a field such as heritage, data is most commonly structured by classifying and grouping it, bringing it together in a logical sense. Thesauri are our best friend in this process. They not only allow structuring of data and uniform use of vocabularies, but also querying of data.
Historic England defines a thesaurus as follows: "A thesaurus is a structured wordlist used to standardise terminology. It is used to assist in indexing and retrieving information within databases that make use of the same terminology." (http://thesaurus.historicengland.org.uk) A thesaurus consists of a tree-like structure with branches. This hierarchical build-up distinguishes a thesaurus from an unstructured list of terms. Each concept may contain multiple narrower concepts, which are all in a hierarchical (child) relationship to the broader (parent) concept. In addition to the hierarchical relationships, a thesaurus also provides the possibility to add equivalence and associative relationships. If the same concept is expressed by more than one term, one of these is selected as the preferred term. The others will have the status of non-preferred term and will have an equivalence relationship with the preferred term. Scope notes describe what the term means in the context of the thesaurus in question (Ballew et al., 1999).
In this paper, an overview of the origin and evolution of thesauri in the Flanders Heritage databases will be presented first. After that, we will focus on the benefits of thesauri: why we use them, what advantages they bring, and how they work. Additionally, lessons learned and choices made will be discussed, as well as the basic guidelines used when assigning terms to each heritage item. To conclude, a more technical chapter on the process towards open source software and future endeavours will be outlined.

A shy but intensive start
As previously discussed (Van Daele et al., 2016), around the mid-nineties the printed books containing the inventory of architectural heritage in Flanders were converted to a digital database. In the initial phase, this database contained an identical copy of the books, which were scanned with optical character recognition (OCR) software. Except for some location data such as the municipality, no other structured data was available. Almost all searches had to be done full-text. It soon became clear that a lot of questions about heritage remained unanswered.
To solve this problem, a project was initiated in the early 2000s to assemble the first thesauri for Flanders Heritage databases. These consisted of the following lists: typology, construction period and style. We cannot call them thesauri in the true sense of the word because, at first, they were not directly linked to the database, resulting in typos, inconsistent use of capital letters, etc. They were a sort of index used to somewhat streamline input of data, but were not composed in a truly hierarchical manner, nor were there any equivalence or associative relationships. The terms were also not used to query the database as a real thesaurus allows you to do (see infra). A definition for each term, a so-called scope note, did not exist at the time. However, these initial lists were a solid starting point for the thesauri used today.
The construction of these initial thesauri was not a quick solution to a big problem. On the contrary, the use of thesauri also implied having to allocate terms from the thesauri to each item in the database. In this initial phase, it meant reviewing 70000 items in the database. Furthermore, it soon became clear that applying a predefined thesaurus to an already existing database was not as easy as it seemed: some concepts just could not be matched with a term in the thesaurus, which led to a constant process of rethinking the thesaurus. Also, due to a lack of scope notes, discussion arose about what a certain concept in the thesaurus actually covered and how it should be interpreted.

Add, adjust and mix
The next phase in the history of the thesauri involved the construction of a thesaurus for archaeological heritage. The Central Archaeological Inventory (CAI), an Access database using simple drop-down lists of keywords with each item linked to a GIS layer, contains all known archaeological sites in Flanders (Van Daele et al., 2004). The old database was to be transformed into a new database using online data input and all the advantages that come with it, including the possibility of integrating a thesaurus. The first step towards a thesaurus for archaeological heritage was already taken in 2003 with a thesaurus for archaeological periods (Slechten, 2004). In 2009 the project was reinitiated and completed. This resulted in five separate thesauri: archaeological periods, typology, cultures, archaeological objects and events (leading to the find of an archaeological site, e.g. research, excavation, aerial photography).
In the meantime, Flanders Heritage decided to integrate its heritage databases in one heritage portal. The basis for this portal would be the inventory of architectural heritage.
Gradually other heritage databases would be added to the portal. With this in mind, the thesauri needed to be integrated, to ensure terminology was used consistently throughout the entire database and to enable searching all heritage datasets at once.
Table 1 gives a schematic overview of the existing initial thesauri. Style and cultures are somewhat similar and instinctively led to one thesaurus, which was the easiest part of the integration process thus far. What followed this first step was a process requiring patience, learning and understanding. When working in a certain area of expertise one becomes accustomed to using certain terms for certain items. When talking with someone in a different area of expertise, those same terms may not necessarily have the same meaning. For example, 'a castle' to an architectural heritage researcher would mean something along the lines of a grand estate house on a large domain (Figure 1), while a castle to an archaeologist means some kind of fortification, a defensible stronghold with a moat and ramparts (Figure 2). Once the main thesauri were created in the heritage portal, adding the other datasets to the portal and expanding the thesauri was not quite as time consuming. First, the heritage parks and gardens, along with heritage trees and shrubbery, were added. This involved adding a substantial set of terms to the typological thesaurus, but since most of these terms were significantly different and separate from the terms used in the architectural and archaeological thesauri, this process went quite smoothly. Adding these datasets also led to a new 'thesaurus', the thesaurus of (plant and tree) species. This thesaurus is somewhat different from the other thesauri, since it is more of a scientific classification system or taxonomy that we have put in the same format as a thesaurus to accommodate querying and assigning of terms to items in the database.

Table 1. Overview of the initial thesauri (columns: architectural thesauri, archaeological thesauri).
The next dataset to be integrated was the dataset of heritage ships and boats, followed by the dataset of heritage landscapes. This last dataset again led to some of the discussions we had seen with the integration of the architectural and archaeological thesauri. A lot of terms used by landscape experts meant something significantly different to an archaeologist or architectural heritage expert. To complicate the matter, there was a difference in the way a heritage landscape expert approaches and describes heritage compared to archaeologists.
Where an archaeologist or an architectural heritage expert would describe one object at a time, a landscape heritage expert approaches a landscape as a whole, creating concepts containing architectural, archaeological as well as natural elements. The term 'houses', for example, was not a term that could be used for describing a heritage landscape, because it lacked the historical and contextual component. The same problem occurred for the term 'abbeys'. An abbey in itself contains the buildings we commonly associate with an abbey: church, dormitory, refectory, etc. However, from a heritage landscape point of view, the concept of an abbey also contains the landscape in which the abbey was built, the importance of nearby rivers and forests, as well as the location of nearby farms possibly founded by that particular abbey. These interesting in-depth discussions led to the expansion of the thesaurus with concepts that could be used for 'wholes' or 'ensembles', i.e. compositions of heritage types. For example, we added concepts for settlements and settlement patterns, as well as concepts for abbey and castle domains.
During the years 2013 to 2016, Flanders Heritage developed a new database for designated heritage (i.e. heritage that has been attributed legal consequences in order to guarantee preservation and heritage management) that was incorporated in the heritage portal. Some thesauri were developed for this project specifically, focusing on heritage values (reasons for designating heritage) and decree types (official documents used to designate heritage).
This resulted in the following thesauri to this date (Table 2).

Focus on Flanders
In building the thesauri, we selected terms and concepts relevant to heritage that is commonly found in Flanders. After all, it does us little good to define what a pyramid is, if there are none to be found in Flanders.
Although this might seem as though we have created an 'island' in terms of thesauri, we based about 95% of our thesauri on existing thesauri in order to maximize interoperability. The first thesauri used by Flanders Heritage were based almost entirely on the Art and Architecture Thesaurus (AAT) (Harpring, 2010). They were largely a selection of terms that were also used in the AAT. The concepts that were added for archaeological heritage were again largely based on the AAT, but also on the Thesaurus of Monument Types by English Heritage, and for very specific archaeological terms we looked at the Dutch Archaeological Information System (ARCHIS) and some thesauri used by museums.
Over time, our thesaurus management system was adapted to allow linking concepts to other thesauri. This means we can link a term such as 'hotels' in our thesaurus to the concept ID in the AAT corresponding with hotels. In accordance with the SKOS specification (Miles & Bechhofer, 2009), we can also specify whether the link is an exact match, a broader match (almost the same concept, but the linked concept covers more than ours), a narrower match (almost the same, but the linked concept covers less than ours) or just a related match (it has something to do with the other concept, but it is not the same).
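As an illustration of such links, the following minimal Python sketch records a mapping between a local concept and a concept in an external thesaurus using the four SKOS mapping properties mentioned above. The class names, the helper function and both URIs are hypothetical placeholders chosen for this example; they are not the data model of the Flanders Heritage system, and the AAT identifier is deliberately not filled in.

```python
from dataclasses import dataclass
from enum import Enum


class MatchType(Enum):
    """The four SKOS mapping properties discussed in the text."""
    EXACT = "skos:exactMatch"
    BROAD = "skos:broadMatch"      # the linked concept is broader than ours
    NARROW = "skos:narrowMatch"    # the linked concept is narrower than ours
    RELATED = "skos:relatedMatch"


@dataclass(frozen=True)
class Mapping:
    local_uri: str         # URI of the concept in our own thesaurus
    match_type: MatchType  # kind of correspondence
    target_uri: str        # URI of the concept in the external thesaurus


def to_statement(mapping: Mapping) -> str:
    """Render the mapping as one Turtle-style statement."""
    return f"<{mapping.local_uri}> {mapping.match_type.value} <{mapping.target_uri}> ."


# Hypothetical URIs, for illustration only.
hotels = Mapping(
    local_uri="https://example.org/thesaurus/typology/hotels",
    match_type=MatchType.EXACT,
    target_uri="https://example.org/external/aat/hotels",
)
print(to_statement(hotels))
```

Keeping the match type explicit lets a consuming system decide, for instance, to treat only exact matches as interchangeable when exchanging data with another organisation.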

THESAURI: A MEANS TO AN END
While building a thesaurus is interesting and rewarding work, it should not be forgotten that these thesauri are to be used in our information systems to help retrieve information. A thesaurus does this in several different ways.
First, because we have reduced the set of options for a certain attribute of our data (e.g. styles, typology, …), we make it clear to a user that certain values will not provide any search results.
As already noted, the Flanders Heritage thesaurus of heritage types does not include the concept of a pyramid. When a user tries to search for heritage that is a pyramid, they will notice that pyramid is not in the thesaurus and thus that querying for it will produce no result.
Second, our thesaurus provides related concepts. When a user looks up "forten" (fortresses) in the thesaurus, they will see there is a related concept called "citadellen" (citadels). Consequently, if they do not find what they were looking for under fortresses, they could also try citadels.
While the related concepts are interesting, they do require a lot of active participation from the user. They have to understand that they might want to search for a different concept and have the thesaurus guide them there. When it comes to broader and narrower relations, a lot more automation can be done to help users in their quest for knowledge. For example, in our thesaurus we have a concept "duifhuizen" (houses for pigeons, both within a building or as a separate building). This concept has two narrower concepts detailing separate buildings: "duiventillen" (a construction that rests on top of a post) and "duiventorens" (a tower-like construction). When a user searches "duifhuizen", we want them to find heritage that has been indexed as "duifhuizen", "duiventillen" as well as "duiventorens".
Although this may seem like a simple feature, it is not a trivial matter on a technical level. Thesauri are tree-like data structures that can have arbitrary depth. Selecting all narrower concepts of a certain concept can be a computationally expensive operation when the concept is close to the root of the thesaurus. In a naive solution one could use a recursive algorithm: select all narrower concepts of a concept, select all narrower concepts of those concepts, and so on until no more concepts are found. While this works, it offers rather poor performance. To solve this problem, we implemented a nested set model (Kamfonas, 1992). This model works very well for read access, but is less suited to tree structures that change often. As a compromise, the nested set model of our thesauri is not recalculated every time the thesaurus changes, but only once every night. This means that changes to the thesaurus structure might take up to 24 hours before they have an effect on users' searches in our systems.
In practice, the structure of our thesauri does not change very often and when it does, the thesaurus manager takes this limitation into account.
Apart from this technical problem, there is also a more semantic problem with composing the narrower relations of a certain concept (Alexiev et al., 2015). In fact, there are not one but three types of hierarchical relations: broader generic (BTG), broader partitive (BTP) and broader instantial (BTI). Broader generic can be used to indicate that one concept is a kind of a second concept, e.g. a cathedral is a kind of church. Broader partitive can be used to indicate that one concept is a part of another concept, e.g. a transept is a part of a church. Broader instantial can be used to indicate that one concept is an instance of a second concept, e.g. the church of Our Lady in Bruges is an instance of a church. While these different types of hierarchical relations make sense in isolation, they lose their transitivity when combined. E.g. if we state that a transept is a part of a church and a church is a kind of religious building, both statements make sense when separated, but we cannot state that a transept is a kind of religious building. However, we can state that a transept is a part of a religious building. In Flanders Heritage thesauri we have so far only characterized hierarchical relations as broader/narrower relations as defined in SKOS (Miles & Bechhofer, 2009), not the more specialized versions defined by ISO 25964 (De Smedt et al., 2013). While this could pose problems when calculating the narrower relations of a certain concept, so far this has not really been the case due to the nature of our thesauri. To start with, we have no BTI relations in our thesauri. We generally feel that a concept that is unique, such as a certain building, landmark or historical person, should not be part of a controlled vocabulary, but rather part of a database of unique resources. Secondly, we have severely limited the number of BTP relations in our thesauri.
At the time of writing we only have one real example of this kind of relation. Our thesaurus of heritage types contains a collection of parts of buildings and structures that has a broader relation to a collection of buildings and structures. When searching for all buildings and structures in our database of heritage objects, one will also find all heritage objects indexed with a part of a building or structure. Queries using our thesauri generally use a concept further down the hierarchy of the thesaurus, so it rarely happens that a user is confronted with this semantic ambiguity. So far, we have chosen to overcome the problems inherent in large thesauri by carefully composing relations and always seeing them as part of the bigger picture, not just as an immediate relation between two concepts.
Both semantic and technical issues had to be overcome in order for the searches to function properly, but in the end the efforts have paid off. Fairly complex queries can now be run against the dataset with very little effort on the user's part.
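The difference between the naive recursive lookup and the nested set model can be sketched in a few lines of Python. This is an illustrative sketch only, not the Flanders Heritage implementation: the function names and the root concept "heritage types" are assumptions made for the example, while "duifhuizen", "duiventillen", "duiventorens" and "forten" are taken from the text.

```python
from typing import Dict, List, Tuple

# Hypothetical fragment of a thesaurus: concept -> narrower concepts.
TREE: Dict[str, List[str]] = {
    "heritage types": ["duifhuizen", "forten"],
    "duifhuizen": ["duiventillen", "duiventorens"],
}


def narrower_recursive(tree: Dict[str, List[str]], concept: str) -> List[str]:
    """Naive approach: descend one level at a time until no children remain."""
    found: List[str] = []
    for child in tree.get(concept, []):
        found.append(child)
        found.extend(narrower_recursive(tree, child))
    return found


def build_nested_set(tree: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Precompute (left, right) bounds for every concept in one depth-first walk."""
    bounds: Dict[str, Tuple[int, int]] = {}
    counter = 0

    def visit(concept: str) -> None:
        nonlocal counter
        counter += 1
        left = counter
        for child in tree.get(concept, []):
            visit(child)
        counter += 1
        bounds[concept] = (left, counter)

    visit(root)
    return bounds


def narrower_nested(bounds: Dict[str, Tuple[int, int]], concept: str) -> List[str]:
    """Descendants are exactly the concepts whose bounds lie inside our own."""
    left, right = bounds[concept]
    return [c for c, (l, r) in bounds.items() if left < l and r < right]


BOUNDS = build_nested_set(TREE, "heritage types")
print(sorted(narrower_nested(BOUNDS, "duifhuizen")))
```

Once the bounds are stored, finding all narrower concepts is a single range comparison rather than a chain of lookups, which is why the model suits read-heavy use. The cost is that inserting or moving a concept shifts the bounds of many others, so the bounds have to be rebuilt, which is consistent with the nightly recalculation described above.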
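To make the transitivity problem concrete, the sketch below labels each hierarchical link with its type and lets a query choose which types to follow. It is a hypothetical illustration of what typed relations in the ISO 25964 sense would allow; as stated above, the Flanders Heritage thesauri use only the untyped SKOS broader/narrower relations. The concept names come from the examples in the text, and all identifiers in the code are assumptions.

```python
from enum import Enum
from typing import List, Set, Tuple


class Rel(Enum):
    GENERIC = "BTG"    # "is a kind of"
    PARTITIVE = "BTP"  # "is a part of"


# (narrower concept, broader concept, relation type)
LINKS: List[Tuple[str, str, Rel]] = [
    ("cathedral", "church", Rel.GENERIC),
    ("transept", "church", Rel.PARTITIVE),
    ("church", "religious building", Rel.GENERIC),
]


def narrower(concept: str, links: List[Tuple[str, str, Rel]], allowed: Set[Rel]) -> Set[str]:
    """Collect all narrower concepts reachable through the allowed relation types."""
    result: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for child, parent, rel in links:
            if parent == current and rel in allowed and child not in result:
                result.add(child)
                stack.append(child)
    return result


# Only "kind of" links: the transept is correctly left out.
kinds = narrower("religious building", LINKS, {Rel.GENERIC})
# Untyped expansion, as with plain SKOS broader/narrower: the transept is included.
everything = narrower("religious building", LINKS, {Rel.GENERIC, Rel.PARTITIVE})
print(sorted(kinds), sorted(everything))
```

The second call mirrors the behaviour described for the collection of parts of buildings and structures: with untyped relations, a search on the broader concept also returns items indexed with a part.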

LESSONS LEARNED, CHOICES MADE
A thesaurus is not made overnight. In the following paragraphs an overview will be given of some of the lessons we learned and some of the choices we have made while assembling our thesauri.
A thesaurus should be created by someone with the appropriate knowledge and experience, because constructing a thesaurus is a complicated task. The structure of the thesaurus needs to be considered thoroughly: how do you want to use it, and what is the goal? The basic principles as mentioned throughout this paper should always be followed. At Flanders Heritage a thesaurus manager was appointed after the initial thesauri were created. New proposals for terms and concepts are welcomed, but always reviewed by the thesaurus manager. This keeps decision making consistent. It is also the job of the thesaurus manager to record the motivation behind decisions (changes of hierarchies or preferred terms, etc.) so that a decision is not unknowingly reversed after some time.
The thesauri created by Flanders Heritage are monohierarchic, meaning each concept occurs only once in the thesaurus. For example, the term 'farms' could be interpreted as a place to live (in the branch 'houses') as well as a place to conduct an agricultural business (in the branch 'economical buildings and structures'). We cannot, however, put the term in both branches.
We assign terms to a specific branch according to what we feel is the predominant factor, in this case 'economical buildings and structures'. Deciding what branch to choose involves thinking about what the concept means, and thus writing a scope note that clearly states what a farm is to be. This also shows the importance of composing scope notes along with composing a thesaurus. Making a thesaurus without thoroughly thinking about what each concept means and clearly defining it in scope notes will lead to a lot of lost time and retracing of steps already made.
Clearly, the importance of a scientific approach to choosing, classifying and describing concepts cannot be overstated. The terms and concepts we chose for the Flanders Heritage thesauri are the result of research into the best terms to describe a certain phenomenon. This can be dependent on historical period, region, evolution in styles, specific context, etc. A burial mound (tumulus) in the Roman era is different from a burial mound in the Neolithic era. An eighteenth century house in a small countryside town will be something significantly different from a house of the same period in a big city. While putting together (parts of) the thesauri, experts are always involved in reviewing scope notes and the accurate use of vocabulary.
This brings us to the next point, namely the importance of a thesaurus linked to a database. It brings many advantages: it makes data input easier, it allows querying of data (see above) and it makes sure the thesaurus stays up to date, not only on user needs, but also on the meaning of a concept. Additionally, a thesaurus should be dynamic. A database is also dynamic, so changes are inevitable. Understanding of certain concepts also evolves throughout time. Adding a new term to the thesaurus may make another term obsolete, or redefine the boundaries of the meaning of another concept. When users assign terms they will immediately notice if a term or concept is not right anymore or needs to change, or when a new definition or concept is needed. User input will always improve the quality of your thesaurus. This does not mean, however, that a certain amount of stability is not wanted; on the contrary. It is the task of the thesaurus manager to keep the total picture in mind and make sure the thesaurus remains logical. New concepts should not be added on a whim, but should be weighed against other concepts: is it indeed a different concept, is it enough to provide a new alternative term, could we maybe make changes to the scope note so that the concept broadens and allows the incorporation of the item we want to assign the concept to, etc.
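The monohierarchy rule lends itself to a simple automated check. The following Python sketch, with a hypothetical function name and hypothetical input data, reports every concept that has been placed under more than one broader concept, using the 'farms' example from the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def concepts_with_multiple_parents(
    broader_links: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Return each concept that has more than one broader concept, with its parents."""
    parents = defaultdict(set)
    for concept, broader in broader_links:
        parents[concept].add(broader)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


# Hypothetical draft in which 'farms' was accidentally placed in two branches.
draft = [
    ("farms", "houses"),
    ("farms", "economical buildings and structures"),
    ("schools", "educational buildings"),
]
print(concepts_with_multiple_parents(draft))
```

A check like this only flags the conflict; choosing the branch remains an editorial decision that depends on the scope note.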

ALLOCATING TERMS
Once the thesaurus is built, one needs to figure out how to actually allocate terms to items in the database. In the case of Flanders Heritage this was mostly a matter of trial and error. At first, allocating terms was done too selectively: a maximum of one term was allocated from each thesaurus. For example, an 18th century farm could only have the concepts 'farms' and '18th century'. It quickly became clear that this method did not do justice to the complexity of heritage. A farm can consist of different components and can have different construction periods. A farm as a concept can consist of an 18th century farmhouse, but also have a stable going back to the 17th century and maybe a more recent addition of a bake house in the 19th century, and so on. These nuances were totally lost in the first method that was applied.
The next step was to allocate every term we possibly could: a church built in the 16th century with minor alterations in the 17th, 18th, 19th and 20th century would be allocated all these dates. Searching for 19th century churches would also give this item as a result, even though it is essentially a 16th century church.
Therefore, we decided to retrace our steps: what is our essential goal in using a thesaurus? What do we want to find when we query the database and, more significantly, what do we not want to find? When we search for 20th century heritage we want to find all heritage from this period, but we do not want to find each and every 20th century alteration to a building. This pollutes our search results. So we came up with a set of guidelines that allows allocating terms in such a way that only the most relevant terms are assigned to an item.
These guidelines, though too extensive to describe in complete detail within the scope of this article, can be traced back to some basic rules: As a first and now evident rule, terms should be allocated according to their relevance and kept as 'clean' as possible.
When allocating terms we avoid using concepts that are already 'covered' by other allocated concepts. For example: a moat-and-bailey castle will be allocated the concept 'moat-and-bailey castles' but not the concepts 'moats', 'baileys' and 'castles', which are also concepts in the thesaurus. The same goes for the example of the 16th century church mentioned above. This item would only be allocated the concepts 'churches' and '16th century'. We would not allocate concepts such as 'choirs' or 'church towers'.
If the text in a database item describes a certain part of the heritage item extensively, the term could be allocated because of its significance in this case. For example, a graveyard near a church obviously has a lot of tombstones and graves, so normally the concept 'graves' would not be allocated when the concept 'graveyards' is already allocated. When, however, the text really describes an extraordinary grave in this graveyard, we would consider allocating the concept.
As a rule, terms are only allocated according to what is actually described in the text of a database item. If the text only states that the 18th century house has a garden, but nothing else is stated about this garden, the concept 'gardens' will not be allocated. If, however, the garden is described as, e.g., an English style garden with a nice pond and some specific trees, the concept will be allocated.
Another important guideline is not to allocate dates relating to minimally significant alterations in buildings. Only when an alteration has led to a significant change or a completely new building will the date be added.
As a final rule, terms are mainly allocated according to the original function of a heritage item. For example, a building that was originally built as a small school, but is now a normal house (without significant changes to layout, etc.), will only be allocated the concept 'schools', not 'houses'.

LEAVING THE NEST: ATRAMHASIS
Our vocabularies and the technology used to host them were very much created as a result of our Inventory of Immovable Heritage. As our organisation started evolving away from a research institute to a more typical government agency, the types of applications and databases in use changed as well (Van Daele et al., in press). This change entailed rethinking our technical architecture. Where first there were a few big systems, we now have a network of smaller systems and databases. Also, where our thesauri used to be an integral part of the Inventory, they have now moved to a standalone system. In moving to this standalone system, we also switched from a term based thesaurus to a concept based one, aligning ourselves with the SKOS model. This alignment with SKOS also brought with it a focus on Linked Data that was not previously present. All controlled vocabularies can be exported as RDF downloads and a triple pattern fragments server (Verborgh et al., 2016) will soon be available.
We felt that a general purpose online SKOS editor was well suited to be developed as an open source project. Controlled vocabularies are not unique to our business processes and might be useful to other organisations, both in and out of the heritage field. The editor, written in Python with a JavaScript frontend, was released as a project named Atramhasis. Not only was the code opened, but the entire project was run as an open source project with issues, milestones and releases on GitHub. To make customisations as easy and painless as possible, the software is highly adaptable to an organisation's needs. Out of the box, controlled vocabularies can be created, edited, searched and browsed. Editing takes advantage of other linked open data vocabularies, since concepts can easily be imported from e.g. the Getty Vocabularies (Cobb, 2015) or the Historic England thesauri. Due to the software's modular architecture, it would be easy to write import capabilities for other vocabularies. Finally, a whole range of software libraries was written and released as open source to help in integrating vocabularies powered by Atramhasis into other Python projects.

CONCLUSIONS
During the last 15 years Flanders Heritage has seen a lot of changes in the way we handle information. These changes, initiated by the need to digitize and disseminate our data, led to different ways of working with data and different needs, but also created interesting possibilities. Thesauri are one of the things that really revolutionized our data management. Simple everyday questions that could not be answered previously can now be answered thanks to the use of thesauri.
Though we should not underestimate the time and effort it takes to build a thesaurus and the need for guidelines and rules when working with thesauri, the advantages are numerous: data input gets easier, advanced querying of the database becomes possible, use of vocabularies and terminology is more uniform and exchanging information with other organizations using similar vocabularies is now conceivable.
Flanders Heritage continues to seek new ways of bringing our data to the public. Using open source software and open data in the case of thesauri adds to this evolution.

Figure 1. Castle as defined in architectural heritage (Castle Den Brandt, copyright Flanders Heritage).

Figure 2. Castle as defined in archaeological heritage (Feudal castle of Beersel, copyright Flanders Heritage).

Table 2. Overview of the thesauri today.