GEOSPATIAL WEB SERVICES FOR LIMNOLOGICAL DATA: A CASE STUDY OF SENSOR OBSERVATION SERVICE FOR ECOLOGICAL OBSERVATIONS

The present work aims at designing and implementing a spatial data infrastructure for storing and sharing ecological data through geospatial web services. As a case study, we concentrated on limnological data from the drainage basin of Lake Maggiore in northern Italy. To establish the infrastructure, we started with two basic questions: 1) What type of data is an ecological dataset? 2) Which geospatial web service standards are most suitable for storing and sharing ecological data? In this paper we describe the possibilities for sharing ecological data using geospatial web services and the difficulties that can be encountered in this task. To test current technological solutions, we use real data from a published limnological study. We concluded that limnological data can be considered observational data, composed of biological (species) data and environmental data, and that they can be modelled using the Observations and Measurements (O&M) specification. With current web service implementations, the geospatial web services that could potentially be used to publish limnological data are the Sensor Observation Service (SOS) and the Web Feature Service (WFS). SOS holds the essential components to represent time series observations, while WFS is a simple model that requires profiling. Neither SOS nor WFS is perfectly suited to publishing biological data, so other alternatives, such as linked data, must be considered.


INTRODUCTION
It has become increasingly important to collect, share and access data of all kinds, not only for research but also in the public sphere, as data sets can potentially provide a deeper understanding of both nature and society and open up many new possibilities for research (Science, 2011). For the understanding of environmental issues through data, there are global initiatives such as the Shared Environmental Information System (SEIS); the Global Monitoring for Environment and Security (GMES), now known as Copernicus; the Global Earth Observation System of Systems (GEOSS); and the Data Observation Network for Earth (DataONE), all of them aiming to concentrate and analyse data for public administration and research purposes.
In Europe, the legal framework for the Infrastructure for Spatial Information in the European Community (INSPIRE) was created in 2007, and since then several projects have been developed to ensure its implementation, for instance NatureSDI and the Environmental Quality and Pressures Assessment Across Europe project (EnvEurope), which was started by the Long Term Ecological Research European Network (Europe LTER). All these projects and initiatives are trying to fulfil the need of the environmental community and of policy makers for interoperable infrastructures for environmental data sharing and reuse (Craglia et al., 2007; Donlon et al., 2012; Hennig et al.; Hřebíček and Pillmann, 2009). In the near future these kinds of initiatives will not be voluntary but mandatory, and the challenges and opportunities that they represent are open for discussion.
Even so, when it comes to ecological data, there are big problems of interpretation and analysis that pose particular difficulties for re-use and sharing (Zimmerman, 2008).
Instrumented data collection is relatively new, especially for ecological data. Effective data discovery is particularly problematic in ecology, where small, focused studies have traditionally relied on data management solutions that often consist of flat files or spreadsheets with minimal formal structure and little or no metadata documentation (Jin and Lin, 2012).
For ecological data, the field of Ecoinformatics has been providing informatics tools, including GIS technologies, for data collection, storage and sharing, in order to understand, predict and confront current environmental issues (Boyd and Foody, 2011; Brunt et al., 2002; Dengler et al., 2011; Hale and Hollister, 2009; Michener and Jones, 2012). Some of these informatics tools include models, workflows and web services for data sharing. Examples are the Ecosystem Location Visualization and Information System, ELVIS (Parr et al., 2006); the Kepler project (https://keplerproject.org/); the Global Biodiversity Information Facility Mapping and Analysis Portal Application (GBIF), which has developed a web GIS application to discover biodiversity data from global portals and then perform data analysis (Flemons et al., 2007); and the Research Infrastructure for Biodiversity and Ecosystem Research (LifeWatch), which is setting up LifeWatch-conformant service instances based on publishing software (e.g. protocol interfaces, conversion models) (Frenzel et al., 2011). In particular, for the European ecologists involved in the LTER network, the EnvEurope LIFE+ project was a good driver for understanding the importance of collecting metadata for data discovery, and of web services (described below) to store, publish, share and download datasets (Oggioni et al., 2012; Kliment et al., 2013). All these projects have in common the development of web services for data interchange.
In general, web services are software systems designed to facilitate machine-to-machine interaction over a network (Sample, 2008). In the context of Spatial Data Infrastructures (SDI), services are used to manage, analyse and distribute geographical data, and are therefore called Geospatial Web Services (Zhao et al., 2007). The Open Geospatial Consortium (OGC) has created web service standards (named OWS: OGC Web Services) for the exchange of different types of geospatial data, in order to guarantee data interoperability worldwide. These services have become the main tools for all kinds of spatial data, especially in the environmental field. Despite the existence of these standards, some of the projects mentioned above do not use OWS to share data, others use beta implementations (GBIF), and others are in a phase of evaluation and implementation (LTER community). The conclusion is that, up to now, these services are not widely used in the ecological field.
The possible use of OGC standards for ecological data leads to the following questions: 1) What are the characteristics of an ecological dataset? 2) Which OGC web service standards are most suitable for storing and sharing ecological data? In this paper we describe the possibilities for ecological data to be shared using geospatial web services and the difficulties that can be encountered in this task. As a case study, we used ecological data from the drainage basin of Lake Maggiore, a site of the Long Term Ecological Research European Network (Europe LTER) in the north of Italy, which has been monitored continuously since 1960, and monthly since 1978, by the Institute of Ecosystem Study (CNR-ISE). A lake was chosen as a case study because lakes have been considered simple ecosystems, and have long been used by ecologists to test complex ecological theories. They are reasonably contained ecosystems in which data from water, atmosphere and soil are related (Forbes, 1925; Holling, 1973; Odum, 1983; Peters, 1991). Data taken from lake ecosystems are called limnological data. They represent an interesting case for testing OGC services because of their biological (i.e. species-related) and long-term nature, together with a water depth component. Some studies have tested the viability of sharing species-related ecological data (Best et al., 2007; Dubois et al., 2013; Frehner and Brändli, 2006; Wong et al., 2007), but they focus on species distribution or presence and do not go into the details of modelling those data with the standards related to observations, as we do in this paper.

LIMNOLOGICAL SPATIAL DATA DEFINITION AND CHARACTERISTICS
Limnological data are collected at locations (stations) on a lake, each with a latitude and longitude, and at different depths. They are produced both by fieldwork sensors (in situ) and as a result of the analytical methods applied to the collected samples (ex situ).
All limnological data can be classified as qualitative or quantitative observations. An observation is "an action whose result is an estimate of the value of some property of the feature-of-interest (e.g. station), at a specific point in time, obtained using a specified procedure" (Observations and Measurements, 2011).
For the purposes of this work, ecological data are defined as the set of data that includes observations of different chemical and physical variables, and of the presence or abundance of organisms. They are therefore the combination of environmental and biological data, and in a lake both kinds of data can be collected in the same confined environment. Two important aspects characterize limnological data: limnologists usually measure biological and environmental variables at the same station, at the same water depth and using the same water sample; and several depths are often sampled at the same station.
To aid comprehension, data collected from a lake were divided into three domains: aquatic, atmospheric and terrestrial. The aquatic domain holds all the biological variables (hydrobiology), and the other domains contain data that influence such biological variables. The atmospheric domain contains data about meteorological conditions such as precipitation, solar radiation and wind velocity. The terrestrial domain, which consists of the land part of the lake basin and its subsurface, contains data from atmospheric deposition and paleolimnological studies. The aquatic domain holds (1) a large variety of chemical variables related to water quality (e.g. conductivity, alkalinity, pH, concentration of sulfate and nitrate); (2) physical variables such as lake level, discharge from tributary rivers or water temperature; and (3) biological variables (species abundance, density, coverage, biomass).
Limnological data are time series data, meaning a sequence of data typically measured at successive times spaced at uniform intervals. We identified a general structure of limnological data, simplifying its heterogeneity, as can be seen in figure 1. For biological data, at one location (normally a single station in the centre of the lake), at a time instant on a sampling day, a measurement is taken of a specific species at a particular depth. With the definition, attributes and characteristics of limnological data just described, we can now discuss the concept of data sharing through geospatial web services.

SPATIAL DATA INFRASTRUCTURES AND GEOSPATIAL WEB SERVICES
When it comes to sharing data, and spatial data specifically, the term Spatial Data Infrastructure (SDI) has often been used since the early 1990s to denote the relevant base collection of technologies, policies and institutional arrangements that facilitate the availability of and access to spatial data. The SDI provides a basis for spatial data and metadata storage, discovery, publishing and use (Infrastructures, 2004).
SDIs are data and service networks, and networks depend on open standards to guarantee interoperability. For an SDI, standards can be partitioned into three parts: data (message encoding), interface (transport protocol and web services) and metadata (ontology), covering all aspects of interoperability (Zhao et al., 2007). At the data level, the standards specify the message encoding and data formatting used for communication between web services and applications. At the interface level, the standards define common interfaces for applications, web services and human users. At the metadata level, a set of consensus metadata types and descriptions is associated with each web service or dataset. In this paper we focus only on data and services; for metadata in the ecological domain, there is the Ecological Metadata Language, EML (http://knb.ecoinformatics.org/software/eml/).

Geo web services and data specifications
To understand which OWS are most suitable for storing and sharing limnological data, we compared the main OWS: Web Map Service (WMS), Web Coverage Service (WCS), Web Processing Service (WPS), Web Feature Service (WFS) and Sensor Observation Service (SOS). A summary of this comparison can be seen in table 1. These services have different operations that retrieve specific responses (an image, a shapefile or numerical data) related to data visualization or download, metadata discovery and service discovery.

Table 1. Characteristics and functionalities of geospatial web services
1 Its definition is entirely at the discretion of the particular WFS implementation that is describing its feature types.
According to the services comparison, OGC has two standards that could potentially be used to publish limnological data: SOS and WFS. In terms of use, WFS is more widespread than SOS. Nonetheless, using the Google search engine (e.g. with the query inurl:service=SOS inurl:request=GetCapabilities; Kliment, 2013), it was possible to find 864 different SOS services, 456 of them with reference to the aquatic environment.
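The search query above relies on the key-value parameters that OGC services accept for their GetCapabilities operation. As a minimal sketch, the following Python fragment assembles such a request; the endpoint `https://example.org/sos` and the helper name `get_capabilities_url` are placeholders introduced here for illustration and do not refer to any service used in this study.

```python
from urllib.parse import urlencode


def get_capabilities_url(endpoint: str, service: str) -> str:
    """Build a key-value-pair GetCapabilities request for an OGC service."""
    params = {"service": service, "request": "GetCapabilities"}
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint, used only for illustration.
url = get_capabilities_url("https://example.org/sos", "SOS")
print(url)
```

The same helper would serve for a WFS by passing `"WFS"` as the service name, since both services share this operation.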
In the case of SOS (Na and Priest, 2007), an observation is modelled as an event that produces a result whose value is an estimate of a property of the observation's feature of interest at a particular time instant. An observation instance is classified by: (1) eventTime: time period(s) for which observations may be requested; (2) featureOfInterest: geographical region that contains the features that are the subject of the sensor observations; (3) observedProperty: phenomena that are being sensed; and (4) procedure: specific sensor systems that report the observations.
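The four parameters of an observation instance can be pictured as the fields of a simple record. The sketch below is only an illustration in Python, not the O&M encoding itself: the class name, the added `result` and `unit` fields, and all values are assumptions made for the example and are not data from the study.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """Illustrative record mirroring the parameters of an SOS observation."""
    event_time: datetime        # eventTime: when the observation applies
    feature_of_interest: str    # featureOfInterest: e.g. a lake station
    observed_property: str      # observedProperty: phenomenon being sensed
    procedure: str              # procedure: sensor or analytical method
    result: float               # estimated value of the property (assumed field)
    unit: str                   # unit of measure of the result (assumed field)


# Placeholder values, invented for illustration only.
obs = Observation(
    event_time=datetime(2010, 7, 15, 10, 0),
    feature_of_interest="station_centre",
    observed_property="water_temperature",
    procedure="temperature_probe",
    result=21.4,
    unit="degC",
)
print(obs)
```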
In contrast, WFS is based on a generic definition of a geographic feature that covers any real-world entity, using GML schemas to define the feature type (Bröring et al., 2012). Therefore, for interoperability purposes, WFS requires communities to agree on domain-specific GML application schemas. Implementing an SOS instead of a WFS does not require creating or maintaining such schemas, because SOS implementations are based on the Observations and Measurements (O&M) specification (Bermudez, 2009). Common WFS services deliver geographic features, such as points (our main interest in the limnological field), polygons or lines. These features can have associated properties whose values change over time, exactly like observations performed at a geographic location of interest (station). However, WFS services that encode observations often provide only the latest observation rather than time series measurements.

SOS WEB SERVICE FOR LIMNOLOGICAL DATA
Following the comparison above, we decided to carry out a case study using SOS. Comparing the general attribute structure of limnological data presented in figure 1 with the parameters of an observation instance in SOS, we obtain the data structure mapping shown in figure 2.
This contribution has been peer-reviewed. The double-blind peer-review was conducted on the basis of the full paper. doi:10.5194/isprsannals-II-4-9-2014
The SOS specification is designed so that users can query information by time instant, meaning that each value of an observed property must be associated with a specific time instant (a one-to-one time-property relationship) for the service to retrieve a response. Figure 2 may imply that both biological and environmental data fit perfectly into the SOS specification; however, this does not hold for biological data, where the observed property (e.g. abundance, biomass, density or water depth) is measured for each species at one time instant, so there is a one-to-many time-property relationship.
In other words, an observed property can have as many values as there are species in the water at that moment.
To adapt biological data to the SOS specification, and after studying all the possible mapping options, we decided to combine species and observed property into a single property, so that there are n sets of properties (abundance or biomass of species x, density of species x and water depth of species x) that change at each time instant, as can be seen in table 2.
From the point of view of an RDBMS, this is not the best solution, given the large number of species present in a lake ecosystem.
Table 2. Mapping solution between biological data and OGC SOS specification

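The effect of fusing species and observed property can be sketched in a few lines of Python. The records, species labels and the `property_of_species` naming pattern below are placeholders chosen for illustration; they are not the values or the naming convention of the implemented service.

```python
from collections import Counter

# Illustrative placeholder records: (time, species, property, value).
records = [
    ("2010-07-15", "species_a", "density", 120.0),
    ("2010-07-15", "species_b", "density", 45.0),
    ("2010-07-15", "species_a", "biomass", 3.2),
]


def to_sos_observations(rows):
    """Fuse species and property into a single observedProperty name."""
    return [
        (time, f"{prop}_of_{species}", value)
        for time, species, prop, value in rows
    ]


def is_one_to_one(keys):
    """True when every (time, property) key appears exactly once."""
    return all(count == 1 for count in Counter(keys).values())


# Without fusing, "density" has two values at the same time instant.
plain_keys = [(time, prop) for time, _, prop, _ in records]
# After fusing, each (time, property) pair carries a single value.
fused_keys = [(time, prop) for time, prop, _ in to_sos_observations(records)]

print(is_one_to_one(plain_keys), is_one_to_one(fused_keys))
```

The number of distinct property names grows with the number of species, which is the cost of this mapping.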
With this mapping proposal, we implemented an SOS as an example of downloading data from the Lake Maggiore LTER site to local facilities for typical limnological analysis purposes.
The use case was created with data from the study "Resource ratio and human impact: how diatom assemblages in Lake Maggiore responded to oligotrophication and climatic variability" (Morabito et al., 2012), specifically phytoplankton biomass and density observations, and records of transparency, silica, water temperature and phosphorus. From these data, we built several possible combinations (most specifically for the SOS GetObservation request), trying to cover different scenarios of researchers' data requests.
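A GetObservation request of the kind mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it assumes a server that exposes the key-value-pair binding of SOS 2.0, which may differ from the version used in the use case, and the endpoint, offering name and property identifiers are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def get_observation_url(endpoint, offering, observed_properties, start, end):
    """Build a GetObservation request, assuming the SOS 2.0 KVP binding."""
    params = {
        "service": "SOS",
        "version": "2.0.0",
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": ",".join(observed_properties),
        # Time period filter on the phenomenon time of the observations.
        "temporalFilter": f"om:phenomenonTime,{start}/{end}",
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint, offering and property identifiers.
url = get_observation_url(
    "https://example.org/sos",
    "lake_maggiore_environment",
    ["water_temperature", "silica"],
    "2010-01-01T00:00:00Z",
    "2010-12-31T23:59:59Z",
)
query = parse_qs(urlsplit(url).query)
print(query["observedProperty"])
```

Different researcher scenarios would then correspond to different combinations of offering, observed properties and time period passed to the helper.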

PROPOSALS FOR LIMNOLOGICAL DATA
To store and share data within the ecological community, we identify three options, taking into account the difficulties encountered due to the characteristics and attributes of limnological data. Option I (see figure 3) consists of using SOS exclusively for environmental data and creating a WFS only for biological data. For WFS it would be necessary to create one point feature for each species, or to create just one point feature with an attribute table containing as many columns as there are species present (n sets of properties). With WFS it will also be necessary to define domain-specific schemas for species data, in order to reach a common understanding of the data structure.
Because SOS and WFS return different responses (the first a series of observations, the second a point feature with an associated attribute table), these data must be integrated, for example using free and open source GIS software such as QGIS or GRASS. Limnologists commonly use R (http://www.r-project.org/) for statistical analysis of their data, and QGIS uses a plug-in to connect data with R (http://www.ftools.ca/manageR/). Alternatively, SOS data can be analysed by connecting the SOS service directly to R through SOS4R (http://www.nordholmen.net/sos4r/), which is an OGC SOS client for R (Nüst, 2012).
(4) Include links to other related data (using their URIs) when publishing data on the Web.
Therefore, for option III, SOS is proposed for sharing limnological data as in option II, but with the addition of linked data. This option could be useful for communities that already have SOS implemented and want to make data available using linked data. Another possibility could be to use SOS for environmental data and Linked Data for biological data. Linked Data allows a real connection to be created between data, information and knowledge using a single service with a more general approach than SOS, and the two can interact well. Some applications of Linked Data to ecological data have been carried out (GuanShuo et al., 2011), demonstrating that it can be used in this field, so a proposal for further research is to test option III in a use case. There is an available implementation of an OX RESTful SOS proxy that can provide Linked Sensor Data without any modifications to existing OGC services, independently of the server software (Janowicz et al., 2011).
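To give an idea of what publishing a biological observation as linked data might look like, the following Python sketch writes a few statements in N-Triples syntax, identifying the observation by a URI and linking it to related resources. The namespace `https://example.org/lter/`, the vocabulary terms and the values are all hypothetical; a real deployment would reuse established vocabularies and an RDF library, and literal escaping is not handled here.

```python
# Hypothetical namespace for identifiers; not an existing service.
BASE = "https://example.org/lter/"


def triple(subject, predicate, obj, literal=False):
    """Format one N-Triples statement; literals are quoted, URIs bracketed."""
    tail = f'"{obj}"' if literal else f"<{obj}>"
    return f"<{subject}> <{predicate}> {tail} ."


obs_uri = BASE + "observation/1"
triples = [
    # The observation links to its station and species by URI.
    triple(obs_uri, BASE + "vocab/featureOfInterest", BASE + "station/centre"),
    triple(obs_uri, BASE + "vocab/species", BASE + "species/species_a"),
    # The measured value is attached as a plain literal (placeholder value).
    triple(obs_uri, BASE + "vocab/density", "120.0", literal=True),
]
print("\n".join(triples))
```

Because the species is a resource with its own URI rather than part of a property name, one observation pattern covers any number of species.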
In general, the recommended approach is to follow the SDI architecture. The typical three-layer structure, comprising database, services and client, is the one that best supports creating, discovering and publishing data and metadata within a scientific community. For the specific case of limnologists, we suggest the structure in figure 6. For data storage, PostgreSQL can be used with the PostGIS extension; many SOS servers come with a data model schema ready to be filled. For data publishing, we recommend option III, using SOS4R and the OX RESTful SOS proxy.
Figure 6. SDI schema for limnological data

CONCLUSIONS AND RECOMMENDATIONS
Limnological data include observations of environmental variables (including chemical and physical variables) and of the variation in the presence of organisms in a lake ecosystem. They are produced by fieldwork sensors (in situ) and as a result of the analytical methods applied to the collected samples (ex situ), indicating that this kind of data is the result of observations and/or measurements.
Of all the OGC standards examined, only SOS and WFS could be used to share limnological data. SOS holds the essential components to represent time series observations, while WFS is a simple model that requires profiling. As seen above, species data are not a perfect fit for the SOS specification, because it was designed for observations in which only one value of a property is allowed at one time instant (e.g. at a given second, the air can have only one temperature value, not ten).
The only solution found for this problem was to create as many properties (density and biomass) as there are species present. If the attribute table of a station (point feature) is imagined, the density of each species will be a column in the table, so that in the end the table will have as many columns as there are species. Unfortunately, this is not the best solution from a practical point of view, because translating the data from the collection structure to this new structure would require a lot of time. Because of these problems, other alternatives, such as linked data, must be considered.
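The translation effort described above can be illustrated with a small Python sketch that pivots records from a collection-style structure (one row per species measurement) into the one-column-per-species table. The rows, species labels and values are placeholders invented for the example.

```python
# Placeholder rows in collection structure: (time, species, density).
long_rows = [
    ("2010-07-15", "species_a", 120.0),
    ("2010-07-15", "species_b", 45.0),
    ("2010-08-12", "species_a", 98.0),
    ("2010-08-12", "species_c", 12.0),
]


def to_wide(rows):
    """Pivot (time, species, density) rows into one row per time instant."""
    columns = sorted({species for _, species, _ in rows})
    table = {}
    for time, species, value in rows:
        # Each time instant gets one cell per species, empty by default.
        row = table.setdefault(time, dict.fromkeys(columns))
        row[species] = value
    return columns, table


columns, table = to_wide(long_rows)
print(columns)
print(table["2010-08-12"])
```

Every new species adds a column to all rows, and species absent on a sampling day leave empty cells, which is why the structure becomes hard to manage as the species list grows.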

Figure 1. General attribute structure of limnological data.

Figure 3. Option I: SOS + WFS.
Option II (see figure 4) is to use SOS for both biological and environmental data, with the mapping solution proposed in table 2, connecting the SOS service to R through SOS4R. This option was the one implemented in the use case. As mentioned before, data management in this case can be difficult if the amount of species data is large.

Figure 4. Option II: only SOS.
Option III (see figure 5) is a combination of option II and Linked Data. Linked Data is basically a service that enables computers to search structured information about all types of data (not only observations) across the web. The Linked Data methodology is based on Semantic Web principles: (1) data are uniquely identified using Uniform Resource Identifiers (URIs); (2) data are made accessible to computer programs through HTTP URIs; (3) information about data is expressed