REAL TIME SEMANTIC INTEROPERABILITY IN AD HOC NETWORKS OF GEOSPATIAL DATA SOURCES: CHALLENGES, ACHIEVEMENTS AND PERSPECTIVES

Recent advances in geospatial technologies have made large amounts of geospatial data available. Meanwhile, new developments in Internet and communication technologies have created a shift from isolated geospatial databases to ad hoc networks of geospatial data sources, where data sources can join or leave the network and form groups to share data and services. However, effective integration and sharing of geospatial data among these data sources and their users are hampered by semantic heterogeneities. These heterogeneities affect the spatial, temporal and thematic aspects of geospatial concepts. There have been many efforts to address semantic interoperability issues in the geospatial domain. These efforts were mainly focused on resolving heterogeneities caused by different and implicit representations of the concepts. However, many approaches have focused on the thematic aspects, leaving aside the explicit representation of spatial and temporal aspects. Also, most semantic interoperability approaches for networks have focused on automating the semantic mapping process. Yet the structure of an ad hoc network is continuously modified by the addition or removal of sources, the formation of groups, etc., and this dynamic aspect is often neglected in those approaches. This paper proposes a conceptual framework for real time semantic interoperability in ad hoc networks of geospatial data sources. The conceptual framework presents the fundamental elements of real time semantic interoperability through a hierarchy of interrelated semantic states and processes. We then use the conceptual framework to frame the discussion of the achievements that have already been made, the challenges that remain to be addressed, and the perspectives with respect to these challenges.


INTRODUCTION
The recent technological advances in geospatial data gathering have resulted in a growing number of geospatial data producers. Combined with the increased pervasiveness and availability of various kinds of networks and of the Internet, the result is that very high volumes of geospatial data are made available to end users. At the same time, geospatial data remains costly to produce and maintain, so sharing existing geospatial data is often put forward as an alternative to producing more data. From this principle, the concept of geospatial data reusability has emerged, and with it the need to assess whether geospatial data that was produced for a specific need and in a given context is suitable for a user who may have different requirements and operate in a different context. For geospatial data sharing and reuse to be meaningful, the parties must be aware of the meaning of the data they exchange. That is, semantic interoperability must be ensured.
Several semantic interoperability frameworks have been proposed in recent years, both in the geospatial domain (e.g., Bishr 1998; Brodeur et al. 2003; Kuhn 2003; Kavouras et al. 2005; Bakillah et al. 2006; Lutz and Klien 2006; Hess et al. 2007; Cruz and Sunna 2008) and in the larger information system community (Kalfoglou and Schorlemmer 2003; Park and Ram 2004; Keeney et al. 2006; Euzenat and Shvaiko 2007). In ad hoc networks, the data sources that have to interoperate are not known in advance and change dynamically. In addition, ad hoc networks now integrate non-traditional types of data sources, including mobile devices and sensor networks, which raise additional challenges that were not considered for traditional geospatial databases. Therefore, traditional semantic interoperability systems that were dedicated to a static and limited number of known sources are no longer appropriate for new distributed and heterogeneous environments such as ad hoc networks, because of their tight coupling and their lack of flexibility. The objective of this paper is to propose a conceptual framework for real time semantic interoperability in ad hoc networks of geospatial data sources. The framework identifies, through a hierarchy of interrelated semantic states and semantic processes, the requirements that must be met to achieve semantic interoperability in this new, but already widespread, type of environment. The framework is meant to delineate requirements for future research. As such, we discuss the achievements that have already been made toward the development of such a framework, the challenges that the framework raises, and the perspectives to address these challenges.
This paper is organized as follows: in Section 2, we review related work and discuss existing semantic interoperability approaches with respect to their suitability for ad hoc networks of geospatial data sources. In Section 3, we present our framework. In Section 4, we discuss achievements, challenges and perspectives. Finally, in Section 5, we conclude this paper.

RELATED WORK ON SEMANTIC INTEROPERABILITY
In the geospatial domain, a well-known definition of interoperability is given in ISO TC204, document N271: interoperability is "the ability of systems to provide services to and accept services from other systems and to use the services so exchanged to enable them to operate effectively together." Bishr (1998) has identified six levels of interoperability between spatially distributed independent geographical information systems, among which semantic interoperability is the highest. Semantic interoperability is defined as the "knowledge-level interoperability that provides cooperating databases with the ability to resolve semantic heterogeneities arising from differences in the meaning and representation of concepts" (Park and Ram 2004).
The main obstacle to semantic interoperability, therefore, is heterogeneity (Brodeur et al. 2003). Heterogeneity may be classified as syntactic, structural and semantic (Brodeur et al. 2003). Syntactic heterogeneity occurs when different geospatial databases use different formats. The standards that were developed by the Open Geospatial Consortium (OGC) aim at resolving syntactic heterogeneity by providing common formats, for example, the Geography Markup Language (GML), which establishes standard geometrical primitives in line with the ISO 19107 standard. Structural heterogeneity occurs when data is structured differently. For example, the level of granularity may be different (e.g., regions vs. countries), or the same real world feature (e.g., lake) may be represented with a different construct, for instance, as a class or as the value of the attribute "type of water body." At the spatial level, the same geographic feature may be represented with different geometric primitives (e.g., a road being abstracted as a line or as a polygon), and at the temporal level, the same event may be associated with different temporal primitives (e.g., a date or a period). Finally, semantic heterogeneity is the difference in the intended meaning of concepts. For example, "geometry of building" may represent the "roof of the building" or the "foundation of the building."

One of the well-known solutions to the issue of semantic heterogeneity is the ontology. An ontology is "an explicit specification of a conceptualisation," where the conceptualisation is "a combination of concepts, and other entities that are assumed to exist in some area of interest and the relationships among them. … It is a simplified view of the world that we wish to represent for some purpose" (Gruber 1993). Ontologies are recognized as a major component of a semantic interoperability approach and of the Geospatial Semantic Web (Agarwal 2005). They are widely employed to define the semantics of resources, such as geospatial databases (Fonseca et al. 2002; Mostafavi et al. 2004; Brodaric 2007; Hess et al. 2007; Cruz and Sunna 2008) and functionalities of geo-services (Lutz 2005; Lemmens 2006; Lutz and Klien 2006). Ontologies support various semantic interoperability tasks, notably data and service discovery. They can be used to provide a description of available data and services, so that users' search queries can be matched against these descriptions through ontology mapping. Kalfoglou and Schorlemmer (2003) define ontology mapping as a morphism, consisting of a set of functions assigning the symbols used in one ontology to the symbols of the other ontology. More concretely, the ontology mapping process takes two or more ontologies as input and returns the semantic relations (also called alignments) between the ontology components. The emergence and spread of ontologies have given rise to numerous ontology mapping approaches, which were thoroughly reviewed by Euzenat and Shvaiko (2007).
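The input/output contract of ontology mapping described above can be illustrated with a minimal sketch. It assumes a toy model in which an ontology is a dictionary from concept names to sets of descriptive terms, and it uses a simple Jaccard overlap with an arbitrary threshold; the function names, the relation labels and the example ontologies are illustrative assumptions, not a method taken from the cited literature.

```python
from typing import Dict, List, Set, Tuple

Ontology = Dict[str, Set[str]]           # concept name -> descriptive terms (toy model)
Alignment = Tuple[str, str, str, float]  # (concept in A, concept in B, relation, score)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of terms common to both sets (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_ontologies(onto_a: Ontology, onto_b: Ontology,
                   threshold: float = 0.5) -> List[Alignment]:
    """Take two ontologies as input and return candidate alignments."""
    alignments: List[Alignment] = []
    for name_a, terms_a in onto_a.items():
        for name_b, terms_b in onto_b.items():
            score = jaccard(terms_a, terms_b)
            if score == 1.0:
                alignments.append((name_a, name_b, "equivalent", score))
            elif score >= threshold:
                alignments.append((name_a, name_b, "overlaps", score))
    return alignments


# Hypothetical ontologies of two data sources.
hydro = {"lake": {"water", "body", "inland", "standing"}}
topo = {
    "waterbody": {"water", "body", "inland", "standing"},
    "river": {"water", "body", "inland", "flowing"},
}
mappings = map_ontologies(hydro, topo)
```

A realistic mapping system would replace the term overlap with the linguistic, structural and reasoning-based techniques reviewed by Euzenat and Shvaiko (2007); the sketch only fixes the shape of the inputs and outputs.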
However, static environments, where the set of data sources to be integrated is known in advance and remains fixed, are less likely to be the norm given the increasing pervasiveness of mobile and wireless devices, sensor networks, etc. According to Zafeiropoulos et al. (2009), it is not unrealistic to expect that, in the near future, the pervasiveness of sensor networks will significantly increase, with these sensors producing data that will be accessible over the Web. The ad hoc network is a computing paradigm that enables the rapid, on-the-fly formation and dissolution of short-lived networks; it is formed with a mobile platform composed of nodes that represent autonomous systems (Hafsia 2001). Ad hoc networks require no fixed infrastructure; their nodes self-organize into temporary configurations, often for short-term purposes. The resolution of semantic heterogeneity is not the only concern that must be addressed to achieve semantic interoperability in ad hoc networks.
Firstly, ad hoc networks are likely to include data sources whose semantics are poor. For example, the standards to describe the semantics of sensors (e.g., Sensor Model Language, or SensorML), which play an increasingly critical role in capturing and distributing observations of phenomena in our environment, are currently not sufficient to support semantic interoperability of sensor data (Jirka et al. 2009). Semantic poorness makes differences in the intended meaning of data undetectable (Farrugia 2007), and prevents accurate semantic mappings from being found between the semantic representations of data sources.
Secondly, because ad hoc networks are dynamic, the ontology mapping process must be automated (Keeney et al. 2006). However, as of today, only a few ontology mapping approaches are automated (e.g., Montanelli and Castano 2008; Bakillah and Mostafavi 2010), and even these still require some user input. Also, the challenge in designing an ontology mapping approach that is suitable for ad hoc networks is to strike a balance between the cost of the ontology mapping process and its capacity to process rich and complex semantic structures, in order to preserve semantics and enable the user to correctly interpret shared data.
Another obstacle to semantic interoperability is related to the fact that ad hoc networks are decentralized. In contrast to portals, for example, there is no central server, agent or "authority" where information on the sources available in the network can be accessed, and that can be responsible for identifying relevant query recipients and forwarding the queries submitted by users. Consequently, queries must be forwarded, or "propagated," from node to node in a decentralized manner, i.e., each node that receives the query is responsible for identifying the neighbours to which it will further forward the query. The issue of finding the nodes of a decentralized network that can process a given query has been investigated in query propagation approaches that were mainly targeted at peer-to-peer networks. To select query recipients, these approaches rely on existing mappings between ontologies of peers (Mandreoli et al. 2006) or on ontology mappings that are computed at run-time (Montanelli and Castano 2008). However, this requires ontology mappings to be computed between a significant number of nodes, which can be very costly.
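The decentralized forwarding described above can be sketched as follows. This is a minimal illustration, assuming that each node knows a keyword description of its direct neighbours and forwards a query only to neighbours whose keywords overlap the query; the `Node` class, the time-to-live parameter and the example network are hypothetical and do not reproduce any of the cited approaches.

```python
from typing import List, Set


class Node:
    """A network node with a keyword description and a list of neighbours."""

    def __init__(self, name: str, keywords: Set[str]) -> None:
        self.name = name
        self.keywords = keywords
        self.neighbours: List["Node"] = []

    def select_recipients(self, query: Set[str]) -> List["Node"]:
        """Each node decides locally which neighbours receive the query."""
        return [nb for nb in self.neighbours if nb.keywords & query]

    def receive(self, query: Set[str], ttl: int, visited: Set[str]) -> List[str]:
        """Answer the query if possible, then propagate it further."""
        if ttl < 0 or self.name in visited:
            return []
        visited.add(self.name)
        hits = [self.name] if self.keywords & query else []
        for nb in self.select_recipients(query):
            hits.extend(nb.receive(query, ttl - 1, visited))
        return hits


def link(x: Node, y: Node) -> None:
    """Create a bidirectional link between two nodes."""
    x.neighbours.append(y)
    y.neighbours.append(x)


# Hypothetical network: a - b - c and a - d.
a = Node("a", {"roads"})
b = Node("b", {"hydrography", "lakes"})
c = Node("c", {"hydrography"})
d = Node("d", {"buildings"})
link(a, b)
link(b, c)
link(a, d)

contacted: Set[str] = set()
found = a.receive({"hydrography"}, ttl=3, visited=contacted)
```

In this sketch node `d` is never contacted, which is the point of selective propagation; the cost noted in the text arises when the keyword test is replaced by ontology mappings that must be computed between many pairs of nodes.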
Finally, the dynamicity of the network raises additional challenges. The nodes of ad hoc networks are autonomous, i.e., they are free to move, they can be available or unavailable at any time, and they can leave or enter the network at any time. Therefore, the members of the network and its configuration are not predictable. As a result, semantic interoperability strategies must constantly adapt to the currently available nodes and topology. The concept of "real time" semantic interoperability, in this context, goes beyond the concept of "run-time" semantic interoperability. While the latter refers to the idea that available data can be integrated "on demand," or "on-the-fly" (Keeney et al. 2006), real time semantic interoperability means that a semantically interoperable system is reactive to the changes that occur in the network. According to Kopetz (2011), in a real-time computer system, the correctness of the system behaviour, which is the sequence of outputs of the system, "depends not only on the logical results of the computations, but also on the physical time when these results are produced." This means that a real-time semantically interoperable system is a system where the outputs of the components, such as responses to users' queries, are time-dependent. For example, the results of a query that was submitted in the relatively recent past could be modified by the arrival of a new node in the ad hoc network. In the following, we present a conceptual framework that summarizes the requirements for real time semantic interoperability in ad hoc networks of geospatial data sources.

FRAMEWORK FOR REAL TIME SEMANTIC INTEROPERABILITY IN AD HOC NETWORKS OF GEOSPATIAL DATA SOURCES
In this section, we propose a framework for real time semantic interoperability in ad hoc networks of geospatial data sources that attempts to address the requirements highlighted in Section 2. The idea of the framework is to model real time semantic interoperability in ad hoc networks of geospatial data sources as a set of interrelated states and processes. Firstly, the framework specifies the different semantic states of the ad hoc network. A semantic state can be seen as a layer of semantics over the network. Semantic states are organized into a hierarchy. This means that each semantic state adds a layer of semantics with respect to the previous semantic state, and that the upper semantic states can only be reached by going through the intermediary semantic states.
In parallel, the framework specifies the semantic processes that are required to achieve these semantic states. Each semantic process allows passing from one semantic state to the next semantic state: it takes as input the semantics that are available in the first semantic state, and its output is the additional semantics that are available in the following semantic state. Figure 1 illustrates the framework with the semantic states and processes. In the following, we explain these different semantic states and processes, while in the next section, we discuss the achievements and challenges with respect to this framework. We assume that the ad hoc network is populated with nodes, where each node holds a single source, is autonomous, and can be semantically independent from other nodes.
The lowest state is the ad hoc network of geospatial data sources. At this level, since there are no semantics, it is not possible to achieve semantic interoperability. The second state is the ad hoc network with basic semantics. At this state, geospatial data is described with basic semantics, for example, basic conceptual models of geospatial databases, or a set of keywords or tags describing sensor data. The second semantic state is achieved through the process of basic semantic specification, which could include, for example, user tagging (i.e., a process where users associate keywords with data sets). At this level, meaningful data sharing is still difficult because of poor semantics, and the risk of misinterpretation and misuse of data is high. The third state is the ad hoc network with enriched semantics. At this state, the basic semantics are enriched through a semantic enrichment process with, for example, synonyms from lexicons (Su 2004); dependencies between properties of concepts, or contexts of concepts; or additional spatial or thematic properties (Kavouras et al. 2005). Semantic enrichment is performed through knowledge extraction, which includes a range of techniques such as data mining, clustering, classification, semantic information extraction from texts, sequential pattern mining, association rule mining and social network analysis (Ding and Sundarraj 2007). At the enriched semantics state, the semantics of geospatial data sets are specified with rich and comprehensive ontologies that formalize the meaning of geospatial data. The semantics of different data sets can be compared, and semantic heterogeneities between them can be resolved. In addition, at this state of semantic interoperability, sources have been enriched with a description that helps to identify the data they contain and the context in which these data were created. For example, these descriptions can include the application domain, the functions and tasks to be performed with the data, the geographical coverage of entities, etc. (Wiegand and Garcia 2007).
Since nodes are independent and semantically heterogeneous, each node holds its own enriched semantics. However, if there exist nodes in the network that are semantically dependent (e.g., they share the same ontology), a federating, local broker node for such a group of nodes can hold the common enriched semantics. This local broker node acts as a local access point (gateway) to the other nodes of the group, and permits access to the enriched semantics, which are used in the computation of semantic mappings with these other nodes. However, because the ad hoc network is large, it is not yet fully semantically interoperable: in the absence of a central repository that could be browsed to identify the available data sources, users are still unable to discover the relevant data sources that could fulfil their needs. This will be achievable only in the upper semantic states of the network.
The first step towards the discovery of relevant data sources is the clustering of the network, which is achieved at the state named clustered ad hoc network with enriched semantics. At this state, the network is partitioned into groups of sources that have similar features. For example, it could be partitioned into groups of sources with similar or complementary functions, groups of sources that contain data on the same application domain, etc. The purpose of the clustering is to facilitate the identification of groups of sources that are relevant with respect to a user's query. This semantic state is achieved through the semantic clustering, or semantic grouping (Kantere et al. 2008), process, which consists in mining the network to find groups of similar sources according to selected criteria. Once such a semantic group is formed, a leader node is designated. This leader node holds a description of the group. This description can be a set of attribute-value pairs, describing, for example, the application domain(s) of the group's sources, the intended use of data stored at nodes, etc.
Similarly to the local broker node, this leader node acts as a local access point to the other nodes of the semantic group. Other nodes of the network can access the description of the group by requesting it from the leader node, in order to discover whether the group contains sources that may hold data relevant to their needs. This fourth state in the stack is only a preliminary state to support source discovery in the ad hoc network. Since the network is decentralized, the discovery of relevant sources must be done through query propagation from node to node in the ad hoc network. To support query propagation from source to source, we argued in Section 2 that it is not efficient to rely on semantic mappings between concepts of ontologies at nodes, since it would then be necessary to compute and store a large number of semantic mappings. Therefore, query propagation could be supported more efficiently by semantic mappings between descriptions of sources. This is achieved at the semantic state named source-linked clustered ad hoc network with enriched semantics. This state is qualified as source-linked because semantic mappings are established at the source level. To reach this state, the semantic mapping at source level process must be performed. In this process, a semantic mapping engine is leveraged to automatically detect similar or complementary source descriptions and resolve the semantic heterogeneities between them. Each leader node offers a semantic mapping service, which receives descriptions of sources and performs the matching between the attribute-value pairs of different sources. The resulting semantic mappings are stored at the leader node as well. Although computing and storing semantic mappings at individual nodes would reduce the risk associated with a failure of the leader node, in ad hoc networks it cannot be assumed that every node has sufficient storage and processing capacity to support complex semantic mapping tasks.
In addition, because the network is dynamic, the query propagation approach needs to be reactive to changes in the network (addition or removal of a source). To do so, reasoning techniques are needed to deal with changes and assess how they must be reflected at the semantic level.
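As a rough illustration of a leader node that matches attribute-value descriptions of sources, stores the resulting mappings, and updates them when a source is added or removed, consider the sketch below. The `LeaderNode` class, the exact-match similarity and the threshold are assumptions made for the example; a real semantic mapping engine would resolve heterogeneities between attribute names and values rather than compare them literally.

```python
from typing import Dict, Tuple

Description = Dict[str, str]  # attribute -> value, e.g. {"domain": "hydrology"}


class LeaderNode:
    """Holds source descriptions of a semantic group and their mappings."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.descriptions: Dict[str, Description] = {}
        self.mappings: Dict[Tuple[str, str], float] = {}

    @staticmethod
    def similarity(a: Description, b: Description) -> float:
        """Fraction of attributes on which both descriptions agree."""
        attrs = set(a) | set(b)
        if not attrs:
            return 0.0
        same = sum(1 for k in attrs if k in a and k in b and a[k] == b[k])
        return same / len(attrs)

    def register(self, source_id: str, description: Description) -> None:
        """Source joins: map its description against the known ones."""
        for other_id, other in self.descriptions.items():
            score = self.similarity(description, other)
            if score >= self.threshold:
                self.mappings[(other_id, source_id)] = score
        self.descriptions[source_id] = description

    def unregister(self, source_id: str) -> None:
        """Source leaves: drop its description and every mapping using it."""
        self.descriptions.pop(source_id, None)
        self.mappings = {pair: s for pair, s in self.mappings.items()
                         if source_id not in pair}


leader = LeaderNode()
leader.register("s1", {"domain": "hydrology", "coverage": "Quebec"})
leader.register("s2", {"domain": "hydrology", "coverage": "Quebec",
                       "use": "flood mapping"})
leader.register("s3", {"domain": "transport", "coverage": "Ontario"})
pairs_before = set(leader.mappings)
leader.unregister("s1")
pairs_after = set(leader.mappings)
```

Keeping the mappings at the leader mirrors the trade-off stated above: individual nodes need no mapping capacity, at the price of depending on the leader's availability.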
The highest semantic state of the ad hoc network is the source-and-ontology-linked clustered ad hoc network with enriched semantics. At this state, semantic mappings are established both at the source level and at the ontology level, i.e., between ontology components of the different sources. Semantic mapping at ontology level can be performed on demand, i.e., when relevant sources have been identified through query propagation and the user needs to determine which concept(s) in a source's ontology point to the needed data set. Performing semantic mapping between heterogeneous ontologies is a very complex task because of the wide variety of heterogeneities. Syntactic heterogeneities can be resolved if the different ontologies use the same standard to express semantics. However, resolving structural and semantic heterogeneities is far from straightforward. Structural heterogeneity occurs when a feature is represented with different ontological components; for example, the feature "street" can be represented as a class or as a value of the attribute "road type" (Brodeur 2004). To resolve structural heterogeneities, we need a semantic mapping system that can compare and match heterogeneous types of ontological components, such as the one proposed in Ghidini and Serafini (2006). Semantic heterogeneity occurs when there are differences in meaning. To resolve semantic heterogeneities, a semantic mapping system must combine several strategies, including linguistic techniques, structure-based techniques, techniques based on external resources, and formal matching techniques based on a reasoning engine. In Bakillah and Mostafavi (2010), we have described such a semantic mapping system, in which semantic mappings can additionally be computed from different perspectives, resulting in context-dependent semantic mappings. In addition, in Bakillah and Mostafavi (2009), we have developed a Description Logic-based semantic similarity measure for ad hoc networks. As with semantic mappings between descriptions of sources, semantic mappings between concepts of ontologies are stored at leader nodes and can be accessed to support propagation and translation of queries. At the highest semantic state of the ad hoc network, it is possible to discover relevant sources, resolve semantic heterogeneities, retrieve relevant data sets, and avoid misinterpretation and misuse of shared data.
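The hierarchy of semantic states and the processes that connect them can be summarized in a short sketch. The state and process names follow the text of this section; the Python identifiers and the helper function `processes_to_reach` are illustrative assumptions.

```python
from enum import IntEnum
from typing import List


class SemanticState(IntEnum):
    """Semantic states of the network, ordered from lowest to highest."""
    AD_HOC_NETWORK = 1
    BASIC_SEMANTICS = 2
    ENRICHED_SEMANTICS = 3
    CLUSTERED = 4
    SOURCE_LINKED = 5
    SOURCE_AND_ONTOLOGY_LINKED = 6


# Each process leads from the keyed state to the next state in the hierarchy.
PROCESSES = {
    SemanticState.AD_HOC_NETWORK: "basic semantic specification",
    SemanticState.BASIC_SEMANTICS: "semantic enrichment",
    SemanticState.ENRICHED_SEMANTICS: "semantic clustering",
    SemanticState.CLUSTERED: "semantic mapping at source level",
    SemanticState.SOURCE_LINKED: "semantic mapping at ontology level",
}


def processes_to_reach(current: SemanticState,
                       target: SemanticState) -> List[str]:
    """List the processes to run, in order, to climb from current to target.

    Intermediate states cannot be skipped, so every process between the two
    states is returned.
    """
    if target < current:
        raise ValueError("the hierarchy is only climbed upwards")
    return [PROCESSES[SemanticState(s)] for s in range(current, target)]


steps = processes_to_reach(SemanticState.ENRICHED_SEMANTICS,
                           SemanticState.SOURCE_LINKED)
```

Encoding the states as ordered integers reflects the constraint that an upper state is reachable only through the intermediary states.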

ACHIEVEMENTS, CHALLENGES AND PERSPECTIVES
To achieve the above-described framework, some existing technologies can be leveraged, but there remain challenges that must be addressed and for which solutions are still inadequate or lacking. In this section, we highlight some of these challenges. The first challenges are related to the basic semantic specification process. Traditionally, the basic semantics of geospatial data are specified through the conceptual model of geospatial databases. However, real time data provided by mobile devices and sensor networks does not necessarily follow this traditional schema. We need to automatically annotate the features and phenomena that are captured by these devices with terms from a formal vocabulary (ontology) in order to formally identify their meaning (Bröring et al. 2011). Interesting avenues towards achieving this goal include rule-based strategies, where Semantic Web Rule Language (SWRL) rules can be employed to verify whether a feature respects pre-defined constraints that define a concept (Klien 2007).
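The rule-based annotation idea can be sketched in a few lines. The example below does not use SWRL; plain Python predicates stand in for rules, and the concept names, attribute names and thresholds are hypothetical values chosen only to show how a captured feature is annotated with every concept whose constraints it satisfies.

```python
from typing import Callable, Dict, List

Feature = Dict[str, float]        # observed properties of a captured feature
Rule = Callable[[Feature], bool]  # one constraint of a concept definition

# Hypothetical concept definitions: a concept applies when all rules hold.
RULES: Dict[str, List[Rule]] = {
    "Flood": [
        lambda f: f.get("water_level_m", 0.0) > 2.0,
        lambda f: f.get("duration_h", 0.0) >= 1.0,
    ],
    "HeatWave": [
        lambda f: f.get("air_temp_c", 0.0) >= 35.0,
    ],
}


def annotate(feature: Feature) -> List[str]:
    """Return the concepts whose defining constraints the feature respects."""
    return [concept for concept, constraints in RULES.items()
            if all(rule(feature) for rule in constraints)]


labels = annotate({"water_level_m": 2.4, "duration_h": 3.0})
```

In an ontology-based setting, the dictionary of predicates would be replaced by rules expressed over the vocabulary of the ontology and evaluated by a rule engine.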
Other challenges still need to be addressed regarding the semantic enrichment of geospatial data. First, we have to deal with new types of semantic specifications, for example, user-defined tags and folksonomies. Because they are created by users, the quality of these semantic specifications is likely to vary. This can affect semantic enrichment, because it is difficult to automatically enrich semantics that are already ill-defined. In other words, the efficiency of semantic enrichment also requires that the input semantics meet minimal quality requirements. Therefore, there is a need for a well-defined framework for the creation of these types of user-defined semantics, as well as a framework for the assessment of their quality. Interesting avenues regarding semantic enrichment also include the integration of context information through the principles of context-aware systems (van Kranenburg et al. 2006). Finally, standards for encoding the semantics of sensor data need to be improved in order to better support semantic interoperability. For example, Jirka et al. (2009) explain that SensorML is a generic standard that allows the same information to be specified through different structures, which makes it difficult to process SensorML descriptions automatically (either for semantic enrichment or for semantic mapping).
With respect to semantic clustering, much work has already been done in this area (Giunchiglia and Zaihrayeu 2002; Khambatti et al. 2002; Crespo and Garcia-Molina 2002; Lumineau and Doucet 2004; Kantere et al. 2008). However, research still needs to be done to integrate geospatial aspects into these approaches, since they are mainly targeted at non-geospatial applications, such as generic peer-to-peer networks. Similar considerations can be formulated regarding query propagation approaches (e.g., Montanelli and Castano 2008), which do not integrate geospatial features explicitly. In addition, although a few query propagation approaches have update mechanisms, they are not reactive to changes in the network. With respect to this issue, the integration of the publish-subscribe paradigm into semantic interoperability frameworks is an interesting research perspective.
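To show how the publish-subscribe paradigm could make query results reactive to network changes, the following sketch keeps the result set of a standing query up to date as nodes join or leave. The `Broker` and `StandingQuery` classes, the event names and the use of an application domain as topic are assumptions for illustration; in an ad hoc network the broker role might be played by a leader node.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set

Callback = Callable[[str, str], None]  # (event, node_id) -> None


class Broker:
    """Minimal topic-based publish-subscribe broker."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, event: str, node_id: str) -> None:
        """Notify every subscriber of the topic about a network change."""
        for callback in self._subscribers[topic]:
            callback(event, node_id)


class StandingQuery:
    """A query whose results follow the nodes currently in the network."""

    def __init__(self, broker: Broker, topic: str) -> None:
        self.results: Set[str] = set()
        broker.subscribe(topic, self.on_event)

    def on_event(self, event: str, node_id: str) -> None:
        if event == "joined":
            self.results.add(node_id)
        elif event == "left":
            self.results.discard(node_id)


broker = Broker()
query = StandingQuery(broker, "hydrology")
broker.publish("hydrology", "joined", "sensor-17")
broker.publish("transport", "joined", "gps-3")
after_join = set(query.results)
broker.publish("hydrology", "left", "sensor-17")
after_leave = set(query.results)
```

The result of the same query differs before and after the node leaves, which is the time dependence that distinguishes real time from run-time semantic interoperability in Section 2.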
Finally, with respect to semantic mapping, whether it is performed between descriptions of sources or between the components of their ontologies, work still needs to be done to make semantic mapping systems truly automatic, as well as to improve precision and recall in less controlled and unpredictable environments. To do so, the challenges that remain to be addressed include the following:

Develop appropriate external resources (lexicons, global ontologies, common vocabularies and thesauri). Semantic mapping is often based on the assumption that a shared vocabulary is available (Euzenat and Shvaiko 2007). This shared vocabulary is meant to ensure that mapping systems are able to compute semantic mappings across different domains and heterogeneous communities. Different external resources can be used depending on the targeted application domain. For example, the Semantic Web for Earth and Environmental Terminology (SWEET) ontology developed by NASA can be used to reconcile the semantics of sensor data on environmental phenomena. However, the development of appropriate terminological resources that are accepted across different communities and domains is a huge challenge.
Improve reasoning capabilities to enable the identification of the appropriate external resources to be used for a given semantic mapping task. Given the difficulty of developing terminological resources that are accepted across communities, the development of a global and universal ontology does not appear to be possible. Consequently, we must rather explore the issue of which external resource to use in which situation (e.g., depending on the application domain). To do so, semantic mapping systems must be able to identify the relevant characteristics of the compared semantic descriptions and use reasoning engines to infer which external resource(s) must be accessed to resolve the mapping problem. In addition, semantic mapping systems must be deployed to determine semantic relations between the different reference ontologies, in order to support cross-domain reasoning.
Improve reasoning capabilities with respect to spatial and temporal aspects of concepts. While spatiotemporal reasoning languages and methods exist to reason with spatiotemporal entities, these still need to be fully integrated into semantic mapping systems to improve the capability to discover and resolve heterogeneities with respect to spatiotemporal features at the conceptual level.
Integrate more advanced natural language techniques into semantic mapping systems. Firstly, natural language techniques, if integrated into semantic mapping systems, can help to improve the precision and recall of these systems (i.e., the ability to retrieve only accurate mappings and all relevant mappings, respectively). Indeed, most difficulties in semantic mapping are caused by a lack of formal semantics and the use of natural language in semantic descriptions; as of now, existing semantic mapping systems are not able to process and exploit such non-formalized semantics. Secondly, at the moment, semantic queries are directly formulated in formal languages (according to the predefined vocabulary), in order to be easily processed by semantic mapping engines. However, users should be able to formulate their queries in natural language, not only according to the predefined formalized terminology. Current semantic mapping systems do not have sufficient reasoning capabilities to process complex natural language expressions, which significantly reduces the ability of users to express their information needs.
Overall, misinterpretation and misuse of geospatial data can be related to (1) missing, insufficient, or ambiguous descriptions of semantics; and (2) the inability of systems to process, exploit, propagate and preserve semantics. The effects of misinterpretation and misuse of geospatial data include unsound decision making. Consequently, the development of a comprehensive real time semantic interoperability framework is fundamental to ensure that human knowledge can be extracted from data.

CONCLUSION
This paper has addressed the issue of real time semantic interoperability in ad hoc networks of geospatial data sources. We have proposed a framework that is composed of six main semantic states and five main semantic processes. Above the initial ad hoc network without semantics, the semantic states include basic semantics, enriched semantics, clustered network, source-linked clustered network and source-and-ontology-linked clustered network. The study of existing work demonstrates that the semantic processes that would support the achievement of these semantic states are not yet sufficient. In particular, research still needs to be done to integrate and deal with new types of semantic descriptions, arising for example from the fact that data users are also becoming data producers, and from sensor data descriptions. Also, we discussed the impact of these new types of semantic descriptions on the semantic enrichment of geospatial data. The analysis presented in this paper also shows that the concept of "real time semantic interoperability" is at a very early stage of development, since existing semantic interoperability approaches are not reactive to changes in the network. Finally, we have highlighted the limitations of semantic mapping techniques with respect to the presented framework. Overall, future developments in line with the presented framework will also contribute to the development of the Geospatial Semantic Web.

Figure 1. Framework for real time semantic interoperability in ad hoc networks of geospatial data sources

Strategies combined by a semantic mapping system to resolve semantic heterogeneities:
• Linguistic techniques to compare the terminologies being used (Giunchiglia et al. 2004; Euzenat and Valtchev 2004)
• Structure-based techniques, which use the structure of the compared ontologies (taxonomic relations, constraints, etc.) to discover matches (Giunchiglia et al. 2004; Hu and Qu 2008; Bakillah and Mostafavi 2010)
• Techniques based on external resources, using for example global or domain ontologies and thesauri (Massmann et al. 2006; Bakillah and Mostafavi 2010)
• Formal matching techniques based on a reasoning engine (Giunchiglia et al. 2004; Bakillah and Mostafavi 2010)