DECENTRALIZED ORCHESTRATION OF COMPOSITE OGC WEB PROCESSING SERVICES IN THE CLOUD

Current web-based GIS or RS applications generally rely on a centralized structure, which has inherent drawbacks such as single points of failure, network congestion, and data inconsistency. These disadvantages of traditional GISs need to be resolved for new applications on the Internet or Web. Decentralized orchestration offers performance improvements in the form of higher throughput, better scalability, and lower response time. This paper investigates build-time and runtime issues related to the decentralized orchestration of composite geospatial processing services based on the OGC WPS standard specification. A dust storm detection case study was used to evaluate the proposed method, and the experimental results indicate that the method produces high-quality solutions to the geospatial processing service composition problem at a low communication cost.


INTRODUCTION
Advances in computing science and Web technology provide the geoscience community with continuously expanding resources for geospatial data collection and processing. Geoscience theories and technologies have improved over the last three decades, and Geographic Information Science (GIS) has evolved from the traditional desktop to a network-based, multi-tier, service-oriented architecture (SOA) (David, 2005). Web services are geographically dispersed and are used by a wide array of companies and government organizations because of their reusability, flexibility, and platform independence (Shen et al., 2007). An increasing number of geospatial resources and processing functions are available in the form of online Web services (Castronova et al., 2013; Hofer 2014). One of the most important benefits of web service technologies is the composition of web services to create value-added services (Tong et al., 2011). Web service composition refers to the mechanisms that promote the collaboration and interoperability of individual web services to create software applications whose functionality results from integrating the functionalities of the participating services, which has the potential to reduce development time and effort for new applications (Rao and Su, 2004). Chaining interoperable geospatial models and data resources is particularly interesting because such a chain can potentially answer more questions than the individual models alone, allowing users to achieve complex tasks in a variety of contexts (Dubois, 2013). Web-based distributed geospatial computing and large networks of collaborating applications are the next step in the evolution of web-based geoprocessing platforms (Kiehle et al., 2007).
Many attempts to integrate OGC services into service-based geospatial workflows have been proposed within the geospatial and environmental domains (Granell et al., 2010; Goodall et al., 2011; De Jesus et al., 2012; Mullerm et al., 2013; Wang et al., 2013). However, using web services to build an asynchronous geospatial workflow is still a challenge. First, web services are traditionally composed in a centralized manner, whereby an orchestrator component running on a single server is responsible for the execution of all process instances, while all relevant data are maintained at a single location. These centralized web service composition approaches suffer from performance bottlenecks (high communication burden and high computational load). Furthermore, both the geospatial sciences and the cloud computing environment are spatiotemporally intensive. Earth science phenomena are complex processes, and Earth science applications often take a variety of data as input with a long and complex workflow. It therefore becomes a critical challenge to deliver such complex applications to the cloud as a transparent service that supports massive numbers of users.
Thus, a new geospatial model service composition method is needed to address these challenges. In this paper, a Hypercube Geospatial Service Framework (HyperCGSF) is proposed to address the handling of large data volumes and the dynamic, complex interactions among geospatial model services. This article is organized as follows: Section 1 introduces the background and related literature of this study. Section 2 introduces the study area and data used. Section 3 discusses the design of the proposed geospatial model service composition approach. The model validation and case study are introduced in Section 4. The conclusions and outlook are summarized in Section 5.

Geospatial process service and service composition
OGC has issued a series of service specifications ranging from data services and processing services to catalogue services, making great progress on the use of Web services to publish and access geospatial resources. For data processing over the Internet, the Web Processing Service (WPS) specification was released in 2005 by the Open Geospatial Consortium (OGC) (Schut, 2007) in order to provide spatial processes through a standardized service interface based on the Hypertext Transfer Protocol (HTTP) (Foerster and Stoter, 2006). Ideally, a Cloud-enabled system in which any geospatial process can be exposed and executed through a WPS implementation, and in which the spatial data involved conform to the standards of a Spatial Data Infrastructure (SDI), would satisfy the on-demand provision of valuable geoinformation. In the first OGC-compliant Cloud service ever presented (Baranski et al., 2009), the scalability of Cloud Computing was evaluated by comparing a WPS implemented in the Cloud with one implemented locally. Significant effort is also being spent on SDI integration with Cloud Computing: Schäffer et al. (2010) give a thorough analysis of the basic concepts along with the design and test of a Cloud-enabled SDI, while Baranski et al. (2011) made use of a hybrid Cloud by combining local public IT-infrastructure in order to meet Quality of Service requirements set by the INSPIRE directive (European Commission, 2007). The term Composite-WPS, denoting a WPS that invokes all other services involved, was introduced for a bomb threat scenario (Stollberg and Zipf, 2007).
However, current web-based GIS or RS applications generally rely on a centralized structure, where the geospatial data are stored on a single server. To obtain the required geospatial information, it is necessary to collect data and processing resources from multiple service nodes spread over the network, compose these services into a workflow, and execute the workflow on a centralized controller. This approach has inherent drawbacks such as single points of failure, network congestion, and data inconsistency. These disadvantages of traditional GISs need to be resolved for new applications on the Internet or Web.
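To make the standardized HTTP interface concrete, the following minimal sketch builds WPS 1.0.0 key-value-pair GET requests for the DescribeProcess and Execute operations. The endpoint URL, the process identifier `dust:BTD`, and the input names are hypothetical placeholders and do not come from this study; only the generic KVP parameters (service, version, request, identifier, datainputs) follow the WPS 1.0.0 convention.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def wps_url(endpoint, request, identifier=None, inputs=None):
    """Build a WPS 1.0.0 key-value-pair (KVP) GET URL."""
    params = {"service": "WPS", "version": "1.0.0", "request": request}
    if identifier is not None:
        params["identifier"] = identifier
    if inputs:
        # Execute inputs are passed as semicolon-separated name=value pairs.
        params["datainputs"] = ";".join(f"{k}={v}" for k, v in inputs.items())
    return endpoint + "?" + urlencode(params)


ENDPOINT = "http://example.org/geoserver/ows"  # hypothetical endpoint
PROCESS = "dust:BTD"                           # hypothetical process identifier

describe_url = wps_url(ENDPOINT, "DescribeProcess", PROCESS)
execute_url = wps_url(ENDPOINT, "Execute", PROCESS,
                      {"threshold": "-0.5", "band": "31"})  # assumed inputs
print(describe_url)
print(execute_url)
```

A client would normally call DescribeProcess first to learn the input and output names of a process before issuing Execute.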

P2P-based technologies and their application in Geospatial science
P2P networking is a paradigm in which a set of user machines at the edge of the Internet communicate with one another to share resources without the help of any central authority (Sukumar, 2014). In a P2P network, geographical boundaries become irrelevant, and the absence of any central authority promises spontaneous growth as well as freedom from censorship. Peers include friends, collaborators, and competitors, and resource sharing has to be implemented through decentralized protocols. Scalability is an integral part of this concept, which means that no P2P system is worth looking at unless it scales to millions of machines around the globe. Regardless of the legal ramifications or ethical issues, P2P has led users to a new form of freedom in collaborative resource sharing. One typical application of P2P networks is the generation of genomic data about newly discovered proteins through the collaboration of hundreds of small laboratories all over the world. In addition, Facebook and Twitter have also started using P2P technologies for content distribution.
Several studies have applied P2P technologies to construct distributed GIS and RS systems. Guan et al. (2004) explored techniques for enabling GIS services in a P2P environment to overcome the inherent shortcomings of current GISs and presented an implementation called BP-GService. Puppin et al. (2005) applied the Globus package to develop a grid information service based on a P2P network, which offers fast propagation of information and has high scalability and reliability following the OGSA standard. Lee et al. (2006) proposed a method of applying P2P networks to a collaborative GIS environment, particularly targeting exploratory spatial data analysis for small-group brainstorming. Gianluigi et al. (2010) proposed a grid portal for solving geoscience problems using distributed knowledge discovery services by integrating workflow technologies with data mining resources and a portal framework in a unique work environment called MOSÈ.

THE HYPERCUBE GEOSPATIAL SERVICE FRAMEWORK
The HyperCGSF consists of a multifunctional geospatial service provider agent model, an underlying networking topology called 'hypercube', and a set of distributed algorithms. Together, these support the efficient publishing, sharing, managing, and accessing of geospatial service resources (data and processes) distributed over the cloud, as well as the decentralized orchestration of geospatial processing services with security, load balancing, and fault tolerance.

The Geospatial Service Provider Agent (GeoSPA) Model
GeoSPA was designed as a geospatial services hub through which geospatial model developers and data producers can deploy standards-based geospatial services onto a cloud computing system. GeoSPA supplies a series of algorithms for managing and discovering geospatial services, as well as for orchestrating the execution of service compositions. To achieve these functionalities, three GeoSPA service models were defined: the Earth Observation (EO) data service model, the processing service model, and the computing service model. These three models make GeoSPA a one-stop solution for building an SDI in a cloud computing environment. Figure 1 illustrates the internal structure of GeoSPA.
Figure 1. Internal structure of GeoSPA
As shown in Figure 1, several functional components are embedded in GeoSPA. First, the GeoSPA Request Listener acts as the entry point of GeoSPA for the incoming requests sent by service consumers or other service agents. GeoSPA Request Handlers process the incoming requests concurrently, and each handler corresponds to one user-specified request. Second, a GeoSPA is equipped with a Knowledge-base, through which it can determine which service model needs to be used to handle the user-specified request. Furthermore, the embedded GeoServer component is responsible for offering the actual web-based geospatial processing services based on the OGC WPS specification. Geoscience problems are often complex, and several geospatial services need to cooperate and be coordinated to achieve complex tasks. To support this functionality, each GeoSPA is equipped with an Agent Communication Module, which is responsible for communicating with other GeoSPAs to exchange information.
Finally, each GeoSPA embeds a Node Database, which is responsible for storing all information needed for service agent communication and geospatial workflow execution. Based on the hypercube network topology introduced above and the GeoSPA model introduced in Section 3.1, this research proposes the Hypercube Geospatial Service Framework (HyperCGSF) as a scalable and elastic geospatial service framework. The HyperCGSF contains a scalable architecture and a set of distributed algorithms to enable the efficient discovery, composition, and execution of geospatial processes persisted by multiple GeoSPAs. Figure 2 illustrates the architecture of HyperCGSF.

Figure 2. Illustration of Hypercube topology with b=2
As shown in Figure 2, central to this framework is the binary hypercube topology for organizing an arbitrary number of available GeoSPA nodes (represented by circles in Figure 2). Because of this topology, there is no centralized execution engine in HyperCGSF, and the output of a geospatial process (GP, represented by squares in Figure 2) is transferred directly among the distributed service nodes. A GP can migrate between different HyperCGSF nodes for execution through the GeoSPA processing service and the GeoSPA computing service. One of the most important functionalities of HyperCGSF is that it can map the static job workflow specification to the dynamic cloud computing resources on the fly for distributed job workflow execution, and cooperate with the distributed GeoSPAs to achieve complex geospatial processing tasks.

Decentralized Execution of Geospatial Service Composition
A service composition consists of a set of geospatial processes together with their execution sequence. Figure 3 illustrates the procedure of the decentralized execution of a geospatial service composition. In this case, there are five processes in the service composition. GeoSPA 101 plays the role of geospatial process provider agent, while GeoSPA 100, GeoSPA 110, GeoSPA 010, GeoSPA 011, and GeoSPA 111 play the role of process worker agents (Figure 3(a)). At the beginning of the service composition execution, every worker agent is assigned a geospatial process to perform. The worker agents request the migration of their geospatial process from GeoSPA 101 (Figure 3(b)). Upon receipt of a migration request, GeoSPA 101 creates a new instance of the requested geospatial process, which then migrates to the worker agent for execution. Every worker agent then executes its geospatial process and cooperates with the other worker agents to exchange process results. If a process execution completes successfully, the worker agent transmits the desired output variable to the process's immediate successor (Figure 3(c)). For example, in Figure 3(c), GeoSPA 100 performs p1 and returns the desired output parameter y11 to its immediate successor GeoSPA 010, where p3 is executed using y11 as one of its input parameters.
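The hand-off between worker agents described above can be sketched as a small simulation. This is not the HyperCGSF implementation: the `WorkerAgent` class, the `run_composition` driver, and the arithmetic stand-in functions are hypothetical, and only the p1 to p3 link (GeoSPA 100 to GeoSPA 010) is taken from the text. The remaining links are assumed for illustration.

```python
from collections import deque


class WorkerAgent:
    """Worker node holding one migrated geospatial process (illustrative only)."""

    def __init__(self, node_id, process, func, predecessors, successors):
        self.node_id = node_id            # hypercube label, e.g. "100"
        self.process = process            # process name, e.g. "p1"
        self.func = func                  # stand-in for the real WPS process
        self.predecessors = predecessors  # processes whose outputs are inputs here
        self.successors = successors      # node ids that consume this output
        self.inbox = {}

    def receive(self, sender_process, value):
        """Store an incoming result; return True once all inputs have arrived."""
        self.inbox[sender_process] = value
        return len(self.inbox) == len(self.predecessors)

    def execute(self):
        return self.func(*(self.inbox[p] for p in self.predecessors))


def run_composition(workers):
    """Simulate the message flow; returns (results, list of direct transfers)."""
    by_id = {w.node_id: w for w in workers}
    ready = deque(w for w in workers if not w.predecessors)
    results, transfers = {}, []
    while ready:
        worker = ready.popleft()
        output = worker.execute()
        results[worker.process] = output
        for succ_id in worker.successors:
            transfers.append((worker.node_id, succ_id))  # peer-to-peer, no hub
            if by_id[succ_id].receive(worker.process, output):
                ready.append(by_id[succ_id])
    return results, transfers


# Only the p1 -> p3 link (GeoSPA 100 -> GeoSPA 010) comes from the text;
# the remaining links and the arithmetic stand-ins are assumptions.
workers = [
    WorkerAgent("100", "p1", lambda: 1, [], ["010"]),
    WorkerAgent("110", "p2", lambda: 2, [], ["010"]),
    WorkerAgent("010", "p3", lambda a, b: a + b, ["p1", "p2"], ["011"]),
    WorkerAgent("011", "p4", lambda x: x * 10, ["p3"], ["111"]),
    WorkerAgent("111", "p5", lambda x: x + 1, ["p4"], []),
]
results, transfers = run_composition(workers)
print(results["p5"], transfers)
```

The single `run_composition` loop exists only to make the simulation runnable in one process. In a deployed system each agent would run its own receive-and-execute loop, and every entry in `transfers` would be a direct message between two nodes.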

The Integrated Dust Storm Detection Model
An integrated dust storm detection model (IDDM) was designed in this study as a case study to evaluate the efficiency of the HyperCGSF. Dust storms are known to have adverse effects on human health and a significant impact on weather, air quality, the hydrological cycle, and ecosystems. Five models were involved in the IDDM: (1) The Reverse Absorption Technique (RAT), which uses the Brightness Temperature Difference (BTD) of two or more wavelengths to retrieve the dust storm region (Ackerman 1997; Zhao et al., 2008). (2) The Infrared Difference Dust Index (IDDI) model developed by Legrand (2001) to detect the presence of desert dust over Africa. (3) The Radiative Transfer Model (RTM), which has been widely used to retrieve the aerosol optical thickness (AOT), effective radius (Reff), and altitude of the dust layer (Shao et al., 2006) by means of look-up table (LUT) based inversion calculations. (4) The land surface temperature (LST) model, which is based on a reference image at the 11 μm band synthesized from the previous two weeks' clear-sky 11 μm observations of the same study area (Zhang et al., 2006). (5) A 72-hour forward trajectory analysis performed using the NOAA HYSPLIT model with inputs from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) global reanalysis meteorological data. Figure 4 illustrates the workflow of the IDDM.

Experiment Environment
Performance tests were conducted to evaluate the potential computational costs introduced by the HyperCGSF. A prototype system was implemented using the Google Compute Engine (hereafter GCE). GCE is an infrastructure service provided as part of the Google Cloud Platform. GCE is made up of three major components: virtual machines, persistent disks, and networks. It is available at several Google data centers worldwide and is provided exclusively on an on-demand basis. GCE provides worldwide Cloud services, such as IaaS, PaaS, and SaaS. Five GCE virtual machine (VM) instances (instance-1 to 8) were purchased in this research, and the IDDM models introduced in Section 4.1 were deployed on them. The five Windows Server 2008 Cloud VMs were created from one Cloud instance image. Each VM has one 2.50 GHz virtual CPU, 3.75 GB of RAM, a 50 GB disk, and a bandwidth of 4 Gb/s.

Response Time for Varying Request
To evaluate the efficiency of the algorithms in the presence of many simultaneous accesses, the average execution time of the IDDM workflow was recorded and compared using both the HyperCGSF and the traditional BPEL-based WPS service composition (BPEL-WPS) approach (Meng et al., 2009). The objective of this experiment was to evaluate how the average execution time varies with increasing domain size and request rate. The request rate is the number of incoming service composition requests per minute. Figure 6 shows the experimental results for request rates from 1 to 60 over a geographical scope of 10°×10° (Figure 6). Several conclusions can be drawn from Figure 6. First, the average execution time of both BPEL-WPS and HyperCGSF increases dramatically with the number of concurrent requests. For example, the response time of the two methods increases from several minutes to approximately one hour when the number of requests per minute increases from 1 to 60. This is because, before model execution, the service agent needs to read large volumes of geospatial data from remote sites as input parameters.
Second, the response time of HyperCGSF is lower than that of the traditional BPEL-WPS approach for every request rate, and the execution time of HyperCGSF also grows more slowly than that of BPEL-WPS. This result is reasonable. The BPEL-WPS approach is centralized: all interaction and data exchange are conducted through the orchestrator, or workflow execution engine. The geospatial processes can generate a lot of data that is irrelevant to the composite service, yet this data is transferred to the coordinator node, where it is discarded, thereby putting an unnecessary load on the network. In contrast, the HyperCGSF applies a decentralized architecture in which the service agents communicate directly with each other to exchange processing results on demand.
One of the most advanced features of HyperCGSF is that it supports the migration of geospatial processes among GeoSPAs. This feature is extremely useful for geoscience because geospatial data is typically large in volume and distributed across remote sites. Considering that geoscience applications often need to process large volumes of geospatial data, transferring the geospatial processes rather than the geospatial data over a cloud computing environment can dramatically decrease the volume of data transmission and increase computing efficiency. Several studies have shown the advantages of migrating the service agent in geospatial model services (Tan et al., 2015). However, security issues must be taken into consideration before migrating a geospatial process from one GeoSPA node to another.
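The network-load argument can be illustrated with a back-of-envelope model. The sketch below is not a measurement from the experiment: all sizes are made-up illustrative values in MB, and the function names are hypothetical. It assumes that a centralized engine receives the full output of every process and forwards only the needed part to each successor, whereas decentralized agents send only the needed part directly and return the final result to the requester.

```python
def centralized_volume(full_output_mb, needed_mb):
    """Traffic when every output passes through a central orchestrator."""
    upload = sum(full_output_mb.values())   # full outputs sent to the engine
    forward = sum(needed_mb.values())       # needed parts forwarded onward
    return upload + forward


def decentralized_volume(needed_mb, final_result_mb):
    """Traffic when agents exchange only the needed results directly."""
    return sum(needed_mb.values()) + final_result_mb


# Illustrative sizes only (MB); they are assumptions, not experimental data.
full_output_mb = {"p1": 500, "p2": 300, "p3": 200, "p4": 100, "p5": 10}
needed_mb = {("p1", "p3"): 50, ("p2", "p3"): 30,
             ("p3", "p4"): 200, ("p4", "p5"): 100}

central = centralized_volume(full_output_mb, needed_mb)
decentral = decentralized_volume(needed_mb, final_result_mb=full_output_mb["p5"])
print(central, decentral)
```

Under these assumed sizes the centralized route moves 1490 MB against 390 MB for the direct route. The gap comes entirely from outputs that the orchestrator receives but no successor needs.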

CONCLUSION AND FUTURE WORK
This paper presents a decentralized P2P-based geospatial model service composition framework (HyperCGSF) for sharing and operating multiple geospatial models over the Internet. Service composition provides a promising computing paradigm for automatic model service composition. Based on the OGC WPS standard, the HyperCGSF was developed to provide an explicit service and an interactive interface for sharing and accessing geospatial models through model services.
Open standards help reduce the interoperability problems that may be encountered when using closed standards, such as commercial proprietary standards. The HyperCGSF developed in this study handles the issues related to establishing the geospatial model workflow, allowing modellers to implement the programming interface without directly developing model services and therefore to focus on implementing the model algorithm. The dust storm detection experiment demonstrates the feasibility, efficiency, and effectiveness of the proposed framework for delivering geospatial model services.
Future work will focus on the following aspects: (i) improving the mobile capability of the agent in terms of movement and communication, while giving full consideration to security issues; and (ii) investigating the High-performance Computing (HPC) capability provided by Cloud computing platforms to develop high-performance agent-based geospatial service chains.
Hypercube-based Network Topology
Schlosser et al. (2002) proposed a network topology called 'hypercube' to manage the peers in a P2P network. The hypercube is one of the most important network structures: it is regular and symmetric, has a short diameter, and has good fault-tolerance properties. A complete hypercube graph is characterized by its base $b$, the number of nodes in each dimension, and by $d$, the number of dimensions spanned by the cube. Figure 2 depicts one example of a hypercube topology drawn in 3D with base $b=2$. Following Schlosser et al., essentially every node can act as the root node of a tree that spans all nodes in the hypercube. A complete hypercube topology has $N = b^{L_{max}+1}$ nodes and a diameter $\Delta = \log_b N$, where $L_{max}+1$ is the number of dimensions spanned by the cube. Each node in the hypercube has $(b-1)\cdot(L_{max}+1)$ neighbours.
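The node count, neighbour count, and diameter stated above can be checked on a small instance. The following sketch assumes that nodes are labelled by strings of base-$b$ digits, one digit per dimension, and that two nodes are neighbours when their labels differ in exactly one digit. The function names are hypothetical, and the single-character labels limit the sketch to $b \le 10$.

```python
from itertools import product


def hypercube_nodes(b, dims):
    """All node labels of a complete base-b hypercube with `dims` dimensions."""
    return ["".join(map(str, digits)) for digits in product(range(b), repeat=dims)]


def neighbours(label, b):
    """Labels that differ from `label` in exactly one digit."""
    result = []
    for i, ch in enumerate(label):
        for d in range(b):
            if d != int(ch):
                result.append(label[:i] + str(d) + label[i + 1:])
    return result


def hop_distance(a, c):
    """Minimum number of hops between two nodes (count of differing digits)."""
    return sum(x != y for x, y in zip(a, c))


b, dims = 2, 3                      # dims plays the role of L_max + 1
nodes = hypercube_nodes(b, dims)
diameter = max(hop_distance(u, v) for u in nodes for v in nodes)
print(len(nodes), neighbours("101", b), diameter)
```

For $b=2$ and three dimensions this yields $2^3 = 8$ nodes, $(2-1)\cdot 3 = 3$ neighbours per node, and a diameter of $\log_2 8 = 3$, in agreement with the formulas above.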

Figure 4. Workflow of the integrated dust storm detection model

Figure 6: Comparison of average process execution time using different service composition methods for the IDDM.