A VISUAL ANALYTICS OF MOVEMENT DATA OF A WASTE COLLECTION SERVICE: A TOOL FOR SMART CITIES

Solid waste management is an important urban issue to be addressed in every city. In the smart city context, waste collection generates massive amounts of movement data, provided by satellite tracking technologies and sensors on waste collection equipment. To take advantage of this opportunity, decision makers need an analytical tool suited to the waste management context, able to visualize complex data and to handle the different formats in which the data is stored. The aim of this paper is to evaluate the potential of an interactive data analysis tool, based on R and R-Shiny, to better understand the particularities of a waste collection service and how it relates to the local city context. The User-centered Analysis-Task driven model (AVIMEU) is presented. The model is organized into seven components: database load, classification panel, multivariate analysis, concurrency, origin-destination, points of interest and itinerary. The model was implemented as a test case for the waste collection service of the city of Pasto in southwestern Colombia. The results show that the visual analysis approach is promising and should be further enhanced. The analyses are designed to provide practical information to the agents or experts of the service. The model is available at https://github.com/MerariFonseca/AVIMEU-visualanalytics-for-movement-data-in-R.


Urban movement data for waste collection
Municipal solid waste has become a fast-growing problem. It is also one of the main challenges for smart cities, because this public service is essential to improving the citizens' quality of life. Nevertheless, few studies focus on how to improve waste management in a data-driven smart city context (Esmaeilian et al., 2018).
Solid waste management, when coupled with satellite tracking technologies and sensors on waste collection equipment, generates a significant amount of data. However, collecting, preparing and analyzing such data is time-consuming, and specialized knowledge and new tools are required to generate relevant information for decision makers.
Visual analytics is an alternative for understanding urban phenomena through data. A considerable number of visual analytics proposals have been directed towards the analysis of transport and traffic problems. However, these interactive visual methods can also be applied to study other types of urban phenomena and services.

Problem definition and objectives
The development of user-friendly data-driven tools for exploring and analyzing movement data is a current need for local governments and companies that provide public services, such as solid waste management or urban mobility services.
In the case of waste collection services, the local conditions of each urban area and the complexity of the service mean that operators must modify the service based on field experience or through on-site adjustments. In this situation, tools that extract patterns and phenomena from data would make a relevant contribution to service planning, in particular by helping to adapt good operational practices to the specific characteristics of each urban area. The novel contribution of this work is its data-based strategy for analyzing the interaction between the waste collection service and the urban space. The user-centered, task-driven approach used to achieve useful interactivity is central to this work. This strategy could give decision makers more comprehensive information about the service and the local context, in this case through a visual analysis of a large amount of data.
There are different proposals for carrying out visual analytics of movement data. However, most of them were created for a specific data source and context, which limits their implementation in other scenarios and case studies. There is therefore an interest in developing methods and tools that offer greater flexibility and thus increase the likelihood of implementation and acceptance by end users.
The aim of this paper is to evaluate the potential of an interactive data analysis tool, based on R and R-Shiny, to better understand the particularities of a waste collection service and how it relates to the local context of the city. Specifically, the work is guided by the following questions. Could visual analysis of movement data provide a tool flexible enough to handle different data structures and formats, and thus help to better understand the operation of an urban waste collection service? Does the visual analytics approach, applied to the movement data of an urban waste collection service, make it possible to spatially locate where the service is provided in an unusual way? Could such an approach help to explore relationships between abnormal waste collection behavior and the urban structure?

Movement data
The notion of movement is associated with the change in the physical position of an object relative to a frame of reference, in this case a geographical space. The path formed by the movement of the object over a given time is called a trajectory (Andrienko et al., 2013). For simplicity, trajectories are represented by finite sequences of time-referenced locations. According to Güting and Schneider (2005), there are two types of spatio-temporal data: objects that vary discretely in space and objects whose variation is continuous. The definition of movement data in this study applies only to the second case, in which the spatial change is continuous.
Movement data (or space-time data) have been studied by several authors (Trajcevski et al., 2006). Peuquet (1994) proposed understanding space-time data through three aspects: where, what and when. This approach is useful in practice because it allows the data to be structured for further analysis. In addition, space-time data are associated with attributes whose analysis is central to understanding the phenomenon being represented.
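The where/what/when structure and the idea of a trajectory as a finite, time-ordered sequence of locations can be sketched as follows. This is a minimal illustration, not the AVIMEU data model. It assumes each fix is a timestamp plus WGS84 latitude and longitude, and the names `Fix`, `Trajectory`, `object_id` and `attributes` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Fix:
    """One time-referenced location: the 'when' and the 'where'."""
    time: datetime
    lat: float
    lon: float


@dataclass
class Trajectory:
    """A finite sequence of fixes for one moving object (the 'what')."""
    object_id: str
    fixes: list[Fix] = field(default_factory=list)
    attributes: dict = field(default_factory=dict)  # e.g. truck capacity (hypothetical)

    def add(self, fix: Fix) -> None:
        """Append a fix and keep the sequence ordered in time."""
        self.fixes.append(fix)
        self.fixes.sort(key=lambda f: f.time)

    def duration_s(self) -> float:
        """Elapsed time between the first and last fix, in seconds."""
        if len(self.fixes) < 2:
            return 0.0
        return (self.fixes[-1].time - self.fixes[0].time).total_seconds()
```

Keeping the attributes in a separate dictionary mirrors the point above that attributes are analyzed alongside, but distinct from, the space-time core.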
For the purposes of this study, it is assumed that the movement data required as input for a visual analytics approach fall into two categories, proposed by Kong et al. (2018).
First, explicit data come from sources that directly provide the time and location of the moving entity. These data describe the path of an element through points sampled by GPS at defined time intervals.
The second category, implicit data, is obtained from sources such as signals, sensors or networks, where the moving object is not necessarily the direct source of information. Fixed points collect the data in specific areas. In this case, only the two spatial points at the beginning and end of the moving object's path are recorded.
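The difference between the two categories can be illustrated by reducing an explicit GPS path to what an implicit source would record, namely only the start and end points. This is a hedged sketch: the tuple layout `(time, lat, lon)` and the output keys are assumptions made for the example.

```python
from datetime import datetime

# Explicit record: the full GPS path of one trip as (time, lat, lon) tuples
Explicit = list[tuple[datetime, float, float]]


def to_implicit(trip_id: str, path: Explicit) -> dict:
    """Keep only what an implicit source would record: origin and destination."""
    if not path:
        raise ValueError("empty path")
    ordered = sorted(path, key=lambda p: p[0])  # order fixes by timestamp
    (t0, lat0, lon0), (t1, lat1, lon1) = ordered[0], ordered[-1]
    return {
        "trip": trip_id,
        "origin": (lat0, lon0),
        "destination": (lat1, lon1),
        "t_start": t0,
        "t_end": t1,
    }
```

The conversion only works in this direction. Once the intermediate fixes are discarded, the path cannot be reconstructed, which is why explicit and implicit inputs support different analyses.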

Movement data mining
According to Dodge et al. (2016), the study of space-time data should prioritize the analysis of trajectories. For the purpose of this work, data mining is therefore a process aimed at extracting useful information through descriptive and predictive analysis of movement data series. Descriptive analysis characterizes the data by detecting structure. Predictive analysis then makes it possible to estimate unobserved variables from observed ones, as well as to make estimates for the future.
Data mining of trajectories has relied on clustering and classification methods, followed by pattern identification with aggregate calculations (Mazimpaka and Timpf, 2016). Examples include the k-means algorithm applied to spatial data and ST-DBSCAN, a spatio-temporal extension of DBSCAN, whose density-based notion of clusters is designed to discover clusters of arbitrary shape (Ester et al., 1996).
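The density-based idea behind ST-DBSCAN can be sketched as DBSCAN with a neighbourhood that requires closeness in both space and time. This is an illustrative O(n²) version, not the implementation used in AVIMEU. It assumes coordinates are already projected to metres and times are given in seconds. The parameters `eps_space`, `eps_time` and `min_pts` play the role of the two distance thresholds and the density threshold.

```python
import math


def st_dbscan(points, eps_space, eps_time, min_pts):
    """Density-based clustering with a spatial AND temporal neighbourhood.

    points: list of (x, y, t) with x, y in metres (projected) and t in seconds.
    Returns one label per point: a cluster index >= 0, or -1 for noise.
    """
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        xi, yi, ti = points[i]
        return [j for j, (x, y, t) in enumerate(points)
                if math.hypot(x - xi, y - yi) <= eps_space and abs(t - ti) <= eps_time]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes border
            if labels[j] is not None:
                continue  # already assigned: do not expand again
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:
                queue.extend(nb)  # j is a core point, so keep growing the cluster
    return labels
```

The temporal threshold is what separates this from plain DBSCAN. Two trucks stopping at the same corner hours apart fall into different clusters.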

Visual analytics for movement data
The development of smart cities has created a suitable environment for new spatio-temporal data visualization techniques in urban studies. Huang et al. (2016) propose a model that integrates graph modeling and visual analysis to study urban mobility patterns. The proposed model is TrajGraph, and its main contribution is a control panel for identifying regions of interest.
The COOC system, proposed by Kong et al. (2019), explores concurrent patterns in urban mobility. The model is based on establishing regional relationships and is designed to study taxi trajectories. Its limitation is that it requires extensive data processing, although this step is important because it is what would allow the system to be adapted to different study contexts and database types.
Other interesting proposals extend the Space-Time Cube (STC) to immersive environments (Wagner et al., 2019) to offer more user-friendly interactive visualization, since classical desktop and web applications have persistent difficulties with 3D interaction.
Other proposals include TripVista (Guo et al., 2011), which uses spatial, temporal and multidimensional perspectives to analyze vehicle traffic trajectories at road intersections. The multidimensional view uses parallel coordinate plots to contrast the variables associated with the paths (selected according to the user's interest). However, because it is designed to study urban traffic at a micro-scale, it is difficult to use for other types of movement data.
The models cited above are intended to facilitate decision making. To this end, the tool uses interactivity to help the user understand concepts, ask questions and get answers from different points of view. Such an approach has been widely studied by Munzner (2013). The framework is based on three components: WHAT (describe the data type), WHY (translate user questions into tasks) and HOW (display and interact with the data). Keim et al. (2010) also highlight the iterative process of visualizing and interacting with the data as a way to extract knowledge.

Complexity in urban waste collection
Solid waste management in urban areas is a complex task. At the operational level, various challenges raise costs and make the task difficult to carry out. Babaee Tirkolaee et al. (2019), for example, mention recurring problems resulting from spatial and temporal variability in the amount of waste generated, truck maintenance and fuel consumption.
Several studies focus on optimizing trajectories using approaches such as the vehicle routing problem with time windows (Babaee Tirkolaee et al., 2019) or multi-criteria decision analysis (MCDA) (Mondal et al., 2019), which are useful for urban service planning. When few data are available for the study area, Solano Meza et al. (2019) propose using machine learning (decision trees, support vector machines and neural network models) to forecast solid waste generation.
The above approaches are useful for decision-making. However, little work has been done on methodologies for analyzing data produced in large quantities, as happens when an urban service is monitored with sensors and tracked through the Global Positioning System (GPS). To extract more information from such data, they must be integrated with other local and spatial data, a complex task that has been poorly explored. Howell et al. (2019) argue that there are challenges to achieving data-based waste management planning. For example, there are no common guidelines for collecting and reporting waste data, and as a result, data cleansing and analysis are difficult processes.

Overview of the model approach
Movement databases capture and label the positions of objects moving in space over time (Trajcevski et al., 2006). A movement database must therefore contain at least the variables that denote the time stamp and those that express the spatial position of the entity at each moment. In the case of solid waste, the database should additionally include the attributes relevant to the study context, such as vehicle identification, speed and truck capacity.
The first step is to establish a data structure (Figure 1). The next step is to define, in an abstract way, the tasks to be performed by the model.
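The minimum-content rule above (time plus spatial position, optionally enriched with context attributes) can be checked before any analysis starts. This sketch assumes column names `time`, `lat`, `lon` and the context names `vehicle_id`, `speed` and `capacity`. These names are hypothetical; a real file would first need its columns mapped to them.

```python
REQUIRED = {"time", "lat", "lon"}               # minimum variables of a movement database
CONTEXT = {"vehicle_id", "speed", "capacity"}   # hypothetical waste-collection attributes


def check_schema(columns):
    """Reject tables that are not movement data; report which context attributes exist."""
    cols = set(columns)
    missing = REQUIRED - cols
    if missing:
        raise ValueError(f"not a movement database, missing: {sorted(missing)}")
    return {
        "context_present": sorted(CONTEXT & cols),
        "context_missing": sorted(CONTEXT - cols),
    }
```

Separating hard requirements from optional context attributes lets the same check serve different case studies, which is the kind of flexibility the model aims for.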

Deployment of the model
The User-centered Analysis-Task driven model, called AVIMEU, presents an analysis framework organized in 7 components: database loading, classification panel, multivariate analysis, concurrency analysis, origin-destination, points of interest and itinerary analysis.
The data loading component allows the user to select whether the data are implicit or explicit, which is a problem-dependent characteristic: it depends, for example, on how the data were collected and on the equipment available to the company or municipality that operates the service. The model automatically classifies the variables according to their nature (categorical, quantitative, temporal or spatial).
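One way such an automatic classification could work is a heuristic over column names and values. The sketch below is an assumption, not AVIMEU's actual rule set. It treats a column as spatial if its name is a known coordinate name and all values are numeric, as quantitative if all values are numeric, as temporal if all values parse as ISO-8601 dates, and as categorical otherwise. The name list `SPATIAL_NAMES` is hypothetical.

```python
from datetime import datetime

SPATIAL_NAMES = {"lat", "latitude", "lon", "lng", "longitude", "x", "y"}  # assumed naming hints


def _is_float(v):
    try:
        float(v)
        return True
    except ValueError:
        return False


def _is_datetime(v):
    try:
        datetime.fromisoformat(v)
        return True
    except ValueError:
        return False


def classify_column(name, values):
    """Guess the nature of a column from its name and its (string) values."""
    vals = [v.strip() for v in values if v.strip()]  # ignore empty cells
    if name.lower() in SPATIAL_NAMES and all(_is_float(v) for v in vals):
        return "spatial"
    if vals and all(_is_float(v) for v in vals):
        return "quantitative"
    if vals and all(_is_datetime(v) for v in vals):
        return "temporal"
    return "categorical"
```

Because a heuristic like this will sometimes guess wrong (for example, a numeric route code would be labelled quantitative), a manual override step such as the classification panel described next is a natural complement.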
The second component of the model allows the user to reclassify variables when not satisfied with the suggested automatic classification. It also allows the user to make adjustments, such as removing outliers, or other changes based on his or her expert judgement.
The multivariate analysis then presents the spatial coverage of the data, using the trajectories followed by the trucks during waste collection. Simultaneously, the quantitative variables are compared with radar charts. When a variable (an attribute associated with the trajectories) is selected, descriptive statistics are computed at different time scales (year, month, day and hour) and shown in a timeline chart. In this way, sets of trajectories can be compared in terms of the magnitude of the variable of interest, groups of trajectories with common characteristics can be shown and trends can be obtained. Next, a concurrency analysis is performed, whose objective is to identify the paths of moving objects that converge at the same place at the same time. All trajectories are first shown without considering time, in order to delimit and focus on a precise analysis area. The user can then select the temporal window to be analyzed. The origin points of the paths are grouped with the k-means clustering method, and the results are analyzed through a co-occurrence scatter plot.
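The concurrency step (select a time window, then group the origins of the selected paths with k-means) can be sketched as below. This is a plain Lloyd's k-means with the first k points as initial centres, chosen for reproducibility; it is not necessarily the initialisation used in the model. The trip keys `id`, `origin` and `t_in_area` (time at which the trip enters the delimited analysis area) are hypothetical. The sketch assumes at least k trips fall in the window.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means on 2-D points; initial centres are the first k points."""
    centres = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        # Update step: move each centre to the mean of its members (keep it if empty)
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append((sum(p[0] for p in members) / len(members),
                            sum(p[1] for p in members) / len(members)))
            else:
                new.append(centres[c])
        if new == centres:
            break  # converged: assignments can no longer change
        centres = new
    return labels, centres


def concurrent_origins(trips, t_from, t_to, k):
    """Cluster the origins of trips that are in the study area during [t_from, t_to]."""
    selected = [t for t in trips if t_from <= t["t_in_area"] <= t_to]
    labels, _ = kmeans([t["origin"] for t in selected], k)
    return list(zip([t["id"] for t in selected], labels))
```

Each (trip, origin-cluster) pair can then be plotted against time in the area, coloured by cluster, to produce a co-occurrence scatter plot.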
The next component transforms the data to construct an origin-destination (OD) analysis, with a chord diagram to assist interpretation. The goal is to reveal the most frequent OD combinations and the resulting movement patterns.
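The transformation behind an OD analysis amounts to mapping each trip's origin and destination to a zone and counting the pairs; the counts are what a chord diagram draws as chord widths. In this sketch, `zone_of` is an assumed user-supplied function, for example a lookup of the collection sector that contains a point.

```python
from collections import Counter


def od_matrix(trips, zone_of):
    """Count trips per (origin zone, destination zone) pair.

    trips: iterable of dicts with 'origin' and 'destination' points.
    zone_of: function mapping a point to a zone label (an assumption of this sketch).
    """
    return Counter((zone_of(t["origin"]), zone_of(t["destination"])) for t in trips)


def top_pairs(matrix, n=5):
    """Most frequent OD combinations, i.e. the widest chords in a chord diagram."""
    return matrix.most_common(n)
```

A `Counter` keyed by (origin, destination) is a sparse OD matrix. This avoids allocating a full zone-by-zone table when most zone pairs never occur.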
This is followed by an analysis of points of interest based on high- or low-speed areas. First, the speeds of the moving objects (trucks in this case) are computed from the Haversine and Vincenty ellipsoidal equations, and from a third calculation that considers geodesic distances. To analyze the movement of objects in terms of speed, spatial cluster analyses are performed to define nodes. Applying the equations, the average speeds are obtained and displayed in graphs with different colors and sizes. In this way, points of interest characterized by unusual speeds are identified.
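Of the three distance calculations mentioned, the Haversine formula is the simplest to show. The sketch below computes segment speeds between consecutive GPS fixes on a spherical Earth of mean radius 6,371,008.8 m. The Vincenty ellipsoidal and geodesic variants are not reproduced here. They model the Earth as an ellipsoid and differ from the spherical result by only a few tenths of a percent.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius (spherical model)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points (spherical Earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_speeds_kmh(fixes):
    """Speed in km/h between consecutive (time, lat, lon) fixes."""
    out = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(fixes, fixes[1:]):
        dt = (t1 - t0).total_seconds()
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        out.append(haversine_m(la0, lo0, la1, lo1) / dt * 3.6)  # m/s -> km/h
    return out
```

Averaging these segment speeds within each spatial cluster gives the per-node speeds that the points-of-interest view colours and sizes.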
Finally, an itinerary panel is proposed whose objective is to visualize paths, with filtering by dates and geographic scales. It also allows trajectories to be contrasted with different color codes and new calculated variables to be added.
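The date and geographic filtering of the itinerary panel reduces to a simple predicate over the fixes. This is a hedged sketch: fixes are assumed to be `(time, lat, lon)` tuples, and the bounding-box order `(min_lat, min_lon, max_lat, max_lon)` is a convention chosen for the example.

```python
from datetime import date, datetime


def filter_fixes(fixes, start: date, end: date, bbox):
    """Keep (time, lat, lon) fixes whose day is in [start, end] and that fall in a lat/lon box.

    bbox = (min_lat, min_lon, max_lat, max_lon), an assumed convention.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    return [f for f in fixes
            if start <= f[0].date() <= end
            and min_lat <= f[1] <= max_lat
            and min_lon <= f[2] <= max_lon]
```

Changing the box from a neighbourhood to the whole city changes the geographic scale without changing the code path.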
The model was implemented in the statistical software R because of its ability to process complex data (R Core Team, 2019) and was complemented with the R Shiny library (Chang et al., 2019).

Description of the case study
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume VI-4/W2-2020, 2020 5th International Conference on Smart Data and Smart Cities, 30 September -2 October 2020, Nice, France
To test the capability of the AVIMEU model to help understand a waste collection service, the city of Pasto was selected. Pasto is a city in southwestern Colombia whose urban area covers 26.4 km². The estimated urban population is 365,651 inhabitants. The waste collection service covers the entire urban area and part of the rural area; according to municipal data, an average of 8,250 tons is collected monthly. The data used in this study are stored in an unstructured text file. Each line of text corresponds to one record with 81 associated variables. These include geographical position, truck identification, time, date, speed tracking, truck odometer, truck operator, planned route (area to be covered, start and end time), effective start and end time, distance travelled, distance to base, distance with empty and loaded truck, effective working time, collection time, amount of waste collected and fuel consumption (see the complete list in Annex 1). It is even possible to cross-reference information on accidents and other events provided by the operator.
Using the data in the model requires a pre-processing stage to adapt the format (Figure 4). A further step associates the spatial data with the processes of the waste collection service; it was developed in collaboration with Veolia employees. These processes, or collection cycles, are summarized in Figure 5. For this analysis, 27 routes were considered, covered by 5 trucks.

Using the model to better understand an urban waste collection service
In collaboration with experts from the Veolia company, a series of questions related to the performance of the service were formulated. The questions arise from real problems in the waste collection service. Through personal interviews following a user-centered, task-driven approach, and based on the typology of abstract visualization tasks (WHAT-WHY-HOW) (Brehmer and Munzner, 2013), we identified the associated elementary visualization tasks (such as Present, Explore, Lookup, Identify and Compare). The proposed tasks were then introduced into the model, since each can be associated with one (or several) of its analysis components (see Table 1). The model was then evaluated on its ability to provide relevant information in response to each request. The test is therefore aimed at helping the operators or experts of the urban collection service. Additionally, the potential of the model to reveal relationships between the waste collection service and the city structure was evaluated as an exploratory exercise.
Figure 5. Waste collection service processes identified for itinerary analysis.
The road network of the Pasto urban area was extracted from OpenStreetMap. An indicator was then selected to locate the most important intersections in the network, and the results were compared with those of the AVIMEU model for the same locations.
Betweenness centrality is a simple indicator for locating points where flows converge in a network, from a static point of view. For an intersection (or node), it is defined as the sum, over all pairs of nodes in the network, of the fraction of shortest paths between them that pass through that intersection (Kirkley et al., 2018). The following section discusses the applicability of the data-based approach and the model for detecting anomalous or relevant patterns in the waste collection service.
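Betweenness centrality can be computed for every intersection with Brandes' algorithm. The sketch below handles an unweighted, undirected graph given as an adjacency dictionary. A real road network would normally use segment lengths as edge weights, which requires Dijkstra's algorithm instead of breadth-first search; this simplification is an assumption of the example.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj: dict node -> list of neighbour nodes (each edge listed in both directions).
    Returns dict node -> betweenness, counting each unordered pair of nodes once.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: number of shortest paths (sigma) and predecessors on them
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Back-propagate dependencies in order of decreasing distance from s
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair was counted once from each end
    return {v: b / 2 for v, b in bc.items()}
```

Keeping only nodes at or above the 75th percentile of these scores gives an upper-quartile selection of central intersections.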

Results and analysis
The main exploratory results of the AVIMEU model, obtained through interaction with the waste service operator experts, are shown in Figure 6. Figure 6a shows an overview of all trajectories and identifies paths that lie completely outside the coverage area. These paths are explained by the fact that other municipalities were exceptionally served (e.g. the Municipality of El Encanto). Nevertheless, the approach makes abnormal trajectories easy to identify, so that these cases can be given particular attention individually.
The multivariate analysis component made it possible to identify two trucks whose performance deviates from the service planning. The diagrams in Figure 6b show that both trucks are on average 1.5 hours behind schedule and also return late to headquarters. The problem does not occur systematically on every day of the week, but its effect is major. This analysis can lead operators to initiate a field inspection to determine the factors affecting route performance.
Following the identification of non-compliant vehicles, Figure 6b shows that these (abnormal) trucks collect less than 12 tons on their routes, which is low relative to the overall average. Other indicators with anomalous values that can be extracted from the model are the amount of waste collected per kilometer of route, the number of compaction processes and the number of trips per month. For example, the abnormal trucks make 13 and 15 trips per month, versus 20 trips for the other trucks. Such a description helps operators take corrective action.
According to Figure 6d, arrivals at the disposal site occur most often at 8:00 am and 7:00 pm, and the peak days are Fridays, Saturdays and Sundays. This outcome is important because it allows measures to be planned to prevent congestion from causing service delays and other negative impacts.
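Peak hours and days like these come from simple frequency counts over arrival timestamps. The sketch below assumes arrivals are available as a list of `datetime` values. It uses a fixed English weekday tuple so that the output does not depend on the system locale.

```python
from collections import Counter
from datetime import datetime

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")  # datetime.weekday(): Monday == 0


def arrival_peaks(arrivals, top=2):
    """Most frequent arrival hours and weekdays from a list of datetimes."""
    by_hour = Counter(t.hour for t in arrivals)
    by_day = Counter(DAYS[t.weekday()] for t in arrivals)
    return by_hour.most_common(top), by_day.most_common(top)
```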
The analysis of points of interest in Figure 6e identified two zones with high speed intervals. A third zone appears when the data are filtered to plot only the morning. This visual analysis makes it possible to focus on the Mapachico area, where these events occur.
In addition, Figure 6f shows the sites where speed exceeds the critical value of 4 km/h. For example, the SVR158 truck on the path called MP01R4 reaches a speed of 6 km/h, a case that warrants detailed analysis. The model thus meets the objective of detecting service situations that are not easily observable without a data-based approach.
To explore whether the visual and interactive data analysis approach can be used to compare the structure of the city with the behavior of the trucks, a trial was conducted using data from 1 May to 31 May 2019. Figure 7 shows the central points of the road network of the city of Pasto, calculated from the centrality of the nodes; only sites in the upper quartile are shown. The area marked with a red rectangle in Figure 7 is of interest because of its central role in the road network. Concurrency analyses were carried out to explore whether, from the data-based perspective, trajectories indeed converge at these locations, to characterize when the events occur and to check whether the trajectories share similar origins.
The concurrency graph for zone B suggests that, for the period and area of study, the trajectories of trucks with the same origin (same color) converge mainly at 10 a.m. Zone A, however, shows no relevant co-occurrences for the same period.
Even though the analysis was performed with a small data window, the tool reveals a difference between the centrality calculated from the road network (a static, path-based global measure) and the convergence of trajectories observed in the spatio-temporal data of the waste collection trucks.

CONCLUSIONS
Movement data requires complex and time-consuming preparation and analysis steps in order to generate knowledge. The proposed model, based on visual analytics, serves both as a guide and as a tool for the experts to perform the data exploration.
Since the model is built with the statistical analysis software R, it is a low-cost alternative that requires few installation steps. An important aspect of the model is its flexibility in terms of the formats and types of variables it can handle. This feature is important because it allows the model to be used in different contexts and case studies. In addition, new analyses can be implemented in the model on demand, which adds to its value.
The five analysis steps proposed by the AVIMEU model can guide the user in organizing the data examination process. As shown, a multi-criteria analysis makes it possible to identify trucks that operated outside the assigned sector, to find the units that do not comply with schedules and to obtain indicators of collection efficiency. To support service planning, the concurrency analysis allows congestion events to be detected so that corrective actions can be implemented. The analysis of points of interest also makes it possible to anticipate risks by evaluating high speeds in specific areas of the city.
Regarding the structure of the AVIMEU model, the analyses are designed to provide practical information to the agents or experts of the service. However, in future developments the filtering schemes should be improved and the tool made more intuitive. This would improve end-user acceptance and help establish it as a standard tool.
Regarding the method, temporal changes are better understood when segregated into various time gradients (dates and hours). However, the representations should be improved, and alternatives that reduce the cognitive load on the end user should be explored.