EVENT DETECTION USING MOBILE PHONE MASS GPS DATA AND THEIR RELIABILITY VERIFICATION BY DMSP/OLS NIGHT LIGHT IMAGE

In this study, we developed a method to detect sudden population concentration on a certain day and area, that is, an “Event,” all over Japan in 2012 using mass GPS data provided from mobile phone users. First, stay locations of all phone users were detected using existing methods. Second, areas and days where Events occurred were detected by aggregation of mass stay locations into 1-km-square grid polygons. Finally, the proposed method could detect Events with an especially large number of visitors in the year by removing the influences of Events that occurred continuously throughout the year. In addition, we demonstrated reasonable reliability of the proposed Event detection method by comparing the results of Event detection with light intensities obtained from DMSP/OLS night light images. Our method can detect not only positive events such as festivals but also negative events such as natural disasters and road accidents. These results are expected to support policy development of urban planning, disaster prevention, and transportation management.

* Corresponding author. aki@iis.u-tokyo.ac.jp; Tel/Fax: +81 3-5452-(6417/6414); http://shiba.iis.u-tokyo.ac.jp/member/akiyama/


INTRODUCTION
Monitoring of momentarily changing population distribution, that is, "dynamic population," is a very significant task for urban planning, development, and management. In particular, in the policy development of transportation planning, disaster management, crime prevention, and so on, in urban areas, monitoring of population distributions that do not always conform to census population distribution, that is, a sudden population concentration on a certain day and area (Dodson and Gleeson, 2009), is very important. In this study, we call such a population concentration an "Event".

Previous Studies
Previous studies have attempted to monitor dynamic populations either by estimation with models based on daytime and nighttime populations from census data or by collecting actual information through field and questionnaire surveys (Levin, 1976; Duncan and Kalton, 1987). However, they could not monitor dynamic populations adequately because the collectable data are spatially and temporally restricted. To counter this problem, the use of mobile phone GPS log data (hereafter, "GPS data") has attracted attention in recent years. Previous studies attempted to monitor the movements of people in broad areas using GPS data (McKercher et al., 2012). In addition, continuous monitoring of the movement and distribution of large numbers of people in broad areas, e.g., on a nationwide scale, has recently become possible using mass GPS data provided by mobile phone carriers (Calabrese et al., 2011; Sekimoto et al., 2011). Using such large data, we expect to detect Events.

Objective
This paper has two objectives. The first objective is to develop a method to detect the days and areas of Events all over Japan in 2012 using mass GPS log data obtained from phone users who have permitted mobile phone carriers to collect and use their information. First, stay locations of all users were extracted from mass GPS data. Second, stay locations were classified according to how frequently each user stayed there, and the locations that a user visited only rarely, in other words, "rare locations," were identified based on the annual movement history of each user. Finally, Events were detected by aggregating rare locations of all users into grid polygons and determining the grids that contain many rare locations.
The second objective is to verify the reliability of the Event detection method. Manually checking an enormous number of grids all over Japan is clearly unrealistic because it is both labor and time intensive. Hence, we check the reliability of the method by data fusion with satellite images of nighttime light intensity obtained from the Defense Meteorological Satellite Program's Operational Linescan System (DMSP/OLS) from the National Oceanic and Atmospheric Administration's National Geophysical Data Center (NOAA/NGDC) (hereafter, the "DMSP/OLS image"). Population distribution and human activities can be monitored based on light intensity using DMSP/OLS images (Christopher, 2010; Akiyama, 2012). Therefore, in this study, we attempt to verify the reliability of the detection method of Events all over Japan by comparing the spatial distribution of Event grids in 2012 with the light intensity obtained from the DMSP/OLS images in 2012. Light intensity is expected to be stronger in grids with constant human activities, e.g., high-density urban areas, and with many Events throughout the year, e.g., popular sightseeing spots and amusement parks, than in grids with a limited number of Events in the year. This study therefore verifies the reliability of the Event detection method by comparing its results with the light intensity of the DMSP/OLS images in consideration of annual Event frequencies.
Mobile phones have spread rapidly in recent years not only in developed countries but also in developing countries. Richter (2013) predicts over 5 billion mobile phone users worldwide. Therefore, the methodology and results of this study are expected to be valuable for supporting urban planning, development, and management worldwide in the near future.

DATA DEVELOPMENT
This study compares the GPS data with DMSP/OLS images for 2012 by aggregating them into the same spatial unit, which is an approximately 1-km-square grid polygon.

Study Area and Aggregate Unit
The study area is the whole of Japan in 2012. The aggregate unit is the "Japanese standard regional mesh" ("Hyojun Chiiki Mesh" in Japanese; hereafter, the "Regional Mesh"), which is a 30-arcsecond latitude × 45-arcsecond longitude grid, approximately 1 km square. There are 395,503 grids over Japan. Many statistical grid data in Japan, as well as the GPS data used in this study, are aggregated into the Regional Mesh. A "Regional Mesh code" is defined for each Regional Mesh. The Regional Mesh code of a certain point can be calculated when the longitude and latitude of the point are known.
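The calculation of a Regional Mesh code from latitude and longitude can be sketched as follows. This is a minimal illustration, assuming the commonly used 8-digit code layout (a first-level mesh of 40 arcminutes × 1 degree, divided 8 × 8 and then 10 × 10 down to the 30-arcsecond × 45-arcsecond grid). The function name `regional_mesh_code` and the sample coordinates (approximately Tokyo Station) are assumptions for illustration and are not taken from the processing used in this study.

```python
def regional_mesh_code(lat: float, lon: float) -> str:
    """Return an 8-digit Regional Mesh code for a point in decimal degrees.

    Assumed layout: first-level mesh (40' lat x 1 deg lon), second-level
    mesh (5' x 7.5'), third-level mesh (30" x 45").
    """
    # Latitude part: work in arcminutes, then arcseconds.
    lat_min = lat * 60.0
    p, rem_lat = divmod(lat_min, 40.0)   # first-level row (2 digits)
    q, rem_lat = divmod(rem_lat, 5.0)    # second-level row (0-7)
    r = int(rem_lat * 60.0 // 30.0)      # third-level row (0-9)

    # Longitude part: the first-level column is (degrees - 100).
    lon_min = (lon - 100.0) * 60.0
    u, rem_lon = divmod(lon_min, 60.0)   # first-level column (2 digits)
    v, rem_lon = divmod(rem_lon, 7.5)    # second-level column (0-7)
    w = int(rem_lon * 60.0 // 45.0)      # third-level column (0-9)

    return f"{int(p):02d}{int(u):02d}{int(q)}{int(v)}{r}{w}"


# Illustrative point near Tokyo Station.
print(regional_mesh_code(35.681236, 139.767125))
```

Because the code depends only on integer divisions of the coordinates, stay locations can be assigned to Regional Meshes without any polygon overlay.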

GPS Data in This Study:
This study uses aggregated data of mobile phone GPS logs for 365 days in 2012, called "Congestion Analysis®," provided by ZENRIN DataCom Co. LTD. The source data of the Congestion Analysis® are disaggregated data of mass people flows collected by the auto-GPS function on mobile phones of NTT Docomo. The data were obtained from phone users who have permitted mobile phone carriers to use the information collected from them. These data contain only measurement time, location, and person ID, which do not reveal any information about an individual. Because of this security processing, the movement history of a specific individual cannot be monitored. This is a large database that contains about 1.5 million users constructed from text data of approximately 9 billion records.

Detection of Stay Location from GPS Data:
First, the stay location of each user was calculated and classified into home location, work location, use of public transportation, and others, i.e., stay for shopping, eating, or sightseeing, as shown in Figure  1. Methods by Horanont (2010), Horanont et al. (2013), and Akiyama et al. (2014) were used for these calculations.

Detection of Stay Location with Events:
Stay locations with Events that need to be detected are areas where "sudden population concentration occurs in a certain day and area". First, stay locations denoted by green triangles in Figure 1 as "Other stay locations" were extracted as candidates of stay locations with Events. The remaining categories were excluded because users stay at home and work locations and use public transportation daily and frequently. However, the Other stay locations also contain daily shopping, eating, and drinking. Without removing these stay points, stay locations with Events cannot be detected accurately. Therefore, we detected stay locations with Events by calculating the Regional Mesh codes of all the Other stay locations in the 365 days of 2012 and removing stay points located in Regional Meshes that contain multiple Other stay locations of the same user. If a user stayed multiple times in the same Regional Mesh in one day, this is counted as one stay in one day. Figure 2 shows this calculation process. Stay locations with Events of all 1.5 million users were extracted using this process. Figure 3 shows the number of stay locations, in other words, the number of visits in each Regional Mesh, for the user who has the largest number of stay locations in 2012. This user stayed in the Regional Meshes with multiple visits for daily shopping, eating, and drinking. On the other hand, this user visited the Regional Meshes with only one visit because of Events.
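The extraction of stay locations with Events described above can be sketched as follows. This is a minimal illustration under assumptions: the input is taken to be a list of (user ID, date, Regional Mesh code) records for Other stay locations, and the function name `event_stay_candidates` and the sample records are hypothetical.

```python
from collections import defaultdict


def event_stay_candidates(other_stays):
    """Keep only the stays in Regional Meshes that a user visited on one day.

    other_stays: iterable of (user_id, date, mesh_code) tuples for
    "Other stay locations". Multiple stays by the same user in the same
    Mesh on the same day are counted as one stay.
    """
    visit_days = defaultdict(set)
    for user_id, date, mesh_code in other_stays:
        visit_days[(user_id, mesh_code)].add(date)

    # A Mesh visited on exactly one day in the year is a "rare location".
    return sorted(
        (user_id, next(iter(days)), mesh_code)
        for (user_id, mesh_code), days in visit_days.items()
        if len(days) == 1
    )


# Hypothetical sample records (mesh codes are illustrative only).
sample = [
    ("u1", "2012-05-05", "54400299"),
    ("u1", "2012-05-05", "54400299"),  # same day: counted as one stay
    ("u1", "2012-03-01", "53394611"),  # visited on two days: removed
    ("u1", "2012-03-08", "53394611"),
    ("u2", "2012-08-26", "59404000"),
]
print(event_stay_candidates(sample))
```

Grouping by user and Mesh before filtering keeps the rule per user, so a Mesh that one user visits daily can still be a rare location for another user.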

2.2.4 Detection of Event Area:
The number of users who visited each Regional Mesh for Events in the 365 days was calculated by aggregating, within each Regional Mesh, the stay locations with Events of all users extracted by the process explained in the previous section. Events are expected to occur in Regional Meshes where this number is large. Therefore, Regional Meshes where this number is very large were defined as "Event Meshes"; they were identified as outliers among all Regional Meshes using the Smirnov-Grubbs test (significance level P = 0.05) based on the number of users visiting for Events in the 365 days. Figure 4 shows the number of Regional Meshes by the number of users visiting for Events on May 5, the day on which the number of Regional Meshes with Events was the largest in 2012. The number of visiting users is small in almost all Meshes: it is 1 in 19,404 Meshes (35.93%) and less than 20 in 50,587 Meshes (93.68%). On the other hand, 509 Meshes are over the outlier threshold; these are the Event Meshes for that day. Figure 5 shows the spatio-temporal distribution of Event Meshes in the week from April 29 to May 5 in 2012 around the Tokyo metropolitan area, drawn as a space-time cube. In the space-time cube, spatial information is expressed by a 2D plane and temporal information is expressed by the third dimension, e.g., height (Kraak, 2003). For example, in Figure 5, Mesh A contains the Tokyo Disney Sea, one of the most famous amusement parks in Japan. Clearly, there are Events on each day in this Mesh. On the other hand, Mesh B contains the largest colossal Buddha in Japan, called "Ushiku Daibutsu," around which events for children are held. There is only one Event at Mesh B, on May 5, 2012, because May 5 is Children's Day in Japan and various events related to children are held all over Japan.
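The aggregation step can be sketched as follows. This is only an illustration under assumptions: the outlier threshold is passed in as a parameter rather than derived from the Smirnov-Grubbs test, and the function names `visitors_per_mesh` and `event_meshes` and the sample records are hypothetical.

```python
from collections import Counter


def visitors_per_mesh(event_stays, date):
    """Count users visiting each Regional Mesh for Events on one date.

    event_stays: iterable of (user_id, date, mesh_code) tuples, already
    reduced to one record per user, Mesh, and day.
    """
    return Counter(mesh for _user, d, mesh in event_stays if d == date)


def event_meshes(counts, threshold):
    """Return the Meshes whose visitor count exceeds the given threshold.

    The threshold is assumed to come from an outlier test run elsewhere;
    it is not computed here.
    """
    return {mesh for mesh, n in counts.items() if n > threshold}


# Hypothetical sample: three users in Mesh "A", one in Mesh "B".
stays = [
    ("u1", "2012-05-05", "A"),
    ("u2", "2012-05-05", "A"),
    ("u3", "2012-05-05", "A"),
    ("u4", "2012-05-05", "B"),
    ("u5", "2012-05-06", "B"),
]
counts = visitors_per_mesh(stays, "2012-05-05")
print(event_meshes(counts, threshold=2))
```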

Data Processing for DMSP/OLS Image
This section introduces only the method of resampling DMSP/OLS images into the Regional Mesh. For details about DMSP/OLS images, please refer to Elvidge et al. (1997). In this study, we used the DMSP/OLS images of average visible, stable lights, and cloud-free coverage in 2012. The light intensity values range from 0 to 63. The pixel size of the DMSP/OLS image is 30 arcseconds, which does not match the Regional Mesh. Therefore, the light intensity of the DMSP/OLS image was resampled into the Regional Mesh by the following equation of Akiyama (2012):
$$ NV_i = \frac{\sum_{k} SD_{ki} \, V_{ki}}{SR_i} \qquad (1) $$
where SR_i is the area of the Regional Mesh i, SD_ki are the areas into which the DMSP/OLS pixels are divided by the Regional Mesh i, V_ki is the DMSP/OLS light intensity in each divided area, and NV_i is the resampled light intensity of the Regional Mesh i. Figure 6 shows the resampled light intensities over Japan in 2012. In addition, Figure 7 compares the resampled light intensities with the pre-resampled light intensities at the centroid of each Regional Mesh. The result shows that the original quality of the data was maintained after resampling.
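The resampling can be read as an area-weighted average of the pixel values that overlap a Regional Mesh. The sketch below illustrates that reading under assumptions: the overlap areas are supplied directly instead of being computed from geometry, the pieces are assumed to cover the Mesh completely so that their areas sum to the Mesh area, and the function name `resample_light_intensity` is hypothetical.

```python
def resample_light_intensity(pieces):
    """Area-weighted mean light intensity for one Regional Mesh.

    pieces: list of (area, value) pairs, one per part of a DMSP/OLS pixel
    that falls inside the Mesh. The areas are assumed to sum to the Mesh
    area, in any consistent unit.
    """
    mesh_area = sum(area for area, _value in pieces)
    if mesh_area <= 0:
        raise ValueError("the pieces must cover a positive area")
    return sum(area * value for area, value in pieces) / mesh_area


# Illustrative case: a Mesh 45 arcseconds wide and 30 arcseconds high
# covers one full 30 x 30 pixel (value 60) and half of the next (value 30).
print(resample_light_intensity([(30 * 30, 60), (15 * 30, 30)]))
```

In this example the full pixel carries two thirds of the weight and the half pixel one third, so the resampled value lies closer to the value of the fully covered pixel.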

RELIABILITY VERIFICATION
As shown in previous studies, daily human activities are brisk throughout the year in Regional Meshes where the light intensities from the DMSP/OLS image are strong. Many of the Events that are the focus of this study are also expected to occur throughout the year in such Meshes. On the other hand, we assumed that light intensities are weak in Meshes where many people gather only when Events occur. This tendency is expected to be especially clear in suburban and rural areas. In other words, if our method detects Events correctly, the light intensity of the detected Event Meshes is expected to be weak.

Comparison of Annual Number of Event Visitors with Light Intensity
First, we compared the annual total number of Event visitors in each Event Mesh developed in Section 2.2.4 with the resampled light intensity developed in Section 2.3. The result shows that 5,053 Regional Meshes contained at least one Event day in 2012. Figure 8 shows the spatial distribution, by Regional Mesh, of both the annual total number of Event visitors and the resampled light intensities in the Tokyo metropolitan area. In addition, Figure 9 shows the results of the comparison of these two datasets. These results indicate a weak positive correlation between them, which is contrary to our aforementioned hypothesis. However, this is a natural result because many Events appear to have been held continuously throughout the year in urban areas, where the light intensity was strong. Table 1 shows the locations of the top 10 Regional Meshes by annual total number of Event visitors and their resampled light intensities. The Meshes ranked 1st, 2nd, 3rd, and 8th contain famous amusement parks in Japan. In addition, those ranked 4th, 6th, 9th, and 10th contain very popular sightseeing spots of Japan. The 5th-ranked Mesh contains the Tokyo Sky Tree, the world's tallest broadcasting tower, which opened on May 22, 2012. It is a new sightseeing spot that has attractions such as observatory facilities, large-scale shopping facilities, and an aquarium. These areas are famous sightseeing spots of Japan and attracted visitors continuously throughout the year, each of whom visited only once a year. In addition, light intensities of the Regional Meshes containing these areas are very strong because major Japanese cities are located in these areas. In fact, Regional Mesh A in Figure 5 is the top-ranked Mesh in Table 1.

Elimination of Annual Event Influence and Event Detection
Next, we considered a method to eliminate the influence of Events that occurred continuously throughout the year. Figure 10 shows examples of Event Meshes with characteristic temporal transitions of Event visitors. The top figure shows the result for the Mesh containing the Tokyo Disney Sea, shown in Figure 5 and Table 1. Event visitors came continuously throughout the year, with a periodic pattern of increases on weekends and decreases on weekdays. The center figure shows the result for the Mesh containing part of Daisen City, Akita prefecture. August 26 had an extremely large number of Event visitors, while there were no Event visitors on other days. This is because this Mesh contains the site of a famous fireworks display, known as the "Omagari National Japan Fireworks Competition," which attracts many tourists and was held on that day. The bottom figure shows the result for the Mesh containing the Izumo Taisha Shrine, one of the oldest and largest shrines, which many Japanese visit during the New Year holidays and the Golden Week holidays, the latter falling between the end of April and the beginning of May. Event visitors also increased from September to November in this Mesh, presumably because the weather during this period is mild and because Japanese visit the shrine at this time to receive good luck from the Japanese Gods, who are believed to gather at the shrine in October. The Event that we want to detect is "sudden population concentration in a certain day and area." For this detection, we need a method to eliminate the influence of year-round Events. Considering Figure 10, our objectives are to detect the day of the fireworks display in Daisen City, i.e., August 26, and the days with seasonal increases in Event visitors at the Izumo Taisha Shrine, and to ignore days with a relatively small number of Event visitors.
Therefore, this study defined an index for detection of days with Events, called an "event index", as given below.
$$ (E_i)_j = \frac{(v_i)_j}{\sum_{i'} (v_{i'})_j} \qquad (2) $$
where (v_i)_j is the number of Event visitors on date i in grid j and (E_i)_j is the event index on date i in grid j. The denominator of the right side of the equation, summed over all dates i' in 2012, is the total number of Event visitors in 2012 in grid j. Figure 11 shows the event indices of the three Regional Meshes shown in Figure 10. In the case of Daisen City, the event index on August 26, when the fireworks display was held, is 1.0 and the event indices on other days are 0. In addition, the event indices in the New Year and Golden Week holidays and on the weekends in the autumn season are large for the Mesh of the Izumo Taisha Shrine. On the other hand, the event indices throughout the year are small in the Mesh of the Tokyo Disney Sea. Figure 12 shows the relationship between event indices and the number of Events throughout 2012, and Figure 13 shows the relationship between event indices and the number of Event Meshes. Extracting the Event Meshes with an event index of 1.0 yields the Meshes where an Event was held only once in 2012. Lowering the threshold of the event index increases the number of Meshes detected as Event Meshes. Figure 13 shows that this number increases rapidly when the threshold is set to less than 0.05. In addition, area A in Figure 12 contains Meshes that are detected as Event Meshes even though many Events occurred there during the year. These are the Event Meshes on a specific day with many more visitors than other days, such as the New Year and Golden Week holidays at the Izumo Taisha Shrine, as shown in Figures 10 and 11. For example, the day "a" in area A is one of the busiest fireworks displays, held in Sumida-ku, Tokyo, on July 28, and the day "b" is the Gion Festival, one of Japan's three greatest festivals, held on July 15 in the Shijo-Karasuma district in the center of Kyoto city.
The method can thus detect only the Events with an especially large number of visitors in the year, even at famous sightseeing spots where many Events occur throughout the year.
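The event index, i.e., each day's number of Event visitors divided by the annual total for the same Mesh, can be sketched as follows. This is a minimal illustration: the function names `event_indices` and `event_days`, the use of "greater than or equal to" for the threshold comparison, and the sample visitor counts are assumptions and not values from this study.

```python
def event_indices(daily_visitors):
    """Return the event index of each date for one Regional Mesh.

    daily_visitors: dict mapping date -> number of Event visitors.
    The index is the daily count divided by the annual total.
    """
    total = sum(daily_visitors.values())
    if total == 0:
        return {}
    return {date: count / total for date, count in daily_visitors.items()}


def event_days(daily_visitors, threshold=0.1):
    """Return the dates whose event index reaches the threshold (assumed >=)."""
    indices = event_indices(daily_visitors)
    return sorted(date for date, index in indices.items() if index >= threshold)


# Hypothetical Meshes: one with a single Event day, one visited all year.
single_event = {"2012-08-26": 800}
year_round = {f"day{n:03d}": 10 for n in range(100)}
mixed = {"2012-01-01": 70, "2012-06-10": 5, "2012-05-03": 25}

print(event_days(single_event, threshold=1.0))
print(event_days(year_round))
print(event_days(mixed))
```

A Mesh with visitors on a single day receives an index of 1.0 on that day, whereas a Mesh visited evenly throughout the year receives small indices on every day and falls below the threshold.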

Comparison of Number of Event Visitors with Light Intensity by Difference of the Threshold
Finally, we verify the reliability of Event detection by comparing the number of Event visitors in each Event Mesh with the light intensities from the DMSP/OLS images for different event index thresholds. We have already stated the following hypothesis. Hypothesis 1: There are many Events continuously throughout the year at Meshes with strong light intensity, while fewer Event visitors visit Meshes with weak light intensity, and only when Events are held. From the results of Section 3.1, we can build the following additional hypotheses. Hypothesis 2: Light intensities are strong at Meshes where Events with many visitors are held and weak at Meshes where Events with few visitors are held. Hypothesis 3: Light intensities are strong even at Meshes with Events held only once in the year if these Meshes are located in accessible areas such as urban areas. The number of visitors of such Events is expected to be large. Figure 14 shows 100% stacked bar charts of light intensity by the number of Event visitors in each Event Mesh, for different values of the event index threshold used to decide whether it is an Event or not. When the threshold is 1.0, that is, for Meshes where an Event was held only once in the year, light intensities of all Meshes with over 1,000 visitors are strong, i.e., over 50%. On the other hand, light intensities become weaker as the number of visitors decreases. This confirms hypothesis 3. Moreover, the rates of Meshes with strong light intensity increase especially in Meshes with a large number of Event visitors, i.e., over 500. In contrast, the rates of Meshes with weak light intensity increase especially in Meshes with a small number of Event visitors, i.e., 100-500. This confirms hypothesis 2 that the light intensity of Meshes with many visitors is strong and that of Meshes with fewer visitors is weak.
However, when the threshold is 0.05, the rate of Meshes with strong light intensity increases among Meshes with 0 to 100 Event visitors. Our method thus detected many small-scale events held in urban areas. The detection sensitivity becomes excessively high when the threshold is set to a very small value such as 0.05. Thus, a threshold of over 0.1 is considered suitable for detecting the Event Mesh. In addition, this suggests that hypothesis 1 is not always correct, especially in urban areas. Therefore, through our results, we have shown that the method of Event detection using the GPS data is reasonably reliable. In addition, we arrived at an appropriate threshold value of over 0.1. However, if the threshold is set to a small value, our method may detect Meshes that are not Event Meshes of interest in this study, especially in urban areas, because the detection sensitivity becomes high.
Both results show the same tendency, i.e., the numbers of Events and Event visitors increased on weekends and decreased on weekdays. In addition, Events with particularly many visitors increased in the New Year, the Golden Week, and the summer holidays, when almost all students had holidays. Moreover, Events with the threshold 1.0, i.e., the Events that occurred only once a year with many visitors, were held especially in the summer holidays. Some peaks are also observed on the second and third weekends in April. This is because April is the beginning of the fiscal year in Japan, and on these weekends many welcome parties were held for freshmen who had moved into urban areas from the countryside. On the other hand, the result shows that the numbers of both Events and Event visitors decreased in seasons with severe weather conditions, such as the rainy season and the winter season excluding the New Year holidays.
In addition, we found an interesting result: in the autumn season, many Events were held although the number of Event visitors was not very large. Many of these Events are small in scale, i.e., the number of Event visitors is small, because students and young people do not have holidays in this season and those who enjoy the autumn color season are relatively elderly. Figure 16 shows the results of all Event Meshes. It shows the space-time cube for Event days when thresholds are 0. in region F. This is because many tourists were stuck due to the effect of a severe typhoon in this region and the days were holidays. In addition, a large-scale rest area on the highway, Ashigara Service Area on the Tomei Expressway, is located in Mesh G. The number of Event visitors on the first weekend of October, i.e., October 7 and 8, was far larger than on other weekends, even though there were also many Event visitors during the Golden Week and the weekends of the summer holidays. This is because of a major road accident involving six cars near the rest area on the night of October 7.
Our method can detect not only the macro-scale information of the Events, such as collective Events that occur all over the nation (i.e., nationwide time-series monitoring of the numbers of Events and Event visitors), but also micro-scale information of Events (i.e., each Event location and day). In addition, it can detect not only "positive events" such as festivals and fireworks displays but also "negative events" such as natural disasters and road accidents. Information about occurrences of negative events is expected to be useful for urban planning and traffic management if such information can be collected in the future and maps can be developed showing locations vulnerable to natural disasters and road accidents. In the future, we plan further verification of the various types of Events that can be detected by our method.

CONCLUSION
This study developed a method for Event detection all over Japan in 2012 using mass GPS data provided from mobile phone carriers. Areas and days where Events occurred can be detected by aggregation of mass GPS data into 1-km-square grid polygons: the Regional Mesh. In addition, this method could detect Events with an especially large number of visitors in the year by removing the influences of Events that occurred continuously throughout the year. By comparing these Event detection results with the light intensities obtained from the DMSP/OLS images, reasonable reliability of the proposed Event detection method was confirmed. However, Events that were not intended to be detected were also detected as Event Meshes because the detection sensitivity became too high when the threshold was set to less than 0.1. Some planned future works are as follows. First, a method of setting the threshold will be developed. Since we set the same threshold throughout Japan in this study, we plan to develop a method for flexibly setting thresholds that depend on, e.g., urban and suburban areas and the number of visitors, and thus improve the Event detection sensitivity. Second, we plan to improve the mass GPS data processing method. This study calculates the number of Event visitors as the number of mobile phone users. However, this number can be scaled up to the actual population by calculating a magnification coefficient for each user. We believe that such calculations can be realized because some previous studies have already estimated actual populations based on statistical samples; for example, Hoefer et al. (2011) and Akiyama et al. (2014) have already attempted to magnify mass GPS data into actual population. In addition, the method of reliability verification needs further improvement. We consider the following methods for this purpose.
First, we plan to utilize VIIRS DNB Cloud Free Composites by NOAA (Elvidge et al., 2013) as verification data, because the image resolution from this source is 15 arcseconds, which is finer than that of DMSP/OLS images. Second, we plan to collect verification data other than night light images. For example, we plan to archive past event information by web crawling and collection of SNS information (Sakaki et al., 2010) and compare it with our event detection results.