REGIONAL DISPARITIES IN ONLINE MAP USER ACCESS VOLUME AND DETERMINING FACTORS

Abstract. Regional disparities in online map user access volume (hereafter "user access volume") are a topic of growing interest as online maps become more popular with the public, and understanding them helps to target the construction of geographic information services for different areas. We first statistically analysed online map user access logs and quantified the regional access disparities at different scales. The results show that user access volume decreases from east to west across China as a whole, with East China producing the most access volume; the cities with high access volume are also crucial economic and transport centres. Principal Component Regression (PCR) is then applied to explore the regional disparities of user access volume. The resulting determining model for Online Map access volume indicates that area scale is the primary determining factor for regional disparities, followed by public transport development level and public service development level. Other factors, such as the user quality index and the financial index, have very limited influence on user access volume. Based on this study of regional disparities in user access volume, map providers can reasonably dispatch and allocate data resources and service resources in each area and improve the operational efficiency of the Online Map server cluster.


INTRODUCTION
China has shown remarkable and unprecedented dynamics (Chen, A., Nijkamp, P., Tabuchi, T., & Dijk, J. V., 2014). With the development and popularization of Internet technology, the online map has become an indispensable tool in people's daily life, injecting new vitality into the traditional geographical information industry. With the increasing number of online map users, the regional disparities in map access are attracting scholars' attention.
After introducing current research on regional disparities in Internet development, and the characteristics and access patterns that have been uncovered among online map users, this paper takes Map World as an example, statistically analyzes its access logs, and demonstrates the regional disparities of user access volume. The results show obvious differences in user access volume among cities, provinces, and areas. Using Principal Component Regression, we established a regression model that explores the relationship between online map users' access volume and its determining factors. The proposed model shows that regional scale is the most important determining factor for access volume, that public transport development level and public service development level are subordinate factors, and that the effect of population quality and GDP is inconspicuous.

Disparities of Internet Development
Financial development status, completeness of telecommunication infrastructure, urbanization level, and government stability have a significant impact on the development of the Internet (Hao, X., & Kay, C. S., 2004).
Studies of the regional disparities of Internet usage rates (Beilock, R., & Dimitrova, D. V., 2003) (Toyama, & Kentaro, 2016) suggest that understanding these disparities and exploring their determining factors provides a way to better understand and advance the development of Internet services in different regions. As one Internet application, the online map shows similar characteristics in its regional disparities. Thus, the methods used to explore the regional disparities of Internet development can also be applied to map access.

Online Map Users
Many consulting institutes and companies have studied the characteristics of Online Map users. To further understand the regional characteristics users show when accessing Online Maps and to explore the determining factors, we propose three study steps in this paper: first, describe the profiles of regional disparities with statistical indexes; then, explore the potential determining factors for regional disparities; finally, analyze the degree of influence of different factors by multivariate analysis.

VOLUME
The regional distribution of map users possesses certain characteristics, and exploring these macro features can help map providers better understand the spatial distribution of their users, which matters for data center construction and server configuration. This research analyzed the characteristics of the regional disparities that exist in Online Map user access.

Data Introduction
The data sample used in our study is the access logs of a public geospatial information service, Map World. The logs cover 1 February to 28 February 2014. The access logs record the query information of users and the response status of servers. However, the original access logs need to be processed, because they may contain garbled characters, information gaps, and other problems when the server responds abnormally.
The whole procedure includes data analysis, data preprocessing, data filtering, data verification, and data storage.
To investigate the regional distribution features of human users, it is necessary to remove the influence of machine users such as web crawlers. The information recorded includes IP, access time, HTTP query type, query information (mainly about map tiles), response time and response status.

The Regional Disparities of Map World User Accesses
Through IP address resolution, we get users' geographical location and further analyze the regional distribution features of Online Map accesses.
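The aggregation described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the Map World pipeline: the prefix table `IP_PREFIX_TO_PROVINCE`, the `(ip, date)` record layout, and the sample addresses are all hypothetical, and a real system would query an IP geolocation database instead.

```python
from collections import Counter

# Hypothetical prefix-to-province table (documentation address ranges);
# a real system would query an IP geolocation database instead.
IP_PREFIX_TO_PROVINCE = {
    "203.0.113.": "Beijing",
    "198.51.100.": "Guangdong",
}


def resolve_province(ip):
    """Return the province for an IP, or None if it cannot be resolved."""
    for prefix, province in IP_PREFIX_TO_PROVINCE.items():
        if ip.startswith(prefix):
            return province
    return None


def average_daily_volume(records, n_days):
    """Average daily access count per province.

    records: iterable of (ip, date_string) tuples, one per logged request.
    n_days:  number of days covered by the logs.
    """
    counts = Counter()
    for ip, _date in records:
        province = resolve_province(ip)
        if province is not None:  # skip addresses that cannot be located
            counts[province] += 1
    return {province: total / n_days for province, total in counts.items()}


# Made-up sample records covering two days.
sample = [
    ("203.0.113.7", "2014-02-01"),
    ("203.0.113.9", "2014-02-02"),
    ("198.51.100.4", "2014-02-01"),
    ("192.0.2.1", "2014-02-02"),  # not resolvable with the toy table
]
volumes = average_daily_volume(sample, n_days=2)
```

Dividing by the number of days rather than by the number of days with traffic keeps provinces with sparse access comparable to busy ones.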

Provincial Disparities of User Access Volume:
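where p is the number of variables (here p = 14) and $r_{ij}$ is the correlation coefficient between $x_i$ and $x_j$.
According to the calculation formula proposed by Karl Pearson:
$$r_{ij} = \frac{\sum_{k=1}^{n}(x_{ki}-\bar{x}_i)(x_{kj}-\bar{x}_j)}{\sqrt{\sum_{k=1}^{n}(x_{ki}-\bar{x}_i)^2 \, \sum_{k=1}^{n}(x_{kj}-\bar{x}_j)^2}} \qquad (2)$$
where n is the number of sample data, $\bar{x}_i$ is the mathematical expectation of $x_i$, and $x_{ki}$ is the value of $x_i$ for the kth city.
When $r_{ij} > 0$, there is a positive relationship between $x_i$ and $x_j$; when $r_{ij} < 0$, $x_i$ is negatively correlated with $x_j$; when $r_{ij} = 0$, $x_i$ is independent of $x_j$; when $|r_{ij}| > 0.8$, $x_i$ is highly correlated with $x_j$; when $0.3 < |r_{ij}| \le 0.8$, $x_i$ has a moderate correlation with $x_j$. We can calculate the correlation between any pair of variables, as shown in Table 2.
The contribution probability of component i and the cumulative contribution probability of the first m components are computed from the eigenvalues $\lambda_i$ of R:
$$e_i = \frac{\lambda_i}{\sum_{k=1}^{p}\lambda_k}, \qquad E_m = \frac{\sum_{i=1}^{m}\lambda_i}{\sum_{k=1}^{p}\lambda_k}$$
where $e_i$ is the contribution probability of component i and $E_m$ is the cumulative contribution probability of the first m components.
Usually, when the cumulative contribution probability reaches 85% at the mth principal component [13], we can take the first m principal components and discard the remaining components to reduce dimensionality; the result is shown in Table 4.
where Y is the city user access volume of the online map.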
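The correlation, eigenvalue, and 85% selection steps can be sketched with NumPy as follows. This is an illustrative sketch only: the function name `select_components` and the synthetic data are assumptions, and the paper's own 14-variable city data is not reproduced here.

```python
import numpy as np


def select_components(X, threshold=0.85):
    """Correlation matrix, sorted eigenvalues, cumulative contribution, and m.

    X: array of shape (n_samples, p), one column per explanatory variable.
    """
    # Pearson correlation matrix between the columns (variables).
    R = np.corrcoef(X, rowvar=False)
    # R is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigenvalues, _ = np.linalg.eigh(R)
    eigenvalues = eigenvalues[::-1]  # descending order
    contribution = eigenvalues / eigenvalues.sum()
    cumulative = np.cumsum(contribution)
    # Smallest m whose cumulative contribution reaches the threshold.
    m = int(np.searchsorted(cumulative, threshold)) + 1
    m = min(m, len(eigenvalues))
    return R, eigenvalues, cumulative, m


# Synthetic, correlated data: 14 samples, 6 variables driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(14, 3))
X = latent @ rng.normal(size=(3, 6)) + 0.1 * rng.normal(size=(14, 6))

R, eigenvalues, cumulative, m = select_components(X)
```

Because the eigenvalues of a correlation matrix sum to p, the contribution of each component is simply its eigenvalue divided by the number of variables.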

Result Analysis
Table 7 shows the composition equations of the three principal components. In the first principal component, all of the explanatory variables have high explanatory coefficients except transport capacity, GDP per capita, and tourist spots, so this component reflects city scale, city transport and service development level. In the second principal component, GDP per capita reaches its maximum coefficient, reflecting the influence of the economy.
In the third principal component, transport capacity and tourist spots have the highest explanatory coefficients, so this component reflects the circulation capacity of a city. The determination coefficient $R^2$ is calculated as follows. A data set has n values $y_i$ (i = 1, 2, …, n) with expectation $\bar{y}$; then:
$$SS_{tot} = \sum_{i=1}^{n}(y_i-\bar{y})^2, \qquad SS_{reg} = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2, \qquad R^2 = \frac{SS_{reg}}{SS_{tot}} \qquad (10)$$
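A small sketch of the determination coefficient may help here. It assumes the form $R^2 = SS_{reg}/SS_{tot}$, which is inferred from the quantities the paper defines (total and regression sums of squares); the function name and the toy numbers are illustrative only.

```python
def r_squared(y, y_hat):
    """Determination coefficient as SS_reg / SS_tot (assumed form).

    y:     measured values
    y_hat: fitted values from the regression model
    """
    n = len(y)
    y_bar = sum(y) / n  # expectation of the measured values
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares
    ss_reg = sum((fi - y_bar) ** 2 for fi in y_hat)  # regression sum of squares
    return ss_reg / ss_tot


# Toy data: a perfect fit and a fit that only predicts the mean.
y = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y, y)
mean_only = r_squared(y, [2.5, 2.5, 2.5, 2.5])
```

For an ordinary least squares fit with an intercept, evaluated on the data it was fitted to, this ratio coincides with 1 minus the residual sum of squares over the total sum of squares; on held-out data the two forms can differ.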

CONCLUSION
Online Map services are playing an increasingly important role in people's daily lives as the Internet matures.
Analyzing the regional disparities of user access volume can help map providers better understand the geographical distribution of their users and improve the dispatch and allocation of data resources and server resources.
This paper first introduced the regional disparities that exist in user access volume, then established a regression model based on PCR to explore the influence of the economy, population, public transport development level and other factors.
In future work, we will verify the results of this study with more multivariate regression methods, compare them, and propose a determining model with higher prediction accuracy.

Fig. 1 shows the provincial ranking of average daily access volume in February 2014. The provinces with higher rankings are Beijing, Guangdong, Shanghai, Shandong, Zhejiang, and Jiangsu; the provinces with lower rankings are Tibet, Macao, Qinghai, Ningxia, and Hong Kong. From the perspective of user distribution, the eastern part of China has much more access volume than the middle part, which in turn clearly surpasses the western part. Users in the top six provinces produce more than half of the total access volume, while the 20 lowest-ranking provinces account for only 1/4 of the total. Most provinces with remarkably high numbers of online map users also rank high in economy and population. However, some exceptions indicate that factors other than economy and population also determine the access volume of a region.

Figure 1. Provincial distribution of user access volume

Figure 2. Zoning distribution of user access volume

Figure 3. Zoning distribution of user access volume

The user access volumes of online maps are taken from the statistical results of the Map World access logs and comprise the average daily access volume of 28 key cities. The 14 explanatory variables are from the China City Statistical Yearbook and the thematic database on the official website of Map World. We first divided these 28 cities equally into modeling data and verifying data. The modeling cities are: Beijing, Shanghai, Guangzhou, Hangzhou, Chengdu, Wenzhou, Ningbo, Wuhan, Jinan, Xian, Nanjing, Chongqing, Zhengzhou, and Wuxi. The verifying cities are: Tianjin, Shenzhen, Suzhou, Qingdao, Lanzhou, Hefei, Fuzhou, Jiaxing, Taiyuan, Nanchang, Kunming, Changsha, Xiamen, and Guiyang. Then, principal component regression (PCR) is used for data modeling to explore the relationship between the user access volumes of online maps and the determining factors.

Determining Factor Analysis of User Access Volume Based on PCR
Principal component regression is a method for handling data with high-dimensional covariates. When the original variables are correlated, PCR achieves dimension reduction by substantially lowering the effective number of variables. The five steps of applying the PCR model in our research begin as follows.
1) Calculating the correlation coefficient matrix
The first step of PCR is to check the collinearity between the original variables and obtain the correlation coefficient matrix R:
$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1p} \\ r_{21} & r_{22} & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & r_{pp} \end{bmatrix}$$

where $\hat{y}_i$ is the fitted value in the model, $SS_{tot}$ is the total sum of squares, $SS_{reg}$ is the regression sum of squares, and $R^2$ is the determination coefficient, ranging from 0 to 1. The better the linear regression fits the data in comparison to the simple average, the closer the value of $R^2$ is to 1.

Model Evaluation: We calculate the qualified rate and the determination coefficient for the PCR formula with the modeling data and the verifying data, and get the result shown in Table 9.

According to a report published by CNIT-Research (China IT Research Center), Online Map users show the following features. 1) From the perspective of users: Online Map users are male dominated, with males accounting for 68.4%; 60% of the users have higher education, and 72.8% of the users are aged 18-40.

ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume IV-2/W4, 2017. ISPRS Geospatial Week 2017, 18-22 September 2017, Wuhan, China

public transport development level, and public service development level as the five determining factors. In total, 14 variables are used to explain the user access volume of Online Map, as shown in Table 1. The explanatory variables are expressed as the set $X = \{x_1, x_2, \ldots, x_{14}\}$.

Table 1. Determining factors and explanatory variables of user access volume

Table 2. Correlation coefficient matrix of determining factors of user access volume

2) Calculating the eigenvalue and eigenvector
Establish the characteristic equation $|\lambda I - R| = 0$ according to the correlation coefficient matrix R.

Table 4. Contribution probability and cumulative contribution probability of each principal component of user access volume

According to Table 4, when using 3 principal components, the cumulative contribution probability is already higher than 85%. Thus, we choose the first three principal components as the final principal components. Table 5 shows the variance of the original 14 variables $x_i$ (i = 1, 2, …, 14) after the extraction of the principal components. The loss degree of all variables is small except for the GDP per capita factor and the population quality factor, which demonstrates that the principal components replace the original variables well.

where $e_{ij}$ is the jth component of the ith eigenvector $e_i$. The principal component load matrix is shown in Table 6.
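Once the number of principal components is fixed, the regression step of PCR can be sketched as below. This is a generic illustration on synthetic data, not the paper's fitted model: the function name `fit_pcr`, the data, and the choice to standardize with the sample standard deviation are assumptions.

```python
import numpy as np


def fit_pcr(X, y, m):
    """Principal component regression with the first m components.

    Returns a function that predicts y for new rows of X.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std  # standardized explanatory variables

    # Eigenvectors of the correlation matrix, largest eigenvalues first.
    R = np.corrcoef(X, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    order = np.argsort(eigenvalues)[::-1][:m]
    V = eigenvectors[:, order]  # shape (p, m)

    # Regress y on the component scores plus an intercept column.
    scores = Z @ V
    A = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    def predict(X_new):
        Z_new = (np.asarray(X_new, dtype=float) - mean) / std
        return coef[0] + (Z_new @ V) @ coef[1:]

    return predict


# Synthetic example: 14 samples, 5 variables, exactly linear response.
rng = np.random.default_rng(1)
X = rng.normal(size=(14, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 3.0]) + 10.0

predict_three = fit_pcr(X, y, m=3)  # reduced model
predict_full = fit_pcr(X, y, m=5)   # all components: same fit as least squares
```

Keeping all components reproduces ordinary least squares, so any loss of fit in the reduced model comes only from the discarded components.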

The measured value refers to the actual user access volume of the sample cities, and the predicted value is the value calculated with the regression model. When the relative error is positive, the predicted value is bigger than the measured value.

Table 8. Relative error of sample data in PCR of user access volume

Table 9. Qualified rate and determination coefficient of PCR of user access volume

As shown in Table 9, when the target error rate is 20%, the qualified rate of the model is 100% in both the modeling stage and the verifying stage; when the target error decreases to 10%, the qualified rate of the PCR model is still 100% in the modeling stage but decreases to 86.7% in the verifying stage; when the target error is 5%, the qualified rates in the modeling stage and the verifying stage become 93.4% and 80% respectively. The determination coefficients of the two stages are 0.990 and 0.456. We can conclude that PCR shows high precision in the modeling stage and explains the data well, but its precision is limited in the verifying stage compared with the modeling stage.
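The qualified rate used in this evaluation can be sketched as follows. The sketch assumes that a city counts as qualified when the absolute relative error does not exceed the target error; whether the paper uses a strict or non-strict bound is not stated. The function names and the numbers are made up for illustration and are not the paper's data.

```python
def relative_errors(measured, predicted):
    """Signed relative error; positive when the prediction is too high."""
    return [(p - m) / m for m, p in zip(measured, predicted)]


def qualified_rate(measured, predicted, target_error):
    """Share of samples whose absolute relative error is within the target."""
    errors = relative_errors(measured, predicted)
    qualified = sum(1 for e in errors if abs(e) <= target_error)
    return qualified / len(errors)


# Made-up access volumes for four cities (not the paper's data).
measured = [100.0, 200.0, 400.0, 800.0]
predicted = [104.0, 230.0, 396.0, 1000.0]

rates = {t: qualified_rate(measured, predicted, t) for t in (0.20, 0.10, 0.05)}
```

Tightening the target error can only keep the qualified rate the same or lower it, which matches the trend reported across the 20%, 10%, and 5% targets.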