ASSESSING THE RELIABILITY OF CONTRIBUTORS IN VGI USING IMPLICIT FACTORS

VGI projects comprise geographic information produced by many unorganized volunteers, which makes it challenging to ensure the quality of that information. Several researchers have suggested using intrinsic factors to evaluate the quality of VGI instead of explicit methods such as comparison with real or reference datasets. However, measuring the reliability of VGI contributors, an essential intrinsic factor in determining the credibility of their contributions, remains an open question. This study first introduces and discusses in detail the various types of contributors' activities and interactions. A comprehensive spatio-temporal contributor reliability model is then proposed to assess contributors' performance based on multiple implicit interactions between volunteers during the contribution process. Finally, several cities with different contribution rates (based on their population, number of users, and area extent) are chosen, the proposed model is applied to the VGI data of the selected regions, and the results are compared and discussed.


INTRODUCTION
Volunteered geographic information (VGI) (Goodchild, 2007) has recently served as a growing data source; it refers to geographic data created, modified, or removed within a map by volunteers. However, the quality of VGI is questionable because it is heavily user-based and relies on community efforts. The community consists of non-professional users who contribute to VGI with no pre-defined standards (Mohammadi and Sedaghat, 2021). Thus, contributor analysis provides a better overview of user-generated data and cannot be ignored (Rehrl et al., 2013).
Many activities and interactions between users in VGI indirectly reflect their peers' feedback and evaluation relations. These implicit relations can be used to determine a user's reliability score, which changes over the course of the contribution process. This score depends on factors such as the user's previous performance, personal information, and the feedback received from others (Yang et al., 2013).
Several papers propose models that determine the reliability of VGI users from diverse points of view (Bishr and Janowicz, 2010; D'Antonio et al., 2014; Lodigiani and Melchiori, 2016; Zhou and Zhao, 2016; Fogliaroni et al., 2018; Muttaqien et al., 2018; Zhang et al., 2021). However, they have gaps and do not take into account all of the parameters that affect a contributor's reliability level.
This research attempts to fill these gaps, which concern aspects of computing contributors' reliability levels that previous research on VGI projects such as OSM has ignored. To this end, a contributor reliability model is suggested based on effective parameters, which can be used to assess VGI quality intrinsically. The proposed model is applied to regions with different numbers of users and populations to illustrate its scope of application.
The remainder of this document is organized as follows. Section 2 introduces the different types of contribution processes in detail and then presents a comprehensive reliability model for VGI users. The first part of section 3 introduces the selected study areas and their datasets. The suggested model is applied to these data in section 3.2. Section 3.3 then evaluates and discusses the results in detail, and finally the conclusion is presented.

Various types of OSM activities
OSM is among the most popular VGI projects and has a vast community, with over 8.3 million registered volunteers worldwide in 2022 (OpenStreetMap Wiki stats website). It enables its users to collaborate and modify the available data. The OSM dataset is the result of many interactions among its users, which implicitly reflect feedback about them.
Each feature can be added to the OSM dataset as one of three element types: node (for point features), way (for linear and areal features), or relation. In addition, tags can be attached to elements to express thematic information.
When an individual registers in OSM as a user, he or she observes a real-world feature and its neighbourhood and wants to contribute information about that feature to the OSM database. The contributor can perform the following types of activities:
Creation: Adding a new object to the OSM dataset.
Modification: Editing an existing object that was created or modified before.
Deletion: Removing an existing object from the OSM dataset. This happens for several reasons: the object no longer exists in reality, it is false, or its contributor is malicious (Fogliaroni et al., 2018).
Confirmation: Making a contribution in the neighbourhood of an object without changing it (Keßler and De Groot, 2013).
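The first three activity types can be read off the version history of a single object. The sketch below is a minimal illustration, assuming that each history record carries a version number starting at 1 and a visibility flag that is false for deleted versions; the function name `classify_activity` is hypothetical. Confirmation cannot be derived from one record alone, since it depends on edits made to neighbouring objects.

```python
def classify_activity(version: int, visible: bool) -> str:
    """Classify one object version as creation, modification, or deletion.

    Assumes version numbers start at 1 and that a deleted version is
    flagged with visible=False (an assumption about the input records).
    """
    if not visible:
        return "deletion"
    if version == 1:
        return "creation"
    return "modification"


# A three-version history: created, edited once, then deleted.
history = [(1, True), (2, True), (3, False)]
labels = [classify_activity(version, visible) for version, visible in history]
print(labels)  # ['creation', 'modification', 'deletion']
```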

VGI User's Reliability Model
In VGI and other crowdsourcing projects, most consumers of the data do not know the contributors or how they produce content. Therefore, they must rely on the contributors' reliability level, which represents how far the contributors can be trusted. In other words, a higher reliability level can mean a higher degree of confidence (Hendrikx et al., 2015). As Tavakolifard and Almeroth (2012) mentioned, the reliability level is related to a user's behaviour, based on available information about their past operations. Therefore, the various interactions and rating relations between contributors in the VGI community can determine contributors' reliability levels. In general, the measures and interactions that affect the reliability level of OSM contributors are:

User's Personal information (Yang et al., 2013)
The personal information of contributors becomes available when they complete their profile at registration time, for example gender, age, cell phone number, address, education level, field of study, and e-mail. Such information may assist in understanding the user's behaviour and skill level (Zhang et al., 2021). Further, fake profiles can be recognized by evaluating this information with machine learning techniques (Xiao et al., 2015). Furthermore, the contributor's profile can be used as the initial value for the reliability level. However, OSM does not disclose information such as education level and e-mail addresses for privacy reasons.

Type of contribution (Fogliaroni et al., 2018)
As mentioned in section 2.1, users make various types of contributions, and each type of activity affects the contributor's reliability level. For example, in creation, the contributor tries to complete the dataset by adding new objects. Consequently, this contribution positively impacts the user's reliability and should be rewarded. When a subsequent contributor modifies a version, it means that this version is not completely correct and requires some editing. As a result, modification lowers the reliability level score of the user who produced the modified version.

The similarity of object versions (Zhou and Zhao, 2016)
The effect of a modification on the reliability level score of the user depends on the amount of change (e.g. in spatial or thematic similarity) made by the following user. For example, a major change leads to a large reduction in the reliability level.

Stability or time duration of each version (Severinsen et al., 2019)
The time between two consecutive versions of a feature also affects the previous user's reliability level. A longer duration can indicate that a number of contributors have seen the latest version during this period and have not modified it, so they implicitly confirm it (Severinsen et al., 2019). This holds only within a specific time span (e.g., two years), as the information becomes outdated after a certain duration (Gusmini et al., 2017).
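One way to turn the duration of a version into a bounded stability measure is sketched below. This is only an illustration under stated assumptions: the linear scaling, the cap of 730 days (standing in for the two-year example), and the name `stability_score` are hypothetical choices, not the formula used in the proposed model.

```python
from datetime import datetime, timedelta

# Assumed cap of roughly two years, after which a longer lifetime
# no longer counts as additional implicit confirmation.
MAX_PERIOD = timedelta(days=730)


def stability_score(created: datetime, replaced: datetime,
                    cap: timedelta = MAX_PERIOD) -> float:
    """Return a value in [0, 1] that grows with the lifetime of a version."""
    duration = replaced - created
    if duration < timedelta(0):
        raise ValueError("replaced must not precede created")
    # Dividing two timedelta objects yields a plain float.
    return min(duration, cap) / cap


one_year = stability_score(datetime(2021, 1, 1), datetime(2022, 1, 1))
three_years = stability_score(datetime(2019, 1, 1), datetime(2022, 1, 1))
print(one_year, three_years)  # 0.5 1.0
```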

The number of modifications after each version of a feature (Keßler and De Groot, 2013)
The number of edits and modifications made to a version after it is created affects the reliability level of its user: a high number of changes has a negative impact (Keßler and De Groot, 2013).

The number of vicinity confirmations (Keßler and De Groot, 2013)
Implicit confirmations from the contributors who collaborate in the neighbourhood of an object positively influence its contributor's reliability level (Keßler and De Groot, 2013).
The suggested algorithm for combining these measures to compute the users' reliability levels is illustrated in Figure 1. Whenever an activity is performed on the OSM dataset, the associated user's reliability level is updated according to this algorithm. In this algorithm, L is a time-ordered list of interactions. R0, Ru, and R'u are the initial, current, and new reliability levels of user u, respectively. The coefficients wc, wv, and wn are the reward for creation, the negative effect of later modifications, and the reward for confirmation by adjacent users, respectively.
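Because Figure 1 is not reproduced in this text, the exact update rule cannot be restated here. The following Python sketch only illustrates how a time-ordered list of interactions could drive per-user updates. The additive update rules, the `Interaction` record, and the scaling of the penalty by a similarity value are all assumptions made for illustration and should not be read as the algorithm of Figure 1. The default weights are the values reported in the implementation section of this study.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    kind: str                # "create", "modify", or "confirm"
    target_user: str         # user whose reliability level is updated
    similarity: float = 1.0  # similarity of old and new version (modify only)


def update_reliability(interactions, r0=0.0, wc=0.1, wv=0.01, wn=0.15):
    """Process a time-ordered list L and return the final level per user.

    Assumed rules (hypothetical): creation adds wc, a later modification
    subtracts wv scaled by the amount of change, a vicinity confirmation
    adds wn.
    """
    levels = {}
    for item in interactions:
        ru = levels.get(item.target_user, r0)   # current level Ru
        if item.kind == "create":
            ru_new = ru + wc
        elif item.kind == "modify":
            ru_new = ru - wv * (1.0 - item.similarity)
        elif item.kind == "confirm":
            ru_new = ru + wn
        else:
            raise ValueError(f"unknown interaction kind: {item.kind}")
        levels[item.target_user] = ru_new       # new level R'u
    return levels


L = [
    Interaction("create", "alice"),
    Interaction("modify", "alice", similarity=0.5),
    Interaction("confirm", "alice"),
    Interaction("create", "bob"),
]
levels = update_reliability(L)
```

In this sketch the penalty falls on the author of the version that was modified, so `target_user` for a "modify" interaction is the previous contributor, not the editor.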

Study Regions and Dataset
All of the activities in the OSM database associated with a user affect that user's reliability level. In addition, users are allowed to collaborate in OSM globally. Thus, the contributor reliability model must consider all changes made to OSM data globally. However, as such data is massive, in this study the effectiveness of the suggested model is evaluated in several regions with different rates of contribution. Therefore, ten cities are chosen from various areas of Iran, selected so as to have diverse populations and built-up area extents at the time of selection. Figure 2 illustrates the location of the ten chosen regions in Iran. Table 1 summarises the cities' names, area extent, population, and population density (i.e. population/area).
The complete OSM history data in .pbf format, with a size of 180 GB, was downloaded from the Planet web page (https://planet.openstreetmap.org) in January 2022. The OSM objects of each selected city were extracted. Each element has several properties, e.g., OSM ID, username, user ID, timestamp, version, and some user-defined tags.
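To make the listed element properties concrete, the sketch below groups the versions of each object from a small history extract. It assumes the extract has been converted to OSM XML, whereas the study itself worked from the .pbf file; the inline sample data, the record keys, and the function name `load_history` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample: two versions of one node by two different users.
SAMPLE = """<osm version="0.6">
  <node id="1" version="1" timestamp="2020-01-01T00:00:00Z" uid="10" user="alice" lat="35.7" lon="51.4">
    <tag k="amenity" v="cafe"/>
  </node>
  <node id="1" version="2" timestamp="2021-06-01T00:00:00Z" uid="20" user="bob" lat="35.7" lon="51.4">
    <tag k="amenity" v="restaurant"/>
  </node>
</osm>"""


def load_history(xml_text):
    """Return {(element type, OSM ID): [records sorted by version]}."""
    history = defaultdict(list)
    for elem in ET.fromstring(xml_text):
        if elem.tag not in ("node", "way", "relation"):
            continue
        record = {
            "osm_id": int(elem.get("id")),
            "version": int(elem.get("version")),
            "timestamp": elem.get("timestamp"),
            "user_id": int(elem.get("uid")),
            "username": elem.get("user"),
            "tags": {t.get("k"): t.get("v") for t in elem.findall("tag")},
        }
        history[(elem.tag, record["osm_id"])].append(record)
    for versions in history.values():
        versions.sort(key=lambda r: r["version"])
    return dict(history)


history = load_history(SAMPLE)
```

Keying on the element type together with the ID matters because nodes, ways, and relations have separate ID spaces in OSM.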
To obtain an overview of the state of the OSM dataset in the target cities, the number of contributions and the number of unique OSM users in each city are shown in Figure 3 and Figure 4, respectively. Although the population of a city may be indicative of the number of contributors, a larger population does not directly imply a greater amount of activity (Mashhadi et al., 2015). Other parameters, such as the extent of the region and the number of active users, indicate the city's contribution rate in addition to the population. Therefore, the number of users per population density can reflect the level of activity of a community within a region (Neis Pascal et al., 2013). In addition, using the population density reduces the effect of the city's area on the results. Figure 5 shows the number of users per population density in each city. This figure indicates that the highest value belongs to Tehran. In general, three different groups can be distinguished:
• The first group involves the cities with the highest contribution rates, like Tehran in this study.
• The second group consists of cities with average rates, such as Esfahan and Tabriz.
• The last group covers all other cities, which have the lowest rates.
In order to avoid repetition and to evaluate the suggested model, three different regions are chosen from the above groups (Tehran, Tabriz, and Zahedan) as experimental data in this study.
Figure 5. The number of users per population density in each city.
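The contribution-rate indicator described above, the number of users divided by population density (population/area), can be computed as in the following sketch. The city names and all numbers here are invented placeholders, not values from Table 1 or Figure 5, and the function name `users_per_density` is hypothetical.

```python
def users_per_density(users: int, population: int, area_km2: float) -> float:
    """Number of users divided by population density (population / area)."""
    density = population / area_km2
    return users / density


# Placeholder figures for illustration only (not the study's data).
cities = {
    "city_a": {"users": 500, "population": 1_000_000, "area_km2": 200.0},
    "city_b": {"users": 50, "population": 500_000, "area_km2": 100.0},
}
rates = {name: users_per_density(**c) for name, c in cities.items()}

# Rank cities from the highest to the lowest contribution rate.
ranking = sorted(rates, key=rates.get, reverse=True)
```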

Implementation details and results
The suggested model can be employed as an online contributor reliability model for VGI projects. In this study, to determine the reliability level of each user, the parameters wc (the reward for creation), wv (the penalty for versioning), and wn (the reward for confirmation from the vicinity) are set to 0.1, 0.01, and 0.15, respectively.
Measuring the similarity of two objects is a common problem in GIS. Several studies on this issue have suggested a variety of methods to calculate spatial or thematic similarity (e.g. Arkin et al., 1991; Masuyama, 2006; Fan et al., 2014). In this paper, existing methods are used to compute the similarity of two consecutive versions to complement the model.
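The text does not state which similarity measures were adopted. As one possible illustration of thematic similarity between two consecutive versions, the sketch below uses the Jaccard index over tag key-value pairs. This measure and the function name `tag_similarity` are assumptions made here, not the method used in the study.

```python
def tag_similarity(old_tags: dict, new_tags: dict) -> float:
    """Jaccard index over (key, value) pairs of two tag dictionaries."""
    old_items = set(old_tags.items())
    new_items = set(new_tags.items())
    union = old_items | new_items
    if not union:
        # Two untagged versions are treated as thematically identical.
        return 1.0
    return len(old_items & new_items) / len(union)


v1 = {"amenity": "cafe", "name": "A"}
v2 = {"amenity": "cafe", "name": "B"}
similarity = tag_similarity(v1, v2)  # one shared pair out of three distinct pairs
```

A renamed object therefore scores lower than an unchanged one, which matches the idea that larger changes by the following user indicate a less accurate previous version.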
The reliability levels of each contributor in the three cities of Tehran, Tabriz, and Zahedan are computed and updated in each iteration based on the proposed model.All contributors receive their reliability level according to the associated interactions.
Finally, to make the results easier to interpret, the reliability level values are normalized to the range [0,1], and the contributors are divided into ten groups of equal width based on their normalized values. Table 2 shows the ten groups and their intervals, together with the proportion of contributions and users in each group in the three study regions. Several observations can be drawn from Table 2. Most of the users (more than 95%) belong to the first group, with a reliability level of 0-0.1. This demonstrates that most users are not very active and make few contributions to the OSM dataset, or are judged negatively by others. On the other hand, only a few users reach high reliability levels in the study regions.
Table 2. The percentage of users and of their contributions in each group for the selected cities.
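The normalization and grouping step can be sketched as follows. Min-max scaling is assumed here because the text only states that values are normalized to [0,1]; the function names and the sample levels are hypothetical.

```python
def normalize(levels: dict) -> dict:
    """Min-max scale raw reliability levels to the range [0, 1]."""
    lo, hi = min(levels.values()), max(levels.values())
    if hi == lo:
        # All users share one level; map them to the lowest value.
        return {user: 0.0 for user in levels}
    return {user: (value - lo) / (hi - lo) for user, value in levels.items()}


def group_of(value: float, n_groups: int = 10) -> int:
    """Map a normalized value to a group 1..n_groups of equal width.

    The maximum value 1.0 is placed in the last group.
    """
    return min(int(value * n_groups), n_groups - 1) + 1


raw = {"u1": -1.0, "u2": 1.0, "u3": 3.0}   # made-up raw levels
scaled = normalize(raw)
groups = {user: group_of(value) for user, value in scaled.items()}
print(groups)  # {'u1': 1, 'u2': 6, 'u3': 10}
```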

Discussion
It can be concluded from the results shown in Table 2 that only a small number of OSM users reach a high reliability level. For example, only one user in Tehran, with username "kiaraSh-Q" and user ID 2693232, reaches the maximum reliability level and belongs to group 10. Furthermore, two contributors, with usernames Ali Behzadian Nejad (ID: 671793) and Kesler (ID: 13908), are in group 9, with the interval 0.8-0.9. These three volunteers contribute nearly 25% of the OSM features in this city.
In Tabriz, group 10, with the highest reliability level, consists of one user with username "Khalil_hz" and user ID 1729935, who contributes nearly 11.4% of the OSM features of Tabriz.
In Zahedan, the contributor Kesler (ID: 13908) is the only one with a high reliability level, contributing nearly 21.8% of all features.
Neis (Neis P, 2015) provides the HDYC webpage (https://hdyc.neis-one.org), which presents detailed and comprehensive information on how a volunteer contributes to OSM on a global scale. Users who are active in a specific region may also be active and contribute in other areas worldwide. The properties of the four users who are very active in the three study areas are taken from the HDYC webpage and listed in Table 3. Furthermore, three users are chosen at random from group one (with the lowest reliability levels), and their properties from this website are also presented in Table 3.
The suggested model is applied to the OSM data of the selected cities, Tehran, Tabriz, and Zahedan, which differ in population, number of users, and area extent. A comparison of the results in these three regions shows that in a region with a small number of users per population density, contributors are unlikely to be found in the various groups with higher reliability levels. This may be due to the lower activity of contributors. For instance, in Zahedan, the contributors fall into only four groups, three of which have a low reliability level. In general, this leads to the conclusion that a higher contribution rate in an area is associated with a larger variety of contributors across the different categories.

CONCLUSION
VGI projects, through the collection and publication of geographical information worldwide, can be a source of valuable and helpful content for various services and applications. However, estimating the quality of this content remains an open research problem. Because of the nature of this data and the restrictions on access to reference datasets, the use of intrinsic indicators is suggested. Assessing the reliability of contributors and identifying their levels can be a critical intrinsic factor for verifying quality. Thus, it is essential to understand how each contributor behaves and receives feedback from others in VGI. The present study first explains the types of contribution process in detail and describes effective user interactions as implicit and indirect evaluation relationships. These relationships can express each user's performance as negative or positive feedback, from which reliability levels can be calculated via the suggested model. The proposed reliability model may lead to new insights into quality assessment. The reliability levels of the users in Tehran, Tabriz, and Zahedan are computed using the suggested model, and the obtained results are discussed. They show that the reliability level of most users (more than 95%) is low. This means that most users are less active or receive negative judgements from others. Furthermore, a few users contribute most of the features and reach higher reliability levels. Overall, this study attempts to propose a comprehensive contributor reliability model that considers most of the temporal and spatial relations in the database and addresses the gaps left by other research.

Figure 1. The algorithm of the suggested contributor's reliability model.

Figure 2. Location of the ten chosen areas in Iran.

Figure 3.

Figure 4. The number of unique users in each city.

Table 1. Population and area information of selected regions extracted from https://www.amar.org.ir/english

The contents of Table 3 show the consistency between the results of the suggested model and the information extracted from the HDYC webpage.

Table 3. Summary of users' properties extracted from the HDYC webpage.