USER EVALUATION OF INTERACTIVE THEMATIC 3D CITY MODELS – APPLICATION OF ASYNCHRONOUS REMOTE TESTING METHOD

Asynchronous remote usability testing is a method based on a software platform used to automatically record test participants' activities when they interact with a given product in their natural environment, for example, at home. This method has been frequently used in previous decades in web design and mobile application development but has rarely been utilised in geovisualization. The importance of remote usability testing increased rapidly in 2021 due to the COVID-19 pandemic. The 3DmoveR (3D Movement and Interaction Recorder) application was used for the asynchronous remote testing presented in this paper. 3DmoveR is a research tool designed for user testing of interactive 3D visualizations in web browsers using open technologies such as PHP, JavaScript, and the Three.js library. This study focuses on an evaluation of interactive 3D city models presenting thematic information expressed by a colour scale. The experiment was designed as a within-subject study consisting of two simple questionnaires, a training task and six experimental trials. The experimental task was to find a building of a given category (depicted by building colour) within an interactive 3D city model. Speed and accuracy of user performance were recorded, as well as user strategy, subjective evaluations, and possible intervening variables. Results were recorded from 110 participants, of which 76 records were valid and analysed further. It can be concluded that the tested colour scale (based on the Energy Performance Certificate) was not entirely appropriate. We further analysed and discussed intervening variables that may affect remote usability testing of 3D visualizations.


INTRODUCTION
Recently, 3D city models have been used for decision-making and communication in a wide range of applications. The use of 3D city models in various areas is described by many authors, including Shiode (2000) and Biljecki et al. (2015). 3D city models can be applied to the analysis of the current situation, the reconstruction of past situations, the prediction of future developments and the choice of multiple options for future development (Konečný, 2011). 3D city models can be used by both experts and the general public (Voženílek, 2001; Biljecki et al., 2015). Regarding the user aspects of 3D city models, the main goal is to allow users to locate and interpret 3D urban geospatial information necessary to make assumptions quickly and easily. The fundamental challenge for the design, development and implementation of 3D visualizations is to avoid complexity, too much detail and overly dense visualizations (Carneiro, 2008). Some applications require the realistic visualization of 3D city models. Such applications are, for example, landscape and urban planning, including the participatory approach (Lovett et al., 2015; Onyimbi et al., 2018; Judge and Harrie, 2020; Jaalama et al., 2021). On the other hand, many applications of 3D city models are related to non-photorealistic visualization:
• modelling and presenting energy demands of buildings (Wendel et al., 2016; Mao et al., 2020),
• analysis of the solar energy potential (Hofierka and Zlocha, 2012; Buyuksalih et al., 2017),
• noise mapping (Law et al., 2011; Herman and Řezník, 2015),
• air pollution visualization (Hudson-Smith and Evans, 2003; San Jose et al., 2012),
• meteorology and research of urban heat islands (Congote et al., 2012; Nakata-Osaki et al., 2015).
In general, these applications usually require the visualization of thematic information within 3D models, so it is not just a matter of presenting the given site as realistically as possible.
As stated by Döllner (2007), non-photorealism provides sufficient means for visual abstraction as a primary technique to effectively communicate complex geospatial information. Non-photorealistic visualization also allows traditional cartographic methods and techniques to be implemented as part of interactive 3D geovisualizations (Döllner, 2007; Jahnke et al., 2009). In summary, for any application of 3D city models to be successful, it must serve the needs of the target user group; therefore, user testing is an important part of developing a successful application. This paper focuses on user testing of interactive 3D city models and describes a pilot user study employing interactive non-photorealistic visualization of thematic information within 3D city models. The remote usability testing approach was chosen for practical reasons. The results are presented with respect to the chosen approach and, therefore, the advantages and limits of remote user testing are also analysed and discussed.

RELATED WORK
Several scientific studies have dealt with the usability of 3D city models, often using static perspective views (e.g., Zanola et al., 2009; Popelka and Doležalová, 2015) or video (e.g., Lokka and Çöltekin, 2017) as the presented media (stimuli). The user aspects of interactive 3D geovisualization have only slowly been attracting attention, and relatively little is known about them. Where user testing dedicated to the evaluation of interactive 3D geovisualizations was performed, it usually took place in laboratory conditions (e.g., Mckenzie and Klippel, 2016; Herman et al., 2021), where the intervening (extraneous or nuisance) variables as well as the performance of the participants can be better controlled. However, the situation concerning the COVID-19 pandemic in 2021 did not favour the implementation of laboratory user experiments and, for this reason, the possibilities of testing outside a controlled environment were analysed.

User testing of 3D city models
The investigated user aspects of 3D city models relate mainly to the purpose of these 3D models. User testing of 3D models is most often related to their use in orientation and navigation. For example, Mckenzie and Klippel (2016) investigated navigation using a virtual environment. Lokka and Çöltekin (2017) also investigated the influence of realism on navigation in a built-up area and found that, of the three variants compared, the most suitable was a 3D model containing realistic (tested) significant landmarks, with the other buildings depicted non-photorealistically. Popelka and Doležalová (2016) also examined non-photorealistic visualization of cities, specifically dealing with static perspective bird's-eye views and their comparison with 2D maps using eye tracking. Recommendations regarding photorealistic visualization were formulated by Gatzidis et al. (2009). They found that non-photorealistic shading and especially expressive rendering could provide more effective visual styles than photorealistic representations of built-up areas.  carried out two partial experiments in VR investigating the role of colour hues and the level of realism. They consider these types of visualizations better compared to monochromatic and symbolized visualizations of 3D models. The second and broader application area related to the presented user testing of 3D city models is urban and regional planning. Zanola et al. (2009) focused on stereoscopic visualization and evaluated the suitability of abstract and realistic rendering styles for urban planning purposes. They found that users subjectively preferred realistic visualization as they considered it the most credible. Rautenbach et al. (2014) compared non-photorealistic 3D visualization with a 2D map and 3D realistic visualization of a city model in the scope of spatial planning in South Africa. The participants were able to solve simple map-reading tasks with 3D visualizations with the same accuracy as with 2D maps. Similarly, Onyimbi et al. (2018) compared 2D maps and plans with realistic 3D models in participatory spatial planning. In their user evaluation, higher accuracy was achieved when working with a 3D model.
Colour is an important component of every visualization, and the same is true for the various applications of 3D visualizations of city models listed in the Introduction section. However, only a few user studies have dealt with colour in a 3D environment. As mentioned above, this issue was partially addressed by , who confirmed that the colour hue of landmarks had a significant effect on the success of navigation within a virtual 3D model, but colour hue did not have any effect on the speed of navigation. Individual colour hues are characterized by similar results related to their memorability. Engel et al. (2013) focused directly on the visualization of thematic data in a 3D model of a city. They compared different colour scales depicting extra-terrestrial insolation (summed up over a whole year) and demonstrated that a diverging colour scale (red-blue) is characterized by fewer errors in value estimations.
For the above-mentioned reasons, this paper deals with user testing of interactive 3D city models. Specifically, it focuses on an evaluation of 3D city models for the purpose of presenting thematic information expressed by a colour scale.

Remote usability testing
Remote usability testing is a method that uses an insight platform to record test participants' activities when they interact with a given product in their natural environment (e.g., at home). On the one hand, remote usability testing has been frequently used in previous decades in web design (e.g., Tullis et al., 2002; Rosenbaum and Kantner, 2008; Chynal and Sobecki, 2014; Sauer et al., 2019) and mobile application evaluation (Takahashi and Nebe, 2019). On the other hand, this approach has only rarely been used in user evaluation of geovisualizations (Ingensand and Golay, 2011; Mendonça and Delazari, 2012). Roth et al. (2017) considered this method a 'potentially fruitful research opportunity', and its importance is rapidly increasing these days (especially in 2021), when there are many problems with conducting user evaluations in laboratory conditions. This paper deals with asynchronous (automated) remote testing. Automated usability testing can record users' interactions and collect users' opinions even from large numbers of participants, but it does not offer insights into the reasons behind the users' decisions (Chynal and Sobecki, 2014). Juřík et al. (2018) described a pilot application of asynchronous usability testing of a simple interactive 3D visualization of a digital terrain model (DTM). The tool supports eliminating specific interferences (e.g., the pop-up menu in the browser, optionally excluding some controls such as the keyboard) and monitoring some intervening variables (e.g., web browser colour depth, screen resolution, the controls used).

MATERIALS AND METHODS
Considering the lack of studies and methodological recommendations related both to usability testing of interactive 3D visualizations depicting thematic information in 3D city models and to asynchronous remote usability testing, an exploratory approach was chosen. The main goal of the research was to examine the correctness and speed of user responses, as well as the strategies used when solving tasks. The second goal was to determine the limitations of asynchronous remote usability testing of interactive 3D visualizations, e.g., the completion rate.

Experimental design
The experiment was designed as a within-subject study. The testing was conducted in April and May 2021; the participants were recruited through Facebook. The test consisted of one training task and six tasks with spatial data presented in a randomized order (see Figure 1). The user study also contained two short questionnaires. The first questionnaire was filled in before the spatial tasks and collected basic demographic data and the users' experience with maps and 3D visualizations. The second questionnaire evaluated the clarity of the task instructions, the loading speed of the 3D models and the colour scales used; participants could also report technical problems and offer additional comments on the 3D visualizations used.

Testing tool
We used 3DmoveR (3D Movement and Interaction Recorder), an application for recording user interactions with 3D geovisualizations. It combines a screen logging approach with an online questionnaire that includes practical spatial tasks. 3DmoveR is implemented using open web technologies (JavaScript, jQuery, WebGL and PHP). All recorded data concerning user interactions and responses were stored on a server and could be easily analysed later. Previously, this tool has been successfully used in several user studies conducted in controlled conditions (Herman and Stachoň, 2016; Hájek et al., 2018; Herman et al., 2018a; Herman et al., 2018b;. Modifications that guaranteed satisfactory operation even for the purposes of asynchronous testing were made during the development of version 2.0 (Herman, 2019); the first attempt at asynchronous remote testing was also made using this version of the tool (Juřík et al., 2018). Further requirements, which resulted in improvements incorporated into version 2.1, included the following:
• a function for randomizing the order of trials to avoid a learning effect,
• identifying the size of the web browser window and determining whether the user used full screen mode,
• recording browser history, which allows checking the correct test procedure,
• monitoring of defined keys, used mainly to identify function keys (e.g., Shift, Ctrl, Escape, F5, F11),
• storing the value of the total number of clicks made while solving the spatial task,
• better arrangement of the CSV (Comma Separated Values) files, i.e., the files in which the recorded data are stored on the server.
The 3DmoveR 2.1 application is freely available to anyone interested under a BSD (Berkeley Software Distribution) license.
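The randomization of trial order mentioned among the version 2.1 improvements can be illustrated with a minimal Python sketch. It is not taken from the 3DmoveR source code (which is built on web technologies); the trial labels A to F, the function name and the idea of seeding the generator with a participant identifier are assumptions made purely for illustration.

```python
import random

# Hypothetical labels for the six experimental trials.
TRIALS = ["A", "B", "C", "D", "E", "F"]


def trial_order(participant_id: str, trials=TRIALS) -> list[str]:
    """Return the training task followed by the trials in shuffled order.

    Seeding the generator with the participant ID (an assumption of this
    sketch) makes the order reproducible when the logs are analysed later.
    """
    rng = random.Random(participant_id)  # independent, seeded generator
    order = list(trials)                 # copy, so the template stays intact
    rng.shuffle(order)                   # in-place random permutation
    return ["training"] + order
```

The training task always comes first, so only the experimental trials are permuted; this is what counteracts a learning effect across participants.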

Task and stimuli
In all six trials, the participants were asked to explore an interactive 3D city model and find a building of a given category based on the Energy Performance Certificate (EPC). Categories of buildings are represented by the colours used in the scope of the EPC (Decree No. 78/2013 Coll). The participant marked the building found by clicking once an arrow (blue cone) was placed on the selected structure. This task was designed to have just one correct solution; there was always exactly one building of the given category in the 3D model. The colour scale had seven levels according to the EPC scheme (Figure 2); one category was searched for, and the remaining six categories were randomly assigned to all other buildings in the 3D model. All trials were fully interactive, and participants could move freely using the 'orbit' scheme of movement, which consists of three types of movement: drag (rotation), pan and zoom. Dragging rotates the view around the point of interest. Panning moves the view up, down, left, and right. Zooming moves the virtual camera forwards and backwards. The 3D models were of the same size (920 × 615 m; area of approximately 57 hectares) and consisted of a DTM covered by an orthophoto as a texture and 3D models of buildings in level of detail 2. The following spatial data were used to create the 3D models: • The 3D models were prepared using open-source software, namely QGIS 3.16 with the Qgis2threejs plug-in (Figure 3).

Figure 3. Preparation of a 3D model using QGIS 3.16 and Qgis2threejs plug-in.
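The 'orbit' scheme of movement described in this section (drag, pan and zoom around a point of interest) can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the implementation used in 3DmoveR or Three.js: the class name, the default values, the clamping limits and the simplification of panning to a shift in world coordinates are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class OrbitCamera:
    """Camera orbiting a point of interest (all names are illustrative)."""
    target: tuple[float, float, float] = (0.0, 0.0, 0.0)
    distance: float = 500.0          # metres from the point of interest
    azimuth: float = 0.0             # radians, rotation around the vertical axis
    elevation: float = math.pi / 4   # radians above the horizontal plane

    def rotate(self, d_azimuth: float, d_elevation: float) -> None:
        """Drag: rotate the view around the point of interest."""
        self.azimuth = (self.azimuth + d_azimuth) % (2 * math.pi)
        # Clamp so the camera never flips over the top or dips below ground.
        self.elevation = min(max(self.elevation + d_elevation, 0.01),
                             math.pi / 2 - 0.01)

    def pan(self, dx: float, dy: float, dz: float = 0.0) -> None:
        """Pan: shift the point of interest, and the camera with it."""
        x, y, z = self.target
        self.target = (x + dx, y + dy, z + dz)

    def zoom(self, factor: float) -> None:
        """Zoom: move the camera towards (factor < 1) or away from the target."""
        self.distance = max(self.distance * factor, 1.0)

    def position(self) -> tuple[float, float, float]:
        """Camera position in world coordinates (z axis points up)."""
        x, y, z = self.target
        horizontal = self.distance * math.cos(self.elevation)
        return (
            x + horizontal * math.cos(self.azimuth),
            y + horizontal * math.sin(self.azimuth),
            z + self.distance * math.sin(self.elevation),
        )
```

Rotation and zoom leave the point of interest unchanged, whereas panning moves it; logging the resulting camera positions over time is one way a virtual trajectory could be reconstructed.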

Analysis
The correctness of user responses (effectiveness), their speed (efficiency), the strategies used, intervening variables and subjective evaluations were analysed. Participants were both actively participating subjects (effectiveness and subjective evaluations) and targets of observation through user logging (efficiency, strategies used and intervening variables). Similar to the study by Herman et al. (2021), we applied two methods to analyse user strategies when working with interactive 3D city models:
• Interactive activity data obtained directly from user logs, especially:
o Sum of mouse clicks: the total number of mouse clicks during task solution.
o Length of virtual trajectory: the overall length of the movement trajectory travelled during task solution (metres).
• Interactive activity from the user logs divided by the length of time taken to solve the task:
o Mouse clicks per second (number of mouse clicks per second).
o Average speed of virtual movement (metres per second).
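The four measures listed above can be expressed as a short Python sketch. The log format is an assumption: the function expects a list of logged camera positions in metres and a list of click timestamps, which may differ from the actual structure of the CSV files produced by 3DmoveR, and the function and key names are hypothetical.

```python
import math


def strategy_metrics(camera_positions, click_times, duration_s):
    """Compute the four interaction measures for one trial.

    camera_positions: list of (x, y, z) camera coordinates in metres,
        in the order they were logged (hypothetical log format).
    click_times: timestamps of mouse clicks (only their count is used).
    duration_s: time spent solving the task, in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # Trajectory length = sum of straight-line distances between samples.
    trajectory_m = sum(
        math.dist(p, q) for p, q in zip(camera_positions, camera_positions[1:])
    )
    clicks = len(click_times)
    return {
        "clicks": clicks,                           # sum of mouse clicks
        "trajectory_m": trajectory_m,               # length of virtual trajectory
        "clicks_per_s": clicks / duration_s,        # mouse clicks per second
        "speed_m_per_s": trajectory_m / duration_s, # average speed of movement
    }
```

The absolute measures describe how much a participant interacted, while the time-normalised measures describe how intensively; a long trial can therefore combine a high click count with a low click rate.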
Intervening variables for which it was possible to record specific characteristics were collected automatically (see section 3.2). These were the operating system, type of web browser, screen resolution, colour depth, web browser window size and geospatial data loading time. Subjective evaluations were collected using Likert scales and optional textual comments on the clarity of the assignment, the speed of 3D data loading, and the colour scale used.

Participants
Of a total of 110 attempts to complete the test, 27 were not finished (most often, the participants stopped participating after the training task). In six other cases, the participants repeatedly completed one of the trials (they returned to it in the browser history). These records are not considered valid, so they were excluded from the analysis. Another participant was excluded because (s)he reported a colour vision deficiency. Therefore, data from a total of 76 participants were used for further analysis, the results of which are described and interpreted below. There were 20 women and 56 men among the participants, aged between 16 and 69 years (median = 29, mean = 29.9, SD = 7.6). A vast majority of the participants reported that they worked with a PC daily (88.2%); the rest stated that they worked with a PC regularly. Participants stated that they worked with maps on a daily basis (48.7%), regularly (42.1%) or occasionally (9.2%). There was a distinctive variation in answers regarding their experience with 3D models and visualizations (Figure 4). Most participants had a background in the field of 'geosciences' (93.4%), which was a consequence of the fact that the request to complete the test was spread through the web and via social media, especially in the geosciences community.
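The exclusion steps reported above can be tallied with a small Python sketch; the function and key names are hypothetical, and only the counts come from the text.

```python
def participant_flow(attempts, unfinished, repeated_trials, colour_deficiency):
    """Tally exclusions and express them as shares of all attempts."""
    completed = attempts - unfinished
    analysed = completed - repeated_trials - colour_deficiency
    return {
        "completed": completed,
        "analysed": analysed,
        "completion_rate": completed / attempts,
        "analysed_rate": analysed / attempts,
    }


flow = participant_flow(attempts=110, unfinished=27,
                        repeated_trials=6, colour_deficiency=1)
print(flow["completed"], flow["analysed"])      # 83 76
print(round(100 * flow["completion_rate"], 1))  # 75.5
print(round(100 * flow["analysed_rate"], 1))    # 69.1
```

In other words, 83 of the 110 attempts were finished, and 76 records (about 69% of all attempts) remained for analysis.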

Effectiveness and efficiency
Differences in the correctness of user responses were observed between individual trials (Table 1). The lowest accuracy of answers was detected in trials A and D; in both cases a dark green building was the one to be found. Like effectiveness, efficiency was also lowest in trials A and D, because it took the participants the longest time to solve them (Figure 5). Trial D was particularly difficult in this regard.

User strategies
The length of the trajectory of movement in the virtual environment and the average speed of virtual movement are depicted in Figure 6. The numbers of mouse clicks (both absolute and relative to time) are presented in Figure 7. In terms of user strategy, trial D differed the most: it showed the longest virtual trajectories and the highest number of clicks. On the other hand, its number of clicks per second was the lowest, and its average movement speed in the virtual environment was also low (although not very different from the other trials).

Intervening variables
Most participants worked with the Windows operating system (72 participants, 95%), two participants used Linux and two were macOS users. Regarding the web browser used, most participants worked in Google Chrome (a total of 60 participants, i.e., 79%; version 90.0 was used by 56 participants and version 89.0 by the remaining four). Chrome was used in combination with the Windows operating system, as well as Linux and macOS. The second most used browser was Mozilla Firefox (11 participants; nine of them used version 88.0, the remaining two used version 87.0). Three participants used Microsoft Edge and two used the Opera web browser. Screen resolutions varied considerably, but the usual screen aspect ratios were 16:9 (61 participants; 80%) and 8:5 (12 participants; 16%). All participants used a display with a 24-bit colour depth. Seventeen participants worked in full-screen mode, which was requested at the beginning of the test. Another 56 used at least a web browser window expanded to the full width of their computer screen. We also analysed the duration of 3D geospatial data loading, the results of which are presented in Figure 9. These times were subtracted from the total time it took to solve the trials, so that only the time when the given 3D scene was actually displayed is included in the solution duration. Loading times are mainly affected by the speed of the Internet connection; in this study, this effect should be minimized by the fact that individual trials are only compared among themselves. If the individual participants were compared with each other, efficiency would have to be analysed even more cautiously.
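Two of the processing steps described above, deriving the aspect ratio from a recorded screen resolution and subtracting the loading time from the total trial time, can be sketched in Python as follows. The function names are hypothetical and the sketch does not reproduce the actual analysis scripts.

```python
from math import gcd


def aspect_ratio(width_px: int, height_px: int) -> str:
    """Reduce a screen resolution to its simplest width:height ratio."""
    divisor = gcd(width_px, height_px)
    return f"{width_px // divisor}:{height_px // divisor}"


def net_solution_time(total_s: float, loading_s: float) -> float:
    """Time during which the 3D scene was actually displayed."""
    if loading_s > total_s:
        raise ValueError("loading time cannot exceed total time")
    return total_s - loading_s


print(aspect_ratio(1920, 1080))  # 16:9
print(aspect_ratio(1920, 1200))  # 8:5
```

Note that an exact reduction is strict: a resolution such as 1366 × 768 reduces to 683:384 rather than 16:9, so grouping real-world resolutions into nominal aspect ratios would need a tolerance.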

Subjective evaluation
Approximately 30% of the participants considered the colour scale used to be inappropriate or rather inappropriate (Figure 8). In the textual comments, the participants most often stated that the green buildings were difficult to distinguish. In some cases, the loading of 3D geospatial data was not fast enough for users.

DISCUSSION
To summarise the results obtained, the tested colour scale does not seem suitable for 3D visualization, especially the green half of the scale. The tested colour scale was a diverging one (from green, through yellow to red). Similarly, Engel et al. (2013) tested, among others, a diverging colour scale and evaluated it as the most appropriate. However, their research differed in the type of spatial task. The tasks in the present study consisted of a search for a building of a given colour, so it was necessary to distinguish between the individual categories (colour classes). If a colour scale is to be used in 3D geovisualizations for a similar task, it would be better to use fewer classes with a larger colour distance between them. In general, colour scales within 3D geovisualization should be carefully selected and tested, because this study, as well as previous research , Engel et al., 2013, demonstrate that colour is of great importance in 3D geovisualization, including interactive 3D urban models. We calculated the colour differences (ΔE*) between the searched colour classes and the nearest colours in the evaluated colour scale. This colour difference was the smallest in the case of green (ΔE* = 11.7), larger in the case of red (31.8) and the largest in the case of yellow (42.3 and 48.8). Another reason might be that approximately 8% of males and 0.5% of females among people with Northern European ancestry have red-green colour blindness (Deeb, 2005). Although participants who reported a colour vision deficiency were excluded from the analysis, some of the remaining participants may be unaware of having this deficiency.
The second goal was to identify the limitations and other practical aspects of asynchronous remote usability testing. Regarding the software platform, the 3DmoveR 2.1 open web-based tool was successfully applied.
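A colour difference of the kind reported in the discussion above can be computed as in the following Python sketch. The text does not state which ΔE* formula was used; the sketch assumes the CIE76 definition (Euclidean distance in CIE L*a*b*), 8-bit sRGB input and a D65 reference white. The function names are hypothetical, and no actual EPC colour values are reproduced here.

```python
import math


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB colour to CIE L*a*b* (D65 white point)."""
    # 1. Undo the sRGB gamma encoding.
    def linear(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linear(c) for c in rgb)
    # 2. Linear RGB -> CIE XYZ (sRGB primaries, D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    # 3. XYZ -> L*a*b*, normalised by the D65 reference white.
    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e76(rgb1, rgb2):
    """CIE76 colour difference: Euclidean distance in L*a*b* (assumed formula)."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))
```

Applying `delta_e76` to each searched class and its nearest neighbour in a scale gives one number per class; the smaller the value, the harder the two colours are to tell apart. Newer formulas such as CIEDE2000 would give different values.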
There are two basic approaches to dealing with intervening (nuisance) variables that can influence the participant during completion of the test and affect his/her performance. The first approach is to eliminate them: in 3DmoveR, it is possible to disable some functions, such as pop-up menus in the browser, or to disable some controls, such as the keyboard in this particular experiment. The second approach, which was also employed, is to record additional information regarding the completion of the test. For this reason, the resolution of the display, its colour depth and the size of the web browser window were recorded. Additional information was derived from a combination of several pieces of recorded data; for example, the fact that a user completed some trials multiple times was recognized based on the recorded browser history.
The completion rate was 75%, while an additional 5% of records had to be excluded due to other problems in the recorded data (as mentioned above). No relations were identified between the completion rate and the reported demographic data, previous experience or the automatically collected data on the intervening variables. The second questionnaire was included at the end of the test (as debriefing). Most of the participants who did not complete the test stopped after the training task, so they were not able to complete this second questionnaire. The completion rate was thus lower compared to the study by Juřík et al. (2018), where 85% of participants completed the test. That study also covered interactive 3D geovisualization but was considerably shorter (only two interactive trials). It follows that when designing remote user testing, it is necessary to consider the length of the test (the number of tasks and trials). There are also other related issues, such as investigating and improving participants' motivation, as described outside the field of cartography and geoinformatics by, for example, Rosenbaum and Kantner (2008) or Chynal and Sobecki (2014). As for the ethical issues of remote usability testing, there is no comprehensive set of recommendations. The essential requirement for ethical testing is to inform participants that they are involved in usability testing. Current technologies make it possible to collect a range of data without users knowing. In general, during remote usability testing, the researcher should respect the privacy, confidentiality, and anonymity of participants.

CONCLUSIONS AND FUTURE WORK
In the present study, the colour scale for the visualization of the energy performance of buildings in the scope of 3D city models was evaluated in a remote usability experiment. A diverging colour scale ranging from green through yellow to red, as defined in the Energy Performance Certificate (EPC), was tested. The green part of the colour scale in particular was demonstrably not suitable for use in 3D city models. Participants had difficulty identifying 3D buildings coloured green. This user study also served to validate and further explore the possibilities of remote user testing of interactive 3D geovisualizations. One hundred and ten candidates were reached, but only 76 of them correctly completed the entire test. Therefore, in the case of remote usability testing, it is necessary to consider that approximately 30% of possible participants will not complete the test or will not complete it correctly.
The chosen technological solution (3DmoveR 2.1) allows two basic approaches to solving the problem of intervening variables that occur in remote usability testing and may affect the participants and their performance. The technological solution can be further extended and modified in the future. The following list of functions suitable for extending the presented approach (and remote user testing in general) was identified:
• Sound recording (Web Real-Time Communication, WebRTC; getUserMedia Application Programming Interface, API; Web Audio API),
• Video recording (WebRTC, getUserMedia API),
• Screen capture (Screen Capture API),
• Eye tracking integration (e.g., Webgazer.js, GazeCloudAPI.js, TurkerGaze),
• Adaptation for mobile devices:
o Responsive design (Bootstrap),
o Integration with Geolocation API (Global Navigation Satellite System) and Orientation API (accelerometer),
• Integration with WebVR and available head-mounted displays, especially low-cost ones (A-Frame).
Note that these functionalities have potential applications in general remote usability testing, not only in the field of cartography. Examples of these technologies (APIs, libraries) show that these functions can be implemented within a web browser. Appropriate integration into existing testing tools such as 3DmoveR remains an outstanding issue, however. Another topic for future research includes non-technological aspects of usability experiments. A combination of the described technological solution with synchronous (moderated) user testing procedures remains an open issue. Another problem consists in investigating user motivation to increase the completion rate and/or applying certain 'gamification' principles for this purpose.