WHERE ARE WE NOW ON THE ROAD TO 4D URBAN HISTORY RESEARCH AND DISCOVERY?

From 2016 to 2021, the research group HistStadt4D investigated and developed methods and technologies to transfer extensive repositories of historical photographs and their contextual information into a 3D spatial model with an additional temporal component. The aim was to make this content accessible to researchers and the public via a 4D browser and a location-dependent augmented reality representation. Against this background, we present in this article the achievements of the project, the lessons learned, and the current state of 4D urban history research and discovery based on historical photographs.


INTRODUCTION
Imagine you are exploring the historic center of a city with its impressive town houses, churches, and monuments. What if you could simply use your mobile device to find out about the historic buildings around you, with detailed visual information about how they were built and the story behind them, making history come alive before your eyes? Photographs and plans are an essential source for historical research (Burke, 2003; Paul, 2006; Pérez-Gómez and Pelletier, 1997; Wohlfeil, 1986) and key objects in the digital humanities (Kwastek, 2014). Numerous digital image archives containing vast numbers of photographs have been set up in the context of digitization projects. However, these extensive repositories of image media are still difficult to search: it is not easy to identify sources relevant for research, to analyze and contextualize them, or to compare them with the historical original. The research group HistStadt4D, funded by the German Federal Ministry of Education and Research (BMBF), has been investigating and developing methods and technologies to address these problems. Historical media and their contextual information are transferred into a 4D model (three spatial dimensions plus time) to support research and education on urban history. The content is made accessible in two ways: via a 4D browser and via a location-dependent augmented reality (AR) representation. The database consists of about 230 000 digitized historical photographs and plans of Dresden from the Deutsche Fotothek (German photographic collection). In previous articles, we outlined the envisaged research agenda as well as technological avenues (Niebling et al., 2018a; Niebling et al., 2018b), and the information habits and requirements of art and architectural historians as users of scholarly image repositories (Friedrichs et al., 2018; Münster et al., 2018). The main purpose of this article is to outline (1) the current state of research, (2) lessons learned and their implications, and (3) the questions and research issues left open.

USER RESEARCH
After Drucker (2013) opened up the debate by distinguishing between digitized art history (which uses digital resources) and digital art history (which focuses on the use and potentials of digital technology), the topic has remained fiercely debated (Bishop, 2018; Rodriguez-Ortega, 2019). Nevertheless, digital tools and methods have supposedly arrived in art history research and education (Baca et al., 2019). An important task of the HistStadt4D project was to investigate how art and architectural scholars conduct research and how current repositories of historical images are used within this process. In the following, we outline two studies on (a) the information behavior of art and architectural history students (Kröber, 2021) and (b) image repositories and how well they match the requirements of art and architectural scholars.

Information behavior of scholars in art and architectural history studies
This investigation is part of a study on art history students' information behavior in connection with digital repositories. A qualitative approach was chosen to generate hypotheses, because the topic is very broad and connected to many complex aspects. The findings are derived from three focus group interviews conducted in September 2016, June 2019, and January 2020, comprising 25 interviews in total with bachelor's and master's students from the universities of Dresden and Würzburg. The relevant interview questions were:

• How can the search for information and images be improved?
• Do you miss any functionalities for repositories or software solutions for data analysis during your research process?
Thematic analysis was applied to identify and report themes within the data (Braun and Clarke, 2006). Transcriptions of the audio recordings were analyzed with MaxQDA (Kuckartz and Rädiker, 2019). When asked for ideas and needs for their research, the students found it difficult to imagine possible solutions. However, they had encountered some applications during lectures and internships, as well as through other applications and platforms. The use of map applications during an art history lecture was mentioned: Google Maps, Google Street View, and Google Earth were used for lack of alternatives. In connection with architecture, these were applied to get an overview of an area, to determine the surroundings such as the street situation, and to estimate distances between objects. Older maps were used to spot changes. The project connected to this study made the students aware of the option to display geolocated information and images on a map. Accordingly, they mentioned that a geolocated presentation supports data overviews and browsing and helps to create related visualizations.
The students are mostly familiar with 3D environments from video games. They see potential in 3D environments and objects for examining details and surroundings more closely. Being able to alter the point of view makes it possible to change and comprehend viewing angles, which may help to determine certain proportions and perceptions of buildings. The reconstruction of building stages was also named as an opportunity to visualize changes or validate hypotheses. Since research objects sometimes cannot be investigated directly, the 3D digitization of sculptures was mentioned. This allows users to look at a sculpture from every angle and also to derive the views needed for an argument. On the one hand, the students want simple techniques to detect image axes or to highlight aspects of an image. On the other hand, they would appreciate feature matching to compare images and detect changes, or to analyze whether the attributes of an image resemble those of another art piece. Some went so far as to suggest artificial intelligence for the automated recognition of figures through their attributes. Data storage is a topic of little interest to the students. They might adopt an easily accessible and usable database, but what they really want are options to automatically name downloaded files, including the reference. This was especially emphasized for images captured from videos, where timestamps are needed.
The study shows that students are more familiar with digitized art history than with the analytic use of digital methods. Indeed, the students' ideas for technology are not new to the field: notable advances are currently being made in 3D modeling, machine learning (Hatchwell et al., 2019), spatial analysis, and network analysis (Jaskot, 2019). That the students were not aware of these developments suggests that current technologies in the field should be given more room in their education. Institutions such as libraries, museums, and repositories should also consider adopting these features, for instance by hosting 3D models for research or incorporating map applications to geolocate and contextualize information.

Figure 1. Countries of origin of investigated image repositories
A second study was based on a structured qualitative review of 107 online image repositories, conducted from 2016 to 2017 (cf. Fig. 1). The guiding question was: What are the positive and negative implications of our investigations for repository design?
The findings of this study show that, especially during the last decade, metadata quality has improved considerably, facilitating research in art, architectural, and cultural history. This has partly overcome the problem of incorrect or incomplete information associated with images noted by Beaudoin (2009, p. 298). Despite these efforts, vast numbers of images are still insufficiently tagged, indexed, or linked. Many of the issues identified in studies a decade ago are still valid, such as the quality of metadata, the resolution of the images, and the indication of usage rights. Nevertheless, strong progress can be traced, especially in scholarly image libraries. Crowdsourced tagging initiatives can rapidly lead to relatively high-quality results (Nowak and Rüger, 2010). This promising approach is still used to tag art history repositories (Wieser, 2014), even though the quality of the tagging results and how well they match users' search strategies remain to be proven. The study has implications for improving the usability of image repositories. Since digital repositories no longer serve expert users only, gathering user feedback on aspects of interface design will not only benefit scholars and earn their trust but also increase the longevity of these resources. For example, digital resources should be designed based on user needs and facilitate intuitive interaction with information. Techniques of user experience design and assessment can serve as methods to support this aim (Nielsen, 1993; Rubin and Chisnell, 2008). Efficient filtering tools and a straightforward "Google-like" keyword search have been widely introduced by other image libraries in recent years. Although their quality varies considerably, they allow art and architectural historians to sharpen their focus and retrieve more precise results. Consequently, scholars have shifted their information seeking from scrolling through results to narrowing them down by filtering.
Finally, when tool designers consider specific characteristics of art historical research, such as its creative relationship with information, this can positively affect the whole research lifecycle. The most critical challenges for image users are the lack of accessibility and availability of high-quality content (Beaudoin, 2009, p. 294). Even though portals such as Europeana (European Union, 2008) strive to make several data collections available via a single user interface, these efforts cover only a minority of the data. Despite many attempts to increase the number of high-quality images available online, including massive digitization campaigns, art historians still have limited access to digital resources containing primary material, and to good-quality open-access visual information that is digitized and presented according to their preferences and needs. Certain areas of art history that are subject to little research, such as digital art history and non-Western art, face greater difficulties in this regard. Understanding the different needs of scholars in such areas can help to build appropriate digital resources.

THE DESIGN PERSPECTIVE
A related aim of this research was to investigate the perception of virtually represented building structures in order to create research-aware 3D visualizations. In the following, we outline two studies on (a) factors influencing perception and (b) different visualizations of ruins as 3D models.

What factors influence perception?
The design of 3D representations is frequently researched, and various influencing factors have been identified and tested:
• Much research addresses the visualization of different degrees of certainty. Current approaches can be roughly divided into enrichment of representations by explanatory elements (e.g. Dudek et al., 2015) and adaptation of representation quality, such as level of detail or visual styling (e.g. Apollonio, 2016; Lengyel and Toulouse, 2011; Glaser et al., 2017).
• Scaling has frequently been assessed as an important parameter for perceiving architecture (Glaser et al., 2017; Yaneva, 2005).
• Visual acuity, i.e. the ability to distinguish details, is closely related to scale, but unlike scale, one of its main influencing factors is the distance to the object (e.g. Polig et al., 2021).
• Perspective depiction and perception of architecture include the viewer effect of different fields of view (e.g. Paliou, 2018).
• The relevance of lighting for the investigation of historical architecture is well known (Mondini and Ivanovici, 2014). For virtual settings, specific workflows and approaches based on visual comparisons have been proposed (e.g. Noback and Wittkopf, 2014).
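To make the relation between detail size, viewing distance, and visual acuity more concrete, the following minimal sketch computes the visual angle that a detail subtends and compares it with an assumed acuity threshold of one arcminute, a common textbook value for normal vision. The threshold and all function names are illustrative assumptions, not values or methods taken from the cited studies.

```python
import math

# Assumed resolution limit of normal vision: about one arcminute (1/60 degree).
ACUITY_ARCMIN = 1.0


def visual_angle_arcmin(size_m: float, distance_m: float) -> float:
    """Angle (in arcminutes) subtended by a detail of size_m seen from distance_m."""
    angle_rad = 2.0 * math.atan(size_m / (2.0 * distance_m))
    return math.degrees(angle_rad) * 60.0


def is_resolvable(size_m: float, distance_m: float,
                  acuity_arcmin: float = ACUITY_ARCMIN) -> bool:
    """True if the detail subtends at least the assumed acuity threshold."""
    return visual_angle_arcmin(size_m, distance_m) >= acuity_arcmin


def max_viewing_distance(size_m: float, acuity_arcmin: float = ACUITY_ARCMIN) -> float:
    """Largest distance at which a detail of size_m still reaches the threshold."""
    half_angle_rad = math.radians(acuity_arcmin / 60.0) / 2.0
    return size_m / (2.0 * math.tan(half_angle_rad))
```

Under these assumptions, a 5 cm ornament subtends about 1.72 arcminutes at 100 m but only about 0.86 arcminutes at 200 m, so it stops being resolvable at roughly 172 m. The same detail can thus be perceptible or not purely as a function of viewing distance, independently of the model's scale.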
In four studies, we empirically investigated how virtually represented architecture is perceived (Münster, 2018). The studies involved approximately 90 participants in total and used methods from user experience testing. The research questions were:
• How much detail is needed to recognize virtual architecture?
• How well can building properties be estimated in virtual settings in comparison to real-world settings?
• How are aesthetics perceived in virtual settings with regard to the form of presentation?
• How well can mistakes be found in a visualization?
Our findings concerning property estimation and recognizability, as well as studies from other fields of visualization (Burri and Dumit, 2008), lead to the assumption that the perception of virtual objects is strongly influenced by the surroundings shown and by the visual framing. Therefore, projects aiming to improve the estimation and recognizability of objects may do better to focus on modeling surrounding objects than on a higher level of detail. In the context of architectural visualization, viewer effects have rarely been tested empirically, although they matter: the abstraction of colors, for instance, directly influences the assessment of perceived aesthetic qualities. Especially for visual assessment and analysis, it seems important to point out this possible bias. Current 3D modeling projects tend to model objects as accurately as possible, but whether observers actually notice subtle changes is still under investigation. Another finding is that geometric and radiometric errors in the digital visualizations were poorly recognized by both experts and laypersons. There were also major differences within both groups in naming false-positive errors. Both findings led to the hypothesis that there is no common perceptual strategy, even among architectural historians; this will be tested in a future study.

How to design 3D representations
While working with 3D models, we conducted a user study with art history students of the Universität Würzburg and the Technische Universität Dresden to gather information about different visualizations of ruins as 3D models and their impact on the viewer (Fig. 2). This was researched in a qualitative study involving feedback from 10 scholars (Kröber et al., 2020; Messemer and Clados, 2020). The user study led to a deeper understanding of the visualizations of 3D models of ruins and their impact on the target group. It showed that the scholars considered different visualizations suitable depending on the context in which they were to be used. Nevertheless, one of the visualizations was favored. A broader study with more participants will be needed to verify this and to determine which visualization is suited to which context, such as a museum, a research paper, or education.

THE 4D MODELING DIMENSION
In the 4D modeling dimension, the main question was how best to integrate various historical sources into spatial environments. Specifically, this question was considered in a geodetic and photogrammetric context, with the aim of reaching a geometric and mathematical "truth" with appropriate probability and accuracy measures. This led to the following questions:
• Is it possible to use content-based image retrieval (CBIR) on historical data and find similar perspectives of one building?
• Is it possible to determine camera calibration parameters for historical images and allow positioning in 3D space with six degrees of freedom (6DOF)?
• Is it possible to generate historical 3D models using Structure-from-Motion (SfM)?
The research mainly focused on historical photographs, because these are geometrically correct in a photogrammetric sense; drawings and paintings were therefore mostly neglected. At the beginning of the project, we tested whether proprietary SfM software is able to orient a large number of historical images and create dense point clouds fully automatically. Agisoft Metashape was only capable of orienting a small number of historical images from a small dataset, but the image orientation could be slightly improved using contemporary data. Because of radiometric and geometric differences between historical image pairs, we identified the detection of distinctive features and their matching as the main problem. Hence, we developed a method that uses quadrilaterals as geometric features, exclusively for urban terrestrial images. Later, this method was outperformed by more general feature matching methods, especially the combination of the Radiation-variation Insensitive Feature Transform (RIFT) (Li et al., 2018) and Matching On Demand with view Synthesis (MODS) (Mishkin et al., 2015), also in comparison with other methods (Maiwald et al., 2019a; Maiwald et al., 2019b).
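Why matching is the bottleneck can be illustrated with a generic baseline: nearest-neighbor descriptor matching filtered by Lowe's ratio test and a mutual-consistency check. The following is a minimal sketch in plain Python, assuming descriptors have already been extracted as equal-length float vectors. It is not RIFT, MODS, or the quadrilateral method, which rely on their own detectors and descriptors, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def _dist(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _two_nearest(query: Descriptor, candidates: List[Descriptor]) -> Tuple[int, float, float]:
    """Index and distance of the nearest candidate, plus the second-nearest distance."""
    best_i, best_d, second_d = -1, math.inf, math.inf
    for i, cand in enumerate(candidates):
        d = _dist(query, cand)
        if d < best_d:
            best_i, best_d, second_d = i, d, best_d
        elif d < second_d:
            second_d = d
    return best_i, best_d, second_d


def match_descriptors(desc_a: List[Descriptor], desc_b: List[Descriptor],
                      ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) that pass the ratio test and are mutual nearest neighbors."""
    matches = []
    for i, da in enumerate(desc_a):
        j, d1, d2 = _two_nearest(da, desc_b)
        # Ratio test: reject ambiguous matches whose best and second-best distances are similar.
        if j < 0 or d1 >= ratio * d2:
            continue
        # Mutual check: i must in turn be the nearest neighbor of j in desc_a.
        back_i, _, _ = _two_nearest(desc_b[j], desc_a)
        if back_i == i:
            matches.append((i, j))
    return matches
```

With historical image pairs, strong radiometric and geometric differences make best and second-best distances similar for many descriptors, so a test like this rejects most candidates. This is one intuition for why more invariant descriptors and learned features pay off.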
One persistent problem was that no historical dataset was available for testing feature detection and feature matching methods. We therefore created our own dataset and made it publicly available: feature points were determined interactively and used to calculate the relative orientation of historical image triples based on the properties of the trifocal tensor (Maiwald, 2019). Using deep learning technologies such as convolutional neural networks for feature detection and feature matching significantly increased the number of orientable images and improved both model quality and accuracy (Maiwald and Maas, 2021). This approach also enables integration into SfM software tools that use bundle adjustment to orient a large number of historical images. The plan is to import the derived (scaleless) image orientations into the Web and AR applications (Fig. 3) using a Helmert transformation, enabling texture projection. Up to this point, the historical images were mainly selected interactively, without a CBIR approach, but ongoing research shows that CBIR is also possible for historical image data. A fully automatic pipeline from image selection up to the calculation of image orientations has now been realized and will be published in the near future. From this research, we draw the following conclusions:
• Orienting a large amount of historical image data requires well-developed feature detection and matching methods, and neural networks can significantly improve the results.
• To the best of our knowledge, no historical image dataset is usable for training a neural network.
• CBIR can be used on exclusively historical image data and can improve the process of image selection.
• Contemporary images can improve the accuracy of reconstructed scenes and are useful for filling gaps where no historical data is available.
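A Helmert (similarity) transformation maps the scaleless, arbitrarily placed SfM coordinate frame into the world frame of the 3D city model using one scale factor, a 3D rotation, and a translation (seven parameters). Below is a minimal sketch of estimating these parameters from at least three non-collinear corresponding points with the well-known closed-form, SVD-based least-squares solution. The use of NumPy, the point-correspondence input, and the function names are our assumptions, not the project's implementation.

```python
import numpy as np


def estimate_helmert(src, dst):
    """Estimate scale s, rotation R, translation t such that dst ~ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding points, N >= 3, not collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    a, b = src - mu_src, dst - mu_dst
    # Cross-covariance between the centered point sets.
    cov = b.T @ a / len(src)
    U, S, Vt = np.linalg.svd(cov)
    # Guard against a reflection (determinant -1) in the least-squares rotation.
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0
    R = U @ D @ Vt
    var_src = (a ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


def apply_helmert(s, R, t, points):
    """Transform (N, 3) points (e.g. SfM camera centers) into the target frame."""
    return s * np.asarray(points, dtype=float) @ R.T + t
```

In practice, the correspondences could be, for example, SfM points identified on the city model. The estimated rotation would then also be applied to each camera's orientation before projecting the photograph as a texture.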

THE 4D BROWSER
A central part of the project is the 4D browser application, which connects the disciplines. While target user requirements and usability aspects shape the frontend user interface, the results of automated image processing are intended to feed the backend database of spatialized data that is also used for the AR applications. However, the aim of the project was not to develop a feature-complete application, but a demonstrator to prove concepts. The basic idea of the frontend application is to extend conventional search interfaces (e.g., search bar, faceted search) with a 3D viewport that enables users to search spatially for photographs within a virtual 3D city model (Fig. 3) (Bruschke et al., 2018). Precise knowledge of metadata would then no longer be required to find photographs of interest. However, since the images and metadata retrieved from media repositories do not contain spatial information, the photographer's perspective for each image (i.e., position, orientation, and field of view) has to be reconstructed. A critical mass of spatially enriched images is vital for further development. As this could not yet be achieved automatically for historical images at the beginning of the project, a workflow was implemented to spatialize images manually. Hence, the browser application is first and foremost an authoring tool. While there are some 2D approaches to spatializing historical images on a contemporary map (e.g. www.historypin.org), the 3D approach makes it possible to take the photographer's perspective and to understand their situation while taking the photo (Schindler and Dellaert, 2012). For historical photos, this also requires a 3D city model that reflects the historical state of the depicted buildings. While 3D geometries representing the city's current state can be obtained from public authorities, 3D models of historical or demolished buildings are not as easily available.
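Searching spatially relies on each photograph's reconstructed perspective (position, orientation, and field of view). As a simplified sketch, a query for photographs that depict a selected 3D point can test whether the point lies in front of each camera and inside its viewing cone. Occlusion by other buildings is ignored here. The `Photo` record, its fields, the hypothetical `max_range` cut-off, and the circular-cone approximation of the field of view are illustrative assumptions, not the 4D browser's actual data model.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Photo:
    photo_id: str
    position: Vec3            # reconstructed projection center (world coordinates)
    view_dir: Vec3            # viewing direction, need not be normalized
    fov_deg: float            # full angle of view, approximated as a circular cone
    max_range: float = 500.0  # hypothetical cut-off distance in meters


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _norm(a: Vec3) -> float:
    return math.sqrt(_dot(a, a))


def sees_point(photo: Photo, point: Vec3) -> bool:
    """True if the point lies inside the camera's viewing cone and range (no occlusion test)."""
    ray = _sub(point, photo.position)
    dist = _norm(ray)
    if dist == 0.0 or dist > photo.max_range:
        return False
    cos_angle = _dot(ray, photo.view_dir) / (dist * _norm(photo.view_dir))
    return cos_angle >= math.cos(math.radians(photo.fov_deg / 2.0))


def photos_showing(photos: List[Photo], point: Vec3) -> List[str]:
    """Spatial query: ids of all photos whose viewing cone contains the point."""
    return [p.photo_id for p in photos if sees_point(p, point)]
```

A production system would additionally check visibility against the 3D city model, for example by ray casting, because a building facade inside the cone may still be hidden behind another building.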
Historical image processing is also not yet mature enough to reconstruct proper 3D models of historical buildings automatically. Hence, such buildings have been modeled manually. Since the untextured 3D models show only buildings, (historical) maps have been incorporated that provide additional hints regarding the building situation, street names, and other infrastructure, supporting orientation within the city. Once these basic conditions had been created, features were added according to target group requirements, shifting the application toward use as a research tool. In addition to the conventional metadata-based search and the basic spatial query within the 3D viewport, these include a tripartite time slider that enables users to select a time range to filter images, to choose a point in time to be represented by the city model, and to toggle one of a set of (historical) maps. Since photographs are linked to the buildings they depict, interacting with the respective 3D objects can further filter the search results. Similar images can be browsed by navigating to neighboring camera positions. The development was guided by the questions: How can art and architectural historians be supported in their research? Beyond positioning images inside the 3D scene, for what other purposes can the spatial information about the images be used? Querying the spatial attributes of a large collection of images can help to answer specific research questions, such as: Which positions did photographers prefer when taking photos of a given building? What is the photographers' main perspective on a specific building? Here, quantitative visualization methods offer support: they can help to retain an overview of the data and to reveal characteristics that would otherwise remain hidden.
These visualization methods include conventional heat maps to identify aggregations of camera positions (Fisher, 2007), advanced heat maps on object surfaces (Chippendale et al., 2008; Pfeiffer and Memili, 2016) that visualize which parts of a building are depicted most often in the images (Fig. 4), and several methods that visualize the distribution of positions and orientations (Fig. 5), including approaches based on vector fields or clustering (Ware et al., 2016; Zhou et al., 2019). Our user study determined which of these visualizations are most appropriate for gaining insight into the specific research questions of the target group (to be published). This application has the potential to become a powerful research tool if development continues and more data is added, which is beyond the scope and resources of this project. Because it was built as a demonstrator showcasing a comparatively small amount of data, general (automatic) workflows to extend the dataset are still under development. A frequent request was to let users (researchers, but also citizens) contribute data from their private repositories. However, this would raise unclear copyright issues and require continuous maintenance of the uploaded content to ensure quality, which the project could not provide. User accounts would have been a precondition for this and could have enabled features such as personalized photo collections for research; these were envisaged but remained a low priority during the project. Although usability was an important topic for us during this study, which included hands-on workshops with potential users, many aspects could still be improved. Usability is often given lower priority in interdisciplinary projects, yet it is crucial for convincing people to use an application. For instance, 3D navigation can be troublesome for users who are not used to 3D environments (Fitzmaurice et al., 2008).
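At its core, a conventional heat map of camera positions aggregates the reconstructed positions on a regular ground-plane grid. The minimal sketch below counts positions per grid cell. The cell size, the use of only the x/y coordinates, and the function names are assumptions. A rendered heat map would usually smooth these counts, for example with a kernel density estimate, before coloring them.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def camera_heatmap(positions: Iterable[Tuple[float, float, float]],
                   cell_size: float = 10.0) -> Counter:
    """Count camera positions per (column, row) grid cell on the ground plane."""
    counts: Counter = Counter()
    for x, y, _z in positions:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts


def hotspots(counts: Counter, n: int = 3) -> List[Tuple[Tuple[int, int], int]]:
    """Most frequent cells, i.e. candidate preferred standpoints of photographers."""
    return counts.most_common(n)
```

The most frequent cells directly address the question of which positions photographers preferred for a given building, once the positions have been restricted to photos that depict it.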
Despite the known issues, the database of spatialized photographs linked to virtual 3D buildings and the browser application as an authoring and research tool are of high value for follow-up projects. The 4D browser demonstrates how the research of scholarly and lay users can be improved:
(1) Adding the spatial dimension facilitates the search for photographs of interest.
(2) The process of understanding urban development can be enhanced by integrating a 4D city model and historical maps.
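The time-range filter of the tripartite time slider and the navigation to neighboring camera positions described for the 4D browser can be illustrated with a minimal sketch. The `SpatialImage` record, capture years as the temporal key, and brute-force Euclidean nearest-neighbor search are assumptions for illustration. With a large number of images, a spatial index would be the more likely choice.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SpatialImage:
    image_id: str
    year: Optional[int]                   # capture year; None if undated
    position: Tuple[float, float, float]  # reconstructed camera position


def filter_by_time(images: List[SpatialImage], start: int, end: int) -> List[SpatialImage]:
    """Keep dated images captured within [start, end] (inclusive), as a time-range slider would."""
    return [img for img in images if img.year is not None and start <= img.year <= end]


def neighboring_cameras(images: List[SpatialImage], current: SpatialImage,
                        k: int = 3) -> List[SpatialImage]:
    """Return the k images whose camera positions are closest to the current one."""
    others = [img for img in images if img.image_id != current.image_id]
    others.sort(key=lambda img: math.dist(img.position, current.position))
    return others[:k]
```

How undated images are handled is a design choice. This sketch simply excludes them from time-filtered results but still offers them as spatial neighbors.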

VISUALIZATION IN VIRTUAL AND AUGMENTED REALITY
Immersive AR applications for museum-based knowledge transfer of cultural heritage can be created in a variety of ways (for recent overviews, see Bekele et al., 2018; Ioannides et al., 2017; Luna et al., 2019). To disseminate our research results, we have explored various options in both Virtual and Augmented Reality. The prototypes described here operate on the same data sources as the desktop 4D browser described in the previous section. AR can be produced with haptic building models, or by projection using personal mobile devices or AR glasses. Our installation makes it possible to link historical photographs of architecture with building models (Maiwald et al., 2019a). The photographs are mapped onto a 3D-printed model to represent the historical appearance of the building at the time the photograph was taken (see Fig. 6). For this purpose, the photographs are oriented photogrammetrically, i.e., the viewpoint and orientation of the camera relative to the building are determined. Displaying the photographs in spatial relation to the model allows users to engage with the shooting position and situation of the respective photographers at the time a photograph was taken (Niebling et al., 2018a), while viewing a haptic 3D model of architecture, a familiar explorative environment for museum visitors.
In Virtual Reality (VR), we use historical photographs registered to virtual 3D models of the depicted architecture to engage users with the subject of photographic perspective in an interactive serious gaming setting. In an HMD-based, fully immersive VR environment, users are challenged to find the specific locations and orientations of multiple images. Cultural heritage research results can thus be disseminated using gamification elements, with several users encouraged to replicate the position and orientation of the photographers of historical images.
A semi-transparent photograph that is about to be spatialized is attached to a hand controller visible to the user, making it easy to compare the contents of the photograph with the virtual scene and to find the point where the photograph was taken (see Fig. 7). To spatialize a photograph, the user needs to reconstruct the exterior and interior orientation of the camera. The exterior orientation is the position and orientation of the camera's projection center in relation to world space. Both its translation and rotation components are defined by a transformation matrix that is set automatically by the user's virtual eye, that is, the user's position and orientation in the virtual world. The interior orientation of a camera comprises the principal distance (camera constant), which defines the angle of view, and the coordinates of the principal point.
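The exterior and interior orientation defined above together determine where a 3D point appears in the photograph. The following minimal sketch applies the standard central-projection (pinhole) model. The point is first transformed into the camera frame using the rotation `R` and the projection center `X0`, then scaled by the principal distance `c` and shifted by the principal point `(x0, y0)`. The axis convention (camera looking along its negative z-axis, as in the photogrammetric collinearity equations), the omission of lens distortion, and the function name are our assumptions.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Vec3 = Tuple[float, float, float]


def project_point(point: Vec3, R: Matrix3, X0: Vec3, c: float,
                  principal_point: Tuple[float, float] = (0.0, 0.0)) -> Tuple[float, float]:
    """Project a world point into image coordinates (same unit as c).

    R rotates world vectors into the camera frame; the camera looks along -z.
    """
    # Exterior orientation: shift to the projection center, then rotate into the camera frame.
    d = [point[i] - X0[i] for i in range(3)]
    p = [sum(R[r][k] * d[k] for k in range(3)) for r in range(3)]
    if p[2] >= 0:
        raise ValueError("point lies behind the camera")
    # Interior orientation: central projection with principal distance c,
    # offset by the principal point.
    x0, y0 = principal_point
    return (x0 - c * p[0] / p[2], y0 - c * p[1] / p[2])
```

In the VR setting, `R` and `X0` would follow from the user's virtual eye, while `c` and the principal point are the values the user adjusts with the controller.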
To set the interior orientation, the user moves the controller back and forth to adjust the camera constant, which corresponds to the distance between the user's eye and the photograph attached to the controller. Shifting the controller sideways and up and down sets the offset of the principal point from the image center. In a VR setting, the urban scenery can be explored with fewer restrictions than in the real world. We conducted a user study to investigate whether participants experience different levels of engagement when using the VR application compared with a paper flyer. Using the User Engagement Scale Short Form questionnaire, the study showed a significant difference between the VR and the flyer condition for the aesthetic appeal and reward factors. High development and maintenance costs, technical limitations, and a lack of didactic concepts are the main reasons for hesitancy in implementing such methods in exhibitions (Niebling et al., 2018b). However, our study also shows clear advantages of using immersive applications to support knowledge transfer.
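The controller interaction described above can be expressed geometrically. The distance between eye and photograph acts as the camera constant, and the photograph's lateral offset from the viewing axis determines the principal point. The minimal sketch below assumes that the photograph is displayed with a known width in the virtual scene (a hypothetical parameter) and normalizes the results by that width. The function name and the returned keys are illustrative assumptions.

```python
import math
from typing import Tuple


def interior_from_controller(eye_to_photo_dist: float, photo_width: float,
                             photo_center_offset: Tuple[float, float]) -> dict:
    """Derive interior orientation from the photo's pose relative to the user's eye.

    eye_to_photo_dist: distance along the viewing axis from eye to photo plane (m)
    photo_width: displayed width of the photograph in the virtual scene (m)
    photo_center_offset: (dx, dy) of the photo center from the viewing axis (m)
    """
    # Camera constant, normalized by the photo width so it is independent of display size.
    c_norm = eye_to_photo_dist / photo_width
    # Horizontal angle of view spanned by the photo as seen from the eye.
    fov_deg = math.degrees(2.0 * math.atan(photo_width / (2.0 * eye_to_photo_dist)))
    # The viewing axis pierces the photo at the negated offset of its center,
    # which is where the principal point lies relative to the image center.
    dx, dy = photo_center_offset
    principal_point_norm = (-dx / photo_width, -dy / photo_width)
    return {"c_norm": c_norm, "fov_deg": fov_deg,
            "principal_point_norm": principal_point_norm}
```

Under these assumptions, holding a 1 m wide photograph 0.5 m from the eye corresponds to a 90° angle of view. Pulling it closer widens the angle, which is why moving the controller back and forth adjusts the camera constant.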
Interactive engagement with the content can sustainably increase both users' learning motivation and their learning success. Finally, immersive installations have the potential to provide visitors with unforgettable experiences.

FUTURE PROSPECTS
After five years of work, various research challenges remain unresolved. In user research, the next steps will be to investigate user behavior in a larger and more international sample, and to use this sample to further investigate and quantify, with descriptive methods, the most important issues in research and information behavior. In our user studies, the question arose of how digital tools like the 4D browser could be introduced much earlier to students and junior researchers, and how such tools could be used in digital education. This question was not part of the initial problem and remains to be addressed. It also became obvious that we still know very little about how architectural historians visually assess sources. Moreover, visual parameters and designs for 3D/4D visualizations of past architecture are rarely validated empirically. Consequently, a future step may be a comprehensive investigation of design recommendations for 3D/4D representations that serve architectural history studies. Concerning 4D modeling, open research questions include how to create or augment historical image data by training neural networks in feature extraction and feature matching. A further issue is the number and quality of historical images needed to enable dense matching and the creation of historical (generalized) 3D models. Historical reference datasets also need to be built to evaluate the accuracy of automatically derived image orientations. In addition, a large-scale automated workflow for image orientation, intended to extend the dataset of spatialized photographs, is currently under development in a related project. Finally, the division into digital disciplines and humanities falls short in the case of image repositories, and social science approaches are highly relevant for investigating and shaping both research processes and digital applications (Münster and Terras, 2020).
Our group has only started to work on this issue and a truly multidisciplinary approach involving humanities, computing, design, and social sciences is needed to pave the way toward 4D Urban History Research and Discovery.

ACKNOWLEDGMENTS
Parts of the research described in this paper were carried out in the project Digital4Humanities (German Federal Ministry of Education and Research, 1630UG01). The authors would like to thank Kristina Friedrichs, who conducted the assessment of the image repositories.