ACCURACY INVESTIGATION ON IMAGE-BASED CHANGE DETECTION FOR BIM COMPLIANT INDOOR MODELS

ABSTRACT: Construction progress documentation is currently of great interest for the AEC (Architecture, Engineering and Construction) sector and BIM (Building Information Modeling). The subject of this work is the geometric accuracy assessment of image-based change detection in indoor environments based on a BIM. Line features usually serve well as geodetic references in indoor scenes in order to solve for camera orientation. However, building edges are never perfectly built as planned and are often geometrically generalized for BIM compliant representation. As a result, in this approach, line correspondences for image-to-model co-registration are considered as statistically uncertain entities, as this is essential for dealing with metric confidences in the field of civil engineering and BIM. We present an estimation model for camera pose refinement which is based on the incidence condition between model edges and corresponding image lines. Geometric accuracies are assigned to the model edges according to the Level of Accuracy (LOA) specification for BIM. The approach is demonstrated in a series of tests using a synthetic image of an indoor BIM. The effects of varying edge detection accuracies on the estimation are investigated, as well as the effects of using model edges with different geometric quality, by adding Gaussian noise to the synthetic observations, each within 100 simulation runs. The results show that the camera orientation can be improved with the presented estimation model as long as the BIM compliant references meet the conditions of LOA 30 or higher (σ < 7.5 mm).


INTRODUCTION
In the AEC (Architecture, Engineering and Construction) industry, there is a great demand for automated and accurate scene capturing and surveying, mainly for the purpose of construction progress monitoring in the context of BIM (Building Information Modeling). With BIM, the construction project is based on a so-called digital twin, which represents the planned state of the object in its various phases of construction. In order for the construction management to be able to react to possible deviations from the planning at an early stage, a regular comparison of the as-planned and as-built project status must be carried out. With the increasing adoption of BIM, the demands on its geometric quality increase. This is especially true for the modeling of detailed structures in building interiors, as they continue to be maintained beyond the construction phase as a central database for many applications such as location based services and the internet of things. Image based measurement techniques can meet general requirements of accuracy and efficiency and are therefore becoming more and more popular in this field. A camera's orientation is required in order to determine the spatial correspondence between an image and a 3D model (Fig. 1). However, the estimation of a camera's pose within a BIM's reference system is much more difficult indoors than outdoors. The lack of direct geo-referencing (GNSS denied environment) as well as occlusions and ambiguities due to typical indoor structures are challenging circumstances for image based measurement techniques within a building's interior and for the related accuracy requirements. Line features usually serve well as geodetic references in indoor scenes in order to estimate the camera orientation. They are sufficiently available in man-made environments (and corresponding BIMs) and can be detected easily by common image processing algorithms. 
However, building edges are never perfectly built as planned and are often geometrically generalized for BIM-compliant representation (Fig. 2). As a result, line correspondences for image-to-model co-registration must be considered as statistically uncertain entities, as this is essential for dealing with metric confidences in the field of civil engineering and BIM. In this contribution, we present an estimation model for camera pose refinement based on line references, given a coarse orientation (e.g. from IMU (Inertial Measurement Unit) sensors and visual odometry). The stochastic model includes the individual uncertainty information of BIM related objects according to the Level of Accuracy (LOA) specification for BIM (USIBD, 2020) (Fig. 3). The approach will support future developments of image based change detection in indoor environments with available BIM. With regard to compliance checking in BIM, we focus on structural and topological changes in the geometry of BIM compliant 3D models.

Related Work
Our motivation comes from the need for image-based building documentation and change detection that is, on the one hand, specifically focused on the interior and, on the other hand, takes into account BIM related accuracies and quality requirements. Nevertheless, our approach is based on some related research areas.
1.1.1 Image-based Construction Progress Documentation for BIM BIM compliant construction progress documentation based on images and computer vision technologies has become an active research topic in recent years. Most existing methods consider outdoor observations. In Tuttas et al. (2017), the as-built state of a construction site is detected by photogrammetric outdoor surveys. Knowledge of the construction processes is inferred from BIM objects, and SfM (Structure from Motion) methods enable 3D building elements to be located. Hoegner et al. (2016) co-register image sequences with a BIM for the documentation of geometric and radiometric changes of a construction site. They use model information as fictitious observations within the adjustment. An a priori sensor pose is given from GNSS and IMU (outdoors). Other authors contribute valuable approaches for image based indoor applications; however, they do not consider BIM compliant accuracies or high requirements on geometric quality. Han and Golparvar-Fard (2015) solve camera orientation by manually selected tie points; relevant model elements are back-projected into the images in order to segment image regions for material classification. Kropp et al. (2018) increase automation for inspections of interior construction states using image sequences which are registered with a BIM to derive rich information about single construction tasks. In contrast, Brunn and Meyer (2016) showed that sub-millimeter accuracies can be achieved with a multi-camera system in close range indoor applications using precise geodetic control points for proper camera calibration and pose estimation.
1.1.2 Model-to-Image Matching based on Lines Edge detection is a basic task in image processing, which is why line features are generally well suited to be matched to building models. However, most of these approaches also focus on outdoor applications. Li-Chee-Ming and Armenakis (2014) use vertical lines from UAV-borne video streams in urban scenes, which are matched to line features extracted from synthetic images of 3D building models. A related line-based model-to-image matching has been presented for texturing 3D building models with airborne oblique-view thermal infrared images, and Verykokou and Ioannidis (2016) use vanishing point detection techniques for the rough estimation of the exterior orientation parameters of oblique aerial images.

Uncertainty of Matching Features
Usually, image features as well as geometric primitives from 3D models must be assumed to be statistically uncertain because of unavoidable inaccuracies in measurement and modeling (generalization). Therefore, these uncertainties have to be considered in the representation and estimation of such entities. For fitting parameterized 3D models to images, Lowe (1991) presents general methods that take account of inherent inaccuracies in the image measurements. Heuel and Förstner (2001) combine projective geometry and statistical methods by representing points, lines and planes and their uncertainties as homogeneous vectors. They show how errors in the construction of new elements are propagated. Building on that, Meidow et al. (2009) developed a generic estimation model for homogeneous entities and multiple arbitrary constraints, and Iwaszczuk and Stilla (2017) present a similar approach for camera pose optimization for the purpose of automated texture mapping, in which a line-based model-to-image matching is developed. Although these techniques have not been presented directly for the case of construction progress documentation, they can be used for this purpose. These fairly generic techniques can be adapted to the indoor case and extended with BIM specific accuracies and uncertainties.

Own Contribution and Paper Structure
In contrast to previous work, the main idea of this project is to make use of any initial BIM compliant information; therefore, the edges of known BIM objects which are identified in the images will be used for camera pose refinement. We focus on interior scenes and pay particular attention to the individual geometric quality of reference objects, as metric tolerances have to be met in construction. In our approach, the individual accuracy information of BIM objects and corresponding model representations is taken into account. We investigate the accuracy of sensor orientation prior to 3D reconstruction and change detection. This is foundational work for image based applications in building interiors with BIM, as it enables geometrically and semantically correct image interpretation for subsequent processing steps.
In the following sections, we present an estimation model for camera pose refinement based on line references, given a coarse orientation (e.g. from IMU sensors and visual odometry). The approach is based on the incidence condition between model edges and corresponding image lines in the projective space. This relation serves as a constraint in a Gauss-Helmert model. Our functional model is designed reflecting the mutual relation between the observations, using the homogeneous Plücker matrix for 3D line representation and the unknown parameters of the projection matrix. Furthermore, the stochastic model includes the individual uncertainty information of BIM related objects and facilities according to the LOA specification for BIM. The optimization approach is demonstrated in a series of tests using a synthetic image of an indoor BIM. The effects of varying edge detection accuracies on the estimation are investigated as well as the effects of using model edges with different geometric qualities by adding Gaussian noise to the synthetic observations, each within 100 simulation runs.

Method Overview
For the change detection application, a stereo camera system with an IMU will be used to easily obtain the relative orientation, the scale and an image based 3D point cloud by SfM and MVS (Multi View Stereo). Indoor environments are characterized by their large number of straight edges, e.g. on windows, doors, walls and furniture. 3D lines as geometric primitives can be used to describe the building interior in an abstract way. This brings advantages over unstructured 3D point clouds in terms of matching and co-registration with a corresponding 3D model. Therefore, the approach of Hofer et al. (2015) will be integrated in the process in order to generate an abstract 3D line model. With the results of a semantic image segmentation, those 3D line segments will be further enriched with semantic information, similar to what has been described in related work. The coarse absolute orientation of an image sequence in the BIM's reference system will be determined from matching corresponding semantic 3D line segments (Fig. 4). 3D scene reconstruction will also be supported by the available BIM information in an object based approach. Finally, the change detection can be realized by comparison with the last version of the BIM before it is updated with the new information. Altogether, this results in a positive feedback loop: the higher the quality of the initial BIM, the faster and more precisely it can be updated again. The overall concept is depicted in figure 5.

Projection of homogeneous Points and Lines
A two-dimensional straight line can be represented in the homogeneous form ax + by + c = 0 with the homogeneous vector:

l = [a, b, c]^T

A 3D line can be represented by the homogeneous Plücker matrix in the projective space:

L = X Y^T − Y X^T

where X and Y represent the homogeneous endpoints of the line.
Mapping an object point X to an image point x with a pinhole camera is achieved by multiplication with the projection matrix P:

x = P X   (4)

With the calibration matrix K, rotation matrix R, identity matrix I and the 3D coordinates of the projection center X_0, P results from:

P = K R [ I | −X_0 ]   (5)
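The construction of P and the projection of a point, together with the Plücker matrix of a 3D line, can be sketched numerically as follows (all values are invented toy numbers, not taken from the experiments):

```python
import numpy as np

# Toy sketch of eqs. (4) and (5): build the projection matrix
# P = K R [I | -X0] and map a homogeneous object point into the image.
K = np.array([[30.0,  0.0, 18.0],     # principal distance 30 mm,
              [ 0.0, 30.0, 13.5],     # principal point at the sensor center
              [ 0.0,  0.0,  1.0]])
R = np.eye(3)                         # camera axes aligned with the object system
X0 = np.array([1.0, 2.0, 10.0])       # projection center

P = K @ R @ np.hstack([np.eye(3), -X0[:, None]])   # 3x4 projection matrix, eq. (5)

X = np.array([2.0, 3.0, 0.0, 1.0])    # homogeneous object point
x = P @ X                             # homogeneous image point, eq. (4)
x_euclidean = x[:2] / x[2]            # [15.0, 10.5]

# Pluecker matrix of the 3D line through the homogeneous points X and Y:
Y = np.array([2.0, 4.0, 0.0, 1.0])
L = np.outer(X, Y) - np.outer(Y, X)   # 4x4, antisymmetric
```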

Camera Pose Optimization in the BIM's Reference System
The exterior orientation parameters of a camera (X0, Y0, Z0, ω, φ, κ) are implicitly available in the projection matrix P. The approximate projection matrix is optimized in the BIM's reference system through the observation of straight lines in the image (l_i) and corresponding 3D model edges (L_i). The incidence condition of corresponding line features is used as input for the formulation of a functional model within a generic estimation model for homogeneous entities according to Meidow et al. (2009).

2.1.1 Functional Model The estimation is based on a Gauss-Helmert model with constraints. The corrected observations (ŷ = y + v) and the estimated unknown parameters (β) have to fulfill certain conditions. The G-conditions g(ŷ, β) = 0 describe the relation between the observations and the parameters. The H-restrictions h(β) = 0 concern only the parameters, and the C-constraints c(ŷ) = 0 are imposed on the observations alone. The incidence condition of a 3D model edge, which is projected into the image, with an observed corresponding straight line in the image serves as G-condition. It represents two independent constraints (Förstner and Wrobel, 2016):

g(ŷ, β) = L P^T l = 0   (6)

where 0 is a 4 × 1 zero vector.
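The incidence condition can be checked numerically: the plane A = P^T l passes through the projection center and the image line, and a corresponding model edge must lie in that plane. A minimal sketch with invented values (calibration set to the identity for brevity):

```python
import numpy as np

# Numeric check of the incidence condition g = L P^T l = 0: project both
# endpoints of a model edge, join the image points to the observed line,
# and verify that the back-projected plane A = P^T l contains the 3D edge.
P = np.hstack([np.eye(3), -np.array([[0.0], [0.0], [5.0]])])  # toy camera (K = I)

X = np.array([1.0, 0.0, 0.0, 1.0])
Y = np.array([0.0, 1.0, 0.0, 1.0])
L = np.outer(X, Y) - np.outer(Y, X)   # Pluecker matrix of the model edge

l = np.cross(P @ X, P @ Y)            # image line joining the projected endpoints
g = L @ P.T @ l                       # 4x1 residual of the incidence condition
# g is the zero vector here because the image line corresponds to the edge
```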
For the handling of singular covariance matrices, the homogeneous observation entities as well as the parameters must be spherically normalized (normalized to length 1). Therefore, the H-restriction for the parameters is:

h(β) = p^T p − 1 = 0   (8)

where p = vec(P), a column vector with the reshaped elements of P.
Respectively, the C-constraints for the observations result in:

c(ŷ) = [l^T l − 1, X^T X − 1, Y^T Y − 1, ...]^T = 0   (7)

The optimal solution for β is given by the minimum of the weighted squared residuals subject to the given constraints. This is achieved by minimizing the Lagrange function with the Lagrangian vectors λ, µ and ν:

L(v, β, λ, µ, ν) = 1/2 v^T Σ_yy^+ v + λ^T g(y + v, β) + µ^T h(β) + ν^T c(y + v)

The corrections for the observations and parameters are calculated in an iterative procedure according to Meidow et al. (2009). In every iteration, the Jacobians are calculated:

A = ∂g/∂β,   B = ∂g/∂ŷ,   C = ∂c/∂ŷ,   H = ∂h/∂β

Additionally, in every iteration (τ), the residuals of the constraints and the auxiliary variable a are evaluated:

a = B (y − ŷ^(τ)) − g(ŷ^(τ), β̂^(τ))

The normal equation system is solved using LU decomposition in order to receive the corrections for the estimated parameters:

[ A^T Σ_gg^{-1} A   H^T ] [ Δβ̂ ]   [ A^T Σ_gg^{-1} a ]
[ H                 0   ] [ µ   ] = [ −h(β̂^(τ))       ]

with the covariance matrix of the contradictions Σ_gg = B Σ_yy B^T. Finally, the residuals:

v̂ = Σ_yy B^T Σ_gg^{-1} (a − A Δβ̂)

2.1.2 Stochastic Model The limited accuracy of the reference lines as well as the inherent uncertainty of straight line detection have to be considered in the estimation. The homogeneous representations of 2D and 3D lines are therefore extended with individual stochastic information using covariance matrices. A 3D point with Euclidean coordinates X = [X, Y, Z]^T has the Euclidean covariance matrix Σ_XX (3 × 3). The homogeneous representation of the 3D point X → X is:

X = [U, V, W, T]^T   (14)

where X_E = [U, V, W]^T is the Euclidean and X_h = T is the homogeneous part. It follows the homogeneous covariance matrix Σ_XX:

Σ_XX = [ Σ_XX   0 ]
       [ 0^T    0 ]

For converting a homogeneous vector back to a Euclidean representation X → X_e, the Jacobian J_e at X is needed, as the normalization is a non-linear function:

X_e = X_E / X_h,   J_e(X) = 1/T [ I_3 | −X_E/T ],   Σ_XeXe = J_e Σ_XX J_e^T

A two-dimensional straight line with the homogeneous vector l = [a, b, c]^T has the homogeneous covariance matrix Σ_ll (3 × 3). If an uncertain straight line is given by its homogeneous parameters, the parameters of the Hessian form [θ, r]^T can be derived from:

θ = atan2(b, a),   r = −c / √(a² + b²)

The Jacobian J_hl is needed at l:

J_hl = [ −b/(a² + b²)         a/(a² + b²)         0             ]
       [ ac/(a² + b²)^(3/2)   bc/(a² + b²)^(3/2)  −1/√(a² + b²) ]

The covariance matrix Σ_hh results from:

Σ_hh = J_hl Σ_ll J_hl^T
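The two variance propagations of the stochastic model can be sketched as follows (numeric values invented; symbols as in the text):

```python
import numpy as np

# Euclidean normalization of an uncertain homogeneous 3D point
# X = [U, V, W, T]^T with the Jacobian J_e:
def euclidean_jacobian(X):
    XE, T = X[:3], X[3]
    return np.hstack([np.eye(3), -XE[:, None] / T]) / T   # J_e at X

X = np.array([2.0, 4.0, 6.0, 2.0])
Sigma_XX = np.diag([1e-4, 1e-4, 1e-4, 0.0])   # singular homogeneous covariance
X_e = X[:3] / X[3]                            # Euclidean point [1, 2, 3]
J_e = euclidean_jacobian(X)
Sigma_XeXe = J_e @ Sigma_XX @ J_e.T           # propagated Euclidean covariance

# Hessian form [theta, r] of an uncertain 2D line l = [a, b, c]^T:
def hessian_form(l, Sigma_ll):
    a, b, c = l
    n = np.hypot(a, b)
    J_hl = np.array([[-b / n**2,     a / n**2,      0.0],
                     [a * c / n**3,  b * c / n**3, -1.0 / n]])
    return np.array([np.arctan2(b, a), -c / n]), J_hl @ Sigma_ll @ J_hl.T

hr, Sigma_hh = hessian_form(np.array([0.0, 1.0, -2.0]), 0.01 * np.eye(3))
# hr = [pi/2, 2.0]: a horizontal line at distance 2 from the origin
```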

Parameter Estimation
For each pair of corresponding lines, the vector with the observations is y = [l_i, X_i, Y_i, ...]^T, and the related covariances are collected in the block-diagonal matrix Σ_yy = Diag(Σ_lili, Σ_XiXi, Σ_YiYi, ...). The elements of P are collected as p in β.

Conditioning
As the homogeneous entities relate to Euclidean BIM objects, their coordinates are expressed with respect to the BIM's reference coordinate system. A big difference between the Euclidean and the homogeneous part causes the calculation to become numerically unstable, which is why conditioning as proposed by Förstner and Wrobel (2016) is applied. For 2D points, T_2D is composed of the centroid coordinates µ_x and µ_y and the maximum distance to the centroid s_max:

T_2D = [ 1/s_max   0         −µ_x/s_max ]
       [ 0         1/s_max   −µ_y/s_max ]
       [ 0         0          1         ],   x_c = T_2D x

The procedure is analogous for 3D points (T_3D). Straight lines are conditioned and re-conditioned using:

l_c = T_2D^{-T} l,   l = T_2D^T l_c

Conditioning is applied to the observations. For the projection matrix it results:

P_c = T_2D P T_3D^{-1}   (26)
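A minimal sketch of the 2D conditioning step (one common construction following the centroid-and-scale idea above; coordinates are invented). Note that incidence relations are invariant under the transformation:

```python
import numpy as np

# Conditioning: shift by the centroid and scale by the maximum centroid
# distance so that all coordinates are of magnitude ~1.
def conditioning_matrix_2d(points_euclidean):
    mu = points_euclidean.mean(axis=0)
    s_max = np.abs(points_euclidean - mu).max()
    return np.array([[1 / s_max, 0.0, -mu[0] / s_max],
                     [0.0, 1 / s_max, -mu[1] / s_max],
                     [0.0, 0.0, 1.0]])

pts = np.array([[400.0, 300.0], [1200.0, 900.0], [800.0, 100.0]])
T2d = conditioning_matrix_2d(pts)

x = np.array([400.0, 300.0, 1.0])
x_c = T2d @ x                          # conditioned point
l = np.array([1.0, -1.0, 100.0])
l_c = np.linalg.inv(T2d).T @ l         # lines transform with T^{-T}
# incidence is preserved: l^T x == l_c^T x_c
```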

Spherical Normalization
According to (7) and (8), the observations and initial parameters have to be spherically normalized prior to the adjustment: l_c^s := l_c/|l_c| and Σ_ll^s = J_c Σ_ll J_c^T, with the Jacobian J_c of the normalization:

J_c(x) = (I − x x^T / (x^T x)) / |x|

In the same way it is done for the parameters and 3D model points: X_c^s := X_c/|X_c| with Σ_XX^s = J_c Σ_XX J_c^T and p_c^s := p_c/|p_c|.
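The normalization and its variance propagation can be sketched as (toy values):

```python
import numpy as np

# Spherical normalization of a homogeneous entity with variance propagation:
# x_s = x/|x|, Sigma_s = J Sigma J^T with J = (I - x x^T / (x^T x)) / |x|.
def spherical_normalize(x, Sigma):
    norm = np.linalg.norm(x)
    J = (np.eye(len(x)) - np.outer(x, x) / norm**2) / norm
    return x / norm, J @ Sigma @ J.T

l = np.array([3.0, 4.0, 0.0])
l_s, Sll_s = spherical_normalize(l, 0.01 * np.eye(3))
# l_s has unit length; the propagated covariance is singular along the
# direction of l_s (the rank deficiency of homogeneous entities)
```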
In the following sections we assume the homogeneous coordinates to be conditioned and spherically normalized and omit the indices c and s.

Jacobians
The A-matrix includes the first derivatives of the G-constraints with respect to the unknown parameters p. As eq. 6 corresponds to 4 constraints, of which 2 are chosen, the matrix has twice as many rows as observed line correspondences, and the number of columns corresponds to the number of parameters. The B-matrix includes the first derivatives of the G-constraints (6) with respect to the observations l_i, X_i and Y_i. It is a block-diagonal matrix where each main diagonal block has 2 rows (2 constraints in (6)) and 11 columns (11 elements in the homogeneous vectors l, X and Y).
The C-matrix includes the first derivatives of the C-constraints (7) with respect to the observations l_i, X_i and Y_i. The derivatives for each observation triple form diagonal sub-matrices of dimension (3, 11), which in turn are written to the main diagonal.
The H-matrix includes the first derivatives of the H-restriction (8) with respect to p. It has one row and 12 columns.
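The resulting block layout for n correspondences can be sketched structurally as follows (the blocks are filled with placeholder values here; in the adjustment they hold the analytic derivatives described above):

```python
import numpy as np

# Structural sketch of the Jacobian layout for n line correspondences:
# A is (2n, 12), B and C are block diagonal with (2, 11) and (3, 11) blocks,
# H is (1, 12).
n = 3                                    # example number of correspondences
A = np.zeros((2 * n, 12))                # dG/dp
B = np.zeros((2 * n, 11 * n))            # dG/dy, block diagonal
C = np.zeros((3 * n, 11 * n))            # dC/dy, block diagonal
H = np.zeros((1, 12))                    # dH/dp

for i in range(n):
    B[2 * i:2 * i + 2, 11 * i:11 * i + 11] = 1.0   # placeholder (2, 11) block
    C[3 * i:3 * i + 3, 11 * i:11 * i + 11] = 1.0   # placeholder (3, 11) block
```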

Euclidean Interpretation
The initial projection matrix is optimized during the adjustment. The homogeneous matrix, as well as the uncertain observations in the form of 3D points and 2D straight lines, are thereby conditioned and spherically normalized. For the application of the results in engineering practice, the output data should be given a Euclidean interpretation.

Projection Matrix
After the adjustment, an estimated projection matrix (P̂_c^s) is available, which is also conditioned and normalized. It is re-conditioned with (26).
From the estimated projection matrix, the improved parameters of the exterior orientation can be derived by decomposing the matrix: P = [A | a] = [KR | −KR X_0] (Förstner and Wrobel, 2016).
The projection center X_0 is obtained from X_0 = −A^{-1} a. Factorizing A by QR decomposition, which expresses a matrix as the product of an orthogonal matrix Q and an upper triangular matrix, yields the rotation matrix R. A should have a positive determinant: Ā = sign(|A|) A. The inverse Ā^{-1} is decomposed: [R^T, K^{-1}] = qr(Ā^{-1}).
The sign s (here: s = +1) of the principal distance needs to be specified for the calculation of the diagonal matrix D, which resolves the remaining sign ambiguity of the factorization. Finally, the rotation matrix R results from R = D R̄, with R̄ denoting the rotation obtained from the QR step.
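The decomposition can be sketched numerically (invented ground-truth values; the sign ambiguity is resolved here by forcing a positive diagonal of K, which corresponds to s = +1):

```python
import numpy as np

# Sketch of recovering X0, K and R from P = [A | a] = [KR | -KR X0]:
theta = 0.3
K_true = np.array([[30.0, 0.0, 18.0], [0.0, 30.0, 13.5], [0.0, 0.0, 1.0]])
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
X0_true = np.array([1.0, 2.0, 10.0])
P = K_true @ R_true @ np.hstack([np.eye(3), -X0_true[:, None]])

A, a = P[:, :3], P[:, 3]
X0 = -np.linalg.solve(A, a)              # projection center

Q, U = np.linalg.qr(np.linalg.inv(A))    # A^-1 = R^T K^-1 = Q U
D = np.diag(np.sign(np.diag(U)))         # fix the QR sign ambiguity so
R = (Q @ D).T                            # that K gets a positive diagonal
K = np.linalg.inv(D @ U)
K /= K[2, 2]                             # normalize the calibration matrix
```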

Corrected Observations
After the adjustment, Euclidean interpretation of uncertain, homogeneous, conditioned and spherically normalized 3D points and 2D lines is achieved through Euclidean normalization and re-conditioning (table 1).

EXPERIMENTS
The estimation model presented in the previous section for the optimization of an approximately known camera pose was tested with synthetic data as ground truth for accuracy assessment. The basis is a real test environment. In the following, the reference model as well as the test setup and the obtained results are presented.

Step                          3D points                       2D lines
1. Euclidean normalization    X_c, Σ_XcXc with J_e(X_c^s)     l_c^e, Σ_lclc with J_e(l_c^s)
2. Re-conditioning            X, Σ_XX with T_3D               l, Σ_ll with T_2D
3. Euclidean interpretation   X_e, Σ_XeXe with J_e(X)         [θ, r]^T, Σ_hh with J_hl(l)

Table 1. Euclidean interpretation of uncertain 3D points and 2D lines

Test Environment and Reference Model
A laboratory for measurements with a geodetic reference network is used as the test environment (Fig. 6). It has a local coordinate system whose main axes are aligned with the walls of the building. The network is realized by permanently installed mini prisms and enables the stationing of tacheometers in the lower millimeter range. The laboratory serves as a test environment for validating image measurement techniques and achievable accuracies with respect to the detection of typical objects and component classes inside buildings. To obtain precise geodetic survey data, the laboratory was captured by a 3D laser scanner at high resolution. The registration of the single scans was done by control points. The global, absolute measurement accuracy in the reference system is 1.5 mm, and the standard deviation of the relative orientation of the individual scans is in the sub-millimeter range when cloud-to-cloud adjustment is performed. The resulting point cloud served as the basis for BIM compliant modeling. The modeling accuracy depends on the respective object class and is based on the recommendations according to the LOA specification. For example, interior doors and windows are assigned to accuracy level 30 or 40.

Virtual Camera and Data Input
A virtual view of the interior model was rendered with 36 mm sensor width, 1600 × 1200 pixels and 30 mm focal length. The projection center is located at position X̄_0 = [43.907, 28.847, 8.053]^T in the reference system and the camera system is rotated by the angles ω̄ = −67.318°, φ̄ = 1.8° and κ̄ = 29.498° with respect to the BIM system. The true projection matrix (P̄) was derived from the parameters of the distortion-free virtual camera according to (5).
In the next step, 36 model points (X_i) were projected with P̄ and (4) into the synthetic image in order to obtain x̄_i. For the simulation of inaccuracies in the detection of edges in the image, Gaussian noise of varying magnitude is added to the Euclidean part of the true pixels (x̄_i) in the further course. Noisy pixels (x_i) result with the normally distributed random number N: x_i = x̄_i + σ_xi N. Then, the uncertain image points (x_i, Σ_xixi) were joined to lines by the join operation and variance propagation to obtain the uncertain image lines corresponding to the model edges (l_i, Σ_lili) as observations. A standard deviation of 0.15 m in translation and 1° in rotation was then set for the exterior orientation of the camera for artificial degradation (σ_cam). This resulted in the initial projection matrix P, which is used in the adjustment as the approximate initial solution.
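The join operation with variance propagation described above can be sketched as follows (pixel coordinates and noise level invented):

```python
import numpy as np

# Perturb two true image points with Gaussian noise, join them to an image
# line l = x1 x x2 and propagate the point covariances to the line covariance.
rng = np.random.default_rng(0)

def skew(x):
    """Skew-symmetric matrix S(x) with S(x) y = x cross y."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

sigma = 0.5                                   # pixel noise of edge detection
Sigma_xx = np.diag([sigma**2, sigma**2, 0.0]) # homogeneous point covariance

x1_true = np.array([400.0, 300.0, 1.0])
x2_true = np.array([900.0, 700.0, 1.0])
x1 = x1_true + sigma * np.append(rng.standard_normal(2), 0.0)
x2 = x2_true + sigma * np.append(rng.standard_normal(2), 0.0)

l = np.cross(x1, x2)                          # uncertain image line (join)
Sigma_ll = (skew(x2) @ Sigma_xx @ skew(x2).T  # propagated line covariance
            + skew(x1) @ Sigma_xx @ skew(x1).T)
```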

RESULTS
P is put into the estimation model as initial value. The actual accuracy of the reference model does not matter here when using synthetic data. For LOA 10 to LOA 50, 100 simulations each were calculated with image lines of varying degrees of noise (σ_l = 0.0, σ_l = 0.5, σ_l = 1.0, σ_l = 1.5 and σ_l = 3.0 pixel). After the adjustment, the parameters of the exterior orientation are only implicitly available in the vector with the estimated parameters (β̂). β̂ relates to p̂, the spherically normalized column vector containing the conditioned elements of P̂. In order to derive the exterior orientation parameters, p̂ is reshaped to the homogeneous 3 × 4 matrix P̂ and re-conditioned with (26). Decomposition results in the coordinates of the projection center (X̂_0) and the rotation matrix R (with the rotation angles ω, φ and κ). The detailed results are shown in table 2.

DISCUSSION
The results show that the exterior orientation of the camera is optimized, provided the reference model has sufficient geometric quality. In the presented example using synthetic data, the camera pose is improved as long as the reference model can be assigned to at least LOA 30 (σ < 7.5 mm) and the standard deviation of the image points (for image line construction) for this camera is less than 3 pixels. While very good solutions are achieved with LOA 50 model accuracy, this demanding level of accuracy is very rarely achieved in real projects. In practical applications, however, it can be assumed that essential interior elements such as doors, windows and walls are modeled at least LOA 30 compliant and are thus suitable as geometric references. In addition to the results shown, simulations were also calculated for LOA 20 and LOA 40, which also confirm this conclusion. Real-time matching is not our primary purpose; we focus on camera pose optimization. For maximum accuracy requirements (in the range of millimeters), relative rather than absolute accuracy could be used for parameter estimation. In this way, local changes would be precisely detectable in specially defined subsystems within the overall BIM. The GeoPose (OGC) standard will support this approach through its concept of frame transform chains for any real or digital object's pose. It encapsulates sufficient information to transform pose geometry into any other frame in its associated frame chain. The uncertainty in the geographic position can be integrated.

CONCLUSION AND OUTLOOK
We have shown that a coarse camera orientation can be improved to be accurate enough to make statements about relevant changes between the model and the actual state. BIM compliant indoor models with usual geometric quality are generally sufficient and useful as geodetic reference for change detection applications and construction progress documentation. The presented estimation model will be used in further experiments in the test environment with real data from different camera sensors and varying geometric conditions (e.g. spatial distribution of line references). In addition, further tests regarding feature matching and pose estimation in other environments are planned to investigate whether difficult conditions such as high architectural symmetry could affect the results. In real-world use, it can be assumed that the BIM compliant reference model is faulty, regardless of the LOA definition. The as-planned state will likely not fully match the as-built state. Since outliers, mismatches and deviations are to be detected in order to update the digital twin properly, a robust estimation has to be realized, including an option for re-weighting observations. Additionally, vanishing point detection will be integrated to further support the process of camera orientation indoors. We aim at image-based 3D reconstruction and change detection in the context of BIM. Therefore, the next step is to further validate the image measurement and reconstruction accuracy with optimized camera orientation. Also, the effect of the available image resolution on the method will be investigated.
In this context, we also plan to investigate state-of-the-art sensors and technologies for visual localization within the interior of buildings, for single images and image sequences, in a visual BIM-based SLAM approach. These include the iPhone 12 Pro (Apple) with LiDAR sensor (if applicable with indoor localization infrastructures using beacons), the Intel RealSense depth camera including an IMU and stereo SLAM technology, and Microsoft's HoloLens 2.