Robust and affordable localization and mapping for 3D reconstruction. Application to architecture and construction
- Belén Palop del Río (Supervisor)
Defended at: Universidad de Valladolid
Date of defense: 17 December 2018
- Juan Carlos Torres Cantero (Chair)
- Valentín Cardeñoso Payo (Secretary)
- Manuel Abellanas Oar (Committee member)
Type: Thesis
Abstract
Three-dimensional reconstruction is the process of estimating the unknown depth of the points in a scene. There are several approaches to this problem. For example, depth can be retrieved using active sensing (e.g. laser scans) or passive sensing (e.g. digital images). The former provides accurate and clean measurements at a higher cost, whereas the latter provides multiple affordable but noisy measurements. Both can capture the color of each point according to the radiance perceived by the sensor. Hence, the fundamental differences between active and passive sensing concern the acquisition cost of the device and the accuracy of the delivered measurements. Recent advances in passive sensing technologies have made them a compelling alternative to active sensing. Digital cameras are ubiquitous nowadays thanks to the popularization of smartphones. However, conventional 3D reconstruction is still not feasible on such devices due to its high computational demands. The scientific community has embraced this possibility with outstanding contributions in the last five years. Our research aims to join this effort, advancing robust solutions that operate on most devices and scenes. At the same time, good accuracy is desirable, especially in sequences recorded under non-challenging conditions.

Affordable and ubiquitous technologies for 3D reconstruction pave the way for a wider adoption of 3D digitalization. Three-dimensional reconstruction allows users to digitize buildings and other environments that serve as a source of documentation for more advanced applications. This sort of documentation is highly demanded by the Architecture, Engineering and Construction (AEC) industry to keep track of the construction progress or to record the state of a building before a refurbishment. In addition, information from traditional 2D images can be augmented with the associated 3D model.

Passive devices capture information about their environment and represent it with images. Image-based 3D reconstruction estimates the depth of any point based on its projections onto multiple images. When this estimation is repeated for a large number of corresponding points across images, a point cloud of the scene is generated. With these devices, algorithms must deal with the uncertainty introduced by a large number of noisy measurements. Fortunately, such problems have been investigated by the robotics community under Simultaneous Localization and Mapping (SLAM), which comprises the algorithms that track the camera pose while the map of the scene is being reconstructed. In this sense, SLAM can be regarded as a natural extension of 3D reconstruction when applied to robot navigation. Our aim is to provide an accurate 3D reconstruction, in the form of colored point clouds, of a scene captured by an off-the-shelf single-lens camera moving around it. To achieve this goal, we have developed KN-SLAM, a SLAM system that builds on the contributions introduced by ORB-SLAM. The system follows a sparse indirect approach, guided by the ORB features detected in each frame. This is the most suitable approach for commodity cameras, since their rolling shutter introduces photometric artifacts that hinder the tracking of pixels. The tracking, mapping, loop detection and relocalization stages of the SLAM pipeline rely on the quality of such features.
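The following minimal two-view sketch (not code from the thesis) illustrates the core of this sparse indirect pipeline with OpenCV: ORB features are detected and matched between two frames and the matches are triangulated into a colored point cloud. The intrinsic matrix K and the relative pose (R, t) between the frames are assumed to be already known here, whereas a SLAM system such as KN-SLAM estimates them as part of the tracking.

```python
# Minimal two-view sketch of the sparse indirect approach: ORB matching followed
# by linear triangulation of the matched points into a colored point cloud.
# The intrinsics K and the relative pose (R, t) are assumed to be known here;
# a full SLAM system such as KN-SLAM estimates them while building the map.
import cv2
import numpy as np

def sparse_two_view_cloud(img1, img2, K, R, t, max_matches=500):
    gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

    # Detect ORB features and compute their binary descriptors in both frames.
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(gray1, None)
    kp2, des2 = orb.detectAndCompute(gray2, None)

    # Brute-force Hamming matching with cross-check keeps mutually best matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:max_matches]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches]).T  # 2 x N
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches]).T  # 2 x N

    # Projection matrices: first camera at the origin, second camera at (R, t).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t.reshape(3, 1)])

    # Linear triangulation returns homogeneous 4 x N points; dehomogenize them.
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)
    points = (X_h[:3] / X_h[3]).T  # N x 3 sparse point cloud

    # Color each 3D point with the pixel it was observed at in the first frame.
    colors = np.array([img1[int(round(kp1[m.queryIdx].pt[1])),
                            int(round(kp1[m.queryIdx].pt[0]))] for m in matches])
    return points, colors
```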
More concretely, our contributions are the following:
- an adaptive bootstrapping procedure that takes into account the number of failures;
- an exhaustive, non-greedy, 2D-3D guided matching algorithm that ensures correspondences with a minimum distance between descriptors;
- a constrained connectivity graph where each keyframe is linked to the best keyframes, keeping the minimum number of edges while preserving accuracy; and
- a loop detection procedure based on smart thresholds selected according to the results achieved in previous optimizations.

We have conducted an exhaustive evaluation of KN-SLAM on 47 sequences belonging to four heterogeneous datasets. The file formats and ground truths provided by these sequences have been normalized so that they can be compared following the same methodology. Several metrics have been considered to assess the performance, accuracy and robustness of the trajectories estimated by KN-SLAM and ORB-SLAM. Our experimental results show that, despite the high performance achieved by ORB-SLAM, KN-SLAM improves the accuracy and robustness in more than half of the sequences, although the outcome depends on the challenges posed by each sequence. To gain more insight, we have carried out a further analysis of the results taking these challenges into account. We have rated nine challenging characteristics of each sequence on a five-level scale. This characterization has been combined with a class label for training an SVM classifier, where the label corresponds to the system that achieves the lower absolute trajectory error (ATE), either KN-SLAM or ORB-SLAM. The coefficients of the trained classifier let us analyze which characteristics of the scene make KN-SLAM more suitable than ORB-SLAM. From this analysis, we have determined that KN-SLAM reduces the ATE in sequences with a good balance between rotations and translations, visual loops and poor illumination conditions. We conclude that the restricted connectivity of the graph harms the accuracy of the tracking when the camera performs abrupt movements such as pure rotations, but it helps reach a lower ATE when loops are closed and the drift is still tolerable.
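A minimal sketch of this kind of analysis is given below (not taken from the thesis): it trains a linear SVM on per-sequence challenge scores and inspects the signed coefficients. The characteristic names are illustrative and the score matrix and labels are random placeholders standing in for the thesis data.

```python
# Sketch of the per-sequence analysis described above, with placeholder data:
# each of the 47 sequences is scored on nine challenge characteristics (1..5)
# and labelled with the system that reached the lower ATE. The coefficients of
# a linear SVM then indicate which characteristics favour KN-SLAM (positive
# weights) over ORB-SLAM (negative weights).
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

CHARACTERISTICS = [  # illustrative names, not the exact ones used in the thesis
    "rotation_translation_balance", "visual_loops", "poor_illumination",
    "pure_rotations", "fast_motion", "low_texture", "dynamic_objects",
    "motion_blur", "scene_scale",
]

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(47, len(CHARACTERISTICS)))  # placeholder scores 1..5
y = rng.integers(0, 2, size=47)                          # 1 = KN-SLAM wins, 0 = ORB-SLAM

# Standardize the scores so the coefficient magnitudes are comparable.
X_std = StandardScaler().fit_transform(X)
clf = SVC(kernel="linear").fit(X_std, y)

# Rank characteristics by the signed weight the classifier assigns to them.
for name, w in sorted(zip(CHARACTERISTICS, clf.coef_[0]), key=lambda p: -abs(p[1])):
    print(f"{name:32s} {w:+.3f}")
```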
The output of a SLAM system is usually discrete (i.e. a point cloud instead of a triangulated 3D mesh). The mesh can be obtained with an additional 3D reconstruction stage, based on the volumetric fusion of multiple partial meshes. Systems in different application domains can then incorporate the information provided by the reconstructed mesh into their functionalities. One of the sectors most interested in 3D reconstruction and digitalization is the AEC industry. For example, fast and cheap 3D models can be used to track the state of a construction with respect to the prior design. In addition, the sector is slowly transitioning from a CAD-based methodology to the Building Information Modeling (BIM) methodology, in which stakeholders collaborate around a shared model of the building. This model includes both the geometry and the semantics of the structural parts and the constructive processes of the building. Despite the benefits of BIM for the AEC industry, its adoption has been rather slow; in most cases, it is imposed by the customer of the project. Given our deep knowledge of the sector and the suggestions received from several BIM stakeholders, we aim to facilitate the adoption of the BIM methodology with an easy-to-use cross-platform system called 3D-SIMOS. More specifically, our goal is to reduce the gap and pitfalls found when moving from CAD to BIM in the design phase.

We address this challenge by providing AEC stakeholders with a solution to manage constructive processes over the structure of the building, represented by a 3D mesh. Moreover, 3D-SIMOS not only allows stakeholders to create, simulate, track and monitor the advances of the construction, but also to visualize information about the building (e.g. to showcase the building to an interested customer). Advanced visualization techniques have been applied to design smart interfaces over a cross-platform implementation that exploits the WebGL API of modern web browsers. Any project in 3D-SIMOS is generated from a standard CAD 3D model and a plan in which the tasks and resources are arranged. Information from both schemas is integrated into the framework proposed by the IFC standard using an alignment between IFC, MPP and COLLADA. Furthermore, we have proposed an explicit symbolic representation to include the dynamics of constructive processes over static representations of the building such as IFC. This representation is based on an expandable dual graph that encodes the interconnected spaces of the building. On this graph, we have defined a computational framework for the functionals that evaluate the dynamic attributes of constructive processes. The evolution of such attributes over time describes a flow, which can be visualized with different types of fields.

This system has been evaluated by combining qualitative and quantitative aspects extracted from interviews with experts of the AEC industry and from our experimental results. We have concluded that 3D-SIMOS can promote the adoption of the BIM methodology among stakeholders that are still not accustomed to applying it in their current workflow. In addition, the system operates with good performance on mid-range smartphones and desktops, and it is also easy to use. However, some of the stakeholders noted that the rendering quality of the visualization may not be compelling enough for showcasing the building to potential customers, since high-quality computer-generated images are commonly used for this task. Besides, the simplicity of the available functionalities makes the system look naive from the perspective of a professional user who would need to integrate more powerful tools with 3D-SIMOS. Fortunately, the system was designed with extensibility in mind, so the effort required to add new functionalities would be minimal thanks to the underlying BIM model.
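As a closing illustration of the dual-graph representation described above (not code from the thesis), the following minimal sketch models building spaces as nodes with hypothetical scheduling attributes, connects adjacent spaces, and evaluates a simple progress functional over time to obtain the scalar field whose evolution would be visualized as a flow; all space names, attributes and values are assumptions made for the example.

```python
# Minimal sketch of the dual-graph idea: building spaces are nodes, shared
# walls/openings are edges, and a "functional" evaluates a dynamic attribute of
# a constructive process per space at a given time, yielding a scalar field.
# Space names, attributes and scheduling values below are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

@dataclass
class DualGraph:
    # Node -> attributes of the constructive process scheduled in that space.
    spaces: Dict[str, Dict[str, float]] = field(default_factory=dict)
    # Node -> set of spaces it is directly connected to.
    adjacency: Dict[str, Set[str]] = field(default_factory=dict)

    def add_space(self, name: str, **attrs: float) -> None:
        self.spaces[name] = dict(attrs)
        self.adjacency.setdefault(name, set())

    def connect(self, a: str, b: str) -> None:
        self.adjacency[a].add(b)
        self.adjacency[b].add(a)

    def evaluate(self, functional: Callable[[Dict[str, float], float], float],
                 t: float) -> Dict[str, float]:
        """Evaluate a functional on every space at time t, yielding a scalar field."""
        return {name: functional(attrs, t) for name, attrs in self.spaces.items()}

# Hypothetical functional: fraction of the task completed at day t, given a
# scheduled start day and duration stored as attributes of each space.
def progress(attrs: Dict[str, float], t: float) -> float:
    elapsed = t - attrs["start_day"]
    return min(max(elapsed / attrs["duration_days"], 0.0), 1.0)

graph = DualGraph()
graph.add_space("corridor", start_day=0, duration_days=10)
graph.add_space("room_a", start_day=5, duration_days=8)
graph.add_space("room_b", start_day=8, duration_days=12)
graph.connect("corridor", "room_a")
graph.connect("corridor", "room_b")

# The field at successive days describes how the construction "flows" through
# the interconnected spaces of the building.
for day in (0, 7, 14):
    print(day, graph.evaluate(progress, day))
```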