ParkingTwin: Training-Free Streaming 3D Reconstruction
for Parking-Lot Digital Twins

Xinhao Liu¹, Yu Wang¹, Xiansheng Guo¹*, Gordon Owusu Boateng², Yu Cao¹, Haonan Si¹, Xingchen Guo³, Nirwan Ansari⁴
¹ School of Information and Communication Engineering, University of Electronic Science and Technology of China
² Department of Communications and Networking, Xi'an Jiaotong-Liverpool University
³ School of Electrical Engineering, Hebei University of Technology
⁴ Advanced Networking Lab., Department of Electrical and Computer Engineering, New Jersey Institute of Technology

Real-time reconstruction demo running on GTX 1660.

Abstract

High-fidelity digital twins of parking lots provide essential environmental priors for path planning, collision detection, and perception-system validation in Automated Valet Parking (AVP). However, constructing such robot-oriented twins faces a fundamental "trilemma" involving geometric ambiguity, environmental interference, and computational constraints: 1) the restricted, sparse forward-facing views of mobile platforms cause geometric degeneration in traditional methods because of insufficient parallax; 2) frequent dynamic occlusions (e.g., moving vehicles) and extreme lighting variations impede consistent texture fusion; and 3) existing neural rendering methods rely on computationally expensive offline optimization and fail to meet the real-time streaming requirements of edge-side robotics. To address these challenges, we propose ParkingTwin, a training-free, lightweight, streaming 3D reconstruction system. The core innovations are three-fold: 1) OSM-Prior Driven Geometric Construction: We leverage OpenStreetMap (OSM) semantic topology to directly generate a metric-consistent 3D Truncated Signed Distance Field (TSDF). This approach transforms "blind" geometric search into deterministic mapping, resolving the ill-posedness caused by sparse views while eliminating costly geometric optimization overhead. 2) Geometry-Aware Dynamic Filtering: We introduce a quad-modal geometric constraint field based on normal, height, and depth consistency to reject dynamic vehicles and transient occlusions in real time without prior training. 3) Illumination-Robust Fusion in the CIELAB Color Space: By incorporating adaptive L-channel weighting and depth gradient suppression, we decouple luminance and chromaticity in the perceptual space to eliminate seams and artifacts caused by abrupt lighting changes. Experiments demonstrate that our system achieves 30+ Frames Per Second (FPS) online streaming reconstruction on an entry-level GPU (GTX 1660). On a large-scale 68,000 m² real-world dataset, our method achieves a Structural Similarity Index Measure (SSIM) of 0.87 (a 16.0% improvement), accelerates end-to-end processing by approximately 15×, and reduces video memory usage by 83.3% compared with state-of-the-art 3D Gaussian Splatting (3DGS) methods that require high-end GPUs (RTX 4090D). The system outputs explicit triangular meshes directly compatible with Unity/Unreal Engine (UE) digital twin workflows, effectively serving as an automated asset generator for initializing parking-lot digital twins.

Methodology

System Architecture

Figure 1: The complete pipeline of the ParkingTwin system. The system operates in three stages: (1) OSM-Prior Driven Geometric Initialization directly generates a metric-consistent TSDF mesh; (2) Geometry-Prior Based Dynamic Filtering uses multi-modal constraints to remove vehicles without training; (3) LAB Perceptual Fusion ensures seamless texturing under varying illumination conditions.
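To make stage (2) more concrete, the following is a minimal sketch of a per-pixel consistency test against the prior mesh. It is an illustration under stated assumptions, not the authors' implementation: the `PixelObs` fields, the function name `is_consistent_with_prior`, and all three thresholds are hypothetical, and the sketch covers only the normal, height, and depth cues named in the abstract.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PixelObs:
    """One pixel's geometry, observed vs. rendered from the prior mesh (hypothetical layout)."""
    depth_obs: float     # measured depth along the camera ray, in metres
    depth_prior: float   # depth of the prior mesh along the same ray, in metres
    normal_obs: Vec3     # unit normal estimated from the observed depth map
    normal_prior: Vec3   # unit normal of the prior mesh at the hit point
    height_obs: float    # height of the back-projected point, in metres
    height_prior: float  # height of the prior surface at the hit point, in metres


def is_consistent_with_prior(p: PixelObs,
                             max_depth_err: float = 0.15,
                             max_height_err: float = 0.10,
                             max_normal_deg: float = 20.0) -> bool:
    """Keep a pixel only if depth, height, and normal all agree with the prior.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    depth_ok = abs(p.depth_obs - p.depth_prior) <= max_depth_err
    height_ok = abs(p.height_obs - p.height_prior) <= max_height_err
    # Cosine of the angle between the observed and prior unit normals.
    cos_angle = sum(a * b for a, b in zip(p.normal_obs, p.normal_prior))
    normal_ok = cos_angle >= math.cos(math.radians(max_normal_deg))
    return depth_ok and height_ok and normal_ok


# A floor pixel that matches the prior, and a pixel on the side of a parked car.
floor_pixel = PixelObs(5.02, 5.00, (0.0, 0.0, 1.0), (0.0, 0.0, 1.0), 0.01, 0.0)
car_pixel = PixelObs(3.20, 5.00, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.80, 0.0)
keep_floor = is_consistent_with_prior(floor_pixel)
keep_car = is_consistent_with_prior(car_pixel)
```

Because every cue is compared against geometry that is already known from the prior, a test of this kind needs no learned vehicle detector, which is consistent with the training-free claim.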

Dataset & Geometric Initialization

We introduce ICPARK, a large-scale (68,000 m²) real-world dataset. The acquisition strictly adhered to constraints typical of real-world inspection: sparse forward-facing views.

Figure 2: Dynamic challenges in the ICPARK dataset: high occlusion rates and diverse vehicle appearances.

Figure 6: (Left) OSM vector map. (Right) The generated 3D TSDF mesh exhibits clean topology and manifold geometry.
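The idea of turning a vector map directly into a TSDF can be sketched as follows. This is a simplified illustration, assuming a single flat floor at a known height and a 2D footprint polygon already converted from OSM coordinates to metres; the function names, voxel size, and truncation distance are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Voxel = Tuple[int, int, int]


def point_in_polygon(x: float, y: float, poly: List[Point]) -> bool:
    """Even-odd ray-casting test for a simple 2D polygon."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def floor_tsdf(footprint: List[Point], voxel: float = 0.5, trunc: float = 0.5,
               floor_z: float = 0.0, z_extent: float = 1.0) -> Dict[Voxel, float]:
    """Fill a sparse TSDF for a flat floor bounded by a 2D footprint.

    Values are normalised to [-1, 1]: positive above the floor (free space),
    negative below it. Only voxel columns inside the footprint are written.
    """
    xs = [p[0] for p in footprint]
    ys = [p[1] for p in footprint]
    nx = int((max(xs) - min(xs)) / voxel)
    ny = int((max(ys) - min(ys)) / voxel)
    nz = int(2 * z_extent / voxel)
    grid: Dict[Voxel, float] = {}
    for ix in range(nx):
        cx = min(xs) + (ix + 0.5) * voxel
        for iy in range(ny):
            cy = min(ys) + (iy + 0.5) * voxel
            if not point_in_polygon(cx, cy, footprint):
                continue
            for iz in range(nz):
                cz = floor_z - z_extent + (iz + 0.5) * voxel
                signed_dist = cz - floor_z  # distance to the floor plane
                grid[(ix, iy, iz)] = max(-1.0, min(1.0, signed_dist / trunc))
    return grid


# A 4 m x 2 m rectangular footprint (hypothetical, already in metres).
lot = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
tsdf = floor_tsdf(lot)
```

Since the signed distance is computed analytically from the map rather than estimated from depth images, the resulting field carries no sensor noise, which is one plausible reading of the "deterministic mapping" described in the abstract. A mesh could then be extracted from such a field with a standard method such as marching cubes.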

Interactive 3D Model

Explore the reconstructed parking lot in 3D. Use your mouse to rotate, zoom, and pan.

Interactive 3D reconstruction result of the parking lot. The model shows the complete geometric structure and texture details.

⚠️ Note: Because of page-load file size limits, the interactive model displayed is a cropped 1/4 version. The complete model (449 MB) can be downloaded at https://pan.quark.cn/s/d180e32624df

💡 Tip: For the best viewing experience, download the HTML file and open it locally with the mesh.ply file in the assets folder.

Qualitative Comparison

(a) Trajectory
(b) ParkingTwin (Ours)
(c) 3DGS
(d) ESLAM

Figure 3: Global reconstruction comparison. (b) ParkingTwin generates a clean, vehicle-free floor plan. (c) 3DGS and (d) ESLAM exhibit ghosting artifacts and geometric noise.

Figure 4: Detailed texture quality. Row 2 (Ours) successfully removes dynamic vehicles and reconstructs clear signage. Rows 3-4 (Baselines) fail to remove vehicles and suffer from geometric holes.

Figure 5: Failure of Traditional MVS (OpenMVS) under sparse views. Due to lack of parallax, dense matching becomes ill-posed, resulting in >60% geometric loss.
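One way to see why a lack of parallax makes dense matching ill-posed is the standard rectified two-view triangulation model. It is used here purely as an illustration and is not taken from the paper; the symbols f (focal length in pixels), B (baseline perpendicular to the viewing direction), d (disparity), and δd (matching error) are assumptions of this sketch.

```latex
Z = \frac{f\,B}{d}
\quad\Longrightarrow\quad
\left|\delta Z\right| \approx \frac{Z^{2}}{f\,B}\,\left|\delta d\right|
```

With illustrative values f = 1000 px, Z = 20 m, and δd = 0.5 px, a baseline of B = 1 m gives a depth error of about 0.2 m, while B = 0.1 m gives about 2 m. A forward-moving camera contributes little baseline perpendicular to its viewing direction, so the depth uncertainty grows in the same way.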

Quantitative Results

| Method | PSNR (dB) ↑ | SSIM ↑ | LPIPS ↓ | Time (min) ↓ | VRAM (GB) ↓ | Dynamic Removal | GPU |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3DGS | 26.5 ± 0.3 | 0.75 ± 0.02 | 0.21 ± 0.01 | 74 ± 5 | 36.0 ± 2.0 | No | RTX 4090D |
| ESLAM | 28.9 ± 0.4 | 0.82 ± 0.03 | 0.17 ± 0.02 | 243 ± 15 | 80.0 ± 5.0 | No | RTX PRO 6000 |
| ParkingTwin (Ours) | 30.1 ± 0.2 | 0.87 ± 0.01 | 0.13 ± 0.01 | 5 ± 0.5 | 6.0 ± 0.5 | Yes | GTX 1660 |
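The headline figures quoted in the abstract (16.0% SSIM improvement, roughly 15× speed-up, 83.3% less video memory) follow from the mean values in the table when 3DGS is taken as the baseline. The short check below reproduces that arithmetic; the variable and function names are illustrative only.

```python
# Mean values copied from the quantitative results table (3DGS vs. ours).
baseline_3dgs = {"ssim": 0.75, "time_min": 74.0, "vram_gb": 36.0}
ours = {"ssim": 0.87, "time_min": 5.0, "vram_gb": 6.0}


def relative_gain_pct(new: float, base: float) -> float:
    """Percentage increase of `new` over `base`."""
    return (new - base) / base * 100.0


def relative_reduction_pct(new: float, base: float) -> float:
    """Percentage decrease of `new` relative to `base`."""
    return (base - new) / base * 100.0


ssim_gain_pct = relative_gain_pct(ours["ssim"], baseline_3dgs["ssim"])
speedup = baseline_3dgs["time_min"] / ours["time_min"]
vram_reduction_pct = relative_reduction_pct(ours["vram_gb"], baseline_3dgs["vram_gb"])

print(f"SSIM gain:      {ssim_gain_pct:.1f}%")       # 16.0%
print(f"Speed-up:       {speedup:.1f}x")             # 14.8x
print(f"VRAM reduction: {vram_reduction_pct:.1f}%")  # 83.3%
```

The speed-up of 14.8× is what the abstract rounds to "approximately 15×".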

Ablation Study

Effectiveness of OSM Prior

(a) Depth Fusion (no prior)
(b) OSM Prior (Ours)

Figure 7: Depth Fusion suffers from noise and jagged boundaries, while OSM Prior ensures clean topology.

Cumulative Module Contributions

(a) Baseline (RGB)
(b) + Veh. Rem.
(c) Full System (LAB)

Figure 9 (Local View): Progressive ablation. Baseline shows ghosting; Vehicle Removal fixes geometry but leaves lighting seams; LAB Fusion eliminates seams.

(a) Baseline
(b) + Veh. Rem.
(c) Full System

Figure 10 (Global View): The LAB fusion strategy successfully achieves global color balance compared to RGB fusion.
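The difference between RGB and LAB fusion can be illustrated with a small sketch that fuses several observations of one texel. It is an assumption-laden illustration, not the authors' method: the Gaussian weighting around the median lightness, the `1 / (1 + k * gradient)` suppression term, and the parameters `sigma_l` and `grad_gain` are all hypothetical choices. Only the sRGB to CIELAB conversion (D65 white point) follows the standard formulas.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Lab = Tuple[float, float, float]


def srgb_to_lab(rgb: Sequence[float]) -> Lab:
    """Convert an sRGB colour with components in [0, 1] to CIELAB (D65)."""
    def to_linear(c: float) -> float:
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    def f(t: float) -> float:
        delta = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3 * delta ** 2) + 4.0 / 29.0

    r, g, b = (to_linear(c) for c in rgb)
    # Linear RGB -> XYZ, normalised by the D65 reference white.
    x = (0.4124564 * r + 0.3575761 * g + 0.1804375 * b) / 0.95047
    y = (0.2126729 * r + 0.7151522 * g + 0.0721750 * b) / 1.00000
    z = (0.0193339 * r + 0.1191920 * g + 0.9503041 * b) / 1.08883
    fx, fy, fz = f(x), f(y), f(z)
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))


def fuse_lab(observations: List[Tuple[Lab, float]],
             sigma_l: float = 10.0, grad_gain: float = 5.0) -> Lab:
    """Fuse (lab, depth_gradient) observations of one texel.

    Chromaticity (a, b) is averaged with a geometric weight only, while
    lightness (L) is additionally down-weighted when it deviates from the
    median lightness, so one over- or under-exposed frame has little effect.
    """
    l_ref = statistics.median(lab[0] for lab, _ in observations)
    sum_wl = sum_wg = l_acc = a_acc = b_acc = 0.0
    for (l, a, b), depth_grad in observations:
        w_geo = 1.0 / (1.0 + grad_gain * depth_grad)  # suppress depth edges
        w_l = w_geo * math.exp(-((l - l_ref) ** 2) / (2.0 * sigma_l ** 2))
        sum_wl += w_l
        sum_wg += w_geo
        l_acc += w_l * l
        a_acc += w_geo * a
        b_acc += w_geo * b
    return (l_acc / sum_wl, a_acc / sum_wg, b_acc / sum_wg)


# Three views of the same texel; the third is strongly over-exposed.
views = [((50.0, 10.0, -5.0), 0.0), ((52.0, 10.0, -5.0), 0.0), ((90.0, 10.0, -5.0), 0.0)]
fused = fuse_lab(views)
```

In this example a plain average would give a lightness of 64, whereas the weighted result stays close to the two consistent views. Separating the L channel from a and b in this way is what allows brightness outliers to be handled without shifting hue.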

Citation

@misc{liu2026parkingtwintrainingfreestreaming3d,
  title={ParkingTwin: Training-Free Streaming 3D Reconstruction for Parking-Lot Digital Twins},
  author={Xinhao Liu and Yu Wang and Xiansheng Guo and Gordon Owusu Boateng and Yu Cao and Haonan Si and Xingchen Guo and Nirwan Ansari},
  year={2026},
  eprint={2601.13706},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2601.13706},
}