Real-time reconstruction demo running on GTX 1660.
Abstract
High-fidelity digital twins of parking lots provide essential environmental priors for path planning, collision detection, and perception system validation in Automated Valet Parking (AVP). However, constructing such robot-oriented twins faces a fundamental "trilemma" involving geometric ambiguity, environmental interference, and computational constraints: 1) The restricted and sparse forward-facing views of mobile platforms lead to geometric degeneration in traditional methods due to insufficient parallax; 2) Frequent dynamic occlusions (e.g., moving vehicles) and extreme lighting variations impede consistent texture fusion; and 3) Existing neural rendering methods rely on computationally expensive offline optimization, failing to meet the real-time streaming requirements of edge-side robotics. To address these challenges, we propose ParkingTwin, a training-free, lightweight, and streaming 3D reconstruction system. The core innovations are three-fold: 1) OSM-Prior Driven Geometric Construction: We leverage OpenStreetMap (OSM) semantic topology to directly generate a metric-consistent 3D Truncated Signed Distance Field (TSDF). This approach transforms "blind" geometric search into deterministic mapping, resolving the ill-posedness caused by sparse views while eliminating costly geometric optimization overhead. 2) Geometry-Aware Dynamic Filtering: We introduce a quad-modal geometric constraint field based on normal, height, and depth consistency to perform real-time rejection of dynamic vehicles and transient occlusions without prior training. 3) Illumination-Robust Fusion in the CIELAB Color Space: By incorporating adaptive L-channel weighting and depth gradient suppression, we decouple luminance and chromaticity in the perceptual space to eliminate seams and artifacts caused by abrupt lighting changes. Experiments demonstrate that our system achieves 30+ Frames Per Second (FPS) online streaming reconstruction on an entry-level GPU (GTX 1660).
On a large-scale 68,000 m² real-world dataset, our method achieves a Structural Similarity Index Measure (SSIM) of 0.87 (a 16.0% improvement), accelerates end-to-end processing by approximately 15×, and reduces video memory usage by 83.3% compared with state-of-the-art 3D Gaussian Splatting (3DGS) methods that require high-end GPUs (RTX 4090D). The system outputs explicit triangular meshes directly compatible with Unity/Unreal Engine (UE) digital twin workflows, effectively serving as an automated asset generator for initializing parking lot digital twins.
Methodology
Figure 1: The complete pipeline of the ParkingTwin system. The system operates in three stages: (1) OSM-Prior Driven Geometric Initialization directly generates a metric-consistent TSDF mesh; (2) Geometry-Prior Based Dynamic Filtering uses multi-modal constraints to remove vehicles without training; (3) LAB Perceptual Fusion ensures seamless texturing under varying illumination conditions.
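Stage (2) can be pictured as a per-sample gate that compares each observation against the geometric prior. The sketch below is only an illustration for floor samples, using the three cues named in the abstract (normal, height, and depth consistency). The `Sample` structure, the `is_static` helper, and all thresholds are hypothetical and are not taken from the ParkingTwin implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    observed_depth: float  # metres, measured along the camera ray
    prior_depth: float     # metres, depth of the prior surface along the same ray
    normal: tuple          # unit normal estimated from the observation
    prior_normal: tuple    # unit normal of the prior surface
    height: float          # metres above the prior ground plane


def is_static(s, max_depth_gap=0.15, max_normal_deg=25.0, max_height=0.10):
    """Keep a floor sample only if it agrees with the prior on all three cues.

    The thresholds are illustrative assumptions, not published values.
    """
    depth_ok = abs(s.observed_depth - s.prior_depth) <= max_depth_gap
    cos_angle = sum(a * b for a, b in zip(s.normal, s.prior_normal))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard acos against rounding
    normal_ok = math.degrees(math.acos(cos_angle)) <= max_normal_deg
    height_ok = abs(s.height) <= max_height
    return depth_ok and normal_ok and height_ok


# A sample lying on the floor, and one on the roof of a parked car.
floor = Sample(5.02, 5.00, (0.0, 0.0, 1.0), (0.0, 0.0, 1.0), 0.01)
car_roof = Sample(3.50, 5.00, (0.0, 0.0, 1.0), (0.0, 0.0, 1.0), 1.40)

keep_floor = is_static(floor)
keep_car = is_static(car_roof)
```

The car roof has the same normal as the floor, so the normal cue alone would accept it; the depth and height cues reject it. This is one reason to combine several cues rather than rely on a single one.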
Dataset & Geometric Initialization
We introduce ICPARK, a large-scale (68,000 m²) real-world dataset. Acquisition strictly followed the constraints typical of real-world inspection, namely sparse, forward-facing views.
Figure 2: Dynamic challenges in the ICPARK dataset. High occlusion rates and diverse vehicle appearances.
Figure 6: (Left) OSM vector map. (Right) The generated 3D TSDF mesh exhibits clean topology and manifold geometry.
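To make the idea of generating a TSDF directly from a vector map concrete, here is a minimal sketch that rasterises one polygon footprint into a 2D slice of a truncated signed distance field. It assumes the OSM polygon has already been projected into a local metric frame, and that a vertical extrusion of the slice gives the 3D field. The function names, the pillar coordinates, the voxel size, and the truncation distance are illustrative assumptions, not the authors' code.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point (px, py) to segment (a, b)."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0.0 else ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_polygon(px, py, polygon):
    """Even-odd ray casting test."""
    inside = False
    n = len(polygon)
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def footprint_tsdf(polygon, x_range, y_range, voxel, trunc):
    """Rows of truncated signed distances in [-1, 1], negative inside the footprint."""
    nx = int(round((x_range[1] - x_range[0]) / voxel))
    ny = int(round((y_range[1] - y_range[0]) / voxel))
    n = len(polygon)
    grid = []
    for j in range(ny):
        y = y_range[0] + (j + 0.5) * voxel  # voxel centre
        row = []
        for i in range(nx):
            x = x_range[0] + (i + 0.5) * voxel
            d = min(
                point_segment_distance(x, y, *polygon[k], *polygon[(k + 1) % n])
                for k in range(n)
            )
            if point_in_polygon(x, y, polygon):
                d = -d
            row.append(max(-1.0, min(1.0, d / trunc)))
        grid.append(row)
    return grid


# Hypothetical 1 m x 1 m pillar footprint in a 3 m x 3 m tile (metres).
pillar = [(1.0, 1.0), (2.0, 1.0), (2.0, 2.0), (1.0, 2.0)]
tsdf = footprint_tsdf(pillar, (0.0, 3.0), (0.0, 3.0), voxel=0.5, trunc=0.5)
```

Because the distances come from the map geometry rather than from multi-view matching, the field is defined everywhere regardless of how many views observe a surface.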
Interactive 3D Model
Explore the reconstructed parking lot in 3D. Use your mouse to rotate, zoom, and pan.
Interactive 3D reconstruction result of the parking lot. The model shows the complete geometric structure and texture details.
⚠️ Note: Because of page-load size limits, the interactive model shown here is cropped to one quarter of the full scene. The complete model (449 MB) can be downloaded at https://pan.quark.cn/s/d180e32624df
💡 Tip: For the best viewing experience, download the HTML file and open it locally with the mesh.ply file in the assets folder.
Qualitative Comparison
Figure 3: Global reconstruction comparison. (b) ParkingTwin generates a clean, vehicle-free floor plan. (c) 3DGS and (d) ESLAM exhibit ghosting artifacts and geometric noise.
Figure 4: Detailed texture quality. Row 2 (Ours) successfully removes dynamic vehicles and reconstructs clear signage. Rows 3-4 (Baselines) fail to remove vehicles and suffer from geometric holes.
Figure 5: Failure of Traditional MVS (OpenMVS) under sparse views. Due to lack of parallax, dense matching becomes ill-posed, resulting in >60% geometric loss.
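The link between parallax and ill-posed matching can be illustrated with the standard first-order error model for a rectified two-view pair, where depth is z = f·b/d and a disparity error σ_d therefore maps to a depth error of about z²·σ_d/(f·b). The numbers below (focal length, baselines, disparity noise) are hypothetical and are not measurements from ICPARK. Forward-facing motion is treated here simply as a very small effective lateral baseline.

```python
def depth_sigma(depth_m, baseline_m, focal_px, disparity_sigma_px=0.5):
    """First-order depth uncertainty of a rectified pair: z^2 * sigma_d / (f * b)."""
    return depth_m ** 2 * disparity_sigma_px / (focal_px * baseline_m)


# Same scene point at 10 m, seen with a wide and a narrow effective baseline.
wide = depth_sigma(depth_m=10.0, baseline_m=1.0, focal_px=1000.0)
narrow = depth_sigma(depth_m=10.0, baseline_m=0.05, focal_px=1000.0)

print(f"wide baseline:   +/- {wide:.2f} m")
print(f"narrow baseline: +/- {narrow:.2f} m")
```

Under these assumed values the uncertainty grows from about 0.05 m to about 1 m as the baseline shrinks by a factor of 20, so dense matching has far less geometric information to work with.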
Quantitative Results
| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ | Time (min) ↓ | VRAM (GB) ↓ | Dyn. Removal | GPU |
|---|---|---|---|---|---|---|---|
| 3DGS | 26.5 ± 0.3 | 0.75 ± 0.02 | 0.21 ± 0.01 | 74 ± 5 | 36.0 ± 2.0 | No | RTX 4090D |
| ESLAM | 28.9 ± 0.4 | 0.82 ± 0.03 | 0.17 ± 0.02 | 243 ± 15 | 80.0 ± 5.0 | No | RTX PRO 6000 |
| ParkingTwin (Ours) | 30.1 ± 0.2 | 0.87 ± 0.01 | 0.13 ± 0.01 | 5 ± 0.5 | 6.0 ± 0.5 | Yes | GTX 1660 |
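
The headline figures quoted in the abstract follow directly from the mean values in the table. The short script below recomputes them against the 3DGS row; the dictionary layout and helper names are just for illustration.

```python
# Mean values copied from the quantitative results table.
results = {
    "3DGS": {"ssim": 0.75, "time_min": 74.0, "vram_gb": 36.0},
    "ParkingTwin": {"ssim": 0.87, "time_min": 5.0, "vram_gb": 6.0},
}


def relative_gain(ours, baseline):
    """Percentage increase of `ours` over `baseline`."""
    return (ours - baseline) / baseline * 100.0


def relative_reduction(ours, baseline):
    """Percentage decrease of `ours` relative to `baseline`."""
    return (baseline - ours) / baseline * 100.0


base, ours = results["3DGS"], results["ParkingTwin"]
ssim_gain = relative_gain(ours["ssim"], base["ssim"])
speedup = base["time_min"] / ours["time_min"]
vram_saving = relative_reduction(ours["vram_gb"], base["vram_gb"])

print(f"SSIM gain:   {ssim_gain:.1f}%")
print(f"Speed-up:    {speedup:.1f}x")
print(f"VRAM saving: {vram_saving:.1f}%")
```

This gives a 16.0% SSIM gain, a 14.8× speed-up (reported as approximately 15×), and an 83.3% reduction in video memory.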
Ablation Study
Effectiveness of OSM Prior
Figure 7: Depth Fusion suffers from noise and jagged boundaries, while OSM Prior ensures clean topology.
Cumulative Module Contributions
Figure 9 (Local View): Progressive ablation. Baseline shows ghosting; Vehicle Removal fixes geometry but leaves lighting seams; LAB Fusion eliminates seams.
Figure 10 (Global View): The LAB fusion strategy successfully achieves global color balance compared to RGB fusion.
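The difference between RGB and LAB fusion can be sketched as follows: convert each observation of a surface point to CIELAB, average the chromaticity channels (a, b), and fuse the luminance channel (L) with weights that shrink for exposure outliers and for samples near depth discontinuities. The sRGB-to-CIELAB conversion is the standard one (D65 white point). The Gaussian outlier weight, the exponential depth-gradient weight, and the parameters `l_sigma` and `grad_scale` are assumptions made for this sketch; the exact weighting used by ParkingTwin is not specified here.

```python
import math

D65 = (0.95047, 1.0, 1.08883)  # reference white


def srgb_to_lab(rgb):
    """Convert an sRGB triple in [0, 1] to CIELAB (D65 white point)."""
    def to_linear(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(c) for c in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta * delta) + 4.0 / 29.0

    fx, fy, fz = f(x / D65[0]), f(y / D65[1]), f(z / D65[2])
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))


def fuse_lab(observations, l_sigma=10.0, grad_scale=5.0):
    """Fuse (lab, depth_gradient) observations of one surface point.

    Hypothetical weighting: all channels are down-weighted near depth edges;
    L is additionally down-weighted when it deviates from the median L.
    """
    base = [math.exp(-grad_scale * abs(g)) for _, g in observations]
    total = sum(base)
    a = sum(w * lab[1] for w, (lab, _) in zip(base, observations)) / total
    b = sum(w * lab[2] for w, (lab, _) in zip(base, observations)) / total

    l_ref = sorted(lab[0] for lab, _ in observations)[len(observations) // 2]
    l_weights = [
        w * math.exp(-(((lab[0] - l_ref) / l_sigma) ** 2))
        for w, (lab, _) in zip(base, observations)
    ]
    l = sum(w * lab[0] for w, (lab, _) in zip(l_weights, observations)) / sum(l_weights)
    return (l, a, b)


# Three views of the same grey floor patch; the third is strongly overexposed.
views = [((50.0, 1.0, 2.0), 0.0), ((52.0, 1.0, 2.0), 0.0), ((90.0, 1.0, 2.0), 0.0)]
fused = fuse_lab(views)
white_lab = srgb_to_lab((1.0, 1.0, 1.0))
```

In this example the overexposed view barely affects the fused luminance, while the chromaticity is unchanged. A plain RGB average would instead brighten the patch and produce a visible seam against neighbouring patches that were not seen by the overexposed view.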