DoubleTake

Geometry Guided Depth Estimation

ECCV 2024


teaser.png

Abstract


Estimating depth from a sequence of posed RGB images is a fundamental computer vision task, with applications in augmented reality, path planning, and more. Prior work typically makes use of previous frames in a multi-view stereo framework, relying on matching textures in a local neighborhood. In contrast, our model leverages historical predictions by giving the latest 3D geometry data as an extra input to our network. This self-generated geometric hint can encode information from areas of the scene not covered by the keyframes, and it is more regularized than individual predicted depth maps for previous frames. We introduce a Hint MLP that combines cost volume features with a hint of the prior geometry, rendered as a depth map from the current camera location, together with a measure of the confidence in the prior geometry. We demonstrate that our method, which can run at interactive speeds, achieves state-of-the-art estimates of depth and 3D scene reconstruction in offline, incremental, and revisit evaluation scenarios.



Our ScanNetV2-trained model can also produce high-quality offline reconstructions via TSDF fusion in just 13.8s per scene on average.
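For readers unfamiliar with TSDF fusion, the following is a minimal sketch of the standard weighted running-average update that such pipelines commonly use. It is not the authors' implementation: the function name `tsdf_update`, the truncation distance `trunc`, and the unit per-observation weight are all assumptions made for illustration.

```python
import numpy as np

def tsdf_update(tsdf, weight, sdf_obs, observed, trunc=0.1):
    """Fuse one frame of signed-distance observations into a TSDF volume.

    tsdf, weight: per-voxel fused TSDF values and accumulated weights.
    sdf_obs:      per-voxel signed distance along the camera ray
                  (observed depth minus voxel depth), in scene units.
    observed:     boolean mask, True where the voxel projects to a valid depth.
    trunc:        truncation distance (hypothetical default).
    """
    # Skip voxels that lie far behind the observed surface.
    valid = observed & (sdf_obs > -trunc)
    # Normalise and truncate the signed distance to [-1, 1].
    d = np.clip(sdf_obs / trunc, -1.0, 1.0)
    # Each valid observation contributes a weight of 1 (an assumption).
    new_weight = weight + valid.astype(weight.dtype)
    # Weighted running average; untouched voxels keep their old value.
    fused = np.where(valid, (tsdf * weight + d) / np.maximum(new_weight, 1.0), tsdf)
    return fused, new_weight
```

Calling `tsdf_update` once per predicted depth map accumulates the volume, from which a mesh can then be extracted.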

Approach


Overview (incremental)


Our key contribution is the injection of cheaply available metadata into the feature volume. An MLP then reduces each volumetric cell in parallel, producing a feature map that is passed to a 2D cost volume encoder-decoder. We also use an image encoder to enforce a strong image prior while the cost volume encoder-decoder propagates and corrects depth estimates from the cost volume across the frame. This formulation is flexible and supports three operating modes: 1) incremental, for online depth and reconstruction at 76.6ms per frame; 2) offline, for high-quality depth and reconstruction at 13.8s per scene; and 3) revisit, for depth estimation when revisiting locations after a long absence, at 62.8ms per frame.
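As a rough illustration of reducing every volumetric cell in parallel with an MLP, the sketch below applies one small two-layer MLP to each cell of a feature volume. The layout `(D, H, W, C)`, the layer sizes, the ReLU activation, and the name `reduce_feature_volume` are assumptions chosen for the example, not details taken from the paper.

```python
import numpy as np

def reduce_feature_volume(volume, w1, b1, w2, b2):
    """Apply the same two-layer MLP to every cell of a (D, H, W, C) volume.

    D is the number of depth planes, (H, W) the spatial resolution and
    C the per-cell feature size (all hypothetical).
    """
    # The matrix product broadcasts over D, H and W, so all cells are
    # processed at once: (D, H, W, C) @ (C, hidden) -> (D, H, W, hidden).
    hidden = np.maximum(volume @ w1 + b1, 0.0)  # ReLU
    # One scalar score per cell: (D, H, W, 1) -> (D, H, W).
    return (hidden @ w2 + b2)[..., 0]

rng = np.random.default_rng(0)
volume = rng.normal(size=(4, 3, 5, 6))             # toy feature volume
w1, b1 = rng.normal(size=(6, 8)), rng.normal(size=8)
w2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)
cost_volume = reduce_feature_volume(volume, w1, b1, w2, b2)
```

Because the same weights are shared by every cell, the reduction is a batched matrix product rather than a loop over cells.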

Hint MLP - Geometry Injection

method_detail.png


Geometry Injection: Our feature volume is reduced to a cost volume via a matching MLP. Our Hint MLP then combines the multi-view-stereo cost volume with an estimate of previously predicted geometry. For every location in the cost volume, the Hint MLP takes as input (i) the visual matching score, (ii) the geometry hint, formed as the absolute difference between the rendered depth hint and the depth plane at that cost volume position, and (iii) an estimate of the confidence of the hint at that pixel.
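The three Hint MLP inputs described above can be assembled per cost volume location roughly as follows. This is a sketch under assumed array shapes. The function name `hint_mlp_inputs` is hypothetical, and the handling of pixels where no prior geometry is visible is deliberately left out because the text here does not specify it.

```python
import numpy as np

def hint_mlp_inputs(matching_score, depth_planes, hint_depth, hint_confidence):
    """Stack the three per-location inputs to the Hint MLP.

    matching_score:  (D, H, W) visual matching scores from the cost volume.
    depth_planes:    (D,) depth value of each cost volume plane.
    hint_depth:      (H, W) prior geometry rendered as a depth map from
                     the current camera.
    hint_confidence: (H, W) confidence in the prior geometry per pixel.
    Returns an array of shape (D, H, W, 3).
    """
    # (ii) absolute difference between the rendered hint and each depth plane.
    geometry_hint = np.abs(hint_depth[None, :, :] - depth_planes[:, None, None])
    # (iii) the per-pixel confidence is shared by all depth planes.
    confidence = np.broadcast_to(hint_confidence[None, :, :], matching_score.shape)
    return np.stack([matching_score, geometry_hint, confidence], axis=-1)
```

The geometry hint is smallest at the depth plane closest to the rendered prior depth, which gives the MLP a per-plane signal it can weigh against the matching score using the confidence.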

Results


Online Depths - incremental

Ours_incremental_depth.png

Online Reconstructions - incremental

Ours_incremental_depth.png

Offline Depths - offline

offline_depth.png

Offline Reconstructions - offline

offline_recon.png

Depths for Revisiting a Location - revisit

revisit_depths.png

Out of Distribution Performance - offline

OOD_Compare.png
Our 2D-based model performs well on out-of-distribution domains.

Resources


Paper

Supplemental

BibTeX

If you find this work useful for your research, please cite:

    @inproceedings{sayed2024doubletake,
      title={DoubleTake: Geometry Guided Depth Estimation},
      author={Mohamed Sayed and Filippo Aleotti and Jamie Watson and Zawar Qureshi and Guillermo Garcia-Hernando and Gabriel Brostow and Sara Vicente and Michael Firman},
      booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
      year={2024},
    }
        
