DoubleTake

Estimating depth from a sequence of posed RGB images is a fundamental computer vision task, with applications in augmented reality, path planning, and more. Prior work typically makes use of previous frames in a multi-view stereo framework, relying on matching textures in a local neighborhood. In contrast, our model leverages historical predictions by giving the latest 3D geometry data as an extra input to our network. This self-generated geometric hint can encode information from areas of the scene not covered by the keyframes, and it is more regularized than individual predicted depth maps for previous frames. We introduce a Hint MLP which combines cost volume features with a hint of the prior geometry, rendered as a depth map from the current camera location, together with a measure of the confidence in the prior geometry. We demonstrate that our method, which can run at interactive speeds, achieves state-of-the-art estimates of depth and 3D scene reconstruction in offline, incremental, and revisit evaluation scenarios.

Our ScanNetV2-trained model can also produce high-quality offline reconstructions via TSDF fusion in just 13.8s per scene on average.
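For readers unfamiliar with TSDF fusion, the sketch below shows the standard weighted running-average update for the voxels along a single camera ray. It is a generic illustration rather than our implementation: the function name `fuse_ray`, the default truncation distance, and the unit weight per observation are all assumptions made for this example.

```python
def fuse_ray(tsdf, weights, voxel_depths, observed_depth, trunc=0.1):
    """Fuse one depth observation into the voxels along a single camera ray.

    tsdf, weights: per-voxel running TSDF values and accumulated weights.
    voxel_depths:  depth of each voxel centre along the ray, in metres.
    The names and the default truncation distance are illustrative assumptions.
    """
    for i, z in enumerate(voxel_depths):
        sdf = observed_depth - z       # positive in front of the surface
        if sdf < -trunc:
            continue                   # far behind the surface: leave untouched
        d = min(1.0, sdf / trunc)      # truncate and normalise to [-1, 1]
        w_new = weights[i] + 1.0       # assumed unit weight per observation
        tsdf[i] = (tsdf[i] * weights[i] + d) / w_new
        weights[i] = w_new
    return tsdf, weights
```

Repeating this update for every ray of every predicted depth map averages out per-frame noise, and the surface can then be extracted at the zero crossing of the fused values.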

Approach

Overview (incremental)

Our key contribution is the injection of cheaply available metadata into the feature volume. Each volumetric cell is then reduced in parallel with an MLP into a feature map before being passed to a 2D cost volume encoder-decoder. We also use an image encoder to enforce a strong image prior when the cost volume encoder-decoder propagates and corrects depth estimates across the frame. This formulation is flexible and allows for three different operating modes: 1) incremental, for online depth and reconstruction at 76.6ms per frame, 2) offline, for high-quality depth and reconstruction at 13.8s per scene, and 3) revisit, for depth estimation when revisiting locations after a long absence at 62.8ms per frame.
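The per-cell reduction described above can be sketched as follows. Every cell of the feature volume holds a small feature vector, and the same MLP maps each one to a single scalar, so the output has one value per depth plane and pixel. The layer sizes, the weights, and the function names are hypothetical, and the plain Python loops stand in for what would normally be a batched tensor operation.

```python
def mlp(features, w1, b1, w2, b2):
    """Two-layer perceptron with a ReLU hidden layer: feature vector -> scalar."""
    hidden = [
        max(0.0, sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def reduce_volume(feature_volume, params):
    """Apply the same MLP independently to every cell of the volume.

    feature_volume[d][y][x] is the feature vector of one cell;
    the result holds one scalar per cell, with the same d/y/x layout.
    """
    return [
        [[mlp(cell, *params) for cell in row] for row in plane]
        for plane in feature_volume
    ]
```

Because no cell depends on any other, the reduction parallelises trivially, which is what makes a per-cell MLP cheap compared with 3D convolutions over the whole volume.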

Hint MLP - Geometry Injection

Our feature volume is reduced to a cost volume via a matching MLP. Our Hint MLP then combines the multi-view-stereo cost volume with an estimate of previously predicted geometry. For every location in the cost volume, the Hint MLP takes as input (i) the visual matching score, (ii) the geometry hint, formed as the absolute difference between the rendered depth hint and the depth plane at that cost volume position, and (iii) an estimate of the confidence of the hint at that pixel.
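As an illustration of the three inputs listed above, the sketch below assembles the Hint MLP input for a single pixel across all depth planes. It is a minimal example under stated assumptions, not our released code: the function name and argument layout are hypothetical, and it assumes a hint is available at the pixel.

```python
def hint_mlp_inputs(matching_scores, depth_planes, hint_depth, hint_confidence):
    """Build one (score, geometry hint, confidence) triple per depth plane.

    matching_scores: visual matching score at each depth plane for this pixel.
    depth_planes:    depth value of each plane in the cost volume.
    hint_depth:      prior geometry rendered as depth at this pixel.
    hint_confidence: confidence in the prior geometry at this pixel.
    """
    if len(matching_scores) != len(depth_planes):
        raise ValueError("expected one matching score per depth plane")
    return [
        (score, abs(hint_depth - plane), hint_confidence)
        for score, plane in zip(matching_scores, depth_planes)
    ]


# Example: three depth planes, with a rendered hint at 2.5 m.
inputs = hint_mlp_inputs([0.2, 0.9, 0.4], [1.0, 2.0, 3.0],
                         hint_depth=2.5, hint_confidence=0.8)
```

The geometry hint is smallest at the planes nearest the rendered depth, so an MLP consuming these triples can weigh the prior geometry against the visual matching score according to the confidence.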

Results

Online Depths - incremental

Online Reconstructions - incremental

Offline Depths - offline

Offline Reconstructions - offline

Depths for Revisiting a Location - revisit

Out of Distribution Performance - offline

Our 2D-based model performs well on out-of-distribution domains.

Resources

BibTeX

    @inproceedings{sayed2024doubletake,
      title={DoubleTake: Geometry Guided Depth Estimation},
      author={Mohamed Sayed and Filippo Aleotti and Jamie Watson and Zawar Qureshi and Guillermo Garcia-Hernando and Gabriel Brostow and Sara Vicente and Michael Firman},
      booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
      year={2024},
    }

DoubleTake

Geometry Guided Depth Estimation

ECCV 2024

Mohamed Sayed¹ Filippo Aleotti¹ Jamie Watson¹,² Zawar Qureshi¹ Guillermo Garcia-Hernando¹ Gabriel Brostow¹,² Sara Vicente¹ Michael Firman¹

