Extracting planes from a 3D scene is useful for downstream tasks in robotics and augmented reality. In this paper we tackle the problem of estimating the planar surfaces in a scene from posed images. Our first finding is that a surprisingly competitive baseline results from combining popular clustering algorithms with recent improvements in 3D geometry estimation. However, such purely geometric methods are understandably oblivious to plane semantics, which are crucial to discerning distinct planes. To overcome this limitation, we propose a method that predicts multi-view consistent plane embeddings that complement geometry when clustering points into planes. We show through extensive evaluation on the ScanNetV2 dataset that our new method outperforms existing approaches and our strong geometric baseline for the task of plane estimation.
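For concreteness, below is a minimal sketch of the kind of purely geometric clustering baseline described above. It assumes a fused point cloud with per-point normals is already available (e.g. from a reconstructed mesh); the choice of DBSCAN, the feature scaling, and the thresholds are illustrative assumptions, not the exact configuration evaluated in the paper.

# Illustrative geometric plane-clustering baseline (not the paper's exact implementation).
# Groups points whose local plane parameters (normal + offset) are similar.
import numpy as np
from sklearn.cluster import DBSCAN

def geometric_plane_clustering(points, normals, eps=0.08, min_samples=20):
    """Cluster 3D points into planes using only geometric cues.

    points:  (N, 3) array of 3D vertex positions.
    normals: (N, 3) array of unit surface normals.
    Returns an (N,) array of plane labels (-1 marks noise / non-planar points).
    """
    # Plane offset for each point: signed distance of its tangent plane to the origin.
    offsets = np.sum(points * normals, axis=1, keepdims=True)

    # Feature = [normal, scaled offset]; the 0.5 scale balancing orientation
    # against position is an arbitrary choice for this sketch.
    features = np.concatenate([normals, 0.5 * offsets], axis=1)

    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)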
Overview of our method for 3D plane estimation. For each RGB keyframe we estimate per-pixel depth, a planar probability, and a planar embedding. We fuse the depths and planar probabilities into a TSDF and extract a mesh. We then train a per-scene MLP to distill the per-pixel embeddings into 3D-consistent embeddings, which are finally grouped into 3D planes via clustering.
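To illustrate the distillation step, here is a hedged PyTorch sketch of a per-scene MLP that maps 3D surface points to embeddings supervised by the per-pixel embeddings of the keyframes that observe them. The per-frame "project" helper and the simple MSE matching loss are assumptions made for this sketch; the actual loss and training schedule used in the paper may differ.

# Sketch of per-scene embedding distillation (hypothetical helper names).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneEmbeddingMLP(nn.Module):
    """Small per-scene MLP that distills 2D embeddings into 3D-consistent ones."""
    def __init__(self, embed_dim=8, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, xyz):          # xyz: (N, 3) surface points from the mesh
        return self.net(xyz)         # (N, embed_dim) 3D embeddings

def distill(mlp, surface_points, keyframes, steps=2000, lr=1e-3):
    """keyframes: list of dicts with a hypothetical 'project' function that
    returns (target_embeddings, visibility_mask) for the given 3D points,
    sampled from that frame's predicted per-pixel embedding map."""
    opt = torch.optim.Adam(mlp.parameters(), lr=lr)
    for _ in range(steps):
        frame = keyframes[torch.randint(len(keyframes), (1,)).item()]
        target, visible = frame["project"](surface_points)   # (M, D), (N,) bool mask
        pred = mlp(surface_points[visible])                   # (M, D)
        loss = F.mse_loss(pred, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mlp

The distilled 3D embeddings can then be concatenated with geometric features (e.g. the normals and plane offsets used in the baseline sketch above) before clustering the mesh vertices into planes.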
If you find this work useful for your research, please cite:
@inproceedings{watson-planes-2024,
  title     = {AirPlanes: Accurate Plane Estimation via {3D}-Consistent Embeddings},
  author    = {Watson, Jamie and Aleotti, Filippo and Sayed, Mohamed and Qureshi, Zawar and Mac Aodha, Oisin and Brostow, Gabriel and Firman, Michael and Vicente, Sara},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2024},
}
We are extremely grateful to Saki Shinoda, Jakub Powierza, and Stanimir Vichev for their invaluable infrastructure support.
This webpage was in part inspired by this template.