Software & Data Downloads — robust-rotation-estimation

Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes
A method and benchmark for estimating camera rotation in crowded, real-world scenes from handheld monocular video.

We present a novel approach to estimating camera rotation in crowded, real-world scenes from handheld monocular video. While camera rotation estimation (and more general motion estimation) is a well-studied problem, no previous methods exhibit both high accuracy and acceptable speed in this setting. Because this setting is not well covered by existing datasets, we provide a new dataset and benchmark, with high-accuracy, rigorously tested ground truth on 17 video sequences. Our method uses a novel generalization of the Hough transform on SO(3) to efficiently find the camera rotation most compatible with the optical flow. Methods developed for wide-baseline stereo (e.g., 5-point methods) do not handle well the small baseline implicit in monocular video. Methods used in autonomous driving (e.g., on the KITTI dataset) leverage a) the limited camera rotation, limited acceleration, and large baselines seen in driving datasets, b) visibility of the ground plane (the road), or c) integration across multiple frames, and do not generalize well to handheld video. Finally, almost all methods have significant problems with moving objects in the scene. To address these cases, robustification techniques like RANSAC can help find solutions through extensive random sampling, but they become extremely slow when the number of RANSAC iterations is large (as is needed for dynamic scenes). Our method is almost 40 percent more accurate than the next best method and comparable in speed to the fastest available algorithms. This represents a strong new performance point for crowded scenes, an important and realistic setting for computer vision.
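To make the voting idea concrete, below is a minimal Python sketch of rotation estimation by Hough-style voting over a grid of candidate rotations. It assumes a first-order (small-angle) rotational flow model in normalized image coordinates and roughly negligible translational flow for background points; the grid resolution, tolerance, and flow model are assumptions chosen for this illustration. It is a didactic brute-force version of the voting principle only, not the paper's efficient SO(3) Hough transform; the actual implementation is available in the repository linked below.

import numpy as np


def rotational_flow(x, y, w):
    """First-order optical flow induced at normalized coordinates (x, y)
    by a small camera rotation w = (wx, wy, wz) in radians."""
    wx, wy, wz = w
    u = x * y * wx - (1.0 + x ** 2) * wy + y * wz
    v = (1.0 + y ** 2) * wx - x * y * wy - x * wz
    return u, v


def vote_for_rotation(points, flows, max_angle=0.05, bins=21, tol=2e-3):
    """Brute-force Hough-style voting over a grid of candidate rotations.

    points: (N, 2) normalized image coordinates
    flows:  (N, 2) observed optical flow at those points
    Returns the candidate rotation (wx, wy, wz) with the most votes.
    """
    axis = np.linspace(-max_angle, max_angle, bins)
    candidates = np.stack(
        np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1
    ).reshape(-1, 3)                      # (bins**3, 3) candidate rotations
    votes = np.zeros(len(candidates), dtype=np.int64)

    for (x, y), (u_obs, v_obs) in zip(points, flows):
        # Predicted rotational flow at this pixel for every candidate rotation.
        u, v = rotational_flow(x, y, candidates.T)
        # A candidate gets this pixel's vote if it explains the observed flow.
        votes += ((u - u_obs) ** 2 + (v - v_obs) ** 2 < tol ** 2).astype(np.int64)

    return candidates[np.argmax(votes)]


if __name__ == "__main__":
    # Synthetic check: recover a known rotation from noiseless rotational flow.
    rng = np.random.default_rng(0)
    w_true = np.array([0.01, -0.02, 0.005])
    pts = rng.uniform(-0.5, 0.5, size=(200, 2))
    flo = np.array([rotational_flow(x, y, w_true) for x, y in pts])
    print("estimated:", vote_for_rotation(pts, flo), "true:", w_true)

Because each flow vector votes independently, flow vectors on moving people simply fail to agree on any single rotation, which is what makes voting-style estimation attractive in crowded scenes compared with RANSAC-style resampling.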

  •  Delattre, F., Dirnfeld, D., Nguyen, P., Scarano, S., Jones, M.J., Miraldo, P., Learned-Miller, E., "Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes", IEEE International Conference on Computer Vision (ICCV), DOI: 10.1109/ICCV51070.2023.00894, October 2023, pp. 3715-3724.
    @inproceedings{Delattre2023oct,
      author    = {Delattre, Fabien and Dirnfeld, David and Nguyen, Phat and Scarano, Stephen and Jones, Michael J. and Miraldo, Pedro and Learned-Miller, Erik},
      title     = {Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes},
      booktitle = {IEEE International Conference on Computer Vision (ICCV)},
      year      = 2023,
      pages     = {3715--3724},
      month     = oct,
      publisher = {IEEE/CVF},
      doi       = {10.1109/ICCV51070.2023.00894},
      issn      = {2380-7504},
      isbn      = {979-8-3503-0718-4},
      url       = {https://www.merl.com/publications/TR2023-123}
    }

Access software at https://github.com/merlresearch/robust-rotation-estimation.