Comparative analysis of monocular depth estimation in satellite two-dimensional images

Authors

DOI:

https://doi.org/10.18372/2073-4751.83.20547

Keywords:

monocular depth estimation, satellite images, metrics, DepthPro, Depth-Anything V2, DPT-Large, US3D, 3D reconstruction, Google Maps

Abstract

With the rapid growth of satellite mapping services, there is a need for simple and effective methods of constructing 3D (three-dimensional) terrain reconstructions from 2D (two-dimensional) images alone, while ensuring the reproducibility and comparability of results in applied urban-analysis and navigation tasks. Monocular depth prediction models are a natural tool for formalizing such processes: they reconstruct the relative structure of a scene from a minimal amount of input data, which helps to assess the relief and the height contrasts of objects. Although services that provide three-dimensional satellite imagery exist, they are either commercial or unavailable in Ukraine. This motivates the task of building a model that overcomes these limitations. The paper examines pre-trained models and their application to satellite images. Because the models under consideration were trained on different datasets, reconstruction quality was first evaluated on the pre-trained models; the best-performing ones were then selected and trained on a dataset of satellite images. The results of the study can be used to develop methods for aircraft navigation, reconstruction, and terrain analysis, and they enable simpler and faster 3D reconstruction of arbitrary objects, although they remain less accurate than resource-intensive methods. It should be noted that the proposed methodology also transfers to applied problems in other domains. It is reproducible, easy to use, and suitable for further scaling to larger samples and to metric-calibration problems.
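The abstract mentions evaluating reconstruction quality by metrics but does not list them on this page. As a minimal sketch, the error measures most commonly reported in monocular-depth benchmarks (AbsRel, RMSE, and the δ<1.25 accuracy) can be computed as follows; the helper name and the choice of exactly these three metrics are assumptions, not the paper's stated protocol:

```python
import math

def depth_metrics(pred, gt):
    """Standard monocular-depth error metrics over paired
    per-pixel depth values (flattened to 1-D lists here)."""
    assert len(pred) == len(gt) and len(gt) > 0
    pairs = list(zip(pred, gt))
    n = len(pairs)
    # Mean absolute relative error: |pred - gt| / gt
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root-mean-square error, in the same units as the depth values
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # delta < 1.25 accuracy: fraction of pixels whose ratio to
    # ground truth lies within a 1.25x band
    delta1 = sum(max(p / g, g / p) < 1.25 for p, g in pairs) / n
    return {"AbsRel": abs_rel, "RMSE": rmse, "delta1": delta1}

# Toy example: two "pixels", one predicted exactly, one 50% too deep
print(depth_metrics([1.0, 3.0], [1.0, 2.0]))
```

Since relative-depth models predict depth only up to an unknown scale (and sometimes shift), predictions are typically aligned to the ground truth (e.g., by median scaling) before these metrics are computed.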

References

Maxar Technologies, «Precision3D Data Suite,» [Online]. Available: https://www.maxar.com/maxar-intelligence/products/precision3d-data-suite. [Accessed: 30 September 2025].

Google, «Google Maps,» [Online]. Available: https://mapsplatform.google.com/maps-products/3d-maps/. [Accessed: 30 September 2025].

P. L. Guth, A. Van Niekerk, C. H. Grohmann, J.-P. Muller, L. Hawker, I. V. Florinsky, D. Gesch, H. I. Reuter, V. Herrera-Cruz, S. Riazanoff, C. López-Vázquez, C. C. Carabajal, C. Albinet et al., «Digital Elevation Models: Terminology and Definitions,» Remote Sensing, vol. 13, no. 18, 2021.

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit and N. Houlsby, «An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale,» in International Conference on Learning Representations (ICLR), 2021.

D. J. Butler, J. Wulff, G. B. Stanley and M. J. Black, «A Naturalistic Open Source Movie for Optical Flow Evaluation,» in European Conference on Computer Vision (ECCV), 2012.

L. Mehl, J. Schmalfuss, A. Jahedi, Y. Nalivayko and A. Bruhn, «Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo,» in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

D. Eigen, C. Puhrsch and R. Fergus, «Depth Map Prediction from a Single Image Using a Multi-Scale Deep Network,» in Advances in Neural Information Processing Systems (NeurIPS), 2014.

C. Godard, O. Mac Aodha and G. J. Brostow, «Unsupervised Monocular Depth Estimation with Left-Right Consistency,» in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

C. Godard, O. Mac Aodha, M. Firman and G. J. Brostow, «Digging Into Self-Supervised Monocular Depth Estimation,» in IEEE/CVF International Conference on Computer Vision (ICCV), 2019.

R. Ranftl, K. Lasinger, D. Hafner, K. Schindler and V. Koltun, «Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer,» IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 3, pp. 1623–1637, 2022.

R. Ranftl, A. Bochkovskiy and V. Koltun, «Vision Transformers for Dense Prediction,» in IEEE/CVF International Conference on Computer Vision (ICCV), 2021.

S. F. Bhat, I. Alhashim and P. Wonka, «AdaBins: Depth Estimation Using Adaptive Bins,» in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

W. Yin, J. Zhang, O. Wang, S. Niklaus, L. Mai, S. Chen and C. Shen, «Learning to Recover 3D Scene Shape From a Single Image,» in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

S. F. Bhat, R. Birkl, D. Wofk, P. Wonka and M. Müller, «ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth,» 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2302.12288. [Accessed: 30 September 2025].

M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski and A. Joulin, «Emerging Properties in Self-Supervised Vision Transformers,» in IEEE/CVF International Conference on Computer Vision (ICCV), 2021.

M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, N. Ballas, W. Galuba, R. Howes, P.-Y. Huang, S.-W. Li, I. Misra et al., «DINOv2: Learning Robust Visual Features without Supervision,» 2023. [Online]. Available: https://doi.org/10.48550/arXiv.2304.07193. [Accessed: 30 September 2025].

L. Yang, B. Kang, Z. Huang, X. Xu, J. Feng and H. Zhao, «Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data,» in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

L. Yang, B. Kang, Z. Huang, Z. Zhao, X. Xu, J. Feng and H. Zhao, «Depth Anything V2,» 2024. [Online]. Available: https://arxiv.org/abs/2406.09414. [Accessed: September 2025].

A. Bochkovskii, A. Delaunoy, H. Germain, M. Santos, Y. Zhou, S. R. Richter and V. Koltun, «Depth Pro: Sharp Monocular Metric Depth in Less Than a Second,» 2024. [Online]. Available: https://arxiv.org/abs/2410.02073. [Accessed: 30 September 2025].

IEEE GRSS / US3D, «US3D Dataset – Urban Semantic 3D (Jacksonville & Omaha),» [Online]. Available: https://eod-grss-ieee.com/dataset-detail/bS9HOHBpaEJuMVJ2ZWV4Z01NTGNvZz09. [Accessed: 30 September 2025].

Published

2025-12-19

Issue

Section

Articles