
Author: Jacob Marks (Machine Learning Engineer at Voxel51) Run and Evaluate Monocular Depth...
**Monocular Depth Estimation** is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. This challenging task is a key prerequisite for scene understanding in applications such as 3D scene reconstruction, autonomous driving, and AR. State-of-the-art methods usually fall into one of two categories: designing a network powerful enough to directly regress the depth map, or splitting the input into bins or windows to reduce computational complexity. The most popular benchmarks are the KITTI and NYUv2 datasets. Models are typically evaluated using RMSE or absolute relative error. Source: [Defocus Deblurring Using Dual-Pixel Data](https://arxiv.org/abs/2005.00305)
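
As a rough illustration, both metrics can be computed per image with a few lines of NumPy. This is a minimal sketch, not a benchmark-exact implementation: the valid-pixel masking and depth-capping conventions differ between KITTI and NYUv2, and the function names here are just for illustration.

```python
import numpy as np

def rmse(pred, gt, valid_mask=None):
    """Root mean squared error between predicted and ground-truth depth maps."""
    if valid_mask is None:
        valid_mask = gt > 0  # ignore pixels with no ground-truth depth
    diff = pred[valid_mask] - gt[valid_mask]
    return np.sqrt(np.mean(diff ** 2))

def abs_rel(pred, gt, valid_mask=None):
    """Absolute relative error: mean of |pred - gt| / gt over valid pixels."""
    if valid_mask is None:
        valid_mask = gt > 0
    return np.mean(np.abs(pred[valid_mask] - gt[valid_mask]) / gt[valid_mask])
```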
**Depth Estimation** is the task of measuring the distance of each pixel relative to the camera. Depth is extracted from either monocular (single) or stereo (multiple views of a scene) images. Traditional methods use multi-view geometry to find the relationship between the images. Newer methods can directly estimate depth by minimizing a regression loss, or by learning to generate a novel view from a sequence. The most popular benchmarks are KITTI and NYUv2. Models are typically evaluated using an RMSE metric. Source: [DIODE: A Dense Indoor and Outdoor DEpth Dataset](https://arxiv.org/abs/1908.00463)
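
To make this concrete, here is a minimal sketch of running an off-the-shelf monocular depth model, assuming the Hugging Face `transformers` "depth-estimation" pipeline and the `Intel/dpt-large` checkpoint; the image path is a placeholder. Note that many such models predict relative rather than metric depth, so their outputs may need rescaling before comparison against ground truth.

```python
from PIL import Image
from transformers import pipeline

# Load a pretrained monocular depth estimation model
# (assumes the "depth-estimation" pipeline and the Intel/dpt-large checkpoint)
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")

image = Image.open("example.jpg")  # placeholder path
result = depth_estimator(image)

# result["predicted_depth"] is a torch tensor of per-pixel depth values;
# result["depth"] is a PIL image visualization of the depth map
depth_map = result["predicted_depth"].squeeze().numpy()
print(depth_map.shape, depth_map.min(), depth_map.max())
```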