Welcome to the Old 4D Light Field Benchmark.
A complete description of the datasets and the acquisition process is available in the VMV 2013 paper:
“Datasets and Benchmarks for Densely Sampled 4D Light Fields”
THIS DATASET IS NO LONGER SUPPORTED
Conversion between depth and disparity. To compare disparity results to the ground truth depth, the latter first has to be converted to disparity. Given a depth Z, the disparity d, i.e. the slope of the epipolar lines in pixels per grid unit, is

    d = B · f / Z − Δx
where B is the baseline, i.e. the distance between two cameras, f the focal length in pixels, and Δx the shift between two neighboring images relative to an arbitrary rectification plane (for light fields generated with Blender, this is the scene origin). The parameters in the equation above are given by the following attributes in the main HDF file:
| Symbol | HDF attribute | Description |
| --- | --- | --- |
| B | dH | distance between two cameras |
| f | focalLength | focal length in pixels |
| Δx | shift | shift between two neighboring images |
For the Blender scenes, these values are computed from the render settings as

    B = b,    f = X / (2 · tan(fov / 2)),    Δx = B · f / Z0,

where Z0 is the distance between the Blender camera and the scene origin in [BE] (Blender units), fov is the field of view in radians, b the distance between two cameras in [BE], and X the horizontal image resolution in pixels. Since all light fields are rendered or captured on a regular equidistant grid, it is sufficient to use only the horizontal distance between two cameras to define the baseline.
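As a concrete illustration, the conversion above can be applied directly to the ground truth stored in the HDF file. The following is a minimal sketch, assuming h5py is available; the file name lf.h5 and the depth dataset name GT_DEPTH are illustrative assumptions, while the attribute names match the table above.

```python
# Sketch: convert ground-truth depth to disparity via d = B * f / Z − Δx.
import h5py
import numpy as np

with h5py.File("lf.h5", "r") as f:        # hypothetical file name
    B = f.attrs["dH"]                     # baseline: distance between two cameras
    focal = f.attrs["focalLength"]        # focal length in pixels
    shift = f.attrs["shift"]              # Δx: shift between neighboring images
    depth = np.asarray(f["GT_DEPTH"])     # assumed name of the depth dataset

disparity = B * focal / depth - shift     # pixels per grid unit
```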
For the real-world light fields, a Nikon D800 digital camera is mounted on a stepper-motor-driven gantry manufactured by Physical Instruments. A picture of the setup can be seen in figure 4. The accuracy and repositioning error of the gantry are well within the micrometer range. The capture time for a complete light field depends on the number of images; about 15 seconds are required per image. As a consequence, this acquisition method is limited to static scenes. The internal camera matrix must be estimated beforehand by capturing images of a calibration pattern and invoking the camera calibration algorithms of the OpenCV library; see the next section for details. Experiments have shown that the positioning accuracy of the gantry actually surpasses the pattern-based external calibration as long as the differences between the sensor and movement planes are kept minimal.
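The pattern-based internal calibration can be reproduced with OpenCV's standard chessboard routines. Below is a minimal sketch; the pattern dimensions, square size, and image folder are illustrative assumptions, not values from the paper.

```python
# Sketch: estimate the internal camera matrix from chessboard images.
import glob
import cv2
import numpy as np

pattern_size = (9, 6)    # inner corners of the chessboard (assumed)
square_size = 0.025      # square edge length in meters (assumed)

# Planar 3D coordinates of the pattern corners (Z = 0)
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
objp *= square_size

obj_points, img_points = [], []
for fname in glob.glob("calib/*.png"):   # hypothetical image folder
    img = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(img, pattern_size)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K is the internal camera matrix, dist the lens distortion coefficients
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, img.shape[::-1], None, None)
print("RMS reprojection error:", rms)
```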
Ground truth for the real-world scenes was generated using standard pose estimation techniques. First, we acquired 3D polygon meshes for an object in the scene using a Breuckmann SmartscanHE structured light scanner. The meshes contain between 2.5 and 8 million faces, with a stated accuracy of down to 50 microns. The object-to-camera pose was estimated by hand-picking 2D-to-3D feature points from the light field center view and the 3D mesh, and then calculating the external camera matrix using an iterative Levenberg-Marquardt approach from the OpenCV library [Bra00]. This approach is used for both the internal and external calibration.
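This external calibration step corresponds to OpenCV's iterative PnP solver (cv2.solvePnP with SOLVEPNP_ITERATIVE, which minimizes reprojection error with Levenberg-Marquardt). A minimal self-contained sketch follows; since the hand-picked correspondences are not part of this excerpt, it synthesizes them from a known pose and recovers that pose.

```python
# Sketch: estimate the object-to-camera pose from 2D-to-3D correspondences.
import cv2
import numpy as np

# Illustrative intrinsics; in practice K and dist come from the
# pattern-based internal calibration described above.
K = np.array([[3000.0,    0.0, 960.0],
              [   0.0, 3000.0, 640.0],
              [   0.0,    0.0,   1.0]])
dist = np.zeros(5)

# Stand-in for the hand-picked mesh/center-view correspondences:
# project random mesh points with a known pose, then recover that pose.
rng = np.random.default_rng(0)
pts_3d = rng.uniform(-0.1, 0.1, (12, 3))
rvec_true = np.array([0.1, -0.2, 0.05])
tvec_true = np.array([0.0, 0.0, 1.5])
pts_2d, _ = cv2.projectPoints(pts_3d, rvec_true, tvec_true, K, dist)

ok, rvec, tvec = cv2.solvePnP(pts_3d, pts_2d, K, dist,
                              flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)   # rotation part of the external camera matrix [R | t]
print("recovered translation:", tvec.ravel())
```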
[Bra00] Bradski G.: The OpenCV Library. Dr. Dobb’s Journal of Software Tools (2000).