Creating video that mimics human visual perception

Recent breakthroughs in core video processing techniques have advanced video technology to the point where it looks well placed to rival the capabilities of the human visual system. For one, the last couple of decades have witnessed a phenomenal increase in the number of pixels accommodated by display systems, enabling the transition from standard-definition (SD) video to high-definition (HD) video.

Another noteworthy evolution is the stark enhancement in pixel quality, characterized by high dynamic range (HDR) systems as they elegantly displace their low dynamic range (LDR) equivalents.

Moreover, the intuitive approaches developed for image understanding, which aim to replicate the perceptual abilities of the human brain, have met with encouraging success, as have 3D video systems in their drive to supplant their 2D counterparts.

These advanced techniques converge toward a common purpose: to dissolve the boundaries between the real and digital worlds by capturing videos that mimic the various aspects of human visual perception. These aspects relate fundamentally to video processing research in the fields of video capture, display technologies, data compression, and video content understanding.

Video capture in 3D, HD and HDR

The two distinct technologies used to capture digital video are charge-coupled device (CCD) and complementary metal-oxide-semiconductor (CMOS) image sensors, both of which convert light intensities into corresponding electric charges that are later processed as electronic signals.

Building on a remarkable half-century of continued development, these technologies enable the capture of HD videos of exceptional quality. When it comes to HDR video, however, they pale in comparison to a typical human eye, which has a dynamic range (the ratio of the brightest to the darkest parts visible) of about 10000:1.
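
To put the 10000:1 figure in more familiar units, the ratio can be converted into photographic stops (doublings of light) and decibels. The short Python sketch below is only an illustration; the function names are hypothetical, and the 20 log10 decibel convention is an assumption borrowed from common image-sensor practice rather than something stated in this article.

```python
import math


def dynamic_range_stops(contrast_ratio: float) -> float:
    """Number of doublings (photographic stops) spanned by a contrast ratio."""
    return math.log2(contrast_ratio)


def dynamic_range_db(contrast_ratio: float) -> float:
    """Contrast ratio in decibels, assuming the 20*log10 convention."""
    return 20.0 * math.log10(contrast_ratio)


# The article's figure for a typical human eye
eye_ratio = 10_000.0
print(f"{dynamic_range_stops(eye_ratio):.1f} stops")
print(f"{dynamic_range_db(eye_ratio):.0f} dB")
```

Under these conventions, 10000:1 corresponds to roughly 13.3 stops, or 80 dB.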

Existing digital camcorders can capture either the brighter portions of a scene, using short exposure durations, or the darker portions, using longer exposure durations, but not both at once.

In practice, this shortcoming can be circumvented by using multiple camcorders with one or two beam splitters, so that several video sequences are captured concurrently under different exposure settings.

Beam splitters allow identical scenes to be captured simultaneously as several LDR sequences, the best-exposed portions of which are then used to synthesize HDR videos. From a research perspective, the challenge is to achieve this higher dynamic range with a single camcorder, accepting an unavoidable but reasonable reduction in quality that is barely perceptible.
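
The idea of combining the best portions of several exposures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by any particular camcorder system: it assumes a linear sensor response, pixel values normalized to [0, 1], and a simple "hat" weighting that trusts mid-tones and ignores clipped pixels. The function names are hypothetical.

```python
def hat_weight(value: float) -> float:
    """Trust mid-tones most; the weight falls to zero at the extremes 0 and 1."""
    return 1.0 - abs(2.0 * value - 1.0)


def merge_exposures(frames, exposure_times):
    """Merge same-scene LDR frames into relative radiance values.

    frames: one list of pixel values in [0, 1] per exposure.
    exposure_times: exposure duration for each frame, in seconds.
    Assumes a linear sensor, so value is about radiance * time until clipping.
    """
    merged = []
    for pixel_values in zip(*frames):
        weighted_sum = 0.0
        weight_total = 0.0
        for value, t in zip(pixel_values, exposure_times):
            w = hat_weight(value)
            weighted_sum += w * (value / t)
            weight_total += w
        if weight_total > 0.0:
            merged.append(weighted_sum / weight_total)
        else:
            # Every exposure is fully clipped: fall back to a plain average
            estimates = [v / t for v, t in zip(pixel_values, exposure_times)]
            merged.append(sum(estimates) / len(estimates))
    return merged


# Two pixels (one dark, one bright) seen through a short and a long exposure
times = [0.01, 0.2]
scene = [2.0, 40.0]  # made-up relative radiance values for illustration
frames = [[min(1.0, r * t) for r in scene] for t in times]
print(merge_exposures(frames, times))
```

In this toy example the long exposure clips the bright pixel and the short exposure barely registers the dark one, yet the merged result recovers values close to the original radiances of 2.0 and 40.0.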

Moreover, it is envisioned that HDR camcorders equipped with advanced image sensors may serve this purpose in the near future.

3D capture technologies widely employ stereoscopic techniques that obtain stereo pairs using a two-view setup. Cameras are mounted side by side, with a separation typically equal to the distance between a person’s pupils.

Because views of distant objects reach each eye along nearly the same line of sight, while views of closer objects arrive at different angles, a stereoscopic image pair can be used to produce realistic 3D images.
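
The relationship between viewing angle and distance can be made concrete with the standard pinhole stereo model, in which depth equals focal length times camera separation divided by disparity (the horizontal shift of a point between the two views). The Python sketch below assumes two parallel, rectified cameras; the focal length, the 65 mm baseline and the disparity values are illustrative assumptions, and the function name is hypothetical.

```python
def depth_from_disparity(focal_length_px: float,
                         baseline_m: float,
                         disparity_px: float) -> float:
    """Depth in meters of a point seen by two parallel, rectified cameras.

    Zero disparity means both views share the same line of sight,
    which corresponds to a point infinitely far away.
    """
    if disparity_px <= 0.0:
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


focal_length_px = 1000.0  # assumed focal length, expressed in pixels
baseline_m = 0.065        # assumed camera separation, roughly eye spacing

for disparity_px in (65.0, 6.5, 0.0):
    depth_m = depth_from_disparity(focal_length_px, baseline_m, disparity_px)
    print(f"disparity {disparity_px:5.1f} px -> depth {depth_m} m")
```

A large shift between the two views indicates a nearby object, while a shift approaching zero indicates a distant one, which is the cue the stereo pair encodes.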

Multi-view technology, an alternative to stereoscopy, captures 3D scenes by recording several independent video streams using an array of cameras. Plenoptic cameras, which capture the light field of a scene, can also be used for multi-view capture with a single main lens. The resulting views can then either be shown on multi-view displays or stored for further processing.