Real-time focus range sensor
S. K. Nayar and M. Watanabe and M. Noguchi
IEEE Trans. on Pattern Analysis and Machine Intelligence  18  1186-1198  (1996)
Structures of dynamic scenes can only be recovered using a real-time range sensor. Depth from defocus offers an effective solution to fast and dense range estimation. However, accurate depth estimation requires theoretical and practical solutions to a variety of problems including recovery of textureless surfaces, precise blur estimation, and magnification variations caused by defocusing. Both textured and textureless surfaces are recovered using an illumination pattern that is projected via the same optical path used to acquire images. The illumination pattern is optimized to maximize accuracy and spatial resolution in computed depth. The relative blurring in two images is computed using a narrow-band linear operator that is designed by considering all the optical, sensing, and computational elements of the depth from defocus system. Defocus invariant magnification is achieved by the use of an additional aperture in the imaging optics. A prototype focus range sensor has been developed that has a workspace of 1 cubic foot and produces up to 512×480 depth estimates at 30 Hz with an average RMS error of 0.2%. Several experimental results are included to demonstrate the performance of the sensor.
Shape from focus
S. K. Nayar and Y. Nakagawa
IEEE Trans. on Pattern Analysis and Machine Intelligence  16  824-831  (1994)

The shape from focus method presented here uses different focus levels to obtain a sequence of object images. The sum-modified-Laplacian (SML) operator is developed to provide local measures of the quality of image focus. The operator is applied to the image sequence to determine a set of focus measures at each image point. A depth estimation algorithm interpolates a small number of focus measure values to obtain accurate depth estimates. A fully automated shape from focus system has been implemented using an optical microscope and tested on a variety of industrial samples. Experimental results are presented that demonstrate the accuracy and robustness of the proposed method. These results suggest shape from focus to be an effective approach for a variety of challenging visual inspection tasks
Depth from defocus vs. stereo: how different really are they?
Y. Y. Schechner and N. Kiryati
  2  1784-1786  (1998)
Depth from focus (DFF) and depth from defocus (DFD) methods are shown to be realizations of the geometric triangulation principle. Fundamentally, the depth sensitivities of DFF and DFD are not different than those of stereo (or motion) based systems having the same physical dimensions. Contrary to common belief DFD does not inherently avoid the matching (correspondence) problem. Basically DFD and DFF do not avoid the occlusion problem any more than triangulation techniques, but they are more stable in the presence of such disruptions. The fundamental advantage of DFF and DFD methods is the two-dimensionality of the aperture, allowing more robust estimation. These results elucidate the limitations of methods based on depth of field and provide a foundation for fair performance comparison between DFF/DFD and shape from stereo (or motion) algorithms
Depth estimation and image restoration using defocused stereo pairs
A. N. Rajagopalan and S. Chaudhuri and U. Mudenagudi
IEEE Trans. on Pattern Analysis and Machine Intelligence  26  1521-1525  (2004)
We propose a method for estimating depth from images captured with a real aperture camera by fusing defocus and stereo cues. The idea is to use stereo-based constraints in conjunction with defocusing to obtain improved estimates of depth over those of stereo or defocus alone. The depth map as well as the original image of the scene are modeled as Markov random fields with a smoothness prior, and their estimates are obtained by minimizing a suitable energy function using simulated annealing. The main advantage of the proposed method, despite being computationally less efficient than the standard stereo or DFD method, is simultaneous recovery of depth as well as space-variant restoration of the original focused image of the scene.
Stereoscopic 3D-TV: Visual Comfort
W. J. Tam and F. Speranza and S. Yano and K. Shimono and H. Ono
Broadcasting, IEEE Transactions on  57  335-346  (2011)

Among the key topics of discussion and research on three-dimensional television (3D-TV), visual comfort is certainly one of the most critical. This is because it is well known that some viewers experience visual discomfort when looking at stereoscopic displays. It is important to properly address the issue of visual comfort to avoid possible delays in the deployment of 3D-TV. Here we present a concise overview of the main topics relevant to comfort in viewing stereoscopic television and survey the key factors influencing visual comfort. Potential end users of 3D-TV, content creators, program providers, broadcasters, display manufacturers and researchers will find this overview useful.
Potential hazards of viewing 3-D stereoscopic television, cinema and computer games: a review
P. A. Howarth
Ophthalmic and Physiological Optics  31  111--122  (2011)

The visual stimulus provided by a 3-D stereoscopic display differs from that of the real world because the image provided to each eye is produced on a flat surface. The distance from the screen to the eye remains fixed, providing a single focal distance, but the introduction of disparity between the images allows objects to be located geometrically in front of, or behind, the screen. Unlike in the real world, the stimulus to accommodation and the stimulus to convergence do not match. Although this mismatch is used positively in some forms of Orthoptic treatment, a number of authors have suggested that it could negatively lead to the development of asthenopic symptoms. From knowledge of the zone of clear, comfortable, single binocular vision one can predict that, for people with normal binocular vision, adverse symptoms will not be present if the discrepancy is small, but are likely if it is large, and that what constitutes `large' and `small' are idiosyncratic to the individual. The accommodation-convergence mismatch is not, however, the only difference between the natural and the artificial stimuli. In the former case, an object located in front of, or behind, a fixated object will not only be perceived as double if the images fall outside Panum's fusional areas, but it will also be defocused and blurred. In the latter case, however, it is usual for the producers of cinema, TV or computer game content to provide an image that is in focus over the whole of the display, and as a consequence diplopic images will be sharply in focus. The size of Panum's fusional area is spatial frequency-dependent, and because of this the high spatial frequencies present in the diplopic 3-D image will provide a different stimulus to the fusion system from that found naturally.
Nonlinear disparity mapping for stereoscopic 3D
M. Lang and A. Hornung and O. Wang and S. Poulakos and A. Smolic and M. Gross
    75:1--75:10  (2010)

This paper addresses the problem of remapping the disparity range of stereoscopic images and video. Such operations are highly important for a variety of issues arising from the production, live broadcast, and consumption of 3D content. Our work is motivated by the observation that the displayed depth and the resulting 3D viewing experience are dictated by a complex combination of perceptual, technological, and artistic constraints. We first discuss the most important perceptual aspects of stereo vision and their implications for stereoscopic content creation. We then formalize these insights into a set of basic disparity mapping operators. These operators enable us to control and retarget the depth of a stereoscopic scene in a nonlinear and locally adaptive fashion. To implement our operators, we propose a new strategy based on stereoscopic warping of the input video streams. From a sparse set of stereo correspondences, our algorithm computes disparity and image-based saliency estimates, and uses them to compute a deformation of the input views so as to meet the target disparities. Our approach represents a practical solution for actual stereo production and display that does not require camera calibration, accurate dense depth maps, occlusion handling, or inpainting. We demonstrate the performance and versatility of our method using examples from live action post-production, 3D display size adaptation, and live broadcast. An additional user study and ground truth comparison further provide evidence for the quality and practical relevance of the presented work.
Computational stereo camera system with programmable control loop
S. Heinzle and P. Greisen and D. Gallup and C. Chen and D. Saner and A. Smolic and A. Burg and W. Matusik and M. Gross
    94:1--94:10  (2011)

Stereoscopic 3D has gained significant importance in the entertainment industry. However, production of high quality stereoscopic content is still a challenging art that requires mastering the complex interplay of human perception, 3D display properties, and artistic intent. In this paper, we present a computational stereo camera system that closes the control loop from capture and analysis to automatic adjustment of physical parameters. Intuitive interaction metaphors are developed that replace cumbersome handling of rig parameters using a touch screen interface with 3D visualization. Our system is designed to make stereoscopic 3D production as easy, intuitive, flexible, and reliable as possible. Captured signals are processed and analyzed in real-time on a stream processor. Stereoscopy and user settings define programmable control functionalities, which are executed in real-time on a control processor. Computational power and flexibility is enabled by a dedicated software and hardware architecture. We show that even traditionally difficult shots can be easily captured using our system.
Concurrent monoscopic and stereoscopic animated film production
R. Neuman
    38  (2009)

The number of theater screens domestically that are equipped for digital 3D exhibition is currently only about one quarter of the total number that are reached by an animated feature film in wide release. Any such film could not ignore the aesthetic demands particular to 2D exhibition on a statistical basis alone. However, the cost and effort of producing a 3D version, despite the numerical disadvantage, might indicate the type of commitment to this burgeoning medium that would dictate putting out only the best 3D product. As it is not always practical to create two completely artistically divergent versions of a film, the manner in which a production navigates through the compromises between the two will determine the success of the results. In the face of this reality, the production pipeline for Bolt was designed with the goal of delivering the full artistic vision of the directors for the 2D film that the majority of filmgoers would see, yet deliver an uncompromising immersive experience to 3D audiences.
Depth Director: A System for Adding Depth to Movies
B. Ward and S. B. Kang and E. P. Bennett
Computer Graphics and Applications, IEEE  31  36-48  (2011)

Depth Director is an interactive system for converting 2D footage to 3D. It integrates recent computer vision advances with specialized tools that let users accurately recreate or stylistically manipulate 3D depths.
Stereoscopic Cinema
F. Devernay and P. Beardsley
Image-Based Rendering
S. B. Kang and Y. Li and X. Tong and H.-Y. Shum
  2    (2006)

Coarse-to-fine stereo vision with accurate 3D boundaries
M. Sizintsev and R. P. Wildes
Image and Vision Computing  28  352-366  (2010)
This paper presents methods for efficient recovery of accurate binocular disparity estimates in the vicinity of 3D surface discontinuities. Of particular concern are methods that impact coarse-to-fine, local block-based matching as it forms the basis of the fastest and the most resource efficient stereo computation procedures. A novel coarse-to-fine refinement procedure that adapts match window support across scale to ameliorate corruption of disparity estimates near boundaries is presented. Extensions are included to account for half-occlusions and colour uniformity. Empirical results show that incorporation of these advances in the standard coarse-to-fine, block matching framework reduces disparity errors by more than a factor of two, while performing little extra computation, preserving low complexity and the parallel/pipeline nature of the framework. Moreover, the proposed advances prove to be beneficial for CTF global matchers as well.
Real-time stereo-based view synthesis algorithms: A unified framework and evaluation on commodity GPUs
S. Rogmans and J. Lu and P. Bekaert and G. Lafruit
Signal Processing: Image Communication  24  49-64  (2009)
Novel view synthesis based on dense stereo correspondence is an active research problem. Despite that many algorithms have been proposed recently, this flourishing, cross-area research field still remains relatively less structured than its front-end constituent part, stereo correspondence. Moreover, so far little work has been done to assess different stereo-based view synthesis algorithms, particularly when real-time execution is enforced as a hard application constraint. In this paper, we first propose a unified framework that seamlessly connects stereo correspondence and view synthesis. The proposed framework dissects the typical algorithms into a common set of individual functional modules, allowing the comparison of various design decisions. Aligned with this algorithmic framework, we have developed a flexible GPU-accelerated software model, which contains optimized implementations of several recent real-time algorithms, specifically focusing on local cost aggregation and image warping modules. Based on this common software model running on graphics hardware, we evaluate the relative performance of various design combinations in terms of both view synthesis quality and real-time processing speed. This comparative evaluation leads to a number of observations, and hence offers useful guides to the future design of real-time stereo-based view synthesis algorithms.
Fractional Stereo Matching Using Expectation-Maximization
W. Xiong and H. S. Chung and J. Jia
IEEE Transactions on Pattern Analysis and Machine Intelligence  31  428--443  (2009)
A Stereo Approach that Handles the Matting Problem via Image Warping
M. Bleyer and M. Gelautz and C. Rother and C. Rhemann
Stereo Matching on Objects with Fractional Boundary
W. Xiong and J. Jia
A compact algorithm for rectification of stereo pairs
A. Fusiello and E. Trucco and A. Verri
Machine Vision and Applications  12  16--22  (2000)

Multiple-View Geometry in Computer Vision
R. Hartley and A. Zisserman

Theory and Practice of Projective Rectification
R. Hartley
International Journal of Computer Vision  35  115--127  (1999)

Computer Vision: A Modern Approach
D. Forsyth and J. Ponce

Reviews of Foundations of the Stereoscopic Cinema by Lenny Lipton
C. Smith and S. Benton
Optical Engineering  22    (1983)
Self Image Rectification for Uncalibrated Stereo Video with Varying Camera Motions and Zooming Effects
C.-M. Cheng and S.-H. Lai and S.-H. Su
In this paper, we propose a novel self image rectification algorithm for uncalibrated stereo video sequences. Different from conventional stereo systems, this algorithm performs adaptive calibration that allows unequal motions and zooming effects in both cameras. For the first stereo frame, we estimate a reduced set of camera parameters through a nonlinear optimization process to minimize the geometric errors of the corresponding points in pre-rectified image coordinates. For the subsequent frames, these parameters are updated via minimizing the objective function that jointly considers the geometric errors and the smoothness constraints over temporal variations. The experimental results of applying this algorithm to two real sequences are shown to demonstrate its superior performance in reliable rectification distortions and robustness against outliers.
Evaluating Methods for Controlling Depth Perception in Stereoscopic Cinematography
G. Sun and N. Holliman
  7237    (2009)
Stereoscopic image quality metrics and compression
P. Gorley and N. Holliman
Stereoscopic Displays and Applications XIX  6803  680305  (2008)
We are interested in metrics for automatically predicting the compression settings for stereoscopic images so that we can minimize file size, but still maintain an acceptable level of image quality. Initially we investigate how Peak Signal to Noise Ratio (PSNR) measures the quality of varyingly coded stereoscopic image pairs. Our results suggest that symmetric, as opposed to asymmetric stereo image compression, will produce significantly better results. However, PSNR measures of image quality are widely criticized for correlating poorly with perceived visual quality. We therefore consider computational models of the Human Visual System (HVS) and describe the design and implementation of a new stereoscopic image quality metric. This, point matches regions of high spatial frequency between the left and right views of the stereo pair and accounts for HVS sensitivity to contrast and luminance changes at regions of high spatial frequency, using Michelson's Formula and Peli's Band Limited Contrast Algorithm. To establish a baseline for comparing our new metric with PSNR we ran a trial measuring stereoscopic image encoding quality with human subjects, using the Double Stimulus Continuous Quality Scale (DSCQS) from the ITU-R BT.500-11 recommendation. The results suggest that our new metric is a better predictor of human image quality preference than PSNR and could be used to predict a threshold compression level for stereoscopic image pairs.
Perceived quality of compressed stereoscopic images: Effects of symmetric and asymmetric JPEG coding and camera separation
P. Seuntiens and L. Meesters and W. Ijsselsteijn
ACM Trans. Appl. Percept.  3  95--109  (2009)

JPEG compression of the left and right components of a stereo image pair is a way to save valuable bandwidth when transmitting stereoscopic images. This paper presents results on the effects of camera-base distance (B) and JPEG coding on overall image quality, perceived depth, perceived sharpness, and perceived eye strain. In the experiment, two stereoscopic still scenes were used, varying in depth (three different camera-base distances: 0, 8, and 12 cm) and compression ratio (4 levels: original, 1:30, 1:40, and 1:60). All levels of compression were applied to both the left and right stereo image, resulting in a 4 x 4 matrix of all possible symmetric and asymmetric coding combinations. The observers were asked to assess image quality, sharpness, depth, and eye strain. Results showed that an increase in JPEG coding had a negative effect on image quality, sharpness, and eye strain, but had no effect on perceived depth. An increase in camera-base distance increased perceived depth and reported eye strain, but had no effect on perceived sharpness. Results on asymmetric and symmetric coding showed that the relationship between perceived image quality and average bit rate is not straightforward. In some cases, image quality ratings of a symmetric coded pair can be higher than for an asymmetric coded pair, even if the averaged bit rate for the symmetric pair is lower, than for the asymmetric pair. Furthermore, sharpness and eye strain correlated highly and medium, respectively, with perceived image quality.
Human perception of mismatched stereoscopic 3D inputs
L. B. Stelmach and W. J. Tam and D. V. Meegan and A. Vincent and P. Corriveau
  1  5-8  (2000)

The bandwidth required to transmit stereoscopic video images is nominally twice that required for standard, monoscopic images. One method of reducing the required bandwidth is to code the two video streams asymmetrically. We assessed the perceptual impact of this bandwidth-reduction technique for low-pass filtering, DCT-based quantization, and a combination of filtering and quantization. It was found that the binocular percept depended on the type of degradation: for low-pass filtering, the binocular percept was dominated by the high-quality image, whereas for quantization it corresponded to the average of the inputs to the two eyes. The results indicated that asymmetrical coding is a promising technique for reducing storage and transmission bandwidth of stereoscopic sequences
Estimation of omnidirectional camera model from epipolar geometry
B. Micusik and T. Pajdla
  1  485--490  (2003)
3D Movie Making: Stereoscopic Digital Cinema from Script to Screen
B. Mendiburu

Making big things look small: The effect of blur on perceived scale
R. T. Held and E. A. Cooper and J. F. O'Brien and M. S. Banks
ACM Trans. Graph.      (2009)

The visual perception of 3-D shape from multiple cues: Are observers capable of perceiving metric structure?
J. T. Todd and J. F. Norman
Perception & Psychophysics  65  31-47  (2003)
Three experiments are reported in which observers judged the three-dimensional (3-D) structures of virtual or real objects defined by various combinations of texture, motion, and binocular disparity under a wide variety of conditions. The tasks employed in these studies involved adjusting the depth of an object to match its width, adjusting the planes of a dihedral angle so that they appeared orthogonal, and adjusting the shape of an object so that it appeared to match another at a different viewing distance. The results obtained on all of these tasks revealed large constant errors and large individual differences among observers. There were also systematic failures of constancy over changes in viewing distance, orientation, or response task. When considered in conjunction with other, similar reports in the literature, these findings provide strong evidence that human observers do not have accurate perceptions of 3-D metric structure.
The visual perception of 3D shape
J. T. Todd
Trends in Cognitive Sciences  8  115-121  (2004)

Shape from Specularities: Computation and Psychophysics
A. Blake and H. Bülthoff
Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences  331  237--252  (1991)

doi:10.1098/rstb.1991.0012
Images of artificial and natural scenes typically contain many `specularities' generated by mirror-like reflection from glossy surfaces. Until fairly recently computational models of visual processes have tended to regard specularities as obscuring underlying scene structure. Mathematical modelling shows that, on the contrary, they are rich in local geometric information. Recent psychophysical findings support the notion that the brain can apply that information. Our results concern the inference of 3D structure from 2D shaded images of glossy surfaces. Stereoscopically viewed highlights or `specularities' are found to serve as cues for 3D local surface-geometry.
Does the brain know the physics of specular reflection?
A. Blake and H. Bülthoff
Nature  343  165--168  (1990)

Focus cues affect perceived depth
S. J. Watt and K. Akeley and M. O. Ernst and M. S. Banks
J. Vis.  5  834-862  (2005)
Depth information from focus cues---accommodation and the gradient of retinal blur---is typically incorrect in three-dimensional (3-D) displays because the light comes from a planar display surface. If the visual system incorporates information from focus cues into its calculation of 3-D scene parameters, this could cause distortions in perceived depth even when the 2-D retinal images are geometrically correct. In Experiment 1 we measured the direct contribution of focus cues to perceived slant by varying independently the physical slant of the display surface and the slant of a simulated surface specified by binocular disparity (binocular viewing) or perspective/texture (monocular viewing). In the binocular condition, slant estimates were unaffected by display slant. In the monocular condition, display slant had a systematic effect on slant estimates. Estimates were consistent with a weighted average of slant from focus cues and slant from disparity/texture, where the cue weights are determined by the reliability of each cue. In Experiment 2, we examined whether focus cues also have an indirect effect on perceived slant via the distance estimate used in disparity scaling. We varied independently the simulated distance and the focal distance to a disparity-defined 3-D stimulus. Perceived slant was systematically affected by changes in focal distance. Accordingly, depth constancy (with respect to simulated distance) was significantly reduced when focal distance was held constant compared to when it varied appropriately with the simulated distance to the stimulus. The results of both experiments show that focus cues can contribute to estimates of 3-D scene parameters. Inappropriate focus cues in typical 3-D displays may therefore contribute to distortions in perceived space.
A lighting reproduction approach to live-action compositing
P. Debevec and A. Wenger and C. Tchou and A. Gardner and J. Waese and T. Hawkins
ACM Trans. Graph. (Proc. ACM SIGGRAPH 2002)  21  547--556  (2002)

We describe a process for compositing a live performance of an actor into a virtual set wherein the actor is consistently illuminated by the virtual environment. The Light Stage used in this work is a two-meter sphere of inward-pointing RGB light emitting diodes focused on the actor, where each light can be set to an arbitrary color and intensity to replicate a real-world or virtual lighting environment. We implement a digital two-camera infrared matting system to composite the actor into the background plate of the environment without affecting the visible-spectrum illumination on the actor. The color response of the system is calibrated to produce correct color renditions of the actor as illuminated by the environment. We demonstrate moving-camera composites of actors into real-world environments and virtual sets such that the actor is properly illuminated by the environment into which they are composited.
Coraline, cornered
D. Bordwell
Health issues with virtual reality displays: What we do know and what we don't
J. P. Wann and M. Mon-Williams
ACM SIGGRAPH Computer Graphics  31  53--57  (1997)

During the late 1980s and early 1990s virtual reality (VR) technology enjoyed a prolonged honeymoon with the international media who presented a glossy futuristic image of the technology. It was inevitable that the media would eventually tire of this image and look for new journalistic angles, such as the negative effects of VR. Some speculation then ensued about the negative social consequences of VR in the public domain ("Social autism" and "The end of civilisation as we know it…" --- Stone, 1992, BBC Horizon), although these speculations now appear unfounded. The first direct assertion, in the international media, that there might be visual safety issues with VR technology, came from the reporting of findings from Mon-Williams, Wann and Rushton in 1993 [5]. Since then a steady trickle of media features have strongly hinted at virtual problems that may arise (e.g. Business Week, July 10, 1995; New Scientist, Jan 27, 1996), each being followed by accusations of "scare-mongering" from some sectors of the VR community. Our research findings have fueled some of the negative reports, and our position has angered some VR protagonists. Also, like other research groups in this field, we are often approached by journalists in pursuit of a sensationalist story. Hence we think it is timely to examine what we know about the effects of virtual reality displays; what we don't know about virtual reality displays; and what research should be undertaken to resolve the unknown issues.
Real-time, accurate depth of field using anisotropic diffusion and programmable graphics cards
M. Bertalmio and P. Fort and D. Sanchez-Crespo
    767-773  (2004)

Computer graphics cameras lack the finite depth of field (DOF) present in real world ones. This results in all objects being rendered sharp regardless of their depth, reducing the realism of the scene. On top of that, real-world DOF provides a depth cue, that helps the human visual system decode the elements of a scene. Several methods have been proposed to render images with finite DOF, but these have always implied an important trade-off between speed and accuracy. We introduce a novel anisotropic diffusion partial differential equation (PDE) that is applied to the 2D image of the scene rendered with a pin-hole camera. In this PDE, the amount of blurring on the 2D image depends on the depth information of the 3D scene, present in the Z-buffer. This equation is well posed, has existence and uniqueness results, and it is a good approximation of the optical phenomenon, without the visual artifacts and depth inconsistencies present in other approaches. Because both inputs to our algorithm are present at the graphics card at every moment, we can run the processing entirely in the GPU. This fact, coupled with the particular numerical scheme chosen for our PDE, allows for real-time rendering using a programmable graphics card.
2 Worlds in 3 Dimensions
P. Kozachik
American Cinematographer  90  26  (2009)

Stereoscopic and autostereoscopic display systems
I. Sexton and P. Surman
IEEE Signal Processing Magazine  16  85-99  (1999)

In this article, the requirements for 3D television are outlined, and the suitability of different display types are considered. A brief description of our own approach is offered, which suggests that the first generation of 3D television will be a two-image system comprising a pair of images directed to the appropriate viewers' eyes. This will be achieved by means of steering optics controlled by a head-position tracker
Relighting human locomotion with flowed reflectance fields
C.-F. Chabert and P. Einarsson and A. Jones and B. Lamond and W.-C. Ma and S. Sylwan and T. Hawkins and P. Debevec
    76  (2006)
On New View Synthesis Using Multiview Stereo
O. Woodford and I. D. Reid and P. H. S. Torr and A. W. Fitzgibbon
  2  1120--1129  (2007)
3-D Television, Movies and Computer Graphics Without Glasses
R. B. Collender
IEEE Trans. on Consumer Electronics    56--61  (1986)

Television on standard bandwidth, theatre movies and sophisticated computer graphics, all in three dimensions without the need to wear special glasses, are feasible with contemporary technology. This paper shows a viable approach to achieving the above with state of the art equipment. The 3-D results resemble a hologram extending to infinity with full color and without any special viewing zones or noticeable anomalies over the entire viewing field.
A Survey of 3DTV Displays: Techniques and Technologies
P. Benzie and J. Watson and P. Surman and I. Rakkolainen and K. Hopf and H. Urey and V. Sainov and C. von Kopylow
IEEE Trans. on Circuits and Systems for Video Technology  17  1647-1658  (2007)

The display is the last component in a chain of activity from image acquisition, compression, coding transmission and reproduction of 3-D images through to the display itself. There are various schemes for 3-D display taxonomy; the basic categories adopted for this paper are: holography where the image is produced by wavefront reconstruction, volumetric where the image is produced within a volume of space and multiple image displays where two or more images are seen across the viewing field. In an ideal world a stereoscopic display would produce images in real time that exhibit all the characteristics of the original scene. This would require the wavefront to be reproduced accurately, but currently this can only be achieved using holographic techniques. Volumetric displays provide both vertical and horizontal parallax so that several viewers can see 3-D images that exhibit no accommodation/convergence rivalry. Multiple image displays fall within three fundamental types: holoform in which a large number of views give smooth motion parallax and hence a hologram-like appearance, multiview where a series of discrete views are presented across viewing field and binocular where only two views are presented in regions that may occupy fixed positions or follow viewers' eye positions by employing head tracking. Holography enables 3-D scenes to be encoded into an interference pattern, however, this places constraints on the display resolution necessary to reconstruct a scene. Although holography may ultimately offer the solution for 3DTV, the problem of capturing naturally lit scenes will first have to be solved and holography is unlikely to provide a short-term solution due to limitations in current enabling technologies. Liquid crystal, digital micromirror, optically addressed liquid crystal and acoustooptic spatial light modulators (SLMs) have been employed as suitable spatial light modulation devices in holography. 
Liquid crystal SLMs are generally favored owing to the commercial availability of high fill factor, high resolution addressable devices. Volumetric displays provide both vertical and horizontal parallax and several viewers are able to see a 3-D image that exhibits no accommodation/convergence rivalry. However, the principal disadvantages of these displays are: the images are generally transparent, the hardware tends to be complex and non-Lambertian intensity distribution cannot be displayed. Multiple image displays take many forms and it is likely that one or more of these will provide the solution(s) for the first generation of 3DTV displays.
{3D} motion picture theatre
R. B. Collender and M. A. Collender
Short-Latency Disparity Vergence in Humans: Evidence for Early Spatial Filtering
B. M. Sheliga and K. J. Chen and E. J. Fitzgibbon and F. A. Miles
Annals of the New York Academy of Sciences  1039  252--259  (2005)
Our study was concerned with the disparity detectors underlying the initial disparity vergence responses (DVRs) that are elicited at ultrashort latencies by binocular disparities applied to large images. DVRs were elicited in humans by applying horizontal disparity to vertical square-wave gratings lacking the fundamental (termed here, the ``missing fundamental''). In the frequency domain, a pure square wave is composed of odd harmonics---first, third, fifth, seventh, etc.---such that the third, fifth, seventh, etc., have amplitudes that are one-third, one-fifth, one-seventh, etc., that of the first, and the missing fundamental lacks the first harmonic. The patterns seen by the two eyes have a phase difference of one-quarter wavelength, so the disparity of the features and 4n + 1 harmonics (where n = integer) has one sign (crossed or uncrossed), whereas the 4n - 1 harmonics---including the strongest Fourier component (the third harmonic)---has the opposite sign (uncrossed or crossed): spatial aliasing. The earliest DVRs, recorded with the search-coil technique, had minimum latencies of 70 to 80 ms and were generally in the direction of the third harmonic, that is, uncrossed disparities resulted in convergent eye movements. In other experiments on the DVRs, one eye saw a missing fundamental and the other saw a pure sine wave with the contrast and wavelength of the third harmonic but differing in phase by one-quarter wavelength. This resulted in short-latency vergence in accordance with matching of the third harmonic. These data all indicate the importance of the Fourier components, consistent with early spatial filtering prior to binocular matching.
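The sign reversal between the 4n + 1 and 4n - 1 harmonics can be checked with a little arithmetic. The following is a minimal sketch, assuming the quarter-wavelength shift is measured in units of the fundamental wavelength and that each harmonic's effective shift is its phase shift wrapped to the nearest half cycle; the function name `wrapped_shift` is hypothetical.

```python
from fractions import Fraction

def wrapped_shift(harmonic: int, shift: Fraction = Fraction(1, 4)) -> Fraction:
    """Shift of one harmonic, in cycles of that harmonic, wrapped to (-1/2, 1/2]."""
    # A shift of `shift` fundamental wavelengths equals harmonic * shift
    # cycles of the given harmonic.
    cycles = (harmonic * shift) % 1      # fractional part, in [0, 1)
    if cycles > Fraction(1, 2):
        cycles -= 1                      # nearest-match shift is in the opposite direction
    return cycles

# Odd harmonics of a square wave; harmonic 1 is absent in the missing fundamental.
for k in (1, 3, 5, 7, 9, 11):
    print(k, wrapped_shift(k))
```

Under these assumptions the 4n + 1 harmonics come out at +1/4 cycle and the 4n - 1 harmonics (including the third) at -1/4 cycle, which is the opposite-sign relationship the abstract describes.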
Short-Latency Disparity Vergence in Humans
C. Busettini and E. J. Fitzgibbon and F. A. Miles
Journal of Neurophysiology  85  1129--1152  (2001)
Properties of stimuli eliciting vergence eye movements and stereopsis
D. E. Mitchell
Vision Research  10  145--162  (1970)

{3D} {TV} Broadcasting
C. Fehn
C. M. Schor
  2  1300--1312  (2003)

Visual comfort and apparent depth in {3D} systems: effects of camera convergence distance
F. Speranza and L. B. Stelmach and W. J. Tam and R. Glabb
  4864  146--156  (2002)

We investigated the effect of convergence of stereoscopic cameras on visual comfort and apparent depth. In Experiment 1, viewers rated comfort and depth of three stereoscopic sequences acquired with convergence distance set at 60, 120, 180, 240 cm, or infinity (i.e., parallel). Moderately converged conditions were rated either as comfortable (i.e., 240 cm) or more comfortable (i.e., 120 and 180 cm) than the parallel condition. The 60 cm condition was rated the least comfortable. Camera convergence had no effects on ratings of apparent depth. In Experiment 2, we used computer-generated stereoscopic still images to investigate the effects of convergence in the absence of lens distortions. Results matched those obtained in Experiment 1. In Experiment 3, we artificially introduced keystone distortions in stereoscopic still images. We found that increasing the amount of keystone distortion caused only a minimal decrease in visual comfort and apparent depth.
Distortion of depth perception in virtual environments using stereoscopic displays: quantitative assessment and corrective measures
M. Kleiber and C. Winkelholz
  6803  68030C  (2008)

The aim of the presented research was to quantify the distortion of depth perception when using stereoscopic displays. The visualization parameters of the virtual reality system used, such as perspective, haploscopic separation and width of stereoscopic separation, were varied. The experiment was designed to measure distortion in depth perception according to allocentric frames of reference. The results of the experiments indicate that some of the parameters have an antithetic effect, which makes it possible to compensate for the distortion of depth perception over a range of depths. In contrast to earlier research, which reported underestimation of depth, we found that depth was overestimated when using true projection parameters according to the position of the eyes of the user and display geometry.
Original and creative stereoscopic film making
E. Criado
  6803  68030U  (2008)
The stereoscopic cinema has become, once again, a hot topic in film production. For filmmakers to be successful in this field, a technical background in the principles of binocular perception and how our brain interprets the incoming data from our eyes is fundamental. It is also paramount for a stereoscopic production to adhere to certain rules for comfort and safety. There is an immense variety of options in the art of standard ``flat'' photography, and the possibilities can only be multiplied with stereo. The stereoscopic imaging has its own unique areas for subjective, original and creative control that allow an incredible range of possible combinations by working inside the standards, and in some cases on the boundaries of the basic stereo rules. The stereoscopic imaging can be approached in a ``flat'' manner, like channeling sound through an audio equalizer with all the bands at the same level. It can provide a realistic perception, which in many cases can be sufficient, thanks to the rock-solid viewing inherent to the stereoscopic image, but there are many more possibilities. This document describes some of the basic operating parameters and concepts for stereoscopic imaging, but it also offers ideas for a creative process based on the variation and combination of these basic parameters, which can lead into a truly innovative and original viewing experience.
Cosmic Cookery: Making a Stereoscopic {3D} Animated Movie
N. Holliman and C. Baugh and C. Frenk and A. Jenkins and B. Froner and D. Hassaine and J. Helly and N. Metcalfe and T. Okamoto
  6055  34--45  (2006)
This paper describes our experience making a short stereoscopic movie visualizing the development of structure in the universe during the 13.7 billion years from the Big Bang to the present day. Aimed at a general audience for the Royal Society's 2005 Summer Science Exhibition, the movie illustrates how the latest cosmological theories based on dark matter and dark energy are capable of producing structures as complex as spiral galaxies and allows the viewer to directly compare observations from the real universe with theoretical results. 3D is an inherent feature of the cosmology data sets and stereoscopic visualization provides a natural way to present the images to the viewer, in addition to allowing researchers to visualize these vast, complex data sets. The presentation of the movie used passive, linearly polarized projection onto a 2 m wide screen but it was also required to play back on a Sharp RD3D display and in anaglyph projection at venues without dedicated stereoscopic display equipment. Additionally, lenticular prints were made from key images in the movie. We discuss the following technical challenges during the stereoscopic production process: 1) Controlling the depth presentation, 2) Editing the stereoscopic sequences, 3) Generating compressed movies in display specific formats. We conclude that the generation of high quality stereoscopic movie content using desktop tools and equipment is feasible. This does require careful quality control and manual intervention but we believe these overheads are worthwhile when presenting inherently 3D data as the result is significantly increased impact and better understanding of complex 3D scenes.
Slant from texture and disparity cues: Optimal cue combination
J. M. Hillis and S. J. Watt and M. S. Landy and M. S. Banks
Journal of Vision  4  967-992  (2004)
How does the visual system combine information from different depth cues to estimate three-dimensional scene parameters? We tested a maximum-likelihood estimation (MLE) model of cue combination for perspective (texture) and binocular disparity cues to surface slant. By factoring the reliability of each cue into the combination process, MLE provides more reliable estimates of slant than would be available from either cue alone. We measured the reliability of each cue in isolation across a range of slants and distances using a slant-discrimination task. The reliability of the texture cue increases as |slant| increases and does not change with distance. The reliability of the disparity cue decreases as distance increases and varies with slant in a way that also depends on viewing distance. The trends in the single-cue data can be understood in terms of the information available in the retinal images and issues related to solving the binocular correspondence problem. To test the MLE model, we measured perceived slant of two-cue stimuli when disparity and texture were in conflict and the reliability of slant estimation when both cues were available. Results from the two-cue study indicate, consistent with the MLE model, that observers weight each cue according to its relative reliability: Disparity weight decreased as distance and |slant| increased. We also observed the expected improvement in slant estimation when both cues were available. With few discrepancies, our data indicate that observers combine cues in a statistically optimal fashion and thereby reduce the variance of slant estimates below that which could be achieved from either cue alone. These results are consistent with other studies that quantitatively examined the MLE model of cue combination. Thus, there is a growing empirical consensus that MLE provides a good quantitative account of cue combination and that sensory information is used in a manner that maximizes the precision of perceptual estimates.
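The reliability-weighted combination rule described in this abstract can be illustrated numerically. The following is a minimal sketch of the standard textbook form of the MLE model, assuming each cue gives an unbiased slant estimate with independent Gaussian noise; the function name and the example numbers are hypothetical and are not data from the paper.

```python
def combine_cues(slant_texture, var_texture, slant_disparity, var_disparity):
    """Reliability-weighted average of two single-cue slant estimates.

    Reliability is taken to be the inverse of each cue's variance
    (an assumption of the standard MLE model, not a measured quantity here).
    """
    rel_t = 1.0 / var_texture
    rel_d = 1.0 / var_disparity
    w_t = rel_t / (rel_t + rel_d)        # texture weight
    w_d = rel_d / (rel_t + rel_d)        # disparity weight
    estimate = w_t * slant_texture + w_d * slant_disparity
    variance = 1.0 / (rel_t + rel_d)     # never larger than either single-cue variance
    return estimate, variance, w_t, w_d

# Hypothetical conflict stimulus: texture says 30 deg (variance 4),
# disparity says 40 deg (variance 1).
print(combine_cues(30.0, 4.0, 40.0, 1.0))
```

In this example the more reliable disparity cue receives a weight of 0.8, and the combined variance (0.8) falls below both single-cue variances, which is the variance reduction the abstract refers to.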
Ordinal configural cues combine with metric disparity in depth perception
J. Burge and M. A. Peterson and S. E. Palmer
Journal of Vision  5  534--542  (2005)
Prior research on the combination of depth cues generally assumes that different cues must be in the same units for meaningful combination to occur. We investigated whether the geometrically ordinal cues of familiarity and convexity influence depth perception when unambiguous metric information is provided by binocular disparity. We used bipartite, random dot stereograms with a central luminance edge shaped like a face in profile. Disparity specified that the edge and dots on one side were closer than the dots on the other side. Configural cues suggested that the familiar, face-shaped region was closer than the unfamiliar side. Configural cues caused an increase in perceived depth for a given disparity signal when they were consistent with disparity and a decrease in perceived depth when they were inconsistent. Thus, geometrically ordinal configural cues can quantitatively influence a metric depth cue. Implications for the combination of configural and depth cues are discussed.
Subjective evaluation of stereoscopic images: effects of camera parameters and display duration
W. A. IJsselsteijn and H. de Ridder and J. Vliegen
IEEE Trans. on Circuits and Systems for Video Technology  10  225--233  (2000)
Two experiments are presented that were aimed to investigate the effects of stereoscopic filming parameters and display duration on observers' judgements of naturalness and quality of stereoscopic images. The paper first presents a literature review of temporal factors in stereoscopic vision, with reference to stereoscopic displays. Several studies have indicated an effect of display duration on performance-oriented (criterion based) measures. The experiments reported were performed to extend the study of display duration from performance to appreciation-oriented measures. In addition, the present study aimed to investigate the effects of manipulating camera separation, convergence distance, and focal length on perceived quality and naturalness. In the first experiment, using display durations of both 5 and 10 s, 12 observers rated the naturalness of depth and the quality of depth for stereoscopic still images. The results showed no significant main effect of the display duration. A small yet significant shift between naturalness and quality was found for both duration conditions. This result replicated earlier findings, indicating that this is a reliable effect, albeit content-dependent. The second experiment was performed using display durations ranging from 1 to 15 s. The results of this experiment showed a small yet significant effect of display duration. Whereas longer display durations do not have a negative impact on the appreciative scores of optimally reproduced stereoscopic images, observers do give lower judgements to monoscopic images and stereoscopic images with unnatural disparity values as display duration increases. In addition, the results of both experiments provide support for the argument that stereoscopic camera toe-in should be avoided if possible
Measurement of parallax distribution and its application to the analysis of visual comfort for stereoscopic {HDTV}
Y. Nojiri and H. Yamanoue and A. Hanazato and F. Okano
  5006  195--205  (2003)

The relationship between visual comfort and parallax distribution for stereoscopic HDTV has been studied. In this study, we first examined a method for measuring this parallax distribution. As it is important to understand the characteristics of the distribution in a frame or temporal changes of the characteristics, rather than having detailed information on the parallax at every point, we propose a method to measure the parallax based on the phase correlation. It includes a way of reducing the measurement error depending on the phase correlation method. The method was used to measure stereoscopic HDTV images with good results. Secondly, we conducted a subjective evaluation test of visual comfort and sense of presence using 48 different stereoscopic HDTV pictures, and compared the results with the parallax distributions in these pictures measured by the proposed method. The comparison showed that the range of parallax distribution and the average parallax distribution significantly affect visual comfort when viewing stereoscopic HDTV images. It is also suggested that the range of parallax distribution in many of the images that were judged comfortable to view is located within approximately 0.3 diopter.
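Phase correlation, the technique this abstract builds on, can be sketched in a few lines. The code below is a generic, simplified illustration that recovers a single global integer shift between two images; it is not the authors' measurement method, which targets a parallax distribution and includes error reduction steps not shown here. The function name is hypothetical, and circular (wrap-around) shifts are assumed.

```python
import numpy as np

def phase_correlation_shift(left, right):
    """Estimate the integer (dy, dx) such that right ~ np.roll(left, (dy, dx), axis=(0, 1))."""
    f_left = np.fft.fft2(left)
    f_right = np.fft.fft2(right)
    cross = f_right * np.conj(f_left)
    # Keep only the phase; the small floor avoids division by zero.
    cross /= np.maximum(np.abs(cross), 1e-12)
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Peaks past the midpoint correspond to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

rng = np.random.default_rng(0)
left = rng.random((32, 64))
right = np.roll(left, 5, axis=1)   # synthetic horizontal parallax of 5 pixels
print(phase_correlation_shift(left, right))
```

Applying the same function to local windows rather than whole frames would give a coarse distribution of shifts, which is closer in spirit to the frame-level statistics the abstract discusses.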
Human factors of {3DTV}: an overview of current research at {H}einrich-{H}ertz-{I}nstitut {B}erlin
S. Pastoor
    11/1-11/4  (1992)
At present, it is not clear at which performance level and up to which extent stereoscopic methods have the potential to meet the user requirements on advanced future TV systems. Human factors studies intend to clarify this situation and to provide the basis to assess various technological solutions. The author concentrates on respective studies of the Human Factors Department, Heinrich Hertz Institute Berlin, and attempts to assess some approaches to 3DTV currently under discussion
Human factors of {3D} displays in advanced image communications
S. Pastoor
Displays  14  150--157  (1993)

Three-dimensional displays provide an unambiguous visual representation of the spatial structure of natural scenes and computer-generated virtual environments and thus have proven substantial advantages over conventional displays in remote guidance and inspection tasks and in application fields such as medical imagery and architectural and molecular modelling. Recent years have seen increasing efforts to extend 3D technologies into the domain of image communications. These efforts received support from human factors studies indicating that 3D displays are highly appreciated by prospective users of image communications systems for their enhanced psychological effects (telepresence and communicative presence). On the other hand, these studies have revealed heavy technological requirements that must be met in order to avoid visible image distortions and increased visual strain.
Geometrical analysis of puppet-theater and cardboard effects in stereoscopic {HDTV} images
H. Yamanoue and M. Okui and F. Okano
IEEE Trans. on Circuits and Systems for Video Technology  16  744--752  (2006)

A fundamental element of stereoscopic image production is to geometrically analyze the conversion from real space to stereoscopic images by binocular parallax under various shooting and viewing conditions. This paper reports on this analysis, particularly on the setting of the optical axes of three-dimensional (3-D) cameras, which has received little attention in the past. First, we identified the conditions for setting the optical axes that maintain linearity during the conversion from real space to stereoscopic images. We then clarified, in geometrical terms, the shooting and viewing conditions and also conditions under which the puppet-theater effect and cardboard effect occur. The results showed that the parallel camera configuration, by which optical axes are kept parallel to each other, does not produce the puppet-theater effect as the apparent magnification (lateral magnification) of a shooting target is not dependent on the shooting distance. However, the toed-in camera configuration, where the apparent magnification of a shooting target is dependent on the shooting distance, may produce this effect. The cardboard effect is shown to be likely to occur for both camera configurations by defining this phenomenon by the ratio of depthwise reproduction magnification (depth magnification) and apparent reproduction magnification (lateral magnification). Lastly, the paper reports on the relationship between the results of this analysis and those of subjective evaluation experiments. The results need a closer examination by using many more images.
Rectification with Intersecting Optical Axes for Stereoscopic Visualization
J. Zhou and B. Li
Proc. ICPR  2  17-20  (2006)

There exist various methods for stereoscopic viewing of images, most requiring some special glasses for controlling what goes to the left and the right eyes of the viewer. Recent technology developments have resulted in displays that enable 3D viewing without glasses. However, these displays demand a true stereo pair as the input, which greatly limits their practical use, as true stereoscopic media are scarce. In our recent work (Zhou and Li, 2006), we developed a systematic approach to automatic rectification of two images of the same scene captured by cameras at general positions, so that the results can be viewed on a 3D display. However, the approach cannot work well for large camera displacement (i.e., very wide baseline). In this paper, we propose a new rectification scheme to address this wide-baseline rectification problem, with the basic idea of using a special stereo setup with intersecting optical axes. In a sense, the idea mimics human vision when viewing objects close to the eyes. Experiments with a 3D display demonstrate the feasibility and effectiveness of the proposed approach
Image Rectification for Stereoscopic Visualization Without {3D} Glasses
J. Zhou and B. Li
  4071  495--498  (2006)

Improving the visual comfort of stereoscopic images
L. B. Stelmach and W. J. Tam and F. Speranza and R. Renaud and T. Martin
  5006  269-282  (2003)

We compared the visual comfort and apparent depth of stereoscopic images for three camera configurations: parallel (without image shift), image-shifted and converged. In the parallel and image-shifted configurations, the stereo cameras were pointed straight ahead. In the converged configuration the cameras were toed-in. In the image-shifted configuration the image frame was shifted perpendicularly with respect to the line of sight of the camera. The parallel configuration produces images with uncomfortably large disparities for objects near the camera. By converging the cameras or by shifting the image, these large disparities can be reduced and visual comfort can be improved. However, the converged configuration introduces keystone distortions into the image, which can produce visual discomfort. The image-shifted configuration does not introduce keystone distortions, but affects the width of the image frame. It also requires unusual camera hardware or computer post-processing to shift the images. We found that converged and image-shifted configurations improved the visual comfort of stereoscopic images by an equivalent amount, without affecting the apparent depth. Keystone distortions in the converged configuration had no appreciable negative effect on visual comfort.
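For a rough sense of why shifting the image reduces near-object disparities, the sketch below uses the textbook pinhole relation for a parallel rig (disparity = focal length × baseline / distance) and models the image shift as subtracting the disparity of a chosen zero-parallax distance. The function names, the 50 mm focal length, and the 65 mm baseline are illustrative assumptions, not values from the paper, and keystone distortion in the converged case is not modeled.

```python
def parallel_disparity(focal_mm, baseline_mm, distance_mm):
    """Sensor-plane disparity (mm) of a point at distance_mm for parallel cameras."""
    return focal_mm * baseline_mm / distance_mm

def shifted_disparity(focal_mm, baseline_mm, distance_mm, zero_parallax_mm):
    """Disparity after shifting the image so points at zero_parallax_mm have zero disparity."""
    return focal_mm * baseline_mm * (1.0 / distance_mm - 1.0 / zero_parallax_mm)

# Hypothetical rig: 50 mm lens, 65 mm baseline, zero parallax placed at 2 m.
for z in (1000.0, 2000.0, 4000.0):
    print(z, parallel_disparity(50.0, 65.0, z), shifted_disparity(50.0, 65.0, z, 2000.0))
```

With these numbers the disparity of an object at 1 m drops from 3.25 mm to 1.625 mm after the shift, while objects beyond 2 m acquire disparity of the opposite sign.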
Stereoscopic image generation based on depth images for {3D} {TV}
L. Zhang and W. J. Tam
IEEE Trans. Broadcasting  51  191--199  (2005)
A depth-image-based rendering system for generating stereoscopic images is proposed. One important aspect of the proposed system is that the depth maps are pre-processed using an asymmetric filter to smoothen the sharp changes in depth at object boundaries. In addition to ameliorating the effects of blocky artifacts and other distortions contained in the depth maps, the smoothing reduces or completely removes newly exposed (disocclusion) areas where potential artifacts can arise from image warping which is needed to generate images from new viewpoints. The asymmetric nature of the filter reduces the amount of geometric distortion that might be perceived otherwise. We present some results to show that the proposed system provides an improvement in image quality of stereoscopic virtual views while maintaining reasonably good depth quality.
Limits of fusion and depth judgment in stereoscopic color displays
Y.-Y. Yeh and L. D. Silverstein
Hum. Factors  32  45--60  (1990)

The effective use of stereoscopic display systems is dependent, in part, on reliable data describing binocular fusion limits and the accuracy of depth discrimination for such visual display devices. These issues were addressed in three experiments, as were the effects of interocular cross talk. Results showed that limits of fusion were approximately 27.0 min arc for crossed disparity and 24.0 min arc for uncrossed disparity. Subjects were extremely accurate in distinguishing the relative distance among four groups of stimuli, were able to identify a pair of stimuli colocated at the same depth plane within each group, and were fairly accurate in scaling stimuli along the depth dimension. The mean error in using disparity as a depth cue was approximately 2.2 min arc. Interocular cross talk had little effect on fusion limits for 200-ms stimulus presentations but significantly affected fusion for longer (2 s) presentations that enabled vergence responses to be executed. Depth discrimination performance was essentially unaffected by interocular cross talk; however, cross talk significantly influenced subjective ratings of image quality and visual comfort.
High-quality video view interpolation using a layered representation
C. L. Zitnick and S. B. Kang and M. Uyttendaele and S. Winder and R. Szeliski
ACM Trans. Graph.  23  600--608  (2004)
The ability to interactively control viewpoint while watching a video is an exciting application of image-based rendering. The goal of our work is to render dynamic scenes with interactive viewpoint control using a relatively small number of video cameras. In this paper, we show how high-quality video-based rendering of dynamic scenes can be accomplished using multiple synchronized video streams combined with novel image-based modeling and rendering algorithms. Once these video streams have been processed, we can synthesize any intermediate view between cameras at any time, with the potential for space-time manipulation. In our approach, we first use a novel color segmentation-based stereo algorithm to generate high-quality photoconsistent correspondences across all camera views. Mattes for areas near depth discontinuities are then automatically extracted to reduce artifacts during view synthesis. Finally, a novel temporal two-layer compressed representation that handles matting is developed for rendering at interactive rates.
Development of {MPEG} Standards for {3D} and Free Viewpoint Video
A. Smolic and H. Kimata and A. Vetro
A comparative study of free-viewpoint video techniques for sports events
J. J. M. Kilner and J. Starck and A. Hilton
    87--96  (2006)
This paper presents a quantitative analysis of free-viewpoint video techniques applied to the problem of virtual view synthesis in sports events. A consideration of errors in the synthesis pipeline is presented along with a taxonomy of these errors and a framework for evaluating the quality of view synthesis when compared to ground truth. Three reconstruction techniques are evaluated, billboarding, shape from silhouette and view-dependent shape from silhouette. View-dependent rendering is used for virtual view synthesis. It is shown that currently the shape from silhouette technique provides the best completeness, while the view-dependent shape from silhouette technique provides the best appearance.
View Interpolation Along a Chain of Weakly Calibrated Cameras
D. Farin and Y. Morvan and P. H. N. de With
This paper presents an algorithm for interpolating intermediate views along a chain of cameras. There is no restriction on the camera placement as long as the distance between successive cameras is not too large. The interpolated views lie on a virtual poly-line defined by the (ordered) set of cameras. Our algorithm requires no strong camera calibration as the necessary epipolar geometry is estimated from the input images themselves. The algorithm first runs a preprocessing step to rectify the images to canonical epipolar geometry and to calculate disparity images. The actual view interpolation uses this data to synthesize intermediate views in real-time
Free viewpoint video extraction, representation, coding and rendering
A. Smolic and K. Mueller and P. Merkle and T. Rein and M. Kautzner and P. Eisert and T. Wieg
  5  3287--3290  (2004)
Free viewpoint video provides the possibility to freely navigate within dynamic real world video scenes by choosing arbitrary viewpoints and view directions. So far, related work only considered free viewpoint video extraction, representation, and rendering methods. Compression and transmission has not yet been studied in detail and combined with the other components into one complete system. In this paper, we present such a complete system for efficient free viewpoint video extraction, representation, coding, and interactive rendering. Data representation is based on 3D mesh models and view-dependent texture mapping using video textures. The geometry extraction is based on a shape-from-silhouette algorithm. The resulting voxel models are converted into 3D meshes that are coded using MPEG-4 SNHC tools. The corresponding video textures are coded using an H.264/AVC codec. Our algorithms for view-dependent texture mapping have been adopted as an extension of MPEG-4 AFX. The presented results illustrate that based on the proposed methods a complete transmission system for efficient free viewpoint video can be built.
Free-viewpoint video with stereo and matting
S. B. Kang and C. L. Zitnick and M. Uyttendaele and S. Winder and R. Szeliski

Stereoscopic inpainting: Joint color and depth completion from stereo images
L. Wang and H. Jin and R. Yang and M. Gong
    1-8  (2008)

We present a novel algorithm for simultaneous color and depth inpainting. The algorithm takes stereo images and estimated disparity maps as input and fills in missing color and depth information introduced by occlusions or object removal. We first complete the disparities for the occlusion regions using a segmentation-based approach. The completed disparities can be used to facilitate the user in labeling objects to be removed. Since part of the removed regions in one image is visible in the other, we mutually complete the two images through 3D warping. Finally, we complete the remaining unknown regions using a depth-assisted texture synthesis technique, which simultaneously fills in both color and depth. We demonstrate the effectiveness of the proposed algorithm on several challenging data sets.
Stereo reconstruction with mixed pixels using adaptive over-segmentation
Y. Taguchi and B. Wilburn and C. L. Zitnick
    2720--2727  (2008)
We present an over-segmentation based, dense stereo algorithm that jointly estimates segmentation and depth. For mixed pixels on segment boundaries, the algorithm computes foreground opacity (alpha), as well as color and depth for the foreground and background. We model the scene as a collection of fronto-parallel planar segments in a reference view, and use a generative model for image formation that handles mixed pixels at segment boundaries. Our method iteratively updates the segmentation based on color, depth and shape constraints using MAP estimation. Given a segmentation, the depth estimates are updated using belief propagation. We show that our method is competitive with the state-of-the-art based on the new Middlebury stereo evaluation, and that it overcomes limitations of traditional segmentation based methods while properly handling mixed pixels. Z-keying results show the advantages of combining opacity and depth estimation.
Boundary matting for view synthesis
S. W. Hasinoff and S. B. Kang and R. Szeliski
Comput. Vis. Image Underst.  103  22--32  (2006)
In the last few years, new view synthesis has emerged as an important application of 3D stereo reconstruction. While the quality of stereo has improved, it is still imperfect, and a unique depth is typically assigned to every pixel. This is problematic at object boundaries, where the pixel colors are mixtures of foreground and background colors. Interpolating views without explicitly accounting for this effect results in objects with a ``cut-out'' appearance. To produce seamless view interpolation, we propose a method called boundary matting, which represents each occlusion boundary as a 3D curve. We show how this method exploits multiple views to perform fully automatic alpha matting and to simultaneously refine stereo depths at the boundaries. The key to our approach is the 3D representation of occlusion boundaries estimated to sub-pixel accuracy. Starting from an initial estimate derived from stereo, we optimize the curve parameters and the foreground colors near the boundaries. Our objective function maximizes consistency with the input images, favors boundaries aligned with strong edges, and damps large perturbations of the curves. Experimental results suggest that this method enables high-quality view synthesis with reduced matting artifacts.
Overconstrained Linear Estimation of Radial Distortion and Multi-view Geometry
R. M. Steele and C. Jaynes

This paper introduces a new method for simultaneous estimation of lens distortion and multi-view geometry using only point correspondences. The new technique has significant advantages over the current state-of-the art in that it makes more effective use of correspondences arising from any number of views. Multi-view geometry in the presence of lens distortion can be expressed as a set of point correspondence constraints that are quadratic in the unknown distortion parameter. Previous work has demonstrated how the system can be solved efficiently as a quadratic eigenvalue problem by operating on the normal equations of the system. Although this approach is appropriate for situations in which only a minimal set of matchpoints are available, it does not take full advantage of extra correspondences in overconstrained situations, resulting in significant bias and many potential solutions. The new technique directly operates on the initial constraint equations and solves the quadratic eigenvalue problem in the case of rectangular matrices. The method is shown to contain significantly less bias on both controlled and real-world data and, in the case of a moving camera where additional views serve to constrain the number of solutions, an accurate estimate of both geometry and distortion is achieved.
A Rational Function Lens Distortion Model for General Cameras
D. Claus and A. W. Fitzgibbon
    213--219  (2005)
We introduce a new rational function (RF) model for radial lens distortion in wide-angle and catadioptric lenses, which allows the simultaneous linear estimation of motion and lens geometry from two uncalibrated views of a 3D scene. In contrast to existing models which admit such linear estimates, the new model is not specialized to any particular lens geometry, but is sufficiently general to model a variety of extreme distortions. The key step is to define the mapping between image (pixel) coordinates and 3D rays in camera coordinates as a linear combination of nonlinear functions of the image coordinates. Like a ``kernel trick'', this allows a linear algorithm to estimate nonlinear models, and in particular offers a simple solution to the estimation of nonlinear image distortion. The model also yields an explicit form for the epipolar curves, allowing correspondence search to be efficiently guided by the epipolar geometry. We show results of an implementation of the RF model in estimating the geometry of a real camera lens from uncalibrated footage, and compare the estimate to one obtained using a calibration grid.
Simultaneous Linear Estimation of Multiple View Geometry and Lens Distortion
A. W. Fitzgibbon
  1  125--132  (2001)
A problem in uncalibrated stereo reconstruction is that cameras which deviate from the pinhole model have to be pre-calibrated in order to correct for nonlinear lens distortion. If they are not, and point correspondence is attempted using the uncorrected images, the matching constraints provided by the fundamental matrix must be set so loose that point matching is significantly hampered. This paper shows how linear estimation of the fundamental matrix from two-view point correspondences may be augmented to include one term of radial lens distortion. This is achieved by (1) changing from the standard radial-lens model to another which (as we show) has equivalent power, but which takes a simpler form in homogeneous coordinates, and (2) expressing fundamental matrix estimation as a quadratic eigenvalue problem (QEP), for which efficient algorithms are well known. I derive the new estimator, and compare its performance against bundle-adjusted calibration-grid data. The new estimator is fast enough to be included in a RANSAC-based matching loop, and we show cases of matching being rendered possible by its use. I show how the same lens can be calibrated in a natural scene where the lack of straight lines precludes most previous techniques. The modification when the multi-view relation is a planar homography or trifocal tensor is described.
Projective Rectification with Minimal Geometric Distortion
H.-H. P. Wu and C.-C. Chen
    221--242  (2007)
There has been increasing interest in 3D imaging in the fields of entertainment, simulation, medicine, 3D visual communication, 3D tele-robotics, and 3D TV to augment the reality of presence or to provide vivid and accurate structure information. In order to provide vivid information in these and other 3D applications, efficient techniques to generate, store, and view stereoscopic video are essential. While many methods are available for acquiring stereoscopic video, the image pairs obtained might not be in rectified form. Therefore, rectification is usually needed to support comfortable viewing and effective compression for storage and transmission. Projective geometry has proved to be a useful tool for solving the rectification problem without camera calibration. However, if the matrices used for projective rectification (homographies) are not constrained properly, the rectification process can cause great geometric distortion. For visual applications, rectification with minimal geometric distortion should be pursued. In this chapter, we propose an improved algorithm to minimize the distortion by combining a newly developed projective transform with a properly chosen shearing transform. This new method is flexible and can be adapted to various imaging models. Experimental data show that the proposed method works quite well for all the image pairs taken under different imaging conditions. Comparison with other available methods based on visual inspection and numerical data demonstrates the superiority of the new approach.
Fish-eye-stereo calibration and epipolar rectification
S. Abraham and W. Förstner
ISPRS Journal of Photogrammetry and Remote Sensing  59  278--288  (2005)

The paper describes calibration and epipolar rectification for stereo with fish-eye optics. While stereo processing of classical cameras is state of the art for many applications, stereo with fish-eye cameras has been much less discussed in the literature. This paper discusses the geometric calibration and the epipolar rectification as prerequisites for stereo processing with fish-eyes. First, it surveys mathematical models to describe the projection. Then the paper presents a method of generating epipolar images which are suitable for stereo processing with a field of view larger than 180$\,^{\circ}$ in vertical and horizontal viewing directions. One example with 3D-point measuring from real fish-eye images demonstrates the feasibility of the calibration and rectification procedure.
Fundamental Matrix for Cameras with Radial Distortion
J. P. Barreto and K. Daniilidis
When deploying a heterogeneous camera network, or when using cheap zoom cameras such as those in cell phones, it is impractical, if not impossible, to calibrate the radial distortion of each camera off-line using reference objects. It is instead desirable to have an automatic procedure without strong assumptions about the scene. In this paper, we present a new algorithm for estimating the epipolar geometry of two views where the two views can be radially distorted with different distortion factors. It is the first algorithm in the literature solving the case of different distortion in the left and right view linearly and without assuming the existence of lines in the scene. Points in the projective plane are lifted to a quadric in three-dimensional projective space. A radial distortion of the projective plane results in a matrix transformation in the space of lifted coordinates. The new epipolar constraint depends linearly on a 4 x 4 radial fundamental matrix which has 9 degrees of freedom. A complete algorithm is presented and tested on real imagery.
Basic Principles of the Three-Dimensional Film
R. Spottiswoode and N. L. Spottiswoode and C. Smith
SMPTE Journal  59  249--286  (1952)
Professional three-dimensional (3-D) film productions cannot be satisfactorily undertaken without a comprehensive theory of the transmission of an image in space from scene to screen. In Part I the outlines of such a theory are laid down, and the elements of a standard set of concepts and nomenclature put forward. Part II draws an example from a recent film, The Black Swan, to show how the stereotechnician computes a sequence of shots in the desired space relationship, and how simple graphical techniques may be employed to plot such relationships. From these graphs may be determined the magnitude of any postcorrections required to alter the continuity in space, to adjust the film to screens of widely differing size or to eliminate certain camera errors. Part III forms a critique of existing camera procedures, including those based on the supposed identity between human vision and the viewing of the space image. Part IV sums up the differences of technique between the flat film and the 3-D film.
3-D Film --- Wikipedia, The Free Encyclopedia
Reconstruction of correct 3-D perception on screens viewed at different distances
R. Kutka
IEEE Trans. on Comm.  42  29--33  (1994)

A calibration technique for the realistic representation of stereo images is introduced. It guarantees the correct appearance of object distances and sizes irrespective of whether they are imaged on very large or very small screens. If a matching stereo image pair is projected onto several screens of different sizes, the apparent geometry changes from screen to screen. On smaller monitors, all objects seem to be nearer to the spectator and smaller (puppet-theater effect). The paper proves that a global shift between the two stereo frames is necessary and sufficient to reconstruct the 3D geometry exactly. This shift does not depend on image contents but only on the screen size.
Selected Attempts at Stereoscopic Moving Pictures and Their Relationship to the Development of Motion Picture Technology, 1852-1903
H. M. Gosser

3-D Filmmakers: Conversations with Creators of Stereoscopic Motion Pictures
R. Zone

Stereoscopic Cinema and the Origins of 3-D Film, 1838-1952
R. Zone

From stereoview cards to large-format IMAX films, 3-D technology's heightened realism and powerful visual allure have held audiences captive for over a century and a half. The technology, known as stereoscopy, creates an illusion of depth by presenting two slightly different images to the eye in print or on-screen. The advent of stereoscopic film technology excited both filmmakers and audiences, as a means of replicating all of the sounds, colors, movement, and dimensionality of life and nature for the first time. The origins of 3-D film are often linked with a proliferation of stereoscopic films in the 1950s. By the time films like Man in the Dark and House of Wax were attracting large crowds, however, the technology behind this form of filmmaking was already over a century old. Stereoscopic Cinema and the Origins of 3-D Film, 1838--1952, examines this ``novelty period'' of stereoscopic film, charting its progression from Charles Wheatstone's 1838 discovery of 3-D to the 1952 release of Arch Oboler's innovative film, Bwana Devil. Stereoscopic specialist Ray Zone argues that the development of stereoscopic film can best be understood through a historical analysis of the technology rather than of its inventors. Zone examines the products used to create stereoscopic images, noting such milestones as David Brewster's and Oliver Wendell Holmes's work with stereoscopes, the use of polarizing image selection, and the success of twin-strip 3-D films, among others. In addition, Zone looks at the films produced up to 1952, discussing public reception of early 3-D short films as well as longer features such as Power of Love in single-strip anaglyphic projection in 1922 and Semyon Ivanov's 1941 autostereoscope Robinson Crusoe. He integrates his examination of the evolution of 3-D film with other cinematic developments, demonstrating the connection between stereoscopic motion pictures and modern film production.
Stereoscopic Cinema and the Origins of 3-D Film, 1838--1952, is an exhaustive study of not only the evolution of 3-D technology and the subsequent filmmaking achievements but also the public response to and cultural impact of 3-D movies. Zone takes the reader on a voyage of discovery into the rich history of a field that predates photography and that continues to influence television and computer animation today.
Binocular Vision and Stereopsis
I. P. Howard and B. J. Rogers

This book is a survey of knowledge about binocular vision, with an emphasis on its role in the perception of a three-dimensional world. The primary interest is biological vision. In each chapter, physiological, behavioral, and computational approaches are reviewed in some detail, discussed, and interrelated. The authors describe experiments required to answer specific questions and relate them to new terminologies and current theoretical schemes.
Researches in binocular vision
K. N. Ogle

This book is a very able discussion of the basic physiology involved in binocular visual processes by one of the foremost authorities in this field. It is well illustrated with diagrams and contains many graphical summaries of the material presented. It is not easy reading, although this is a fault of the material and not of the author. Much of the material was heretofore available only in the original works of Helmholtz, Hofmann, Bielschowsky, Hering and Tschermak. To this, Dr. Ogle has added the original work of the Dartmouth group, much of it his own. Although the literature and texts on the anomalies of binocular vision are quite adequate from a clinical point of view, very little is written about the basic physiological principles in normal binocular vision. This book fills that need. Here, for the first time in English, between the covers of a single book, the most pertinent parts of this subject are ably presented by an authority. The text is divided into four parts: The first deals mainly with studies of the horopter and the theory of corresponding retinal points. The second part takes up fusional processes with a discussion of Panum's areas, fixation disparity, peripheral retinal fusion, and cyclofusional eye movements. The third part deals mainly with space perception, the induced effect and the changes incident to asymmetrical convergence. Part four is a complete discussion of aniseikonia. This book will be of little aid to the clinician seeking a brief summary of how best to treat binocular anomalies. However, it will be indispensable to those interested in the physiological processes of binocular vision and is certainly the most complete and authoritative work in this field in recent years.
Geometry of Binocular Vision and a Model for Stereopsis
J. J. Koenderink and A. J. van Doorn
Biol. Cybernetics  21  29--35  (1976)

If a binocular observer looks at surfaces, the disparity is a continuous vector field defined on the manifold of cyclopean visual directions. We derive this field for the general case that the observer is presented with a curved surface and fixates an arbitrary point. We expand the disparity field in the neighbourhood of a visual direction. The first order approximation can be decomposed into congruences, similarities and deformations. The deformation component is described by the traceless part of the symmetric part of the gradient of the disparity. The deformation component carries all information concerning the slant of a surface element that is contained in the disparity field itself; it is invariant for changes of fixation, differential cyclotorsion and uniform aniseikonia. The deformation component can be found from a comparison of the orientation of surface details in the left and right retinal images. The theory provides a geometric explanation of the percepts obtained with uniform and oblique meridional aniseikonia. We utilize the geometric theory to construct a mechanistic model of stereopsis that obviates the need for internal zooming mechanisms, but nevertheless is insensitive to differential cyclotorsion or uniform aniseikonia.
Geometric and induced effects in binocular stereopsis and motion parallax
R. S. Allison and B. J. Rogers and M. F. Bradshaw
Vision Research  43  1879--1893  (2003)
This paper examines and contrasts motion-parallax analogues of the induced-size and induced-shear effects with the equivalent induced effects from binocular disparity. During lateral head motion or with binocular stereopsis, vertical-shear and vertical-size transformations produced `induced effects' of apparent inclination and slant that are not predicted geometrically. With vertical head motion, horizontal-shear and horizontal-size transformations produced similar analogues of the disparity induced effects. Typically, the induced effects were opposite in direction and slightly smaller in size than the geometric effects. Local induced-shear and induced-size effects could be elicited from motion parallax, but not from disparity, and were most pronounced when the stimulus contained discontinuities in velocity gradient. The implications of these results are discussed in the context of models of depth perception from disparity and structure from motion.
Induced size effect: I. A New Phenomenon in Binocular Space Perception Associated with the Relative Sizes of the Images of the Two Eyes
K. N. Ogle
American Medical Association Archives of Ophthalmology  20  604--623  (1938)
Vertical disparities, differential perspective and binocular stereopsis
B. J. Rogers and M. F. Bradshaw
Nature  361  253--255  (1993)
Does depth perception require vertical-disparity detectors?
J. C. A. Read and B. G. Cumming
Journal of Vision  6  1323--1355  (2006)
Stereo depth perception depends on the fact that objects project to different positions in the two eyes. Because our eyes are offset horizontally, these retinal disparities are mainly horizontal, and horizontal disparity suffices to give an impression of depth. However, depending on eye position, there may also be small vertical disparities. These are significant because, given both vertical and horizontal disparities, the brain can deduce eye position from purely retinal information and, hence, derive the position of objects in space. However, we show here that, to achieve this, the brain need measure only the magnitude of vertical disparity; for physically possible stimuli, the sign then follows from the stereo geometry. The magnitude of vertical disparity---and hence eye position---can be deduced from the response of purely horizontal-disparity sensors because vertical disparity moves corresponding features off the receptive fields, reducing the effective binocular correlation. As proof, we demonstrate an algorithm that can accurately reconstruct gaze and vergence angles from the population activity of pure horizontal-disparity sensors and show that it is subject to the induced effect. Given that disparities experienced during natural viewing are overwhelmingly horizontal and that eye position measures require only horizontal-disparity sensors, this work raises two questions: Does the brain in fact contain sensors tuned to nonzero vertical disparities, and if so, why?
System and process for generating a two-layer, 3D representation of a scene
C. L. Zitnick and R. Szeliski and S. B. Kang and M. T. Uyttendaele and S. Winder
A system and process for generating a two-layer, 3D representation of a digital or digitized image from the image and a pixel disparity map of the image is presented. The two layer representation includes a main layer having pixels exhibiting background colors and background disparities associated with correspondingly located pixels of depth discontinuity areas in the image, as well as pixels exhibiting colors and disparities associated with correspondingly located pixels of the image not found in these depth discontinuity areas. The other layer is a boundary layer made up of pixels exhibiting foreground colors, foreground disparities and alpha values associated with the correspondingly located pixels of the depth discontinuity areas. The depth discontinuity areas correspond to prescribed sized areas surrounding depth discontinuities found in the image using a disparity map thereof.
Stereo images with comfortable perceived depth
G. R. Jones and N. S. Holliman and D. Lee
A method of producing a stereo image of a (real or simulated) scene using at least one (real or simulated) camera, which creates the impression of being a 3D image when viewed on a display by a user, wherein the depth of the scene is mapped onto a maximum perceived depth of the image on the display, and the maximum perceived depth is chosen to provide comfortable viewing for the user.
Efficient Dense Stereo with Occlusions for New View-Synthesis by Four-State Dynamic Programming
A. Criminisi and A. Blake and C. Rother and J. Shotton and P. H. Torr
Int. J. Comput. Vision  71  89--110  (2007)

Computing rectifying homographies for stereo vision
C. Loop and Z. Zhang
  1  -131 Vol. 1  (1999)
Image rectification is the process of applying a pair of 2D projective transforms, or homographies, to a pair of images whose epipolar geometry is known so that epipolar lines in the original images map to horizontally aligned lines in the transformed images. We propose a novel technique for image rectification based on geometrically well defined criteria such that image distortion due to rectification is minimized. This is achieved by decomposing each homography into a specialized projective transform, a similarity transform, followed by a shearing transform. The effect of image distortion at each stage is carefully considered.
Optics
E. Hecht

Accurate, authoritative and comprehensive, Optics, Fourth Edition has been revised to provide readers with the most up-to-date coverage of optics. The market leader for over a decade, this book provides a balance of theory and instrumentation, while also including the necessary classical background. The writing style is lively and accessible. For college instructors, students, or anyone interested in optics.
Virtual Control of Optical Axis of the 3DTV Camera for Reducing Visual Fatigue in Stereoscopic 3DTV
J.-I. Park and G. M. Um and C. Ahn and C. Ahn
ETRI Journal  26  597--604  (2004)

The Many Ways to Create a 3-D Image
C. Chinnock
SMPTE Motion Imaging Journal      (2008)
This article provides some basic information about the ways stereoscopic 3-D images can be displayed for cinema, consumer, or professional applications. While this may sound like a fairly simple exercise, in fact, there are probably dozens of ways to do this. As a result, it is useful to have a reference article that briefly describes the more popular approaches and organizes these approaches to better understand the features and benefits of each. Accompanying this article is a wall chart of stereoscopic 3-D display technologies. This family tree is broken into three branches and outlines the various approaches to creating 3-D. But even the structure of the wall chart was not so straightforward as it can be organized in several ways.
The Last Great Innovation: The Stereoscopic Cinema
L. Lipton
SMPTE Motion Imaging Journal    518--523  (2007)

In February 2007, a sea change in perception took place. A full-page article in Variety magazine extolled the virtues of 3-D movies, and a few days later, an editorial in the L.A. Times stated that 3-D had to be taken seriously by the studios. Since then, a number of articles in the trades have discussed the stereoscopic cinema in positive terms. It has become the great hope of the industry after languishing for a century, primarily because recent 3-D movies are producing about three times the revenue of the simultaneously released 2-D version of films such as Chicken Little, Monster House, and Meet the Robinsons. In addition, Disney's The Nightmare Before Christmas, a 14-year-old film that had been in home release, was converted to 3-D and profitably theatrically re-released. Before February, in the trades and the popular press, the stereoscopic cinema was referred to with derision; it was a joke, to be dismissed. That is no longer the case. What has changed in addition to the better box office? In this article, I will attempt to provide a historical perspective with regard to technology introductions to the cinema, and explore the reasons for this recent change in attitude, with regard to the stereoscopic cinema.
Parallax distribution for ease of viewing in stereoscopic HDTV
H. Yamanoue and S. Ide and M. Okui and F. Okano and M. Bitou and N. Terashima
In order to identify the conditions that make stereoscopic images easier to view, we analyzed the psychological effects using a stereoscopic HDTV system, and examined the relationship between this analysis and the parallax distribution patterns. First, we evaluated the impressions of several stereoscopic images included in standard 3-D HDTV test charts and past 3-D HDTV programs using some evaluation terms. Two factors were thus extracted, the first related to the ``sense of presence'' and the second related to ``ease of viewing''. Secondly, we applied principal component analysis to the parallax distribution of the stereoscopic images used in the subjective evaluation tests, in order to extract the features of the parallax distribution, then we examined the relationship between the factors and the features of the parallax distribution. The results indicated that the features of the parallax distribution are strongly related to ``ease of viewing'', and for ease of viewing 3-D images, the upper part of the screen should be located further away from the viewer with less parallax irregularity, and the entire image should be positioned behind the screen.
Spatial Distortion Prediction System for Stereoscopic Images
K. Masaoka and A. Hanazato and M. Emoto and H. Yamanoue and Y. Nojiri and F. Okano
Journal of Electronic Imaging  15    (2006)
We propose a system to calculate the spatial distortion in 3-D images based on the shooting, display, and viewing conditions. It can be used to predict the extent of the perceived puppet-theater effect and the cardboard effect. The magnitude of the spatial distortion and the extent of the puppet-theater and cardboard effects are displayed using a space grid whose size can be estimated based on the objects' depths, calculated from the binocular parallax of the acquired stereoscopic images. This system can also be used to predict excessive binocular parallax and excessive parallax distribution. Several cases in which puppet-theater and cardboard effects are expected to be produced are presented. We also demonstrate how the proposed system might be used to predict ratings of naturalness and quality of depth.
Characteristics of accommodation toward apparent depth
T. Takeda and K. Hashimoto and N. Hiruma and Y. Fukui
Vision Research  39  2087--2091  (1999)

This paper deals with characteristics of accommodation evoked by perceived depth sensation and the dynamic relationship between accommodation and vergence, applying newly developed optical measurement apparatuses. A total of five subjects looked at three different two-dimensional stimuli and two different three-dimensional stimuli; namely a real image and a stereoscopic image. With regard to the two-dimensional stimuli, a manifest accommodation without any accompanying vergence was found because of an apparent depth sensation even though the target distance was kept constant. With regard to the three-dimensional stimuli, larger accommodation and clear vergence were evoked because of binocular parallax and a stronger depth sensation. As for the stereoscopic image, a manifest overshoot (the accommodation peaked first and receded considerably) was found while the vergence remained constant. On the other hand, the overshoot of accommodation was smaller when subjects were watching the real image. These results reveal that brain depth perception has a higher effect on accommodation than expected. The relationship of accommodation and vergence toward the stereoscopic image suggests a reason why severe visual fatigue is commonly experienced by many viewers using stereoscopic displays. It has also paved the way for the numerical analysis of the oculomotor triad system.
Analysis of the Influence of Vertical Disparities Arising in Toed-in Stereoscopic Cameras
R. S. Allison
Journal of Imaging Science and Technology  51  317-327  (2007)
A basic task in the construction and use of a stereoscopic camera and display system is the alignment of the left and right images appropriately---a task generally referred to as camera convergence. Convergence of the real or virtual stereoscopic cameras can shift the range of portrayed depth to improve visual comfort, can adjust the disparity of targets to bring them nearer to the screen and reduce accommodation-vergence conflict, or can bring objects of interest into the binocular field of view. Although camera convergence is acknowledged as a useful function, there has been considerable debate over the transformation required. It is well known that rotational camera convergence or ``toe-in'' distorts the images in the two cameras producing patterns of horizontal and vertical disparities that can cause problems with fusion of the stereoscopic imagery. Behaviorally, similar retinal vertical disparity patterns are known to correlate with viewing distance and strongly affect perception of stereoscopic shape and depth. There has been little analysis of the implications of recent findings on vertical disparity processing for the design of stereoscopic camera and display systems. I ask how such distortions caused by camera convergence affect the ability to fuse and perceive stereoscopic images.
Resampling radially captured images for perspectively correct stereoscopic display
N. A. Dodgson
  3295  100-110  (1998)
When rendering or capturing stereoscopic images, two arrangements of the cameras are possible: radial (`toed-in') and parallel. In the radial case all of the cameras' axes pass through a common point; in the parallel case these axes are parallel to one another. The radial configuration causes distortions in the viewed stereoscopic image, manifest as vertical misalignments between parts of the images seen by the viewer's two eyes. The parallel case does not suffer from this distortion, and is thus considered to be the more correct method of capturing stereoscopic imagery. The radial case is, however, simpler to implement than the parallel: standard cameras or renderers can be used with no modification. In the parallel case special lens arrangements or modified rendering software is required. If a pinhole camera is assumed it should be readily apparent that the same light rays pass through the pinhole in the same directions whether the camera is aligned radially to or parallel to the other cameras. The difference lies in how these light rays are sampled to produce an image. In the case of a non-pinhole (real) camera, objects in focus should behave as for the pinhole case, while objects out of focus may behave slightly differently. The geometry of both radial and parallel cases is described and it is shown how a geometrical transform of an image produced in one case can be used to generate the image which would have been produced in the other case. This geometric transform is achieved by a resampling operation and various resampling algorithms are discussed. The resampling process can result in a degradation in the quality of the image. An indication of the type and severity of this degradation is given.
A survey of perceptual evaluations and requirements of three-dimensional TV
L. M. J. Meesters and W. A. IJsselsteijn and P. J. H. Seuntiens
IEEE Trans. on Circuits and Systems for Video Technology  14  381--391  (2004)
A high-quality three-dimensional (3-D) broadcast service (3-D TV) is becoming increasingly feasible based on various recent technological developments combined with an enhanced understanding of 3-D perception and human factors issues surrounding 3-D TV. In this paper, 3-D technology and perceptually relevant issues, in particular 3-D image quality and visual comfort, in relation to 3-D TV systems are reviewed. The focus is on near-term displays for broadcast-style single- and multiple-viewer systems. We discuss how an image quality model for conventional two-dimensional images needs to be modified to be suitable for image quality research for 3-D TV. In this respect, studies are reviewed that have focused on the relationship between subjective attributes of 3-D image quality and physical system parameters that induce them (e.g., parameter choices in image acquisition, compression, and display). In particular, artifacts that may arise in 3-D TV systems are addressed, such as keystone distortion, depth-plane curvature, puppet theater effect, cross talk, cardboard effect, shear distortion, picket-fence effect, and image flipping. In conclusion, we summarize the perceptual requirements for 3-D TV that can be extracted from the literature and address issues that require further investigation in order for 3-D TV to be a success.
A Geometric Comparison of Algorithms for Fusion Control in Stereoscopic HTDs
Z. Wartell and L. F. Hodges and W. Ribarsky
IEEE Transactions on Visualization and Computer Graphics  8  129-143  (2002)

This paper concerns stereoscopic virtual reality displays in which the head is tracked and the display is stationary, attached to a desk, tabletop, or wall. These are called stereoscopic HTDs (Head-Tracked Displays). Stereoscopic displays render two perspective views of a scene, each of which is seen by one eye of the user. Ideally, the user's natural visual system combines the stereo image pair into a single, 3D perceived image. Unfortunately, users often have difficulty fusing the stereo image pair. Researchers use a number of software techniques to reduce fusion problems. This paper geometrically examines and compares a number of these techniques and reaches the following conclusions: In interactive stereoscopic applications, the combination of view placement, scale, and either false eye separation or α-false eye separation can provide fusion control geometrically similar to image shifting and image scaling. However, in stereo HTDs, image shifting and image scaling also generate additional geometric artifacts not generated by the other methods. We anecdotally link some of these artifacts to exceeding perceptual limitations of human vision. While formal perceptual studies are still needed, geometric analysis suggests that image shifting and image scaling may be less appropriate than the other methods for interactive, stereo HTDs.
Stereoscopic transparency: Constraints on the perception of multiple surfaces
I. Tsirlin and R. S. Allison and L. M. Wilcox
Journal of Vision  8  1-10  (2008)

Stereo-transparency is an intriguing, but not well-understood, phenomenon. In the present experiment, we simultaneously manipulated the number of overlaid planes, density of elements, and depth separation between the planes in random dot stereograms to evaluate the constraints on stereoscopic transparency. We used a novel task involving identification of patterned planes among the planes constituting the stimulus. Our data show that observers are capable of segregating up to six simultaneous overlaid surfaces. Increases in element density or number of planes have a detrimental effect on the transparency percept. The effect of increasing the inter-plane disparity is strongly influenced by other stimulus parameters. This latter result can explain a difference in the literature concerning the role of inter-plane disparity in perception of stereo-transparency. We argue that the effects of stimuli parameters on the transparency percept can be accounted for not only by inhibitory interactions, as has been suggested, but also by the inherent properties of disparity detectors.
The camera convergence problem revisited
R. S. Allison
  5291  167-178  (2004)
Convergence of the real or virtual stereoscopic cameras is an important operation in stereoscopic display systems. For example, convergence can shift the range of portrayed depth to improve visual comfort; can adjust the disparity of targets to bring them nearer to the screen and reduce accommodation-vergence conflict; or can bring objects of interest into the binocular field-of-view. Although camera convergence is acknowledged as a useful function, there has been considerable debate over the transformation required. It is well known that rotational camera convergence or 'toe-in' distorts the images in the two cameras producing patterns of horizontal and vertical disparities that can cause problems with fusion of the stereoscopic imagery. Behaviorally, similar retinal vertical disparity patterns are known to correlate with viewing distance and strongly affect perception of stereoscopic shape and depth. There has been little analysis of the implications of recent findings on vertical disparity processing for the design of stereoscopic camera and display systems. We ask how such distortions caused by camera convergence affect the ability to fuse and perceive stereoscopic images.
The World of 3-D Movies
E. Sammons
Binocular Vision (Course Reader)
C. M. Schor
Human stereo matching is not restricted to epipolar lines
S. B. Stevenson and C. M. Schor
Vision Research  37  2717-2723  (1997)

Computational approaches to stereo matching have often taken advantage of a geometric constraint which states that matching elements in the left and right eye images will always fall on 'epipolar lines'. The use of this epipolar constraint reduces the search space from two dimensions to one, producing a tremendous saving in the computation time required to find the matching solution. Use of this constraint requires a precise knowledge of the relative horizontal, vertical and torsional positions of the two eyes, however, and this information may be unavailable in many situations. Experiments with dynamic random element stereograms reveal that human stereopsis can detect and identify the depth of matches over a range of both vertical and horizontal disparity. Observers were able to make accurate near/far depth discriminations when vertical disparity was as large as 45 arcmin, and were able to detect the presence of correlation over a slightly larger range. Thus, human binocular matching sensitivity is not strictly constrained to epipolar lines.
Visual fatigue caused by viewing stereoscopic motion images: Background, theories, and observations
K. Ukai and P. A. Howarth
Displays  29  106-116  (2007)

The background, theories, and observations on visual stress possibly caused by viewing stereoscopic motion images are reviewed. Visual fatigue caused by stereoscopic images is a safety issue. Fatigue is possibly caused by the discrepancy between accommodative and convergence stimuli that are included in the image. Studies on accommodation and convergence are surveyed and an explanation regarding the characteristics of these functions is offered. Studies in the literature on changes in oculomotor function after viewing stereoscopic images, including changes in pupillary responses, are discussed. Evaluation of visual fatigue, particularly in relation to different methods of viewing stereoscopic displays, is described.
Visual Experience of 3D TV
P. J. H. Seuntiens
Visual discomfort in stereoscopic displays: a review
M. T. M. Lambooij and W. A. IJsselsteijn and I. Heynderickx
  6490    (2007)

Visual discomfort has been the subject of considerable research in relation to stereoscopic and autostereoscopic displays, but remains an ambiguous concept used to denote a variety of subjective symptoms potentially related to different underlying processes. In this paper we clarify the importance of various causes and aspects of visual comfort. Classical causative factors such as excessive binocular parallax and accommodation-convergence conflict appear to be of minor importance when disparity values do not surpass a one-degree limit of visual angle, which still provides sufficient range to allow for satisfactory depth perception in consumer applications, such as stereoscopic television. Visual discomfort, however, may still occur within this limit and we believe the following factors to be the most pertinent in contributing to this: (1) excessive demand of accommodation-convergence linkage, e.g., by fast motion in depth, viewed at short distances, (2) 3D artefacts resulting from insufficient depth information in the incoming data signal yielding spatial and temporal inconsistencies, and (3) unnatural amounts of blur. In order to adequately characterize and understand visual discomfort, multiple types of measurements, both objective and subjective, are needed.
Visual comfort of binocular and 3D displays
F. L. Kooi and A. Toet
Displays  25  99-108  (2004)

Imperfections in binocular image pairs can cause serious viewing discomfort. For example, in stereo vision systems eye strain is caused by unintentional mismatches between the left and right eye images (stereo imperfections). Head-mounted displays can induce eye strain due to optical misalignments. We have experimentally determined the level of (dis)comfort experienced by human observers viewing brief presentations of imperfect binocular image pairs. We used a wide range of binocular image imperfections that are representative for commonly encountered optical errors (spatial distortions: shifts, magnification, rotation, keystone), imperfect filters (photometric asymmetries: luminance, color, contrast, crosstalk), and stereoscopic disparities. The results show that nearly all binocular image asymmetries seriously reduce visual comfort if present in a large enough amount. From our data we estimate threshold values for the onset of discomfort. The database collected in this study allows a more accurate prediction of visual comfort from the specification of a given binocular viewing system. Being able to predict the level of visual discomfort from the specification of binocular viewing systems greatly helps the design and selection process. This paper provides the basis.
Vergence-accommodation conflicts hinder visual performance and cause visual fatigue
D. M. Hoffman and A. R. Girshick and K. Akeley and M. S. Banks
Journal of Vision  8  1-30  (2008)

Three-dimensional (3D) displays have become important for many applications including vision research, operation of remote devices, medical imaging, surgical training, scientific visualization, virtual prototyping, and more. In many of these applications, it is important for the graphic image to create a faithful impression of the 3D structure of the portrayed object or scene. Unfortunately, 3D displays often yield distortions in perceived 3D structure compared with the percepts of the real scenes the displays depict. A likely cause of such distortions is the fact that computer displays present images on one surface. Thus, focus cues (accommodation and blur in the retinal image) specify the depth of the display rather than the depths in the depicted scene. Additionally, the uncoupling of vergence and accommodation required by 3D displays frequently reduces one's ability to fuse the binocular stimulus and causes discomfort and fatigue for the viewer. We have developed a novel 3D display that presents focus cues that are correct or nearly correct for the depicted scene. We used this display to evaluate the influence of focus cues on perceptual distortions, fusion failures, and fatigue. We show that when focus cues are correct or nearly correct, (1) the time required to identify a stereoscopic stimulus is reduced, (2) stereoacuity in a time-limited task is increased, (3) distortions in perceived depth are reduced, and (4) viewer fatigue and discomfort are reduced. We discuss the implications of this work for vision research and the design and use of displays.
Two factors in visual fatigue caused by stereoscopic HDTV images
S. Yano and M. Emoto and T. Mitsuhashi
Displays    141-150  (2004)

With respect to visual fatigue caused by watching stereoscopic HDTV images, we found that when stereoscopic HDTV images were displayed within the corresponding range of depth of focus, and remained still in the depth direction, the degree of visual fatigue was almost the same as that induced by watching images displayed at the depth of the screen. However, when images were displayed outside the corresponding range of depth of focus, visual fatigue was clearly induced. Moreover, we found that even if images were displayed within the corresponding range of depth of focus, visual fatigue was induced if the images were moved in depth according to a step pulse function.
Human Factors for Stereoscopic Images
K. Ukai
    1697-1700  (2006)

TWISTER: an immersive autostereoscopic display
K. Tanaka and J. Hayashi and M. Inami and S. Tachi
    59-66  (2004)

This paper describes in detail the design, development, and evaluation of the TWISTER III (Telexistence Wide-angle Immersive STEReoscope) and demonstrates how this system can display immersive three-dimensional full-color and live motion pictures without the need for special eye-wear. The device works as a cylindrical display by rotating 30 display units around an observer and presenting time-varying patterns, while immersive autostereoscopic vision is achieved by employing a "rotating parallax barrier" method. After explaining the principle, we discuss the designs and implementations for maximum performance in various aspects of the display. We also evaluate the display.
Subjective evaluation of visual fatigue caused by motion images
J. Kuze and K. Ukai
Displays  29  159-166  (2008)

A questionnaire was developed to subjectively assess visual fatigue caused by viewing various types of motion images. The questionnaire was evaluated using four types of moving images: playing a TV game using an HMD or a TV; viewing images with and without stabilization of camera shake; viewing a movie with and without colour break-up; and viewing either a stereoscopic movie (anaglyph method) or a nonstereoscopic movie. Factor analysis revealed five factors: (1) Eye Strain, (2) General Discomfort, (3) Nausea, (4) Focusing Difficulty and (5) Headache, which were effective for classifying motion images.
Study of the dynamic interactions between vergence and accommodation
R. Suryakumar
Introduction: Accommodation (change in ocular focus) and Vergence (change in ocular alignment) are two ocular motor systems that interact with each other to provide clear single binocular vision. While retinal blur drives accommodation as a reflex, retinal disparity changes accommodative position through the convergence-accommodation (or simply vergence-accommodation, VA) cross-link. Similarly, while retinal disparity primarily drives the vergence system, a change in retinal blur alters vergence through the accommodative-convergence (AC) cross-link. Although much information is known on the individual response dynamics of blur accommodation and disparity vergence, very little is known about the cross-linkages AC and VA. VA represents the unique situation where a stimulus to vergence (retinal disparity) drives a change in accommodation. When these dynamic measures are compared to those of vergence and blur accommodation a better understanding of the critical or rate limiting step within the system of vergence and accommodation can be determined. Accordingly, the purpose of this thesis was to determine the response dynamics of vergence driven accommodation (VA) and compare the response parameters to simultaneous measures of disparity vergence and blur driven accommodation. Methods: A disparity stimulus generator (DSG) was modified to allow step stimulus demands of disparity to be created on a 0.2 cpd non-accommodative difference of Gaussian target. Retinal disparities of different step amplitude demands were created as an ON / OFF paradigm and projected on a stereo monitor set at a distance of 1.2 m. Two experiments were conducted. The first experiment investigated the first order properties of VA in comparison to similar measures of blur driven accommodation (BA). The second study aimed at comparing the first order and second order dynamics of disparity vergence, VA and BA.
In the first experiment, stimulus measures of vergence, vergence-accommodation and BA were studied. Six normal young adult subjects participated in the study. Accommodation was measured continuously at 25 Hz with the commercially available PowerRefractor (Multichannel systems, Germany). A Badal optical system was designed and accommodative responses to step stimulus demands were measured. VA and BA measures obtained from the PowerRefractor were matched and plotted as main sequences (amplitude vs. peak velocity). Peak velocities between the two responses were compared using two-way analysis of variance (ANOVA) with Bonferroni post-tests. In the second experiment, the response dynamics of vergence, vergence-accommodation, and blur accommodation were assessed and compared on 6 young adult subjects. Eye position was measured continuously by a stereo eye tracker at a sampling rate of 120 Hz. A high speed photorefractor (sampling = 60 Hz) was custom designed and synchronized with a stereo eye tracker to allow simultaneous measurement of vergence and VA. Monocular blur driven accommodation measures were also obtained with the Badal optometer and the high speed photorefractor (sampling = 75 Hz). VA, BA and disparity vergence responses were analyzed and temporal parameters like latency, amplitude, duration, time to peak velocity, peak acceleration, duration of acceleration, and skewness were calculated. Main sequence plots (response amplitude vs. peak velocity) were generated and compared between disparity ON and disparity OFF. The dynamic measures of VA were compared to the measures of monocular blur driven accommodation. All comparisons were done using a two-way ANOVA with Bonferroni post-tests.
Results: Study 1: The results showed that response amplitude of VA during disparity ON and disparity OFF paradigms was linearly related to the peak velocity for an amplitude range of 0.5 to 2.5 D (Disparity ON: peak velocity of vergence-accommodation = 0.812 * amplitude + 1.564, R² = 0.452, p < 0.0001 and Disparity OFF: peak velocity of vergence-accommodation = 1.699 * amplitude - 0.234, R² = 0.86, p < 0.0001). The rate of change of peak velocity as a function of response magnitude was lower for VA during disparity ON compared to VA during disparity OFF. BA responses also showed amplitude dependent dynamic properties (Accommodation peak velocity = 1.593 * amplitude - 0.008, R² = 0.84, p < 0.001; Dis-accommodation peak velocity = 1.646 * amplitude - 0.036, R² = 0.77, p < 0.001). There was no statistical difference in the velocity of accommodation and dis-accommodation. Study 2: When amplitudes were matched, disparity vergence response during disparity ON and disparity OFF had similar main sequence relationships. The mean values for the stimulus and response VA/V ratios were similar (0.13 ± 0.05 D/Delta and 0.15 ± 0.09 D/Delta respectively). All the temporal parameters of vergence-accommodation were similar during disparity ON and disparity OFF paradigms. When blur accommodation and vergence-accommodation measures were compared, all the first order and second order temporal parameters in the response were similar between the two systems. Also, disparity vergence exhibited significantly greater peak velocity and peak acceleration compared to the two accommodation responses. The results also confirmed that the velocity of accommodation and dis-accommodation showed a statistically significant linear relationship as a function of amplitude for the range of amplitudes tested (Accommodation, y = 2.55x + 0.65, R² = 0.55, p < 0.0001; Dis-accommodation, y = 2.66x + 0.50, R² = 0.65, p < 0.0001). Conclusions: The dynamic properties of VA are amplitude dependent.
Although initial results from study 1 suggested that VA may be slower during disparity ON, the results from study 2 using the high speed photorefractor and an improved analysis procedure showed that VA responses were equally fast between disparity ON (convergence) and disparity OFF (divergence). All temporal properties of VA were independent of vergence type (convergence/divergence). VA and BA have similar dynamic properties in humans, suggesting that they may be controlled by a common neural pathway or limited by the plant. Also, when compared to accommodation responses, disparity vergence exhibited greater velocities and accelerations, reflecting the differences in the magnitude of neural innervation and plant mechanics between the two systems. The study also confirmed amplitude dependent response dynamics of blur driven accommodation and dis-accommodation.
Stereoscopic displays and visual comfort: a review
M. Lambooij and W. A. IJsselsteijn and I. Heynderickx
SPIE Newsroom      (2007)

We suggest a rule of thumb for comfortably viewing stereoscopic displays, and also highlight circumstances under which visual comfort may be compromised.
Repeated Vergence Adaptation Causes the Decline of Visual Functions in Watching Stereoscopic Television
M. Emoto and T. Niida and F. Okano
Journal of Display Technology  1  328-340  (2005)
This study evaluates visual fatigue when viewing stereoscopic TV, a technology expected to become the broadcasting display system of the future. Wide public acceptance of stereoscopic TV awaits resolution of many issues, including visual fatigue on viewing TV images. Visual fatigue was induced using a visual function simulator, consisting of prism and lens optical systems, while viewing stereoscopic TV. We assessed subject visual fatigue through subjective reports of symptoms and by the changes in visual functions. These functions included the viewer's fusional break point, recovery point, accommodation step response, and visual evoked cortical potentials (VECP). Significant changes of some visual functions were found after watching simulated stereoscopic TV when the vergence load was heavy or when it changed over time; relative vergence limits decreased and the latency of VECP increased after watching, reflecting visual fatigue. After subjects rested, relative vergence limits recovered to pre-viewing levels. Our findings lead us to conclude that, aside from excessive horizontal binocular parallax, discontinuous changes in parallax are also a major factor contributing to visual fatigue in the viewing of stereoscopic images. They also cause a decreased range of relative vergence, a reduced accommodation response, and a delay in the P100 latency of VECP.
A Theory of Human Stereo Vision
D. Marr and T. Poggio
An algorithm is proposed for solving the stereoscopic matching problem. The algorithm consists of five steps: 1.) Each image is filtered with bar masks of four sizes that vary with eccentricity; the equivalent filters are about one octave wide. 2.) Zero-crossings of the mask values are localized, and positions that correspond to terminations are found. 3.) For each mask size, matching takes place between pairs of zero crossings or terminations of the same sign in the two images, for a range of disparities up to about the width of the mask's central region. 4.) Wide masks can control vergence movements, thus causing small masks to come into correspondence. 5.) When a correspondence is achieved, it is written into a dynamic buffer, called the 2-1/2-D sketch. It is shown that this proposal provides a theoretical framework for most existing psychophysical and neurophysiological data about stereopsis. Several critical experimental predictions are also made, for instance about the size of Panum's area under various conditions. The results of such experiments would tell us whether, for example, cooperativity is necessary for the fusion process.
Stereoscopic camera and viewing systems with undistorted depth presentation and reduced or eliminated erroneous acceleration and deceleration perceptions, or with perceptions produced or enhanced for special effects
D. B. Diner
Methods for providing stereoscopic image presentation and stereoscopic configurations using stereoscopic viewing systems having converged or parallel cameras may be set up to reduce or eliminate erroneously perceived accelerations and decelerations by proper selection of parameters, such as an image magnification factor q and intercamera distance 2w. For converged cameras, q is selected to satisfy Ve - qwl = 0 (that is, q = Ve/(wl)), where V is the camera convergence distance, e is half the interocular distance of an observer, w is half the intercamera distance and l is the actual distance from the first nodal point of each camera to the convergence point, and for parallel cameras, q is selected to be equal to e/w. While converged cameras cannot be set up to provide fully undistorted three-dimensional views, they can be set up to provide a linear relationship between real and apparent depth and thus minimize erroneously perceived accelerations and decelerations for three sagittal planes, x = -w, x = 0 and x = +w, which are indicated to the observer. Parallel cameras can be set up to provide fully undistorted three-dimensional views by controlling the location of the observer and by magnification and shifting of left and right images. In addition, the teachings of this disclosure can be used to provide methods of stereoscopic image presentation and stereoscopic camera configurations to produce a nonlinear relation between perceived and real depth, and erroneously produce or enhance perceived accelerations and decelerations in order to provide special effects for entertainment, training or educational purposes.
Spatial size limits in stereoscopic vision
B. Y. Schlesinger and Y. Yeshurun
Spatial Vision  11  279-293  (1998)

Stereoscopic vision is extremely precise in detecting minute differences between adjacent depth planes, but quite imprecise in estimating absolute depth. In this paper, we address the issue of the spatial acuity (and not the stereo acuity) of stereopsis. Static RDS (random dot stereograms) stimuli were used to find the spatial grain in which human stereoscopic vision operates. Using psychophysical experiments it was found that foveally, stimuli smaller than 8' cannot be accurately perceived. For other eccentricities, it was found that this threshold is inversely proportional to the Cortical Magnification factor. We interpret this spatial size limit, which is an order of magnitude larger than visual spatial acuity, as an indication that stereopsis is an area based comparison rather than a point process, and discuss the relations between the cortical 'patch' size that corresponds to this 8' limit and Ocular Dominance Columns.
Natural problems for stereoscopic depth perception in virtual environments
J. P. Wann and S. Rushton and M. Mon-Williams
Vision Research  35  2731-2736  (1995)

The use of virtual reality (VR) display systems has escalated over the last 5 yr and may have consequences for those working within vision research. This paper provides a brief review of the literature pertaining to the representation of depth in stereoscopic VR displays. Specific attention is paid to the response of the accommodation system with its cross-links to vergence eye movements, and to the spatial errors that arise when portraying three-dimensional space on a two-dimensional window. It is suggested that these factors prevent large depth intervals of three-dimensional visual space being rendered with integrity through dual two-dimensional arrays.
Mapping perceived depth to regions of interest in stereoscopic images
N. S. Holliman
Proc. SPIE Stereoscopic Image Processing and Rendering  5291    (2004)

The usable perceived depth range of a stereoscopic 3D display is limited by human factors considerations to a defined range around the screen plane. There is therefore a need in stereoscopic image creation to map depth from the scene to a target display without exceeding these limits. Recent image capture methods provide precise control over this depth mapping but map a single range of scene depth as a whole and are unable to give preferential stereoscopic representation to a particular region of interest in the scene. A new approach to stereoscopic image creation is described that allows a defined region of interest in scene depth to have an improved perceived depth representation compared to other regions of the scene. For example in a game this may be the region of depth around a game character, or in a scientific visualization the region around a particular feature of interest. To realize this approach we present a novel algorithm for stereoscopic image capture and describe an implementation for the widely used ray-tracing package POV-Ray. Results demonstrate how this approach provides content creators with improved control over perceived depth representation in stereoscopic images.
Controlling Perceived Depth in Stereoscopic Images
G. R. Jones and D. Lee and N. S. Holliman and D. Ezra
Proc. SPIE Stereoscopic Displays and Virtual Reality Systems VIII  4297  42-53  (2001)

Stereoscopic images are hard to get right, and comfortable images are often only produced after repeated trial and error. The main difficulty is controlling the stereoscopic camera parameters so that the viewer does not experience eye strain or double images from excessive perceived depth. Additionally, for head tracked displays, the perceived objects can distort as the viewer moves to look around the displayed scene. We describe a novel method for calculating stereoscopic camera parameters with the following contributions: (1) It provides the user with intuitive controls related to easily measured physical values. (2) For head tracked displays, it ensures that there is no depth distortion as the viewer moves. (3) It clearly separates the image capture camera/scene space from the image viewing viewer/display space. (4) It provides a transformation between these two spaces, allowing precise control of the mapping of scene depth to perceived display depth. The new method is implemented as an API extension for use with OpenGL, a plug-in for 3D Studio Max and a control system for a stereoscopic digital camera. The result is stereoscopic images generated correctly at the first attempt, with precisely controlled perceived depth. A new analysis of the distortions introduced by different camera parameters was undertaken.
A stereo display prototype with multiple focal distances
K. Akeley and S. J. Watt and A. R. Girshick and M. S. Banks
ACM Trans. Graph.  23  804-813  (2004)

Typical stereo displays provide incorrect focus cues because the light comes from a single surface. We describe a prototype stereo display comprising two independent fixed-viewpoint volumetric displays. Like autostereoscopic volumetric displays, fixed-viewpoint volumetric displays generate near-correct focus cues without tracking eye position, because light comes from sources at the correct focal distances. (In our prototype, from three image planes at different physical distances.) Unlike autostereoscopic volumetric displays, however, fixed-viewpoint volumetric displays retain the qualities of modern projective graphics: view-dependent lighting effects such as occlusion, specularity, and reflection are correctly depicted; modern graphics processor and 2-D display technology can be utilized; and realistic fields of view and depths of field can be implemented. While not a practical solution for general-purpose viewing, our prototype display is a proof of concept and a platform for ongoing vision research. The design, implementation, and verification of this stereo display are described, including a novel technique of filtering along visual lines using 1-D texture mapping.
A pilot study on pupillary and cardiovascular changes induced by stereoscopic video movies
H. Oyamada and A. Iijima and A. Tanaka and K. Ukai and H. Toda and N. Sugita and M. Yoshizawa and T. Bando
Journal of NeuroEngineering and Rehabilitation  4    (2007)

Background: Taking advantage of developed image technology, it is expected that image presentation would be utilized to promote health in the field of medical care and public health. To accumulate knowledge on biomedical effects induced by image presentation, an essential prerequisite for these purposes, studies on autonomic responses in more than one physiological system would be necessary. In this study, changes in parameters of the pupillary light reflex and cardiovascular reflex evoked by motion pictures were examined, which would be utilized to evaluate the effects of images, and to avoid side effects. Methods: Three stereoscopic video movies with different properties were field-sequentially rear-projected through two LCD projectors on an 80-inch screen. Seven healthy young subjects watched movies in a dark room. Pupillary parameters were measured before and after presentation of movies by an infrared pupillometer. ECG and radial blood pressure were continuously monitored. The maximum cross-correlation coefficient between heart rate and blood pressure, rho_max, was used as an index to evaluate changes in the cardiovascular reflex. Results: Parameters of pupillary and cardiovascular reflexes changed differently after subjects watched three different video movies. Amplitudes of the pupillary light reflex, CR, increased when subjects watched two CG movies (movies A and D), while they did not change after watching a movie with the real scenery (movie R). The rho_max was significantly larger after presentation of the movie D. Scores of the questionnaire for subjective evaluation of physical condition increased after presentation of all movies, but their relationship with changes in CR and rho_max was different in three movies. Possible causes of these biomedical differences are discussed. Conclusion: The autonomic responses were effective to monitor biomedical effects induced by image presentation.
Further accumulation of data on multiple autonomic functions would contribute to develop the tools which evaluate the effects of image presentation to select applicable procedures and to avoid side effects in the medical care and rehabilitation.
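The abstract above uses the maximum cross-correlation coefficient between heart rate and blood pressure, rho_max, as its cardiovascular index, but it does not give the lag range or any preprocessing. The following is a minimal sketch of one plausible reading: a Pearson coefficient evaluated over a range of non-negative lags, keeping the largest. The function names, the lag convention (heart rate delayed relative to blood pressure), and the `max_lag` parameter are assumptions made for illustration, not the authors' procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def rho_max(bp, hr, max_lag):
    """Return (largest coefficient, lag in samples) over lags 0..max_lag.

    Assumption: hr is compared against bp after shifting hr back by `lag`
    samples, i.e. heart rate is treated as following blood pressure.
    """
    best_r, best_lag = -1.0, 0
    for lag in range(max_lag + 1):
        n = min(len(bp), len(hr) - lag)
        if n < 2:
            break
        r = pearson(bp[:n], hr[lag:lag + n])
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag
```

In practice both series would first be resampled onto a common time base; that step is omitted here.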
Interactive Simulation of the Human Eye Depth of Field and Its Correction by Spectacle Lenses
M. Kakimoto and T. Tatsukawa and Y. Mukai and T. Nishita
Computer Graphics Forum  26  627--636  (2007)
This paper describes a fast rendering algorithm for verification of spectacle lens design. Our method simulates refraction corrections of astigmatism as well as myopia or presbyopia. Refraction and defocus are the main issues in the simulation. For refraction, our proposed method uses per-vertex basis ray tracing which warps the environment map and produces a real-time refracted image which is subjectively as good as ray tracing. Conventional defocus simulation was previously done by distribution ray tracing and a real-time solution was impossible. We introduce the concept of a blur field, which we use to displace every vertex according to its position. The blurring information is precomputed as a set of field values distributed to voxels which are formed by evenly subdividing the perspective projected space. The field values can be determined by tracing a wavefront from each voxel through the lens and the eye, and by evaluating the spread of light at the retina considering the best human accommodation effort. The blur field is stored as texture data and referred to by the vertex shader that displaces each vertex. With an interactive frame rate, blending the multiple rendering results produces a blurred image comparable to distribution ray tracing output.
The Depth of Field of the Human Eye
F. W. Campbell
Journal of Modern Optics  4  157--164  (1957)

(An English summary is also available.) 1. A simple but accurate method for measuring the depth of field of the eye is described. 2. For a given pupil aperture, the hyperfocal distance increases in direct proportion to the log10 of the background luminance. 3. If retinal illumination is held constant and the pupil aperture is varied, the depth of field varies approximately inversely with pupil diameter. When the diameter exceeds 2.5 mm, the observations no longer agree with theory; the discrepancy can be explained by the Stiles-Crawford effect. 4. Measurements made with fields of different colors but equal luminance confirmed that the depth of field is modified by the Stiles-Crawford effect. 5. Correcting chromatic aberration with a special lens reduces the depth of field. 6. The minimum depth of field found under the best conditions is 0.3 D, with a pupil diameter of 3 mm.
Photo-Realistic Depth-of-Field Effects Synthesis Based on Real Camera Parameters
H.-Y. Lin and K.-D. Gu
  4841  298--309  (2007)

Depth-of-field (DOF) is an important visual cue used for computer graphics and photography to illustrate the focus of attention. In this work, we present a method for photo-realistic DOF simulation based on the characteristics of a real camera system. Both the depth-blur relation for different camera focus settings and the nonlinear intensity response of image sensors are modeled. The camera parameters are calibrated and used for defocus blur synthesis. For a well-focused real scene image, the DOF effect is generated by spatial convolution with a distance dependent circular Gaussian mask. Experiments have shown that the difference between the images synthesized using the proposed method and the real images captured by a camera is almost indistinguishable.
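The abstract above generates the DOF effect by convolving a well-focused image with a distance-dependent circular Gaussian mask. The sketch below illustrates that idea in pure Python. It is not the authors' calibrated model: the thin-lens circle-of-confusion formula stands in for their measured depth-blur relation, the rule sigma = half the blur-circle diameter in pixels is an assumption, and all function and parameter names are hypothetical. The nonlinear sensor response they model is not included.

```python
import math


def coc_diameter(d, focus_dist, focal_len, aperture):
    """Thin-lens blur-circle diameter on the sensor (same units as inputs).

    Assumes d > 0 and focus_dist > focal_len.
    """
    return aperture * focal_len * abs(d - focus_dist) / (d * (focus_dist - focal_len))


def make_sigma_fn(focus_dist, focal_len, aperture, pixel_pitch):
    """Map depth to a Gaussian sigma in pixels (assumed: half the blur diameter)."""
    return lambda d: 0.5 * coc_diameter(d, focus_dist, focal_len, aperture) / pixel_pitch


def gaussian_kernel(sigma, radius):
    """Normalized (2*radius+1)^2 circular Gaussian mask."""
    rng = range(-radius, radius + 1)
    k = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in rng] for y in rng]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def blur_by_depth(image, depth, sigma_of_depth, max_radius=4):
    """Blur each output pixel with a Gaussian whose width depends on its depth."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sigma = sigma_of_depth(depth[y][x])
            if sigma < 1e-3:  # in focus: copy the pixel unchanged
                out[y][x] = float(image[y][x])
                continue
            r = min(max_radius, max(1, math.ceil(3 * sigma)))
            k = gaussian_kernel(sigma, r)
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp at image borders
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc
    return out
```

This gather-style loop is quadratic in the kernel radius and is meant only to make the depth-to-blur mapping concrete; the kernel is truncated at `max_radius` to keep the example small.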
Depth of focus, eye size and visual acuity
D. G. Green and M. K. Powers and M. S. Banks
Vision Research  20  827--835  (1980)
We develop formulas for calculating the approximate depth of focus of any eye. They show that the magnitude of depth of focus is inversely proportional to the size of the eye and to its visual acuity. One particular implication of these quantitative relations, which is supported by previous data from rats and human infants, is that small eyes with low acuity should have large depths of focus. We show that the observed relation between defocus and contrast sensitivity in rats is predicted by our formulas. We also analyze recent findings in human infants and show that they demonstrate a good correspondence between the improvement in accuracy of the accommodative response with age and the reduction in depth of focus (predicted from our formulas) as acuity and eye size increase over the same age range. Optical factors such as astigmatism, refractive error and chromatic and spherical aberration should have no effect on visual resolution unless they exceed the depth of focus of an eye. Thus, our arguments imply that these factors may be relatively unimportant in small eyes with low acuity.
The depth-of-field of the human eye from objective and subjective measurements
S. Marcos and E. Moreno and R. Navarro
Vision Research  39  2039--2049  (1999)

The depth-of-field (DOF) measured through psychophysical methods seems to depend on the target's characteristics. We use objective and subjective methods to determine the DOF of the eye for different pupil diameters and wavelengths in three subjects. Variation of image quality with focus is evaluated with a double-pass technique. Objective DOF is defined as the dioptric range for which the image quality does not change appreciably, based on optical criteria. Subjective DOF is based on the accuracy of focusing a point source. Additional DOFs are obtained by simulation from experimental wavefront aberration data from the same subjects. Objective and subjective measurements of DOF are only slightly affected by pupil size, wavelength and spectral composition. Comparison of DOF from double-pass and wavefront aberration data allows us to evaluate the role of ocular aberrations and Stiles-Crawford effect.
Effects of an Eyeglass-free {3-D} Display on the Human Visual System
Y. Suzuki and Y. Onda and S. Katada and S. Ino and T. Ifukube
Japanese Journal of Ophthalmology  48  1--6  (2004)

Purpose To investigate the effect on the human visual system of viewing 3-dimensional (3-D) computer graphics (CG) images with an eyeglass-free rear-cross-lenticular-type 3-D display. Methods Positive accommodation velocity (GRAD) during the accommodative step response was measured in ten healthy young adults before and after they viewed CG images. Although the distance between the viewer and the 3-D display was 600 mm, the apparent distance between the viewer and the virtual object was varied (515, 600, and 722 mm) by changing the visual disparity. Results A significant slowdown of average GRAD was observed by 60 min after a 30-min 3-D viewing of 3-D CG images [P < 0.05, analysis of variance (ANOVA)] but not after a 30-min viewing of the CG images on a 2-D display or after a 15-min 3-D viewing. When the virtual object was at 722 mm, a significant slowdown of average GRAD was observed only at 30 min after the 30-min 3-D viewing (P < 0.05, ANOVA). When the virtual object was at 515 mm, a significant slowdown of average GRAD was observed at 30 and 60 min after the 3-D viewing (P < 0.05, ANOVA). Conclusions The effect on the human visual system of 3-D viewing of 3-D CG images depends on both the duration of the viewing and the apparent distance between the viewer and the virtual objects.
Fixation disparity and accommodation as a function of viewing distance and prism load
W. Jaschinski
Ophthalmic and Physiological Optics  17  324--339  (1997)

Fixation disparity was measured with dichoptically presented nonius lines at viewing distances of 20, 30, 40, 60, and 100 cm, so that both vergence and accommodation were stimulated adequately as in normal vision. As the viewing distance was shortened, mean fixation disparity changed monotonically from 1 min arc eso (i.e., the eyes converged in front of the target) to 3 min arc exo. The average standard deviation of the psychometric function of fixation disparity, which is a measure of the temporal variability of vergence, was smallest at 100 cm (when fixation disparity was eso) and increased as viewing distance decreased. Fixation disparity itself and the change of fixation disparity with distance differed reliably among subjects with normal binocular vision, but neither was related to the momentary degree of accommodation. Fixation disparity was also measured at a constant distance of 40 cm, but with prisms in front of the eyes that induced the same vergence angles as viewing distances between 20 and 100 cm. The slope of these conventional fixation disparity curves as a function of prism load was generally larger than the slope of fixation disparity as a function of viewing distance (which can be explained by accommodative vergence), but the slopes of the two types of curves were correlated (r = 0.39, P = 0.02, n = 25).
Misperceptions in stereoscopic displays: a vision science perspective
R. T. Held and M. S. Banks
    23--32  (2008)

3d shape and scene layout are often misperceived when viewing stereoscopic displays. For example, viewing from the wrong distance alters an object's perceived size and shape. It is crucial to understand the causes of such misperceptions so one can determine the best approaches for minimizing them. The standard model of misperception is geometric. The retinal images are calculated by projecting from the stereo images to the viewer's eyes. Rays are back-projected from corresponding retinal-image points into space and the ray intersections are determined. The intersections yield the coordinates of the predicted percept. We develop the mathematics of this model. In many cases its predictions are close to what viewers perceive. There are three important cases, however, in which the model fails: 1) when the viewer's head is rotated about a vertical axis relative to the stereo display (yaw rotation); 2) when the head is rotated about a forward axis (roll rotation); 3) when there is a mismatch between the camera convergence and the way in which the stereo images are displayed. In these cases, most rays from corresponding retinal-image points do not intersect, so the standard model cannot provide an estimate for the 3d percept. Nonetheless, viewers in these situations have coherent 3d percepts, so the visual system must use another method to estimate 3d structure. We show that the non-intersecting rays generate vertical disparities in the retinal images that do not arise otherwise. Findings in vision science show that such disparities are crucial signals in the visual system's interpretation of stereo images. We show that a model that incorporates vertical disparities predicts the percepts associated with improper viewing of stereoscopic displays. Improving the model of misperceptions will aid the design and presentation of 3d displays.
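The standard geometric model summarized above back-projects rays from the two eyes through corresponding image points and takes their intersection as the predicted percept. A minimal sketch of that construction follows, restricted to the horizontal (x, z) plane with the screen at a fixed z. The restriction to 2-D, the function name, and the coordinate convention are assumptions for illustration; the paper's own formulation is not reproduced here.

```python
def intersect_rays_xz(left_eye, right_eye, left_pt, right_pt):
    """Intersect the ray left_eye -> left_pt with the ray right_eye -> right_pt.

    Every argument is an (x, z) pair in the horizontal plane. Returns the
    (x, z) intersection, or None when the rays are parallel, in which case
    this model yields no predicted percept.
    """
    lx, lz = left_eye
    rx, rz = right_eye
    dlx, dlz = left_pt[0] - lx, left_pt[1] - lz    # left ray direction
    drx, drz = right_pt[0] - rx, right_pt[1] - rz  # right ray direction
    denom = dlx * drz - dlz * drx                  # 2-D cross product
    if abs(denom) < 1e-12:
        return None
    t = ((rx - lx) * drz - (rz - lz) * drx) / denom
    return (lx + t * dlx, lz + t * dlz)
```

For a centered viewer with eye separation e at distance V from the screen, and screen parallax p (right-image x minus left-image x), this reduces to the familiar Z = V e / (e - p): zero parallax places the point on the screen, and p = e makes the rays parallel. In full 3-D, rays that are skew (as in the yaw, roll, and convergence-mismatch cases listed above) have no intersection at all, which is the failure the abstract describes.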
A study of visual fatigue and visual comfort for {3D} {HDTV}/{HDTV} images
S. Yano and S. Ide and T. Mitsuhashi and H. Thwaites
Displays  23  191--201  (2002)

We compared visual fatigue caused by HDTV and stereoscopic HDTV at the viewing distance of 4.5 m. We measured the degree of visual fatigue using a subjective test method and compared it to an objective measure, namely the change in accommodation before and after viewing images. We also detected visual discomfort image scenes from the best program using the single stimulus continuous quality method. As a result, the mechanism between convergence eye movement and the accommodation function in the depth of focus in addition to conflict of convergence eye movement and accommodation function appears to affect visual fatigue. From the examination between the results of single stimulus quality evaluation and the feature characteristics of test stereoscopic images, we found that a local low subjective evaluation appeared for both high degree of parallax and amount of motion in the test stereoscopic images. However, when motion components were very small, the subjective evaluation value was rarely very low.
The Stereoscopic Cinema: From Film to Digital Projection
L. Lipton
SMPTE Journal    586--593  (2001)

A noteworthy improvement in the projection of stereoscopic moving images is taking place; the image is clear and easy to view. Moreover, the setup of projection is simplified, and requires no tweaking for continued performance at a high-quality level. The new system of projection relies on the Texas Instruments Digital Micromirror Device (DMD), and the basis for this paper is the Christie Digital Mirage 5000 projector and StereoGraphics selection devices, CrystalEyes and the ZScreen.
The Stereographics Developer's Handbook - Background on Creating Images for CrystalEyes® and SimulEyes®.

Image Distortions in Stereoscopic Video Systems
A. Woods and T. Docherty and R. Koch
  1915  36--48  (1993)
This paper discusses the origins, characteristics and effects of image distortions in stereoscopic video systems. The geometry of stereoscopic camera and display systems is presented first. This is followed by the analysis and diagrammatic presentation of various image distortions such as depth plane curvature, depth non-linearity, depth and size magnification, shearing distortion and keystone distortion. The variation of system parameters is also analysed with the help of plots of image geometry to show their effects on image distortions. The converged (toed-in) and parallel camera configurations are compared and the amount of vertical parallax induced by lens distortion and keystone distortion are discussed. The range of acceptable vertical parallax and the convergence / accommodation limitations on depth range are also discussed. It is shown that a number of these distortions can be eliminated by the appropriate choice of camera and display system parameters. There are some image distortions, however, which cannot be avoided due to the nature of human vision and limitations of current stereoscopic video display techniques.
Foundations of the Stereoscopic Cinema
L. Lipton

Since its abortive boom in the early fifties, the three-dimensional, or stereoscopic cinema, has been the stepchild of filmmaking, discredited in the minds of filmmakers, producers, and audiences. Lenny Lipton, author of the classic Independent Filmmaking, now shows the way toward a 3-D revival. After six years of experimentation, design work, and filmmaking, Lipton has distilled his experiences in a spirited and profusely illustrated account that will appeal to professional and student filmmakers, cinema historians, optical engineers, and equipment designers, and even to avid lovers of film. For the first time a coherent system of stereoscopic cinematography is presented. No longer need each worker entering the field start from scratch. The thoroughly original concepts outlined are theoretically sound, yet based entirely on practical experience. Lipton describes the basic principles for engineering practicable stereoscopic motion picture systems, essential information since so little off-the-shelf equipment is available at this time. The principles and specifications will serve as a guide to engineers and filmmaker-designers. Also included are discussions of the psychology of binocular depth perception, the physics of light polarization, a comprehensive historical-technical account of the three-dimensional medium, as well as the author's original work concerning the new method of stereoscopic cinematography and the overall engineering parameters of such systems. Although the concepts presented in this book are based largely on experiments performed with small formats, Lipton and his colleagues have provided 35mm motion picture technology based on these experiments to produce a promotional film for the Oldsmobile Division of General Motors and for E.O. Corporation's feature film, Rottweiler. Foundations of the Stereoscopic Cinema is the essential book on three-dimensional filmmaking.
With it, Lenny Lipton has given us another book that will dramatically affect the way in which films are made and thought about. Lenny Lipton is the author of Independent Filmmaking, The Super 8 Book, and Lipton on Filmmaking, and his articles on films and filmmaking have appeared in numerous magazines and journals, including Super-8 Filmmaker and American Cinematographer. His more than twenty-five independently produced films have been screened at the Whitney Museum, Cinematheque Francaise, the Smithsonian Institution, and the Pacific Film Archives, and several have won festival prizes. In addition to an award from the American Film Institute, Lipton has received grants from the National Education Association and from the California Arts Council for his research in stereoscopic filmmaking. He is president of Stereographics Corp., a company devoted to research and development in stereoscopic imaging techniques. A member of the Society of Motion Picture and Television Engineers, he has taught filmmaking at the San Francisco Art Institute and the University of California.
{3-D} Cinematography
R. Hummel
American Cinematographer    52-63  (2008)

Scale-Space and Edge Detection Using Anisotropic Diffusion
P. Perona and J. Malik
IEEE Transactions on Pattern Analysis and Machine Intelligence  12  629-639  (1990)

New view synthesis for stereo cinema by hybrid disparity remapping
F. Devernay and S. Duchêne
    5-8  (2010)

The 3-D shape perceived from viewing a stereoscopic movie depends on the viewing conditions, most notably on the screen size and distance, and depth and size distortions appear because of the differences between the shooting and viewing geometries. When the shooting geometry is constrained, or when the same stereoscopic movie must be displayed with different viewing geometries (e.g. in a movie theater and on a 3DTV), these depth distortions may be reduced by new view synthesis techniques. They usually involve three steps: computing the stereo disparity, computing a disparity-dependent 2-D mapping from the original stereo pair to the synthesized views, and finally composing the synthesized views. In this paper, we compare different disparity-dependent mappings in terms of perceived shape distortion and alteration of the images, and we propose a hybrid mapping which does not distort depth and minimizes modifications of the image content.
Probabilistic reliability based view synthesis for {FTV}
L. Yang and T. Yendo and M. Panahpour Tehrani and T. Fujii and M. Tanimoto
    1785-1788  (2010)

View synthesis using depth maps is an important application in 3D image processing. In this paper, a novel method is proposed for the plausible view synthesis of Free-viewpoint TV (FTV), using two input images and their depth maps. The depth estimation based on stereo matching is known to be error-prone, leading to noticeable artifacts in the synthesized new views. To produce high-quality view synthesis, we introduce a probabilistic framework which constrains the reliability of each pixel of new view by Maximizing Likelihood (ML). The spatial adaptive reliability is provided by incorporating Gamma hyper-prior and the synthesis error approximation. Furthermore, we generate the virtual view by solving a Maximum a Posteriori (MAP) problem using graph cuts. We compare the proposed method with other depth based view synthesis approaches on MPEG test sequences. The results show the outperformance of our method both at subjective artifacts reduction and objective PSNR improvement.
A Viewer-Centric Editor for {3D} Movies
S. J. Koppal and C. L. Zitnick and M. Cohen and S. B. Kang and B. Ressler and A. Colburn
IEEE Computer Graphics and Applications  31  20-35  (2011)

A digital editor provides the timeline control necessary to tell a story through film. Current technology, although sophisticated, does not easily extend to 3D cinema because stereoscopy is a fundamentally different medium for expression and requires new tools. We formulated a mathematical framework for use in a viewer-centric digital editor for stereoscopic cinema driven by the audience's perception of the scene. Our editing tool implements this framework and allows both shot planning and after-the-fact digital manipulation of the perceived scene shape. The mathematical framework abstracts away the mechanics of converting this interaction into stereo parameters, such as interocular, field of view, and location. We demonstrate cut editing techniques to direct audience attention and ease scene transitions. User studies were performed to examine these effects.
Novel view synthesis for stereoscopic cinema: detecting and removing artifacts
F. Devernay and A. R. Peon
    25--30  (2010)
Novel view synthesis methods consist in using several images or video sequences of the same scene, and creating new images of this scene, as if they were taken by a camera placed at a different viewpoint. They can be used in stereoscopic cinema to change the camera parameters (baseline, vergence, focal length...) a posteriori, or to adapt a stereoscopic broadcast that was shot for given viewing conditions (such as a movie theater) to a different screen size and distance (such as a 3DTV in a living room). View synthesis from stereoscopic movies usually proceeds in two phases: First, disparity maps and other viewpoint-independent data (such as scene layers and matting information) are extracted from the original sequences, and second, this data and the original images are used to synthesize the new sequence, given geometric information about the synthesized viewpoints. Unfortunately, since no known stereo method gives perfect results in all situations, the results of the first phase will most probably contain errors, which will result in 2D or 3D artifacts in the synthesized stereoscopic movie. We propose to add a third phase where these artifacts are detected and removed in each stereoscopic image pair, while keeping the perceived quality of the stereoscopic movie close to the original.