Time-of-Flight Camera - An Introduction
by Larry Li, Texas Instruments
1. Introduction
3D Time-of-Flight (TOF) technology is revolutionizing the machine vision industry by providing 3D imaging using a
low-cost CMOS pixel array together with an active modulated light source. Compact construction, ease of use, and
high accuracy and frame rate make TOF cameras an attractive solution for a wide range of applications. In this
article, we cover the basics of TOF operation and compare TOF with other 2D/3D vision technologies. We then explore
various applications that benefit from TOF sensing, such as gesturing and 3D scanning and printing. Finally, we
provide resources that help readers get started with Texas Instruments' 3D TOF solution.
2. Theory of Operation
A 3D time-of-flight (TOF) camera works by illuminating the scene with a modulated light source, and observing the
reflected light. The phase
shift between the illumination and the reflection is measured and translated to distance. Figure 1 illustrates the
basic TOF concept. Typically, the illumination comes from a solid-state laser or an LED operating in the
near-infrared range (~850 nm), invisible to the human eye. An imaging sensor designed to respond to the same
spectrum receives the light and converts the photonic energy to
electrical current. Note that the light entering the sensor has an ambient component and a reflected component.
Distance (depth) information is only embedded in the reflected component.
Therefore, a high ambient component reduces the signal-to-noise ratio (SNR).
Figure 1: 3D time-of-flight camera operation.
To detect phase shifts between the illumination and the reflection, the light source is either pulsed or modulated
by a continuous-wave (CW) source, typically a sinusoid or square wave. Square-wave modulation is more common
because it can be easily realized using digital circuits [5].
Pulsed modulation can be achieved by integrating photoelectrons from the reflected light, or by starting a fast
counter at the first detection of the reflection. The latter requires a fast photo-detector, usually a single-photon
avalanche diode (SPAD). This counting approach necessitates fast electronics, since achieving 1 millimeter accuracy
requires timing a pulse of 6.6 picoseconds in duration. This level of accuracy is nearly impossible to achieve in
silicon at room temperature [1].
Figure 2: Two time-of-flight methods: pulsed (top) and continuous-wave
(bottom).
The pulsed method is straightforward. The light source illuminates for a brief period (∆t), and the reflected
energy is sampled at every pixel, in parallel, using two out-of-phase windows, C1 and C2, with the same ∆t.
Electrical charges accumulated during these samples, Q1 and Q2, are measured and used to compute distance using the
formula:

d = (c/2) · ∆t · Q2 / (Q1 + Q2)    (Equation 1)
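As a rough sketch (not TI's implementation), the pulsed distance computation can be expressed in a few lines of Python; the charge values here are hypothetical, unitless accumulations:

```python
C = 299_792_458.0  # speed of light, m/s

def pulsed_distance(q1: float, q2: float, dt: float) -> float:
    """Distance from charges Q1, Q2 accumulated over two out-of-phase
    windows of width dt (seconds): d = (c/2) * dt * Q2 / (Q1 + Q2)."""
    return 0.5 * C * dt * q2 / (q1 + q2)

# A reflection splitting evenly across the two 50 ns windows (Q1 == Q2)
# corresponds to a target roughly 3.75 m away.
d = pulsed_distance(1.0, 1.0, 50e-9)
```

Note that the maximum range of a single pulse is c·∆t/2, so a 50 ns pulse covers about 7.5 m.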
In contrast, the CW method takes multiple samples per measurement, with each sample phase-stepped by 90 degrees,
for a total of four samples, Q1 through Q4. Using this technique, the phase angle between illumination and
reflection, φ, and the distance, d, can be calculated by:

φ = arctan((Q3 − Q4) / (Q1 − Q2))    (Equation 2)

d = (c / (4πf)) · φ    (Equation 3)

where f is the modulation frequency. It follows that the measured reflected amplitude (A) and offset (B) can be
computed by:

A = √((Q3 − Q4)² + (Q1 − Q2)²) / 2    (Equation 4)

B = (Q1 + Q2 + Q3 + Q4) / 4    (Equation 5)
In all of these equations, c is the speed of light.
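The four-sample CW computation can likewise be sketched in Python (a minimal illustration with hypothetical sample values, not a sensor driver):

```python
import math

C = 299_792_458.0  # speed of light, m/s

def cw_measurement(q1, q2, q3, q4, f_mod):
    """Phase, distance, amplitude and offset from four samples taken
    90 degrees apart, at modulation frequency f_mod (Hz)."""
    phi = math.atan2(q3 - q4, q1 - q2) % (2 * math.pi)  # wrapped to [0, 2*pi)
    d = C * phi / (4 * math.pi * f_mod)                 # d = c * phi / (4*pi*f)
    a = math.hypot(q3 - q4, q1 - q2) / 2                # reflected amplitude
    b = (q1 + q2 + q3 + q4) / 4                         # offset
    return phi, d, a, b

phi, d, a, b = cw_measurement(2, 0, 2, 0, 20e6)
```

For example, samples (2, 0, 2, 0) at 20 MHz modulation give φ = π/4 and a distance of about 0.94 m.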
At first glance, the complexity of the CW method compared to the pulsed method may seem unjustified, but a closer
look at the CW equations reveals that the differences (Q3 - Q4) and (Q1 - Q2) cancel any constant offset in the
measurements. Furthermore, the quotient in the phase equation cancels any constant gain affecting the distance
measurement, such as system amplification and attenuation, or the reflected intensity. These are desirable
properties.
The reflected amplitude (A) and offset (B) do have an impact on the depth measurement accuracy. The depth
measurement uncertainty can be approximated by:

σ = (c / (4·√2·πf)) · √(A + B) / (cd · A)    (Equation 6)
The modulation contrast, cd, describes how well the TOF sensor separates and collects the photoelectrons. The
reflected amplitude, A, is a function of the optical power. The offset, B, is a function of the ambient light and
residual system offset. One may infer from Equation 6 that high amplitude, high modulation frequency and high
modulation contrast increase accuracy, while high offset can lead to saturation and reduced accuracy.
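These trade-offs can be seen numerically with a small helper built around Equation 6 (a sketch; the amplitude, offset and contrast values below are hypothetical):

```python
import math

C = 299_792_458.0  # speed of light, m/s

def depth_noise(amplitude, offset, f_mod, contrast):
    """Approximate depth standard deviation (m), per Equation 6:
    sigma = c / (4*sqrt(2)*pi*f) * sqrt(A + B) / (cd * A)."""
    return (C / (4 * math.sqrt(2) * math.pi * f_mod)
            * math.sqrt(amplitude + offset) / (contrast * amplitude))

# Doubling the modulation frequency halves the noise; a larger ambient
# offset B or a weaker amplitude A increases it.
low_f = depth_noise(200.0, 50.0, 10e6, 0.5)
high_f = depth_noise(200.0, 50.0, 20e6, 0.5)
```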
At high frequencies, the modulation contrast can begin to attenuate due to the physical properties of silicon. This
puts a practical upper limit on the modulation frequency. TOF sensors with a high roll-off frequency can generally
deliver higher accuracy.
The fact that the CW measurement is based on phase, which wraps around every 2π, means the measured distance will
also wrap. The distance at which aliasing occurs is called the ambiguity distance, damb, and is defined as:

damb = c / (2f)    (Equation 7)
Since the distance wraps, damb is also the maximum measurable distance. If one wishes to extend the measurable
distance, one may reduce the modulation frequency, but at the cost of reduced accuracy, according to Equation 6.
Instead of accepting this compromise, advanced TOF systems deploy multi-frequency techniques to extend the distance
without reducing the modulation frequency. Multi-frequency techniques work by adding one or more modulation
frequencies to the mix. Each modulation frequency has a different ambiguity distance, but the true location is the
one at which the different frequencies agree. The frequency at which the two modulations agree, called the beat
frequency, is usually lower, and corresponds to a much longer ambiguity distance. The dual-frequency concept is
illustrated below.
Figure 3: Extending distance using a multi-frequency technique [6].
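One way to picture the dual-frequency idea is a brute-force search for the unwrapped distance at which both wrapped readings agree (an illustrative sketch only, with hypothetical frequencies; real systems use closed-form phase unwrapping):

```python
C = 299_792_458.0  # speed of light, m/s

def unambiguous_range(f_mod):
    """Ambiguity distance d_amb = c / (2*f) for one modulation frequency."""
    return C / (2 * f_mod)

def unwrap_dual(d1, d2, f1, f2, max_range, tol=0.05):
    """Find the distance <= max_range where both wrapped readings agree."""
    r1, r2 = unambiguous_range(f1), unambiguous_range(f2)
    best, best_err = None, tol
    for n1 in range(int(max_range / r1) + 1):
        cand = d1 + n1 * r1            # candidate unwrapped distance for f1
        n2 = round((cand - d2) / r2)   # nearest wrap count for f2
        err = abs(cand - (d2 + n2 * r2))
        if n2 >= 0 and err < best_err:
            best, best_err = cand, err
    return best

# A target at 12 m seen with 20 MHz and 15 MHz modulation: each reading
# wraps within its own ambiguity distance, but only 12 m fits both.
true_d = 12.0
d1 = true_d % unambiguous_range(20e6)
d2 = true_d % unambiguous_range(15e6)
```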
3. Point Cloud
In TOF sensors, distance is measured for every pixel in a 2D addressable array, resulting in a depth map. A depth
map is a collection of 3D points (each point also known as a voxel). As an example, a QVGA sensor produces a depth
map of 320 x 240 voxels. A depth map can be represented in 2D as a gray-scale image, as illustrated by the group of
soda cans in Figure 4: the brighter the intensity, the closer the voxel.
Figure 4: Depth map of soda cans.
Alternatively, a depth map can be rendered in three-dimensional space as a collection of points, or a point cloud.
The 3D points can be mathematically connected to form a mesh onto which a texture surface can be mapped. If the
texture is from a real-time color image of the same subject, a life-like 3D rendering of the subject will emerge, as
is illustrated by the avatar in Figure 5. One may be able to rotate the avatar to view different perspectives.
Figure 5: Avatar formed from point-cloud.
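Producing a point cloud from a depth map is a pinhole-camera back-projection; a minimal Python sketch, assuming hypothetical intrinsics (focal lengths fx, fy and principal point cx, cy in pixels):

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (rows of metric depths) into (x, y, z)
    points; a depth of 0 means "no return" and is skipped."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

# A toy 1x2 depth map: only the valid pixel becomes a 3D point.
cloud = depth_to_point_cloud([[0.0, 2.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.0)
```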
4. Other Vision Technologies
Time-of-flight technology is not the only vision technology available. In this section, we will compare TOF to the
classical 2D machine vision and other 3D vision technologies. A table summarizing the comparison is included at the
end of this section.
2D Machine Vision
Most machine vision systems deployed today are 2D, a cost-effective approach when lighting is closely controlled.
They are well-suited for inspection applications where defects are detected using well-known image processing
techniques, such as edge detection, template matching and morphology open/close. These algorithms extract critical
feature parameters that are compared to a database for pass-fail determination. To detect defects along the z-axis,
an additional 1D sensor or 3D vision is often deployed.
2D vision can be used in unstructured environments as well, with the aid of advanced image processing algorithms
that work around complications caused by varying illumination and shading conditions. Take the images in Figure 6
for example. These images are of the same face, but under very different lighting. The shading differences can make
face recognition difficult even for humans.
In contrast, computer recognition using point cloud data from TOF sensors is largely unaffected by shading, since
illumination is provided by the TOF sensor itself, and the depth measurement is extracted from phase measurement,
not image intensity.
Figure 6: Same face, different shading.
3D Machine Vision
Robust 3D vision overcomes many problems of 2D vision, as the depth measurement can be used to easily separate
foreground from background. This is particularly useful for scene understanding, where the first step is to segment
the subject of interest (foreground) from other parts of the image (background).
Gesture recognition, for example, involves scene understanding. Using distance as a discriminator, a TOF sensor
enables separation of the face, hands, and fingers from the rest of the image, so gesture recognition can be
achieved with high confidence.
Figure 7: Advantages of 3D vision over 2D.
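In its simplest form, the figure-ground separation described above reduces to thresholding each voxel's depth; a sketch with hypothetical distances in metres:

```python
def segment_by_depth(depth, near, far):
    """Binary mask: True where a pixel's depth falls inside the
    [near, far] band (e.g. hands held in front of the body)."""
    return [[near <= z <= far for z in row] for row in depth]

# Hand at ~0.5 m, torso at ~1.2 m, wall at ~3 m: only the hand
# survives a 0.3-0.8 m foreground band.
mask = segment_by_depth([[0.5, 1.2, 3.0]], near=0.3, far=0.8)
```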
In the next two subsections we will compare the TOF technology with two other 3D vision technologies: stereo vision
and structured-light.
Stereo Vision vs. TOF
Stereo vision generally uses two cameras separated by a distance, in a physical arrangement similar to the human
eyes. Given a point-like object in space, the camera separation will lead to measurable disparity of the object
positions in the two camera images. Using a simple pin-hole camera model, the angular position of the object in
each image can be computed; we represent these by α and β. With these angles, the depth, z, can be computed.
Figure 8: Stereopsis--depth through disparity measurement.
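For a rectified stereo pair, the geometry of Figure 8 reduces to z = f·b/disparity; a sketch with a hypothetical focal length (in pixels) and baseline (in metres):

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from disparity for a rectified stereo pair: z = f * b / d."""
    return focal_px * baseline_m / disparity_px

# With a 700 px focal length and a 6 cm baseline, a 10 px disparity puts
# the point 4.2 m away; at 5 px it is twice as far, so a fixed +/-1 px
# matching error costs far more depth accuracy at long range.
z_near = stereo_depth(700.0, 0.06, 10.0)
z_far = stereo_depth(700.0, 0.06, 5.0)
```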
A major challenge in stereo vision is solving the correspondence problem: given a point in one image, how does one
find the same point in the other camera's image? Until the correspondence is established, disparity, and therefore
depth, cannot be accurately determined. Solving the correspondence problem involves complex, computationally
intensive algorithms for feature extraction and matching. Feature extraction and matching also require sufficient
intensity and color variation in the image for robust correlation. This requirement renders stereo vision less
effective when the subject lacks such variation - for example, when measuring the distance to a uniformly colored
wall. TOF sensing does not have this limitation because it does not depend on color or texture to measure distance.
In stereo vision, the depth resolution error is a quadratic function of the distance. A TOF sensor, which works off
reflected light, is also sensitive to distance; the difference is that, for TOF, this shortcoming can be remedied
by increasing the illumination energy when necessary, and the intensity information is used by TOF as a
"confidence" metric to maximize accuracy using Kalman-filter-like techniques.
Stereo vision has some advantages. The implementation cost is very low, as most common off-the-shelf cameras can be
used. Also, the human-like physical configuration makes stereo vision well-suited for capturing images for intuitive
presentation to humans, so that both humans and machines are looking at the same images.
Structured-Light vs. TOF
Structured-Light works by projecting known patterns onto the subject and inspecting the pattern distortion [4].
Successive projections of coded or phase-shifted patterns are often required to extract a single depth frame, which
leads to a lower frame rate. Low frame rate means the subject must remain relatively still during the projection
sequence to avoid blurring. The reflected pattern is sensitive to optical interference from the environment;
therefore, structured-light tends to be better suited for indoor applications. A major advantage of structured-light
is that it can achieve relatively high spatial (X-Y) resolution by using off-the-shelf DLP projectors and HD color
cameras. Figure 9 shows the structured-light concept.
Figure 9: Structured-light concept.
By comparison, TOF is less sensitive to mechanical alignment and environmental lighting conditions, and is more
mechanically compact. The current TOF technology has lower resolution than today's structured-light, but is rapidly
improving.
The comparison of TOF cameras with stereo vision and structured-light is summarized in Table 1. The key takeaway is
that TOF is a cost-effective, mechanically compact depth imaging solution that is unaffected by varying
environmental illumination and that vastly simplifies the figure-ground separation commonly required in scene
understanding. This powerful combination makes TOF sensors well-suited for a wide variety of applications.
Table 1: Comparison of 3D Imaging Technologies
5. Applications
TOF technology can be applied in domains ranging from automotive to industrial to healthcare, to smart advertising,
gaming and entertainment. A TOF sensor could also serve as an excellent input device for both stationary and
portable computing devices. In automotive, TOF sensors could enable autonomous driving and increased surround
awareness for safety. In the industrial segment, TOF sensors could be used as HMI, and for enforcing safety
envelopes in automation cells where humans and robots may need to work in close proximity. In smart advertising,
using TOF sensors for gesture input and human recognition, digital signage could become highly interactive,
targeting media content to the specific live audience. In healthcare, gesture recognition offers non-contact
human-machine interaction, fostering a more sanitary operating environment. The gesturing capability is
particularly well-suited to consumer electronics, especially gaming, portable computing, and home entertainment.
The TOF sensor's natural interface provides an intuitive gaming interface for first-person video games. This same
interface could also replace remote controls, mice and touch screens.
Generally speaking, TOF applications can be categorized as Gesture and Non-Gesture. Gesture applications emphasize
human interaction and speed, while non-gesture applications emphasize measurement accuracy.
Figure 11: TOF technology applies to a wide range of applications.
Gesture Applications
Gesture applications translate human movements (of faces, hands, fingers or the whole body) into symbolic
directives to command gaming consoles, smart televisions, or portable computing devices. For example, channel
surfing can be done with a wave of the hand, and a presentation can be scrolled with a flick of the finger. These
applications usually require fast response time, low- to medium-range operation, centimeter-level accuracy and low
power consumption.
Figure 13: Gesture recognition using a 3D-TOF sensor and SoftKinetic
iisu® middleware.
Non-Gesture Applications
TOF sensors can be used in non-gesture applications as well. For instance, in automotive, a TOF camera can increase
safety by alerting the driver when it detects people or objects in the vicinity of the car, and can serve as an
input to computer-assisted driving. In robotics and automation, TOF sensors can help detect product defects and
enforce the safety envelopes required for humans and robots to work in close proximity. With 3D printing rapidly
becoming popular and affordable, TOF cameras can be used to perform 3D scanning to enable "3D copier" capability.
In all of these applications, spatial accuracy is important.
Summary
In this paper, we introduced TOF technology and its theory of operation. We compared TOF sensors with 2D machine
vision and other 3D vision technologies and highlighted TOF sensors' differentiating advantages. We then explored a
wide range of applications that TOF sensors enable or enhance. To help readers get started, we introduced the TI
3D-TOF chipset and CDK, as well as third-party software resources.