Camera matrix
In computer vision, a camera matrix or (camera) projection matrix is a $3 \times 4$ matrix which describes the mapping of a pinhole camera from 3D points in the world to 2D points in an image.
Let $\mathbf{x}$ be a representation of a 3D point in homogeneous coordinates (a 4-dimensional vector), and let $\mathbf{y}$ be a representation of the image of this point in the pinhole camera (a 3-dimensional vector). Then the following relation holds
$$\mathbf{y} \sim \mathbf{C} \, \mathbf{x}$$
where $\mathbf{C}$ is the camera matrix and the $\sim$ sign implies that the left and right hand sides are equal except for a multiplication by a non-zero scalar $k \neq 0$:
$$\mathbf{y} = k \, \mathbf{C} \, \mathbf{x} .$$
Since the camera matrix $\mathbf{C}$ is involved in the mapping between elements of two projective spaces, it too can be regarded as a projective element. This means that it has only 11 degrees of freedom, since any multiplication by a non-zero scalar results in an equivalent camera matrix.
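The scale invariance described above can be checked numerically. The following is a minimal sketch in plain Python; the matrix entries, the 3D point, and the helper names `project` and `dehomogenize` are assumptions chosen for illustration and do not come from any particular library.

```python
from fractions import Fraction

def project(C, x):
    """Apply a 3x4 camera matrix C to a homogeneous 3D point x (length 4)."""
    return [sum(C[i][j] * x[j] for j in range(4)) for i in range(3)]

def dehomogenize(y):
    """Convert a homogeneous image point (y1, y2, y3) to 2D coordinates."""
    return (y[0] / y[2], y[1] / y[2])

# Hypothetical camera matrix and 3D point, chosen only for illustration.
C = [[2, 0, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 1, 0]]
x = [Fraction(1), Fraction(2), Fraction(4), Fraction(1)]

# Multiplying every element of C by a non-zero scalar k gives an
# equivalent camera matrix: the homogeneous result is scaled by k,
# but the 2D image point is unchanged.
k = 5
C_scaled = [[k * c for c in row] for row in C]

p = dehomogenize(project(C, x))
p_scaled = dehomogenize(project(C_scaled, x))
print(p == p_scaled)  # True
```

Because the 12 matrix elements are only defined up to this common scale factor, 11 degrees of freedom remain.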
Derivation
The mapping from the coordinates of a 3D point P to the 2D image coordinates of the point's projection onto the image plane, according to the pinhole camera model, is given by
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \frac{f}{x_3} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$
where $(x_1, x_2, x_3)$ are the 3D coordinates of P relative to a camera centered coordinate system, $(y_1, y_2)$ are the resulting image coordinates, and $f$ is the camera's focal length, for which we assume $f > 0$. Furthermore, we also assume that $x_3 > 0$.
To derive the camera matrix, the expression above is rewritten in terms of homogeneous coordinates. Instead of the 2D vector $(y_1, y_2)$ we consider the projective element (a 3D vector) $\mathbf{y} = (y_1, y_2, 1)$, and instead of equality we consider equality up to scaling by a non-zero number, denoted $\sim$. First, we write the homogeneous image coordinates as expressions in the usual 3D coordinates:
$$\begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{f}{x_3} x_1 \\ \frac{f}{x_3} x_2 \\ 1 \end{pmatrix} \sim \begin{pmatrix} x_1 \\ x_2 \\ \frac{x_3}{f} \end{pmatrix}$$
Finally, the 3D coordinates are also expressed in a homogeneous representation $\mathbf{x}$, and this is how the camera matrix appears:
$$\begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{f} & 0 \end{pmatrix} \, \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 1 \end{pmatrix} \quad \text{or} \quad \mathbf{y} \sim \mathbf{C} \, \mathbf{x}$$
where $\mathbf{C}$ is the camera matrix, which here is given by
$$\mathbf{C} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{f} & 0 \end{pmatrix} ,$$
and, after multiplying all elements by the non-zero scalar $f$, the corresponding camera matrix becomes
$$\mathbf{C} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{f} & 0 \end{pmatrix} \sim \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
The last step is a consequence of $\mathbf{C}$ itself being a projective element.
The camera matrix derived here may appear trivial in the sense that it contains very few non-zero elements. This depends to a large extent on the particular coordinate systems which have been chosen for the 3D and 2D points. In practice, however, other forms of camera matrices are common, as will be shown below.
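As a sanity check on the derivation, the following sketch compares the direct pinhole formula with both equivalent forms of the camera matrix. The focal length, the 3D point, and the helper names `matvec` and `image_point` are hypothetical values and names chosen for illustration.

```python
from fractions import Fraction

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

f = Fraction(3)  # hypothetical focal length, f > 0
x1, x2, x3 = Fraction(2), Fraction(-1), Fraction(6)  # 3D point with x3 > 0

# Direct pinhole formula: (y1, y2) = (f / x3) * (x1, x2)
direct = (f * x1 / x3, f * x2 / x3)

# The two equivalent camera matrices from the derivation.
C_a = [[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1 / f, 0]]
C_b = [[f, 0, 0, 0],
       [0, f, 0, 0],
       [0, 0, 1, 0]]

def image_point(C):
    """Project the homogeneous point (x1, x2, x3, 1) and divide by the last element."""
    y = matvec(C, [x1, x2, x3, 1])
    return (y[0] / y[2], y[1] / y[2])

print(direct == image_point(C_a) == image_point(C_b))  # True
```

Exact rational arithmetic (`fractions.Fraction`) is used here only so that the equality test is not affected by floating-point rounding.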
Camera position
The camera matrix derived in the previous section has a null space which is spanned by the vector
$$\mathbf{n} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$
This is also the homogeneous representation of the 3D point which has coordinates (0,0,0); that is, the "camera center" (also known as the entrance pupil; the position of the pinhole of a pinhole camera) is at the origin O. This means that the camera center (and only this point) cannot be mapped to a point in the image plane by the camera (or equivalently, it maps to all points on the image, as every ray on the image goes through this point).
For any other 3D point with $x_3 = 0$, the result $\mathbf{y} \sim \mathbf{C} \, \mathbf{x}$ is well-defined and has the form $\mathbf{y} = (y_1 \, y_2 \, 0)^\top$. This corresponds to a point at infinity in the projective image plane (even though, if the image plane is taken to be a Euclidean plane, no corresponding intersection point exists).
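Both cases can be illustrated with a short sketch: the camera center maps to the zero vector, while another point with $x_3 = 0$ maps to a homogeneous vector whose last element is zero. The focal length and the point coordinates below are hypothetical.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

f = 2  # hypothetical focal length
C = [[f, 0, 0, 0],
     [0, f, 0, 0],
     [0, 0, 1, 0]]

# The camera center (0, 0, 0) in homogeneous coordinates spans the null space.
center = [0, 0, 0, 1]
print(matvec(C, center))  # [0, 0, 0]

# A different point with x3 = 0: the result is well-defined, but its last
# element is 0, i.e. a point at infinity in the projective image plane.
on_plane = [3, 4, 0, 1]
y = matvec(C, on_plane)
print(y)  # [6, 8, 0]
```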
Normalized camera matrix and normalized image coordinates
The camera matrix derived above can be simplified even further if we assume that f = 1:
$$\mathbf{C}_{0} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} = \left( \begin{array}{c|c} \mathbf{I} & \mathbf{0} \end{array} \right)$$
where $\mathbf{I}$ here denotes a $3 \times 3$ identity matrix. Note that the matrix $\mathbf{C}_{0}$ here is divided into a concatenation of a $3 \times 3$ matrix and a 3-dimensional vector. The camera matrix $\mathbf{C}_{0}$ is sometimes referred to as a canonical form.
So far all points in the 3D world have been represented in a camera centered coordinate system, that is, a coordinate system which has its origin at the camera center (the location of the pinhole of a pinhole camera). In practice however, the 3D points may be represented in terms of coordinates relative to an arbitrary coordinate system (X1', X2', X3'). Assuming that the camera coordinate axes (X1, X2, X3) and the axes (X1', X2', X3') are of Euclidean type (orthogonal and isotropic), there is a unique Euclidean 3D transformation (rotation and translation) between the two coordinate systems. In other words, the camera is not necessarily at the origin looking along the z axis.
The two operations of rotation and translation of 3D coordinates can be represented as the two $4 \times 4$ matrices
$$\left( \begin{array}{c|c} \mathbf{R} & \mathbf{0} \\ \hline \mathbf{0} & 1 \end{array} \right) \quad \text{and} \quad \left( \begin{array}{c|c} \mathbf{I} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array} \right)$$
where $\mathbf{R}$ is a $3 \times 3$ rotation matrix and $\mathbf{t}$ is a 3-dimensional translation vector. When the first matrix is multiplied onto the homogeneous representation of a 3D point, the result is the homogeneous representation of the rotated point; the second matrix instead performs a translation. Performing the two operations in sequence, i.e. first the rotation and then the translation (with the translation vector given in the already rotated coordinate system), gives a combined rotation and translation matrix
$$\left( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array} \right)$$
Assuming that $\mathbf{R}$ and $\mathbf{t}$ are precisely the rotation and translation which relate the two coordinate systems (X1,X2,X3) and (X1',X2',X3') above, this implies that
$$\mathbf{x} = \left( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array} \right) \mathbf{x}'$$
where $\mathbf{x}'$ is the homogeneous representation of the point P in the coordinate system (X1',X2',X3').
Assuming also that the camera matrix is given by $\mathbf{C}_{0}$, the mapping from the coordinates in the (X1',X2',X3') system to homogeneous image coordinates becomes
$$\mathbf{y} \sim \mathbf{C}_{0} \, \mathbf{x} = \left( \begin{array}{c|c} \mathbf{I} & \mathbf{0} \end{array} \right) \, \left( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \\ \hline \mathbf{0} & 1 \end{array} \right) \mathbf{x}' = \left( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \end{array} \right) \, \mathbf{x}'$$
Consequently, the camera matrix which relates points in the coordinate system (X1',X2',X3') to image coordinates is
$$\mathbf{C}_{N} = \left( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \end{array} \right) ,$$
a concatenation of a 3D rotation matrix and a 3-dimensional translation vector.
This type of camera matrix is referred to as a normalized camera matrix. It assumes that the focal length is 1 and that image coordinates are measured in a coordinate system whose origin is located at the intersection between axis X3 and the image plane and which has the same units as the 3D coordinate system. The resulting image coordinates are referred to as normalized image coordinates.
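The construction of the normalized camera matrix can be sketched in a few lines of plain Python. The rotation (a quarter turn about the third axis), the translation vector, and the test point are hypothetical values chosen so that all arithmetic stays in integers; `matmul` is an illustrative helper.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical rotation (90 degrees about the third axis) and translation.
R = [[0, -1, 0],
     [1, 0, 0],
     [0, 0, 1]]
t = [1, 2, 3]

C0 = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 1, 0]]

# 4x4 combined rotation and translation matrix (R | t; 0 | 1).
T = [R[i] + [t[i]] for i in range(3)] + [[0, 0, 0, 1]]

# Normalized camera matrix: C_N = C0 * T, which equals (R | t).
C_N = matmul(C0, T)
print(C_N == [R[i] + [t[i]] for i in range(3)])  # True

# Project a point given in the (X1', X2', X3') system, as a column vector.
x_prime = [[1], [1], [2], [1]]
y = matmul(C_N, x_prime)  # [[0], [3], [5]]
normalized = (Fraction(y[0][0], y[2][0]), Fraction(y[1][0], y[2][0]))
print(normalized)  # (Fraction(0, 1), Fraction(3, 5))
```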
The camera position
Again, the null space of the normalized camera matrix $\mathbf{C}_{N}$ described above is spanned by the 4-dimensional vector
$$\mathbf{n} = \begin{pmatrix} -\mathbf{R}^{-1} \, \mathbf{t} \\ 1 \end{pmatrix} = \begin{pmatrix} \tilde{\mathbf{n}} \\ 1 \end{pmatrix}$$
This is also, again, the coordinates of the camera center, now relative to the (X1',X2',X3') system. This can be seen by applying first the rotation and then the translation to the 3-dimensional vector $\tilde{\mathbf{n}}$; the result is the homogeneous representation of the 3D coordinates (0,0,0).
This implies that the camera center (in its homogeneous representation) lies in the null space of the camera matrix, provided that it is represented in terms of 3D coordinates relative to the same coordinate system as the camera matrix refers to.
The normalized camera matrix $\mathbf{C}_{N}$ can now be written as
$$\mathbf{C}_{N} = \mathbf{R} \, \left( \begin{array}{c|c} \mathbf{I} & \mathbf{R}^{-1} \, \mathbf{t} \end{array} \right) = \mathbf{R} \, \left( \begin{array}{c|c} \mathbf{I} & -\tilde{\mathbf{n}} \end{array} \right)$$
where $\tilde{\mathbf{n}}$ is the 3D coordinates of the camera relative to the (X1',X2',X3') system.
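The relation between the camera center and the null space can be verified numerically. The sketch below uses the fact that the inverse of a rotation matrix is its transpose; the rotation, translation, and helper names are hypothetical choices for illustration.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Hypothetical rotation (90 degrees about the third axis) and translation.
R = [[0, -1, 0],
     [1, 0, 0],
     [0, 0, 1]]
t = [1, 2, 3]

# Camera center in the (X1', X2', X3') system: n~ = -R^{-1} t = -R^T t.
n_tilde = [-c for c in matvec(transpose(R), t)]
print(n_tilde)  # [-2, 1, -3]

# The homogeneous camera center lies in the null space of C_N = (R | t).
C_N = [R[i] + [t[i]] for i in range(3)]
print(matvec(C_N, n_tilde + [1]))  # [0, 0, 0]

# Factorized form: R * (I | -n~) reproduces (R | t), since R * (-n~) = t.
last_column = matvec(R, [-c for c in n_tilde])
print(last_column == t)  # True
```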
General camera matrix
Given the mapping produced by a normalized camera matrix, the resulting normalized image coordinates can be transformed by means of an arbitrary 2D homography. This includes 2D translations and rotations as well as scaling (isotropic and anisotropic) but also general 2D perspective transformations. Such a transformation can be represented as a $3 \times 3$ matrix $\mathbf{H}$ which maps the homogeneous normalized image coordinates $\mathbf{y}$ to the homogeneous transformed image coordinates $\mathbf{y}'$:
$$\mathbf{y}' = \mathbf{H} \, \mathbf{y}$$
Inserting the above expression for the normalized image coordinates in terms of the 3D coordinates gives
$$\mathbf{y}' = \mathbf{H} \, \mathbf{C}_{N} \, \mathbf{x}'$$
This produces the most general form of camera matrix
$$\mathbf{C} = \mathbf{H} \, \mathbf{C}_{N} = \mathbf{H} \, \left( \begin{array}{c|c} \mathbf{R} & \mathbf{t} \end{array} \right)$$
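A minimal sketch of the composition follows. The homography used here is a hypothetical example combining isotropic scaling with a 2D translation; the rotation, translation, test point, and the helper `matmul` are likewise assumptions for illustration.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical homography: scale by 800, then shift by (320, 240).
H = [[800, 0, 320],
     [0, 800, 240],
     [0, 0, 1]]

# Hypothetical normalized camera matrix C_N = (R | t).
C_N = [[0, -1, 0, 1],
       [1, 0, 0, 2],
       [0, 0, 1, 3]]

# General camera matrix C = H * C_N.
C = matmul(H, C_N)

# Project a point given in the (X1', X2', X3') system, as a column vector.
x_prime = [[1], [1], [2], [1]]
y_prime = matmul(C, x_prime)  # [[1600], [3600], [5]]
pixel = (Fraction(y_prime[0][0], y_prime[2][0]),
         Fraction(y_prime[1][0], y_prime[2][0]))
print(pixel)  # (Fraction(320, 1), Fraction(720, 1))

# Applying H after the normalized projection gives the same result.
two_step = matmul(H, matmul(C_N, x_prime))
print(two_step == y_prime)  # True
```

Composing the matrices once and reusing $\mathbf{C}$ avoids repeating the two-step multiplication for every point.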
References
- Richard Hartley and Andrew Zisserman (2003). Multiple View Geometry in computer vision. Cambridge University Press. ISBN 0-521-54051-8.