Pinhole Camera Model

Camera Model Fundamentals
Pinhole Camera Model
The functions in this section use a so-called pinhole camera model. The view of a scene is obtained by projecting a scene’s 3D point \(P_w\) into the image plane using a perspective transformation, which forms the corresponding pixel \(p\). Both \(P_w\) and \(p\) are represented in homogeneous coordinates, i.e. as 3D and 2D homogeneous vectors respectively.
The distortion-free projective transformation given by a pinhole camera model is:
\[\lambda \; p = K \begin{bmatrix} R|t \end{bmatrix} P_w\]
where:
\(P_w\) is a 3D point expressed with respect to the world coordinate system
\(p\) is a 2D pixel in the image plane
\(K\) is the camera intrinsic matrix
\(R\) and \(t\) are the rotation and translation that describe the change of coordinates from the world coordinate system to the camera coordinate system
\(\lambda\) is the projective transformation’s arbitrary scaling
Camera Intrinsic Matrix
The camera intrinsic matrix \(K\) projects 3D points given in the camera coordinate system to 2D pixel coordinates:
\[p = K P_c\]
The camera intrinsic matrix \(K\) is composed of the focal lengths \(f_x\) and \(f_y\), which are expressed in pixel units, and the principal point \((c_x, c_y)\), which is usually close to the image center:
\[K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}\]
and thus:
\[\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}\]
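As a minimal sketch of the equation above, the following plain-Python function multiplies a camera-frame point by \(K\) and divides by \(\lambda\) to recover \((u, v)\). The function name `project_camera_point` and the numeric intrinsics are illustrative assumptions, not values from this document.

```python
def project_camera_point(fx, fy, cx, cy, point_c):
    """Apply K to a camera-frame point and dehomogenize to a pixel (u, v)."""
    Xc, Yc, Zc = point_c
    # Rows of K times (Xc, Yc, Zc): this is lambda * (u, v, 1).
    hu = fx * Xc + cx * Zc
    hv = fy * Yc + cy * Zc
    lam = Zc  # the third row of K is (0, 0, 1), so lambda equals Zc
    if lam == 0:
        raise ValueError("Z_c = 0: the point has no finite projection")
    return hu / lam, hv / lam


# Assumed example intrinsics: fx = fy = 800 px, principal point (320, 240).
u, v = project_camera_point(800.0, 800.0, 320.0, 240.0, (0.5, -0.25, 2.0))
print((u, v))  # (520.0, 140.0)
```

Because the last row of \(K\) is \((0, 0, 1)\), the scale factor \(\lambda\) is simply the depth \(Z_c\) of the point.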
Coordinate Transformations
The joint rotation-translation matrix \([R|t]\) is the matrix product of a projective transformation and a homogeneous transformation. The 3-by-4 projective transformation maps 3D points represented in camera coordinates to 2D points in the image plane, expressed in normalized camera coordinates \(x' = X_c / Z_c\) and \(y' = Y_c / Z_c\):
\[Z_c \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}\]
The homogeneous transformation is encoded by the extrinsic parameters \(R\) and \(t\) and represents the change of basis from the world coordinate system \(w\) to the camera coordinate system \(c\):
\[P_c = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} P_w\]
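The change of basis above amounts to \(P_c = R\,P_w + t\) on the first three components. A short sketch in plain Python, assuming a hypothetical helper `world_to_camera` and a made-up pose (a 90 degree rotation about the Z axis plus a shift along the camera Z axis):

```python
def world_to_camera(R, t, point_w):
    """Return P_c = R * P_w + t, i.e. the top three rows of [[R, t], [0, 1]] @ P_w."""
    return tuple(
        sum(R[i][j] * point_w[j] for j in range(3)) + t[i]
        for i in range(3)
    )


# Assumed example pose, chosen only for illustration.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = [0.0, 0.0, 5.0]

P_c = world_to_camera(R, t, (1.0, 2.0, 3.0))
print(P_c)  # (-2.0, 1.0, 8.0)
```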
This gives us the complete transformation:
\[\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}\]
If \(Z_c \neq 0\), this is equivalent to:
\[\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} f_x X_c/Z_c + c_x \\ f_y Y_c/Z_c + c_y \end{bmatrix}\]
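To make the complete transformation concrete, the sketch below builds the 3-by-4 matrix \(K [R|t]\), applies it to a homogeneous world point, and divides by \(\lambda\). It uses plain Python lists; the helper names `matmul` and `project_world_point` and all numeric values are assumptions made for this example.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def project_world_point(K, R, t, point_w):
    """Project a world point to pixel coordinates via lambda * p = K [R|t] P_w."""
    Rt = [R[i] + [t[i]] for i in range(3)]      # 3x4 matrix [R|t]
    P = matmul(K, Rt)                           # 3x4 projection matrix
    Pw_h = [[point_w[0]], [point_w[1]], [point_w[2]], [1.0]]
    hu, hv, lam = (row[0] for row in matmul(P, Pw_h))
    if lam == 0:
        raise ValueError("Z_c = 0: the point has no finite projection")
    return hu / lam, hv / lam


# Assumed example: identity rotation, camera origin 2 units from the world origin.
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]

print(project_world_point(K, R, t, (0.5, -0.25, 0.0)))  # (520.0, 140.0)
```

In this example \(P_c = (0.5, -0.25, 2)\), so the result agrees with the closed form \(u = f_x X_c / Z_c + c_x = 800 \cdot 0.25 + 320\) and \(v = f_y Y_c / Z_c + c_y = 800 \cdot (-0.125) + 240\).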
Lens Distortion Model
Real lenses introduce radial and tangential distortion, which the ideal pinhole model above does not capture. The extended camera model accounts for this:
\[\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} f_x x'' + c_x \\ f_y y'' + c_y \end{bmatrix}\]
where:
\[\begin{bmatrix} x'' \\ y'' \end{bmatrix} = \begin{bmatrix} x' \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + 2 p_1 x' y' + p_2(r^2 + 2 x'^2) + s_1 r^2 + s_2 r^4 \\ y' \frac{1 + k_1 r^2 + k_2 r^4 + k_3 r^6}{1 + k_4 r^2 + k_5 r^4 + k_6 r^6} + p_1 (r^2 + 2 y'^2) + 2 p_2 x' y' + s_3 r^2 + s_4 r^4 \end{bmatrix}\]
with \(r^2 = x'^2 + y'^2\) and \(\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} X_c/Z_c \\ Y_c/Z_c \end{bmatrix}\) if \(Z_c \neq 0\).
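The distortion equations can be transcribed term by term. The following is a sketch in plain Python, not a library implementation: the function names `distort` and `to_pixel`, the grouping of coefficients into tuples `k`, `p`, and `s`, and the sample values are all assumptions for illustration.

```python
def distort(x, y, k=(0.0,) * 6, p=(0.0, 0.0), s=(0.0,) * 4):
    """Map normalized coordinates (x', y') to distorted coordinates (x'', y'').

    k = (k1..k6) radial, p = (p1, p2) tangential, s = (s1..s4) thin prism.
    """
    r2 = x * x + y * y
    r4 = r2 * r2
    r6 = r4 * r2
    radial = (1 + k[0] * r2 + k[1] * r4 + k[2] * r6) / (
        1 + k[3] * r2 + k[4] * r4 + k[5] * r6
    )
    xd = x * radial + 2 * p[0] * x * y + p[1] * (r2 + 2 * x * x) + s[0] * r2 + s[1] * r4
    yd = y * radial + p[0] * (r2 + 2 * y * y) + 2 * p[1] * x * y + s[2] * r2 + s[3] * r4
    return xd, yd


def to_pixel(xd, yd, fx, fy, cx, cy):
    """Convert distorted normalized coordinates to pixel coordinates (u, v)."""
    return fx * xd + cx, fy * yd + cy


# Assumed example: mild negative k1 only, point on the x' axis.
xd, yd = distort(0.5, 0.0, k=(-0.2, 0.0, 0.0, 0.0, 0.0, 0.0))
u, v = to_pixel(xd, yd, 800.0, 800.0, 320.0, 240.0)
```

With all coefficients set to zero, `distort` returns its input unchanged, which recovers the distortion-free model.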
Distortion Parameters:
Radial coefficients: \(k_1\), \(k_2\), \(k_3\), \(k_4\), \(k_5\), \(k_6\)
Tangential coefficients: \(p_1\), \(p_2\)
Thin prism coefficients: \(s_1\), \(s_2\), \(s_3\), \(s_4\)
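The distortion coefficients are passed as a single vector in the following order. Each bracketed group is optional, so the vector has 4, 5, 8, 12, or 14 elements; the trailing \(\tau_x\) and \(\tau_y\) do not appear in the equations above: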
\[(k_1, k_2, p_1, p_2[, k_3[, k_4, k_5, k_6 [, s_1, s_2, s_3, s_4[, \tau_x, \tau_y]]]])\]
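The nesting of the optional groups implies that only vectors of length 4, 5, 8, 12, or 14 are well formed. A small sketch of unpacking such a vector into named coefficients, with omitted ones defaulting to zero; the helper `unpack_dist_coeffs` and the key names `tau_x` and `tau_y` are hypothetical and not part of any library API.

```python
NAMES = ("k1", "k2", "p1", "p2", "k3", "k4", "k5", "k6",
         "s1", "s2", "s3", "s4", "tau_x", "tau_y")
VALID_LENGTHS = (4, 5, 8, 12, 14)


def unpack_dist_coeffs(coeffs):
    """Return a dict of all 14 named coefficients, padding missing ones with 0.0."""
    coeffs = list(coeffs)
    if len(coeffs) not in VALID_LENGTHS:
        raise ValueError(
            f"expected {VALID_LENGTHS} coefficients, got {len(coeffs)}"
        )
    padded = coeffs + [0.0] * (len(NAMES) - len(coeffs))
    return dict(zip(NAMES, padded))


# Assumed example values for a 5-element vector (k1, k2, p1, p2, k3).
d = unpack_dist_coeffs([-0.2, 0.05, 0.001, -0.002, 0.01])
print(d["k3"], d["k4"])  # 0.01 0.0
```

Note that \(k_3\) sits in fifth position, after the tangential coefficients, so a positional reading of the vector as "all radial terms first" would be wrong.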
Types of Distortion:
Barrel distortion: the radial factor \((1 + k_1 r^2 + k_2 r^4 + k_3 r^6)\) is monotonically decreasing in \(r\)
Pincushion distortion: the radial factor \((1 + k_1 r^2 + k_2 r^4 + k_3 r^6)\) is monotonically increasing in \(r\)
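One way to apply these two definitions is to sample the radial factor over the range of \(r\) that the image covers and check whether it only decreases or only increases. The sketch below does this numerically in plain Python; the function name `classify_radial`, the default `r_max`, the sample count, and the labels "none" and "mixed" are assumptions, and a sampled check is only an approximation of true monotonicity.

```python
def classify_radial(k1, k2=0.0, k3=0.0, r_max=1.0, samples=100):
    """Classify distortion from the sampled trend of 1 + k1 r^2 + k2 r^4 + k3 r^6."""
    def factor(r):
        r2 = r * r
        return 1 + k1 * r2 + k2 * r2 * r2 + k3 * r2 ** 3

    values = [factor(r_max * i / samples) for i in range(samples + 1)]
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(d < 0 for d in diffs):
        return "barrel"        # monotonically decreasing on [0, r_max]
    if all(d > 0 for d in diffs):
        return "pincushion"    # monotonically increasing on [0, r_max]
    if all(d == 0 for d in diffs):
        return "none"          # constant factor, no radial distortion
    return "mixed"             # neither purely barrel nor purely pincushion


print(classify_radial(-0.2))        # barrel
print(classify_radial(0.2))         # pincushion
print(classify_radial(-0.2, 0.5))   # mixed
```

The last call shows a coefficient set whose factor first falls and then rises within \(r \le 1\), so it fits neither definition over that range.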