Homogeneous coordinates

Figure 1: Perspective projection and point at infinity where tracks intersect at the horizon.

All of us are familiar with inhomogeneous (Cartesian) coordinates, which describe a point in 3D space as a tuple of three numbers \((x, y, z)\).

A point in 3D Cartesian space is represented as:

\[ \mathbf{p}_{\text{cartesian}} = (x, y, z) \]

In computer vision and computer graphics, we often need to work in another space, called projective space, in which a 3D point has four coordinates; the additional coordinate \(w\) is called the scale. The four coordinates are called homogeneous coordinates and are written as a 4-tuple or a 4-vector:

\[ \mathbf{p}_{\text{homogeneous}} = (x, y, z, w) \]

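When \(w \neq 0\), the corresponding Cartesian point is recovered by dividing the first three coordinates by \(w\):

\[ (x, y, z, w) \;\mapsto\; \left(\frac{x}{w}, \frac{y}{w}, \frac{z}{w}\right) \]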

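Homogeneous coordinates are defined only up to scale: for any \(k \neq 0\), the 4-vectors \((x, y, z, w)\) and \((kx, ky, kz, kw)\) represent the same Cartesian point. Each Cartesian point therefore corresponds to a whole family of equivalent 4-vectors lying along a ray through the origin of the four-dimensional space.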
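The conversion in both directions can be sketched in a few lines of NumPy. The helper names `to_homogeneous` and `from_homogeneous` are illustrative and not taken from any library, and the sketch assumes \(w \neq 0\) when converting back.

```python
import numpy as np


def to_homogeneous(p, w=1.0):
    """Lift a Cartesian point to homogeneous coordinates with scale w (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    return np.append(p * w, w)


def from_homogeneous(ph):
    """Divide by the last coordinate to recover the Cartesian point; requires w != 0."""
    ph = np.asarray(ph, dtype=float)
    if np.isclose(ph[-1], 0.0):
        raise ValueError("w = 0: a point at infinity has no Cartesian equivalent")
    return ph[:-1] / ph[-1]


p = np.array([2.0, 4.0, 6.0])
ph = to_homogeneous(p)      # (2, 4, 6, 1)
ph_scaled = 3.0 * ph        # (6, 12, 18, 3): a different 4-tuple, same point

print(from_homogeneous(ph))
print(from_homogeneous(ph_scaled))
```

Both calls print the same Cartesian point, which is the scale equivalence described above.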

Homogeneous coordinates allow us to:

Represent points at infinity (when \(w = 0\)), which is useful for describing where parallel lines meet under perspective projection.
Encode transformations such as translation and perspective projection (e.g., perspective camera models) as linear matrix operations, as shown next.
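To make the point-at-infinity idea concrete, here is a small sketch in the 2D projective plane, where both points and lines are 3-vectors. It relies on the standard fact that the intersection of two lines in homogeneous coordinates is their cross product; the line coefficients are made up for illustration.

```python
import numpy as np

# A 2D line a*x + b*y + c = 0 is stored as the 3-vector (a, b, c).
line_1 = np.array([1.0, -1.0, 0.0])   # y = x
line_2 = np.array([1.0, -1.0, 2.0])   # y = x + 2, parallel to line_1
line_3 = np.array([1.0, 1.0, -4.0])   # y = -x + 4, not parallel to line_1

# In homogeneous coordinates, two lines intersect at their cross product.
meet_parallel = np.cross(line_1, line_2)
meet_crossing = np.cross(line_1, line_3)

# Parallel lines: the last coordinate is 0, i.e. a point at infinity
# whose first two coordinates give the shared direction of the lines.
print(meet_parallel)

# Non-parallel lines: dividing by the last coordinate gives a finite point.
print(meet_crossing / meet_crossing[2])
```

The parallel pair never meets in the Cartesian plane, yet the homogeneous representation still returns a well-defined intersection with \(w = 0\).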

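In this section we show how common 2D transformations can be expressed in homogeneous coordinates. A 2D point \((x, y)\) is written as the 3-vector \((x, y, 1)\), and each transformation becomes a \(3 \times 3\) matrix: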

Rigid Transformation

A rigid transformation preserves lengths and angles: it combines rotation and translation, with no scaling or shearing.

Matrix form:

\[ \mathbf{T}_{\text{rigid}} = \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix} \]

\(\theta\): rotation angle
\((t_x, t_y)\): translation
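As a quick numerical check of the length-preserving property, the sketch below builds the matrix above for arbitrary illustrative values of \(\theta\) and \((t_x, t_y)\) and compares the distance between two points before and after the transformation. The helper `rigid_matrix` is a hypothetical name introduced here.

```python
import numpy as np


def rigid_matrix(theta, tx, ty):
    """Build the 3x3 homogeneous rigid transform (illustrative helper)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]])


T = rigid_matrix(np.pi / 3, 2.0, -1.0)

# Two points as homogeneous vectors (x, y, 1).
a = np.array([0.0, 0.0, 1.0])
b = np.array([3.0, 4.0, 1.0])
a_new, b_new = T @ a, T @ b

dist_before = np.linalg.norm(b[:2] - a[:2])
dist_after = np.linalg.norm(b_new[:2] - a_new[:2])
print(dist_before, dist_after)

# The upper-left 2x2 block is a rotation: orthogonal with determinant +1.
R = T[:2, :2]
print(np.allclose(R.T @ R, np.eye(2)), np.isclose(np.linalg.det(R), 1.0))
```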

Similarity Transformation

A similarity transformation includes rotation, translation, and uniform scaling. It preserves shape but not necessarily size.

\[ \mathbf{T}_{\text{sim}} = \begin{bmatrix} s \cos\theta & -s \sin\theta & t_x \\ s \sin\theta & s \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix} \]

Where \(s\) is the scaling factor.
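The following sketch checks both halves of "preserves shape but not necessarily size" on a right triangle: the corner angle is unchanged, while edge lengths are multiplied by \(s\). The numeric values and the helper names `corner_angle` and `first_edge_length` are assumptions chosen for illustration.

```python
import numpy as np

scale, theta = 2.0, np.pi / 6  # illustrative values
c, s = np.cos(theta), np.sin(theta)
T_sim = np.array([
    [scale * c, -scale * s, 1.0],
    [scale * s, scale * c, -2.0],
    [0.0, 0.0, 1.0],
])

# Right triangle with vertices (0,0), (1,0), (0,2); each column is (x, y, 1).
triangle = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
    [1.0, 1.0, 1.0],
])
triangle_new = T_sim @ triangle


def corner_angle(pts):
    """Angle at the first vertex between the edges to the other two vertices."""
    u = pts[:2, 1] - pts[:2, 0]
    v = pts[:2, 2] - pts[:2, 0]
    cos_angle = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))


def first_edge_length(pts):
    """Length of the edge from the first vertex to the second."""
    return np.linalg.norm(pts[:2, 1] - pts[:2, 0])


length_ratio = first_edge_length(triangle_new) / first_edge_length(triangle)
print(np.degrees(corner_angle(triangle)), np.degrees(corner_angle(triangle_new)))
print(length_ratio)  # equals the scaling factor
```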

Affine Transformation

Affine transformations include translation, rotation, scaling, shearing, and combinations. They preserve parallelism of lines but not necessarily lengths or angles.

\[ \mathbf{T}_{\text{affine}} = \begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix} \]

This is the most general 2D transformation that combines a linear map with a translation; it has six degrees of freedom.
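A short sketch of what an affine map does and does not preserve, using the unit square and an arbitrary illustrative matrix: opposite edges stay parallel, but the right angle at the corner and the equal edge lengths are lost. The helper `cross2d` is a hypothetical name for the 2D scalar cross product.

```python
import numpy as np

# Illustrative affine matrix: shear, non-uniform scale, and translation.
A = np.array([[1.2, 0.5, 1.5], [0.2, 1.0, 1.0], [0.0, 0.0, 1.0]])

# Unit square corners as columns of homogeneous points (x, y, 1).
square = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
])
square_new = A @ square

bottom = square_new[:2, 1] - square_new[:2, 0]  # image of the bottom edge
top = square_new[:2, 2] - square_new[:2, 3]     # image of the top edge
left = square_new[:2, 3] - square_new[:2, 0]    # image of the left edge


def cross2d(u, v):
    """Scalar 2D cross product; zero when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


print(cross2d(bottom, top))   # zero: opposite edges remain parallel
print(bottom @ left)          # nonzero: the corner is no longer a right angle
print(np.linalg.norm(bottom), np.linalg.norm(left))  # edge lengths now differ
```

Because the last row of the matrix is \((0, 0, 1)\), the third coordinate of every transformed point stays equal to 1, so no division by \(w\) is needed.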

Example: Affine Transformation

import matplotlib.pyplot as plt
import numpy as np

# Original square: corner at origin, size 1
square = np.array(
    [
        [0, 0, 1],
        [1, 0, 1],
        [1, 1, 1],
        [0, 1, 1],
        [0, 0, 1],  # close the square
    ]
).T


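# Apply a 3x3 homogeneous transform to points stored as columns (x, y, 1),
# then drop the homogeneous coordinate to get an (N, 2) array of (x, y).
# Dropping it without dividing is valid here because the last row is [0, 0, 1].
def apply_transform(matrix, shape):
    return (matrix @ shape).T[:, :2]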


# Rigid: rotate 45 deg and translate (1, 1)
theta = np.pi / 4
T_rigid = np.array([[np.cos(theta), -np.sin(theta), 1], [np.sin(theta), np.cos(theta), 1], [0, 0, 1]])

# Similarity: scale by 1.5, rotate 30 deg, translate (0.5, 0.5)
theta_sim = np.pi / 6
s = 1.5
T_similarity = np.array(
    [
        [s * np.cos(theta_sim), -s * np.sin(theta_sim), 0.5],
        [s * np.sin(theta_sim), s * np.cos(theta_sim), 0.5],
        [0, 0, 1],
    ]
)

# Affine: shear, non-uniform scale, and translate (1.5, 1.0)
T_affine = np.array([[1.2, 0.5, 1.5], [0.2, 1.0, 1.0], [0, 0, 1]])

# Translation only
T_translation = np.array([[1, 0, 2], [0, 1, 1], [0, 0, 1]])

# Apply transformations
square_rigid = apply_transform(T_rigid, square)
square_similarity = apply_transform(T_similarity, square)
square_affine = apply_transform(T_affine, square)
square_translation = apply_transform(T_translation, square)
square_original = square.T[:, :2]

# Plotting
fig, ax = plt.subplots(figsize=(8, 8))
ax.plot(*square_original.T, label="Original", linewidth=2)
ax.plot(*square_rigid.T, label="Rigid", linestyle="--")
ax.plot(*square_similarity.T, label="Similarity", linestyle="-.")
ax.plot(*square_affine.T, label="Affine", linestyle=":")
ax.plot(*square_translation.T, label="Translation", linestyle="-")

ax.set_aspect("equal")
ax.grid(True)
ax.legend()
ax.set_title("2D Transformations of Unit Square Anchored at Origin")
plt.show()