Homogeneous coordinates

Figure 1: Perspective projection. The images of parallel tracks meet at the horizon, at a point at infinity.

We are all familiar with Cartesian (inhomogeneous) coordinates, which describe a point in 3D space as a tuple of three numbers \((x, y, z)\).

A point in 3D Cartesian space is represented as:

\[ \mathbf{p}_{\text{cartesian}} = (x, y, z) \]

In computer vision and computer graphics, we often work instead in projective space, where a 3D point has four coordinates. The additional coordinate \(w\) is called the scale. These four coordinates are called homogeneous coordinates and are written as a 4-tuple or 4-vector:

\[ \mathbf{p}_{\text{homogeneous}} = (x, y, z, w) \]

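For \(w \neq 0\), the homogeneous point \((x, y, z, w)\) corresponds to the Cartesian point obtained by dividing the first three coordinates by \(w\):

\[ \mathbf{p}_{\text{cartesian}} = \left(\frac{x}{w}, \frac{y}{w}, \frac{z}{w}\right) \]

Conversely, a Cartesian point \((x, y, z)\) is written in homogeneous form as \((x, y, z, 1)\). When \(w = 0\), the point has no Cartesian counterpart. It is a point at infinity, like the point on the horizon in Figure 1.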
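The conversion in both directions can be sketched in Python as follows. This is a minimal illustration, and the helper names `to_homogeneous` and `to_cartesian` are hypothetical. A Cartesian point gets \(w = 1\), and dividing by \(w\) goes back. A point with \(w = 0\) (a point at infinity) has no Cartesian equivalent, so the sketch raises an error for it. The same helpers also work for 2D points.

```python
import numpy as np

def to_homogeneous(p):
    """Hypothetical helper: append w = 1, e.g. (x, y, z) -> (x, y, z, 1)."""
    p = np.asarray(p, dtype=float)
    return np.append(p, 1.0)

def to_cartesian(ph):
    """Hypothetical helper: divide by the last coordinate w (requires w != 0)."""
    ph = np.asarray(ph, dtype=float)
    w = ph[-1]
    if w == 0:
        # w = 0 encodes a point at infinity; there is no Cartesian point to return.
        raise ValueError("w = 0: point at infinity has no Cartesian equivalent")
    return ph[:-1] / w

print(to_homogeneous([1, 2, 3]))    # [1. 2. 3. 1.]
print(to_cartesian([2, 4, 6, 2]))   # [1. 2. 3.]
```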

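Homogeneous coordinates are defined only up to scale. For any nonzero \(k\), the vector \((kx, ky, kz, kw)\) represents the same point as \((x, y, z, w)\). All of these scaled versions lie on a line through the origin of the 4D space, and they all map to the same Cartesian point.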
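The scale equivalence can be checked numerically. In this sketch, two homogeneous vectors are treated as the same point when they are nonzero multiples of each other. This is tested by stacking them and checking that the result has rank 1. The helper `same_projective_point` is a hypothetical name, and the scale factors are arbitrary, including a negative one.

```python
import numpy as np

def same_projective_point(a, b):
    """Hypothetical helper: True if a and b are nonzero multiples of each other."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if not a.any() or not b.any():
        return False  # (0, 0, 0, 0) is not a valid homogeneous point
    # Two proportional vectors stacked as rows form a rank-1 matrix.
    return bool(np.linalg.matrix_rank(np.vstack([a, b])) == 1)

p = np.array([1.0, 2.0, 3.0, 1.0])
for k in (2.0, -0.5, 10.0):
    q = k * p
    # Dividing by w always recovers the Cartesian point (1, 2, 3).
    print(q, same_projective_point(p, q), q[:3] / q[3])

# (1, 2, 3, 2) is a different point: it corresponds to (0.5, 1, 1.5).
print(same_projective_point(p, [1.0, 2.0, 3.0, 2.0]))  # False
```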

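Homogeneous coordinates allow us to represent points at infinity (\(w = 0\)). They also let us write translation, like rotation and scaling, as a matrix multiplication, so a chain of transformations collapses into a single matrix product.

In this section we show how common 2D transformations are expressed in homogeneous coordinates. A 2D point \((x, y)\) becomes the 3-vector \((x, y, 1)\), and each transformation is a \(3 \times 3\) matrix \(\mathbf{T}\) applied as \(\mathbf{p}' = \mathbf{T}\mathbf{p}\):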
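As a minimal sketch, here is translation in 2D. A linear map given by a \(2 \times 2\) matrix always sends the origin to itself, so it cannot translate. After lifting \((x, y)\) to \((x, y, 1)\), a \(3 \times 3\) matrix can translate, and chaining two translations is just a matrix product. The helper `translation_matrix` is a hypothetical name used only for illustration.

```python
import numpy as np

def translation_matrix(tx, ty):
    """Hypothetical helper: 3x3 homogeneous matrix shifting 2D points by (tx, ty)."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

# A 2x2 matrix M always gives M @ [0, 0] = [0, 0]. In homogeneous form the
# origin is (0, 0, 1), and a 3x3 matrix can move it.
origin = np.array([0.0, 0.0, 1.0])
p = np.array([2.0, 3.0, 1.0])  # Cartesian (2, 3) with w = 1

T = translation_matrix(5.0, -1.0)
print(T @ origin)  # the origin moves to (5, -1)
print(T @ p)       # (2, 3) moves to (7, 2)

# Shifting by (5, -1) and then by (1, 1) equals one shift by (6, 0).
composed = translation_matrix(1.0, 1.0) @ T
print(np.allclose(composed, translation_matrix(6.0, 0.0)))  # True
```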

Rigid Transformation

A rigid transformation preserves lengths and angles. It consists of a rotation about the origin followed by a translation, with no scaling or shearing.

Matrix form:

\[ \mathbf{T}_{\text{rigid}} = \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix} \]

  • \(\theta\): rotation angle
  • \((t_x, t_y)\): translation
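The rigid matrix above can be built and checked numerically. This sketch uses a hypothetical helper `rigid_matrix`. The angle, translation, and test points are arbitrary choices. The check confirms that the distance between two points is unchanged and that the upper-left \(2 \times 2\) block is a proper rotation, meaning it is orthonormal with determinant \(+1\).

```python
import numpy as np

def rigid_matrix(theta, tx, ty):
    """Hypothetical helper: rotate by theta (radians) about the origin, then translate."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

T = rigid_matrix(np.pi / 3, 2.0, -1.0)
a = np.array([0.0, 0.0, 1.0])  # homogeneous 2D points with w = 1
b = np.array([3.0, 4.0, 1.0])
ta, tb = T @ a, T @ b

# |b - a| = 5 before the transform, and 5 (up to rounding) after it.
print(np.linalg.norm((b - a)[:2]), np.linalg.norm((tb - ta)[:2]))

# The rotation block R satisfies R^T R = I and det(R) = +1.
R = T[:2, :2]
print(np.allclose(R.T @ R, np.eye(2)), np.isclose(np.linalg.det(R), 1.0))
```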

Similarity Transformation

A similarity transformation includes rotation, translation, and uniform scaling. It preserves shape but not necessarily size.

\[ \mathbf{T}_{\text{sim}} = \begin{bmatrix} s \cos\theta & -s \sin\theta & t_x \\ s \sin\theta & s \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix} \]

Where \(s\) is the scaling factor.
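A small numerical sketch shows what "preserves shape but not size" means. The sketch transforms a triangle and compares an interior angle and an edge length before and after. The helper names `similarity_matrix` and `angle`, the triangle, and the parameters \(s = 2\), \(\theta = \pi/5\), \((t_x, t_y) = (1, 3)\) are all illustrative assumptions.

```python
import numpy as np

def similarity_matrix(s, theta, tx, ty):
    """Hypothetical helper: uniform scale s and rotation theta, then translation."""
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * sn, tx],
                     [s * sn,  s * c, ty],
                     [0.0, 0.0, 1.0]])

def angle(u, v):
    """Angle in radians between 2D vectors u and v."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

S = similarity_matrix(2.0, np.pi / 5, 1.0, 3.0)

# Triangle (0, 0), (4, 0), (1, 3); each column is a homogeneous point.
tri = np.array([[0.0, 4.0, 1.0],
                [0.0, 0.0, 3.0],
                [1.0, 1.0, 1.0]])
out = S @ tri

# Edge vectors from the first vertex, before and after the transform.
u0, v0 = tri[:2, 1] - tri[:2, 0], tri[:2, 2] - tri[:2, 0]
u1, v1 = out[:2, 1] - out[:2, 0], out[:2, 2] - out[:2, 0]

print(angle(u0, v0), angle(u1, v1))             # the angle is unchanged
print(np.linalg.norm(u1) / np.linalg.norm(u0))  # lengths scale by s = 2
```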

Affine Transformation

Affine transformations include translation, rotation, scaling (possibly different along each axis), shearing, and any combination of these. They preserve parallelism of lines but not, in general, lengths or angles.

\[ \mathbf{T}_{\text{affine}} = \begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix} \]

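The upper-left \(2 \times 2\) block \([a_{ij}]\) can be any linear map (normally required to be invertible), so this is the most general linear 2D transformation combined with a translation. Because the last row is \((0, 0, 1)\), the third coordinate of every transformed point stays 1.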
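To see these properties concretely, the sketch below applies an affine matrix to two parallel segments and checks that their images are still parallel. Two direction vectors are parallel when their 2D cross product is zero. It then shows that the unit axis vectors neither keep length 1 nor stay perpendicular. The matrix values are taken from `T_affine` in the example that follows. The helpers `transform` and `cross2` are hypothetical names.

```python
import numpy as np

# Affine matrix: shear/scale block plus translation (1.5, 1.0)
A = np.array([[1.2, 0.5, 1.5],
              [0.2, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

def transform(M, pts):
    """Hypothetical helper: apply a 3x3 homogeneous matrix to an (N, 2) point array."""
    h = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 3) with w = 1
    out = h @ M.T
    return out[:, :2] / out[:, 2:3]  # divide by w (always 1 for affine)

def cross2(u, v):
    """z-component of the 2D cross product; zero iff u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

# Two parallel segments, both with direction (1, 1)
seg1 = np.array([[0.0, 0.0], [1.0, 1.0]])
seg2 = np.array([[2.0, 0.0], [3.0, 1.0]])
d1 = np.diff(transform(A, seg1), axis=0)[0]
d2 = np.diff(transform(A, seg2), axis=0)[0]
print(cross2(d1, d2))  # ~0: the images are still parallel

# Lengths and angles are not preserved in general.
e_x, e_y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
Ae_x, Ae_y = A[:2, :2] @ e_x, A[:2, :2] @ e_y
print(np.linalg.norm(Ae_x), np.dot(Ae_x, Ae_y))  # length != 1, dot product != 0
```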

Example: 2D Transformations of a Unit Square

```python
import matplotlib.pyplot as plt
import numpy as np

# Original square: corner at origin, side length 1.
# Each column is a homogeneous 2D point (x, y, 1).
square = np.array(
    [
        [0, 0, 1],
        [1, 0, 1],
        [1, 1, 1],
        [0, 1, 1],
        [0, 0, 1],  # close the square
    ]
).T


# Apply a 3x3 homogeneous matrix to the 3xN point array and return Nx2
# Cartesian coordinates. Dropping w without dividing is valid here because
# every matrix below has last row (0, 0, 1), so w stays 1.
def apply_transform(matrix, shape):
    return (matrix @ shape).T[:, :2]


# Rigid: rotate 45 deg about the origin, then translate by (1, 1)
theta = np.pi / 4
T_rigid = np.array([[np.cos(theta), -np.sin(theta), 1], [np.sin(theta), np.cos(theta), 1], [0, 0, 1]])

# Similarity: scale by 1.5, rotate 30 deg, translate by (0.5, 0.5)
theta_sim = np.pi / 6
s = 1.5
T_similarity = np.array(
    [
        [s * np.cos(theta_sim), -s * np.sin(theta_sim), 0.5],
        [s * np.sin(theta_sim), s * np.cos(theta_sim), 0.5],
        [0, 0, 1],
    ]
)

# Affine: shear and non-uniform scale, then translate by (1.5, 1.0)
T_affine = np.array([[1.2, 0.5, 1.5], [0.2, 1.0, 1.0], [0, 0, 1]])

# Translation only: shift by (2, 1)
T_translation = np.array([[1, 0, 2], [0, 1, 1], [0, 0, 1]])

# Apply transformations
square_rigid = apply_transform(T_rigid, square)
square_similarity = apply_transform(T_similarity, square)
square_affine = apply_transform(T_affine, square)
square_translation = apply_transform(T_translation, square)
square_original = square.T[:, :2]

# Plotting
fig, ax = plt.subplots(figsize=(8, 8))
ax.plot(*square_original.T, label="Original", linewidth=2)
ax.plot(*square_rigid.T, label="Rigid", linestyle="--")
ax.plot(*square_similarity.T, label="Similarity", linestyle="-.")
ax.plot(*square_affine.T, label="Affine", linestyle=":")
ax.plot(*square_translation.T, label="Translation", linestyle="-")

ax.set_aspect("equal")
ax.grid(True)
ax.legend()
ax.set_title("2D Transformations of a Unit Square Anchored at the Origin")
plt.show()
```
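The plot can be complemented by a numerical check of the properties described above. This sketch reuses the square and the rigid, similarity, and affine matrix values from the example. It measures the four side lengths of each transformed square. The rigid transform should keep every side at 1, and the similarity should scale every side to \(s = 1.5\). The affine transform should give a parallelogram whose opposite sides are equal, while adjacent sides differ. The helper `side_lengths` is a hypothetical name.

```python
import numpy as np

# Unit square as homogeneous columns, same as in the example
square = np.array([[0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1], [0, 0, 1]], dtype=float).T

theta = np.pi / 4
T_rigid = np.array([[np.cos(theta), -np.sin(theta), 1], [np.sin(theta), np.cos(theta), 1], [0, 0, 1]])
theta_sim, s = np.pi / 6, 1.5
T_similarity = np.array([[s * np.cos(theta_sim), -s * np.sin(theta_sim), 0.5],
                         [s * np.sin(theta_sim),  s * np.cos(theta_sim), 0.5],
                         [0, 0, 1]])
T_affine = np.array([[1.2, 0.5, 1.5], [0.2, 1.0, 1.0], [0, 0, 1]])

def side_lengths(M):
    """Hypothetical helper: lengths of the four edges of the transformed square."""
    pts = (M @ square)[:2].T  # (5, 2) Cartesian corners; w stays 1
    return np.linalg.norm(np.diff(pts, axis=0), axis=1)

# rigid: all sides 1; similarity: all sides 1.5; affine: two alternating lengths
for name, M in [("rigid", T_rigid), ("similarity", T_similarity), ("affine", T_affine)]:
    print(name, np.round(side_lengths(M), 3))
```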