import matplotlib.pyplot as plt
import numpy as np

# Original square: corner at origin, size 1 (one homogeneous point per row, then transposed)
square = np.array(
    [
        [0, 0, 1],
        [1, 0, 1],
        [1, 1, 1],
        [0, 1, 1],
        [0, 0, 1],  # close the square
    ]
).T

# Define transformations
def apply_transform(matrix, shape):
    # Multiply the 3x3 matrix with the 3xN points and keep only the (x, y) columns
    return (matrix @ shape).T[:, :2]

# Rigid: rotate 45 deg and translate (1, 1)
theta = np.pi / 4
T_rigid = np.array([[np.cos(theta), -np.sin(theta), 1], [np.sin(theta), np.cos(theta), 1], [0, 0, 1]])

# Similarity: scale by 1.5, rotate 30 deg, translate (0.5, 0.5)
theta_sim = np.pi / 6
s = 1.5
T_similarity = np.array(
    [
        [s * np.cos(theta_sim), -s * np.sin(theta_sim), 0.5],
        [s * np.sin(theta_sim), s * np.cos(theta_sim), 0.5],
        [0, 0, 1],
    ]
)

# Affine: shear and scale
T_affine = np.array([[1.2, 0.5, 1.5], [0.2, 1.0, 1.0], [0, 0, 1]])

# Translation only
T_translation = np.array([[1, 0, 2], [0, 1, 1], [0, 0, 1]])

# Apply transformations
square_rigid = apply_transform(T_rigid, square)
square_similarity = apply_transform(T_similarity, square)
square_affine = apply_transform(T_affine, square)
square_translation = apply_transform(T_translation, square)
square_original = square.T[:, :2]

# Plotting
fig, ax = plt.subplots(figsize=(8, 8))
ax.plot(*square_original.T, label="Original", linewidth=2)
ax.plot(*square_rigid.T, label="Rigid", linestyle="--")
ax.plot(*square_similarity.T, label="Similarity", linestyle="-.")
ax.plot(*square_affine.T, label="Affine", linestyle=":")
ax.plot(*square_translation.T, label="Translation", linestyle="-")
ax.set_aspect("equal")
ax.grid(True)
ax.legend()
ax.set_title("2D Transformations of Unit Square Anchored at Origin")
plt.show()

Homogeneous coordinates
Homogeneous coordinates

All of us are familiar with inhomogeneous (Cartesian) coordinates, which describe a point in 3D space as a tuple of three numbers \((x, y, z)\).
A point in 3D Cartesian space is represented as:
\[ \mathbf{p}_{\text{cartesian}} = (x, y, z) \]
In computer vision and computer graphics, we need to work in another space called projective space, where a 3D point has four coordinates and the additional coordinate \(w\) is called the scale. The four coordinates in this space are called homogeneous coordinates and are written as a 4-tuple or 4-vector:
\[ \mathbf{p}_{\text{homogeneous}} = (x, y, z, w) \]
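When \(w \neq 0\), the corresponding Cartesian point is recovered by dividing the first three coordinates by \(w\):
\[ (x, y, z, w) \mapsto \left(\frac{x}{w}, \frac{y}{w}, \frac{z}{w}\right) \]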
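Homogeneous coordinates are defined only up to scale: for any \(\lambda \neq 0\), the vectors \((x, y, z, w)\) and \((\lambda x, \lambda y, \lambda z, \lambda w)\) map to the same Cartesian point. Each Cartesian point therefore corresponds to a family of equivalent 4-vectors lying along a ray through the origin.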
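The conversion in both directions can be sketched in NumPy. The helper names `to_homogeneous` and `to_cartesian` are illustrative assumptions and do not come from any library; the sketch only demonstrates that two scaled homogeneous vectors map back to the same Cartesian point.

```python
import numpy as np

def to_homogeneous(p, w=1.0):
    """Return a homogeneous vector for Cartesian point p with scale w (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    # Scale the coordinates by w so that dividing by w recovers p
    return np.append(p * w, w)

def to_cartesian(ph):
    """Divide by the last coordinate w and drop it (hypothetical helper)."""
    ph = np.asarray(ph, dtype=float)
    if np.isclose(ph[-1], 0.0):
        raise ValueError("w = 0: a point at infinity has no Cartesian equivalent")
    return ph[:-1] / ph[-1]

p = np.array([2.0, 4.0, 6.0])
ph1 = to_homogeneous(p)         # scale w = 1
ph2 = to_homogeneous(p, w=2.0)  # same point, scale w = 2

print(ph1, "->", to_cartesian(ph1))
print(ph2, "->", to_cartesian(ph2))
```

Both printed rows end in the same Cartesian point, even though the homogeneous vectors differ by a factor of 2.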
Homogeneous coordinates allow us to:
- Represent points at infinity (when \(w = 0\)) — useful for parallel lines in perspective projections.
- Encode various transformations (e.g., perspective camera models, 3D projection) as linear matrix operations as shown next.
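Both points in the list above can be illustrated with a small 2D sketch. A translation, which is not a linear map in Cartesian coordinates, becomes a single matrix product in homogeneous coordinates. The same matrix leaves a point at infinity (\(w = 0\), here read as a pure direction) unchanged. The values of `tx`, `ty` and the sample vectors are arbitrary choices for illustration.

```python
import numpy as np

# 2D translation by (tx, ty) written as a 3x3 matrix in homogeneous coordinates
tx, ty = 2.0, 1.0
T = np.array([
    [1.0, 0.0, tx],
    [0.0, 1.0, ty],
    [0.0, 0.0, 1.0],
])

point = np.array([3.0, 4.0, 1.0])      # finite point (3, 4), w = 1
direction = np.array([3.0, 4.0, 0.0])  # point at infinity in direction (3, 4), w = 0

moved_point = T @ point          # translation shifts the finite point
moved_direction = T @ direction  # translation has no effect when w = 0

print(moved_point)      # [5. 5. 1.]
print(moved_direction)  # [3. 4. 0.]
```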
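In this section we show how common 2D transformations can be expressed in homogeneous coordinates, where a 2D point \((x, y)\) is written as the 3-vector \((x, y, 1)\) and each transformation becomes a \(3 \times 3\) matrix: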
Rigid Transformation
A rigid transformation preserves lengths and angles — it includes rotation and translation, but no scaling or shearing.
Matrix form:
\[ \mathbf{T}_{\text{rigid}} = \begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix} \]
- \(\theta\): rotation angle
- \((t_x, t_y)\): translation
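As a quick numerical check of the length-preserving property, the sketch below builds the rigid matrix and applies it to the corners of a unit square. The function names `rigid_matrix` and `side_lengths` are hypothetical helpers introduced here; the angle and translation match the 45 degree, \((1, 1)\) example used in the plotting code.

```python
import numpy as np

def rigid_matrix(theta, tx, ty):
    """3x3 homogeneous matrix: rotation by theta followed by translation (tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([
        [c, -s, tx],
        [s,  c, ty],
        [0.0, 0.0, 1.0],
    ])

def side_lengths(pts):
    """Lengths of the edges of a closed polygon given as an (N, 2) array."""
    return np.linalg.norm(np.roll(pts, -1, axis=0) - pts, axis=1)

# Unit square corners as homogeneous column vectors (3 x 4)
corners = np.array([[0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]], dtype=float).T

T = rigid_matrix(np.pi / 4, 1.0, 1.0)
moved = (T @ corners)[:2].T  # back to (N, 2) Cartesian points

print(side_lengths(corners[:2].T))  # edge lengths before
print(side_lengths(moved))          # edge lengths after
```

The rotation block also has determinant 1, which is another way to confirm that no scaling is involved.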
Similarity Transformation
A similarity transformation includes rotation, translation, and uniform scaling. It preserves shape but not necessarily size.
\[ \mathbf{T}_{\text{sim}} = \begin{bmatrix} s \cos\theta & -s \sin\theta & t_x \\ s \sin\theta & s \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix} \]
Where \(s\) is the scaling factor.
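The following sketch checks the two claims numerically: edge lengths are multiplied by \(s\), while the right angle at a corner of the square survives. The helper `similarity_matrix` is a hypothetical name; the parameters \(s = 1.5\), 30 degrees, and \((0.5, 0.5)\) are taken from the plotting code.

```python
import numpy as np

def similarity_matrix(s, theta, tx, ty):
    """3x3 homogeneous matrix: uniform scale s, rotation theta, translation (tx, ty)."""
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([
        [s * c, -s * sn, tx],
        [s * sn,  s * c, ty],
        [0.0, 0.0, 1.0],
    ])

# Unit square corners as homogeneous column vectors (3 x 4)
corners = np.array([[0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]], dtype=float).T

s = 1.5
T = similarity_matrix(s, np.pi / 6, 0.5, 0.5)
moved = (T @ corners)[:2].T

# Two edges meeting at the transformed origin corner
edge_a = moved[1] - moved[0]
edge_b = moved[3] - moved[0]

print(np.linalg.norm(edge_a), np.linalg.norm(edge_b))  # both equal s
print(edge_a @ edge_b)                                 # ~0, so the corner is still a right angle
```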
Affine Transformation
Affine transformations include translation, rotation, scaling, shearing, and combinations. They preserve parallelism of lines but not necessarily lengths or angles.
\[ \mathbf{T}_{\text{affine}} = \begin{bmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix} \]
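This is the most general 2D transformation that combines a linear map with a translation. It has six degrees of freedom: the four entries \(a_{ij}\) and the two translation components.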
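To see which properties survive an affine map, the sketch below applies the affine matrix from the plotting code to the unit square. The 2D cross product helper `cross2` is a hypothetical name introduced here; a zero cross product means two edges are parallel, and a nonzero dot product means they are no longer perpendicular.

```python
import numpy as np

def cross2(a, b):
    """Scalar 2D cross product; zero when a and b are parallel."""
    return a[0] * b[1] - a[1] * b[0]

# Affine matrix used in the plotting code (shear, scale, and translation)
T_affine = np.array([
    [1.2, 0.5, 1.5],
    [0.2, 1.0, 1.0],
    [0.0, 0.0, 1.0],
])

# Unit square corners as homogeneous column vectors (3 x 4)
corners = np.array([[0, 0, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1]], dtype=float).T
moved = (T_affine @ corners)[:2].T

bottom = moved[1] - moved[0]  # image of the bottom edge
top = moved[2] - moved[3]     # image of the top edge
left = moved[3] - moved[0]    # image of the left edge

print(cross2(bottom, top))                           # 0: opposite edges stay parallel
print(bottom @ left)                                 # nonzero: the right angle is lost
print(np.linalg.norm(bottom), np.linalg.norm(left))  # edge lengths are no longer equal
```

The square becomes a parallelogram: parallelism is kept, but angles and lengths change.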