Imagine a self-driving car navigating a busy street. To perceive its surroundings accurately, it relies on multiple sensors:
LiDAR: Provides precise 3D depth measurements.
Camera: Captures rich visual details, such as colors, textures, and object appearances.
However, these sensors operate in different coordinate spaces. LiDAR measures distances in 3D space relative to its own sensor frame, while the camera captures a 2D image from its own perspective. To combine their data and create a complete understanding of the environment, we must align them.
In this tutorial, we will project 3D LiDAR points from an OT128 LiDAR onto a 2D image captured by a HIK MV-CA023-10GC camera. This process is fundamental in:
✔ Autonomous driving (object detection, sensor fusion)
✔ Robotics (perception, navigation)
✔ Machine vision (augmented reality, quality inspection)
By the end of this tutorial, you'll be able to:
✔ Load and process LiDAR point cloud data.
✔ Load the corresponding camera image.
✔ Apply intrinsic and extrinsic calibration to align LiDAR and camera data.
✔ Transform LiDAR points into the camera coordinate frame.
✔ Project 3D points onto the 2D image.
✔ Visualize the projected points on the image.
✔ Save the final image with projected points.
Before we start, make sure you have the following installed:
Python 3
Open3D (pip install open3d)
OpenCV (pip install opencv-python)
NumPy (pip install numpy)
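To confirm the setup, you can optionally import each library and print its version. This is just a quick sanity check, assuming the packages were installed into the Python environment you will run the tutorial with:
# Optional environment check: import each library and print its version
import numpy as np
import open3d as o3d
import cv2
print("NumPy:", np.__version__)
print("Open3D:", o3d.__version__)
print("OpenCV:", cv2.__version__)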
Before we can merge information from LiDAR and the camera, we need to understand what each sensor provides:
LiDAR Point Cloud: A collection of 3D points (X, Y, Z) that form a detailed depth map of the surroundings. Think of it as thousands of tiny dots floating in space, each representing a real-world object. However, LiDAR doesn’t capture color—it only sees structure.
Camera Image: A 2D picture of the scene, just like what our eyes see. It provides rich color and texture but lacks depth information.
Since these two sensors operate in different ways, our goal is to combine the 3D depth from LiDAR with the visual details from the camera. But before we can do that, we must first load the data into our Python environment.
Let’s start by reading the LiDAR point cloud and the camera image:
import numpy as np
import open3d as o3d
import cv2
# Load Point Cloud
pcd_path = r'/hesaiLidarPcd/001.pcd'
ptCloud = o3d.io.read_point_cloud(pcd_path)
# Load Image
image_file_path = r'/hikCamImg/001.png'
I = cv2.imread(image_file_path)
Let's carefully analyze each line of the code to ensure a clear understanding.
The first three lines of the code import the necessary libraries: NumPy, Open3D, and OpenCV.
NumPy (import numpy as np): A fundamental library for numerical computing in Python. It provides powerful tools for handling arrays, performing mathematical operations, and managing data efficiently, which is essential when working with large datasets like point clouds and images.
Open3D (import open3d as o3d): A specialized library designed for handling 3D data, particularly LiDAR point clouds. It allows us to read, visualize, and manipulate point cloud data, making it a crucial tool for LiDAR-based applications.
OpenCV (import cv2): A widely used computer vision library that helps process and analyze images.
Since we are working with both LiDAR point clouds (which provide depth information in 3D) and camera images (which provide visual details in 2D), these three libraries work together to enable seamless sensor fusion, allowing us to combine depth and visual data for advanced perception tasks.
Next, we define the file path of the LiDAR point cloud with the variable pcd_path. The point cloud file is stored in a .pcd (Point Cloud Data) format, which is commonly used for storing LiDAR data. The function o3d.io.read_point_cloud(pcd_path) reads this file and loads the data into memory as a point cloud object, which we store in the variable ptCloud. This object contains thousands (or even millions) of individual points, each with X, Y, and Z coordinates representing the shape of the environment around the LiDAR sensor. You can think of a point cloud as a collection of tiny dots floating in space, mapping out surfaces like buildings, roads, and obstacles.
Similarly, for the camera image, we define its file path using the variable image_file_path, which points to an image file in .png format. The function cv2.imread(image_file_path) is used to read and load the image into memory. The loaded image is stored in the variable I. OpenCV loads images as numerical matrices, where each pixel has color values (typically in BGR format instead of RGB). This allows us to perform various image processing operations later, such as overlaying LiDAR points onto the image.
At this stage, we have successfully loaded both the LiDAR point cloud, which provides depth information, and the camera image, which provides visual context. This is the foundation for sensor fusion, where we combine data from multiple sensors to create a richer understanding of the environment. The next step will be to define the camera's intrinsic parameters, which will help us correctly map 3D LiDAR points onto the 2D image.
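Before defining those parameters, it is worth confirming that both files were actually found and loaded. The following check is a minimal sketch and assumes the paths defined above point to real files on your machine:
# Sanity check: cv2.imread() returns None on failure, and an unreadable .pcd yields an empty point cloud
if I is None:
    raise FileNotFoundError(f"Could not read image at {image_file_path}")
if not ptCloud.has_points():
    raise FileNotFoundError(f"Could not read point cloud at {pcd_path}")
print("Loaded", len(ptCloud.points), "LiDAR points")
print("Image shape (H, W, C):", I.shape)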
Before we can project LiDAR points onto an image, we need to understand how the camera captures the world. Every camera follows a set of internal parameters, known as intrinsic parameters, which define how 3D points are mapped onto a 2D image.
These include:
Focal Length (fx, fy): Determines how much the camera magnifies objects along the x and y axes.
Principal Point (cx, cy): The optical center of the image, usually close to (but not exactly at) its geometric center.
Skew Factor: Usually zero, as most cameras have rectangular pixels.
These parameters are stored in an intrinsic matrix, which allows us to transform real-world 3D coordinates into 2D image pixels. Now, let's define them in code.
# Intrinsic Camera Parameters
fx, fy = 1063.782, 1055.446 # Focal lengths
cx, cy = 964.992, 601.665 # Optical center (principal point)
skew = 0  # Skew factor (assumed zero)
# Intrinsic matrix
intrinsic_matrix = np.array([[fx, skew, cx],
                             [0,  fy,   cy],
                             [0,  0,    1]])
In this section of the code, we define the intrinsic parameters of the camera, which are essential for projecting real-world 3D points onto a 2D image plane. These parameters help describe how the camera captures the world and how it maps 3D coordinates into the image.
First, we define the focal lengths (fx and fy). These values determine how much the camera scales the scene along the x and y axes. The focal length is measured in pixels and is related to the physical properties of the camera lens. A higher focal length means a more zoomed-in view, while a lower value results in a wider field of view.
Next, we specify the principal point (cx and cy). This represents the optical center of the image, meaning the point where the camera lens ideally focuses light onto the image sensor. In a perfectly calibrated camera, this would be the exact center of the image. However, in real-world cameras, slight misalignments can occur, which is why these values may not always be perfectly centered.
We also define the skew factor (skew = 0). The skew factor is used when the camera sensor’s pixel grid is not perfectly rectangular but slightly slanted. In most modern cameras, the pixels are arranged in a perfect grid, so the skew is usually zero.
Finally, we construct the intrinsic matrix, which combines these parameters into a single 3×3 matrix (the intrinsic_matrix defined in the code above).
This matrix is essential for projecting 3D points into 2D image coordinates. The first row controls how x-coordinates are scaled and shifted, the second row does the same for y-coordinates, and the third row ensures that we are working with homogeneous coordinates (a common representation in computer vision). This intrinsic matrix will be used later to project LiDAR points onto the camera image.
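To make this concrete, here is a small worked example. The 3D point below is made up purely for illustration; it shows how the intrinsic matrix maps a point already expressed in the camera frame to pixel coordinates:
# Hypothetical point 5 m in front of the camera, slightly to the right and below the optical axis
point_cam = np.array([1.0, 0.5, 5.0])            # (X, Y, Z) in meters, camera frame
uvw = intrinsic_matrix @ point_cam               # homogeneous image coordinates [u*Z, v*Z, Z]
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]          # divide by depth Z to get pixel coordinates
print(f"Pixel coordinates: ({u:.1f}, {v:.1f})")  # roughly (1177.7, 707.2) with the values above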
Camera lenses are not perfect—they introduce distortions that affect how images are captured. This distortion must be corrected to accurately align LiDAR data with the camera image.
There are two main types of distortion:
Radial Distortion: Causes straight lines to appear curved.
Tangential Distortion: Occurs due to slight misalignment of the lens, causing image warping.
To correct these effects, we define distortion coefficients, which adjust the image and ensure accurate projections. Now, let's implement this in code.
# Lens Distortion Coefficients
k1, k2 = -0.1279, 0.0466  # Radial distortion coefficients
p1, p2 = 0.0020, -0.0008  # Tangential distortion coefficients
In this section, we define the distortion coefficients, which help correct the imperfections in how a real-world camera captures an image. In an ideal camera, straight lines in the real world remain straight in the image. However, due to the shape and construction of camera lenses, images often suffer from distortion, which must be corrected for accurate sensor fusion.
First, we define radial distortion coefficients (k1 and k2). Radial distortion occurs when light rays bend as they pass through the camera lens, causing straight lines to appear curved. This effect is most noticeable in wide-angle lenses and is often referred to as the "fish-eye" effect. The values of k1 and k2 help mathematically model this distortion so that we can later correct it.
Next, we define tangential distortion coefficients (p1 and p2). Tangential distortion happens when the lens is slightly misaligned with the camera sensor, causing some parts of the image to appear shifted or stretched. These coefficients help compensate for any misalignment in the lens.
By defining these distortion coefficients, we will later be able to undistort the camera image, ensuring that objects are positioned correctly and that LiDAR points align accurately when projected onto the image.
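The distortion correction itself is applied in the next part of this series, but as a preview, the sketch below shows how these coefficients could be passed to OpenCV's undistortion routine. OpenCV expects the coefficient order (k1, k2, p1, p2, k3), and k3 is assumed to be zero here:
# Preview only: build the distortion vector and undistort the camera image
dist_coeffs = np.array([k1, k2, p1, p2, 0.0])  # (k1, k2, p1, p2, k3)
undistorted = cv2.undistort(I, intrinsic_matrix, dist_coeffs)
# cv2.imshow("Undistorted", undistorted)       # uncomment to inspect the result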
Since the LiDAR and camera are mounted separately, we need to define their relative position and orientation. This is done using:
Translation Vector (T): Specifies how far the LiDAR is from the camera.
Rotation Matrix (R): Aligns the LiDAR’s coordinate frame with the camera’s.
These extrinsic parameters allow us to accurately map 3D LiDAR points onto the 2D camera image. Now, let’s implement this in code.
# Extrinsic Parameters (Rotation and Translation)
tx, ty, tz = -0.000354, -0.151385, -0.093420
translation = np.array([tx, ty, tz]).reshape(3, 1)
# Quaternion to Rotation Matrix Conversion
w, x, y, z = 0.000121, 0.003551, -0.707062, 0.707143
R = np.array([
    [1 - 2*(y**2 + z**2), 2*(x*y - z*w),       2*(x*z + y*w)],
    [2*(x*y + z*w),       1 - 2*(x**2 + z**2), 2*(y*z - x*w)],
    [2*(x*z - y*w),       2*(y*z + x*w),       1 - 2*(x**2 + y**2)]
])
# Transformation Matrix (Combining Rotation and Translation)
tform_matrix = np.eye(4)
tform_matrix[:3, :3] = R
tform_matrix[:3, 3] = translation.flatten()
Now that we have defined the intrinsic parameters of the camera, we need to understand how the LiDAR and camera are positioned relative to each other. Since these two sensors are mounted separately, they have different coordinate systems. Extrinsic parameters help us describe the spatial relationship between the LiDAR and the camera, enabling us to correctly map LiDAR points onto the camera image.
We begin by defining the translation vector, represented as:
translation = np.array([-0.000354, -0.151385, -0.093420]).reshape(3, 1)
The translation vector specifies the displacement of the LiDAR sensor relative to the camera in meters. This means that the LiDAR is shifted slightly along the x, y, and z axes with respect to the camera. The .reshape(3,1) ensures that the translation vector is structured as a 3x1 matrix, which is needed for later matrix operations.
Next, we define the rotation matrix using a quaternion representation:
w, x, y, z = 0.000121, 0.003551, -0.707062, 0.707143
Quaternions (w, x, y, z) are a mathematical way to represent rotations in 3D space. They provide a more stable and computationally efficient alternative to Euler angles, avoiding issues like gimbal lock. Using the given quaternion values, we construct a rotation matrix (R), which will be used to align the LiDAR’s coordinate system with the camera’s coordinate system:
R = np.array([
    [1 - 2*(y**2 + z**2), 2*(x*y - z*w),       2*(x*z + y*w)],
    [2*(x*y + z*w),       1 - 2*(x**2 + z**2), 2*(y*z - x*w)],
    [2*(x*z - y*w),       2*(y*z + x*w),       1 - 2*(x**2 + y**2)]
])
This 3x3 rotation matrix is derived from the quaternion and describes how the LiDAR's orientation needs to be adjusted to match the camera's frame of reference. Each element in the matrix is a function of the quaternion values, following standard quaternion-to-matrix conversion formulas.
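A quick way to sanity-check the conversion is to verify that R behaves like a proper rotation matrix, i.e. it is orthonormal and has determinant +1. This is a minimal check; a small tolerance is used because the quaternion values are rounded to six decimal places:
# R should be (approximately) orthonormal with determinant +1
print(np.allclose(R @ R.T, np.eye(3), atol=1e-4))    # True: R times its transpose is ~identity
print(np.isclose(np.linalg.det(R), 1.0, atol=1e-4))  # True: determinant is ~+1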
Now that we have both the translation vector and rotation matrix, we construct the transformation matrix, which combines both:
tform_matrix = np.eye(4)
tform_matrix[:3, :3] = R
tform_matrix[:3, 3] = translation.flatten()
Here, np.eye(4) initializes a 4x4 identity matrix, which serves as the base for our transformation matrix. The upper-left 3x3 submatrix is replaced with R, ensuring that the rotation is applied when transforming points. The last column (excluding the final row) is set to the flattened translation vector, ensuring that the LiDAR points are also shifted correctly.
This transformation matrix (tform_matrix) is essential because it allows us to convert LiDAR points from their own coordinate system into the camera's coordinate system. This step is a crucial part of sensor fusion, as it ensures that both sensors "see" the world from a common reference point.
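As a quick illustration, a single LiDAR point in homogeneous form can be pushed through this matrix. The coordinates below are made up; the real points come from ptCloud in the next step:
# One hypothetical LiDAR point, 2 m ahead of the sensor, in homogeneous form
lidar_point_h = np.array([2.0, 0.0, 1.0, 1.0])  # (X, Y, Z, 1) in the LiDAR frame
camera_point = tform_matrix @ lidar_point_h     # rotated by R, then shifted by the translation
print(camera_point[:3])                         # the same point, now in the camera frame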
After transformation, we project the 3D points into 2D.
To align LiDAR data with the camera image, we:
1️⃣ Transform 3D LiDAR points into the camera’s coordinate frame using extrinsics.
2️⃣ Apply the intrinsic matrix to project them onto the 2D image plane.
3️⃣ Normalize to get pixel coordinates for visualization.
This step bridges 3D perception with 2D imaging, allowing LiDAR data to be overlaid on the camera feed.
# Extract Point Cloud Data
points = np.asarray(ptCloud.points)
num_points = points.shape[0]
projected_points = np.zeros((num_points, 2))

# Project 3D Points onto the 2D Image
for i in range(num_points):
    lidar_point = np.append(points[i, :], 1)        # Convert to homogeneous coordinates
    transformed_point = tform_matrix @ lidar_point  # Apply transformation
    X, Y, Z = transformed_point[:3]                 # Extract transformed coordinates

    if Z > 0:  # Ensure the point is in front of the camera
        u = (fx * X / Z) + cx  # Projected x coordinate
        v = (fy * Y / Z) + cy  # Projected y coordinate
        projected_points[i, :] = [u, v]
    else:
        projected_points[i, :] = [np.nan, np.nan]   # Mark points behind the camera as invalid

# Remove Invalid Points for Display
valid_points = projected_points[~np.isnan(projected_points[:, 0])]
After transforming the LiDAR points from the LiDAR coordinate system into the camera coordinate system, the next step is to project these 3D points onto the 2D image plane. This is crucial because we want to visualize where the LiDAR points appear in the camera's field of view.
First, we extract the 3D LiDAR points from the point cloud:
points = np.asarray(ptCloud.points)
Here, ptCloud.points contains all the 3D points captured by the LiDAR sensor. The function np.asarray() converts it into a NumPy array, allowing us to efficiently perform matrix operations.
num_points = points.shape[0]
projected_points = np.zeros((num_points, 2))
We determine the number of LiDAR points using points.shape[0], which tells us how many rows (i.e., individual points) exist in the dataset. We then initialize an empty array, projected_points, with the same number of rows but only two columns (for 2D pixel coordinates). This array will store the projected (u, v) coordinates of each 3D point.
Now, we loop through each LiDAR point and project it onto the image:
for i in range(num_points):
    lidar_point = np.append(points[i, :], 1)  # Convert to homogeneous coordinates
For every LiDAR point, we append a 1 to convert it into homogeneous coordinates. Homogeneous coordinates allow us to apply transformation matrices easily. Each LiDAR point originally has (X, Y, Z), and by adding 1, we get (X, Y, Z, 1), which makes it compatible with the 4×4 transformation matrix.
transformed_point = tform_matrix @ lidar_point # Apply transformation
Here, we apply the previously computed transformation matrix (tform_matrix) to the LiDAR point. This aligns the LiDAR's coordinate system with the camera's coordinate system, ensuring that the LiDAR points are correctly positioned relative to the camera's viewpoint.
X, Y, Z = transformed_point[:3]
After applying the transformation, we extract the X, Y, and Z coordinates from the transformed point. These represent the 3D location of the LiDAR point in the camera's coordinate system.
Before projecting the point onto the image, we ensure it is in front of the camera:
if Z > 0: # Ensure point is in front of the camera
This check ensures that we only process points with a positive Z value, meaning they are in front of the camera. If Z ≤ 0, the point is behind the camera and should be ignored.
Now, we perform the actual projection from 3D to 2D using the intrinsic camera parameters:
u = (fx * X / Z) + cx # Projected x coordinate
v = (fy * Y / Z) + cy # Projected y coordinate
Here, we use the standard pinhole camera model for projection.
The u coordinate gives the horizontal pixel position, and the v coordinate gives the vertical pixel position.
This formula scales the X and Y values based on the focal lengths (fx and fy) and the depth (Z), then shifts the coordinates by the principal point (cx, cy) to align with the camera's center.
After computing the projected coordinates, we store them:
projected_points[i, :] = [u, v]
This assigns the computed (u, v) pixel location of the LiDAR point to the projected_points array.
If the point was behind the camera (Z ≤ 0), we ignore it by setting its values to NaN:
else:
    projected_points[i, :] = [np.nan, np.nan]  # Ignore points behind the camera
This ensures that invalid points do not interfere with further processing.
Finally, we remove all invalid (NaN) points:
valid_points = projected_points[~np.isnan(projected_points[:, 0])]
Using ~np.isnan(projected_points[:, 0]), we create a mask that keeps only valid u values (i.e., those not equal to NaN). This effectively filters out all LiDAR points that were behind the camera.
SUMMARY: Thus in this step, we transform the LiDAR points into the camera’s coordinate frame and then project them onto the image plane using the camera’s intrinsic parameters. The resulting valid_points array contains the 2D pixel locations of LiDAR points, which can now be plotted on the camera image to visualize how 3D LiDAR data aligns with 2D images.
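As a side note, the same projection can be written without an explicit Python loop by stacking all points into a single matrix operation. The sketch below should reproduce valid_points (up to floating-point rounding) and is usually much faster on large point clouds; the _vec suffixes are introduced here only to avoid clashing with the loop version:
# Vectorized projection: transform and project every point in one shot
points_h = np.hstack([points, np.ones((num_points, 1))])  # (N, 4) homogeneous LiDAR points
cam_pts = (tform_matrix @ points_h.T).T[:, :3]            # (N, 3) points in the camera frame
X_all, Y_all, Z_all = cam_pts[:, 0], cam_pts[:, 1], cam_pts[:, 2]
in_front = Z_all > 0                                      # keep only points in front of the camera
u_vec = fx * X_all[in_front] / Z_all[in_front] + cx
v_vec = fy * Y_all[in_front] / Z_all[in_front] + cy
valid_points_vec = np.column_stack([u_vec, v_vec])        # (M, 2) pixel coordinates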
Finally, we visualize the results by overlaying the projected LiDAR points as red dots on the camera image using OpenCV. The display updates in a loop until the user exits.
# Define Output File Path
output_path = "/home/sum/Downloads/camLidarProjection1-main/result/undistorted/16-59-15.png"

# Overlay Projected Points on Image and Display
while True:
    display_image = I.copy()  # Start each frame from a fresh copy so the original image is never modified

    for pt in valid_points:
        # Draw a small filled red dot at each projected LiDAR point.
        # (int(pt[0]), int(pt[1])) is the pixel position; radius=1 keeps the dot small,
        # and thickness=-1 fills the circle.
        cv2.circle(display_image, (int(pt[0]), int(pt[1])), radius=1, color=(0, 0, 255), thickness=-1)

    cv2.imshow("Projected Points", display_image)
    cv2.imwrite(output_path, display_image)  # Save result
    print(f"Image saved at {output_path}")

    if cv2.waitKey(1) == ord('q'):  # Press 'q' to exit
        break

cv2.destroyAllWindows()
Now that we have projected the 3D LiDAR points onto the 2D image, the next step is to visualize them by overlaying the points on the camera image using OpenCV. This helps us check if the LiDAR points align correctly with the camera’s view.
We start with a continuous loop, while True.
This loop runs indefinitely, continuously updating and displaying the image with the projected points. The loop only stops when the user presses the 'q' key to exit.
Next, we ensure that the original image remains unchanged by creating a copy of it:
display_image = I.copy()
This ensures that every iteration of the loop starts with a fresh copy of the image before overlaying the LiDAR points.
Now, we iterate over each valid projected LiDAR point and draw it on the image:
for pt in valid_points:
This loops through all valid (u, v) pixel coordinates of LiDAR points stored in valid_points.
For each projected point, we use OpenCV’s cv2.circle() function to draw a small red dot on the image:
cv2.circle(display_image, (int(pt[0]), int(pt[1])), radius=1, color=(0, 0, 255), thickness=-1)
Here’s what this function does:
Converts floating-point coordinates to integers using int(pt[0]) and int(pt[1]), as OpenCV requires integer pixel values.
Uses color=(0, 0, 255), which represents red in OpenCV’s BGR format.
Sets thickness=-1 to make the circle completely filled rather than just an outline.
Once all points are drawn, we display the updated image:
cv2.imshow("Projected Points", display_image)
This function opens a window titled "Projected Points" and continuously updates it with the latest image containing the projected LiDAR points.
To save the processed image, we use:
cv2.imwrite(output_path, display_image)
This writes the updated image to the specified output_path. A message print(f"Image saved at {output_path}") is also printed to confirm that the image has been saved.
To check if the user wants to exit, we call cv2.waitKey(1). This function waits one millisecond for a key press. If the user presses 'q', detected by comparing the returned key code with ord('q'), the loop breaks, stopping the visualization.
Finally, once the loop exits, we clean up by closing all OpenCV windows using:
cv2.destroyAllWindows()
This ensures that no unnecessary windows remain open and frees up system resources.
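If you need to run the script on a headless machine (for example over SSH, where cv2.imshow() cannot open a window), a simpler non-interactive variant is to draw the points once and save the result directly. This is a minimal sketch reusing the same valid_points and output_path as above:
# Headless variant: overlay the points once and write the image to disk, no display loop
overlay = I.copy()
for pt in valid_points:
    cv2.circle(overlay, (int(pt[0]), int(pt[1])), radius=1, color=(0, 0, 255), thickness=-1)
cv2.imwrite(output_path, overlay)
print(f"Image saved at {output_path}")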
In this tutorial, we successfully projected 3D LiDAR points onto a 2D image using Open3D, OpenCV, and NumPy, demonstrating a key step in sensor fusion. By leveraging intrinsic and extrinsic calibrations, we integrated LiDAR depth information with camera visuals—crucial for autonomous navigation, object detection, and environment mapping.
Next, we’ll refine the projection by correcting lens distortion to improve accuracy and enhance the alignment between LiDAR and camera data.