Introduction
In the world of autonomous navigation, robotics, and machine vision, combining data from different sensors is a crucial step to build systems that can "see" and understand their environment. Two of the most important sensors used in these fields are LiDAR and cameras.
LiDAR (Light Detection and Ranging) provides highly accurate 3D depth information, creating a detailed map of the surroundings in the form of point clouds. Think of it as a way to measure distances and shapes in 3D space.
Cameras, on the other hand, capture rich 2D images with color and texture details, which help in recognizing objects, patterns, and scenes.
While both sensors are powerful on their own, combining their data can unlock even greater potential. For example, by projecting LiDAR’s 3D points onto a camera’s 2D image, we can create a unified representation that shows both the depth and visual appearance of the environment. This process is called LiDAR-Camera projection, and it’s essential for tasks like object detection, 3D mapping, and autonomous driving.
In this tutorial, we'll focus on projecting 3D points from an OT128 LiDAR sensor onto images captured by a HIK MV-CA023-10GC camera. A key challenge in this process is dealing with lens distortion, a common issue where the camera's lens bends light in a way that distorts the image. To handle this, we'll walk through two implementations of the projection pipeline that differ in how flexibly they handle calibration: a static approach and a dynamic approach.
The main difference between the two codes lies in how they handle camera calibration parameters:
StaticCalibrationCode uses hardcoded values for the camera’s intrinsic (e.g., focal length, optical center) and extrinsic (e.g., rotation, translation) parameters. While this approach is simple, it’s not very flexible—any change in the camera setup or calibration requires manually updating the code.
DynamicCalibrationCode, on the other hand, loads these parameters dynamically from a JSON file. This makes the code more adaptable and easier to use in different scenarios, as you can update the calibration file without modifying the code itself.
Understanding these two approaches highlights the advantages of dynamically handling calibration parameters. The dynamic approach is not only more flexible but also better suited for real-world applications where sensor setups and calibrations frequently change.
By the end of this tutorial, you will be able to:
Load and process LiDAR point clouds using Open3D.
Load and process camera images using OpenCV.
Understand camera calibration and lens distortion correction.
Differentiate between static and dynamic calibration approaches.
Apply camera intrinsic and extrinsic parameters for sensor fusion.
Convert 3D LiDAR points into 2D image coordinates.
Visualize projected LiDAR points on undistorted images.
To follow this tutorial, you should have:
Basic knowledge of Python, linear algebra, and computer vision.
Familiarity with LiDAR data and point cloud processing.
Experience with OpenCV, Open3D, NumPy, SciPy, and JSON.
A working Python environment with required libraries installed (pip install opencv-python open3d numpy scipy).
Step 1: Importing Required Libraries
To start, we need to import several essential Python libraries that will help us process both LiDAR and camera data. Open3D is used to handle and manipulate LiDAR point clouds, making it a key tool for working with 3D data. OpenCV is included for image processing tasks, such as loading, undistorting, and displaying images. SciPy's Rotation module is necessary for quaternion-based transformations, which allow us to convert rotation data into a usable format. NumPy provides support for numerical operations, making it easier to manipulate matrices and perform mathematical computations. Lastly, JSON is imported to handle configuration files in the dynamic approach, allowing us to load camera calibration parameters from an external file instead of hardcoding them.
import numpy as np
import open3d as o3d
import cv2
import json
from scipy.spatial.transform import Rotation as R
The first line, import numpy as np, imports NumPy, which we use extensively for handling numerical operations, such as matrix manipulations and mathematical computations needed for transformations. Next, import open3d as o3d brings in Open3D, a library specifically designed for 3D data processing, allowing us to load, visualize, and manipulate LiDAR point clouds. The line import cv2 imports OpenCV, a powerful library for computer vision tasks, which we use for image loading, distortion correction, and visualization. The import json statement is essential for reading calibration parameters from a JSON file in the dynamic approach, making our implementation more flexible and adaptable to different sensor configurations. Finally, from scipy.spatial.transform import Rotation as R imports SciPy’s Rotation module, which allows us to work with quaternion-based rotations, making it easier to convert LiDAR data into the camera coordinate frame.
By importing these libraries, we ensure that our code is equipped with the necessary tools for sensor fusion, making it efficient and adaptable for both static and dynamic calibration approaches.
Step 2: Loading the LiDAR Point Cloud
To process and visualize 3D spatial information, we need to load the LiDAR point cloud data. A point cloud is a collection of 3D points that represent the physical environment around the LiDAR sensor. Each point in the cloud has coordinates (X, Y, Z) that define its position in space. We use Open3D, a popular library for working with 3D data, to read and manipulate the point cloud file. The file is stored in the PCD (Point Cloud Data) format, a standard format for LiDAR data storage.
pcd_path = r'/path/to/pointcloud.pcd' # Adjust this path
ptCloud = o3d.io.read_point_cloud(pcd_path)
The first line, pcd_path = r'/path/to/pointcloud.pcd', specifies the path to the LiDAR point cloud file. This is where the .pcd file is stored on your system. The r before the string indicates a "raw" string, ensuring that special characters like backslashes (\) in file paths are treated literally. You need to replace '/path/to/pointcloud.pcd' with the actual path to your LiDAR data file.
The second line, ptCloud = o3d.io.read_point_cloud(pcd_path), reads the point cloud file using Open3D’s read_point_cloud function. This function loads the 3D data into memory and stores it in the variable ptCloud. From here, we can process, visualize, and manipulate the point cloud data as needed.
By successfully loading the LiDAR point cloud, we now have a 3D representation of the environment that can be used for sensor fusion, 3D mapping, and projection onto the camera image.
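Before moving on, it is worth verifying that the file was actually found and contains data; Open3D returns an empty point cloud rather than raising an error when the path is wrong. The following optional check is a minimal sketch, not part of the original pipeline:
# Optional sanity check: an empty point cloud usually means a wrong path or an unsupported file
if not ptCloud.has_points():
    raise FileNotFoundError(f"No points loaded from {pcd_path} - check the path and file format")
print(f"Loaded {len(ptCloud.points)} LiDAR points")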
Step 3: Loading the Camera Image
To integrate LiDAR data with camera images, we first need to load the image captured by the camera. This image provides color and texture information that complements the 3D spatial data from the LiDAR. We use OpenCV, a powerful computer vision library, to read the image from a file. If the image file is missing or the path is incorrect, OpenCV will fail to load the image, which could lead to errors in later processing steps.
image_file_path = r'/path/to/image.png' # Adjust this path
I = cv2.imread(image_file_path)
The first line, image_file_path = r'/path/to/image.png', defines the path to the image file. Just like with the LiDAR data, the r before the string ensures that special characters in the file path are interpreted correctly. You need to replace '/path/to/image.png' with the actual location of your image file.
The second line, I = cv2.imread(image_file_path), loads the image from the specified path using OpenCV’s imread function. This function reads the image and stores it as a NumPy array in the variable I. If the image is successfully loaded, I will contain pixel values; otherwise, it will be None, indicating that the file could not be found or opened.
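Since a failed load silently produces None rather than an exception, a small guard (shown here as an optional sketch) can save debugging time later:
# Guard against a silent load failure: cv2.imread returns None instead of raising an error
if I is None:
    raise FileNotFoundError(f"Could not read image at {image_file_path}")
print(f"Image loaded with shape {I.shape}")  # (height, width, channels)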
Loading the camera image is essential for sensor fusion because we will later project the 3D LiDAR points onto this 2D image. This step ensures that we have the necessary visual data for overlaying depth information and improving perception in robotics and autonomous navigation.
Step 4: Defining the Camera Intrinsic Parameters
Camera intrinsic parameters define how a camera maps 3D points in the real world to 2D pixel coordinates in an image. These parameters include the focal length (fx, fy), which determines how much the camera magnifies the scene, and the principal point (cx, cy), which represents the optical center of the image. Additionally, cameras introduce distortions due to their lenses, so we also need distortion coefficients (k1, k2, p1, p2) to correct radial and tangential distortions. Another important parameter is the skew factor, which corrects any slight shearing effects caused by misalignment between the x and y axes of the image. These intrinsic parameters are essential for accurately projecting LiDAR points onto the image, ensuring proper alignment between the 3D and 2D data.
In the static approach, these values are manually defined in the script. While this method is simple, it lacks flexibility because any change in the camera setup requires modifying the code. The static intrinsic parameters are defined as follows:
Code Implementation (Static Approach)
cx = 964.99
cy = 601.66
fx = 1063.78
fy = 1055.44
k1 = -0.1279
k2 = 0.0466
p1 = 0.0020
p2 = -0.0008
skew = 1.22
First, cx and cy represent the optical center of the image. This is the point where the principal axis of the camera intersects the image plane. A camera’s principal point is usually near the center of the image but may vary slightly due to lens imperfections. Next, fx and fy define the focal length along the x and y axes, which determines how much the camera magnifies the real-world scene. These values are measured in pixels, meaning they define how many pixels correspond to a unit distance in the real world.
Since camera lenses are not perfect, they introduce distortions that can bend or shift image points. The coefficients k1 and k2 account for radial distortion, which causes straight lines near the edges of an image to appear curved. Meanwhile, p1 and p2 handle tangential distortion, which occurs when the camera lens is slightly misaligned, causing parts of the image to shift. Lastly, the skew parameter compensates for any slight tilt between the x and y axes of the image sensor, ensuring that pixels remain correctly proportioned.
To make this process more adaptable, we can use the dynamic approach, which reads these parameters from a JSON file instead of hardcoding them. This allows for greater flexibility because if the camera is recalibrated, we only need to update the JSON file rather than modifying the code. This approach is particularly useful in real-world applications where multiple cameras or different configurations may be used. The dynamic approach extracts parameters using the following code:
Code Implementation (Dynamic Approach)
with open('camera_lidar_calibration.json', 'r') as file:
    calibration_data = json.load(file)
intrinsic = calibration_data['01_camera']['3_intrinsic']
cx = intrinsic['cx']
cy = intrinsic['cy']
fx = intrinsic['fx']
fy = intrinsic['fy']
k1 = intrinsic['k1']
k2 = intrinsic['k2']
k3 = intrinsic['k3']
k4 = intrinsic['k4']
k5 = intrinsic['k5']
k6 = intrinsic['k6']
p1 = intrinsic['p1']
p2 = intrinsic['p2']
skew = intrinsic['skew']
Breaking It Down: Line-by-Line Explanation
In this approach, the script first opens and reads the JSON file, which contains precomputed calibration parameters. The json.load(file) function loads this data into a dictionary called calibration_data. From this dictionary, we extract the intrinsic parameters from the section labeled "3_intrinsic" inside "01_camera". The extracted values are then assigned to the corresponding variables cx, cy, fx, fy, the distortion coefficients k1 through k6, p1, p2, and skew.
One major advantage of this approach is that it supports additional distortion coefficients (k3, k4, k5, k6) if needed, making it more robust for different types of camera models. If a different camera is used or if the lens properties change, we simply update the JSON file, and the script will automatically use the new parameters without requiring manual edits.
By implementing this dynamic approach, we significantly improve the scalability and maintainability of our system. This method ensures that our camera calibration remains adaptable to different environments and configurations, making it easier to integrate with LiDAR data in real-world applications.
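For reference, the code above assumes a calibration file organized under a "01_camera" key with "3_intrinsic" (and, as we will see later, "4_extrinsic") sections. The snippet below is a minimal sketch that writes such a template; the numbers simply mirror the static values from earlier and are placeholders, not real calibration results.
# Minimal sketch: write a template calibration file with the key layout assumed above.
# The numeric values are placeholders (copied from the static example), not real calibration results.
template = {
    "01_camera": {
        "3_intrinsic": {
            "cx": 964.99, "cy": 601.66, "fx": 1063.78, "fy": 1055.44,
            "k1": -0.1279, "k2": 0.0466, "k3": 0.0, "k4": 0.0, "k5": 0.0, "k6": 0.0,
            "p1": 0.0020, "p2": -0.0008, "skew": 1.22
        },
        "4_extrinsic": {
            "tx": -0.00035, "ty": -0.1513, "tz": -0.0934,
            "w": 0.00012, "x": 0.00355, "y": -0.70706, "z": 0.70714
        }
    }
}
with open('camera_lidar_calibration.json', 'w') as file:
    json.dump(template, file, indent=4)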
Step 5: Building the Camera Intrinsic Matrix
The camera intrinsic matrix is a key component in computer vision and 3D reconstruction. It defines how a camera projects 3D points from the real world onto a 2D image plane. This matrix is built using the intrinsic parameters extracted in the previous step, including the focal lengths (fx, fy), the principal point (cx, cy), and the skew factor, which accounts for any slight tilt between the image axes. Understanding this matrix is essential for camera calibration, perspective projection, and 3D reconstruction, as it allows us to align LiDAR points with the camera image accurately.
Code Implementation
intrinsic_matrix = np.array([[fx, skew, cx],
                             [0, fy, cy],
                             [0, 0, 1]])
This 3×3 matrix plays a critical role in converting 3D coordinates into 2D image coordinates. Let's analyze it step by step:
The first row [fx, skew, cx] contains three important values.
fx is the focal length in the x-direction. It defines how much the camera magnifies objects along the horizontal axis.
skew is a factor that accounts for any non-orthogonality (tilt) between the x and y axes. In most modern cameras, the skew is zero, meaning the image axes are perfectly perpendicular, but it is included for completeness.
cx is the x-coordinate of the principal point, which represents the optical center of the camera in the image plane.
The second row [0, fy, cy] handles the parameters for the y-direction.
The first value (0) is fixed in the standard pinhole model; any shear between the image axes is captured entirely by the skew term in the first row.
fy is the focal length in the y-direction, defining the vertical magnification of the image.
cy is the y-coordinate of the principal point, representing the vertical center of the image.
The third row [0, 0, 1] is a fixed row used in homogeneous coordinate transformations.
This row is required for proper matrix operations when working with projective geometry.
It ensures that the matrix can be used in camera calibration and perspective projection equations.
By assembling all these values, the intrinsic matrix enables accurate projection of 3D points onto the 2D image. This is particularly useful when aligning LiDAR point clouds with camera images in sensor fusion applications, ensuring that depth information from LiDAR and color information from the camera are properly matched. This matrix is fundamental for tasks like augmented reality, object detection, and 3D mapping, making it an essential concept in computer vision.
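To make this projection concrete, the short sketch below pushes one hypothetical camera-frame point through the intrinsic matrix; the point coordinates are made up purely for illustration:
# Project a single hypothetical camera-frame point (X, Y, Z) to pixel coordinates
point_cam = np.array([1.0, 0.5, 4.0])      # metres in the camera frame (illustrative values)
uvw = intrinsic_matrix @ point_cam         # homogeneous image coordinates (u*w, v*w, w)
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]    # divide by the depth term to get pixel coordinates
print(f"Pixel coordinates: ({u:.1f}, {v:.1f})")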
Step 6: Defining the Camera Extrinsic Parameters
The camera extrinsic parameters define the spatial relationship between the LiDAR and the camera, essentially specifying how their coordinate systems align. These parameters consist of two main components: the translation vector and the rotation quaternion. The translation vector (tx, ty, tz) represents how much the camera is shifted in relation to the LiDAR in the x, y, and z directions. Meanwhile, the rotation quaternion (w, x, y, z) describes how the camera is oriented relative to the LiDAR, allowing us to correctly align their coordinate frames. By applying these extrinsic parameters, we can transform LiDAR points into the camera’s reference frame, enabling accurate sensor fusion for applications like 3D reconstruction, robotics, and autonomous vehicles.
Code Implementation (Static Approach)
In the static approach, we define the extrinsic parameters manually within the script:
tx = -0.00035
ty = -0.1513
tz = -0.0934
Translation = np.array([tx, ty, tz]).reshape(3, 1)
w = 0.00012
x = 0.00355
y = -0.70706
z = 0.70714
Let’s analyze the code step by step. The translation vector (tx, ty, tz) determines how much the camera is offset from the LiDAR along the three spatial axes. We store these values in a NumPy array and reshape it into a 3×1 column matrix using reshape(3,1), ensuring compatibility with transformation operations. Without this reshaping, certain mathematical operations required for coordinate transformations might not work correctly.
Next, we define the rotation quaternion (w, x, y, z), which describes how the camera is rotated relative to the LiDAR. Unlike rotation matrices, quaternions are commonly used in robotics and computer vision because they eliminate problems like gimbal lock, which can cause issues in 3D transformations. This quaternion helps us correctly align the LiDAR and camera coordinate frames, ensuring that 3D LiDAR points are projected accurately onto the camera’s 2D image. While this static approach is straightforward, it has a major drawback—if the calibration setup changes (for example, if the camera or LiDAR is repositioned), we need to manually update these values in the script, which is not ideal for real-world applications.
Code Implementation (Dynamic Approach)
To overcome the limitations of the static approach, we use a dynamic method where extrinsic parameters are loaded from a JSON file instead of being hardcoded. The implementation looks like this:
extrinsic = calibration_data['01_camera']['4_extrinsic']
tx = extrinsic['tx']
ty = extrinsic['ty']
tz = extrinsic['tz']
Translation = np.array([tx, ty, tz]).reshape(3, 1)
w = extrinsic['w']
x = extrinsic['x']
y = extrinsic['y']
z = extrinsic['z']
In this approach, instead of manually defining the values, we extract them dynamically from a camera-LiDAR calibration file. This makes the system much more scalable and adaptable to different setups. First, we load the extrinsic parameters from the calibration_data dictionary, which stores precomputed values from a calibration process. The translation vector (tx, ty, tz) is extracted and reshaped into a 3×1 matrix, just like in the static approach, to ensure compatibility with transformation operations. Similarly, the rotation quaternion (w, x, y, z) is retrieved from the file, ensuring that the most up-to-date rotation data is always used.
This dynamic approach offers several advantages over the static method. If a new camera or LiDAR setup is introduced, we do not need to modify the script manually—we only need to update the calibration file. This makes the system much more flexible and robust, especially in scenarios where multiple cameras and LiDAR sensors are used.
Extrinsic parameters are a critical component in sensor fusion, where data from multiple sensors—such as LiDAR and cameras—need to be combined. By applying these parameters, we can accurately project 3D LiDAR points onto the 2D camera image, allowing us to align depth information with visual data. This is essential in applications such as autonomous driving, robotics, augmented reality, and 3D scene reconstruction. Additionally, using a dynamic approach enables seamless adaptation to different camera and LiDAR configurations, eliminating the need for frequent manual recalibrations.
Step 7: Computing the Rotation Matrix from the Quaternion
Now that we have extracted the extrinsic parameters, the next crucial step is to compute the rotation matrix, which is essential for transforming LiDAR points into the camera’s coordinate system. Since the LiDAR and the camera are mounted at different angles and positions, their coordinate frames do not naturally align. To properly overlay the LiDAR points onto the camera image, we need to rotate the LiDAR coordinate system so that it matches the camera’s frame of reference.
Rather than working directly with a rotation matrix, many calibration systems prefer using quaternions—a four-number representation (w, x, y, z) that is more stable for 3D rotations. Compared to Euler angles, quaternions eliminate gimbal lock issues and provide a smooth and continuous way to represent orientation. However, most mathematical operations in sensor fusion require a 3×3 rotation matrix, so before applying transformations to the LiDAR data, we must convert the quaternion into a rotation matrix. Fortunately, in Python, this conversion is simple using the SciPy Rotation module, which enables seamless transformation between different rotation representations.
Code Implementation
rotation = R.from_quat([x, y, z, w])
Rotation = rotation.as_matrix()
To understand how this code works, let’s break it down step by step.
The first line creates a rotation object from the given quaternion values. The function R.from_quat() belongs to SciPy’s Rotation module, which provides various utilities for handling 3D rotations. It takes in a list of four values (x, y, z, w), the scalar-last quaternion convention that SciPy expects, and converts them into an internal representation that SciPy can work with. Essentially, this tells SciPy:
"Here is a quaternion. Interpret it as a 3D rotation."
At this stage, the variable rotation holds an abstract representation of the orientation of the LiDAR relative to the camera, but it is not yet in a matrix form that can be applied directly to transform LiDAR points.
To make this rotation usable, we need to extract the corresponding 3×3 rotation matrix. That’s where the second line comes in:
The .as_matrix() function converts the rotation object into a 3×3 rotation matrix, which is the standard form used for transformations in computer vision and robotics. This matrix represents how points in the LiDAR coordinate system should be rotated to align with the camera’s coordinate system.
Think of it this way:
Before calling .as_matrix(), we just have an understanding of how the LiDAR is rotated relative to the camera (stored in rotation).
After calling .as_matrix(), we now have an actual numerical representation of that rotation, which can be used for matrix operations.
Without computing this rotation matrix, the LiDAR points would remain in their original coordinate system, which does not match the camera’s view. This misalignment would lead to incorrect projections of LiDAR points onto the camera image, making sensor fusion inaccurate.
By computing the rotation matrix, we ensure that the LiDAR data is properly rotated into the camera's frame, enabling precise alignment between the two sensors. This step is particularly crucial for real-world applications such as autonomous vehicles, robotics, and augmented reality, where accurate fusion of LiDAR and camera data is necessary for perception and decision-making.
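As an optional sanity check (not part of the original code), you can confirm that the conversion produced a proper rotation matrix, which must be orthonormal with determinant +1:
# A valid rotation matrix satisfies R @ R.T = I and det(R) = +1
assert np.allclose(Rotation @ Rotation.T, np.eye(3), atol=1e-6)
assert np.isclose(np.linalg.det(Rotation), 1.0, atol=1e-6)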
Step 8: Constructing the Transformation Matrix
Now that we have computed the rotation matrix, the next step is to construct the transformation matrix, which allows us to convert points from the LiDAR coordinate system to the camera coordinate system. This matrix is essential because LiDAR and cameras have different frames of reference, meaning their coordinate systems do not naturally align.
To perform an accurate transformation, we need to combine two key components:
Rotation Matrix (R) – This represents how the LiDAR coordinate system must be rotated to match the camera’s orientation.
Translation Vector (T) – This accounts for the positional difference between the LiDAR and the camera.
Together, these components form a 4×4 transformation matrix, which is the standard way of representing 3D transformations in computer vision and robotics. By applying this transformation matrix, we can convert any LiDAR point into the camera’s frame, enabling proper sensor fusion.
Code Implementation
Transformation_Matrix = np.hstack((Rotation, Translation))
Transformation_Matrix = np.vstack((Transformation_Matrix, [0, 0, 0, 1]))
Let’s break this down to understand exactly what’s happening.
The first line takes the rotation matrix and the translation vector and stacks them horizontally to form a 3×4 matrix.
Rotation is a 3×3 matrix that defines how the LiDAR coordinate system is rotated relative to the camera.
Translation is a 3×1 vector, which defines how the LiDAR is shifted relative to the camera in 3D space.
np.hstack() stands for horizontal stacking, meaning we place the translation vector next to the rotation matrix.
At this stage, the transformation matrix looks like this:
[ r11  r12  r13  tx ]
[ r21  r22  r23  ty ]
[ r31  r32  r33  tz ]
Here, rij are the entries of the 3×3 rotation matrix and tx, ty, tz are the components of the translation vector.
This 3×4 matrix can transform 3D points from the LiDAR frame to the camera frame, but it is still incomplete because we need a homogeneous transformation matrix for proper mathematical operations in 3D space.
To convert it into a 4×4 homogeneous transformation matrix, we use the second line:
np.vstack() stands for vertical stacking, meaning we add a new row to the bottom of the existing 3×4 matrix.
The row [0, 0, 0, 1] is required to make it a homogeneous transformation matrix, ensuring that it can be used in matrix multiplication for 3D transformations.
Now, the final 4×4 transformation matrix looks like this:
[ r11  r12  r13  tx ]
[ r21  r22  r23  ty ]
[ r31  r32  r33  tz ]
[  0    0    0    1 ]
Constructing this 4×4 transformation matrix is a crucial step in sensor fusion. It allows us to:
Transform any LiDAR point into the camera’s frame, enabling accurate depth alignment in images.
Work with homogeneous coordinates, which makes mathematical transformations (such as translation and rotation) easier.
Use a single matrix multiplication to convert LiDAR data into the camera’s reference frame efficiently.
Without this transformation matrix, LiDAR points would remain in their original coordinate system, causing misalignment with the camera data. By computing this matrix, we ensure that the LiDAR data is properly aligned with the camera, making it possible to accurately overlay LiDAR depth information onto images for applications like autonomous vehicles, robotics, and augmented reality.
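A quick way to convince yourself the matrix is assembled correctly (an optional check, not part of the original code) is to transform the LiDAR origin: rotation leaves the origin unchanged, so it should land exactly on the translation vector.
# The LiDAR origin, in homogeneous coordinates, should map onto the translation vector
lidar_origin = np.array([0.0, 0.0, 0.0, 1.0])
origin_in_camera = Transformation_Matrix @ lidar_origin
assert np.allclose(origin_in_camera[:3], Translation.flatten())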
Step 9: Undistorting the Camera Image
Since camera lenses introduce distortion, the images captured by the camera may not represent the real-world scene accurately. This distortion occurs due to the curvature of the lens, especially in wide-angle and fisheye lenses, causing straight lines to appear bent and objects near the edges to be stretched or compressed. To ensure accurate sensor fusion, we must first correct these distortions and obtain an undistorted image before projecting LiDAR points onto it.
To achieve this, we use OpenCV’s cv2.undistort() function, which corrects distortions using two key parameters:
Camera Intrinsic Matrix (Camera Matrix) – This defines the intrinsic parameters of the camera, including focal length and optical center.
Distortion Coefficients – These values describe the distortion introduced by the lens and help to remove it.
After applying the undistortion process, the image appears as if it were captured with an ideal pinhole camera, ensuring that any projected LiDAR points align correctly with the scene.
Code Implementation
camera_matrix = np.array([[fx, skew, cx],
                          [0, fy, cy],
                          [0, 0, 1]])
dist_coeffs = np.array([k1, k2, p1, p2, 0]) # Higher-order coefficients are optional
undistorted_image = cv2.undistort(I, camera_matrix, dist_coeffs)
To understand how the code works, let’s go through it step by step. The first part of the code defines the camera intrinsic matrix, which is a 3×3 matrix representing the internal properties of the camera. This matrix contains essential parameters such as the focal lengths (fx and fy) in the x and y directions, which determine how much the camera magnifies the image along each axis. It also includes cx and cy, which specify the optical center (principal point), the location where light rays converge on the camera sensor. Additionally, there is a skew term, which is usually zero unless the x and y axes of the image sensor are not perfectly perpendicular. This matrix is fundamental because it defines the geometric relationship between the camera sensor and the captured image, allowing us to correctly map 3D points into 2D image space.
Next, we define the distortion coefficients, which describe how much the camera lens distorts the image. Lenses, especially wide-angle ones, introduce distortions that can cause straight lines to appear curved and objects near the edges to be stretched or compressed. The coefficients k1 and k2 represent radial distortion, which primarily affects how straight lines appear curved in the image. The parameters p1 and p2 account for tangential distortion, which occurs when the lens is slightly misaligned with the camera sensor, causing the image to shift unevenly. Additionally, a higher-order term, k3, can be used for extreme distortions, though it is not always necessary.
Finally, we apply the undistortion function using OpenCV’s cv2.undistort(). This function takes the original distorted image I, the camera matrix containing the intrinsic parameters, and the distortion coefficients that define the lens-induced distortion. The function processes the image and outputs an undistorted version, where the distortions have been corrected. The result is an image that appears as if it were captured with an ideal pinhole camera, ensuring that straight lines remain straight and that objects are represented with correct proportions. By performing this correction, we make sure that any projected LiDAR points align correctly with the image, which is crucial for accurate sensor fusion in applications such as autonomous vehicles, robotics, and augmented reality.
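If you are working with the dynamic calibration file, which also provides k3 through k6, OpenCV accepts an eight-element coefficient vector in the order (k1, k2, p1, p2, k3, k4, k5, k6); a variant of the call above, under that assumption, would be:
# Dynamic-calibration variant: pass the full eight-coefficient set (rational distortion model)
dist_coeffs_full = np.array([k1, k2, p1, p2, k3, k4, k5, k6])
undistorted_image = cv2.undistort(I, camera_matrix, dist_coeffs_full)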
Correcting image distortion is critical for accurate sensor fusion. If we attempt to project LiDAR points onto a distorted image, the points will not align correctly with real-world objects. By undistorting the image first:
Projected LiDAR points align accurately with objects in the image.
Geometric measurements (such as distances and object sizes) remain correct.
Computer vision algorithms (like object detection and feature matching) work more reliably.
This step ensures that we have a clear, geometrically accurate image, making the subsequent fusion of LiDAR and camera data much more precise, especially in autonomous vehicles, robotics, and augmented reality applications.
Step 10: Transforming LiDAR Points to the Camera Frame
LiDAR sensors capture the surrounding environment in 3D point clouds, where each point represents a real-world position in space relative to the LiDAR coordinate system. However, since cameras capture 2D images, we need to align the LiDAR points with the camera’s frame before projecting them onto the image. This step involves transforming the LiDAR points from the LiDAR coordinate system to the camera coordinate system using the transformation matrix we built earlier.
The transformation matrix consists of two components:
Rotation matrix: Aligns the orientation of the LiDAR points to match the camera’s viewpoint.
Translation vector: Shifts the LiDAR points so that their positions are expressed relative to the camera instead of the LiDAR.
By applying this transformation, we ensure that each LiDAR point is expressed in the same coordinate frame as the camera, enabling accurate sensor fusion.
pcd_points = np.asarray(ptCloud.points)
num_points = pcd_points.shape[0]
# Convert to homogeneous coordinates
homogeneous_points = np.hstack((pcd_points, np.ones((num_points, 1))))
# Apply transformation
transformed_points = Transformation_Matrix @ homogeneous_points.T
To better understand how this transformation works, let’s break it down step by step.
The first line extracts the 3D LiDAR points from the point cloud object. The variable pcd_points stores the x, y, and z coordinates of each point in a NumPy array, making it easier to manipulate mathematically. The num_points variable simply counts the total number of points in the LiDAR scan, which is useful for structuring the data in later steps.
Since we need to apply a rigid body transformation (a combination of rotation and translation), we must first convert the LiDAR points to homogeneous coordinates. Homogeneous coordinates are a mathematical representation that adds an extra fourth dimension to each point. This is done using np.hstack(), which concatenates a column of ones to the pcd_points array. As a result, each LiDAR point is now represented as (x, y, z, 1), rather than just (x, y, z). This additional 1 allows us to apply the rotation and translation in a single matrix multiplication, making the transformation more efficient.
Once the points are in homogeneous coordinates, we apply the transformation matrix. The transformation matrix, which we computed earlier, contains both the rotation matrix and the translation vector. By multiplying this matrix with the homogeneous LiDAR points, we effectively rotate and shift the LiDAR points so that they are expressed in the camera’s coordinate frame instead of the LiDAR’s.
The result of this operation, stored in transformed_points, contains the new (x, y, z) positions of each LiDAR point, now aligned with the camera’s viewpoint.
This transformation step is crucial for sensor fusion because cameras and LiDAR sensors capture the environment from different viewpoints. Without aligning the LiDAR points to the camera’s frame, they cannot be accurately overlaid onto the image.
By transforming the points into the camera’s coordinate system, we ensure that:
LiDAR depth data correctly aligns with the camera image, making it possible to overlay 3D information onto the 2D scene.
Object detection and recognition models can utilize both visual and depth data, improving perception accuracy.
Real-time autonomous systems, such as self-driving cars and robots, can make better decisions based on an integrated understanding of their surroundings.
This step is a fundamental part of sensor fusion in computer vision and is widely used in applications like autonomous vehicles, robotic navigation, and augmented reality, where precise environmental understanding is essential.
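One practical detail worth adding before the projection step: LiDAR points that end up behind the camera (with a non-positive Z in the camera frame) would still produce pixel coordinates and could show up as spurious dots. A small filtering sketch, not shown in the original code, is:
# Keep only points in front of the camera (positive depth) before projecting them
in_front = transformed_points[2, :] > 0
transformed_points = transformed_points[:, in_front]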
Step 11: Projecting LiDAR Points onto the Image Plane
After transforming the LiDAR points into the camera coordinate system in the previous step, we now need to project them onto the 2D image plane. Since LiDAR provides 3D spatial information while the camera captures 2D images, we need to establish a mathematical transformation to map the 3D LiDAR coordinates (X, Y, Z) in the camera frame to 2D pixel coordinates (u, v) in the image.
This transformation is achieved using the camera intrinsic matrix, which encodes the camera’s focal length and optical center. The intrinsic matrix transforms 3D points from the camera frame into the image plane’s coordinate system, ensuring that each LiDAR point is assigned a pixel location on the image.
This step ensures that each 3D LiDAR point is accurately mapped to a pixel in the image, allowing us to overlay depth information onto the visual scene.
Code Implementation
projected_points = intrinsic_matrix @ transformed_points[:3, :]
# Normalize by dividing by the third row to get (x, y) pixel coordinates
projected_points /= projected_points[2, :]
The first step in the code involves multiplying the intrinsic matrix with the transformed LiDAR points. The intrinsic matrix is a 3×3 matrix that defines the camera’s internal properties, including focal lengths and the principal point. The transformed_points[:3, :] extracts only the X, Y, and Z coordinates from the LiDAR points that have already been converted into the camera’s coordinate frame. By performing this matrix multiplication, we transform these 3D points into a new space where they are represented in homogeneous image coordinates (u, v, w).
At this stage, the points are not yet in standard pixel format. The result includes a third coordinate (w), which accounts for the depth of each point relative to the camera. To convert these into valid 2D pixel coordinates, we perform a normalization step. This is done by dividing each point’s x and y values by its corresponding w value. The operation ensures that the points are scaled correctly and account for perspective distortion, meaning that objects farther from the camera appear smaller in the image.
By completing this step, the LiDAR points are now properly projected onto the camera’s image plane, allowing them to be overlaid accurately onto the image. This alignment is crucial in sensor fusion applications, enabling depth perception in 2D imagery for tasks such as object detection, autonomous navigation, and augmented reality.
Step 12: Visualizing the Projection
After projecting the LiDAR points onto the 2D image plane, the final step is to visualize the alignment between the LiDAR and camera data. This step involves overlaying the projected points onto the undistorted image, helping us confirm whether the transformation and projection steps were correctly applied. If everything is done properly, the LiDAR points should align with the objects in the image, accurately representing depth and structure.
To achieve this, we iterate through the projected points and plot them as small circles on the undistorted image using OpenCV’s cv2.circle() function. Each projected LiDAR point corresponds to a pixel location on the image, representing where the 3D point appears in the 2D scene. By displaying this image, we can visually verify the accuracy of the projection and identify any misalignment issues.
Code Implementation
for i in range(projected_points.shape[1]):
    x, y = int(projected_points[0, i]), int(projected_points[1, i])
    if 0 <= x < undistorted_image.shape[1] and 0 <= y < undistorted_image.shape[0]:
        cv2.circle(undistorted_image, (x, y), 2, (0, 0, 255), -1)
cv2.imshow('Projected LiDAR Points', undistorted_image)
cv2.waitKey(0)
cv2.destroyAllWindows()
The code starts by iterating over all the projected LiDAR points. Each point’s x and y coordinates are extracted from projected_points, which contains the transformed 2D locations of the LiDAR points. Since pixel coordinates must be integers, the values are converted using int().
Before drawing the point, we check if it falls within the image boundaries. If x and y are within the valid range—meaning they lie within the width and height of the image—the point is plotted using cv2.circle(). The function takes the undistorted image, the (x, y) coordinates, the circle’s radius (2 pixels), the color (0, 0, 255) (which represents red in BGR format), and -1 to indicate that the circle should be filled.
Once all points are drawn, cv2.imshow() displays the image with the projected LiDAR points overlaid. The function cv2.waitKey(0) waits for a key press before closing the window, and cv2.destroyAllWindows() ensures all OpenCV windows are properly closed.
This visualization step is crucial for validating the entire LiDAR-to-camera projection pipeline. If the points correctly align with the objects in the image, the calibration and transformations were successful. Otherwise, misalignment may indicate errors in the intrinsic/extrinsic parameters, transformation matrix, or projection process.
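As an optional variation on the loop above (not part of the original script), you can color each point by its depth instead of drawing everything in red, which makes near and far structure much easier to distinguish:
# Optional variation: color each projected point by its depth in the camera frame
depths = transformed_points[2, :]
scaled = (depths - depths.min()) / (depths.max() - depths.min() + 1e-6)
colors = cv2.applyColorMap((scaled * 255).astype(np.uint8).reshape(-1, 1), cv2.COLORMAP_JET)  # (N, 1, 3), BGR
for i in range(projected_points.shape[1]):
    x, y = int(projected_points[0, i]), int(projected_points[1, i])
    if 0 <= x < undistorted_image.shape[1] and 0 <= y < undistorted_image.shape[0]:
        b, g, r = colors[i, 0].tolist()
        cv2.circle(undistorted_image, (x, y), 2, (b, g, r), -1)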
By completing these steps, we have successfully projected LiDAR points onto a camera image while accounting for lens distortion. This process ensures accurate alignment between 3D LiDAR data and 2D images, which is essential for sensor fusion in applications such as autonomous driving, robotics, and augmented reality. The comparison between static and dynamic approaches highlights the importance of flexibility in real-world scenarios. While the static approach provides a straightforward implementation, it requires manual updates whenever the setup changes. On the other hand, the dynamic approach leverages calibration files, making it more efficient and adaptable to varying environments.
With the projection successfully implemented, the next step is to enhance the system by integrating advanced filtering and refinement techniques. You can improve accuracy by applying depth-aware filtering to remove outliers, optimizing calibration parameters for better alignment, or using deep learning-based methods for sensor fusion. Additionally, real-time implementation using ROS2 can enable seamless integration into robotic systems. Future enhancements may also include overlaying semantic segmentation or object detection results onto the fused LiDAR-camera data, further improving scene understanding for autonomous systems.