In this tutorial, we focus on converting a real-world calibration JSON file of a Hesai OT128 LiDAR and Hikvision camera into a KITTI-format calibration file. The KITTI format is widely used in autonomous driving research, enabling seamless integration with many 3D detection and perception frameworks.
We will automatically detect the single active camera from the JSON file, extract its intrinsics and extrinsics, and build the projection matrix without stereo rectification since only one camera is used. Finally, we'll save the results in KITTI format for future use.
✔ Parse and interpret a sensor calibration JSON file
✔ Auto-detect the valid/active camera from the JSON
✔ Extract camera intrinsics (focal lengths, principal points) and extrinsics (rotation and translation)
✔ Convert quaternion rotations to rotation matrices
✔ Build the camera projection matrix
✔ Derive the LiDAR-to-camera transform
✔ Save calibration parameters in KITTI format
3. By the end of this tutorial, you'll be able to:
✔ Process a calibration JSON structure
✔ Prepare your own calibration data for KITTI-standard sensor fusion pipelines
✔ Adapt this approach to generate and save your own KITTI-compatible calibration matrices
One LiDAR-to-camera calibration JSON file (OT128 to HIK: calib_250507_171326_ot.json)
Basic understanding of:
Camera calibration parameters
LiDAR-to-camera transformations
Matrix operations
Before we start, make sure you have the following installed:
Python 3
SciPy (pip install scipy)
NumPy (pip install numpy)
A real-world calibration JSON file for:
Hesai OT128 LiDAR
Hikvision camera
Load the JSON file.
Auto-detect the active camera with valid intrinsics.
Extract and build:
Intrinsic parameters K (fx, fy, cx, cy, skew).
Extrinsic parameters [R|t] (rotation as quaternion, translation vector).
Convert quaternion to rotation matrix.
Build the camera projection matrix P = K * [R | t].
Invert [R|t] to get Tr_lid_to_cam.
Format all matrices in KITTI calibration file style.
Write to a .txt file compatible with KITTI format.
Output:
A KITTI-format calibration .txt file containing:
Camera projection matrix P0
LiDAR-to-camera matrix Tr
6. What Does calib_250507_171326_ot.json Do?
This JSON calibration file contains detailed calibration parameters between a Hesai OT128 LiDAR sensor and a Hikvision camera. It stores intrinsic parameters (like focal lengths, principal points) and extrinsic parameters (the camera pose relative to LiDAR) for multiple cameras and sensors.
Our goal is to identify the valid single active camera from this JSON, extract its parameters, and convert them into a standard KITTI calibration file format for easier use in autonomous driving workflows.
7.1.1 What Will We Do?
We will read the JSON calibration file and iterate over its entries to find the first camera with non-zero intrinsic parameters, indicating it is valid and active. This avoids hard-coding a camera key and makes the process robust to different calibration files.
7.1.2 Code Implementation
import json

import numpy as np
from scipy.spatial.transform import Rotation as R

json_path = 'calib_250507_171326_ot.json'
with open(json_path, 'r') as f:
    calib_data = json.load(f)

# Auto-detect the first camera entry with non-zero intrinsics
camera_key = None
for key, val in calib_data.items():
    if "camera" in key and "3_intrinsic" in val:
        intr = val["3_intrinsic"]
        if all(intr.get(k, 0) != 0 for k in ["fx", "fy", "cx", "cy"]):
            camera_key = key
            break

if camera_key is None:
    raise ValueError("No valid camera with non-zero intrinsics found in calibration file.")

cam_data = calib_data[camera_key]
print(f"Using camera: {camera_key}")
7.1.3 Breaking It Down: Line-by-Line Explanation
The first part imports the json module and loads the calibration file into a calib_data dictionary. We then initialize camera_key as None. Using a for loop, we iterate over all key-value pairs in the calibration data dictionary. We filter keys containing the word "camera" and check if they have the "3_intrinsic" key, which holds the intrinsic parameters.
For each camera entry, the validation specifically tests four critical parameters (fx, fy, cx, cy) to ensure they contain non-zero values, a hallmark of an active Hik camera configuration in Hesai's system. When the first valid camera is found, we assign its key to camera_key and break the loop, ensuring we work with the primary active camera.
If no such camera is found, the script raises an error, halting the program with a clear message. Otherwise, it prints out the name of the detected active camera for confirmation.
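To make the detection logic concrete, here is a minimal, self-contained sketch that runs the same loop over a toy dictionary. The keys mimic the Hesai JSON layout, but all values are invented for illustration:

```python
# Toy stand-in for the real calibration JSON; keys mimic the Hesai layout,
# values are made up for illustration.
toy_calib = {
    "00_camera": {"3_intrinsic": {"fx": 0, "fy": 0, "cx": 0, "cy": 0}},  # inactive
    "01_camera": {"3_intrinsic": {"fx": 1200.0, "fy": 1200.0,
                                  "cx": 960.0, "cy": 540.0}},            # active
}

camera_key = None
for key, val in toy_calib.items():
    if "camera" in key and "3_intrinsic" in val:
        intr = val["3_intrinsic"]
        # A camera counts as active only if all four core intrinsics are non-zero
        if all(intr.get(k, 0) != 0 for k in ["fx", "fy", "cx", "cy"]):
            camera_key = key
            break

print(camera_key)  # 01_camera
```

The zero-filled 00_camera entry is skipped, and the first entry with real intrinsics wins.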
7.2.1 What Will We Do?
Now that the active camera is detected, we'll extract its intrinsic parameters — focal lengths (fx, fy), principal points (cx, cy), and skew. Then, we'll construct the camera intrinsic matrix K.
7.2.2 Code Implementation
# Get intrinsics
intrinsics = cam_data["3_intrinsic"]
fx = intrinsics["fx"]
fy = intrinsics["fy"]
cx = intrinsics["cx"]
cy = intrinsics["cy"]
skew = intrinsics.get("skew", 0)
# Build intrinsic matrix K
K = np.array([
    [fx, skew, cx],
    [0,  fy,  cy],
    [0,  0,   1]
])
7.2.3 Breaking It Down: Line-by-Line Explanation
We retrieve the calibration data dictionary for the detected camera and access its "3_intrinsic" section, which holds the intrinsic parameters. The focal lengths fx and fy represent the camera's scaling factors (its magnification) in pixels along the x and y axes, while cx and cy are the principal point coordinates in pixels, which define the image center.
The skew parameter is optional and typically defaults to 0 for most modern cameras, where the image axes are perpendicular. A non-zero skew is only necessary if the pixels are not perfectly rectangular, which is uncommon in modern imaging systems.
With these values, we construct the 3x3 intrinsic matrix K using NumPy. This matrix is fundamental in camera geometry, transforming 3D points into normalized image coordinates. The intrinsic matrix K is constructed in the standard form where the skew parameter occupies the [0,1] position, focal lengths are diagonal, and principal point offsets are in the last column.
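As a quick sanity check with hypothetical intrinsics (not the values from this JSON), multiplying K by a normalized camera-frame point (X/Z, Y/Z, 1) yields pixel coordinates:

```python
import numpy as np

# Hypothetical intrinsics, for illustration only
fx, fy, cx, cy, skew = 1000.0, 1000.0, 640.0, 360.0, 0.0
K = np.array([[fx, skew, cx],
              [0.0, fy,   cy],
              [0.0, 0.0,  1.0]])

# A point 2 m right, 1 m up, and 10 m in front of the camera
X, Y, Z = 2.0, 1.0, 10.0
pixel = K @ np.array([X / Z, Y / Z, 1.0])
print(pixel[:2])  # [840. 460.]
```

The focal length scales the normalized coordinates, and the principal point shifts them to the image center.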
7.3.1 What Will We Do?
Next, we'll extract the camera extrinsic parameters from the JSON, convert the rotation quaternion to a 3x3 rotation matrix, and combine it with the translation vector to form the [R|t] matrix.
7.3.2 Code Implementation
# Get extrinsics
extrinsic = cam_data["4_extrinsic"]
quat = [extrinsic["x"], extrinsic["y"], extrinsic["z"], extrinsic["w"]]
R_cam = R.from_quat(quat).as_matrix()
t_cam = np.array([extrinsic["tx"], extrinsic["ty"], extrinsic["tz"]]).reshape(3, 1)
# Build camera projection matrix P = K * [R | t]
RT = np.hstack((R_cam, t_cam))
7.3.3 Breaking It Down: Line-by-Line Explanation
The extrinsic parameters define the camera's orientation and position relative to the LiDAR coordinate frame.
The extrinsic parameters from Hesai's calibration use quaternion notation with components [x, y, z, w] for rotation. The code first converts this to a 3×3 rotation matrix R_cam using SciPy's Rotation class, which properly handles the quaternion normalization and conversion.
The translation vector [tx, ty, tz] is extracted and reshaped into a 3x1 column vector. These two components are horizontally stacked using np.hstack() to form the [R|t] matrix, which describes the full rigid-body transformation from world coordinates (LiDAR space) to camera coordinates.
This matrix will later be multiplied by the intrinsic matrix to form the complete camera projection matrix.
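A small sanity check of the quaternion conversion, using a made-up quaternion rather than the one in this file: a 90-degree rotation about the z-axis in SciPy's [x, y, z, w] order produces the expected rotation matrix.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# 90-degree rotation about z, in scipy's scalar-last [x, y, z, w] order
quat = [0.0, 0.0, np.sin(np.pi / 4), np.cos(np.pi / 4)]
R_mat = R.from_quat(quat).as_matrix()

expected = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])
print(np.allclose(R_mat, expected))  # True
```

Note that SciPy uses the scalar-last convention, matching the [x, y, z, w] ordering used when extracting the quaternion from the JSON.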
7.4.1 What Will We Do?
We will combine the intrinsic matrix K and the extrinsic matrix [R|t] to form the camera projection matrix P. Additionally, since the LiDAR-to-camera transform is not explicitly provided, we derive it by inverting the given camera pose.
7.4.2 Code Implementation
P = K @ RT
# Invert camera-to-LiDAR to get LiDAR-to-camera transform
R_lidar_to_cam = R_cam.T
t_lidar_to_cam = -R_lidar_to_cam @ t_cam
Tr_lid_to_cam = np.hstack((R_lidar_to_cam, t_lidar_to_cam))
7.4.3 Breaking It Down: Line-by-Line Explanation
The projection matrix P is computed through matrix multiplication of the intrinsic matrix K and extrinsic matrix [R|t]. This matrix projects 3D points in the LiDAR coordinate frame into 2D pixel coordinates on the camera image plane.
The @ operator performs matrix multiplication
P transforms 3D world points → 2D image coordinates
RT is the extrinsic matrix combining rotation (R) and translation (t) from the LiDAR (or world) to camera coordinate system (3×4).
Resulting P is a 3×4 matrix:
[p11 p12 p13 p14]
[p21 p22 p23 p24]
[p31 p32 p33 p34]
The resulting matrix will project Hesai OT128 LiDAR points onto the Hik camera's image plane with proper perspective.
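Here is a toy projection with hypothetical intrinsics and extrinsics (not the tutorial's real values), showing the full pipeline of multiplying by P and then applying the perspective divide:

```python
import numpy as np

# Hypothetical K and [R|t], for illustration only
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0,      1.0]])
RT = np.hstack((np.eye(3), np.array([[0.5], [0.0], [0.0]])))  # half-meter x-offset
P_demo = K @ RT  # 3x4 projection matrix

# A homogeneous 3D point 5 m in front of the sensor
pt = np.array([0.0, 0.0, 5.0, 1.0])
u_w, v_w, w = P_demo @ pt
u, v = u_w / w, v_w / w  # perspective divide
print(u, v)  # 740.0 360.0
```

The half-meter offset shifts the projected pixel horizontally away from the principal point, scaled by the depth of the point.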
To convert 3D LiDAR points into the camera frame for projection onto the image, we need the transformation matrix that maps LiDAR coordinates to camera coordinates—commonly referred to as Tr_lid_to_cam. However, in many calibration files, the provided transformation describes the pose of the camera with respect to the LiDAR (i.e., a camera-to-LiDAR transform). To derive the required LiDAR-to-camera transformation, we must invert this pose.
The inversion starts with the rotation component. Since the rotation matrix R_cam defines the orientation of the camera with respect to the LiDAR (mapping camera-frame vectors into the LiDAR frame), its transpose R_cam.T gives the inverse rotation, which maps vectors from the LiDAR frame into the camera frame. This is valid because rotation matrices are orthonormal, and their transpose is equal to their inverse.
Next, the translation vector t_cam, which originally describes the camera’s position in the LiDAR coordinate frame, also needs to be inverted. This is done by applying the negative of the rotated translation vector: -R_lidar_to_cam @ t_cam. This yields the translation from the LiDAR origin to the camera origin in the new (inverted) frame.
Finally, these inverted components are combined horizontally using np.hstack() to form the full 3×4 matrix Tr_lid_to_cam. This transformation matrix can now be used to map any 3D point in the LiDAR coordinate system into the camera coordinate system. This is a crucial step in LiDAR–camera fusion pipelines and is the standard format used in popular datasets such as KITTI for accurate point cloud projection onto image planes.
To summarize, the following three lines invert the camera-to-LiDAR transform (usually provided in the calibration file) to obtain the LiDAR-to-camera transform, which is required for projecting LiDAR points into the camera frame:
R_lidar_to_cam = R_cam.T
This transposes (inverts) the rotation matrix R_cam, which originally represents the camera's orientation with respect to the LiDAR frame.
Transposing a rotation matrix gives its inverse (because rotation matrices are orthonormal).
The result is a matrix that rotates from the LiDAR frame into the camera frame, i.e., R_lidar_to_cam.
t_lidar_to_cam = -R_lidar_to_cam @ t_cam
This line computes the inverse translation vector.
t_cam is the camera's position expressed in the LiDAR frame.
The inverted translation t_lidar_to_cam expresses the LiDAR origin in the camera frame, obtained by applying the inverse rotation.
Tr_lid_to_cam = np.hstack((R_lidar_to_cam, t_lidar_to_cam))
This concatenates the rotation matrix and translation vector horizontally.
The result is a full 3×4 transformation matrix that converts 3D LiDAR coordinates into the camera coordinate frame.
This matrix is commonly referred to as Tr_lid_to_cam and is used in many datasets like KITTI for point cloud projection.
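To convince ourselves the inversion is correct, a short round-trip check (using a made-up pose, not the one from this JSON) composes the transform with its inverse and recovers the identity pose:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# Made-up camera pose in the LiDAR frame: 30-degree yaw plus a small offset
R_cam = R.from_euler('z', 30, degrees=True).as_matrix()
t_cam = np.array([[0.2], [-0.1], [0.05]])

# Invert exactly as in the tutorial code
R_lidar_to_cam = R_cam.T
t_lidar_to_cam = -R_lidar_to_cam @ t_cam

# Applying the inverse after the original must yield the identity pose
R_round = R_lidar_to_cam @ R_cam
t_round = R_lidar_to_cam @ t_cam + t_lidar_to_cam
print(np.allclose(R_round, np.eye(3)), np.allclose(t_round, 0.0))  # True True
```

If either check failed, the rotation and translation would not be consistent inverses of each other.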
7.5.1 What Will We Do?
We will format the matrices into KITTI's standardized calibration format and write them into a text file. This format is essential for interoperability with many autonomous driving frameworks.
7.5.2 Code Implementation
# Formatter
def format_matrix_line(name, mat):
    vals = mat.flatten()
    vals_str = ' '.join([f"{v:.12e}" for v in vals])
    return f"{name}: {vals_str}\n"

# Output file
output_file = 'KITTI_calib_converstionFrom_OT128_Hesai.txt'

# Write calibration
with open(output_file, 'w') as f:
    f.write(format_matrix_line("P0", P))  # Only one camera
    f.write(format_matrix_line("Tr_lid_to_cam", Tr_lid_to_cam))  # Optional

print(f"Saved KITTI calibration for camera '{camera_key}' to: {output_file}")
7.5.3 Breaking It Down: Line-by-Line Explanation
We define a helper function that takes a matrix name and matrix, flattens it row-wise, and converts each element into scientific notation with 12 decimal places for precision.
For Hesai OT128 to single Hik camera setups, we only write P0 (unlike multi-camera KITTI files that include P0-P3).
The formatted line follows KITTI calibration file syntax like:
P0: 1.234567890123e+00 0.000000000000e+00 ... 1.234567890123e+00
Next, we open the output file in write mode and write two lines: one for the camera projection matrix P0 and one for the LiDAR-to-camera transform Tr_lid_to_cam.
The program finishes by printing a confirmation message with the output path.
Using camera: 01_camera
Saved KITTI calibration for camera '01_camera' to: KITTI_calib_converstionFrom_OT128_Hesai.txt
When we open this generated file, it shows:
P0: -1.048084303535e+03 -9.678176071402e+02 -4.574842846281e+00 -1.096756471123e+02 5.191355532464e+00 -6.024481705420e+02 -1.053167011130e+03 -2.236342773168e+02 1.402471436103e-02 -9.998767996874e-01 -7.049314439184e-03 -1.167375362095e-01
Tr_lid_to_cam: 1.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
In this file:
P0: The projection matrix for the detected active Hikvision camera.
Tr_lid_to_cam: The LiDAR-to-camera transform (an identity matrix in this example output).
This file can now be integrated into KITTI-based sensor fusion pipelines or 3D detection models, enabling further autonomous perception tasks.
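To consume such a file later (e.g., in a fusion pipeline), a small loader can parse each named line back into a 3x4 matrix. This is a minimal sketch using a throwaway file and a made-up matrix, not the tutorial's actual output:

```python
import numpy as np

def load_kitti_calib(path):
    """Read a KITTI-style calibration file into named 3x4 matrices."""
    mats = {}
    with open(path) as f:
        for line in f:
            if ':' not in line:
                continue
            name, vals = line.split(':', 1)
            mats[name.strip()] = np.array(vals.split(), dtype=float).reshape(3, 4)
    return mats

# Round-trip demo: write one line in the tutorial's format, then read it back
P_demo = np.arange(12, dtype=float).reshape(3, 4)
with open('demo_calib.txt', 'w') as f:
    f.write("P0: " + ' '.join(f"{v:.12e}" for v in P_demo.flatten()) + "\n")

calib = load_kitti_calib('demo_calib.txt')
print(np.allclose(calib['P0'], P_demo))  # True
```

Because the writer emits 12-digit scientific notation, the round trip preserves the matrix values to well below any practically relevant precision.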
This tutorial guided you through converting a Hesai OT128 LiDAR to Hikvision camera JSON calibration into a KITTI-compatible calibration file. By programmatically detecting the active camera, extracting intrinsic and extrinsic parameters, and formatting them properly, you can easily adapt calibration data for downstream perception systems.
Validate projections by overlaying LiDAR points on images
Integrate with ROS 2 or Open3D for visualization
Extend the script to handle stereo camera calibration with rectification. (Already done with AutL LiDARs and two cameras)
Integrate the calibration output with sensor fusion or 3D object detection models.
Yongin-si, South Korea
sumairamanzoorpk@gmail.com