# 1: Year 2025
Distance-Adaptive Sensor Fusion to Enhance 2D–3D Object Localization for Perception in Autonomous Driving Systems
Type: Conference Article
Accepted in:
ICRCV - International Conference on Robotics and Computer Vision
Abstract: Sensor fusion plays a pivotal role in the perception of autonomous robots and self-driving vehicles, enabling accurate object localization by combining the rich visual cues from cameras with the precise depth information from LiDAR. However, traditional fusion methods, namely early and late fusion, struggle with noise amplification, weak spatial alignment, and limited adaptability across varying distances. To address these limitations, we propose a distance-adaptive fusion strategy that integrates 2D and 3D bounding boxes using a weighted averaging mechanism based on object range. This method improves both spatial precision and detection reliability by balancing the contributions of visual and depth data according to their relative strengths over distance. Evaluation on the KITTI dataset shows that our method achieves: (1) 68% lower short-range error (2.8-23.7%) vs. early fusion, (2) 0% mid-range error without late fusion's 25% fragmentation, and (3) 1.8% long-range error where early fusion fails. With consistent 2.8-12.3% errors (48-85% better than early fusion's worst cases) and 33% higher recall, it outperforms both approaches.
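For illustration, a minimal Python sketch of the idea of range-weighted box fusion: blend a camera-derived and a LiDAR-derived object center with a weight that varies with distance. The ramp direction, the near/far bounds, and the function names are assumptions for illustration, not the paper's exact formulation or parameters.

```python
import numpy as np

def fuse_centers(camera_center, lidar_center, distance_m,
                 near_m=10.0, far_m=40.0):
    """Blend camera- and LiDAR-derived 3D object centers by range.

    The LiDAR weight here decreases linearly from 1 at near_m to 0 at
    far_m (illustrative bounds); the camera estimate takes over the
    remainder. Which modality dominates at which range is an assumption,
    not the paper's calibrated weighting.
    """
    w_lidar = float(np.clip((far_m - distance_m) / (far_m - near_m), 0.0, 1.0))
    camera_center = np.asarray(camera_center, dtype=float)
    lidar_center = np.asarray(lidar_center, dtype=float)
    return w_lidar * lidar_center + (1.0 - w_lidar) * camera_center

# Example: at 35 m the fused center leans mostly toward the camera estimate.
print(fuse_centers(camera_center=[12.1, 1.6, 34.8],
                   lidar_center=[12.4, 1.5, 35.2],
                   distance_m=35.0))
```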
# 2: Year 2023
Edge Deployment of Vision-based Model for Human-Following Robot
Type: Conference Article
Published in:
ICCAS - International Conference on Control, Automation & Systems
Abstract: Mobile robots are proliferating at a significant pace, and the continuous interaction between humans and robots opens the door to facilitating our daily life activities. Following a target person with a robot is an important human-robot interaction (HRI) task with applications in industrial, domestic, and medical assistant robots. To implement such robotic tasks, traditional solutions rely on cloud servers, which incur significant communication overhead due to data offloading. In our work, we overcome this potential issue of cloud-based solutions by implementing the human-following robot (HFR) task on the NVIDIA Jetson Xavier NX edge platform. Typical approaches to the HFR task track the target person only from behind, whereas our work allows the robot to track the person from behind, front, and side views (left & right). In this article, we combine the latest advances in deep learning and metric learning by presenting two trackers: a Single Person Head Detection-based Tracking (SPHDT) model and a Single Person full-Body Detection-based Tracking (SPBDT) model. For both models, we leverage a deep learning-based single object detector, MobileNetSSD, with a metric learning-based re-identification model, DaSiamRPN. We perform a qualitative analysis considering six major environmental factors: pose change, illumination variations, partial occlusion, full occlusion, wall corners, and different viewing angles. Based on the better performance of SPBDT compared to SPHDT in the experimental results, we select the SPBDT model for the robot to track the target. We also use this vision model to provide the relative position, location, distance, and angle of the target person to control the robot's movement for performing the human-following task.
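As a rough illustration of how a detector bounding box can yield the target's angle for robot control (the last point in the abstract), here is a minimal Python sketch under a pinhole-camera assumption; the field of view, controller gain, and function names are placeholders, not the parameters used on the Jetson Xavier NX robot.

```python
def target_bearing_deg(bbox, frame_width, horizontal_fov_deg=90.0):
    """Approximate bearing (degrees) of a detected person from a bbox.

    bbox is (x, y, w, h) in pixels; the pixel offset of the box center
    from the image center is mapped linearly to an angle assuming the
    given horizontal field of view. Positive = target right of center.
    """
    x, _, w, _ = bbox
    center_x = x + w / 2.0
    offset = (center_x - frame_width / 2.0) / (frame_width / 2.0)
    return offset * (horizontal_fov_deg / 2.0)

def yaw_command(bearing_deg, k_p=0.01):
    """Proportional yaw-rate command that turns the robot toward the target."""
    return -k_p * bearing_deg

bearing = target_bearing_deg(bbox=(500, 120, 80, 200), frame_width=640)
print(bearing, yaw_command(bearing))  # roughly 31 degrees off-center
```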
# 3: Year 2023
SPT: Single Pedestrian Tracking Framework with Re-Identification-Based Learning Using the Siamese Model
Type: Journal Article
Published in:
Sensors
Abstract: Pedestrian tracking is a challenging task in the area of visual object tracking research, and it is a vital component of various vision-based applications such as surveillance systems, human-following robots, and autonomous vehicles. In this paper, we propose a single pedestrian tracking (SPT) framework for identifying each instance of a person across all video frames through a tracking-by-detection paradigm that combines deep learning and metric learning-based approaches. The SPT framework comprises three main modules: detection, re-identification, and tracking. Our contribution is a significant improvement in the results by designing two compact metric learning-based models using a Siamese architecture in the pedestrian re-identification module and combining one of the most robust re-identification models for data association with the pedestrian detector in the tracking module. We carried out several analyses to evaluate the performance of our SPT framework for single pedestrian tracking in videos. The results of the re-identification module validate that our two proposed re-identification models surpass existing state-of-the-art models with increased accuracies of 79.2% and 83.9% on the large dataset and 92% and 96% on the small dataset. Moreover, the proposed SPT tracker, along with six state-of-the-art (SOTA) tracking models, has been tested on various indoor and outdoor video sequences. A qualitative analysis considering six major environmental factors verifies the effectiveness of our SPT tracker under illumination changes, appearance variations due to pose changes, changes in target position, and partial occlusions. In addition, quantitative analysis based on experimental results also demonstrates that our proposed SPT tracker outperforms the GOTURN, CSRT, KCF, and SiamFC trackers with a success rate of 79.7% while beating the DaSiamRPN, SiamFC, CSRT, GOTURN, and SiamMask trackers with an average of 18 tracking frames per second.
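A minimal PyTorch sketch of the shared-weight Siamese embedding plus similarity matching that underlies re-identification; the layer sizes, embedding dimension, and threshold are illustrative assumptions, not the two compact models proposed in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEmbedding(nn.Module):
    """Embedding branch shared by both inputs of a Siamese pair."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, x):
        feat = self.backbone(x).flatten(1)
        return F.normalize(self.fc(feat), dim=1)  # unit-length embedding

def same_person(model, crop_a, crop_b, threshold=0.7):
    """Cosine similarity between two pedestrian crops (NCHW tensors)."""
    with torch.no_grad():
        sim = (model(crop_a) * model(crop_b)).sum(dim=1)
    return sim > threshold

model = SiameseEmbedding().eval()
a = torch.randn(1, 3, 128, 64)  # person crop from frame t
b = torch.randn(1, 3, 128, 64)  # candidate crop from frame t+1
print(same_person(model, a, b))
```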
# 4: Year 2022
Edge Deployment Framework of GuardBot for Optimized Face Mask Recognition With Real-Time Inference Using Deep Learning
Type: Journal Article
Published in:
IEEE Access
Abstract: Deep learning-based models on edge devices have received considerable attention as a promising means to handle a variety of AI applications. However, deploying deep learning models in a production environment with efficient inference on edge devices is still a challenging task due to computation and memory constraints. This paper proposes a framework for the service robot named GuardBot, powered by a Jetson Xavier NX, and presents a real-world case study of deploying an optimized face mask recognition application with real-time inference on the edge device. It assists the robot in detecting whether people are wearing a mask to guard against COVID-19 and gives a polite voice reminder to wear the mask. Our framework contains a dual-stage architecture based on convolutional neural networks with three main modules that employ (1) MTCNN for face detection, (2) our proposed CNN model and seven transfer learning-based custom models, namely Inception-v3, VGG16, DenseNet121, ResNet50, NASNetMobile, XceptionNet, and MobileNet-v2, for face mask classification, and (3) TensorRT for optimization of all the models to speed up inference on the Jetson Xavier NX. Our study carries out several analyses of the models' performance in terms of frames per second, execution time, and images per second. It also evaluates accuracy, precision, recall & F1-score and compares all models before and after optimization, with a main focus on high throughput and low latency. Finally, the framework is deployed on a mobile robot to perform experiments in both outdoor and multi-floor indoor environments with patrolling and non-patrolling modes. Compared to other state-of-the-art models, our proposed CNN model for face mask classification obtains 94.5%, 95.9%, and 94.28% accuracy on the training, validation, and testing datasets respectively, which is better than MobileNet-v2, Xception, and InceptionNet-v3, while achieving the highest throughput and lowest latency among all models after optimization at different precision levels.
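A minimal Keras sketch of the transfer-learning stage for face mask classification, assuming faces have already been cropped by the detection stage; the backbone choice (MobileNetV2), image size, and head layers are assumptions for illustration, not the exact configurations benchmarked in the paper.

```python
import tensorflow as tf

def build_mask_classifier(num_classes=2, input_shape=(224, 224, 3)):
    """Frozen ImageNet backbone with a small classification head."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights="imagenet")
    base.trainable = False  # keep the pretrained feature extractor fixed
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_mask_classifier()
model.summary()
```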
# 5: Year 2022
Qualitative Analysis of Single Object and Multi Object Tracking Models
Type: Conference Article
Published in:
ICCAS - International Conference on Control, Automation & Systems
Abstract: Tracking the object(s) of interest in the real world is one of the most salient research areas and has gained widespread attention due to its applications. Although different approaches based on traditional machine learning and modern deep learning have been proposed to tackle single and multi-object tracking problems, these tasks remain challenging to perform. In our work, we conduct a comparative analysis of eleven object trackers to determine the most robust single object tracker (SOT) and multi-object tracker (MOT). The main contributions of our work are (1) employing nine pre-trained tracking algorithms to carry out the analysis for SOT, namely SiamMask, GOTURN, BOOSTING, MIL, KCF, TLD, MedianFlow, MOSSE, and CSRT; (2) investigating MOT by integrating object detection models with object trackers, using YOLOv4 combined with DeepSort and CenterNet coupled with SORT; (3) creating our own testing video dataset to perform experiments; (4) performing a qualitative analysis based on the visual representation of results, considering nine significant factors: appearance and illumination variations, speed, accuracy, scale, partial and full occlusion, report failure, and fast motion. Experimental results demonstrate that the SiamMask tracker overcomes most of the environmental challenges for SOT, while the YOLOv4+DeepSort tracker obtains good performance for MOT. However, these trackers are not robust enough to handle full occlusion in real-world scenarios, and there is always a trade-off between tracking accuracy and speed.
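For reference, a minimal sketch of the IoU-based track-detection association step used by SORT-style multi-object trackers such as those compared above; the threshold and helper names are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_threshold=0.3):
    """Hungarian matching of existing track boxes to new detection boxes."""
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols)
               if 1.0 - cost[r, c] >= iou_threshold]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    return (matches,
            [i for i in range(len(tracks)) if i not in matched_t],
            [j for j in range(len(detections)) if j not in matched_d])
```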
# 6: Year 2021
Ontology-Based Knowledge Representation in Robotic Systems: A Survey Oriented toward Applications
Type: Journal Article
Published in:
Applied Sciences
Abstract: Knowledge representation in autonomous robots with social roles has steadily gained importance through their supportive task assistance in domestic, hospital, and industrial activities. For active assistance, these robots must process semantic knowledge to perform tasks more efficiently. In this context, ontology-based knowledge representation and reasoning (KR & R) techniques appear as a powerful tool and provide sophisticated domain knowledge for processing complex robotic tasks in a real-world environment. In this article, we survey ontology-based semantic representation unified into the current state of robotic knowledge base systems, with a three-fold aim: (i) to present the recent developments in ontology-based knowledge representation systems that have led to effective solutions for real-world robotic applications; (ii) to review the selected knowledge-based systems along seven dimensions: application, idea, development tools, architecture, ontology scope, reasoning scope, and limitations; (iii) to pin down lessons learned from the review of existing knowledge-based systems for designing better solutions and to delineate research limitations that might be addressed in future studies. This survey article concludes with a discussion of future research challenges that can serve as a guide to those interested in working on ontology-based semantic knowledge representation systems for autonomous robots.
# 7: Year 2021
Ontology-based Knowledge Representation for Cognitive Robotic Systems: A Review
Type: Conference Article
Published in:
ICROS - Control, Robots, and Systems Society Conference
Abstract: Ontology-based knowledge representation endows autonomous robots with the cognitive skills required to perform actions in compliance with goals. In this paper, we review five knowledge base systems that represent knowledge using ontologies and enable robots to model semantic information to perform a variety of tasks in domestic, hospital, and industrial environments. We also highlight the research gaps by discussing the limitations that might be addressed in future work and conclude our review with a brief discussion. This review is intended to show recent developments and to motivate those interested in working in this area.
# 8: Year 2021
3D Recognition Based on Sensor Modalities for Robotic Systems: A Survey
Type: Journal Article
Published in:
Sensors
Abstract: 3D visual recognition is a prerequisite for most autonomous robotic systems operating in the real world. It empowers robots to perform a variety of tasks, such as tracking, understanding the environment, and human–robot interaction. Autonomous robots equipped with 3D recognition capability can better perform their social roles through supportive task assistance in professional jobs and effective domestic services. For active assistance, social robots must recognize their surroundings, including objects and places, to perform tasks more efficiently. This article first highlights the value-centric role of social robots in society by presenting recently developed robots and describing their main features. Instigated by the recognition capability of social robots, we present an analysis of data representation methods based on sensor modalities for 3D object and place recognition using deep learning models. In this direction, we delineate the research gaps that need to be addressed, summarize 3D recognition datasets, and present performance comparisons. Finally, a discussion of future research directions concludes the article. This survey is intended to show how recent developments in 3D visual recognition based on sensor modalities and deep-learning-based approaches can lay the groundwork to inspire further research, and to serve as a guide to those interested in vision-based robotics applications.
# 9: Year 2021
Performance Evaluation of YOLOv3 and YOLOv4 Detectors on Elevator Button Dataset for Mobile Robot
Type: Conference Article
Published in:
ICCAS - International Conference on Control, Automation & Systems
Abstract: Performance evaluation of an AI network model is an important step in building an effective solution before its real-world deployment on a robot. In our study, we implemented the YOLOv3-tiny and YOLOv4-tiny Darknet-based frameworks for the elevator button recognition task and tested both variants on image and video datasets. The objective of our study is two-fold: first, to overcome the limitation of existing elevator button datasets by creating a new dataset and increasing its quantity without compromising quality; second, to provide a comparative analysis through experimental results and the performance evaluation of both detectors using four machine learning metrics. The purpose of our work is to assist researchers and developers in selecting a suitable detector for deployment on the elevator robot for the button recognition application. The results show that YOLOv4-tiny outperforms YOLOv3-tiny with an overall accuracy of 98.60% compared to 97.91% at 0.5 IoU.
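A small Python sketch of the kind of metrics used in such a comparison, assuming detections have already been matched to ground-truth buttons at the 0.5 IoU threshold mentioned above; the counts in the example are invented for illustration.

```python
def detection_metrics(tp, fp, fn, tn=0):
    """Precision, recall, F1, and accuracy from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn) if tp + fp + fn + tn else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

print(detection_metrics(tp=940, fp=12, fn=8))  # illustrative counts only
```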
# 10: Year 2020
Autonomous navigation framework for intelligent robots based on a semantic environment modeling
Type: Journal Article
Published in:
Applied Sciences
Abstract: Humans have an innate ability for environment modeling, perception, and planning while simultaneously performing tasks. However, this remains a challenging problem in the study of robotic cognition. We address this issue by proposing a neuro-inspired cognitive navigation framework composed of three major components: a semantic modeling framework (SMF), a semantic information processing (SIP) module, and a semantic autonomous navigation (SAN) module, which together enable the robot to perform cognitive tasks. The SMF creates an environment database using the Triplet Ontological Semantic Model (TOSM) and builds semantic models of the environment. The environment maps from these semantic models are generated in an on-demand database and downloaded to the SIP and SAN modules when required by the robot. The SIP module contains active environment perception components for recognition and localization. It also feeds relevant perception information to the behavior planner for safely performing the task. The SAN module uses a behavior planner connected to a knowledge base and a behavior database for querying during action planning and execution. The main contributions of our work are the development of the TOSM, the integration of the SMF, SIP, and SAN modules into a single framework, and the interaction between these components based on findings from cognitive science. We deploy our cognitive navigation framework on a mobile robot platform, considering implicit and explicit constraints for autonomous robot navigation in a real-world environment. The robotic experiments demonstrate the validity of our proposed framework.
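As a toy illustration of triplet-style environment knowledge in the spirit of the TOSM-based database, here is a minimal Python sketch; the entities, predicates, and query helper are invented for illustration and are not part of the paper's ontology.

```python
# Environment knowledge stored as (subject, predicate, object) triples.
triples = {
    ("room_101", "is_a", "office"),
    ("door_3", "connects", "room_101"),
    ("door_3", "connects", "corridor_1"),
    ("elevator_1", "located_in", "corridor_1"),
}

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching a (possibly partial) pattern."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# Which places does door_3 connect?
print(query(subject="door_3", predicate="connects"))
```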
# 11: Year 2019
Comparison of Object Recognition Approaches using Traditional Machine Vision and Modern Deep Learning Techniques for Mobile Robot
Type: Conference Article
Published in:
ICCAS - International Conference on Control, Automation & Systems
Abstract: In this paper, we consider the problem of object recognition for a mobile robot in an indoor environment using two different vision approaches. Our first approach uses a HOG descriptor with an SVM classifier as the traditional machine vision model, while the second approach uses Tiny-YOLOv3 as the modern deep learning model. The purpose of this study is to gain intuitive insight into both approaches and to understand the principles behind these techniques through their practical implementation in the real world. We train both approaches with our own door dataset. The proposed work is assessed through the real-world implementation of both approaches using a mobile robot with a ZED camera in a real-world indoor environment, and robustness is evaluated by comparing and analyzing the experimental results of both models on the same dataset.
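A minimal Python sketch of the first (HOG + SVM) approach, using common default parameters; the window size, HOG settings, and SVM regularization are assumptions, not the settings used with the robot's door dataset.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import LinearSVC

def hog_features(gray_image, size=(128, 64)):
    """HOG descriptor for a grayscale patch resized to a fixed window."""
    patch = resize(gray_image, size, anti_aliasing=True)
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

def train_door_classifier(images, labels):
    """Train a linear SVM on HOG features (labels: 1 = door, 0 = background)."""
    X = np.stack([hog_features(img) for img in images])
    clf = LinearSVC(C=1.0)
    clf.fit(X, labels)
    return clf
```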
# 12: Year 2019
A Novel Semantic SLAM Framework for Humanlike High-Level Interaction and Planning in Global Environment
Type: Workshop Article
Published in:
IROS: SDMM19
Abstract: In this paper, we propose a novel semantic SLAM framework based on human cognitive skills and capabilities that endows the robot with high-level interaction and planning in a real-world dynamic environment. The two-fold strength of our framework lies in contributing: 1) a semantic map resulting from the integration of SLAM with the Triplet Ontological Semantic Model (TOSM); 2) a human-like robotic perception system, optimal and biologically plausible for place and object recognition in a dynamic environment, based on a proposed semantic descriptor and a CNN. We demonstrate the effectiveness of our proposed framework using a mobile robot with a ZED camera (3D sensor) and a laser range finder (2D sensor) in a real-world indoor environment. Experimental results demonstrate the practical merit of our proposed framework.
Google Scholar: Sumaira Manzoor