Single and multiple stereo view navigation for planetary rovers

Author: Bartolome D R
Publication date: 08/10/2013

© Cranfield University
This thesis deals with the challenge of autonomous navigation for the ExoMars rover. There is no global positioning system (GPS) in space, and wheel odometry has its own limitations. Together these make autonomous navigation based on these two techniques (as done in the literature) unviable and call for other approaches. That, among other reasons, motivates this work to use only visual data to solve the robot's egomotion problem. Because the Martian terrain is so homogeneous, the low-level image processing must be robust. In the first part of the thesis, novel solutions are presented to tackle this specific problem. Two capabilities are sought: detecting features that are robust to illumination changes, and matching and associating those features uniquely. To make features robust to illumination variation, Harris corner detection is combined with a moment image representation. Harris detection provides efficient feature detection, and the moment images add the necessary brightness invariance. A bucketing strategy is used to guarantee that features are homogeneously distributed within the images. Finally, local feature descriptors are added to guarantee the unique identification of image cues. In the second part, reliable and precise motion estimation for the Mars rover is studied. A number of successful approaches are thoroughly analysed. Visual Simultaneous Localisation And Mapping (VSLAM) is investigated, enhancements to it are proposed, and it is integrated with the robust feature methodology. Linear and nonlinear optimisation techniques are then explored, and alternative photogrammetric reprojection concepts are tested. Lastly, data fusion techniques are proposed to integrate data from multiple stereo views. Our robust visual scheme achieves good feature repeatability.
Because of this, the feature data can be reduced in dimensionality without compromising the overall performance of the proposed motion estimation solutions. The developed egomotion techniques have been extensively validated using both simulated data and real data collected at the ESA-ESTEC facilities. Multiple stereo view solutions for robot motion estimation are also introduced, and they offer notable benefits. The results show that the methods presented here are accurate and reliable approaches capable of solving the egomotion problem in a Mars environment.
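
The abstract does not describe its bucketing step in detail, so what follows is only a minimal sketch of one generic version. It assumes the keypoints have already been detected, for example by a Harris detector, and are passed as hypothetical `(x, y, response)` tuples. The image is divided into a grid, and only the strongest few keypoints in each cell are kept. The function name `bucket_features` and the grid parameters `n_cols`, `n_rows` and `max_per_bucket` are illustrative assumptions, not values taken from the thesis.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical keypoint format: (x, y, corner response), e.g. from a Harris detector.
Keypoint = Tuple[float, float, float]


def bucket_features(
    keypoints: List[Keypoint],
    width: int,
    height: int,
    n_cols: int = 4,
    n_rows: int = 4,
    max_per_bucket: int = 2,
) -> List[Keypoint]:
    """Split the image into an n_rows x n_cols grid and keep only the
    `max_per_bucket` strongest keypoints in each cell."""
    cell_w = width / n_cols
    cell_h = height / n_rows
    buckets: Dict[Tuple[int, int], List[Keypoint]] = defaultdict(list)
    for kp in keypoints:
        x, y, _ = kp
        # Clamp so points on the right/bottom border land in the last cell.
        col = min(int(x // cell_w), n_cols - 1)
        row = min(int(y // cell_h), n_rows - 1)
        buckets[(row, col)].append(kp)

    selected: List[Keypoint] = []
    for cell in sorted(buckets):  # row-major order of non-empty cells
        strongest = sorted(buckets[cell], key=lambda kp: kp[2], reverse=True)
        selected.extend(strongest[:max_per_bucket])
    return selected


if __name__ == "__main__":
    # Three clustered corners in the top-left cell, one in the bottom-right cell.
    kps = [(10, 10, 0.9), (12, 11, 0.8), (15, 14, 0.7), (90, 90, 0.5)]
    print(bucket_features(kps, width=100, height=100, n_cols=2, n_rows=2))
    # -> [(10, 10, 0.9), (12, 11, 0.8), (90, 90, 0.5)]
```

Capping the count per cell stops a single textured region from taking over the feature set. That matters most on homogeneous terrain, where strong corners tend to cluster.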

Efficient Human Facial Pose Estimation

Author: Schimmel James C
Publication venue: RIT Scholar Works
Publication date: 01/01/2004

Pose estimation has become an increasingly important area in computer vision, and more specifically in human facial recognition and activity recognition for surveillance applications. Pose estimation is the process of determining the roll, pitch, or yaw of a human head. Numerous methods already exist that can determine the angular change of a face. However, they vary in accuracy, and their computational requirements tend to be too high for real-time applications. The objective of this thesis is to develop a pose estimation method that is computationally efficient while still maintaining a reasonable degree of accuracy. In this thesis, a feature-based method is presented that determines the yaw angle of a human facial pose using a combination of artificial neural networks and template matching. The artificial neural networks are used, along with skin detection and other image enhancement algorithms, in the feature detection portion of the algorithm. The first head model, referred to as the Frontal Position Model, determines the pose of the face from the two eyes and the mouth. The second model, referred to as the Side Position Model, is used when only one eye is visible. It determines pose from a single eye, the nose tip, and the mouth. The two models are presented to demonstrate how the positions of facial features change with pose. They also provide a means of determining the pose as these features move away from the frontal position. The effectiveness of this pose estimation method is examined using both manual and automatic feature detection. Analysis is further performed on how errors in feature detection affect the resulting pose determination. With correct feature detection, the method detected facial pose from -30 to 30 degrees. The average error was 4.28 degrees for the Frontal Position Model and 5.79 degrees for the Side Position Model.
Intel(R) Streaming SIMD Extensions (SSE) were employed to speed up floating-point operations. The neural networks used in the feature detection process require a large number of floating-point calculations, because the image data must be combined with the network weights and biases. With SSE optimization, the algorithm becomes suitable for processing images in a real-time environment. It can detect features and estimate the pose at seven frames per second on a 1.8 GHz Pentium 4 computer.

RIT Scholar Works
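
The abstract combines neural-network feature detection with template matching but does not say which matching score is used. Zero-mean normalized cross-correlation (NCC) is a common choice, and the sketch below assumes NCC; the source does not confirm it. NCC is unchanged by positive affine changes in brightness and contrast. The brute-force NumPy search is written for clarity, not for the real-time speed the thesis targets, and the name `match_template_ncc` is hypothetical.

```python
from typing import Tuple

import numpy as np


def match_template_ncc(image: np.ndarray, template: np.ndarray) -> Tuple[Tuple[int, int], float]:
    """Return ((row, col), score) of the best zero-mean normalized
    cross-correlation match of `template` inside `image`."""
    img = image.astype(np.float64)
    tpl = template.astype(np.float64)
    th, tw = tpl.shape
    tpl_zm = tpl - tpl.mean()
    tpl_norm = np.sqrt((tpl_zm ** 2).sum())

    best_score, best_pos = -np.inf, (0, 0)
    # Slide the template over every valid top-left position.
    for r in range(img.shape[0] - th + 1):
        for c in range(img.shape[1] - tw + 1):
            patch = img[r:r + th, c:c + tw]
            patch_zm = patch - patch.mean()
            denom = np.sqrt((patch_zm ** 2).sum()) * tpl_norm
            if denom == 0:
                continue  # flat patch or flat template: correlation undefined
            score = float((patch_zm * tpl_zm).sum() / denom)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((20, 30))
    template = image[5:10, 12:18].copy()
    # A brighter, higher-contrast copy of the image should match at the same place.
    print(match_template_ncc(3.0 * image + 2.0, template))
```

In practice a vectorized or library routine would replace the double loop. The per-pixel multiply-accumulate work is also the kind of floating-point load that SIMD instructions such as SSE are designed to speed up.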

A Comprehensive Review of YOLO: From YOLOv1 and Beyond

Authors: Cordova-Esparza Diana; Terven Juan
Publication date: 19/05/2023

YOLO has become a central real-time object detection system for robotics, driverless cars, and video monitoring applications. We present a comprehensive analysis of YOLO's evolution, examining the innovations and contributions in each iteration from the original YOLO to YOLOv8 and YOLO-NAS. We start by describing the standard metrics and postprocessing; then, we discuss the major changes in network architecture and training tricks for each model. Finally, we summarize the essential lessons from YOLO's development and provide a perspective on its future, highlighting potential research directions to enhance real-time object detection systems.
Comment: 31 pages, 15 figures, 4 tables, submitted to ACM Computing Surveys. This version includes YOLO-NAS and a more detailed description of YOLOv5 and YOLOv8. It also adds three new diagrams for the architectures of YOLOv5, YOLOv8, and YOLO-NAS.

arXiv.org e-Print Archive
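
The survey begins with standard metrics and postprocessing. Two building blocks that detection pipelines like YOLO rely on are intersection over union (IoU) and non-maximum suppression (NMS). Below is a minimal sketch of plain greedy, class-agnostic NMS in pure Python. The `(x1, y1, x2, y2)` box format and the 0.5 threshold are assumptions made for illustration. Individual YOLO versions differ in the exact NMS variant and settings they use.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining boxes that overlap the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # -> [0, 2]: box 1 overlaps box 0 with IoU of about 0.68
```

The same `iou` function also decides whether a prediction counts as a true positive when mean average precision is computed. This is why IoU sits at the core of both metrics and postprocessing.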

Joint optimization of manifold learning and sparse representations for face and gesture analysis

Author: Ptucha Raymond
Publication venue: RIT Scholar Works
Publication date: 16/04/2013

Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may dispense directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or in the wild, is more complex and varies considerably from one person to the next. Uncontrolled conditions such as lighting, resolution, noise, occlusions, pose, and temporal variations complicate the matter further. This thesis advances the field of face and gesture analysis by introducing a new machine learning framework based on dimensionality reduction and sparse representations. The framework is shown to be robust in both posed and natural conditions. Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower dimensional representations embedded in the higher dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance-based facial attributes is studied, leading to robust facial pose and expression recognition models. The parsimonious nature of sparse representations (SR) has been successfully exploited to develop highly accurate classifiers for various applications. Despite the successes of SR techniques, large dictionaries and high dimensional data can make these classifiers computationally demanding.
Further, sparse classifiers suffer from a phenomenon known as coefficient contamination, in which, for example, variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations. It presents a unified sparse representation classification framework that addresses both computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers. The combination of semi-supervised dimensionality reduction with SR systems forms the cornerstone of a new face and gesture framework called Manifold based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities. To demonstrate the applicability of MSR to new domains, MSR is extended to include temporal dynamics. Jointly optimizing dimensionality reduction and SRs for classification is a relatively new field. Combining both concepts into a single objective function produces a problem that is neither convex nor directly solvable. This thesis studies that problem and introduces a new jointly optimized framework. The framework, termed LGE-KSVD, uses variants of Linear extension of Graph Embedding (LGE) together with modified K-SVD dictionary learning. It jointly learns the dimensionality reduction matrix, the sparse representation dictionary, the sparse coefficients, and a sparsity-based classifier. By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints that K-SVD imposes on dictionary element discovery. Results are shown for facial recognition, facial expression recognition, and human activity analysis. With the addition of a concept called active difference signatures, the framework also delivers robust gesture recognition from Kinect or similar depth cameras.

RIT Scholar Works
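
Sparse representation classification (SRC) is the classifier family this thesis builds on, and its basic decision rule can be sketched as follows. A test sample is coded over a dictionary of labelled training samples. The sample is then assigned to the class whose coefficients alone give the smallest reconstruction residual. The sketch uses orthogonal matching pursuit for the sparse coding step and works on raw, unreduced features. It does not implement the thesis's MSR or LGE-KSVD frameworks, and the names `omp` and `src_classify` are hypothetical.

```python
from typing import Hashable, Sequence

import numpy as np


def omp(D: np.ndarray, y: np.ndarray, n_nonzero: int) -> np.ndarray:
    """Orthogonal matching pursuit: find a sparse x with D @ x ~= y.

    Assumes the columns of D have unit L2 norm.
    """
    y = np.asarray(y, dtype=np.float64)
    support: list = []
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) < 1e-12:
            break  # y is already reconstructed exactly
        # Greedily pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no new atom can reduce the residual
        support.append(k)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x


def src_classify(D: np.ndarray, labels: Sequence[Hashable], y: np.ndarray, n_nonzero: int = 5):
    """Return the label whose training atoms alone reconstruct y with the
    smallest residual (columns of D are training samples)."""
    D = D / np.linalg.norm(D, axis=0, keepdims=True)  # unit-norm atoms
    labels = np.asarray(labels)
    x = omp(D, y, n_nonzero)
    residuals = {}
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)  # keep only class-c coefficients
        residuals[c] = float(np.linalg.norm(y - D @ x_c))
    return min(residuals, key=residuals.get)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.standard_normal((30, 12))      # 12 training samples, 30 features each
    labels = [0] * 4 + [1] * 4 + [2] * 4   # three classes, four samples per class
    y = 0.7 * D[:, 5] + 0.1 * rng.standard_normal(30)  # noisy class-1 sample
    print(src_classify(D, labels, y, n_nonzero=3))
```

Coefficient contamination, as the abstract describes it, would appear here as nonzero entries of `x` falling on atoms of the wrong class. The class-wise residual rule is the step those stray coefficients can mislead.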