4 research outputs found

Learning and Searching Methods for Robust, Real-Time Visual Odometry.

Author: Richardson Andrew Ross
Publication date: 01/01/2015

Accurate position estimation provides a critical foundation for mobile robot perception and control. While well-studied, it remains difficult to provide timely, precise, and robust position estimates for applications that operate in uncontrolled environments, such as robotic exploration and autonomous driving. Continuous, high-rate egomotion estimation is possible using cameras and Visual Odometry (VO), which tracks the movement of sparse scene content known as image keypoints or features. However, high update rates, often 30 Hz or greater, leave little computation time per frame, while variability in scene content stresses robustness. Due to these challenges, implementing an accurate and robust visual odometry system remains difficult. This thesis investigates fundamental improvements throughout all stages of a visual odometry system, and has three primary contributions. The first contribution is a machine learning method for feature detector design. This method considers end-to-end motion estimation accuracy during learning. Consequently, accuracy and robustness are improved across multiple challenging datasets in comparison to state-of-the-art alternatives. The second contribution is a proposed feature descriptor, TailoredBRIEF, that builds upon recent advances in the field in fast, low-memory descriptor extraction and matching. TailoredBRIEF is an in-situ descriptor learning method that improves feature matching accuracy by efficiently customizing descriptor structures on a per-feature basis. Further, a common asymmetry in vision system design between reference and query images is described and exploited, enabling approaches that would otherwise exceed runtime constraints. The final contribution is a new algorithm for visual motion estimation: Perspective Alignment Search (PAS). Many vision systems depend on the unique appearance of features during matching, despite a large quantity of non-unique features in otherwise barren environments.
A search-based method, PAS, is proposed to employ features that lack unique appearance through descriptorless matching. This method simplifies visual odometry pipelines, defining one method that subsumes feature matching, outlier rejection, and motion estimation. Throughout this work, evaluations of the proposed methods and systems are carried out on ground-truth datasets, often generated with custom experimental platforms in challenging environments. Particular focus is placed on preserving runtimes compatible with real-time operation, as is necessary for deployment in the field.

PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/113365/1/chardson_1.pd

Deep Blue Documents at the University of Michigan

From Feature Detection in Truncated Signed Distance Fields to Sparse Stable Scene Graphs

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Computational Foundations for Safe and Efficient Human-Robot Collaboration in Assembly Cells

Author: Morato Carlos W
Publication date: 01/01/2016

Humans and robots have complementary strengths in performing assembly operations. Humans are very good at perception tasks in unstructured environments. They are able to recognize and locate a part from a box of miscellaneous parts. They are also very good at complex manipulation in tight spaces. The sensory characteristics, motor abilities, knowledge, and skills of humans give them the ability to react to unexpected situations and resolve problems quickly. In contrast, robots are very good at pick and place operations and highly repeatable in placement tasks. Robots can perform tasks at high speeds and still maintain precision in their operations. Robots can also operate for long periods of time. Robots are also very good at applying high forces and torques. Typically, robots are used in mass production. Small batch and custom production operations predominantly use manual labor. The high labor cost is making it difficult for small and medium manufacturers to remain cost competitive in high wage markets. These manufacturers are mainly involved in small batch and custom production. They need to find a way to reduce the labor cost in assembly operations. Purely robotic cells will not be able to provide them the necessary flexibility. Creating hybrid cells where humans and robots can collaborate in close physical proximity is a potential solution. The underlying idea behind such cells is to decompose assembly operations into tasks such that humans and robots can collaborate by performing sub-tasks that are suitable for them. Realizing hybrid cells that enable effective human and robot collaboration is challenging. This dissertation addresses the following three computational issues involved in developing and utilizing hybrid assembly cells:

- We should be able to automatically generate plans to operate hybrid assembly cells to ensure efficient cell operation. This requires generating feasible assembly sequences and instructions for robots and human operators, respectively. Automated planning poses the following two challenges. First, generating operation plans for complex assemblies is challenging. The complexity can arise from the combinatorial explosion caused by the size of the assembly or from the complex paths needed to perform the assembly. Second, generating feasible plans requires accounting for robot and human motion constraints. The first objective of the dissertation is to develop the underlying computational foundations for automatically generating plans for the operation of hybrid cells. It addresses both assembly complexity and motion constraint issues.

- The collaboration between humans and robots in the assembly cell will only be practical if human safety can be ensured during the assembly tasks that require collaboration between humans and robots. The second objective of the dissertation is to evaluate different options for real-time monitoring of the state of the human operator with respect to the robot and to develop strategies for taking appropriate measures to ensure human safety when the planned move by the robot may compromise the safety of the human operator. In order to be competitive in the market, the developed solution will have to include considerations about cost without significantly compromising quality.

- In the envisioned hybrid cell, we will be relying on human operators to bring the part into the cell. If the human operator makes an error in selecting the part or fails to place it correctly, the robot will be unable to correctly perform the task assigned to it. If the error goes undetected, it can lead to a defective product and inefficiencies in the cell operation. The reason for human error can be either confusion due to poor quality instructions or the human operator not paying adequate attention to the instructions. In order to ensure smooth and error-free operation of the cell, we will need to monitor the state of the assembly operations in the cell. The third objective of the dissertation is to identify and track parts in the cell and automatically generate instructions for taking corrective actions if a human operator deviates from the selected plan. Potential corrective actions may involve re-planning if it is possible to continue assembly from the current state. Corrective actions may also involve issuing warnings and generating instructions to undo the current task.

Digital Repository at the University of Maryland

Learning effective binary representation with deep hashing technique for large-scale multimedia similarity search

Author: Wu Gengshen
Publication venue: Lancaster University
Publication date: 01/01/2020

The explosive growth of multimedia data in modern times has inspired research into efficient large-scale multimedia similarity search in existing information retrieval systems. In the past decades, hashing-based nearest neighbor search methods have drawn extensive attention in this research field. By representing the original data with compact hash codes, hashing enables efficient similarity retrieval using only bitwise operations to compute the Hamming distance. Moreover, less memory space is required to process and store the massive amounts of features for the search engines owing to the nature of compact binary codes. These advantages make hashing a competitive option in large-scale visual-related retrieval tasks. Motivated by previous work, this thesis focuses on learning compact binary representations via hashing techniques for large-scale multimedia similarity search tasks. In particular, several novel frameworks are proposed for popular hashing-based applications such as a local binary descriptor for patch-level matching (Chapter 3), video-to-video retrieval (Chapter 4) and cross-modality retrieval (Chapter 5). This thesis starts by addressing the problem of learning a local binary descriptor for better patch/image matching performance. To this end, we propose a novel local descriptor termed Unsupervised Deep Binary Descriptor (UDBD) for patch-level matching tasks, which learns a transformation-invariant binary descriptor by embedding the original visual data and their transformed sets into a common Hamming space. By imposing an l2,1-norm regularizer on the objective function, the learned binary descriptor gains robustness against noise. Moreover, a weak bit scheme is applied to address ambiguous matching in the local binary descriptor, where the best match is determined for each query by comparing a series of weak bits between the query instance and the candidates, thus improving the matching performance.
Furthermore, Unsupervised Deep Video Hashing (UDVH) is proposed to facilitate large-scale video-to-video retrieval. To tackle the imbalanced distribution issue in the video feature, balanced rotation is developed to identify a proper projection matrix such that the information of each dimension can be balanced in the fixed-bit quantization, thus improving the retrieval performance dramatically with better code quality. To provide comprehensive insights on the proposed rotation, two different video feature learning structures, stacked LSTM units (UDVH-LSTM) and Temporal Segment Network (UDVH-TSN), are presented in Chapter 4. Lastly, we extend the research topic from single-modality to cross-modality retrieval, where Self-Supervised Deep Multimodal Hashing (SSDMH) based on matrix factorization is proposed to learn unified binary codes for different modalities directly without the need for relaxation. By minimizing a graph regularization loss, it tends to produce discriminative hash codes by preserving the original data structure. Moreover, Binary Gradient Descent (BGD) accelerates the discrete optimization compared with the bit-by-bit fashion. Besides, an unsupervised version termed Unsupervised Deep Cross-Modal Hashing (UDCMH) is proposed to tackle large-scale cross-modality retrieval when prior knowledge is unavailable.
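
The abstract above notes that compact hash codes allow similarity retrieval using only bitwise operations when computing the Hamming distance. A minimal sketch of that idea follows, assuming each code is packed into a Python integer. The function names `hamming_distance` and `nearest_code` and the sample 8-bit codes are hypothetical illustrations and are not taken from the thesis, which learns the codes with deep networks.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the bit positions where two equal-length binary codes differ."""
    # XOR leaves a 1 exactly where the codes disagree; count those ones.
    return bin(code_a ^ code_b).count("1")


def nearest_code(query: int, database: list[int]) -> int:
    """Return the index of the database code closest to the query.

    This is a plain linear scan; it is only meant to show the distance
    computation, not an indexing scheme for large collections.
    """
    return min(range(len(database)),
               key=lambda i: hamming_distance(query, database[i]))


# Hypothetical 8-bit codes chosen for illustration only.
database = [0b10110010, 0b01101100, 0b10111111]
query = 0b10110011

best = nearest_code(query, database)
print(best, hamming_distance(query, database[best]))
```

Because each comparison is a single XOR followed by a bit count, the cost per candidate stays small even when the database is large, which is the efficiency argument the abstract makes for hashing.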

Lancaster E-Prints