
    Advancing a Machine's Visual Awareness of People

    Get PDF
    Methods to advance a machine's visual awareness of people, with a focus on understanding 'who is where' in video, are presented. 'Who' is used in a broad sense that includes not only the identity of a person but also that person's attributes. Efforts are focused on improving algorithms in four areas of visual recognition: detection, tracking, fine-grained classification and person reidentification. Each of these problems appears quite different on the surface; however, two broader questions are answered across all of the works. The first is that the machine is able to make better predictions when it has access to the extra information available in video. The second is that it is possible to learn on-the-fly from single examples. How each work contributes to answering these overarching questions, as well as its specific contributions to the relevant problem domain, is as follows.
    The first problem studied is one-shot, real-time instance detection. Given a single image of a person, the task for the machine is to learn a detector that is specific to that individual rather than to an entire category such as faces or pedestrians. In subsequent images, the individual detector indicates the size and location of that particular person in the image. The learning must be done in real-time. To solve this problem, the proposed method starts with a pre-trained boosted category detector from which an individual-object detector is trained, with near-zero computational cost, through elementary manipulations of the thresholds of the category detector. Experiments on two challenging pedestrian and face datasets indicate that it is indeed possible to learn identity classifiers in real-time; besides being faster to train, the proposed classifier has better detection rates than previous methods.
    The second problem studied is real-time tracking. Given the initial location of a target person, the task for the machine is to determine the size and location of the target person in subsequent video frames, in real-time. The method proposed for solving this problem treats tracking as a repeated detection problem, where potential targets are identified with a pre-trained boosted person detector and identity across frames is established by individual-specific detectors. The individual-specific detectors are learnt using the method proposed to solve the first problem. The proposed algorithm runs in real-time and is robust to drift. The tracking algorithm is benchmarked against nine state-of-the-art trackers on two benchmark datasets. Results show that the proposed method is 10% more accurate and nearly as fast as the fastest of the competing algorithms, and it is as accurate but 20 times faster than the most accurate of the competing algorithms.
    The third problem studied is the fine-grained classification of people. Given an image of a person, the task for the machine is to estimate characteristics of that person such as age, clothing style, sex, occupation, social status, ethnicity, emotional state and/or body type. Since fine-grained classification using the entire human body is a relatively unexplored area, a large video dataset was collected. To solve this problem, a method that uses deep neural networks and video of a person is proposed. Results show that the class-average accuracy when combining information from a sequence of images of an individual and then predicting the label is 3.5-7.1% better than independently predicting the label of each image, when severely under-represented classes are ignored.
    The final problem studied is person reidentification. Given an image of a person, the task for the machine is to find images that match the identity of that person from a large set of candidate images. This is a challenging task since images of the same individual can vary significantly due to changes in clothing, viewpoint, pose, lighting and background. The method proposed for solving this problem is a two-stage deep neural network architecture that uses body part patches as inputs rather than an entire image of a person. Experiments show that rank-1 matching rates increase by 22-25.6% on benchmark datasets when compared to state-of-the-art methods.
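
    To make the sequence-level prediction step concrete, the sketch below shows one simple way to combine per-frame classifier outputs for an individual into a single label by averaging softmax probabilities over the video. The abstract does not specify the exact aggregation scheme, so the function name and averaging rule here are illustrative assumptions, not the author's method.

        import numpy as np

        def aggregate_sequence_predictions(frame_probs):
            """Average per-frame softmax outputs for one individual over the video,
            then take the argmax as the sequence-level label.
            frame_probs: array of shape (num_frames, num_classes)."""
            sequence_probs = frame_probs.mean(axis=0)   # pool evidence over the sequence
            return int(sequence_probs.argmax())         # predicted attribute class

        # Toy usage: three frames, four attribute classes.
        probs = np.array([[0.1, 0.6, 0.2, 0.1],
                          [0.2, 0.5, 0.2, 0.1],
                          [0.3, 0.3, 0.3, 0.1]])
        print(aggregate_sequence_predictions(probs))  # -> 1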

    Object Tracking Based on Satellite Videos: A Literature Review

    Get PDF
    Video satellites have recently become an attractive method of Earth observation, providing consecutive images of the Earth’s surface for continuous monitoring of specific events. The development of on-board optical and communication systems has enabled various applications of satellite image sequences. However, satellite video-based target tracking is a challenging research topic in remote sensing due to its relatively low spatial and temporal resolution. Thus, this survey systematically investigates current satellite video-based tracking approaches and benchmark datasets, focusing on five typical tracking applications: traffic target tracking, ship tracking, typhoon tracking, fire tracking, and ice motion tracking. The essential aspects of each tracking target are summarized, such as the tracking architecture, the fundamental characteristics, primary motivations, and contributions. Furthermore, popular visual tracking benchmarks and their respective properties are discussed. Finally, a revised multi-level dataset based on WPAFB videos is generated and quantitatively evaluated for future development in the satellite video-based tracking area. In addition, the 54.3% of tracklets with the lowest Difficulty Scores (DS) form the Easy group, while 27.2% and 18.5% of the tracklets fall into the Medium-DS and Hard-DS groups, respectively.
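
    As a rough illustration of the Difficulty Score (DS) grouping described above, the sketch below partitions tracklets into Easy, Medium and Hard groups by thresholding their DS values. The actual cut-off values and DS definition used for the WPAFB-based dataset are not given in the abstract, so the thresholds and function name here are placeholders.

        def group_tracklets_by_difficulty(tracklets, easy_max, medium_max):
            """Partition (tracklet_id, ds) pairs into Easy / Medium / Hard groups
            by Difficulty Score. Threshold values are placeholders, not the
            paper's actual cut-offs."""
            groups = {"Easy": [], "Medium": [], "Hard": []}
            for tid, ds in tracklets:
                if ds <= easy_max:
                    groups["Easy"].append(tid)
                elif ds <= medium_max:
                    groups["Medium"].append(tid)
                else:
                    groups["Hard"].append(tid)
            return groups

        # Toy usage with made-up DS values and thresholds.
        print(group_tracklets_by_difficulty([("t1", 0.2), ("t2", 0.5), ("t3", 0.9)],
                                            easy_max=0.3, medium_max=0.6))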

    Novel Aggregated Solutions for Robust Visual Tracking in Traffic Scenarios

    Get PDF
    This work proposes novel approaches for object tracking in challenging scenarios such as severe occlusion, deteriorated vision and long-range multi-object reidentification. All of these solutions are based solely on image sequences captured by a monocular camera and do not require additional sensors. Experiments on standard benchmarks demonstrate that these approaches improve on the state of the art. Since all the presented approaches are designed with computational efficiency in mind, they run at real-time speed.

    MIFTel: a multimodal interactive framework based on temporal logic rules

    Get PDF
    Human-computer and multimodal interaction are increasingly used in everyday life. Machines are able to gather more information from the surrounding world, assisting humans in different application areas. In this context, correctly processing and managing the signals provided by the environment is essential for structuring the data, and different sources and acquisition times can be exploited to improve recognition results. On the basis of these assumptions, we propose a multimodal system that combines Allen’s temporal logic with a prediction method. The main objective is to correlate the user’s events with the system’s reactions. After post-processing incoming data from different signal sources (RGB images, depth maps, sounds, proximity sensors, etc.), the system manages the correlations between recognition/detection results and events in real time to create an interactive environment for the user. To increase recognition reliability, a predictive model is also associated with the proposed method. The modularity of the system allows fully dynamic development and upgrading with custom modules. Finally, a comparison with other similar systems is shown, underlining the high flexibility and robustness of the proposed event management method.
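
    For readers unfamiliar with Allen's temporal logic, the sketch below implements a basic check of the Allen interval relations between two time intervals, which is the kind of primitive that such rule-based correlation of user events and system reactions builds on. It is a generic illustration, not the MIFTel implementation, and the function name is assumed.

        def allen_relation(a, b):
            """Return the Allen interval-algebra relation that holds from interval a
            to interval b, where each interval is a (start, end) pair with start < end.
            Covers the seven basic relations; swap the arguments to name an inverse."""
            (a0, a1), (b0, b1) = a, b
            if a1 < b0:
                return "before"
            if a1 == b0:
                return "meets"
            if a0 == b0 and a1 == b1:
                return "equals"
            if a0 == b0 and a1 < b1:
                return "starts"
            if a0 > b0 and a1 == b1:
                return "finishes"
            if a0 > b0 and a1 < b1:
                return "during"
            if a0 < b0 and b0 < a1 < b1:
                return "overlaps"
            return "inverse"  # one of the inverse relations holds; call allen_relation(b, a)

        # Example: a user gesture that overlaps a spoken command in time.
        print(allen_relation((0.0, 2.0), (1.5, 3.0)))  # -> "overlaps"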

    Truck Trailer Classification Using Side-Fire Light Detection And Ranging (LiDAR) Data

    Get PDF
    Classification of vehicles into distinct groups is critical for many applications, including freight and commodity flow modeling, pavement management and design, tolling, air quality monitoring, and intelligent transportation systems. The Federal Highway Administration (FHWA) developed a standardized 13-category vehicle classification ruleset, which meets the needs of many traffic data user applications. However, some applications need high-resolution data for modeling and analysis. For example, the type of commodity being carried must be known in the freight modeling framework. Unfortunately, this information is not available at the state or metropolitan level, or it is expensive to obtain from current resources. Nevertheless, using emerging technologies such as Light Detection and Ranging (LiDAR), it may be possible to predict commodity type from truck body types or trailers. For example, refrigerated trailers are commonly used to transport perishable produce and meat products, tank trailers are used for fuel and other liquid products, and specialized trailers carry livestock. The main goal of this research is to develop methods using side-fired LiDAR data to distinguish between specific types of truck trailers beyond what is generally possible with traditional vehicle classification sensors (e.g., piezoelectric sensors and inductive loop detectors). A multi-array LiDAR sensor enables the construction of 3D profiles of vehicles, since it measures the distance to the object reflecting its emitted light. In this research, 16-beam LiDAR sensor data are processed to estimate vehicle speed and extract useful information and features to classify semi-trailer trucks hauling ten different types of trailers: reefer and non-reefer dry vans, 20 ft and 40 ft intermodal containers, 40 ft reefer intermodal containers, platforms, tanks, car transporters, open-top vans/dumps, and an aggregated group of other types (e.g., livestock, logging, etc.). In addition to truck-trailer classification, methods are developed to detect empty and loaded platform semi-trailers. K-Nearest Neighbors (KNN), Multilayer Perceptron (MLP), Adaptive Boosting (AdaBoost), and Support Vector Machine (SVM) supervised machine learning algorithms are applied to field data collected on a freeway segment and comprising over seven thousand trucks. The results show that the different trailer body types can be classified with a very high level of accuracy, ranging from 85% to 98%, and that empty and loaded platform semi-trailers can be distinguished with 99% accuracy. To enhance the accuracy with which multiple LiDAR frames belonging to the same truck are merged, a new algorithm is developed to estimate the truck's speed while it is within the field of view of the sensor. This algorithm is based on tracking tires and utilizes line detection concepts from image processing. The proposed algorithm improves the results and allows the creation of more accurate 2D and 3D truck profiles, as documented in this thesis.
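
    The classification stage can be pictured with the minimal scikit-learn sketch below, which trains the four classifier families named above (KNN, MLP, AdaBoost, SVM) on pre-extracted per-truck feature vectors. The features, labels and hyperparameters here are placeholders rather than the thesis's actual pipeline; the LiDAR feature extraction is assumed to have been done already.

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.neural_network import MLPClassifier
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 32))        # hypothetical geometric features per truck
        y = rng.integers(0, 10, size=500)     # ten trailer body-type labels

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

        models = {
            "KNN": KNeighborsClassifier(n_neighbors=5),
            "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500),
            "AdaBoost": AdaBoostClassifier(),
            "SVM": SVC(kernel="rbf"),
        }
        for name, model in models.items():
            model.fit(X_tr, y_tr)                        # train on the feature vectors
            print(name, "accuracy:", model.score(X_te, y_te))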

    Deep learning-based signal processing approaches for improved tracking of human health and behaviour with wearable sensors

    Get PDF
    This thesis explores two lines of research in the context of sequential data and machine learning in the remote environment, i.e., outside the lab setting, using data acquired from wearable devices. Firstly, we explore Generative Adversarial Networks (GANs) as a reliable tool for time series generation, imputation and forecasting. Secondly, we investigate the applicability of novel deep learning frameworks to sequential data processing and their advantages over traditional methods. More specifically, we use our models to unlock additional insights and biomarkers in human-centric datasets. Our first research avenue concerns the generation of sequential physiological data. Access to physiological data, particularly medical data, has become heavily regulated in recent years, which has presented bottlenecks in developing computational models to assist in diagnosing and treating patients. Therefore, we explore GAN models to generate medical time series data that adhere to privacy-preserving regulations. We present our novel methods of generating and imputing synthetic, multichannel sequential medical data while complying with privacy regulations. Addressing these concerns allows for sharing and disseminating medical data and, in turn, developing clinical research in the relevant fields. Secondly, we explore novel deep learning technologies applied to human-centric sequential data to unlock further insights while addressing the idea of environmentally sustainable AI. We develop novel deep learning processing methods to estimate human activity and heart rate through convolutional networks. We also introduce our ‘time series-to-time series GAN’, which maps photoplethysmograph data to blood pressure measurements. Importantly, we denoise artefact-laden biosignal data to a competitive standard using a custom objective function and novel application of GANs. These deep learning methods help to produce nuanced biomarkers and state-of-the-art insights from human physiological data. The work laid out in this thesis provides a foundation for state-of-the-art deep learning methods for sequential data processing while keeping a keen eye on sustainable AI.
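
    As a loose illustration of GAN-based multichannel time series generation, the sketch below defines a small 1-D convolutional generator and discriminator in PyTorch. The architectures, losses and privacy-preserving machinery of the thesis are not reproduced here; treat this as a generic, shape-checked starting point under assumed layer sizes.

        import torch
        import torch.nn as nn

        class Generator(nn.Module):
            """Map a latent vector to a (channels, length) synthetic signal."""
            def __init__(self, latent_dim=64, channels=3, length=128):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(latent_dim, 128 * (length // 4)),
                    nn.ReLU(),
                    nn.Unflatten(1, (128, length // 4)),
                    nn.ConvTranspose1d(128, 64, kernel_size=4, stride=2, padding=1),
                    nn.ReLU(),
                    nn.ConvTranspose1d(64, channels, kernel_size=4, stride=2, padding=1),
                    nn.Tanh(),                      # signals scaled to [-1, 1]
                )

            def forward(self, z):
                return self.net(z)                  # (batch, channels, length)

        class Discriminator(nn.Module):
            """Score a (channels, length) signal as real or fake."""
            def __init__(self, channels=3, length=128):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv1d(channels, 64, kernel_size=4, stride=2, padding=1),
                    nn.LeakyReLU(0.2),
                    nn.Conv1d(64, 128, kernel_size=4, stride=2, padding=1),
                    nn.LeakyReLU(0.2),
                    nn.Flatten(),
                    nn.Linear(128 * (length // 4), 1),  # real/fake logit
                )

            def forward(self, x):
                return self.net(x)

        # Shape check: a batch of 8 synthetic 3-channel sequences of length 128.
        z = torch.randn(8, 64)
        fake = Generator()(z)
        print(fake.shape, Discriminator()(fake).shape)  # [8, 3, 128] and [8, 1]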

    A Non-Intrusive Multi-Sensor RGB-D System for Preschool Classroom Behavior Analysis

    Get PDF
    University of Minnesota Ph.D. dissertation. May 2017. Major: Computer Science. Advisor: Nikolaos Papanikolopoulos. 1 computer file (PDF); vii, 121 pages + 2 mp4 video files. Mental health disorders are a leading cause of disability in North America and can represent a significant source of financial burden. Early intervention is a key aspect in treating mental disorders as it can dramatically increase the probability of a positive outcome. One key factor to early intervention is the knowledge of risk-markers (genetic, neural, behavioral and/or social deviations) that indicate the development of a particular mental disorder. Once these risk-markers are known, it is important to have tools for reliable identification of them. For visually observable risk-markers, discovery and screening ideally should occur in a natural environment. However, this often incurs a high cost. Current advances in technology allow for the development of assistive systems that could aid in the detection and screening of visually observable risk-markers in everyday environments, like a preschool classroom. This dissertation covers the development of such a system. The system consists of a series of networked sensors that are able to collect data from a wide baseline. These sensors generate color images and depth maps that can be used to create a 3D point cloud reconstruction of the classroom. The wide baseline nature of the setup helps to minimize the effects of occlusion, since data is captured from multiple distinct perspectives. These point clouds are used to detect occupants in the room and track them throughout their activities. This tracking information is then used to analyze classroom and individual behaviors, enabling the screening for specific risk-markers and also the ability to create a corpus of data that could be used to discover new risk-markers. This system has been installed at the Shirley G. Moore Lab school, a research preschool classroom in the Institute of Child Development at the University of Minnesota. Recordings have been taken and analyzed from actual classes. No instruction or pre-conditioning was given to the instructors or the children in these classes. Portions of this data have also been manually annotated to create ground-truth data that was used to validate the efficacy of the proposed system.
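
    The point cloud reconstruction step can be sketched as follows: each sensor's depth map is back-projected through its calibrated intrinsics and transformed into a shared classroom frame via its extrinsic pose. The function name, parameters and toy values below are illustrative assumptions rather than the dissertation's code.

        import numpy as np

        def depth_to_classroom_points(depth, fx, fy, cx, cy, pose):
            """depth: (H, W) array of depth values in metres (0 = no reading).
            pose:  (4, 4) rigid transform from this camera's frame to the room frame.
            Returns an (N, 3) array of valid 3D points in the room frame."""
            h, w = depth.shape
            u, v = np.meshgrid(np.arange(w), np.arange(h))
            z = depth
            valid = z > 0
            x = (u - cx) * z / fx                      # pinhole back-projection
            y = (v - cy) * z / fy
            pts_cam = np.stack([x[valid], y[valid], z[valid], np.ones(valid.sum())])
            pts_room = pose @ pts_cam                  # into the shared classroom frame
            return pts_room[:3].T

        # Toy usage with a flat synthetic depth map and an identity pose.
        depth = np.full((4, 4), 2.0)
        print(depth_to_classroom_points(depth, fx=500, fy=500, cx=2, cy=2,
                                        pose=np.eye(4)).shape)  # (16, 3)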

    Representation in Cognitive Science

    Get PDF
    How can we think about things in the outside world? There is still no widely accepted theory of how mental representations get their meaning. In light of pioneering research, Nicholas Shea develops a naturalistic account of the nature of mental representation with a firm focus on the subpersonal representations that pervade the cognitive sciences.

    Representation in Cognitive Science

    Get PDF
    "Our thoughts are meaningful. We think about things in the outside world; how can that be so? This is one of the deepest questions in contemporary philosophy. Ever since the 'cognitive revolution', states with meaning-mental representations-have been the key explanatory construct of the cognitive sciences. But there is still no widely accepted theory of how mental representations get their meaning. Powerful new methods in cognitive neuroscience can now reveal information processing in the brain in unprecedented detail. They show how the brain performs complex calculations on neural representations. Drawing on this cutting-edge research, Nicholas Shea uses a series of case studies from the cognitive sciences to develop a naturalistic account of the nature of mental representation. His approach is distinctive in focusing firmly on the 'subpersonal' representations that pervade so much of cognitive science. The diversity and depth of the case studies, illustrated by numerous figures, make this book unlike any previous treatment. It is important reading for philosophers of psychology and philosophers of mind, and of considerable interest to researchers throughout the cognitive sciences.
