
    Unsupervised Feature Learning by Autoencoder and Prototypical Contrastive Learning for Hyperspectral Classification

    Unsupervised learning methods for feature extraction are becoming increasingly popular. We combine a popular contrastive learning method (prototypical contrastive learning) with a classic representation learning method (the autoencoder) to design an unsupervised feature learning network for hyperspectral classification. Experiments show that our two proposed autoencoder networks have good feature learning capabilities on their own, and that the contrastive learning network we designed combines their features to learn more representative ones. As a result, our method surpasses the comparison methods, including some supervised methods, in the hyperspectral classification experiments. Moreover, our method extracts features faster than the baseline methods. Finally, it reduces the demand for large computing resources by separating feature extraction from contrastive learning, allowing more researchers to conduct research and experiments on unsupervised contrastive learning.
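
    The minimal PyTorch sketch below illustrates the general recipe this abstract describes: two autoencoders are trained with reconstruction losses, their codes are fused, and a ProtoNCE-style prototypical contrastive loss pulls fused embeddings toward cluster prototypes. The layer sizes, fusion by concatenation, and the stand-in cluster assignments are our assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAE(nn.Module):
    """Small autoencoder; the encoder output is the learned feature."""
    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                 nn.Linear(64, code_dim))
        self.dec = nn.Sequential(nn.Linear(code_dim, 64), nn.ReLU(),
                                 nn.Linear(64, in_dim))

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

def proto_nce(z, prototypes, assignments, tau=0.1):
    # z: (B, D) fused embeddings; prototypes: (K, D) cluster centers;
    # assignments: (B,) cluster index per sample (e.g., from k-means).
    logits = F.normalize(z, dim=1) @ F.normalize(prototypes, dim=1).T / tau
    return F.cross_entropy(logits, assignments)

# Toy usage on random hyperspectral-like pixels (200 spectral bands).
x = torch.randn(32, 200)
ae1, ae2 = TinyAE(200, 32), TinyAE(200, 32)
z1, r1 = ae1(x)
z2, r2 = ae2(x)
z = torch.cat([z1, z2], dim=1)        # fuse the two feature sets
protos = torch.randn(10, 64)          # stand-in cluster centers
assign = torch.randint(0, 10, (32,))  # stand-in k-means labels
loss = F.mse_loss(r1, x) + F.mse_loss(r2, x) + proto_nce(z, protos, assign)
loss.backward()
```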

    LiDAR and Camera Detection Fusion in a Real Time Industrial Multi-Sensor Collision Avoidance System

    Collision avoidance is a critical task in many applications, such as ADAS (advanced driver-assistance systems), industrial automation, and robotics. In an industrial automation setting, certain areas should be off limits to an automated vehicle to protect people and high-valued assets. These areas can be quarantined by mapping (e.g., GPS) or via beacons that delineate a no-entry area. We propose a delineation method in which the industrial vehicle uses a LiDAR (Light Detection and Ranging) sensor and a single color camera to detect passive beacons, and model-predictive control to stop the vehicle from entering a restricted space. The beacons are standard orange traffic cones with a highly reflective vertical pole attached. The LiDAR readily detects these beacons but suffers from false positives due to other reflective surfaces such as worker safety vests. Herein, we put forth a method for reducing LiDAR false positives by detecting the beacons in the camera imagery via a deep learning method and validating each detection using a neural-network-learned projection from the camera to the LiDAR space. Experimental data collected at Mississippi State University's Center for Advanced Vehicular Systems (CAVS) shows the effectiveness of the proposed system in retaining true detections while mitigating false positives.
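
    A hedged sketch of the cross-modal validation step described above: project each camera detection into the LiDAR frame with a learned mapping and keep only the LiDAR candidates that some projected camera detection corroborates. The simple affine map and fixed gate radius are illustrative placeholders for the paper's neural-network-learned projection.

```python
import numpy as np

def project_camera_to_lidar(bbox_center, W, b):
    # bbox_center: (u, v) pixel coordinates -> (x, y) in the LiDAR frame.
    return W @ np.asarray(bbox_center, dtype=float) + b

def validate_beacons(lidar_hits, camera_boxes, W, b, gate=0.75):
    """Keep a LiDAR beacon candidate only if a projected camera
    detection lands within `gate` meters of it."""
    projected = [project_camera_to_lidar(c, W, b) for c in camera_boxes]
    kept = []
    for hit in lidar_hits:  # hit: (x, y) candidate beacon in the LiDAR frame
        if any(np.linalg.norm(np.asarray(hit) - p) < gate for p in projected):
            kept.append(hit)  # camera agrees -> likely a real cone+pole beacon
    return kept  # unmatched reflective hits (e.g., safety vests) are dropped
```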

    Representation learning for minority and subtle activities in a smart home environment

    Daily human activity recognition using sensor data is a fundamental task for many real-world applications, such as home monitoring and assisted living. One of the challenges in human activity recognition is distinguishing activities that occur infrequently and have less distinctive patterns. We propose a dissimilarity-representation-based hierarchical classifier that performs two-phase learning. In the first phase, the classifier learns general features to recognise majority classes; in the second phase, it collects minority and subtle classes and identifies the fine differences between them. We compare our approach with a collection of state-of-the-art classification techniques on a real-world third-party dataset collected in a two-user home setting. Our results demonstrate that our hierarchical classifier outperforms the existing techniques in distinguishing users performing the same type of activities. The key novelty of our approach is the exploration of dissimilarity representations and hierarchical classifiers, which highlight the differences between activities that differ only subtly and thus allow well-discriminating features to be identified.
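
    The sketch below shows one plausible reading of the two ideas named in this abstract, with assumed details: (1) a dissimilarity representation, where each sample is re-encoded as its distances to a set of prototypes, and (2) a two-phase hierarchy in which a coarse classifier handles majority activities and routes a catch-all "rare" label (here the placeholder -1) to a second classifier trained only on the minority classes. The choice of random forests is ours, not the paper's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import pairwise_distances

def dissimilarity_features(X, prototypes):
    # Each row becomes a vector of distances to the prototype set.
    return pairwise_distances(X, prototypes)

class TwoPhaseClassifier:
    """Coarse classifier for majority classes; fine classifier for
    the minority classes it routes away. X, y are NumPy arrays."""
    def __init__(self, minority_labels):
        self.minority = set(minority_labels)
        self.coarse = RandomForestClassifier(n_estimators=100)
        self.fine = RandomForestClassifier(n_estimators=100)

    def fit(self, X, y):
        # Collapse all minority classes into the catch-all label -1.
        y_coarse = np.array([-1 if c in self.minority else c for c in y])
        self.coarse.fit(X, y_coarse)
        rare = np.array([c in self.minority for c in y])
        self.fine.fit(X[rare], y[rare])
        return self

    def predict(self, X):
        y = self.coarse.predict(X)
        rare = y == -1
        if rare.any():
            y[rare] = self.fine.predict(X[rare])  # second-phase refinement
        return y
```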

    Perceptual data mining : bootstrapping visual intelligence from tracking behavior

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (p. 161-166). One common characteristic of all intelligent life is continuous perceptual input. A decade ago, simply recording and storing a few minutes of full frame-rate NTSC video required special hardware. Today, an inexpensive personal computer can process video in real time, tracking and recording information about multiple objects for extended periods of time, which fundamentally enables this research. This thesis is about Perceptual Data Mining (PDM), the primary goal of which is to create a real-time, autonomous perception system that can be introduced into a wide variety of environments and, through experience, learn to model the activity in each environment. The PDM framework infers as much as possible about the presence, type, identity, location, appearance, and activity of each active object in an environment from multiple video sources, without explicit supervision. PDM is a bottom-up, data-driven approach built on a novel, robust attention mechanism that reliably detects moving objects in a wide variety of environments. A correspondence system tracks objects through time and across multiple sensors, producing sets of observations that correspond to the same object in extended environments. Using a co-occurrence modeling technique that exploits the variation exhibited by objects as they move through the environment, the types of objects, the activities that objects perform, and the appearance of specific classes of objects are modeled. Different applications of this technique are demonstrated, along with a discussion of the corresponding issues. Given the resulting rich description of the active objects in the environment, it is possible to model temporal patterns. An effective method for modeling periodic cycles of activity is demonstrated in multiple environments. The framework can learn to concisely describe the regularities of activity in an environment as well as flag atypical observations. Though this is accomplished without any supervision, a minimal amount of user interaction could be used to produce complex, task-specific perception systems. By Christopher P. Stauffer, Ph.D.
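
    As a rough illustration of the co-occurrence modeling idea (not the thesis's implementation): each track contributes the set of quantized appearance codewords it exhibited, codewords that co-occur within the same track accumulate evidence, and the resulting matrix can then be factored or clustered to discover object types without supervision. The codeword vocabulary and accumulation rule are assumptions.

```python
import numpy as np

def cooccurrence_matrix(tracks, n_codewords):
    """tracks: list of codeword-id sequences, one per tracked object."""
    C = np.zeros((n_codewords, n_codewords))
    for obs in tracks:
        uniq = np.unique(obs)
        for i in uniq:
            for j in uniq:
                if i != j:
                    C[i, j] += 1.0  # these codewords described the same object
    return C

# Toy usage: three tracks over an 8-word appearance vocabulary.
C = cooccurrence_matrix([[0, 1, 1, 2], [0, 2, 3], [5, 6, 7]], 8)
```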

    Attention Mechanism for Recognition in Computer Vision

    It has been shown that humans do not focus their attention on an entire scene at once when they perform a recognition task. Instead, they pay attention to the most important parts of the scene to extract the most discriminative information. Inspired by this observation, this dissertation studies the importance of the attention mechanism in recognition tasks in computer vision by designing novel attention-based models. Specifically, four scenarios are investigated that represent the most important aspects of the attention mechanism. First, an attention-based model is designed to reduce the dimensionality of visual features by selectively processing only a small subset of the data. We study this aspect of the attention mechanism in a framework based on object recognition in distributed camera networks. Second, an attention-based image retrieval system (i.e., person re-identification) is proposed that learns to focus on the most discriminative regions of a person's image and to process those regions with higher computation power using a deep convolutional neural network. Furthermore, we show how visualizing the attention maps can make deep neural networks more interpretable: by visualizing the attention maps, we can observe the regions of the input image on which the neural network relies in order to make a decision. Third, a model is proposed for estimating the importance of the objects in a scene given a task. More specifically, the proposed model estimates the importance of the road users that a driver (or an autonomous vehicle) should pay attention to in a driving scenario in order to navigate safely. In this scenario, the attention estimate is the final output of the model. Fourth, an attention-based module and a new loss function are proposed for a meta-learning-based few-shot learning system, in order to incorporate the context of the task into the feature representations of the samples and increase few-shot recognition accuracy. In this dissertation, we showed that attention can be multi-faceted, studying the attention mechanism from the perspectives of feature selection, computational cost reduction, interpretable deep learning models, task-driven importance estimation, and context incorporation. Through the study of these four scenarios, we further advanced the field where "attention is all you need".
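
    A generic spatial-attention sketch in the spirit the dissertation describes (focusing computation on the most discriminative regions, with a map that can be visualized for interpretability); the layer sizes and the softmax-pooled form are our assumptions, not the dissertation's architectures.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-location score

    def forward(self, feats):                       # feats: (B, C, H, W)
        b, c, h, w = feats.shape
        attn = self.score(feats).view(b, -1)            # (B, H*W)
        attn = F.softmax(attn, dim=1).view(b, 1, h, w)  # normalized map
        pooled = (feats * attn).sum(dim=(2, 3))         # attention-weighted pool
        return pooled, attn  # attn can be rendered over the input image

# Toy usage on a CNN feature map.
feats = torch.randn(2, 64, 7, 7)
vec, attn_map = SpatialAttention(64)(feats)
```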

    Facial Expression Recognition in the Wild Using Convolutional Neural Networks

    Facial Expression Recognition (FER) is the task of predicting a specific facial expression for a given facial image. FER has made remarkable progress due to advances in deep learning. Generally, a FER prediction model is built from two sub-modules: (1) a facial image representation model that learns a mapping from the input 2D facial image to a compact feature representation in the embedding space, and (2) a classifier module that maps the learned features to the label space comprising the seven labels neutral, happy, sad, surprise, anger, fear, and disgust. The prediction model aims to predict one of these seven labels for the given input image. This is carried out with a supervised learning algorithm in which the model minimizes an objective function measuring the error between the prediction and the true label by searching for the best mapping function. Our work is inspired by Deep Metric Learning (DML) approaches to learning an efficient embedding space for the classifier module. DML fundamentally aims to achieve maximal separation in the embedding space by creating compact and well-separated clusters with the capability of feature discrimination. However, conventional DML methods ignore the underlying challenges of wild FER datasets, where images exhibit large intra-class variation and inter-class similarity. First, we tackle the extreme class imbalance that biases separation toward facial expression classes populated with more data (e.g., happy and neutral) and against minority classes (e.g., disgust and fear). To eliminate this bias, we propose a discriminant objective function that optimizes the embedding space to enforce inter-class separation of features for both majority and minority classes. Second, we design an adaptive mechanism that selectively discriminates features in the embedding space to promote generalization, yielding a prediction model that classifies unseen images more accurately. We are inspired by the human visual attention model, described as the perception of the most salient visual cues in the observed scene. Accordingly, our attentive mechanism adaptively selects important features to discriminate in the DML objective function. We conduct experiments on two popular large-scale wild FER datasets (RAF-DB and AffectNet) to show the enhanced discriminative power of our proposed methods compared with several state-of-the-art FER methods.
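
    A hedged sketch of a class-balanced discriminant embedding loss of the kind this abstract gestures at: pull each embedding toward its class center and push the centers apart with a margin. Because the separation term involves only the seven class centers, it is unaffected by how many samples each class has, which is the property the paper exploits against imbalance. This illustrates the idea; it is not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def discriminant_loss(z, y, centers, margin=1.0):
    z = F.normalize(z, dim=1)
    c = F.normalize(centers, dim=1)
    pull = ((z - c[y]) ** 2).sum(dim=1).mean()  # compactness toward own center
    d = torch.cdist(c, c)                       # pairwise center distances
    off = ~torch.eye(len(c), dtype=torch.bool)  # mask out the diagonal
    push = F.relu(margin - d[off]).mean()       # separation between all classes
    return pull + push

# Toy usage with 7 expression classes and 128-D embeddings.
z = torch.randn(16, 128, requires_grad=True)
y = torch.randint(0, 7, (16,))
centers = torch.randn(7, 128, requires_grad=True)
loss = discriminant_loss(z, y, centers)
loss.backward()
```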

    Techniques for data pattern selection and abstraction

    This thesis concerns the problem of prototype reduction in instance-based learning. To deal with problems such as storage requirements, sensitivity to noise, and computational complexity, various algorithms have been presented that condense the number of stored prototypes while maintaining competent classification accuracy. Instance selection, which recovers a smaller subset of the original training set, is the most widely used technique for instance reduction, but prototype abstraction, which generates new prototypes to replace the initial ones, has also gained considerable interest recently. The major contribution of this work is the proposal of four novel frameworks for prototype reduction: the Class Boundary Preserving algorithm (CBP), a hybrid method that uses both selection and generation of prototypes; Instance Seriation for Prototype Abstraction (ISPA), an abstraction algorithm; and two selective techniques, Spectral Instance Reduction (SIR) and Direct Weight Optimization (DWO). CBP is a multi-stage method based on a simple heuristic that is very effective at identifying samples close to class borders. A noise filter removes harmful instances, while the heuristic determines the geometrical distribution of patterns around every instance; together with the concepts of nearest enemy pairs and mean shift clustering, the algorithm decides on the final set of retained prototypes. DWO is a selection model whose output set of prototypes is decided by a set of binary weights, computed according to an objective function composed of the ratio between the nearest friend and nearest enemy of every sample; to obtain good-quality results, DWO is optimized with a genetic algorithm. ISPA is an abstraction technique that employs data seriation to organize instances in an arrangement that favours merging between them, producing a new set of prototypes. SIR derives a set of border-discriminating features (BDFs) that depict the local distribution of friends and enemies of every sample; these are then used, along with spectral graph theory, to partition the training set into border and internal instances. Results show that CBP, SIR, and DWO, the three major algorithms presented in this thesis, are competent and efficient in terms of at least one of the two basic objectives, classification accuracy and condensation ratio. Comparison against other successful condensation algorithms illustrates the competitiveness of the proposed models.
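
    A small sketch of the nearest-friend / nearest-enemy quantity underlying DWO's objective, as the abstract describes it; the per-sample ratio below is a simplification, and the binary-weight search by genetic algorithm is omitted.

```python
import numpy as np

def friend_enemy_ratio(X, y):
    """Ratio of nearest same-class to nearest other-class distance."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbour
    ratios = np.empty(len(X))
    for i in range(len(X)):
        friends = D[i, y == y[i]]
        enemies = D[i, y != y[i]]
        ratios[i] = friends.min() / enemies.min()
    return ratios  # small -> safely inside its class; near 1 -> border sample

# Toy usage on random two-class data.
X = np.random.randn(50, 4)
y = np.random.randint(0, 2, 50)
r = friend_enemy_ratio(X, y)
```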

    Wearable and automotive systems for affect recognition from physiology

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000. Includes bibliographical references (p. 152-158). Novel systems and algorithms have been designed and built to recognize affective patterns in physiological signals. Experiments were conducted to evaluate the new systems and algorithms in three settings: a highly constrained laboratory setting, a largely unconstrained ambulatory environment, and a less constrained automotive environment. The laboratory experiment was designed to test for the presence of unique physiological patterns in each of eight different emotions, given a relatively motionless seated subject intentionally feeling and expressing these states. This experiment generated a large dataset of physiological signals containing many day-to-day variations, and the proposed features contributed to a success rate of 81% for discriminating all eight emotions, and rates of up to 100% for subsets of emotions with similar qualities. New wearable computer systems and sensors were developed and tested on subjects who walked, jogged, talked, and otherwise went about daily activities. Although physical motion often overwhelmed affective signals in the unconstrained ambulatory setting, the systems developed in this thesis are currently useful as activity monitors, providing an image diary correlated with physiological signals. Automotive systems were used to detect physiological stress during the natural but physically constrained task of driving, generating a large database of physiological signals covering over 36 hours of driving. Algorithms for detecting driver stress achieved recognition rates of 96% when validated with stress ratings based on task conditions, and 89% accuracy when validated with questionnaire analysis. Further results, in which metrics of stress from videotape annotations of the drives were correlated with physiological features, showed highly significant correlations (up to r = .77 over 4000 samples). Together, these three experiments show a range of success in recognizing affect from physiology, with high recognition rates in somewhat constrained conditions, and highlight the need for more automatic context sensing in unconstrained conditions. The recognition rates obtained thus far support the hypothesis that many emotional differences can be automatically discriminated in patterns of physiological changes. By Jennifer A. Healey, Ph.D.
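
    An illustrative sketch only: simple window features of the kind commonly used in physiological affect recognition (skin conductance level and response count, heart-rate statistics), fed to a linear classifier against task-condition stress labels. The feature set, the 5-second windows, and the peak-prominence threshold are our assumptions, not the thesis's exact pipeline.

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.linear_model import LogisticRegression

def window_features(eda, hr, fs=32, win_s=5):
    """Per-window features from skin conductance (eda) and heart rate (hr)."""
    n = fs * win_s
    feats = []
    for start in range(0, len(eda) - n + 1, n):
        e, h = eda[start:start + n], hr[start:start + n]
        peaks, _ = find_peaks(e, prominence=0.05)  # skin conductance responses
        feats.append([e.mean(), len(peaks), h.mean(), h.std()])
    return np.array(feats)

# Toy usage with synthetic signals and stand-in rest/stress labels.
eda = np.cumsum(np.random.randn(32 * 60)) * 0.01 + 5
hr = 70 + 5 * np.random.randn(32 * 60)
X = window_features(eda, hr)
y = np.random.randint(0, 2, len(X))  # 0 = rest, 1 = stressed window
clf = LogisticRegression().fit(X, y)
```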