
    Activity Analysis; Finding Explanations for Sets of Events

    Automatic activity recognition is the computational process of analysing visual input and reasoning about detections to understand the performed events. In all but the simplest scenarios, an activity involves multiple interleaved events, some related and others independent. The activity in a car park or at a playground would typically include many events. This research assumes the possible events and any constraints between the events can be defined for the given scene. Analysing the activity should thus recognise a complete and consistent set of events; this is referred to as a global explanation of the activity. By seeking a global explanation that satisfies the activity's constraints, infeasible interpretations can be avoided, and ambiguous observations may be resolved. An activity's events and any natural constraints are defined using a grammar formalism. Attribute Multiset Grammars (AMG) are chosen because they allow the definition of hierarchies as well as attribute rules and constraints. When used for recognition, detectors are employed to gather a set of detections. Parsing the set of detections with the AMG provides a global explanation. To find the best parse tree given a set of detections, a Bayesian network models the probability distribution over the space of possible parse trees. Heuristic and exhaustive search techniques are proposed to find the maximum a posteriori global explanation. The framework is tested on two activities: the activity at a bicycle rack, and around a building entrance. The first case study involves people locking bicycles onto a bicycle rack and picking them up later. The best global explanation for all detections gathered during the day resolves local ambiguities from occlusion or clutter. Intensive testing on five full days showed that global analysis achieves higher recognition rates. The second case study tracks people and any objects they are carrying as they enter and exit a building entrance. A complete sequence of the person entering and exiting multiple times is recovered by the global explanation.
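    As a rough illustration of the search step described above, the sketch below scores candidate event sets against pairwise constraints and keeps the highest-scoring consistent subset. The function names, the pairwise-only constraint check, and the additive log-likelihood scoring are assumptions for illustration, not the thesis's actual AMG parser or Bayesian network.

```python
# Minimal sketch of finding a maximum a posteriori global explanation:
# enumerate candidate event sets, discard those violating the constraints,
# and keep the highest-scoring consistent set. Exponential in the number
# of candidates, so feasible only for small detection sets.
from itertools import combinations

def consistent(events, constraints):
    """True if every pairwise constraint holds over the event set."""
    return all(c(a, b) for c in constraints
               for a in events for b in events if a is not b)

def map_explanation(candidate_events, constraints, log_likelihood):
    """Exhaustive search for the best complete, consistent subset of
    candidate events (the global explanation)."""
    best, best_score = None, float("-inf")
    for r in range(1, len(candidate_events) + 1):
        for subset in combinations(candidate_events, r):
            if not consistent(subset, constraints):
                continue
            score = sum(log_likelihood(e) for e in subset)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score
```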

    Real Time Fusion of Radioisotope Direction Estimation and Visual Object Tracking

    Research into discovering prohibited nuclear material plays an integral role in providing security from terrorism. Although many diverse methods contribute to defense, there exists a capability gap in localizing moving sources. This thesis introduces a real-time radioisotope tracking algorithm, assisted by visual object tracking methods, to fill that capability gap. The proposed algorithm can estimate carrier likelihood for objects in its field of view, and is designed to assist a pedestrian agent wearing a backpack detector. The complex, crowd-filled urban environments where this algorithm must function, combined with the size and weight limitations of a pedestrian system, make designing a functioning algorithm challenging. The contribution of this thesis is threefold. First, a generalized directional estimator is proposed. Second, two state-of-the-art visual object detection and visual object tracking methods are combined into a single tracking algorithm. Third, those outputs are fused to produce a real-time radioisotope tracking algorithm. This algorithm is designed for use with the backpack detector built by the IDEAS for WIND research group. This setup takes advantage of recent advances in detector, camera, and computer technologies to meet the challenging physical limitations. The directional estimator operates via gradient boosting regression to predict radioisotope direction with a variance of 50 degrees when trained on a simple laboratory dataset. Under conditions similar to other state-of-the-art methods, its accuracy is comparable. YOLOv3 and SiamFC were chosen by evaluating advanced visual tracking methods in terms of speed and efficiency across multiple architectures, and in terms of accuracy on datasets such as the Visual Object Tracking (VOT) Challenge and Common Objects in Context (COCO). The resulting tracking algorithm operates in real time. The outputs of direction estimation and visual tracking are fused using sequential Bayesian inference to predict carrier likelihood. Using lab trials evaluated by hand on visual and nuclear data, and a synthesized challenge dataset using visual data from the Boston Marathon attack, it can be observed that this prototype system advances the state of the art towards localization of a moving source.
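    A minimal sketch of the fusion idea, under assumed interfaces: each frame, a posterior over which tracked object carries the source is updated by how well the object's bearing from the visual tracker agrees with the estimated radiation direction. The Gaussian angular likelihood and the `sigma` default are illustrative assumptions, not the thesis's model.

```python
# One sequential Bayesian update of carrier likelihood: objects whose
# bearing matches the radiation direction estimate gain probability mass.
import math

def angular_error(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def update_carrier_posterior(prior, object_bearings, source_direction, sigma=50.0):
    """prior and the return value map object id -> carrier probability.
    object_bearings maps object id -> bearing (degrees) from the tracker."""
    posterior = {}
    for obj, bearing in object_bearings.items():
        err = angular_error(bearing, source_direction)
        likelihood = math.exp(-0.5 * (err / sigma) ** 2)  # assumed Gaussian
        posterior[obj] = prior.get(obj, 1.0 / len(object_bearings)) * likelihood
    z = sum(posterior.values()) or 1.0  # normalize; guard against all-zero
    return {obj: p / z for obj, p in posterior.items()}
```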

    Automatic human behaviour anomaly detection in surveillance video

    This thesis work focuses upon developing the capability to automatically evaluate and detect anomalies in human behaviour from surveillance video. We work with static monocular cameras in crowded urban surveillance scenarios, particularly airports and commercial shopping areas. Typically a person is 100 to 200 pixels high in a scene ranging from 10 to 20 meters in width and depth, populated by 5 to 40 people at any given time. Our procedure evaluates human behaviour unobtrusively to determine outlying behavioural events, flagging abnormal events to the operator. In order to achieve automatic human behaviour anomaly detection we address the challenge of interpreting behaviour within the context of the social and physical environment. We develop and evaluate a process for measuring social connectivity between individuals in a scene using motion and visual attention features. To do this we use mutual information and Euclidean distance to build a social similarity matrix which encodes the social connection strength between any two individuals. We develop a second contextual basis which acts by segmenting a surveillance environment into behaviourally homogeneous subregions, which represent high-traffic slow regions and queuing areas. We model the heterogeneous scene in homogeneous subgroups using both contextual elements. We bring the social contextual information, the scene context, the motion, and the visual attention features together to demonstrate a novel human behaviour anomaly detection process which finds outlier behaviour from a short sequence of video. The method, Nearest Neighbour Ranked Outlier Clusters (NN-RCO), is based upon modelling behaviour as a time-independent sequence of behaviour events, and can be trained in advance or set upon a single sequence. We find that in a crowded scene the application of mutual-information-based social context makes it possible to prevent self-justifying groups and to propagate anomalies through a social network, granting a greater anomaly detection capability. Scene context uniformly improves the detection of anomalies in all the datasets we test upon. We additionally demonstrate that our work is applicable to other data domains, demonstrating upon Automatic Identification System data in the maritime domain. Our work is capable of identifying abnormal shipping behaviour, using joint motion dependency as analogous for social connectivity, and similarly segmenting the shipping environment into homogeneous regions.
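    As a hedged sketch of the social similarity computation, one might bin each individual's motion features, take pairwise mutual information, and attenuate by average Euclidean distance. The binning scheme and the combination rule below are assumptions, since the abstract does not specify them.

```python
# Illustrative social similarity matrix: mutual information between motion
# feature sequences, damped by average spatial distance between the pair.
import numpy as np
from sklearn.metrics import mutual_info_score

def social_similarity(tracks, positions, n_bins=8):
    """tracks: list of 1-D motion feature arrays (equal length T);
    positions: list of (T, 2) position arrays. Returns an NxN matrix."""
    n = len(tracks)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Discretize continuous features so MI can be estimated.
            bi = np.digitize(tracks[i], np.histogram_bin_edges(tracks[i], n_bins))
            bj = np.digitize(tracks[j], np.histogram_bin_edges(tracks[j], n_bins))
            mi = mutual_info_score(bi, bj)
            dist = np.linalg.norm(positions[i] - positions[j], axis=1).mean()
            sim[i, j] = sim[j, i] = mi / (1.0 + dist)  # assumed combination
    return sim
```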

    A Survey on Human-aware Robot Navigation

    Intelligent systems are increasingly part of our everyday lives and have been integrated so seamlessly that it is difficult to imagine a world without them. Physical manifestations of those systems, on the other hand, in the form of embodied agents or robots, have so far been used only for specific applications and are often limited to functional roles (e.g. in the industrial, entertainment and military fields). Given the current growth and innovation in the research communities concerned with the topics of robot navigation, human-robot interaction and human activity recognition, it seems this might soon change. Robots are increasingly easy to obtain and use, and their general acceptance is growing. However, the design of a socially compliant robot that can function as a companion needs to take various areas of research into account. This paper is concerned with the navigation aspect of a socially compliant robot and provides a survey of existing solutions for the relevant areas of research, as well as an outlook on possible future directions. Comment: Robotics and Autonomous Systems, 202

    Collection and Analysis of Driving Videos Based on Traffic Participants

    Autonomous vehicle (AV) prototypes have been deployed in increasingly varied environments in recent years. An AV must be able to reliably detect and predict the future motion of traffic participants to maintain safe operation, based on data collected from high-quality onboard sensors. Sensors such as cameras and LiDAR generate high-bandwidth data that requires substantial computational and memory resources. To address these AV challenges, this thesis investigates three related problems: 1) What will the observed traffic participants do? 2) Is an anomalous traffic event likely to happen in the near future? and 3) How should we collect fleet-wide high-bandwidth data based on 1) and 2) over the long term? The first problem is addressed with future traffic trajectory and pedestrian behavior prediction. We propose a future object localization (FOL) method for trajectory prediction in first person videos (FPV). FOL encodes heterogeneous observations including bounding boxes, optical flow features and ego camera motions with multi-stream recurrent neural networks (RNN) to predict future trajectories. Because FOL does not consider multi-modal future trajectories, its accuracy suffers from accumulated RNN prediction error. We then introduce BiTraP, a goal-conditioned bidirectional multi-modal trajectory prediction method. BiTraP estimates multi-modal trajectories and uses a novel bi-directional decoder and loss to improve longer-term trajectory prediction accuracy. We show that different choices of non-parametric versus parametric target models directly influence predicted multi-modal trajectory distributions. Experiments with two FPV and six bird's-eye view (BEV) datasets show the effectiveness of our methods compared to the state of the art. We define pedestrian behavior prediction as a combination of action and intent. We hypothesize that current and future actions are strong intent priors and propose a multi-task learning RNN encoder-decoder network to detect and predict future pedestrian actions and street crossing intent. Experimental results show that one task helps the other, so together they achieve state-of-the-art performance on published datasets. To identify likely traffic anomaly events, we introduce an unsupervised video anomaly detection (VAD) method based on trajectories. We predict locations of traffic participants over a near-term future horizon and monitor the accuracy and consistency of these predictions as evidence of an anomaly. Inconsistent predictions tend to indicate an anomaly has happened or is about to occur. A supervised video action recognition method can then be applied to classify detected anomalies. We introduce a spatial-temporal area under curve (STAUC) metric as a supplement to the existing area under curve (AUC) evaluation and show it captures how well a model detects temporal and spatial locations of anomalous events. Experimental results show the proposed method and consistency-based anomaly score are more robust to moving cameras than image-generation-based methods; our method achieves state-of-the-art performance over AUC and STAUC metrics. VAD and action recognition support event-of-interest (EOI) distinction from normal driving data. We introduce a Smart Black Box (SBB), an intelligent event data recorder, to prioritize EOI data in long-term driving. The SBB compresses high-bandwidth data based on EOI potential and on-board storage limits. The SBB is designed to prioritize newer and anomalous driving data and discard older and normal data. An optimal compression factor is selected based on the trade-off between data value and storage cost. Experiments in a traffic simulator and with real-world datasets show the efficiency and effectiveness of using a SBB to collect high-quality videos over long-term driving. PhD, Robotics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/168035/1/brianyao_1.pd
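    The consistency cue can be illustrated with a small sketch: predictions of the same future frame made at successive steps should agree under normal traffic, so their spread plus the final prediction error can serve as an anomaly score. The data layout and the equal weighting of the two terms are assumptions, not the dissertation's exact formulation.

```python
# Consistency-based anomaly score over predicted trajectories: high spread
# among repeated predictions of one frame, or high error against the
# observed location, raises the score.
import numpy as np

def anomaly_score(pred_history, observed):
    """pred_history: dict frame -> list of (x, y) predictions for that
    frame made at earlier steps; observed: dict frame -> (x, y)."""
    scores = []
    for frame, preds in pred_history.items():
        preds = np.asarray(preds, dtype=float)
        # Inconsistency: spread of repeated predictions for the same frame.
        inconsistency = preds.std(axis=0).mean()
        # Error: distance of the mean prediction from the observation.
        error = float(np.linalg.norm(preds.mean(axis=0) - np.asarray(observed[frame])))
        scores.append(inconsistency + error)  # assumed equal weighting
    return float(np.mean(scores)) if scores else 0.0
```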

    Combined Learned and Classical Methods for Real-Time Visual Perception in Autonomous Driving

    Autonomy, robotics, and Artificial Intelligence (AI) are among the main defining themes of next-generation societies. Among the most important applications of these technologies is driving automation, which spans from different Advanced Driver Assistance Systems (ADAS) to full self-driving vehicles. Driving automation promises to reduce accidents, increase safety, and increase access to mobility for more people, such as the elderly and the handicapped. However, one of the main challenges facing autonomous vehicles is robust perception, which can enable safe interaction and decision making. Among the many sensors used to perceive the environment, each with its own capabilities and limitations, vision is by far one of the main sensing modalities: cameras are cheap and can provide rich information about the observed scene. Therefore, this dissertation develops a set of visual perception algorithms with a focus on autonomous driving as the target application area. The dissertation starts by addressing the problem of real-time motion estimation of an agent using only the visual input from a camera attached to it, a problem known as visual odometry. The visual odometry algorithm can achieve low drift rates over long traveled distances, made possible through the innovative local mapping approach used. This visual odometry algorithm was then combined with my multi-object detection and tracking system. The tracking system operates in a tracking-by-detection paradigm where an object detector based on convolutional neural networks (CNNs) is used. The combined system can therefore detect and track other traffic participants both in the image domain and in the 3D world frame while simultaneously estimating vehicle motion, a necessary requirement for obstacle avoidance and safe navigation. Finally, the operational range of traditional monocular cameras was expanded with the capability to infer depth, and thus replace stereo and RGB-D cameras, through a single-stream convolutional neural network which outputs both depth prediction and semantic segmentation. Semantic segmentation is the process of classifying each pixel in an image and is an important step toward scene understanding. Literature survey, algorithm descriptions, and comprehensive evaluations on real-world datasets are presented. Ph.D. College of Engineering & Computer Science, University of Michigan. https://deepblue.lib.umich.edu/bitstream/2027.42/153989/1/Mohamed Aladem Final Dissertation.pdf
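    The tracking-by-detection paradigm mentioned above can be sketched as greedy intersection-over-union association between per-frame detector outputs and existing tracks. The greedy matching and the 0.3 threshold are simplifying assumptions rather than the dissertation's actual system.

```python
# Minimal tracking-by-detection loop: match each new detection to the
# existing track with the highest box overlap, or start a new track.
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, threshold=0.3):
    """tracks: dict track id -> last box. Greedy IoU matching;
    unmatched detections start new tracks."""
    next_id = max(tracks, default=-1) + 1
    for det in detections:
        best_id, best_overlap = None, threshold
        for tid, box in tracks.items():
            overlap = iou(box, det)
            if overlap > best_overlap:
                best_id, best_overlap = tid, overlap
        if best_id is None:
            tracks[next_id] = det
            next_id += 1
        else:
            tracks[best_id] = det
    return tracks
```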

    Will they buy?

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 127-137). The proliferation of inexpensive video recording hardware and enormous storage capacity has enabled the collection of retail customer behavior at an unprecedented scale. The vast majority of this data is used for theft prevention and never used to better understand the customer. In what ways can this huge corpus be leveraged to improve the experience of the customer and the performance of the store? This thesis presents MIMIC, a system that processes video captured in a retail store into predictions about customer proclivity to purchase. MIMIC relies on the observation that the aggregate patterns of all of a store's patrons (the gestalt) capture behavior indicative of an imminent transaction. Video is distilled into a homogeneous feature vector that captures the activity distribution by first tracking the locations of customers, then discretizing their movements into a feature vector using a collection of functional locations: areas of the store relevant to the tasks of patrons and employees. A time series of these feature vectors can then be classified as predictive-of-transaction using a Hidden Markov Model. MIMIC is evaluated on a small operational retail store located in the Mall of America near Minneapolis, Minnesota. Its performance is characterized across a wide cross-section of the model's parameters. Through manipulation of the training data supplied to MIMIC, the behavior of customers in the store can be examined at fine levels of detail without forgoing the potential afforded by big data. MIMIC enables a suite of valuable tools. For ethnographic researchers, it offers a technique for identifying key moments in hundreds or thousands of hours of raw video. Retail managers gain a fine-grained metric to evaluate the performance of their stores, and interior designers acquire a critical component in a store layout optimization framework. By Rony Daniel Kubat. Ph.D.
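    A minimal sketch of the classification step, assuming hmmlearn and one Gaussian HMM per class: fit one model on predictive-of-transaction sequences and one on the rest, then label a new sequence by whichever model assigns the higher likelihood. The state count and diagonal covariances are illustrative defaults; the feature pipeline itself is not reproduced here.

```python
# Per-class HMMs over sequences of activity-distribution feature vectors;
# a new sequence is classified by comparing log-likelihoods.
import numpy as np
from hmmlearn import hmm

def train(sequences, n_states=4):
    """sequences: list of (T, D) arrays of feature vectors for one class."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
    model.fit(X, lengths)
    return model

def predicts_transaction(seq, buy_model, no_buy_model):
    """Classify one (T, D) sequence: True if the transaction model wins."""
    return buy_model.score(seq) > no_buy_model.score(seq)
```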