4,368 research outputs found

    Machine learning for activity recognition

    Get PDF
    This paper surveys the activity recognition task from a machine learning perspective. I give a definition of this problem, and I classify different activity recognition problems into two categories. I show the activities can be hierarchical, and based on such hierarchies I synthesize a language to describe activities. I give a general criteria set to evaluate activity recognition methods. I summarize some off-the-shelf machine learning methods for activity recognition and evaluate them based on this criteria set. Finally, I discuss some methods that I believe can improve the activity recognition performance

    Hybrid SRL with Optimization Modulo Theories

    Full text link
    Generally speaking, the goal of constructive learning could be seen as, given an example set of structured objects, to generate novel objects with similar properties. From a statistical-relational learning (SRL) viewpoint, the task can be interpreted as a constraint satisfaction problem, i.e. the generated objects must obey a set of soft constraints, whose weights are estimated from the data. Traditional SRL approaches rely on (finite) First-Order Logic (FOL) as a description language, and on MAX-SAT solvers to perform inference. Alas, FOL is unsuited for con- structive problems where the objects contain a mixture of Boolean and numerical variables. It is in fact difficult to implement, e.g. linear arithmetic constraints within the language of FOL. In this paper we propose a novel class of hybrid SRL methods that rely on Satisfiability Modulo Theories, an alternative class of for- mal languages that allow to describe, and reason over, mixed Boolean-numerical objects and constraints. The resulting methods, which we call Learning Mod- ulo Theories, are formulated within the structured output SVM framework, and employ a weighted SMT solver as an optimization oracle to perform efficient in- ference and discriminative max margin weight learning. We also present a few examples of constructive learning applications enabled by our method

    A Probabilistic Logic Programming Event Calculus

    Full text link
    We present a system for recognising human activity given a symbolic representation of video content. The input of our system is a set of time-stamped short-term activities (STA) detected on video frames. The output is a set of recognised long-term activities (LTA), which are pre-defined temporal combinations of STA. The constraints on the STA that, if satisfied, lead to the recognition of a LTA, have been expressed using a dialect of the Event Calculus. In order to handle the uncertainty that naturally occurs in human activity recognition, we adapted this dialect to a state-of-the-art probabilistic logic programming framework. We present a detailed evaluation and comparison of the crisp and probabilistic approaches through experimentation on a benchmark dataset of human surveillance videos.Comment: Accepted for publication in the Theory and Practice of Logic Programming (TPLP) journa

    Building a Large-scale Multimodal Knowledge Base System for Answering Visual Queries

    Full text link
    The complexity of the visual world creates significant challenges for comprehensive visual understanding. In spite of recent successes in visual recognition, today's vision systems would still struggle to deal with visual queries that require a deeper reasoning. We propose a knowledge base (KB) framework to handle an assortment of visual queries, without the need to train new classifiers for new tasks. Building such a large-scale multimodal KB presents a major challenge of scalability. We cast a large-scale MRF into a KB representation, incorporating visual, textual and structured data, as well as their diverse relations. We introduce a scalable knowledge base construction system that is capable of building a KB with half billion variables and millions of parameters in a few hours. Our system achieves competitive results compared to purpose-built models on standard recognition and retrieval tasks, while exhibiting greater flexibility in answering richer visual queries

    Visual Affordance and Function Understanding: A Survey

    Full text link
    Nowadays, robots are dominating the manufacturing, entertainment and healthcare industries. Robot vision aims to equip robots with the ability to discover information, understand it and interact with the environment. These capabilities require an agent to effectively understand object affordances and functionalities in complex visual domains. In this literature survey, we first focus on Visual affordances and summarize the state of the art as well as open problems and research gaps. Specifically, we discuss sub-problems such as affordance detection, categorization, segmentation and high-level reasoning. Furthermore, we cover functional scene understanding and the prevalent functional descriptors used in the literature. The survey also provides necessary background to the problem, sheds light on its significance and highlights the existing challenges for affordance and functionality learning.Comment: 26 pages, 22 image

    Semi-Supervised Online Structure Learning for Composite Event Recognition

    Full text link
    Online structure learning approaches, such as those stemming from Statistical Relational Learning, enable the discovery of complex relations in noisy data streams. However, these methods assume the existence of fully-labelled training data, which is unrealistic for most real-world applications. We present a novel approach for completing the supervision of a semi-supervised structure learning task. We incorporate graph-cut minimisation, a technique that derives labels for unlabelled data, based on their distance to their labelled counterparts. In order to adapt graph-cut minimisation to first order logic, we employ a suitable structural distance for measuring the distance between sets of logical atoms. The labelling process is achieved online (single-pass) by means of a caching mechanism and the Hoeffding bound, a statistical tool to approximate globally-optimal decisions from locally-optimal ones. We evaluate our approach on the task of composite event recognition by using a benchmark dataset for human activity recognition, as well as a real dataset for maritime monitoring. The evaluation suggests that our approach can effectively complete the missing labels and eventually, improve the accuracy of the underlying structure learning system

    Statistical interaction modeling of bovine herd behaviors

    Get PDF
    While there has been interest in modeling the group behavior of herds or flocks, much of this work has focused on simulating their collective spatial motion patterns which have not accounted for individuality in the herd and instead assume a homogenized role for all members or sub-groups of the herd. Animal behavior experts have noted that domestic animals exhibit behaviors that are indicative of social hierarchy: leader/follower type behaviors are present as well as dominance and subordination, aggression and rank order, and specific social affiliations may also exist. Both wild and domestic cattle are social species, and group behaviors are likely to be influenced by the expression of specific social interactions. In this paper, Global Positioning System coordinate fixes gathered from a herd of beef cows tracked in open fields over several days at a time are utilized to learn a model that focuses on the interactions within the herd as well as its overall movement. Using these data in this way explores the validity of existing group behavior models against actual herding behaviors. Domain knowledge, location geography and human observations, are utilized to explain the causes of these deviations from this idealized behavior

    Exploring sensor data management

    Get PDF
    The increasing availability of cheap, small, low-power sensor hardware and the ubiquity of wired and wireless networks has led to the prediction that `smart evironments' will emerge in the near future. The sensors in these environments collect detailed information about the situation people are in, which is used to enhance information-processing applications that are present on their mobile and `ambient' devices.\ud \ud Bridging the gap between sensor data and application information poses new requirements to data management. This report discusses what these requirements are and documents ongoing research that explores ways of thinking about data management suited to these new requirements: a more sophisticated control flow model, data models that incorporate time, and ways to deal with the uncertainty in sensor data

    Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph

    Full text link
    Visual relationship reasoning is a crucial yet challenging task for understanding rich interactions across visual concepts. For example, a relationship 'man, open, door' involves a complex relation 'open' between concrete entities 'man, door'. While much of the existing work has studied this problem in the context of still images, understanding visual relationships in videos has received limited attention. Due to their temporal nature, videos enable us to model and reason about a more comprehensive set of visual relationships, such as those requiring multiple (temporal) observations (e.g., 'man, lift up, box' vs. 'man, put down, box'), as well as relationships that are often correlated through time (e.g., 'woman, pay, money' followed by 'woman, buy, coffee'). In this paper, we construct a Conditional Random Field on a fully-connected spatio-temporal graph that exploits the statistical dependency between relational entities spatially and temporally. We introduce a novel gated energy function parametrization that learns adaptive relations conditioned on visual observations. Our model optimization is computationally efficient, and its space computation complexity is significantly amortized through our proposed parameterization. Experimental results on benchmark video datasets (ImageNet Video and Charades) demonstrate state-of-the-art performance across three standard relationship reasoning tasks: Detection, Tagging, and Recognition.Comment: CVPR 2019. Supplementary included. Fixing a small typ

    Using Abduction in Markov Logic Networks for Root Cause Analysis

    Full text link
    IT infrastructure is a crucial part in most of today's business operations. High availability and reliability, and short response times to outages are essential. Thus a high amount of tool support and automation in risk management is desirable to decrease outages. We propose a new approach for calculating the root cause for an observed failure in an IT infrastructure. Our approach is based on Abduction in Markov Logic Networks. Abduction aims to find an explanation for a given observation in the light of some background knowledge. In failure diagnosis, the explanation corresponds to the root cause, the observation to the failure of a component, and the background knowledge to the dependency graph extended by potential risks. We apply a method to extend a Markov Logic Network in order to conduct abductive reasoning, which is not naturally supported in this formalism. Our approach exhibits a high amount of reusability and enables users without specific knowledge of a concrete infrastructure to gain viable insights in the case of an incident. We implemented the method in a tool and illustrate its suitability for root cause analysis by applying it to a sample scenario
    • …
    corecore