239 research outputs found

    A system for large-scale image and video retrieval on everyday scenes

    The amount of multimedia data generated on the web continues to grow in both size and diversity. This has made accurate content retrieval over these large and complex collections of data a challenging problem. Motivated by the need for systems that can enable scalable and efficient search, we propose QIK (Querying Images Using Contextual Knowledge). QIK leverages advances in deep learning (DL) and natural language processing (NLP) for scene understanding to enable large-scale multimedia retrieval on everyday scenes with common objects. The system consists of three major components: Indexer, Query Processor, and Video Processor. Given an image, the Indexer performs probabilistic image understanding (PIU). The PIU generated consists of the most probable captions, parsed and represented by tree structures using NLP techniques, and detected objects. The PIUs are stored and indexed in a database system. For a query image, the Query Processor generates the most probable caption and parses it into the corresponding tree structure. Then an optimized tree-pattern query is constructed and executed on the database to retrieve a set of candidate images. The candidate images fetched are ranked using the tree-edit distance metric computed on the tree structures. Given a video, the Video Processor extracts a sequence of key scenes that are posed to the Query Processor to retrieve a set of candidate scenes. The candidate scene parse trees corresponding to a video are extracted and are ranked based on the number of matching scenes. We evaluated the performance of our system for large-scale image and video retrieval tasks on datasets containing everyday scenes and observed that our system could outperform state-of-the-art techniques in terms of mean average precision. Includes bibliographical references.
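The ranking step in the abstract above (ordering candidate images by tree-edit distance between parse trees) can be illustrated with a minimal sketch. This is not the QIK implementation: the tree encoding, the unit edit costs, and the names `forest_dist`, `tree_edit_distance`, and `rank_candidates` are all assumptions made for illustration, and the memoized recursion is only practical for small trees.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Every edit (insert, delete, relabel) costs 1.

@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Edit distance between two ordered forests (classic rightmost-root recursion)."""
    if not f1 and not f2:
        return 0
    if not f1:
        _, kids = f2[-1]
        # Insert the rightmost root of f2; its children take its place.
        return forest_dist(f1, f2[:-1] + kids) + 1
    if not f2:
        _, kids = f1[-1]
        # Delete the rightmost root of f1; its children take its place.
        return forest_dist(f1[:-1] + kids, f2) + 1
    (l1, k1), (l2, k2) = f1[-1], f2[-1]
    delete = forest_dist(f1[:-1] + k1, f2) + 1
    insert = forest_dist(f1, f2[:-1] + k2) + 1
    # Match the two rightmost roots: compare their subtrees and the remaining forests.
    match = forest_dist(k1, k2) + forest_dist(f1[:-1], f2[:-1]) + (l1 != l2)
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    return forest_dist((t1,), (t2,))

def rank_candidates(query_tree, candidates):
    """candidates: list of (image_id, tree); returns them sorted by distance to the query."""
    return sorted(candidates, key=lambda c: tree_edit_distance(query_tree, c[1]))

# Hypothetical parse trees for three captions.
query = ("ride", (("man", ()), ("horse", ())))
candidates = [
    ("img_a", ("sit", (("dog", ()),))),
    ("img_b", ("ride", (("man", ()), ("bike", ())))),
    ("img_c", ("ride", (("man", ()), ("horse", ())))),
]
ranked_ids = [image_id for image_id, _ in rank_candidates(query, candidates)]
```

In this toy example the identical tree ranks first, the tree differing by one label second, and the structurally different tree last.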

    A computational framework for unsupervised analysis of everyday human activities

    In order to make computers proactive and assistive, we must enable them to perceive, learn, and predict what is happening in their surroundings. This presents us with the challenge of formalizing computational models of everyday human activities. For a majority of environments, the structure of the in situ activities is generally not known a priori. This thesis therefore investigates knowledge representations and manipulation techniques that can facilitate learning of such everyday human activities in a minimally supervised manner. A key step towards this end is finding appropriate representations for human activities. We posit that if we choose to describe activities as finite sequences of an appropriate set of events, then the global structure of these activities can be uniquely encoded using their local event subsequences. With this perspective at hand, we particularly investigate representations that characterize activities in terms of their fixed- and variable-length event subsequences. We comparatively analyze these representations in terms of their representational scope, feature cardinality and noise sensitivity. Exploiting such representations, we propose a computational framework to discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding concise characterizations of these discovered activity-classes, both from a holistic as well as a by-parts perspective. Using such characterizations, we present an incremental method to classify a new activity instance to one of the discovered activity-classes, and to automatically detect if it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our framework in a variety of everyday environments.
    Ph.D. Committee Chair: Aaron Bobick; Committee Member: Charles Isbell; Committee Member: David Hogg; Committee Member: Irfan Essa; Committee Member: James Reh
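The idea in the abstract above of describing an activity by its fixed-length event subsequences can be sketched as follows. This is only an illustration under stated assumptions: the event names, the choice of n = 3, the overlap-based similarity, and the function names `event_ngrams` and `ngram_similarity` are hypothetical and are not taken from the thesis.

```python
from collections import Counter

def event_ngrams(events, n):
    """Count every contiguous length-n event subsequence in an activity."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))

def ngram_similarity(a, b, n=3):
    """Multiset overlap of n-gram counts (shared count divided by combined count)."""
    ca, cb = event_ngrams(a, n), event_ngrams(b, n)
    shared = sum((ca & cb).values())    # per-n-gram minimum of the counts
    combined = sum((ca | cb).values())  # per-n-gram maximum of the counts
    return shared / combined if combined else 1.0

# Hypothetical kitchen activities expressed as event sequences.
activity_1 = ["enter", "wash", "cook", "eat", "leave"]
activity_2 = ["enter", "wash", "cook", "clean", "leave"]

same = ngram_similarity(activity_1, activity_1)
different = ngram_similarity(activity_1, activity_2)
```

A pairwise similarity of this kind is one possible way to weight the edges of the completely connected activity graph mentioned in the abstract; the thesis may define its similarity differently.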

    Planning Algorithms for Multi-Robot Active Perception

    A fundamental task of robotic systems is to use on-board sensors and perception algorithms to understand high-level semantic properties of an environment. These semantic properties may include a map of the environment, the presence of objects, or the parameters of a dynamic field. Observations are highly viewpoint dependent and, thus, the performance of perception algorithms can be improved by planning the motion of the robots to obtain high-value observations. This motivates the problem of active perception, where the goal is to plan the motion of robots to improve perception performance. This fundamental problem is central to many robotics applications, including environmental monitoring, planetary exploration, and precision agriculture. The core contribution of this thesis is a suite of planning algorithms for multi-robot active perception. These algorithms are designed to improve system-level performance on many fronts: online and anytime planning, addressing uncertainty, optimising over a long time horizon, decentralised coordination, robustness to unreliable communication, predicting plans of other agents, and exploiting characteristics of perception models. We first propose the decentralised Monte Carlo tree search algorithm as a generally applicable, decentralised algorithm for multi-robot planning. We then present a self-organising map algorithm designed to find paths that maximally observe points of interest. Finally, we consider the problem of mission monitoring, where a team of robots monitors the progress of a robotic mission. A spatiotemporal optimal stopping algorithm is proposed, along with a generalisation for decentralised monitoring. Experimental results are presented for a range of scenarios, such as marine operations and object recognition. Our analytical and empirical results demonstrate theoretically interesting and practically relevant properties that support the use of the approaches in practice.

    Proceedings of the 6th International Workshop on Folk Music Analysis, 15-17 June, 2016

    The Folk Music Analysis Workshop brings together computational music analysis and ethnomusicology. Both symbolic and audio representations of music are considered, with a broad range of scientific approaches being applied (signal processing, graph theory, deep learning). The workshop features a range of interesting talks from international researchers in areas such as Indian classical music, Iranian singing, Ottoman-Turkish Makam music scores, Flamenco singing, Irish traditional music, Georgian traditional music and Dutch folk songs. Invited guest speakers were Anja Volk (Utrecht University) and Peter Browne (Technological University Dublin).

    Clustering and Classification for Time Series Data in Visual Analytics: A Survey

    Visual analytics for time series data has received a considerable amount of attention. Different approaches have been developed to understand the characteristics of the data and obtain meaningful statistics in order to explore the underlying processes, identify and estimate trends, make decisions and predict the future. The machine learning and visualization areas share a focus on extracting information from data. In this paper, we consider not only automatic methods but also interactive exploration. The ability to embed efficient machine learning techniques (clustering and classification) in interactive visualization systems is highly desirable in order to gain the most from both humans and computers. We present a literature review of some of the most important publications in the field and classify over 60 published papers from six different perspectives. This review intends to clarify the major concepts with which clustering or classification algorithms are used in visual analytics for time series data, and to provide a valuable guide for both new researchers and experts in the emerging field of integrating machine learning techniques into visual analytics.

    Denial of Service in Web-Domains: Building Defenses Against Next-Generation Attack Behavior

    The existing state of the art in application-layer Distributed Denial of Service (DDoS) protection is generally designed for, and thus effective only in, static web domains. To the best of our knowledge, our work is the first that studies the problem of application-layer DDoS defense in web domains of dynamic content and organization, and for next-generation bot behaviour. In the first part of this thesis, we focus on the following research tasks: 1) we identify the main weaknesses of the existing application-layer anti-DDoS solutions as proposed in research literature and in the industry, 2) we obtain a comprehensive picture of the current-day as well as the next-generation application-layer attack behaviour, and 3) we propose novel techniques, based on a multidisciplinary approach that combines offline machine learning algorithms and statistical analysis, for detection of suspicious web visitors in static web domains. Then, in the second part of the thesis, we propose and evaluate a novel anti-DDoS system that detects a broad range of application-layer DDoS attacks, both in static and dynamic web domains, through the use of advanced data mining techniques. The key advantage of our system relative to other systems that resort to the use of challenge-response tests (such as CAPTCHAs) in combating malicious bots is that our system minimizes the number of these tests that are presented to valid human visitors while succeeding in preventing most malicious attackers from accessing the web site. The results of the experimental evaluation of the proposed system demonstrate effective detection of current and future variants of application-layer DDoS attacks.

    From vision to reasoning
