
    Gravitational Clustering: A Simple, Robust and Adaptive Approach for Distributed Networks

    Distributed signal processing for wireless sensor networks enables different devices to cooperate in solving different signal processing tasks. A crucial first step is to answer the question: who observes what? Recently, several distributed algorithms have been proposed that frame the signal/object labelling problem in terms of cluster analysis after extracting source-specific features; however, they assume that the number of clusters is known. We propose a new method called Gravitational Clustering (GC) to adaptively estimate the time-varying number of clusters based on a set of feature vectors. The key idea is to exploit the physical principle of gravitational force between mass units: streaming-in feature vectors are considered as mass units of fixed position in the feature space, around which mobile mass units are injected at each time instant. The cluster enumeration exploits the fact that the highest attraction on the mobile mass units is exerted by regions with a high density of feature vectors, i.e., gravitational clusters. By sharing estimates among neighboring nodes via a diffusion-adaptation scheme, cooperative and distributed cluster enumeration is achieved. Numerical experiments concerning robustness against outliers, convergence and computational complexity are conducted. The application in a distributed cooperative multi-view camera network illustrates the applicability to real-world problems. Comment: 12 pages, 9 figures
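A minimal single-node sketch of the gravitational idea described above, in Python. It is not the authors' GC algorithm: it processes a fixed batch rather than a stream, omits the diffusion-adaptation step, and starts one mobile unit at every feature vector. The names `move_mobile_unit` and `estimate_num_clusters`, the softened inverse-square attraction, and the parameters `softening`, `steps` and `merge_tol` are all assumptions made for illustration.

```python
import math


def move_mobile_unit(pos, features, softening):
    """Move one mobile unit along the net softened gravitational pull.

    Each fixed feature vector attracts with weight (d^2 + softening^2)^(-3/2);
    the new position is the weighted mean of the features, which equals a step
    along the net force scaled by the total weight.
    """
    weighted_sum = [0.0] * len(pos)
    total_weight = 0.0
    for f in features:
        d2 = sum((a - b) ** 2 for a, b in zip(f, pos))
        w = (d2 + softening ** 2) ** -1.5
        total_weight += w
        for k, a in enumerate(f):
            weighted_sum[k] += w * a
    return tuple(s / total_weight for s in weighted_sum)


def estimate_num_clusters(features, softening=1.0, steps=100, merge_tol=0.5):
    """Count the distinct points at which the mobile units come to rest."""
    # Assumption: one mobile unit is injected at every fixed feature vector.
    mobile = [tuple(f) for f in features]
    for _ in range(steps):
        mobile = [move_mobile_unit(p, features, softening) for p in mobile]
    # Mobile units that end up within merge_tol of each other share a cluster.
    centers = []
    for p in mobile:
        if all(math.dist(p, c) > merge_tol for c in centers):
            centers.append(p)
    return len(centers), centers


# Hypothetical toy data: two well-separated groups of 2-D feature vectors.
features = [
    (0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2),
    (10.0, 10.0), (10.2, 9.9), (9.9, 10.1), (10.1, 10.2),
]
num_clusters, centers = estimate_num_clusters(features)
print(num_clusters, centers)
```

The `softening` length acts as a resolution parameter in this sketch: if it is much smaller than the spread of a group, every feature vector traps its own mobile unit, and if it is much larger than the gap between groups, the groups merge.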

    Activity understanding and unusual event detection in surveillance videos

    PhD thesis. Computer scientists have made ceaseless efforts to replicate the cognitive video understanding abilities of the human brain in autonomous vision systems. As video surveillance cameras become ubiquitous, there is a surge in studies on automated activity understanding and unusual event detection in surveillance videos. Nevertheless, video content analysis in public scenes remains a formidable challenge due to intrinsic difficulties such as severe inter-object occlusion in crowded scenes and the poor quality of recorded surveillance footage. Moreover, it is nontrivial to achieve robust detection of unusual events, which are rare, ambiguous, and easily confused with noise. This thesis proposes solutions for resolving ambiguous visual observations and overcoming the unreliability of conventional activity analysis methods by exploiting multi-camera visual context and human feedback. The thesis first demonstrates the importance of learning visual context for establishing reliable reasoning on observed activity in a camera network. In the proposed approach, a new Cross Canonical Correlation Analysis (xCCA) is formulated to discover and quantify time delayed pairwise correlations of regional activities observed within and across multiple camera views. This thesis shows that learning time delayed pairwise activity correlations offers valuable contextual information for (1) spatial and temporal topology inference of a camera network, (2) robust person re-identification, and (3) accurate activity-based video temporal segmentation. Crucially, in contrast to conventional methods, the proposed approach does not rely on either intra-camera or inter-camera object tracking; it can thus be applied to low-quality surveillance videos featuring severe inter-object occlusions. Second, to detect global unusual events across multiple disjoint cameras, this thesis extends visual context learning from pairwise relationships to global time delayed dependencies between regional activities.
Specifically, a Time Delayed Probabilistic Graphical Model (TD-PGM) is proposed to model the multi-camera activities and their dependencies. Subtle global unusual events are detected and localised using the model as context-incoherent patterns across multiple camera views. In the model, different nodes represent activities in different decomposed regions from different camera views, and the directed links between nodes encode time delayed dependencies between activities observed within and across camera views. In order to learn optimised time delayed dependencies in a TD-PGM, a novel two-stage structure learning approach is formulated by combining both constraint-based and scored-searching-based structure learning methods. Third, to cope with visual context changes over time, this two-stage structure learning approach is extended to permit tractable incremental update of both TD-PGM parameters and its structure. As opposed to most existing studies that assume a static model once learned, the proposed incremental learning allows a model to adapt itself to reflect the changes in the current visual context, such as subtle behaviour drift over time or removal/addition of cameras. Importantly, the incremental structure learning is achieved without either exhaustive search in a large graph structure space or storing all past observations in memory, making the proposed solution memory and time efficient. Fourth, an active learning approach is presented to incorporate human feedback for on-line unusual event detection. Contrary to most existing unsupervised methods that perform passive mining for unusual events, the proposed approach automatically requests supervision for critical points to resolve ambiguities of interest, leading to more robust detection of subtle unusual events. The active learning strategy is formulated as a stream-based solution, i.e. it decides on-the-fly whether to request a label for each unlabelled sample observed in sequence.
It adaptively selects two active learning criteria, namely a likelihood criterion and an uncertainty criterion, to achieve (1) discovery of unknown event classes and (2) refinement of the classification boundary. The effectiveness of the proposed approaches is validated using videos captured from busy public scenes such as underground stations and traffic intersections.
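The notion of a time delayed pairwise correlation between regional activities can be illustrated with a much simpler stand-in than xCCA: scan candidate delays and keep the one that maximises the Pearson correlation between two activity series. This Python sketch is an illustration under stated assumptions, not the thesis's formulation; `pearson`, `best_delay`, `max_delay` and the toy series are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def best_delay(a, b, max_delay):
    """Return (delay, correlation) maximising corr(a[t], b[t + delay])."""
    best = (0, -1.0)
    for delay in range(max_delay + 1):
        x = a[:len(a) - delay]   # activity in region A at time t
        y = b[delay:]            # activity in region B at time t + delay
        r = pearson(x, y)
        if r > best[1]:
            best = (delay, r)
    return best


# Hypothetical activity levels: region B repeats region A three frames later.
region_a = [0, 1, 4, 2, 0, 0, 3, 5, 1, 0, 0, 2, 6, 1, 0, 0]
region_b = [0, 0, 0] + region_a[:-3]
delay, corr = best_delay(region_a, region_b, max_delay=6)
print(delay, corr)
```

Because it works on per-region activity levels rather than object trajectories, a measure of this kind needs no tracking, which is the property the abstract highlights for low-quality, heavily occluded footage.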

    Automatic Object Detection and Categorisation in Deep Astronomical Imaging Surveys Using Unsupervised Machine Learning

    I present an unsupervised machine learning technique that automatically segments and labels galaxies in astronomical imaging surveys using only pixel data. Distinct from previous unsupervised machine learning approaches used in astronomy, the technique uses no pre-selection or pre-filtering of target galaxy type to identify galaxies that are similar. I demonstrate the technique on the Hubble Space Telescope (HST) Frontier Fields. By training the algorithm using galaxies from one field (Abell 2744) and applying the result to another (MACS0416.1-2403), I show how the algorithm can cleanly separate early and late type galaxies without any form of pre-directed training for what an ‘early’ or ‘late’ type galaxy is. I present the results of testing the technique for generalisation and to identify its optimal configuration. I then apply the technique to the HST Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) fields, creating a catalogue of 60,000 labelled galaxies, grouped by their similarity. I show how the automatically identified groups contain galaxies with similar morphological (and photometric) type. I compare the catalogue to human classifications from the Galaxy Zoo: CANDELS project. Although there is not a direct mapping, I demonstrate a good level of concordance between them. I publicly release the catalogue and a corresponding visual catalogue and galaxy similarity search facility at www.galaxyml.uk. I show how the technique can be used to identify rarer objects and present lensed galaxy candidates from the CANDELS imaging. Finally, I consider how the technique can be improved and applied to future surveys to identify transient objects.

    Robust and Distributed Cluster Enumeration and Object Labeling

    This dissertation contributes to the area of cluster analysis by providing principled methods to determine the number of data clusters and cluster memberships, even in the presence of outliers. The main theoretical contributions are summarized in two theorems on Bayesian cluster enumeration based on modeling the data as a family of Gaussian and t distributions. Real-world applicability is demonstrated by considering advanced signal processing applications, such as distributed camera networks and radar-based person identification. In particular, a new cluster enumeration criterion, which is applicable to a broad class of data distributions, is derived by utilizing Bayes' theorem and asymptotic approximations. This serves as a starting point when deriving cluster enumeration criteria for specific data distributions. Along this line, a Bayesian cluster enumeration criterion is derived by modeling the data as a family of multivariate Gaussian distributions. In real-world applications, the observed data is often subject to heavy-tailed noise and outliers which obscure the true underlying structure of the data. Consequently, estimating the number of data clusters becomes challenging. To this end, a robust cluster enumeration criterion is derived by modeling the data as a family of multivariate t distributions. The family of t distributions is flexible by variation of its degrees of freedom parameter (ν) and it contains, as special cases, the heavy-tailed Cauchy for ν = 1, and the Gaussian distribution for ν → ∞. Given that ν is sufficiently small, the robust criterion accounts for outliers by giving them less weight in the objective function. A further contribution of this dissertation lies in refining the penalty terms of both the robust and Gaussian criteria for the finite sample regime. 
The derived cluster enumeration criteria require a clustering algorithm that partitions the data according to the number of clusters specified by each candidate model and provides an estimate of cluster parameters. Hence, a model-based unsupervised learning method is applied to partition the data prior to the calculation of an enumeration criterion, resulting in a two-step algorithm. The proposed algorithm provides a unified framework for the estimation of the number of clusters and cluster memberships. The developed algorithms are applied to two advanced signal processing use cases. Specifically, the cluster enumeration criteria are extended to a distributed sensor network setting by proposing two distributed and adaptive Bayesian cluster enumeration algorithms. The proposed algorithms are applied to a camera network use case, where the task is to estimate the number of pedestrians based on streaming-in data collected by multiple cameras filming a non-stationary scene from different viewpoints. A further research focus of this dissertation is the cluster membership assignment of individual data points and their associated cluster labels given that the number of clusters is either prespecified by the user or estimated by one of the methods described earlier. Solving this task is required in a broad range of applications, such as distributed sensor networks and radar-based person identification. For this purpose, an adaptive joint object labeling and tracking algorithm is proposed and applied to a real data use case of pedestrian labeling in a calibration-free multi-object multi-camera setup with low video resolution and frequent object occlusions. The proposed algorithm is well suited for ad hoc networks, as it requires neither registration of camera views nor a fusion center. Finally, a joint cluster enumeration and labeling algorithm is proposed to deal with the combined problem of estimating the number of clusters and cluster memberships at the same time. 
The proposed algorithm is applied to person labeling in a real data application of radar-based person identification without prior information on the number of individuals. It achieves comparable performance to a supervised approach that requires knowledge of the number of persons and a considerable amount of training data with known cluster labels. The proposed unsupervised method is advantageous in the considered application of smart assisted living, as it extracts the missing information from the data. Based on these examples, and also considering the comparably low computational cost, we conjecture that the proposed methods provide a useful set of robust cluster analysis tools for data science with many potential application areas, not only in the area of engineering.
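The down-weighting of outliers under a t model can be illustrated with the standard EM weight for a t distribution, u = (ν + p) / (ν + d²), where p is the data dimension and d² the squared Mahalanobis distance of a sample from the current centre. The Python sketch below applies this weight to a one-dimensional location estimate with a fixed, assumed-known scale. It is not the dissertation's enumeration criterion; `t_location`, `nu`, `scale`, `iters` and the toy data are hypothetical.

```python
import statistics


def t_location(data, nu, scale=1.0, iters=50):
    """Iteratively reweighted location estimate under a 1-D t model.

    Samples far from the current estimate get weight (nu + 1) / (nu + d^2),
    which shrinks towards 0 for outliers when nu is small and tends to 1
    for every sample as nu grows (the Gaussian limit).
    """
    mu = statistics.median(data)  # assumption: start from the median
    for _ in range(iters):
        weights = [(nu + 1.0) / (nu + ((x - mu) / scale) ** 2) for x in data]
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return mu


# Hypothetical sample: five inliers near 0 and one gross outlier.
data = [0.1, -0.2, 0.0, 0.3, -0.1, 50.0]
robust = t_location(data, nu=3.0)         # small nu: outlier is down-weighted
near_gaussian = t_location(data, nu=1e9)  # huge nu: behaves like the sample mean
print(robust, near_gaussian)
```

With a small ν the estimate stays close to the inliers, whereas with a very large ν it is dragged towards the outlier, much as the sample mean would be.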

    From pixels to people : recovering location, shape and pose of humans in images

    Humans are at the centre of a significant amount of research in computer vision. Endowing machines with the ability to perceive people from visual data is an immense scientific challenge with a high degree of direct practical relevance. Success in automatic perception can be measured at different levels of abstraction, and this will depend on which intelligent behaviour we are trying to replicate: the ability to localise persons in an image or in the environment, understanding how persons are moving at the skeleton and at the surface level, interpreting their interactions with the environment including with other people, and perhaps even anticipating future actions. In this thesis we tackle different sub-problems of the broad research area referred to as "looking at people", aiming to perceive humans in images at different levels of granularity. We start with bounding box-level pedestrian detection: We present a retrospective analysis of methods published in the decade preceding our work, identifying various strands of research that have advanced the state of the art. With quantitative experiments, we demonstrate the critical role of developing better feature representations and having the right training distribution. We then contribute two methods based on the insights derived from our analysis: one that combines the strongest aspects of past detectors and another that focuses purely on learning representations. The latter method outperforms more complicated approaches, especially those based on hand-crafted features. We conclude our work on pedestrian detection with a forward-looking analysis that maps out potential avenues for future research. We then turn to pixel-level methods: Perceiving humans requires us to both separate them precisely from the background and identify their surroundings. To this end, we introduce Cityscapes, a large-scale dataset for street scene understanding. 
This has since established itself as a go-to benchmark for segmentation and detection. We additionally develop methods that relax the requirement for expensive pixel-level annotations, focusing on the task of boundary detection, i.e. identifying the outlines of relevant objects and surfaces. Next, we make the jump from pixels to 3D surfaces, from localising and labelling to fine-grained spatial understanding. We contribute a method for recovering 3D human shape and pose, which marries the advantages of learning-based and model-based approaches. We conclude the thesis with a detailed discussion of benchmarking practices in computer vision. Among other things, we argue that the design of future datasets should be driven by the general goal of combinatorial robustness besides task-specific considerations.

    Pathway to Future Symbiotic Creativity

    This report presents a comprehensive view of our vision for the development path of human-machine symbiotic art creation. We propose a classification of creative systems into a hierarchy of 5 classes, showing the pathway of creativity evolving from a human-mimicking artist (Turing Artists) to a Machine Artist in its own right. We begin with an overview of the limitations of Turing Artists, then focus on the top two levels of systems, Machine Artists, emphasizing machine-human communication in art creation. In art creation, machines need to understand humans' mental states, including desires, appreciation, and emotions; humans also need to understand machines' creative capabilities and limitations. The rapid development of immersive environments and their further evolution into the new concept of the metaverse enable symbiotic art creation through unprecedented flexibility of bi-directional communication between artists and art manifestation environments. By examining the latest sensor and XR technologies, we illustrate a novel way of collecting art data to form the basis of a new form of human-machine bidirectional communication and understanding in art creation. Based on such communication and understanding mechanisms, we propose a novel framework for building future Machine Artists, which comes with the philosophy that a human-compatible AI system should be based on the "human-in-the-loop" principle rather than the traditional "end-to-end" dogma. By proposing a new form of inverse reinforcement learning model, we outline the platform design of Machine Artists, demonstrate its functions and showcase some examples of technologies we have developed. We also provide a systematic exposition of the ecosystem for AI-based symbiotic art forms and community, with an economic model built on NFT technology. Ethical issues in the development of Machine Artists are also discussed.