16 research outputs found

    Tools for Advanced Video Metadata Modeling

    In this thesis, we focus on problems in surveillance video analysis and propose advanced metadata modeling techniques to address them. First, we explore the problem of constructing a snapshot summary of people in a video sequence. We propose an algorithm based on the eigen-analysis of faces and present an evaluation of the method. Second, we present an algorithm to learn occlusion points in a scene using long observations of moving objects, provide an implementation, and evaluate its performance. Third, to address the problem of availability and storage of surveillance videos, we propose a novel methodology to simulate video metadata. The technique is completely automated and can generate metadata for any scenario with minimal user interaction. Finally, a threat detection model using activity analysis and trajectory data of moving objects is proposed and implemented. The collection of tools presented in this thesis provides a basis for higher-level video analysis algorithms.

    Deep Learning for Crowd Anomaly Detection

    Today, public areas across the globe are monitored by an increasing number of surveillance cameras. This widespread usage has produced an ever-growing volume of data that cannot realistically be examined in real time. Therefore, efforts to understand crowd dynamics have drawn attention to automatic systems for the detection of anomalies in crowds. This thesis explores the methods used across the literature for this purpose, with a focus on those that use dense optical flow in a feature extraction stage for the crowd anomaly detection problem. To this end, five different deep learning architectures are trained using optical flow maps estimated by three deep learning-based techniques. More specifically, a 2D convolutional network, a 3D convolutional network, an LSTM-based convolutional recurrent network, a pre-trained variant of the latter, and a ConvLSTM-based autoencoder are trained using both regular frames and optical flow maps estimated by LiteFlowNet3, RAFT, and GMA on the UCSD Pedestrian 1 dataset. The experimental results show that, while prone to overfitting, the use of optical flow maps may improve the performance of supervised spatio-temporal architectures.

    Towards Developing Computer Vision Algorithms and Architectures for Real-world Applications

    Computer vision technology automatically extracts high-level, meaningful information from visual data such as images or videos, and object recognition and detection algorithms are essential in most computer vision applications. In this dissertation, we focus on developing algorithms for real-life computer vision applications, presenting innovative algorithms for object segmentation and feature extraction for object and action recognition in video data, sparse feature selection algorithms for medical image analysis, and automated feature extraction using a convolutional neural network for blood cancer grading. To detect and classify objects in video, the objects have to be separated from the background, and then the discriminant features are extracted from the region of interest (ROI) before being fed to a classifier. Effective object segmentation and feature extraction are often application specific and pose major challenges for object detection and classification tasks. In this dissertation, we present an effective object flow based ROI generation algorithm for segmenting moving objects in video data, which can be applied in surveillance and self-driving vehicles. Optical flow can also be used as a feature in human action recognition, and we present the use of optical flow features in a pre-trained convolutional neural network to improve the performance of human action recognition algorithms. Both algorithms outperformed the state of the art at the time. Medical images and videos pose unique challenges for image understanding, mainly because tissues and cells are often irregularly shaped, colored, and textured, and hand-selecting the most discriminant features is often difficult; an automated feature selection method is therefore desirable. Sparse learning is a technique to extract the most discriminant and representative features from raw visual data.
However, sparse learning with L1 regularization only takes the sparsity in the feature dimension into consideration; we improve the algorithm so that it selects the type of features as well, so less important or noisy feature types are entirely removed from the feature set. We apply this algorithm to endoscopy images to detect abnormalities in the esophagus and stomach, such as ulcers and cancer. Besides the sparsity constraint, other application-specific constraints and prior knowledge may also need to be incorporated in the loss function in sparse learning to obtain the desired results. We demonstrate how to incorporate a similar-inhibition constraint and gaze and attention priors in sparse dictionary selection for gastroscopic video summarization, enabling intelligent key frame extraction from gastroscopic video data. With recent advances in multi-layer neural networks, automatic end-to-end feature learning has become feasible. A convolutional neural network mimics the mammalian visual cortex and can extract the most discriminant features automatically from training samples. We present the use of a convolutional neural network with a hierarchical classifier to grade the severity of follicular lymphoma, a type of blood cancer; it reaches 91% accuracy, on par with analysis by expert pathologists. Developing real-world computer vision applications is more than just developing core vision algorithms to extract and understand information from visual data; it is also subject to many practical requirements and constraints, such as hardware and computing infrastructure, cost, robustness to lighting changes and deformation, and ease of use and deployment. The general processing pipelines and system architectures of computer vision based applications share many design principles.
We developed common processing components and a generic framework for computer vision applications, and a versatile scale-adaptive template matching algorithm for object detection. We demonstrate the design principles and best practices by developing and deploying a complete real-life computer vision application, a multi-channel water level monitoring system, whose techniques and design methodology can be generalized to other real-life applications. General software engineering principles, such as modularity, abstraction, robustness to requirement changes, and generality, are all demonstrated in this research. Dissertation/Thesis. Doctoral Dissertation, Computer Science, 201

    Unusual event detection in real-world surveillance applications

    Given the near-ubiquity of CCTV, there is significant ongoing research effort to apply image and video analysis methods together with machine learning techniques towards autonomous analysis of such data sources. However, traditional approaches to scene understanding remain dependent on training based on human annotations that need to be provided for every camera sensor. In this thesis, we propose an unusual event detection and classification approach which is applicable to real-world visual monitoring applications. The goal is to infer the usual behaviours in the scene and to judge the normality of the scene on the basis of the model created. The first requirement for the system is that it should not demand annotated training data. Annotation of the data is a laborious task, and it is not feasible in practice to annotate video data for each camera as an initial stage of event detection. Furthermore, even obtaining training examples for the unusual event class is challenging due to the rarity of such events in video data. Another requirement for the system is online generation of results. In surveillance applications, it is essential to generate real-time results to allow a swift response by a security operator to prevent harmful consequences of unusual and antisocial events. The online learning capabilities also mean that the model can be continuously updated to accommodate natural changes in the environment. The third requirement for the system is the ability to run the process indefinitely. These requirements are necessary for real-world surveillance applications, and approaches that conform to them need to be investigated. This thesis investigates unusual event detection methods that conform to real-world requirements through theoretical and experimental study of machine learning and computer vision algorithms.

    A computational framework for unsupervised analysis of everyday human activities

    In order to make computers proactive and assistive, we must enable them to perceive, learn, and predict what is happening in their surroundings. This presents us with the challenge of formalizing computational models of everyday human activities. For a majority of environments, the structure of the in situ activities is generally not known a priori. This thesis therefore investigates knowledge representations and manipulation techniques that can facilitate learning of such everyday human activities in a minimally supervised manner. A key step towards this end is finding appropriate representations for human activities. We posit that if we choose to describe activities as finite sequences of an appropriate set of events, then the global structure of these activities can be uniquely encoded using their local event subsequences. With this perspective at hand, we particularly investigate representations that characterize activities in terms of their fixed- and variable-length event subsequences. We comparatively analyze these representations in terms of their representational scope, feature cardinality, and noise sensitivity. Exploiting such representations, we propose a computational framework to discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding concise characterizations of these discovered activity-classes, both from a holistic and a by-parts perspective. Using such characterizations, we present an incremental method to classify a new activity instance into one of the discovered activity-classes, and to automatically detect whether it is anomalous with respect to the general characteristics of its membership class.
Our results show the efficacy of our framework in a variety of everyday environments. Ph.D. Committee Chair: Aaron Bobick; Committee Member: Charles Isbell; Committee Member: David Hogg; Committee Member: Irfan Essa; Committee Member: James Reh
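
The idea of describing an activity by its fixed-length event subsequences can be illustrated with a minimal sketch. The event labels, the function names `event_ngrams` and `ngram_similarity`, and the histogram-overlap similarity are all assumptions made for illustration; the abstract does not specify the similarity measure the thesis uses.

```python
from collections import Counter


def event_ngrams(events, n):
    """Count every contiguous length-n subsequence of an event sequence."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


def ngram_similarity(a, b, n=3):
    """Overlap of two n-gram histograms, in [0, 1].

    Illustrative measure only (an assumption): shared counts divided by
    the counts in the element-wise maximum of the two histograms.
    """
    ca, cb = event_ngrams(a, n), event_ngrams(b, n)
    shared = sum((ca & cb).values())  # element-wise minimum of counts
    total = sum((ca | cb).values())   # element-wise maximum of counts
    return shared / total if total else 0.0


# Hypothetical activities written as sequences of event labels.
make_tea = ["enter", "fill_kettle", "boil", "pour", "leave"]
make_coffee = ["enter", "fill_kettle", "boil", "grind", "pour", "leave"]

print(ngram_similarity(make_tea, make_tea))
print(ngram_similarity(make_tea, make_coffee))
```

Pairwise similarities of this kind could serve as edge weights in a fully connected graph of activities, which is the structure in which the abstract says activity-classes are discovered.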

    Perceptual data mining : bootstrapping visual intelligence from tracking behavior

    Thesis (Ph. D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (p. 161-166). One common characteristic of all intelligent life is continuous perceptual input. A decade ago, simply recording and storing a few minutes of full frame-rate NTSC video required special hardware. Today, an inexpensive personal computer can process video in real time, tracking and recording information about multiple objects for extended periods of time, which fundamentally enables this research. This thesis is about Perceptual Data Mining (PDM), the primary goal of which is to create a real-time, autonomous perception system that can be introduced into a wide variety of environments and, through experience, learn to model the activity in that environment. The PDM framework infers as much as possible about the presence, type, identity, location, appearance, and activity of each active object in an environment from multiple video sources, without explicit supervision. PDM is a bottom-up, data-driven approach that is built on a novel, robust attention mechanism that reliably detects moving objects in a wide variety of environments. A correspondence system tracks objects through time and across multiple sensors, producing sets of observations that correspond to the same object in extended environments. Using a co-occurrence modeling technique that exploits the variation exhibited by objects as they move through the environment, the types of objects, the activities that objects perform, and the appearance of specific classes of objects are modeled. Different applications of this technique are demonstrated along with a discussion of the corresponding issues. Given the resulting rich description of the active objects in the environment, it is possible to model temporal patterns. An effective method for modeling periodic cycles of activity is demonstrated in multiple environments.
This framework can learn to concisely describe regularities of the activity in an environment as well as determine atypical observations. Though this is accomplished without any supervision, the introduction of a minimal amount of user interaction could be used to produce complex, task-specific perception systems. By Christopher P. Stauffer. Ph.D

    Smartphone-based human activity recognition

    Joint supervision (cotutelle): Universitat Politècnica de Catalunya and Università degli Studi di Genova. Human Activity Recognition (HAR) is a multidisciplinary research field that aims to gather data regarding people's behavior and their interaction with the environment in order to deliver valuable context-aware information. It has contributed to the development of human-centered areas of study such as Ambient Intelligence and Ambient Assisted Living, which concentrate on improving people's quality of life. The first stage of HAR requires making observations with ambient or wearable sensor technologies. However, in the second case, the search for pervasive, unobtrusive, low-powered, and low-cost devices for this challenging task has still not been fully addressed. In this thesis, we explore the use of smartphones as an alternative approach for identifying physical activities. These self-contained devices, which are widely available on the market, have embedded sensors, powerful computing capabilities, and wireless communication technologies that make them highly suitable for this application. This work presents a series of contributions regarding the development of HAR systems with smartphones. First, we propose a fully operational system that recognizes six physical activities in real time while also taking into account the effects of postural transitions that may occur between them. To achieve this, we cover research topics ranging from signal processing and feature selection of inertial data to Machine Learning approaches for classification. We employ two sensors (the accelerometer and the gyroscope) for collecting inertial data. Their raw signals are the input of the system and are conditioned through filtering in order to reduce noise and allow the extraction of informative activity features.
We also emphasize the study of Support Vector Machines (SVMs), which are one of the state-of-the-art Machine Learning techniques for classification, and reformulate several of the standard multiclass linear and non-linear methods to find the best trade-off between recognition performance, computational cost, and energy requirements, which are essential aspects in battery-operated devices such as smartphones. In particular, we propose two multiclass SVMs for activity classification: a linear algorithm that allows control over dimensionality reduction and system accuracy, and a non-linear hardware-friendly algorithm that uses only fixed-point arithmetic in the prediction phase and enables a reduction in model complexity while maintaining system performance. The efficiency of the proposed system is verified through extensive experimentation on a HAR dataset which we have generated and made publicly available. It is composed of inertial data collected from a group of 30 participants who performed a set of common daily activities while carrying a smartphone as a wearable device. The results achieved in this research show that it is possible to perform HAR in real time with a precision near 97% with smartphones. In this way, we can employ the proposed methodology in several higher-level applications that require HAR, such as ambulatory monitoring of the disabled and the elderly over periods of more than five days without the need for a battery recharge. Moreover, the proposed algorithms can be adapted to other commercial wearable devices recently introduced on the market (e.g. smartwatches, phablets, and glasses).
This will open up new opportunities for developing practical and innovative HAR applications.
    Human Activity Recognition (HAR) is a multidisciplinary research field that seeks to gather information about people's behavior and their interaction with the environment in order to offer highly meaningful contextual information about the actions they perform. Recently, HAR has contributed to the development of areas of study focused on improving people's quality of life, such as Ambient Intelligence and Ambient Assisted Living for dependent people. The first step in achieving HAR is to make observations using fixed sensors located in the environment, or portable sensors worn on the human body. In the second case, however, it is still difficult to find devices that are unobtrusive, low-power, low-cost, and able to be carried anywhere. In this thesis, we explore the use of smartphones as an alternative for HAR. These everyday devices, readily available on the market, are equipped with embedded sensors, powerful computing capabilities, and various wireless communication technologies that make them suitable for this application. Our work presents a series of contributions to the development of HAR systems with smartphones. First, we propose a system that detects six physical activities in real time and also takes into account the postural transitions that may occur between them. To this end, we have contributed in areas ranging from signal processing and feature selection to Machine Learning (ML) algorithms.
We use two inertial sensors (the accelerometer and the gyroscope) to capture users' motion signals. These are processed with filtering techniques for noise reduction, segmentation, and extraction of features relevant to activity detection. We also emphasize the study of Support Vector Machines (SVMs), which are among the most widely used ML algorithms today. We reformulate several of their standard methods (linear and non-linear) to find the best combination of variables that ensures good system performance in terms of accuracy, computational cost, and energy requirements, which are essential aspects in portable battery-powered devices. Specifically, we propose two multiclass SVMs for activity classification: a linear algorithm that allows a balance between dimensionality reduction and system accuracy, and a non-linear algorithm suited to hardware-limited devices that uses only fixed-point arithmetic in the prediction phase and reduces the complexity of the learned model while maintaining system performance. The effectiveness of the proposed system is verified through extensive experimentation on the HAR dataset that we generated and made publicly available online. It contains the inertial data obtained from a group of 30 participants who performed a series of daily-life activities in a controlled environment while wearing at the waist a smartphone that captured their movement. The results obtained in this research show that it is possible to perform HAR in real time with an accuracy close to 97%.
In this way, we can employ the proposed methodology in high-level applications that require HAR, such as ambulatory monitoring of dependent people (e.g. the elderly or the disabled) for periods longer than five days without the need to recharge batteries. Postprint (published version

    The Relational Bases of Lifestyle Similarity and Clustering of Local Populations.

    Lifestyle clustering is explored as a form of stratification within local populations. The concept of lifestyle cluster is juxtaposed to concepts of lifestyle segment, class, status group, strata, and other models of social inequality, which are argued to be variants of the lifestyle cluster concept. A relational theory is outlined, explaining lifestyle clustering as a result of basic social and psychological processes operating on the distribution of various forms of capital. Marx’s locus of economic capital in relations of production is expanded to conceive of the locus of all forms of capital in social relations generally. Capital situation is thereby equated to network position, placing the network analytic concept of equivalence at the root of lifestyle clusters and class analysis. The resulting model focuses on lifestyle distinction and habitus, following Bourdieu, but welds this framework to an explicitly social network methodology and perspective of relationality. It is argued that relations with specific others serve as a proxy for relationality generally. A method is developed and undertaken for testing model expectations about the distribution of lifestyle in a local population and how that distribution correlates with positions in the local social network of a rural U.S. community. A Social Network Survey is designed and administered. Relation type equivalence is developed as an alternative to regular equivalence, which is computationally prohibitive for such a large network. Lifestyle data for N=1203 persons in a network that includes partial information for over 13,000 nodes were analyzed. Lifestyle profiles are offered for about 50 clusters. Evidence of lifestyle ‘clumpiness’ is found, but no evidence is found of cluster separation or boundedness. Relation type equivalence is found to have moderate predictive power for lifestyle cluster membership and dyadic lifestyle similarity. Occupation, age, and gender are also found to have an impact.
Asset ownership variables are found to have very little impact. Methodological limitations of the project are discussed. Ph.D. Sociology. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/58453/1/kaptbly_1.pd

    The Emergence, Establishment and Expansion of Fear of Crime Research in Sweden

    The purpose of this dissertation is to construct a historical account of the emergence, establishment, and expansion of fear of crime research in Sweden. This dissertation aims to answer questions about the function, spread, and high level of institutional engagement of fear of crime research by analyzing the literature and examining the methodological, theoretical, and epistemological origins of fear of crime research itself. What happened when fear of crime was translated as "otrygghet", a word with a previously established meaning in Swedish? The analysis of the emergence of fear of crime in Sweden is based on documents, a survey of Swedish municipalities, and key informant interviews. The question of conceptual change is addressed by comparing how "otrygghet" is used by Socialdemokraterna and Moderaterna in motions and bills from the Swedish Riksdag across five time periods: 1978, 1988, 1998, 2008 and 2018. The dissertation is theoretically inspired by a Foucauldian interest in the intersection of power and knowledge and by an interest in historicizing the sociological and criminological development that this thesis depicts, using the work of Stuart Hall. The analysis of conceptual change is inspired by the conceptual historian Reinhart Koselleck. The results show a rapid and striking expansion of fear of crime measurements during the 2000s. From 2003 to 2007, the number of national surveys containing fear of crime indicators grew from one to six, to include The Survey of Living Conditions that premiered in 1978, the Local Youth Politics Survey in 2003, The National Public Health survey in 2004, The Citizen Survey in 2005, The Swedish Crime Survey in 2006, and The Swedish Contingencies Agency Survey in 2007.
For the municipalities, the period with the most dramatic increase in fear of crime measurements occurs in the 2010s. The percentage of municipalities that do not conduct fear of crime surveys decreases from 98 percent before 1995 to 94 percent in 1995–1999, 74 percent in 2000–2004, 51 percent in 2005–2009, 30 percent in 2010–2014, and only 16 percent during the last examined period, 2015–2018. The analysis also shows that the meaning of "otrygghet" has undergone significant changes. From being used as a descriptive term commonly signifying economic and materialist unpredictability, over time "otrygghet" has come to be used almost exclusively in a crime context. The concept is exclusively used to argue for increased measures of police control and judicial expansion during the last examined period of 2018. This dissertation argues that the development and expansion of fear of crime research can be understood by examining the function that fear of crime research fulfils in legitimizing an increased level of state control, which makes it a good fit for the penal politics of late modernity.

    Stories of change : case study challenge 2019-2020

    Modern India has a history of a vibrant and active social sector. Many local development organisations, community organisations, social movements and non-governmental organisations populate the space of social action. Such organisations imagine a different future and plan and implement social interventions at different scales, many of which have a lasting impact on the lives of people and society. However, their efforts and, more importantly, the learning from these initiatives remain largely unknown not only in the public sphere but also in the worlds of ‘development practice’ and ‘development education’. This shortfall impedes the process of learning and growth across interventions, organisations and time.