1,163 research outputs found

    Detecting abnormal fish trajectories using clustered and labeled data

    Get PDF
    We propose an approach for the analysis of fish trajectories in unconstrained underwater videos. Trajectories are classified into two classes: normal trajectories which contain the usual behavior of fish and abnormal trajectories which indicate the behaviors that are not as common as the normal class. The paper presents two innovations: 1) a novel approach to abnormal trajectory detection and 2) improved performance on video based abnormal trajectory analysis of fish in unconstrained conditions. First we extract a set of features from trajectories and apply PCA. We then perform clustering on a subset of features. Based on the clustering, outlier detection is applied to each cluster. Improved results are obtained which is significant considering the challenges of underwater environments, low video quality, and erratic movement of fish

    Fish4Knowledge: Collecting and Analyzing Massive Coral Reef Fish Video Data

    Get PDF
    This book gives a start-to-finish overview of the whole Fish4Knowledge project, in 18 short chapters, each describing one aspect of the project. The Fish4Knowledge project explored the possibilities of big video data, in this case from undersea video. Recording and analyzing 90 thousand hours of video from ten camera locations, the project gives a 3 year view of fish abundance in several tropical coral reefs off the coast of Taiwan. The research system built a remote recording network, over 100 Tb of storage, supercomputer processing, video target detection and

    Unsupervised Video Anomaly Detection with Diffusion Models Conditioned on Compact Motion Representations

    Full text link
    This paper aims to address the unsupervised video anomaly detection (VAD) problem, which involves classifying each frame in a video as normal or abnormal, without any access to labels. To accomplish this, the proposed method employs conditional diffusion models, where the input data is the spatiotemporal features extracted from a pre-trained network, and the condition is the features extracted from compact motion representations that summarize a given video segment in terms of its motion and appearance. Our method utilizes a data-driven threshold and considers a high reconstruction error as an indicator of anomalous events. This study is the first to utilize compact motion representations for VAD and the experiments conducted on two large-scale VAD benchmarks demonstrate that they supply relevant information to the diffusion model, and consequently improve VAD performances w.r.t the prior art. Importantly, our method exhibits better generalization performance across different datasets, notably outperforming both the state-of-the-art and baseline methods. The code of our method is available at https://github.com/AnilOsmanTur/conditioned_video_anomaly_diffusionComment: Accepted to ICIAP 202

    Detection of Marine Animals in a New Underwater Dataset with Varying Visibility

    Get PDF

    Detection of unusual fish trajectories from underwater videos

    Get PDF
    Fish behaviour analysis is a fundamental research area in marine ecology as it is helpful for detecting environmental changes by observing unusual fish patterns or new fish behaviours. The traditional way of analysing fish behaviour is by visual inspection using human observers, which is very time consuming and also limits the amount of data that can be processed. Therefore, there is a need for automatic algorithms to identify fish behaviours by using computer vision and machine learning techniques. The aim of this thesis is to help marine biologists with their work. We focus on behaviour understanding and analysis of detected and tracked fish with unusual behaviour detection approaches. Normal fish trajectories exhibit frequently observed behaviours while unusual trajectories are outliers or rare trajectories. This thesis proposes 3 approaches to detecting unusual trajectories: i) a filtering mechanism for normal fish trajectories, ii) an unusual fish trajectory classification method using clustered and labelled data and iii) an unusual fish trajectory classification approach using a clustering based hierarchical decomposition. The rule based trajectory filtering mechanism is proposed to remove normal fish trajectories which potentially helps to increase the accuracy of the unusual fish behaviour detection system. The aim is to reject normal fish trajectories as much as possible while not rejecting unusual fish trajectories. The results show that this method successfully filters out normal trajectories with a low false negative rate. This method is useful to assist building a ground truth data set from a very large fish trajectory repository, especially when the amount of normal fish trajectories greatly dominates the unusual fish trajectories. Moreover, it successfully distinguishes true fish trajectories from false fish trajectories which result from errors by the fish detection and tracking algorithms. A key contribution of this thesis is the proposed flat classifier, which uses an outlier detection method based on cluster cardinalities and a distance function to detect unusual fish trajectories. Clustered and labelled data are used to select feature sets which perform best on a training set. To describe fish trajectories 10 groups of trajectory descriptions are proposed which were not previously used for fish behaviour analysis. The proposed flat classifier improved the performance of unusual fish detection compared to the filtering approach. The performance of the flat classifier is further improved by integrating it into a hierarchical decomposition. This hierarchical decomposition method selects more specific features for different trajectory clusters which is useful considering the trajectory variety. Significantly improved results were obtained using this hierarchical decomposition in comparison to the flat classifier. This hierarchical framework is also applied to classification of more general imbalanced data sets which is a key current topic in machine learning. The experiments showed that the proposed hierarchical decomposition method is significantly better than the state of art classification methods, other outlier detection methods and unusual trajectory detection methods. Furthermore, it is successful at classifying imbalanced data sets even though the majority and minority classes contain varieties, and classes overlap which is frequently seen in real-world applications. Finally, we explored the benefits of active learning in the context of the hierarchical decomposition method, where active learning query strategies choose the most informative training data. A substantial performance gain is possible by using less labelled training data compared to learning from larger labelled data sets. Additionally, active learning with feature selection is investigated. The results show that feature selection has a positive effect on the performance of active learning. However, we show that random selection can be as effective as popular active learning query strategies in combination with active learning and feature selection, especially for imbalanced set classification

    Extracting Statistically Significant Behaviour from Fish Tracking Data With and Without Large Dataset Cleaning

    Get PDF
    Extracting a statistically significant result from video of natural phenomenon can be difficult for two reasons: (i) there can be considerable natural variation in the observed behaviour and (ii) computer vision algorithms applied to natural phenomena may not perform correctly on a significant number of samples. This study presents one approach to clean a large noisy visual tracking dataset to allow extracting statistically sound results from the image data. In particular, analyses of 3.6 million underwater trajectories of a fish with the water temperature at the time of acquisition are presented. Although there are many false detections and incorrect trajectory assignments, by a combination of data binning and robust estimation methods, reliable evidence for an increase in fish speed as water temperature increases are demonstrated. Then, a method for data cleaning which removes outliers arising from false detections and incorrect trajectory assignments using a deep learning‐based clustering algorithm is proposed. The corresponding results show a rise in fish speed as temperature goes up. Several statistical tests applied to both cleaned and not‐cleaned data confirm that both results are statistically significant and show an increasing trend. However, the latter approach also generates a cleaner dataset suitable for other analysis

    Genetic programming of the visual forebrain in the absence of retinal input

    Get PDF

    Labeling and modeling large databases of videos

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 91-98).As humans, we can say many things about the scenes surrounding us. For instance, we can tell what type of scene and location an image depicts, describe what objects live in it, their material properties, or their spatial arrangement. These comprise descriptions of a scene and are majorly studied areas in computer vision. This thesis, however, hypotheses that observers have an inherent prior knowledge that can be applied to the scene at hand. This prior knowledge can be translated into the cognisance of which objects move, or in the trajectories and velocities to expect. Conversely, when faced with unusual events such as car accidents, humans are very well tuned to identify them regardless of having observed the scene a priori. This is, in part, due to prior observations that we have for scenes with similar configurations to the current one. This thesis emulates the prior knowledge base of humans by creating a large and heterogeneous database and annotation tool for videos depicting real world scenes. The first application of this thesis is in the area of unusual event detection. Given a short clip, the task is to identify the moving portions of the scene that depict abnormal events. We adopt a data-driven framework powered by scene matching techniques to retrieve the videos nearest to the query clip and integrate the motion information in the nearest videos. The result is a final clip with localized annotations for unusual activity. The second application lies in the area of event prediction. Given a static image, we adapt our framework to compile a prediction of motions to expect in the image. This result is crafted by integrating the knowledge of videos depicting scenes similar to the query image. With the help of scene matching, only scenes relevant to the queries are considered, resulting in reliable predictions. Our dataset, experimentation, and proposed model introduce and explore a new facet of scene understanding in images and videos.by Jenny Yuen.Ph.D

    Taming Crowded Visual Scenes

    Get PDF
    Computer vision algorithms have played a pivotal role in commercial video surveillance systems for a number of years. However, a common weakness among these systems is their inability to handle crowded scenes. In this thesis, we have developed algorithms that overcome some of the challenges encountered in videos of crowded environments such as sporting events, religious festivals, parades, concerts, train stations, airports, and malls. We adopt a top-down approach by first performing a global-level analysis that locates dynamically distinct crowd regions within the video. This knowledge is then employed in the detection of abnormal behaviors and tracking of individual targets within crowds. In addition, the thesis explores the utility of contextual information necessary for persistent tracking and re-acquisition of objects in crowded scenes. For the global-level analysis, a framework based on Lagrangian Particle Dynamics is proposed to segment the scene into dynamically distinct crowd regions or groupings. For this purpose, the spatial extent of the video is treated as a phase space of a time-dependent dynamical system in which transport from one region of the phase space to another is controlled by the optical flow. Next, a grid of particles is advected forward in time through the phase space using a numerical integration to generate a flow map . The flow map relates the initial positions of particles to their final positions. The spatial gradients of the flow map are used to compute a Cauchy Green Deformation tensor that quantifies the amount by which the neighboring particles diverge over the length of the integration. The maximum eigenvalue of the tensor is used to construct a forward Finite Time Lyapunov Exponent (FTLE) field that reveals the Attracting Lagrangian Coherent Structures (LCS). The same process is repeated by advecting the particles backward in time to obtain a backward FTLE field that reveals the repelling LCS. The attracting and repelling LCS are the time dependent invariant manifolds of the phase space and correspond to the boundaries between dynamically distinct crowd flows. The forward and backward FTLE fields are combined to obtain one scalar field that is segmented using a watershed segmentation algorithm to obtain the labeling of distinct crowd-flow segments. Next, abnormal behaviors within the crowd are localized by detecting changes in the number of crowd-flow segments over time. Next, the global-level knowledge of the scene generated by the crowd-flow segmentation is used as an auxiliary source of information for tracking an individual target within a crowd. This is achieved by developing a scene structure-based force model. This force model captures the notion that an individual, when moving in a particular scene, is subjected to global and local forces that are functions of the layout of that scene and the locomotive behavior of other individuals in his or her vicinity. The key ingredients of the force model are three floor fields that are inspired by research in the field of evacuation dynamics; namely, Static Floor Field (SFF), Dynamic Floor Field (DFF), and Boundary Floor Field (BFF). These fields determine the probability of moving from one location to the next by converting the long-range forces into local forces. The SFF specifies regions of the scene that are attractive in nature, such as an exit location. The DFF, which is based on the idea of active walker models, corresponds to the virtual traces created by the movements of nearby individuals in the scene. The BFF specifies influences exhibited by the barriers within the scene, such as walls and no-entry areas. By combining influence from all three fields with the available appearance information, we are able to track individuals in high-density crowds. The results are reported on real-world sequences of marathons and railway stations that contain thousands of people. A comparative analysis with respect to an appearance-based mean shift tracker is also conducted by generating the ground truth. The result of this analysis demonstrates the benefit of using floor fields in crowded scenes. The occurrence of occlusion is very frequent in crowded scenes due to a high number of interacting objects. To overcome this challenge, we propose an algorithm that has been developed to augment a generic tracking algorithm to perform persistent tracking in crowded environments. The algorithm exploits the contextual knowledge, which is divided into two categories consisting of motion context (MC) and appearance context (AC). The MC is a collection of trajectories that are representative of the motion of the occluded or unobserved object. These trajectories belong to other moving individuals in a given environment. The MC is constructed using a clustering scheme based on the Lyapunov Characteristic Exponent (LCE), which measures the mean exponential rate of convergence or divergence of the nearby trajectories in a given state space. Next, the MC is used to predict the location of the occluded or unobserved object in a regression framework. It is important to note that the LCE is used for measuring divergence between a pair of particles while the FTLE field is obtained by computing the LCE for a grid of particles. The appearance context (AC) of a target object consists of its own appearance history and appearance information of the other objects that are occluded. The intent is to make the appearance descriptor of the target object more discriminative with respect to other unobserved objects, thereby reducing the possible confusion between the unobserved objects upon re-acquisition. This is achieved by learning the distribution of the intra-class variation of each occluded object using all of its previous observations. In addition, a distribution of inter-class variation for each target-unobservable object pair is constructed. Finally, the re-acquisition decision is made using both the MC and the AC
    • 

    corecore