4 research outputs found

    Deep learning with self-supervision and uncertainty regularization to count fish in underwater images

    Effective conservation actions require effective population monitoring. However, accurately counting animals in the wild to inform conservation decision-making is difficult. Monitoring populations through image sampling has made data collection cheaper, wide-reaching and less intrusive, but it has created a need to process and analyse these data efficiently. Counting animals from such data is challenging, particularly when they are densely packed in noisy images. Attempting this manually is slow and expensive, while traditional computer vision methods are limited in their generalisability. Deep learning is the state-of-the-art method for many computer vision tasks, but it has yet to be properly explored for counting animals. To this end, we employ deep learning, with a density-based regression approach, to count fish in low-resolution sonar images. We introduce a large dataset of videos from sonar deployed to record wild schools of Lebranche mullet (Mugil liza), with a subset of 500 labelled images. We utilise abundant unlabelled data in a self-supervised task to improve the supervised counting task. For the first time in this context, by introducing uncertainty quantification, we improve model training and provide an accompanying measure of prediction uncertainty for more informed biological decision-making. Finally, we demonstrate the generalisability of our proposed counting framework by testing it on a recent benchmark dataset of high-resolution annotated underwater images from varying habitats (DeepFish). From experiments on both contrasting datasets, we demonstrate that our network outperforms the few other deep learning models implemented for solving this task. By providing an open-source framework along with training data, our study puts forth an efficient deep learning template for crowd counting aquatic animals, thereby contributing effective methods for assessing natural populations from ever-increasing volumes of visual data.
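
    The core idea of the density-based approach can be sketched briefly: the network regresses a per-pixel density map whose sum is the predicted count, and the uncertainty term can be folded into the loss as a heteroscedastic Gaussian negative log-likelihood. Below is a minimal PyTorch sketch of that idea; the layer sizes, names, and loss form are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class DensityCounter(nn.Module):
        """Regresses a per-pixel density map plus a log-variance map.
        Illustrative sketch only, not the paper's architecture."""
        def __init__(self):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            )
            self.density = nn.Conv2d(64, 1, 1)   # per-pixel fish density
            self.log_var = nn.Conv2d(64, 1, 1)   # per-pixel predictive log-variance

        def forward(self, x):
            h = self.backbone(x)
            return self.density(h), self.log_var(h)

    def heteroscedastic_loss(pred, log_var, target):
        # Gaussian NLL: errors at confident (low-variance) pixels are
        # penalised more heavily; uncertain pixels are down-weighted.
        return (0.5 * torch.exp(-log_var) * (pred - target) ** 2
                + 0.5 * log_var).mean()

    model = DensityCounter()
    frame = torch.randn(1, 1, 64, 64)        # dummy low-resolution sonar frame
    density, log_var = model(frame)
    count = density.sum()                    # predicted count = integral of the map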

    Action recognition using single-pixel time-of-flight detection

    Action recognition is a challenging task that plays an important role in many robotic systems, which depend heavily on visual input feeds. However, due to privacy concerns, it is important to find a method that can recognise actions without using a visual feed. In this paper, we propose a concept for detecting actions while preserving the test subject's privacy. Our proposed method relies only on recording the temporal evolution of light pulses scattered back from the scene. The data trace recording a single action contains a sequence of one-dimensional arrays of voltage values acquired by a single-pixel detector at a 1 GHz repetition rate. Information about both the distance to the object and its shape is embedded in the traces. We apply machine learning in the form of recurrent neural networks for data analysis and demonstrate successful action recognition. The experimental results show that our proposed method achieves, on average, 96.47% accuracy on the actions walking forward, walking backwards, sitting down, standing up, and waving a hand, using a recurrent neural network.
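
    Since the classifier is a recurrent network over sequences of one-dimensional detector traces, a compact sketch may clarify the data flow. This is a minimal, assumed PyTorch formulation; the trace length, hidden size, and layer choices are illustrative, not the authors' configuration.

    import torch
    import torch.nn as nn

    NUM_ACTIONS = 5  # walking forward/backwards, sitting down, standing up, waving

    class TraceClassifier(nn.Module):
        def __init__(self, trace_len=256, hidden=128):
            super().__init__()
            # each time step is one 1-D array of voltage values from the detector
            self.rnn = nn.LSTM(input_size=trace_len, hidden_size=hidden,
                               batch_first=True)
            self.head = nn.Linear(hidden, NUM_ACTIONS)

        def forward(self, traces):           # traces: (batch, steps, trace_len)
            _, (h, _) = self.rnn(traces)
            return self.head(h[-1])          # logits over the five actions

    model = TraceClassifier()
    recordings = torch.randn(4, 100, 256)    # 4 recordings, 100 traces each
    predicted = model(recordings).argmax(dim=1)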

    Learning to recognize human actions: from hand-crafted to deep-learning based visual representations

    Action recognition is a very challenging and important problem in computer vision. Researchers working in this field aspire to provide computers with the ability to visually perceive human actions, that is, to observe, interpret, and understand human-related events that occur in the physical environment merely from visual data. The applications of this technology are numerous: human-machine interaction, e-health, monitoring/surveillance, and content-based video retrieval, among others. Hand-crafted methods dominated the field until the appearance of the first successful deep learning-based action recognition works. Although earlier deep-based methods underperformed with respect to hand-crafted approaches, they slowly but steadily improved to become state-of-the-art, eventually achieving better results than hand-crafted ones. Still, hand-crafted approaches can be advantageous in certain scenarios, especially when not enough data is available to train very large deep models, or simply when combined with deep-based methods to further boost performance; this shows how hand-crafted features can provide extra knowledge about human actions that deep networks are not easily able to learn. This thesis coincides in time with this change of paradigm and, hence, reflects it in two distinct parts. In the first part, we focus on improving current successful hand-crafted approaches for action recognition, and we do so from three different perspectives, using the dense trajectories framework as a backbone. First, we explore the use of multi-modal and multi-view input data to enrich the trajectory descriptors. Second, we focus on the classification part of action recognition pipelines and propose an ensemble learning approach, where each classifier learns from a different set of local spatiotemporal features and their outputs are combined following a strategy based on Dempster-Shafer theory. And third, we propose a novel hand-crafted feature extraction method that constructs a mid-level feature description to better model long-term spatiotemporal dynamics within action videos. Moving to the second part of the thesis, we start with a comprehensive study of current deep learning-based action recognition methods. We review both fundamental and cutting-edge methodologies reported during the last few years and introduce a taxonomy of deep learning methods dedicated to action recognition; in particular, we analyze and discuss how these handle the temporal dimension of the data. Last but not least, we propose a residual recurrent network for action recognition that naturally integrates all our previous findings into a powerful and promising framework.
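
    As a concrete illustration of the ensemble's fusion step, the following toy Python sketch applies Dempster's rule of combination to two classifiers' outputs. The action labels, mass values, and singletons-plus-ignorance structure are illustrative assumptions, not the thesis's actual feature sets or classifiers.

    from itertools import product

    def combine(m1, m2):
        """Dempster's rule: m1, m2 map frozenset hypotheses to masses."""
        combined, conflict = {}, 0.0
        for (a, x), (b, y) in product(m1.items(), m2.items()):
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + x * y
            else:
                conflict += x * y             # mass assigned to contradiction
        # normalise by 1 - K, where K is the total conflicting mass
        return {h: v / (1.0 - conflict) for h, v in combined.items()}

    # hypothetical masses from two classifiers over a toy frame of actions;
    # mass on the full frame encodes each classifier's ignorance
    frame = frozenset({"walk", "sit", "wave"})
    m1 = {frozenset({"walk"}): 0.6, frozenset({"sit"}): 0.1, frame: 0.3}
    m2 = {frozenset({"walk"}): 0.5, frozenset({"wave"}): 0.2, frame: 0.3}
    print(combine(m1, m2))   # fused masses; most belief lands on {"walk"}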

    Multi-modal ambient intelligence framework for human behavior analysis

    Academic work carried out at the Facultat de Matemàtiques (UB).
