20 research outputs found

    Focus on the Positives: Self-Supervised Learning for Biodiversity Monitoring

    We address the problem of learning self-supervised representations from unlabeled image collections. Unlike existing approaches that attempt to learn useful features by maximizing similarity between augmented versions of each input image or by speculatively picking negative samples, we instead also make use of the natural variation that occurs in image collections captured by static monitoring cameras. To achieve this, we exploit readily available context data that encodes information such as the spatial and temporal relationships between the input images. By first identifying high-probability positive pairs at training time, i.e. those images that are likely to depict the same visual concept, we are able to learn representations that are surprisingly effective for downstream supervised classification. For the critical task of global biodiversity monitoring, this results in image features that can be adapted to challenging visual species classification tasks with limited human supervision. We present results on four different camera trap image collections, across three different families of self-supervised learning methods, and show that careful image selection at training time results in superior performance compared to existing baselines such as conventional self-supervised training and transfer learning.
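    The core idea of mining positives from context can be sketched in a few lines. This is an illustrative assumption of how such pairing might work, not the paper's actual implementation: the field names, and the rule "same camera, within 30 minutes", are hypothetical.

```python
from datetime import datetime, timedelta

def context_positive_pairs(images, max_gap_minutes=30):
    """Pair images from the same static camera taken close together in time.
    Such pairs are likely to depict the same visual concept, so they can
    serve as positive pairs for contrastive training without any labels."""
    by_camera = {}
    for img in images:
        by_camera.setdefault(img["camera"], []).append(img)
    pairs = []
    for cam_images in by_camera.values():
        cam_images.sort(key=lambda im: im["time"])
        # Consecutive captures within the time window become positives.
        for a, b in zip(cam_images, cam_images[1:]):
            if b["time"] - a["time"] <= timedelta(minutes=max_gap_minutes):
                pairs.append((a["id"], b["id"]))
    return pairs

images = [
    {"id": "a", "camera": 1, "time": datetime(2021, 5, 1, 8, 0)},
    {"id": "b", "camera": 1, "time": datetime(2021, 5, 1, 8, 10)},
    {"id": "c", "camera": 1, "time": datetime(2021, 5, 2, 8, 0)},
    {"id": "d", "camera": 2, "time": datetime(2021, 5, 1, 8, 5)},
]
print(context_positive_pairs(images))  # [('a', 'b')]
```

    The resulting pairs would then replace (or supplement) augmentation-based positives in a standard contrastive objective.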

    Patch based synthesis for single depth image super-resolution

    We present an algorithm to synthetically increase the resolution of a single depth image using only a generic database of local patches. Modern range sensors measure depths with non-Gaussian noise and at lower starting resolutions than typical visible-light cameras. While patch based approaches for upsampling intensity images continue to improve, this is the first exploration of patching for depth images. We match against the height field of each low resolution input depth patch, and search our database for a list of appropriate high resolution candidate patches. Selecting the right candidate at each location in the depth image is then posed as a Markov random field labeling problem. Our experiments also show that further depth-specific processing, such as noise removal and correct patch normalization, dramatically improves our results. Perhaps surprisingly, even better results are achieved on a variety of real test scenes by providing our algorithm with only synthetic training depth data.
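    The candidate-retrieval step can be sketched as follows. This is a minimal sketch under assumed data structures (the `"low"` field and mean-subtraction normalization are illustrative); the MRF labeling that selects among candidates is omitted.

```python
def normalize_patch(patch):
    # Subtract the mean height so matching is invariant to absolute depth;
    # this is the kind of depth-specific normalization the abstract mentions.
    m = sum(patch) / len(patch)
    return [v - m for v in patch]

def candidate_patches(low_res_patch, database, k=2):
    # Rank database entries by sum-of-squared-differences between normalized
    # low-resolution height fields; the top k become the candidate labels
    # for the subsequent MRF labeling step (not shown here).
    q = normalize_patch(low_res_patch)
    def ssd(entry):
        return sum((a - b) ** 2 for a, b in zip(q, normalize_patch(entry["low"])))
    return sorted(database, key=ssd)[:k]

database = [
    {"id": "flat", "low": [1.0, 1.0, 1.0, 1.0]},
    {"id": "ramp", "low": [0.0, 1.0, 2.0, 3.0]},
    {"id": "step", "low": [0.0, 0.0, 2.0, 2.0]},
]
query = [5.0, 6.0, 7.0, 8.0]  # a ramp, shifted in absolute depth
print([e["id"] for e in candidate_patches(query, database)])  # ['ramp', 'step']
```

    Because patches are normalized before matching, the shifted ramp still retrieves the ramp-shaped database entry first.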

    Shazam For Bats: Internet of Things for Continuous Real-Time Biodiversity Monitoring

    Biodiversity surveys are often required for development projects in cities that could affect protected species such as bats. Bats are important indicators of the wider health of the environment, and activity surveys of bat species are used to report on the performance of mitigation actions. Typically, sensors are used in the field to listen to the ultrasonic echolocation calls of bats, or the audio data is recorded for post-processing to calculate activity levels. Because current methods rely on significant human input, there is an opportunity for continuous monitoring and in situ machine learning detection of bat calls in the field. Here, we show the results from a longitudinal study of 15 novel Internet connected bat sensors—Echo Boxes—in a large urban park. The study provided empirical evidence of how edge processing can reduce network traffic and storage demands by several orders of magnitude, making it possible to run continuous monitoring activities for many months, including periods which traditionally would not be monitored. Our results demonstrate how the combination of artificial intelligence techniques and low-cost sensor networks can be used to create novel insights for ecologists and conservation decision-makers.
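    The data-reduction effect of edge processing can be illustrated with a toy on-device filter. The energy-threshold detector below is a hypothetical stand-in for the Echo Boxes' actual call detector; only frames that pass the filter would be transmitted.

```python
def edge_filter(frames, threshold=0.5):
    """Keep only audio frames whose mean absolute amplitude exceeds the
    threshold, i.e. frames likely to contain an ultrasonic call; everything
    else is discarded on the sensor instead of being sent over the network."""
    return [f for f in frames if sum(abs(s) for s in f) / len(f) > threshold]

quiet = [0.01] * 8   # background noise frame
call = [0.9] * 8     # frame containing a loud call
frames = [quiet] * 99 + [call]

kept = edge_filter(frames)
print(len(kept), "of", len(frames), "frames transmitted")  # 1 of 100
```

    Even this crude filter cuts traffic by two orders of magnitude on the toy data, which is the mechanism (if not the method) behind the reductions the study reports.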

    Capturing the sounds of an urban greenspace

    Acoustic data can be a source of important information about events and the environment in modern cities. To date, much of the focus has been on monitoring noise pollution, but the urban soundscape contains a rich variety of signals about both human and natural phenomena. We describe the CitySounds project, which has installed enclosed sensor kits at several locations across a heavily used urban greenspace in the city of Edinburgh. The acoustic monitoring components regularly capture short clips in real time of both ultrasonic and audible noises, encompassing for example bats, birds and other wildlife, traffic, and human activity. The sound clips are complemented by data from other sensors, such as temperature and relative humidity. To ensure privacy and compliance with relevant legislation, robust methods render completely unintelligible any traces of voice or conversation that may incidentally be overheard by the sensors. We have adopted a variety of methods to encourage community engagement with the audio data and to communicate the richness of urban soundscapes to a general audience.

    Supervised Algorithm Selection for Flow and Other Computer Vision Problems

    Motion estimation is one of the core problems of computer vision. Given two or more frames from a video sequence, the goal is to find the temporal correspondence for one or more points from the sequence. For dense motion estimation, or optical flow, a dense correspondence field is sought between the pair of frames. A standard approach to optical flow involves constructing an energy function and then using some optimization scheme to find its minimum. These energy functions are hand designed to work well generally, with the intention that the global minimum corresponds to the ground truth temporal correspondence. As an alternative to these heuristic energy functions, we aim to assess the quality of existing algorithms directly from training data. We show that the addition of an offline training phase can improve the quality of motion estimation. For optical flow, decisions such as which algorithm to use, and when to trust its accuracy, can all be learned from training data. Generating ground truth optical flow data is a difficult and time-consuming process. We propose the use of synthetic data for training, and present a new dataset for optical flow evaluation and a tool for generating an unlimited quantity of ground truth correspondence data. We also use this data generation method to synthesize depth images for the problem of depth image super-resolution, and show that it is superior to using real data. We present results for optical flow confidence estimation with improved performance on a standard benchmark dataset. Using a similar feature representation, we extend this work to occlusion region detection and present state-of-the-art results for challenging real scenes. Finally, given a set of different algorithms, we treat optical flow estimation as the problem of choosing the best algorithm from this set for a given pixel. However, posing algorithm selection as a standard classification problem assumes that class labels are disjoint.
    For each training example it is assumed that there is only one class label that correctly describes it, and that all other labels are equally bad. To overcome this, we propose a novel example dependent cost-sensitive learning algorithm based on decision trees, where each label is instead a vector representing a data point's affinity for each of the algorithms. We show that this new algorithm has improved accuracy compared to other classification baselines on several computer vision problems.
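    The difference between hard labels and affinity vectors shows up already at the leaf-prediction stage of a tree. A minimal sketch, assuming each training example carries a vector of per-algorithm costs (e.g. the flow error each candidate algorithm would incur); the tree-growing machinery itself is omitted.

```python
def best_algorithm(cost_vectors):
    """Example-dependent cost-sensitive leaf prediction: instead of a
    majority vote over single 'best algorithm' labels, sum the cost vectors
    of the examples reaching the leaf and predict the algorithm with the
    lowest total cost."""
    n_algos = len(cost_vectors[0])
    totals = [sum(v[i] for v in cost_vectors) for i in range(n_algos)]
    return min(range(n_algos), key=totals.__getitem__)

costs = [
    [0.1, 0.5, 0.9],  # algorithm 0 is clearly best for this example
    [0.4, 0.3, 0.8],  # algorithm 1 is slightly better here
    [0.2, 0.6, 0.1],  # algorithm 2 wins here, but 0 is a close second
]
print(best_algorithm(costs))  # 0
```

    Note that the winner (algorithm 0, total cost 0.7) is not the majority-vote answer: each example has a different argmin, so a hard-label classifier would see a three-way tie while the cost vectors resolve it.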

    Revisiting Example Dependent Cost-Sensitive Learning with Decision Trees


    Evolving plastic responses in artificial cell models

    Two variants of a biologically inspired cell model, namely eukaryotic (containing a nucleus) and prokaryotic (without a nucleus), are compared in this research. The comparison investigates their relative evolvability and ability to integrate external environmental stimulus to direct protein pattern formation within a single cell. To the authors' knowledge there has been no reported work comparing the relative performance of eukaryotic and prokaryotic artificial cell models. We propose a novel system of protein translocation for eukaryotic cells based on the process of nucleocytoplasmic transport observed in biological cells. Results demonstrate that eukaryotic cell models exhibit a higher degree of sensitivity to environmental variations compared with prokaryotes. Based on these results we conclude that the process of transporting proteins to and from the nucleus plays a key role in shaping eukaryotic cell plasticity.

    Understanding the Role of Adaptivity in Machine Teaching: The Case of Version Space Learners

    In real-world applications of education, an effective teacher adaptively chooses the next example to teach based on the learner's current state. However, most existing work in algorithmic machine teaching focuses on the batch setting, where adaptivity plays no role. In this paper, we study the case of teaching consistent, version space learners in an interactive setting. At any time step, the teacher provides an example, the learner performs an update, and the teacher observes the learner's new state. We highlight that adaptivity does not speed up the teaching process when considering existing models of version space learners, such as "worst-case" (the learner picks the next hypothesis randomly from the version space) and "preference-based" (the learner picks a hypothesis according to some global preference). Inspired by human teaching, we propose a new model where the learner picks hypotheses according to some local preference defined by the current hypothesis. We show that our model exhibits several desirable properties, e.g., adaptivity plays a key role, and the learner's transitions over hypotheses are smooth/interpretable. We develop efficient teaching algorithms and demonstrate our results via simulation and user studies.
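    The version-space setting itself is easy to make concrete. A toy sketch (the threshold hypothesis class below is an illustrative choice, not the paper's): the learner keeps every hypothesis consistent with the examples seen so far, and teaching amounts to choosing examples that shrink this set to the target quickly.

```python
def version_space(hypotheses, examples, consistent):
    """Return the hypotheses consistent with every (input, label) example
    seen so far; this is the set a version space learner picks from."""
    return [h for h in hypotheses if all(consistent(h, x, y) for x, y in examples)]

# Toy domain: each hypothesis is a threshold t; the label of x is (x >= t).
hypotheses = [1, 2, 3, 4, 5]
consistent = lambda t, x, y: (x >= t) == y

# Two well-chosen examples pin the version space down to a single hypothesis.
examples = [(3, True), (2, False)]
print(version_space(hypotheses, examples, consistent))  # [3]
```

    In the adaptive setting the teacher would pick each example after observing the learner's current hypothesis; the paper's point is that for some learner models this observation provably does not help, while for locally preference-driven learners it does.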

    Nonrigid surface registration and completion from RGBD images

    Nonrigid surface registration is a challenging problem that suffers from many ambiguities. Existing methods typically assume the availability of full volumetric data, or require a global model of the surface of interest. In this paper, we introduce an approach to nonrigid registration that operates on relatively low-quality RGBD images and does not assume prior knowledge of the global surface shape. To this end, we model the surface as a collection of patches, and infer the patch deformations by performing inference in a graphical model. Our representation lets us fill in the holes in the input depth maps, thus essentially achieving surface completion. Our experimental evaluation demonstrates the effectiveness of our approach on several sequences, as well as its robustness to missing data and occlusions.
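    The completion idea, filling unobserved depth from neighbouring observed surface, can be illustrated in one dimension. This is a deliberately simplified stand-in (plain linear interpolation between valid neighbours) for the paper's graphical-model inference over patch deformations.

```python
def complete_depth(depth, valid):
    """Fill each invalid depth entry by linearly interpolating between the
    nearest valid neighbours on either side; boundary holes copy the one
    valid neighbour that exists."""
    filled = list(depth)
    known = [i for i, v in enumerate(valid) if v]
    for i in range(len(depth)):
        if valid[i]:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            filled[i] = depth[right]
        elif right is None:
            filled[i] = depth[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = depth[left] * (1 - t) + depth[right] * t
    return filled

print(complete_depth([1.0, 0.0, 0.0, 4.0], [True, False, False, True]))
# [1.0, 2.0, 3.0, 4.0]
```

    The actual method reasons about patch deformations jointly rather than interpolating independently, which is what makes it robust to occlusions rather than just to small holes.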

    The temporal opportunist: self-supervised multi-frame monocular depth

    Self-supervised monocular depth estimation networks are trained to predict scene depth using nearby frames as a supervision signal during training. However, for many applications, sequence information in the form of video frames is also available at test time. The vast majority of monocular networks do not make use of this extra signal, thus ignoring valuable information that could be used to improve the predicted depth. Those that do either use computationally expensive test-time refinement techniques or off-the-shelf recurrent networks, which only indirectly make use of the geometric information that is inherently available. We propose ManyDepth, an adaptive approach to dense depth estimation that can make use of sequence information at test time, when it is available. Taking inspiration from multi-view stereo, we propose a deep end-to-end cost volume based approach that is trained using self-supervision only. We present a novel consistency loss that encourages the network to ignore the cost volume when it is deemed unreliable, e.g. in the case of moving objects, and an augmentation scheme to cope with static cameras. Our detailed experiments on both KITTI and Cityscapes show that we outperform all published self-supervised baselines, including those that use single or multiple frames at test time.
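    The "ignore the cost volume when it is unreliable" idea can be sketched per pixel. This is a toy illustration, not ManyDepth's learned mechanism: here reliability is a simple margin test between the best and second-best matching cost, with a fall-back to a single-frame prediction.

```python
def depth_from_cost_volume(costs, depth_bins, fallback_depth, margin=0.2):
    """Pick the depth hypothesis with the lowest matching cost, but fall
    back to the single-frame prediction when the cost volume is ambiguous
    (minimum not clearly below the runner-up), e.g. for moving objects."""
    order = sorted(range(len(costs)), key=costs.__getitem__)
    best, runner_up = order[0], order[1]
    if costs[runner_up] - costs[best] < margin:
        return fallback_depth  # ambiguous: trust the single-frame network
    return depth_bins[best]

# A confident pixel: one hypothesis clearly wins, so use the cost volume.
print(depth_from_cost_volume([0.9, 0.1, 0.8], [1.0, 2.0, 4.0], 3.0))  # 2.0
# An ambiguous pixel (e.g. a moving object): fall back to single-frame depth.
print(depth_from_cost_volume([0.5, 0.45, 0.5], [1.0, 2.0, 4.0], 3.0))  # 3.0
```

    ManyDepth implements this gating as a training-time consistency loss rather than a hand-set margin, so the network itself learns where the multi-frame geometry can be trusted.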