
    A Primer on Motion Capture with Deep Learning: Principles, Pitfalls and Perspectives

    Extracting behavioral measurements non-invasively from video is stymied by the fact that it is a hard computational problem. Recent advances in deep learning have tremendously advanced predicting posture from videos directly, which has quickly impacted neuroscience and biology more broadly. In this primer we review the budding field of motion capture with deep learning. In particular, we discuss the principles of these novel algorithms, highlight their potential as well as their pitfalls for experimentalists, and provide a glimpse into the future.
    Comment: Review, 21 pages, 8 figures and 5 boxes.
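
    The core principle behind most of the pose estimators such a primer covers is easy to illustrate: a convolutional network predicts one score map ("heatmap") per body part, and the peak of each map gives that part's image location. Below is a minimal, self-contained sketch of this idea in PyTorch; the tiny network, layer sizes, and names (TinyPoseNet, decode_heatmaps) are illustrative assumptions, not any specific architecture from the primer.

import torch
import torch.nn as nn

class TinyPoseNet(nn.Module):
    """Toy heatmap-based pose model: a small conv backbone plus a per-keypoint head."""

    def __init__(self, num_keypoints: int = 5):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(32, num_keypoints, 1)  # one score map per body part

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(images))  # (B, K, H, W) heatmaps

def decode_heatmaps(heatmaps: torch.Tensor) -> torch.Tensor:
    """Return (B, K, 2) integer (x, y) pixel coordinates of each heatmap's peak."""
    b, k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(b, k, -1).argmax(dim=-1)
    return torch.stack([flat_idx % w, flat_idx // w], dim=-1)

if __name__ == "__main__":
    net = TinyPoseNet(num_keypoints=5)
    frames = torch.randn(2, 3, 64, 64)         # a toy batch of video frames
    print(decode_heatmaps(net(frames)).shape)  # torch.Size([2, 5, 2])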

    Rethinking pose estimation in crowds: overcoming the detection information-bottleneck and ambiguity

    Frequent interactions between individuals are a fundamental challenge for pose estimation algorithms. Current pipelines either use an object detector together with a pose estimator (top-down approach), or localize all body parts first and then link them to predict the pose of individuals (bottom-up approach). Yet, when individuals closely interact, top-down methods become ill-defined because individuals overlap, and bottom-up methods often falsely infer connections to distant body parts. Thus, we propose a novel pipeline called bottom-up conditioned top-down pose estimation (BUCTD) that combines the strengths of bottom-up and top-down methods. Specifically, we propose to use a bottom-up model as the detector, which, in addition to an estimated bounding box, provides a pose proposal that is fed as a condition to an attention-based top-down model. We demonstrate the performance and efficiency of our approach on animal and human pose estimation benchmarks. On CrowdPose and OCHuman, we outperform previous state-of-the-art models by a significant margin: we achieve 78.5 AP on CrowdPose and 48.5 AP on OCHuman, improvements of 8.6% and 7.8% over the prior art, respectively. Furthermore, we show that our method strongly improves performance on multi-animal benchmarks involving fish and monkeys. The code is available at https://github.com/amathislab/BUCTD.
    Comment: Published at ICCV 2023; code at https://github.com/amathislab/BUCTD; video at https://www.youtube.com/watch?v=BHZnA-CZeZ
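
    A schematic sketch of the conditioning step described above: the bottom-up stage supplies a bounding box plus a pose proposal, and the top-down stage consumes the image crop together with that proposal (rendered here as extra heatmap channels). Module names, channel counts, and the concatenation strategy are illustrative assumptions; the actual BUCTD implementation is at https://github.com/amathislab/BUCTD.

import torch
import torch.nn as nn

class ConditionedTopDown(nn.Module):
    """Top-down pose model conditioned on a bottom-up pose proposal."""

    def __init__(self, num_keypoints: int = 17):
        super().__init__()
        # The proposal is rendered as K extra heatmap channels and concatenated
        # with the RGB crop, so the encoder sees 3 + K input channels.
        self.encoder = nn.Sequential(
            nn.Conv2d(3 + num_keypoints, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, num_keypoints, 1)

    def forward(self, crop: torch.Tensor, proposal_maps: torch.Tensor) -> torch.Tensor:
        # crop:          (B, 3, H, W) image crop from the bottom-up detection's box
        # proposal_maps: (B, K, H, W) maps rendered from the bottom-up pose proposal
        x = torch.cat([crop, proposal_maps], dim=1)
        return self.head(self.encoder(x))  # refined (B, K, H, W) heatmaps

if __name__ == "__main__":
    model = ConditionedTopDown(num_keypoints=17)
    crop = torch.randn(1, 3, 64, 48)      # one cropped detection
    proposal = torch.rand(1, 17, 64, 48)  # its rendered pose proposal
    print(model(crop, proposal).shape)    # torch.Size([1, 17, 64, 48])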

    AmadeusGPT: a natural language interface for interactive animal behavioral analysis

    The process of quantifying and analyzing animal behavior involves translating the naturally occurring descriptive language of their actions into machine-readable code. Yet, codifying behavior analysis is often challenging without a deep understanding of animal behavior and technical machine learning knowledge. To bridge this gap, we introduce AmadeusGPT: a natural language interface that turns natural language descriptions of behaviors into machine-executable code. Large language models (LLMs) such as GPT-3.5 and GPT-4 allow for interactive language-based queries that are potentially well suited for interactive behavior analysis. However, the comprehension capability of these LLMs is limited by the context window size, which prevents them from remembering distant conversations. To overcome this limitation, we implement a novel dual-memory mechanism that allows communication between short-term and long-term memory, using symbols as context pointers for retrieval and saving. Concretely, users directly use language-based definitions of behavior, and our augmented GPT develops code based on the core AmadeusGPT API, which contains machine learning, computer vision, spatio-temporal reasoning, and visualization modules. Users can then interactively refine results and seamlessly add new behavioral modules as needed. We benchmark AmadeusGPT and show that we can produce state-of-the-art performance on the MABe 2022 behavior challenge tasks; notably, an end user would not need to write any code to achieve this. Thus, AmadeusGPT presents a novel way to merge deep biological knowledge, large language models, and core computer vision modules into a more naturally intelligent system. Code and demos can be found at: https://github.com/AdaptiveMotorControlLab/AmadeusGPT.
    Comment: demo available at https://github.com/AdaptiveMotorControlLab/AmadeusGPT
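
    A toy sketch of the dual-memory idea described above: a bounded short-term buffer holds the recent conversation, and turns that fall out of it are archived in a long-term store under a symbolic key that can later pull them back into context. The DualMemory class and its method names are assumptions for illustration, not AmadeusGPT's actual API.

from collections import deque

class DualMemory:
    """Short-term context buffer plus symbol-addressed long-term storage."""

    def __init__(self, short_term_size: int = 8):
        self.short_term = deque(maxlen=short_term_size)  # recent conversation turns
        self.long_term: dict[str, str] = {}              # symbol -> archived content

    def add_turn(self, text: str) -> str | None:
        """Add a turn; if the buffer is full, archive the evicted turn and return its symbol."""
        symbol = None
        if len(self.short_term) == self.short_term.maxlen:
            symbol = f"<mem:{len(self.long_term)}>"
            self.long_term[symbol] = self.short_term[0]  # the turn about to be dropped
        self.short_term.append(text)
        return symbol

    def retrieve(self, symbol: str) -> None:
        """Bring content referenced by a symbol back into the active context."""
        if symbol in self.long_term:
            self.short_term.append(self.long_term[symbol])

    def context(self) -> str:
        """The text that would actually be placed in the LLM's context window."""
        return "\n".join(self.short_term)

if __name__ == "__main__":
    memory = DualMemory(short_term_size=2)
    memory.add_turn("define 'chase' as one mouse within 5 cm of another, moving fast")
    memory.add_turn("plot chase bouts over time")
    symbol = memory.add_turn("compare chase counts across sessions")  # evicts the definition
    memory.retrieve(symbol)  # the 'chase' definition is restored via its symbol
    print(memory.context())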

    Optimal Population Codes for Space: Grid Cells Outperform Place Cells

    Rodents use two distinct neuronal coordinate systems to estimate their position: place fields in the hippocampus and grid fields in the entorhinal cortex. Whereas place cells spike at only one particular spatial location, grid cells fire at multiple sites that correspond to the points of an imaginary hexagonal lattice. We study how to best construct place and grid codes, taking the probabilistic nature of neural spiking into account. Which spatial encoding properties of individual neurons confer the highest resolution when decoding the animal’s position from the neuronal population response? A priori, estimating a spatial position from a grid code could be ambiguous, as regular periodic lattices possess translational symmetry. The solution to this problem requires lattices for grid cells with different spacings; the spatial resolution crucially depends on choosing the right ratios of these spacings across the population. We compute the expected error in estimating the position both in the asymptotic limit, using Fisher information, and for low spike counts, using maximum likelihood estimation. Achieving high spatial resolution and covering a large range of space in a grid code leads to a trade-off: the best grid code for spatial resolution is built of nested modules with different spatial periods, one inside the other, whereas maximizing the spatial range requires distinct spatial periods that are pairwise incommensurate. Optimizing the spatial resolution predicts two grid cell properties that have been experimentally observed. First, short lattice spacings should outnumber long lattice spacings. Second, the grid code should be self-similar across different lattice spacings, so that the grid field always covers a fixed fraction of the lattice period. If these conditions are satisfied and the spatial “tuning curves” for each neuron span the same range of firing rates, then the resolution of the grid code easily exceeds that of the best possible place code with the same number of neurons.
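
    A small numerical sketch of the decoding setup described above: neurons emit Poisson spike counts according to their tuning curves, and position is read out by maximum likelihood. The grid code below uses nested modules whose periods shrink by a fixed factor, with field widths that are a fixed fraction of each period (the self-similarity the abstract predicts). All parameter values and function names are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
positions = np.linspace(0.0, 1.0, 500)  # candidate positions on a 1 m track
r_max, window = 20.0, 1.0               # peak firing rate (Hz) and decoding window (s)

def grid_tuning(x, period, phase, rel_width=0.08):
    """Periodic tuning curve; field width scales with the period (self-similar code)."""
    d = (x - phase) / period
    return r_max * np.exp((np.cos(2 * np.pi * d) - 1.0) / (2 * np.pi * rel_width) ** 2)

# Nested modules: each period is half the previous one, with 6 phases per module.
periods = [0.5, 0.25, 0.125]
cells = [(p, ph) for p in periods for ph in np.linspace(0.0, p, 6, endpoint=False)]
rates = np.stack([grid_tuning(positions, p, ph) for p, ph in cells])  # (n_cells, n_positions)

# Simulate one decoding trial at a known position.
true_x = 0.37
counts = rng.poisson(window * np.array([grid_tuning(true_x, p, ph) for p, ph in cells]))

# Poisson log-likelihood of each candidate position (x-independent terms dropped):
#   log L(x) = sum_i [ n_i * log(w * r_i(x)) - w * r_i(x) ]
log_lik = counts @ np.log(window * rates + 1e-12) - window * rates.sum(axis=0)
x_hat = positions[np.argmax(log_lik)]
print(f"true position: {true_x:.3f}  ML estimate: {x_hat:.3f}")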

    The representation of space in mammals

    Animals require cognitive maps for efficiently navigating their natural habitat. Cognitive maps are neuronal representations of the outside world. In mammals, place cells and grid cells have been implicated in forming the basis of these neuronal representations. Place cells are active at one particular location in an environment, whereas grid cells are active at multiple locations of the external world that are arranged in a hexagonal lattice. As such, these cell types encode space in qualitatively different ways. Whereas the firing of one place cell is indicative of the animal's current location, the firing of one grid cell only indicates that the animal is at one of the lattice's nodes. Thus, a population of place cells with varying parameters (a place code) is required to exhaustively and uniquely represent an environment. Similarly, for grid cells a population with diverse encoding parameters (a grid code) is needed. Place cells indeed have varying parameters: different cells are active at different locations, and the active locations differ in size. Likewise, the hexagonal lattices of grid cells differ: they are spatially shifted, the distances between their nodes differ, and the sizes of their nodes vary. Hence, grid codes and place codes depend on multiple parameters, but what is the effect of these parameters on the representation of space that they provide? In this thesis, we study which parameters are key for an accurate representation of space by place codes and grid codes, respectively. Furthermore, we investigate whether place and grid codes provide qualitatively different spatial resolution.
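
    A brief 1D sketch of the qualitative encoding difference described above: a place cell fires around a single location, whereas a grid cell fires at every node of a periodic lattice, so a single grid cell cannot by itself disambiguate the animal's position. All parameter values are made up for illustration.

import numpy as np

x = np.linspace(0.0, 2.0, 2000)  # positions along a 2 m track

def place_cell(x, center=0.8, width=0.1, r_max=15.0):
    """Single firing field: one Gaussian bump at the cell's preferred location."""
    return r_max * np.exp(-((x - center) ** 2) / (2 * width ** 2))

def grid_cell(x, period=0.5, phase=0.1, rel_width=0.15, r_max=15.0):
    """Periodic firing fields: one bump at every node of the (1D) lattice."""
    d = (x - phase) / period
    return r_max * np.exp((np.cos(2 * np.pi * d) - 1.0) / (2 * np.pi * rel_width) ** 2)

def count_peaks(rate):
    """Number of local maxima of a rate map sampled on a fine grid."""
    return int(np.sum(np.diff(np.sign(np.diff(rate))) < 0))

print("place-cell firing fields:", count_peaks(place_cell(x)))  # 1
print("grid-cell firing fields: ", count_peaks(grid_cell(x)))   # 4 (at 0.1, 0.6, 1.1, 1.6 m)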