A Primer on Motion Capture with Deep Learning: Principles, Pitfalls and Perspectives
Extracting behavioral measurements non-invasively from video is stymied by
the fact that it is a hard computational problem. Recent advances in deep
learning have tremendously improved the direct prediction of posture from video,
which has quickly impacted neuroscience and biology more broadly. In this primer we
review the budding field of motion capture with deep learning. In particular,
we will discuss the principles of those novel algorithms, highlight their
potential as well as pitfalls for experimentalists, and provide a glimpse into
the future.
Comment: Review, 21 pages, 8 figures and 5 boxes
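As a rough illustration of the principle behind most of these algorithms (a minimal sketch, not code from the primer; the network and layer sizes are purely illustrative), a heatmap-based pose estimator predicts one confidence map per body part and reads keypoints off as the argmax of each map:

```python
# Minimal sketch of heatmap-based pose estimation (illustrative only).
import torch
import torch.nn as nn

class TinyPoseNet(nn.Module):
    def __init__(self, num_keypoints=17):
        super().__init__()
        # Encoder: downsample the frame into a feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Head: one confidence heatmap per body part.
        self.head = nn.Conv2d(64, num_keypoints, kernel_size=1)

    def forward(self, frames):
        return self.head(self.encoder(frames))  # (B, K, H/4, W/4)

def heatmaps_to_keypoints(heatmaps):
    """Read out (x, y) coordinates as the argmax of each heatmap."""
    b, k, h, w = heatmaps.shape
    flat = heatmaps.view(b, k, -1).argmax(dim=-1)
    return torch.stack((flat % w, flat // w), dim=-1)  # (B, K, 2)

if __name__ == "__main__":
    frames = torch.rand(1, 3, 256, 256)           # one video frame
    keypoints = heatmaps_to_keypoints(TinyPoseNet()(frames))
    print(keypoints.shape)                        # torch.Size([1, 17, 2])
```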
Rethinking pose estimation in crowds: overcoming the detection information-bottleneck and ambiguity
Frequent interactions between individuals are a fundamental challenge for
pose estimation algorithms. Current pipelines either use an object detector
together with a pose estimator (top-down approach), or localize all body parts
first and then link them to predict the pose of individuals (bottom-up). Yet,
when individuals closely interact, top-down methods are ill-defined due to
overlapping individuals, and bottom-up methods often falsely infer connections
to distant body parts. Thus, we propose a novel pipeline called bottom-up
conditioned top-down pose estimation (BUCTD) that combines the strengths of
bottom-up and top-down methods. Specifically, we propose to use a bottom-up
model as the detector, which in addition to an estimated bounding box provides
a pose proposal that is fed as a condition to an attention-based top-down model.
We demonstrate the performance and efficiency of our approach on animal and
human pose estimation benchmarks. On CrowdPose and OCHuman, we outperform
previous state-of-the-art models by a significant margin. We achieve 78.5 AP on
CrowdPose and 48.5 AP on OCHuman, an improvement of 8.6% and 7.8% over the
prior art, respectively. Furthermore, we show that our method strongly improves
the performance on multi-animal benchmarks involving fish and monkeys. The code
is available at https://github.com/amathislab/BUCTD
Comment: Published at ICCV 2023; Code at https://github.com/amathislab/BUCTD;
Video at https://www.youtube.com/watch?v=BHZnA-CZeZ
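A minimal sketch of the conditioning idea described above, assuming a rasterized-keypoint conditioning scheme and toy network sizes (the released models differ; see the repository above): the bottom-up pose proposal is rendered into extra input channels, concatenated with the detector's image crop, and refined by an attention-based top-down network.

```python
# Schematic sketch of conditioned top-down pose estimation (not the released
# BUCTD code; sizes and the conditioning scheme here are illustrative).
import torch
import torch.nn as nn

NUM_KEYPOINTS = 17

def render_proposal(proposal_xy, size):
    """Turn K proposed keypoints into K one-hot condition maps."""
    maps = torch.zeros(proposal_xy.shape[0], NUM_KEYPOINTS, size, size)
    for b, keypoints in enumerate(proposal_xy.long()):
        for k, (x, y) in enumerate(keypoints):
            maps[b, k, y.clamp(0, size - 1), x.clamp(0, size - 1)] = 1.0
    return maps

class ConditionedTopDown(nn.Module):
    """Top-down refiner that sees both the crop and the bottom-up proposal."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3 + NUM_KEYPOINTS, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.attention = nn.MultiheadAttention(embed_dim=64, num_heads=4,
                                               batch_first=True)
        self.head = nn.Conv2d(64, NUM_KEYPOINTS, kernel_size=1)

    def forward(self, crop, proposal_xy):
        cond = render_proposal(proposal_xy, size=crop.shape[-1])
        feats = self.backbone(torch.cat([crop, cond], dim=1))    # (B, 64, H, W)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)                # (B, H*W, 64)
        tokens, _ = self.attention(tokens, tokens, tokens)       # self-attention
        feats = tokens.transpose(1, 2).view(b, c, h, w)
        return self.head(feats)                                  # refined heatmaps

if __name__ == "__main__":
    crop = torch.rand(1, 3, 32, 32)                              # detector crop
    proposal = torch.randint(0, 32, (1, NUM_KEYPOINTS, 2))       # bottom-up pose
    print(ConditionedTopDown()(crop, proposal).shape)            # (1, 17, 32, 32)
```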
AmadeusGPT: a natural language interface for interactive animal behavioral analysis
The process of quantifying and analyzing animal behavior involves translating
the naturally occurring descriptive language of their actions into
machine-readable code. Yet, codifying behavior analysis is often challenging
without a deep understanding of animal behavior and technical machine learning
knowledge. To bridge this gap, we introduce AmadeusGPT: a natural language
interface that turns natural language descriptions of behaviors into
machine-executable code. Large-language models (LLMs) such as GPT3.5 and GPT4
allow for interactive language-based queries that are potentially well suited
to interactive behavior analysis. However, the comprehension capability of these
LLMs is limited by the context window size, which prevents them from remembering
distant conversations. To overcome this limitation,
we implement a novel dual-memory mechanism to allow communication between
short-term and long-term memory using symbols as context pointers for retrieval
and saving. Concretely, users directly use language-based definitions of
behavior and our augmented GPT develops code based on the core AmadeusGPT API,
which contains machine learning, computer vision, spatio-temporal reasoning,
and visualization modules. Users can then interactively refine results and
seamlessly add new behavioral modules as needed. We benchmark AmadeusGPT and
show that it produces state-of-the-art performance on the MABe 2022 behavior
challenge tasks. Notably, an end user does not need to write any code to achieve
this. Collectively, AmadeusGPT thus presents a novel way to merge deep
biological knowledge, large-language models, and core computer vision modules
into a more naturally intelligent system. Code and demos can be found at:
https://github.com/AdaptiveMotorControlLab/AmadeusGPT.
Comment: demo available at https://github.com/AdaptiveMotorControlLab/AmadeusGPT
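A toy sketch of the dual-memory mechanism (not the actual AmadeusGPT implementation; the class, symbols, and example prompts below are made up): turns evicted from the short-term window are archived under symbolic pointers, and any pointer mentioned in a later query pulls the archived text back into the assembled context.

```python
# Toy illustration of a dual-memory mechanism with symbolic context pointers.
class DualMemory:
    """Short-term context window plus a symbol-indexed long-term store."""

    def __init__(self, short_term_limit=4):
        self.limit = short_term_limit
        self.short_term = []   # recent turns, kept verbatim
        self.long_term = {}    # symbolic pointer -> archived turn

    def add(self, turn):
        self.short_term.append(turn)
        while len(self.short_term) > self.limit:
            # Evict the oldest turn into long-term memory under a symbol.
            symbol = f"<MEM{len(self.long_term)}>"
            self.long_term[symbol] = self.short_term.pop(0)

    def build_context(self, query):
        """Assemble a prompt: retrieve archived turns whose symbol appears in
        the query, then append the short-term window and the query itself."""
        retrieved = [text for symbol, text in self.long_term.items()
                     if symbol in query]
        return "\n".join(retrieved + self.short_term + [query])


memory = DualMemory(short_term_limit=3)
for turn in ["define freezing as speed below 1 cm/s for at least 2 s",
             "plot freezing bouts over time",
             "count rearing events",
             "overlay the keypoints on the video"]:
    memory.add(turn)
# "<MEM0>" now points at the archived freezing definition.
print(memory.build_context("refine the behavior stored in <MEM0>"))
```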
Optimal Population Codes for Space: Grid Cells Outperform Place Cells
Rodents use two distinct neuronal coordinate systems to estimate their position: place fields in the hippocampus and grid fields in the entorhinal cortex. Whereas place cells spike at only one particular spatial location, grid cells fire at multiple sites that correspond to the points of an imaginary hexagonal lattice. We study how to best construct place and grid codes, taking the probabilistic nature of neural spiking into account. Which spatial encoding properties of individual neurons confer the highest resolution when decoding the animal’s position from the neuronal population response? A priori, estimating a spatial position from a grid code could be ambiguous, as regular periodic lattices possess translational symmetry. The solution to this problem requires lattices for grid cells with different spacings; the spatial resolution crucially depends on choosing the right ratios of these spacings across the population. We compute the expected error in estimating the position in both the asymptotic limit, using Fisher information, and for low spike counts, using maximum likelihood estimation. Achieving high spatial resolution and covering a large range of space in a grid code leads to a trade-off: the best grid code for spatial resolution is built of nested modules with different spatial periods, one inside the other, whereas maximizing the spatial range requires distinct spatial periods that are pairwise incommensurate. Optimizing the spatial resolution predicts two grid cell properties that have been experimentally observed. First, short lattice spacings should outnumber long lattice spacings. Second, the grid code should be self-similar across different lattice spacings, so that the grid field always covers a fixed fraction of the lattice period. If these conditions are satisfied and the spatial “tuning curves” for each neuron span the same range of firing rates, then the resolution of the grid code easily exceeds that of the best possible place code with the same number of neurons.
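As an illustrative sketch of the decoding setup (a 1D environment, hypothetical spacings, firing rates, and module sizes, self-similar tuning widths; not the paper's exact model), the position can be estimated by maximum likelihood from independent Poisson spike counts of a nested grid code:

```python
# Illustrative sketch: maximum-likelihood position decoding from Poisson spike
# counts of a nested 1D grid code. All parameters below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
L = 1.0                                    # length of the environment (m)
spacings = [0.8, 0.4, 0.2, 0.1]            # nested modules, self-similar ratio 2
cells_per_module, peak_rate, dt = 8, 20.0, 0.5   # cells, Hz, decoding window (s)

def grid_rate(x, spacing, phase, width_frac=0.3):
    """Periodic bump tuning curve: firing peaks at every lattice node."""
    r = (x - phase) % spacing
    d = np.minimum(r, spacing - r)                       # distance to nearest node
    return peak_rate * np.exp(-0.5 * (d / (width_frac * spacing)) ** 2)

# Build the population: each module tiles phases uniformly over its period.
population = [(s, p) for s in spacings
              for p in np.linspace(0, s, cells_per_module, endpoint=False)]

def decode_ml(counts, grid=np.linspace(0, L, 2001)):
    """Maximum-likelihood position estimate under independent Poisson spiking."""
    log_like = np.zeros_like(grid)
    for (s, p), k in zip(population, counts):
        rate = grid_rate(grid, s, p) * dt + 1e-12
        log_like += k * np.log(rate) - rate              # Poisson log-likelihood
    return grid[np.argmax(log_like)]

true_x = 0.37
counts = [rng.poisson(grid_rate(true_x, s, p) * dt) for s, p in population]
print(f"true position: {true_x:.3f} m, ML estimate: {decode_ml(counts):.3f} m")
```

In this toy model, the finest module sets the achievable precision while the coarser modules disambiguate between its periods, which is the nesting logic described in the abstract.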
The representation of space in mammals
Animals require cognitive maps to navigate efficiently in their natural habitat. Cognitive maps
are a neuronal representation of the outside world. In mammals, place cells and grid cells have
been implicated as forming the basis of these neuronal representations. Place cells are active at
one particular location in an environment, whereas grid cells are active at multiple locations of
the external world that are arranged in a hexagonal lattice. As such, these cell types encode
space in qualitatively different ways. Whereas the firing of one place cell is indicative of the
animal's current location, the firing of one grid cell only indicates that the animal is at one of
the lattice's nodes. Thus, a population of place cells with varying parameters (a place code) is
required to exhaustively and uniquely represent an environment. Similarly, for grid cells a
population with diverse encoding parameters (a grid code) is needed. Place cells indeed have
varying parameters: different cells are active at different locations, and the active locations
differ in size. The hexagonal lattices of grid cells also differ: they are spatially shifted,
have different distances between nodes, and their node sizes vary. Hence, grid codes and place
codes depend on multiple parameters, but what is the effect of these parameters on the
representation of space that they provide? In this thesis, we study which parameters are key for
an accurate representation of space by place and grid codes, respectively. Furthermore, we
investigate whether place and grid codes provide qualitatively different spatial resolution.
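For concreteness, a small sketch (hypothetical parameters, not taken from the thesis) of the two kinds of rate maps being contrasted: a place cell as a single Gaussian field, and a grid cell as a hexagonal lattice built from three cosine gratings rotated by 60 degrees:

```python
# Illustrative place-cell and grid-cell rate maps (hypothetical parameters).
import numpy as np

xs = np.linspace(0, 1, 200)
X, Y = np.meshgrid(xs, xs)                       # a 1 m x 1 m arena

def place_rate(center=(0.5, 0.5), width=0.1, peak=15.0):
    """One firing field: a Gaussian bump at a single location."""
    return peak * np.exp(-((X - center[0])**2 + (Y - center[1])**2)
                         / (2 * width**2))

def grid_rate(spacing=0.3, phase=(0.0, 0.0), peak=15.0):
    """Hexagonal lattice: sum of three plane waves rotated by 60 degrees."""
    k = 4 * np.pi / (np.sqrt(3) * spacing)       # wave-vector magnitude
    angles = [0, np.pi / 3, 2 * np.pi / 3]
    waves = sum(np.cos(k * ((X - phase[0]) * np.cos(a)
                            + (Y - phase[1]) * np.sin(a))) for a in angles)
    return peak * np.clip(waves / 3.0, 0, None)  # rectify to non-negative rates

place_map, grid_map = place_rate(), grid_rate(spacing=0.3)
print(place_map.shape, grid_map.shape)           # two (200, 200) rate maps, in Hz
```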