LIPIcs, Volume 251, ITCS 2023, Complete Volume
Automated identification and behaviour classification for modelling social dynamics in group-housed mice
Mice are often used in biology as exploratory models of human conditions, due to their similar genetics and physiology. Unfortunately, research on behaviour has traditionally been limited to studying individuals in isolated environments and over short periods of time. This can miss critical time-effects, and, since mice are social creatures, bias results.
This work addresses this gap in research by developing tools to analyse the individual behaviour of group-housed mice in the home-cage over several days and with minimal disruption. Using data provided by the Mary Lyon Centre at MRC Harwell we designed an end-to-end system that (a) tracks and identifies mice in a cage, (b) infers their behaviour, and subsequently (c) models the group dynamics as functions of individual activities. In support of the above, we also curated and made available a large dataset of mouse localisation and behaviour classifications (IMADGE), as well as two smaller annotated datasets for training/evaluating the identification (TIDe) and behaviour inference (ABODe) systems. This research constitutes the first of its kind in terms of the scale and challenges addressed. The data source (side-view single-channel video with clutter and no identification markers for mice) presents challenging conditions for analysis, but has the potential to give richer information while using industry standard housing.
A Tracking and Identification module was developed to automatically detect, track and identify the (visually similar) mice in the cluttered home-cage using only single-channel IR video and coarse position from RFID readings. Existing detectors and trackers were combined with a novel Integer Linear Programming formulation to assign anonymous tracks to mouse identities. This utilised a probabilistic weight model of affinity between detections and RFID pickups.
The next task necessitated the implementation of the Activity Labelling module that classifies the behaviour of each mouse, handling occlusion to avoid giving unreliable classifications when the mice cannot be observed. Two key aspects of this were (a) careful feature-selection, and (b) judicious balancing of the errors of the system in line with the repercussions for our setup.
Given these sequences of individual behaviours, we analysed the interaction dynamics between mice in the same cage by collapsing the group behaviour into a sequence of interpretable latent regimes using both static and temporal (Markov) models. Using a permutation matrix, we were able to automatically assign mice to roles in the HMM, fit a global model to a group of cages, and analyse abnormalities in data from a different demographic.
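The track-to-identity step described above can be illustrated with a much simpler assignment problem. The sketch below is not the paper's Integer Linear Programming formulation (which also handles track fragments and temporal constraints); it shows only the core idea of matching anonymous tracks to identities by maximising a hypothetical affinity score derived from RFID pickups, using SciPy's standard assignment solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical affinity matrix: rows are anonymous tracks, columns are
# RFID-tagged mouse identities; entries are probabilities that a track
# belongs to an identity, e.g. from proximity of the track to RFID
# antenna pickups over time.  The values here are made up.
affinity = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.70, 0.20],
    [0.20, 0.30, 0.50],
])

# A one-to-one assignment maximising total affinity.  For this square,
# one-shot case the problem reduces to the classic assignment problem.
tracks, identities = linear_sum_assignment(affinity, maximize=True)
assignment = dict(zip(tracks.tolist(), identities.tolist()))
print(assignment)  # each track mapped to its most plausible identity
```

An ILP formulation becomes necessary once tracks break into fragments and assignments must stay consistent over time, which is where the probabilistic weight model in the abstract comes in.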
Do Repeat Yourself: Understanding Sufficient Conditions for Restricted Chase Non-Termination
The disjunctive restricted chase is a sound and complete procedure for
solving boolean conjunctive query entailment over knowledge bases of
disjunctive existential rules. Alas, this procedure does not always terminate
and checking if it does is undecidable. However, we can use acyclicity notions
(sufficient conditions that imply termination) to effectively apply the chase
in many real-world cases. To know if these conditions are as general as
possible, we can use cyclicity notions (sufficient conditions that imply
non-termination). In this paper, we discuss some issues with previously
existing cyclicity notions, propose some novel notions for non-termination by
dismantling the original idea, and empirically verify the generality of the new
criteria.
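The non-termination phenomenon at the heart of this abstract can be seen with a single existential rule. The sketch below is illustrative only, not the paper's disjunctive restricted chase: the rule Person(x) → ∃y. parent(x, y) ∧ Person(y) invents a fresh null for every Person without a parent, and each fresh null is itself a Person, re-triggering the rule forever.

```python
import itertools

def chase_step(facts, fresh):
    """One restricted-chase step for the rule
    Person(x) -> exists y. parent(x, y) and Person(y)."""
    new = set()
    for pred, *args in facts:
        if pred == "Person":
            x = args[0]
            # Restricted chase: fire only if the head is not already
            # satisfied for x, i.e. x has no parent yet.
            if not any(p == "parent" and a[0] == x for p, *a in facts):
                y = f"null{next(fresh)}"
                new |= {("parent", x, y), ("Person", y)}
    return new

fresh = itertools.count()
facts = {("Person", "alice")}
for _ in range(4):                  # bound the demonstration
    new = chase_step(facts, fresh)
    if not new:                     # a fixpoint would terminate the chase...
        break
    facts |= new                    # ...but every step adds a fresh Person
print(len(facts))                   # 9 facts after 4 steps, and still growing
```

Acyclicity notions rule out such value-invention cycles syntactically; cyclicity notions, the topic of the paper, certify that a cycle like this one really does fire forever.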
Inference of Resource Management Specifications
A resource leak occurs when a program fails to free some finite resource
after it is no longer needed. Such leaks are a significant cause of real-world
crashes and performance problems. Recent work proposed an approach to prevent
resource leaks based on checking resource management specifications. A resource
management specification expresses how the program allocates resources, passes
them around, and releases them; it also tracks the ownership relationship
between objects and resources, and aliasing relationships between objects.
While this specify-and-verify approach has several advantages compared to prior
techniques, the need to manually write annotations presents a significant
barrier to its practical adoption.
This paper presents a novel technique to automatically infer a resource
management specification for a program, broadening the applicability of
specify-and-check verification for resource leaks. Inference in this domain is
challenging because resource management specifications differ significantly in
nature from the types that most inference techniques target. Further, for
practical effectiveness, we desire a technique that can infer the resource
management specification intended by the developer, even in cases when the code
does not fully adhere to that specification. We address these challenges
through a set of inference rules carefully designed to capture real-world
coding patterns, yielding an effective fixed-point-based inference algorithm.
We have implemented our inference algorithm in two different systems,
targeting programs written in Java and C#. In an experimental evaluation, our
technique inferred 85.5% of the annotations that programmers had written
manually for the benchmarks. Further, the verifier issued nearly the same rate
of false alarms with the manually-written and automatically-inferred
annotations.
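The fixed-point character of the inference algorithm can be sketched with one toy propagation rule. Everything below is hypothetical (the call graph, the single rule, and the use of the annotation name in isolation); the paper's rule set is far richer and also tracks ownership and aliasing.

```python
# Hypothetical call graph: caller -> callees.
calls = {"disposeAll": ["shutdown"], "shutdown": ["closeSocket"],
         "closeSocket": [], "log": []}

ANN = "@EnsuresCalledMethods(close)"        # illustrative annotation
annotations = {m: set() for m in calls}
annotations["closeSocket"].add(ANN)         # seed: directly calls close()

def propagate(annotations):
    """Toy rule: a method that calls an annotated method releases the
    resource too."""
    return [(caller, ANN)
            for caller, callees in calls.items()
            if any(ANN in annotations[c] for c in callees)]

# Iterate the rule(s) to a fixed point: stop when no new fact is derived.
changed = True
while changed:
    changed = False
    for method, ann in propagate(annotations):
        if ann not in annotations[method]:
            annotations[method].add(ann)
            changed = True
print(sorted(m for m, anns in annotations.items() if ANN in anns))
```

The annotation reaches `shutdown` on the first pass and `disposeAll` on the second, while `log` correctly stays unannotated; termination follows because the annotation sets only grow and are bounded.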
Consistent Query Answering for Primary Keys on Rooted Tree Queries
We study the data complexity of consistent query answering (CQA) on databases
that may violate the primary key constraints. A repair is a maximal subset of
the database satisfying the primary key constraints. For a Boolean query q, the
problem CERTAINTY(q) takes a database as input, and asks whether or not each
repair satisfies q. The computational complexity of CERTAINTY(q) has been
established whenever q is a self-join-free Boolean conjunctive query, or a (not
necessarily self-join-free) Boolean path query. In this paper, we take one more
step towards a general classification for all Boolean conjunctive queries by
considering the class of rooted tree queries. In particular, we show that for
every rooted tree query q, CERTAINTY(q) is in FO, NL-hard and in LFP, or
coNP-complete, and it is decidable (in polynomial time), given q, which of the
three cases applies. We also extend our classification to larger classes of
queries with simple primary keys. Our classification criteria rely on query
homomorphisms and our polynomial-time fixpoint algorithm is based on a novel
use of context-free grammars (CFGs). Comment: To appear in PODS'2
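The definitions above can be made concrete with a brute-force check of CERTAINTY(q) on a tiny instance. This is exponential in the number of key violations and is only for intuition; the point of the paper is that rooted tree queries admit much better algorithms.

```python
from itertools import product

# R's primary key is its first attribute; the instance violates it on
# key "a", so there is more than one repair.  Tuples are grouped by key.
R = {"a": [("a", 1), ("a", 2)], "b": [("b", 1)]}
S = {(1,), (2,)}

def q(r_tuples):
    """Boolean conjunctive query: exists x, y. R(x, y) and S(y)."""
    return any((y,) in S for _, y in r_tuples)

# A repair keeps exactly one tuple per key block; q is certain iff
# every repair satisfies it.
certain = all(q(repair) for repair in product(*R.values()))
print(certain)  # True: both repairs of R join with S
```

Here both repairs, {(a,1),(b,1)} and {(a,2),(b,1)}, satisfy q, so q is certain despite the inconsistency; dropping (2,) from S would make the answer False.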
Temporal datalog with existential quantification
Existential rules, also known as tuple-generating
dependencies (TGDs) or Datalog± rules, are heavily studied in the communities of Knowledge
Representation and Reasoning, Semantic Web,
and Databases, due to their rich modelling capabilities. In this paper we consider TGDs in
the temporal setting, by introducing and studying DatalogMTL∃—an extension of metric temporal Datalog (DatalogMTL) obtained by allowing for existential rules in programs. We show that
DatalogMTL∃
is undecidable even in the restricted
cases of guarded and weakly-acyclic programs. To
address this issue we introduce uniform semantics
which, on the one hand, is well-suited for modelling temporal knowledge as it prevents unintended value invention and, on the other hand,
provides decidability of reasoning; in particular, it
becomes 2-ExpSpace-complete for weakly-acyclic
programs but remains undecidable for guarded programs. We provide an implementation for the decidable case and demonstrate its practical feasibility. Thus we obtain an expressive, yet decidable,
rule-language and a system which is suitable for
complex temporal reasoning with existential rules.
A Modified EM Algorithm for Shrinkage Estimation in Multivariate Hidden Markov Models
Hidden Markov models are used in a wide range of applications due to their construction that
renders them mathematically tractable and allows for the use of efficient computational techniques.
There are methods for the estimation of the model’s parameters, such as the EM algorithm, but also
for the estimation of the hidden states of the underlying Markov chain, such as the Viterbi algorithm.
In applications where the dimension of the data is comparable to the sample size, the sample
covariance matrix is known to be ill-conditioned, which directly affects the maximisation step (M-
step) of the EM algorithm, where its inverse is involved in the computations. This problem might be
amplified if there are rarely visited states resulting in a small sample size for the estimation of the
corresponding parameters. Therefore, the direct implementation of these methods can prove
troublesome, as many computational problems might occur in the estimation of the covariance
matrix and its inverse, further affecting the estimation of the one-step transition probability matrix
and the reconstruction of the hidden Markov chain.
In this paper, a modified version of the EM algorithm is studied, both theoretically and
computationally, in order to obtain the shrinkage estimator of the covariance matrix during the
maximisation step. This is achieved by maximising a penalised log-likelihood function, which is
also used in the estimation step (E-step). A variant of this modified version, where the penalised
log-likelihood function is only used in the maximisation step (M-step), is also studied
computationally.
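The shrinkage idea behind the modified M-step can be sketched as a convex combination of the sample covariance and a well-conditioned target. The fixed intensity `lam` and the scaled-identity target below are illustrative choices, not the paper's estimator, which arises from the penalised log-likelihood.

```python
import numpy as np

def shrinkage_covariance(X, lam=0.3):
    """Shrink the sample covariance toward a scaled identity target.
    lam is a hypothetical fixed shrinkage intensity."""
    S = np.cov(X, rowvar=False, bias=True)          # sample covariance
    mu = np.trace(S) / S.shape[0]                   # average eigenvalue
    return (1 - lam) * S + lam * mu * np.eye(S.shape[0])

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 8))                    # n close to dimension d
Sigma = shrinkage_covariance(X)
# Shrinking pulls every eigenvalue toward their mean, so the shrunk
# estimator is strictly better conditioned than the raw sample
# covariance, and its inverse (needed in the EM updates) is stable.
print(np.linalg.cond(Sigma) < np.linalg.cond(np.cov(X, rowvar=False, bias=True)))
```

Because the target commutes with S, the eigenvalues of the shrunk matrix are exactly (1 − lam)·e_i + lam·mu, which is why the condition number always improves whenever the sample eigenvalues are not all equal.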