10 research outputs found

    Localization Recall Precision (LRP): A New Performance Metric for Object Detection

    Average precision (AP), the area under the recall-precision (RP) curve, is the standard performance measure for object detection. Despite its wide acceptance, it has a number of shortcomings, the most important of which are (i) the inability to distinguish very different RP curves, and (ii) the lack of directly measuring bounding box localization accuracy. In this paper, we propose 'Localization Recall Precision (LRP) Error', a new metric which we specifically designed for object detection. LRP Error is composed of three components related to localization, false negative (FN) rate and false positive (FP) rate. Based on LRP, we introduce the 'Optimal LRP', the minimum achievable LRP error representing the best achievable configuration of the detector in terms of recall-precision and the tightness of the boxes. In contrast to AP, which considers precisions over the entire recall domain, Optimal LRP determines the 'best' confidence score threshold for a class, which balances the trade-off between localization and recall-precision. In our experiments, we show that, for state-of-the-art (SOTA) object detectors, Optimal LRP provides richer and more discriminative information than AP. We also demonstrate that the best confidence score thresholds vary significantly among classes and detectors. Moreover, we present LRP results of a simple online video object detector which uses a SOTA still image object detector and show that the class-specific optimized thresholds increase the accuracy compared with the common approach of using a general threshold for all classes. At https://github.com/cancam/LRP we provide the source code that can compute LRP for the PASCAL VOC and MSCOCO datasets. Our source code can easily be adapted to other datasets as well. Comment: to appear in ECCV 2018.
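
    As a rough illustration of how the three LRP components might combine, the sketch below computes an LRP-style error from the IoU values of the true positives and the FP/FN counts. The exact weighting and normalisation are only an approximation of the paper's definition, and the function name and threshold value are illustrative; the authoritative implementation is the authors' repository linked above.

        # Hypothetical sketch of an LRP-style error; see https://github.com/cancam/LRP
        # for the official definition used in the paper.
        def lrp_error(tp_ious, n_fp, n_fn, tau=0.5):
            """tp_ious: IoU values of true-positive detections (each >= tau);
            n_fp, n_fn: numbers of false positives and false negatives."""
            n_tp = len(tp_ious)
            total = n_tp + n_fp + n_fn
            if total == 0:
                return 0.0
            # Localization component: tighter boxes (IoU -> 1) give smaller error.
            loc = sum((1.0 - iou) / (1.0 - tau) for iou in tp_ious)
            return (loc + n_fp + n_fn) / total

        # Example: three TPs of varying tightness, one FP, two FNs.
        print(lrp_error([0.9, 0.75, 0.6], n_fp=1, n_fn=2))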

    Generalized optimal sub-pattern assignment metric

    This paper presents the generalized optimal sub-pattern assignment (GOSPA) metric on the space of finite sets of targets. Compared to the well-established optimal sub-pattern assignment (OSPA) metric, GOSPA is not normalised by the cardinality of the largest set, and it penalizes cardinality errors differently, which enables us to express it as an optimisation over assignments instead of permutations. An important consequence of this is that GOSPA allows us to penalize, in a sound manner, localization errors for detected targets and the errors due to missed and false targets, as indicated by traditional multiple target tracking (MTT) performance measures. In addition, we extend the GOSPA metric to the space of random finite sets, which is important for evaluating MTT algorithms via simulations in a rigorous way.
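
    A minimal Python sketch of how GOSPA could be evaluated for the common choice alpha = 2, where the metric reduces to an optimisation over partial assignments with a cutoff c. The Euclidean base distance, the parameter values and the use of a capped assignment problem are assumptions for illustration, not the paper's reference implementation.

        # Illustrative GOSPA (alpha = 2) between two finite sets of target states.
        import numpy as np
        from scipy.optimize import linear_sum_assignment
        from scipy.spatial.distance import cdist

        def gospa(X, Y, c=10.0, p=2):
            """X, Y: arrays of shape (n, d) and (m, d); c: cutoff; p: exponent."""
            n, m = len(X), len(Y)
            if n == 0 and m == 0:
                return 0.0
            # A pair whose distance reaches the cutoff costs as much as leaving both
            # points unassigned, so capping at c**p is equivalent to optimising over
            # partial assignments.
            cost = np.minimum(cdist(X, Y), c) ** p if n and m else np.zeros((n, m))
            rows, cols = linear_sum_assignment(cost)
            total = cost[rows, cols].sum() + (c ** p / 2.0) * abs(n - m)
            return total ** (1.0 / p)

        X = np.array([[0.0, 0.0], [5.0, 5.0]])                 # ground-truth targets
        Y = np.array([[0.2, -0.1], [5.1, 4.8], [40.0, 0.0]])   # estimates (one false)
        print(gospa(X, Y))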

    Marginal multi-Bernoulli filters: RFS derivation of MHT, JIPDA and association-based MeMBer

    Recent developments in random finite sets (RFSs) have yielded a variety of tracking methods that avoid data association. This paper derives a form of the full Bayes RFS filter and observes that data association is implicitly present, in a data structure similar to MHT. Subsequently, algorithms are obtained by approximating the distribution of associations. Two algorithms result: one nearly identical to JIPDA, and another related to the MeMBer filter. Both improve performance in challenging environments. Comment: Journal version at http://ieeexplore.ieee.org/document/7272821. Matlab code of a simple implementation is included as an ancillary file.
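
    The approximation step mentioned above, working with the distribution over data associations, can be illustrated with a toy computation: the JIPDA-style marginal probability that a track generated a given measurement is the sum of the weights of all joint association hypotheses containing that pairing. The brute-force enumeration below is a hedged sketch with made-up weights, not the paper's algorithm.

        # Toy marginalisation over joint association hypotheses (2 tracks, 2 measurements).
        # Unnormalised weight of pairing track i with measurement j; None means "missed".
        w = [[0.7, 0.1],   # track 0 against measurements 0 and 1
             [0.2, 0.6]]   # track 1 against measurements 0 and 1
        p_miss = 0.1       # weight of a track producing no measurement

        options = [None, 0, 1]
        hypotheses = []
        for a0 in options:                       # association of track 0
            for a1 in options:                   # association of track 1
                if a0 is not None and a0 == a1:
                    continue                     # a measurement can be used by one track only
                weight = (p_miss if a0 is None else w[0][a0]) * \
                         (p_miss if a1 is None else w[1][a1])
                hypotheses.append(((a0, a1), weight))

        total = sum(wt for _, wt in hypotheses)
        # JIPDA-style marginal: probability that track 0 generated measurement 0.
        marginal_00 = sum(wt for (a0, _), wt in hypotheses if a0 == 0) / total
        print(round(marginal_00, 3))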

    A Target Detection and Tracking Method for Multiple Radar Systems

    Multiple radar systems represent an attractive option for target tracking because they can significantly enlarge the area coverage and improve both the probability of trajectory detection and the localization accuracy. The presence of multiple extended targets or weak targets is a challenge for multiple radar systems. Moreover, their performance may be severely degraded by regions characterized by a high clutter density. In this article, an algorithm for the detection and tracking of multiple targets, extended or weak, based on measurements provided by multiple radars in an environment with heavily cluttered regions, is proposed. The proposed method features three stages. In the first stage, past measurements are exploited to build a spatiotemporal clutter map in each radar; a weight is then assigned to each measurement to assess its significance. In the second stage, a track-before-detect algorithm, based on a weighted 3-D Hough transform, is applied to obtain target tracklets. In the third stage, a low-complexity tracklet association method, exploiting a lion reproduction model, is applied to associate tracklets of the same target. Three experiments are presented to illustrate the effectiveness of the proposed approach. The first experiment is based on synthetic data, the second one is based on actual data from a radar network with two homogeneous air surveillance radars, and the third one is based on actual data from a radar network with four different marine surveillance radars. The results reveal that the proposed method can outperform competing approaches.
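
    For intuition on the second stage, the snippet below sketches a weighted Hough-style accumulator for constant-velocity tracklets in one spatial dimension plus time, where each vote is scaled by the measurement weight from the first stage. The 1-D simplification, grid resolutions and tolerance are assumptions for illustration, not the article's 3-D formulation.

        # Hedged 1-D + time illustration of a weighted Hough-style track-before-detect.
        import numpy as np

        def weighted_hough(meas, x0_grid, v_grid, tol=0.5):
            """meas: list of (t, x, weight); vote for lines x = x0 + v * t."""
            acc = np.zeros((len(x0_grid), len(v_grid)))
            for t, x, w in meas:
                for i, x0 in enumerate(x0_grid):
                    for j, v in enumerate(v_grid):
                        if abs(x0 + v * t - x) < tol:   # measurement lies near this line
                            acc[i, j] += w              # clutter-map weight scales the vote
            return acc

        # Synthetic measurements: a target moving at v = 2 plus one clutter point.
        meas = [(0, 0.1, 0.9), (1, 2.0, 0.8), (2, 4.1, 0.9), (1, 7.0, 0.2)]
        x0_grid = np.linspace(-5, 5, 21)
        v_grid = np.linspace(-4, 4, 17)
        acc = weighted_hough(meas, x0_grid, v_grid)
        i, j = np.unravel_index(acc.argmax(), acc.shape)
        print("estimated x0, v:", x0_grid[i], v_grid[j])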

    Enhanced particle PHD filtering for multiple human tracking

    PhD Thesis. Video-based single human tracking has found wide application, but multiple human tracking is more challenging and enhanced processing techniques are required to estimate the positions and number of targets in each frame. In this thesis, the particle probability hypothesis density (PHD) filter is therefore the focus due to its ability to estimate both localization and cardinality information related to multiple human targets. To improve the tracking performance of the particle PHD filter, a number of enhancements are proposed. The Student's-t distribution is employed within the state and measurement models of the PHD filter to replace the Gaussian distribution because of its heavier tails, and thereby better predict particles with larger amplitudes. Moreover, the variational Bayesian approach is utilized to estimate the relationship between the measurement noise covariance matrix and the state model, and a joint multi-dimensional Student's-t distribution is exploited. In order to obtain more observable measurements, a backward retrodiction step is employed to increase the measurement set, building upon the concept of a smoothing algorithm. To make further improvement, an adaptive step is used to combine the forward filtering and backward retrodiction filtering operations through the similarities of measurements achieved over discrete time. As such, the errors in the delayed measurements generated by false alarms and environment noise are avoided. In the final work, information describing human behaviour, captured in a social force model, is employed to aid particle sampling in the prediction step of the particle PHD filter. A novel social force model is proposed based on the exponential function. Furthermore, a Markov chain Monte Carlo (MCMC) step is utilized to resample the predicted particles, and the acceptance ratio is calculated from the results of the social force model to achieve more robust prediction. Then, a one-class support vector machine (OCSVM), trained on human features, is applied in the measurement model of the PHD filter to mitigate noise from the environment and to achieve better tracking performance. The proposed improvements of the particle PHD filters are evaluated on benchmark datasets such as CAVIAR, PETS2009 and TUD, assessed with quantitative and global evaluation measures, and compared with state-of-the-art techniques to confirm the improvement in multiple human tracking performance.
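
    A hedged sketch of the standard SMC-PHD weight update for a scalar position measurement, with a Student's-t likelihood standing in for the heavier-tailed model advocated in the thesis. The detection probability, clutter intensity and degrees of freedom are illustrative values, and the variational-Bayes, retrodiction and social-force extensions are not shown.

        # Sketch of one SMC-PHD measurement update with a Student's-t likelihood.
        import numpy as np
        from scipy.stats import t as student_t

        def phd_update(particles, weights, measurements, p_d=0.9, clutter=0.01,
                       df=3.0, scale=1.0):
            """particles: predicted particle positions (N,);
            weights: predicted PHD weights (N,); measurements: observed positions (M,)."""
            # Likelihood of each measurement under each particle (heavy-tailed).
            lik = np.array([student_t.pdf(z - particles, df=df, scale=scale)
                            for z in measurements])           # shape (M, N)
            denom = clutter + p_d * lik @ weights              # shape (M,)
            updated = (1.0 - p_d) * weights \
                      + weights * (p_d * lik / denom[:, None]).sum(axis=0)
            return updated

        rng = np.random.default_rng(0)
        particles = rng.normal(0.0, 2.0, size=500)   # predicted particles around one target
        weights = np.full(500, 1.0 / 500)            # predicted PHD mass of about one target
        new_w = phd_update(particles, weights, measurements=np.array([0.3]))
        print("estimated number of targets:", new_w.sum())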

    Multi-Speaker Tracking with Audio-Visual Information for Robot Perception

    Robot perception plays a crucial role in human-robot interaction (HRI). The perception system provides the robot with information about its surroundings and enables the robot to give feedback. In a conversational scenario, a group of people may chat in front of the robot and move freely. In such situations, robots are expected to understand where the people are, who is speaking, and what they are talking about. This thesis concentrates on answering the first two questions, namely speaker tracking and diarization. We use different modalities of the robot's perception system to achieve this goal. Like sight and hearing for a human being, audio and visual information are the critical cues for a robot in a conversational scenario. The advances in computer vision and audio processing of the last decade have revolutionized robot perception abilities. This thesis makes the following contributions: we first develop a variational Bayesian framework for tracking multiple objects. The variational Bayesian framework gives closed-form, tractable solutions, which makes the tracking process efficient. The framework is first applied to visual multiple-person tracking. Birth and death processes are built jointly with the framework to deal with the varying number of people in the scene. Furthermore, we exploit the complementarity of vision and robot motor information: on the one hand, the robot's active motion can be integrated into the visual tracking system to stabilize the tracking; on the other hand, visual information can be used to perform motor servoing. Moreover, audio and visual information are then combined in the variational framework to estimate smooth trajectories of speaking people and to infer the acoustic status of a person, speaking or silent. In addition, we apply the model to acoustic-only speaker localization and tracking: online dereverberation techniques are first applied, followed by the tracking system. Finally, a variant of the acoustic speaker tracking model based on the von Mises distribution is proposed, which is specifically adapted to directional data. All the proposed methods are validated on datasets appropriate to each application.
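
    As a small illustration of the directional model mentioned at the end of the abstract, the snippet below evaluates a von Mises likelihood of a direction-of-arrival observation over a grid of candidate speaker azimuths. The concentration parameter and the grid are illustrative assumptions; this is not the thesis' full variational tracker.

        # Von Mises likelihood of a direction-of-arrival observation (illustrative only).
        import numpy as np
        from scipy.stats import vonmises

        observed_doa = np.deg2rad(30.0)                   # observed azimuth of the speaker
        kappa = 8.0                                       # concentration (higher = less noisy)
        candidates = np.deg2rad(np.arange(0, 360, 5.0))   # candidate speaker directions

        # p(observation | candidate) with the candidate direction as the circular mean.
        lik = vonmises.pdf(observed_doa, kappa, loc=candidates)
        best = candidates[np.argmax(lik)]
        print("most likely direction (deg):", np.rad2deg(best))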

    Random finite set filters for superpositional sensors

    The multi-object filtering problem is a generalization of the well-known single-object filtering problem. In essence, multi-object filtering is concerned with the joint estimation of the unknown and time-varying number of objects and the state of each of these objects. The filtering problem becomes particularly challenging when the number of objects cannot be inferred from the collected observations and when no association between an observation and an object is possible. A rather new and promising approach to multi-object filtering is based on the principles of finite set statistics (FISST). FISST is a methodology, originally proposed by R. Mahler, that allows the formulation of the multi-object filtering problem in a mathematically rigorous way. One of the main building blocks of this methodology is the random finite set (RFS), which is essentially a finite set (FS)-valued random variable (RV). Hence, an RFS is an RV which is not only random in the value of each element but also random in the number of elements of the FS. Under the premise that the observations are generated by detection-type sensors, many practical and efficient multi-object filters have been proposed. In general, detection-type sensors are assumed to generate observations that either originate from a single object or are false alarms. While this is a reasonable assumption in many multi-object filtering scenarios, it is not always the case. Central to this thesis is another type of sensor, the superposition (SPS)-type sensor. These sensors are assumed to generate only a single observation that encapsulates the information about all the objects in the monitored area. More specifically, a single SPS observation is composed of the additive contributions of all the observations which would be generated by each object individually. In this thesis, multi-object filters for SPS-type sensors are derived in a formal mathematical manner using the methodology of FISST. The first key contribution is the formulation of an SPS sensor model that, alongside errors like sensor noise, accounts for the fact that an object might not be visible to a sensor due to being outside of the sensor's restricted field of view (FOV) or because it is occluded by obstacles. The second key contribution is the derivation of a multi-object Bayes filter for SPS sensors that incorporates the aforementioned SPS sensor model. The third key contribution is the formulation of a filter variant that incorporates a multi-object multi-Bernoulli distribution as the underlying multi-object state distribution, thus providing a multi-object multi-Bernoulli (MeMBer) filter variant for SPS-type sensors. As this variant turns out not to be conjugate, two approximations to the exact solution are given. The fourth key contribution is the derivation of computationally tractable implementations of the SPS MeMBer filters.
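
    A hedged sketch of the superpositional observation model described above: the sensor returns a single measurement formed by the additive contributions of all objects inside its field of view, plus sensor noise. The signal model, field-of-view test and noise level are illustrative assumptions rather than the thesis' formulation.

        # Illustrative superpositional (SPS) sensor: one observation per scan,
        # formed by the additive contributions of all visible objects.
        import numpy as np

        def sps_observation(states, sensor_pos, fov_radius=50.0, noise_std=0.1,
                            rng=np.random.default_rng(0)):
            """states: object positions, shape (n, 2); returns a scalar observation."""
            signal = 0.0
            for x in states:
                dist = np.linalg.norm(x - sensor_pos)
                if dist > fov_radius:
                    continue                       # object outside the restricted FOV
                signal += 1.0 / (1.0 + dist ** 2)  # each visible object adds its contribution
            return signal + rng.normal(0.0, noise_std)  # additive sensor noise

        states = np.array([[3.0, 4.0], [10.0, 0.0], [200.0, 0.0]])  # last one out of FOV
        print(sps_observation(states, sensor_pos=np.array([0.0, 0.0])))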