1,649 research outputs found

    DeePoint: Pointing Recognition and Direction Estimation From A Fixed View

    Full text link
    In this paper, we realize automatic visual recognition and direction estimation of pointing. We introduce the first neural pointing understanding method based on two key contributions. The first is the introduction of a first-of-its-kind large-scale dataset for pointing recognition and direction estimation, which we refer to as the DP Dataset. DP Dataset consists of more than 2 million frames of over 33 people pointing in various styles annotated for each frame with pointing timings and 3D directions. The second is DeePoint, a novel deep network model for joint recognition and 3D direction estimation of pointing. DeePoint is a Transformer-based network which fully leverages the spatio-temporal coordination of the body parts, not just the hands. Through extensive experiments, we demonstrate the accuracy and efficiency of DeePoint. We believe DP Dataset and DeePoint will serve as a sound foundation for visual human intention understanding

    MIFTel: a multimodal interactive framework based on temporal logic rules

    Get PDF
    Human-computer and multimodal interaction are increasingly used in everyday life. Machines are able to get more from the surrounding world, assisting humans in different application areas. In this context, the correct processing and management of signals provided by the environments is determinant for structuring the data. Different sources and acquisition times can be exploited for improving recognition results. On the basis of these assumptions, we are proposing a multimodal system that exploits Allen’s temporal logic combined with a prevision method. The main object is to correlate user’s events with system’s reactions. After post-elaborating coming data from different signal sources (RGB images, depth maps, sounds, proximity sensors, etc.), the system is managing the correlations between recognition/detection results and events in real-time to create an interactive environment for the user. For increasing the recognition reliability, a predictive model is also associated with the proposed method. The modularity of the system grants a full dynamic development and upgrade with custom modules. Finally, a comparison with other similar systems is shown, underlining the high flexibility and robustness of the proposed event management method

    Intelligent Sensors for Human Motion Analysis

    Get PDF
    The book, "Intelligent Sensors for Human Motion Analysis," contains 17 articles published in the Special Issue of the Sensors journal. These articles deal with many aspects related to the analysis of human movement. New techniques and methods for pose estimation, gait recognition, and fall detection have been proposed and verified. Some of them will trigger further research, and some may become the backbone of commercial systems

    Large-Scale Textured 3D Scene Reconstruction

    Get PDF
    Die Erstellung dreidimensionaler Umgebungsmodelle ist eine fundamentale Aufgabe im Bereich des maschinellen Sehens. Rekonstruktionen sind für eine Reihe von Anwendungen von Nutzen, wie bei der Vermessung, dem Erhalt von Kulturgütern oder der Erstellung virtueller Welten in der Unterhaltungsindustrie. Im Bereich des automatischen Fahrens helfen sie bei der Bewältigung einer Vielzahl an Herausforderungen. Dazu gehören Lokalisierung, das Annotieren großer Datensätze oder die vollautomatische Erstellung von Simulationsszenarien. Die Herausforderung bei der 3D Rekonstruktion ist die gemeinsame Schätzung von Sensorposen und einem Umgebunsmodell. Redundante und potenziell fehlerbehaftete Messungen verschiedener Sensoren müssen in eine gemeinsame Repräsentation der Welt integriert werden, um ein metrisch und photometrisch korrektes Modell zu erhalten. Gleichzeitig muss die Methode effizient Ressourcen nutzen, um Laufzeiten zu erreichen, welche die praktische Nutzung ermöglichen. In dieser Arbeit stellen wir ein Verfahren zur Rekonstruktion vor, das fähig ist, photorealistische 3D Rekonstruktionen großer Areale zu erstellen, die sich über mehrere Kilometer erstrecken. Entfernungsmessungen aus Laserscannern und Stereokamerasystemen werden zusammen mit Hilfe eines volumetrischen Rekonstruktionsverfahrens fusioniert. Ringschlüsse werden erkannt und als zusätzliche Bedingungen eingebracht, um eine global konsistente Karte zu erhalten. Das resultierende Gitternetz wird aus Kamerabildern texturiert, wobei die einzelnen Beobachtungen mit ihrer Güte gewichtet werden. Für eine nahtlose Erscheinung werden die unbekannten Belichtungszeiten und Parameter des optischen Systems mitgeschätzt und die Bilder entsprechend korrigiert. Wir evaluieren unsere Methode auf synthetischen Daten, realen Sensordaten unseres Versuchsfahrzeugs und öffentlich verfügbaren Datensätzen. Wir zeigen qualitative Ergebnisse großer innerstädtischer Bereiche, sowie quantitative Auswertungen der Fahrzeugtrajektorie und der Rekonstruktionsqualität. Zuletzt präsentieren wir mehrere Anwendungen und zeigen somit den Nutzen unserer Methode für Anwendungen im Bereich des automatischen Fahrens

    Perceptive agents with attentive interfaces : learning and vision for man-machine systems

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1996.Includes bibliographical references (leaves 107-116).by Trevor Jackson Darrell.Ph. D

    Too much information is no information: how machine learning and feature selection could help in understanding the motor control of pointing

    Get PDF
    © 2023 The Author(s). This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/The aim of this study was to develop the use of Machine Learning techniques as a means of multivariate analysis in studies of motor control. These studies generate a huge amount of data, the analysis of which continues to be largely univariate. We propose the use of machine learning classification and feature selection as a means of uncovering feature combinations that are altered between conditions. High dimensional electromyogram (EMG) vectors were generated as several arm and trunk muscles were recorded while subjects pointed at various angles above and below the gravity neutral horizontal plane. We used Linear Discriminant Analysis (LDA) to carry out binary classifications between the EMG vectors for pointing at a particular angle, vs. pointing at the gravity neutral direction. Classification success provided a composite index of muscular adjustments for various task constraints—in this case, pointing angles. In order to find the combination of features that were significantly altered between task conditions, we conducted a post classification feature selection i.e., investigated which combination of features had allowed for the classification. Feature selection was done by comparing the representations of each category created by LDA for the classification. In other words computing the difference between the representations of each class. We propose that this approach will help with comparing high dimensional EMG patterns in two ways; (i) quantifying the effects of the entire pattern rather than using single arbitrarily defined variables and (ii) identifying the parts of the patterns that convey the most information regarding the investigated effects.Peer reviewe

    Surveillance video summarization based on trajectory rarity measure

    Get PDF
    The dynamic video summarization of surveillance videos has several critical applications, mainly due to the wide availability of digital cameras in environments such as airports, train and bus stations, shopping centers, stadiums, buildings, schools, hospitals, roads, among others. This study presents an approach for the generation of dynamic summary on surveillance video domain based on human trajectories. It has an emphasis on trajectory descriptors in conjunction with the unsupervised clustering method. Our approach contribute to existing literature concerning the combination of methods and objectives. We hypothesize that the clustering of trajectories permits to identify rare trajectories base on their morphology. The clustering as an output provides numerous subsets of trajectories or clusters and the number of elements of a specific cluster is used to determine their rarity. Those subsets with few components are rare while the others that have a high number of elements are considered ordinary; therefore, the implications of our study show that is possible to use unsupervised clustering for automatic detection of rare trajectories based on their morphology and with this information segment videos. We experimented with different sets of trajectories segmenting the rare videos from our ground truth.Trabajo de investigació

    Task adapted reconstruction for inverse problems

    Full text link
    The paper considers the problem of performing a task defined on a model parameter that is only observed indirectly through noisy data in an ill-posed inverse problem. A key aspect is to formalize the steps of reconstruction and task as appropriate estimators (non-randomized decision rules) in statistical estimation problems. The implementation makes use of (deep) neural networks to provide a differentiable parametrization of the family of estimators for both steps. These networks are combined and jointly trained against suitable supervised training data in order to minimize a joint differentiable loss function, resulting in an end-to-end task adapted reconstruction method. The suggested framework is generic, yet adaptable, with a plug-and-play structure for adjusting both the inverse problem and the task at hand. More precisely, the data model (forward operator and statistical model of the noise) associated with the inverse problem is exchangeable, e.g., by using neural network architecture given by a learned iterative method. Furthermore, any task that is encodable as a trainable neural network can be used. The approach is demonstrated on joint tomographic image reconstruction, classification and joint tomographic image reconstruction segmentation
    corecore