89 research outputs found

    Action recognition from RGB-D data

    In recent years, action recognition based on RGB-D data has attracted increasing attention. Unlike traditional 2D action recognition, RGB-D data contains additional depth and skeleton modalities, each with its own characteristics. This thesis presents seven novel methods that take advantage of the three modalities for action recognition. First, effective handcrafted features are designed and a frequent pattern mining method is employed to mine the most discriminative, representative and nonredundant features for skeleton-based action recognition. Second, to take advantage of powerful Convolutional Neural Networks (ConvNets), it is proposed to represent the spatio-temporal information carried in 3D skeleton sequences in three 2D images by encoding the joint trajectories and their dynamics into the color distribution of the images, and ConvNets are adopted to learn discriminative features for human action recognition. Third, for depth-based action recognition, three data augmentation strategies are proposed so that ConvNets can be applied to small training datasets. Fourth, to take full advantage of the 3D structural information offered by the depth modality and its insensitivity to illumination variations, three simple, compact yet effective image-based representations are proposed and ConvNets are adopted for feature extraction and classification. However, both of the previous two methods are sensitive to noise and cannot differentiate fine-grained actions well. Fifth, to deal with this issue, it is proposed to represent a depth map sequence as three pairs of structured dynamic images at the body, part and joint levels respectively through bidirectional rank pooling. The structured dynamic image preserves the spatio-temporal information, enhances the structure information across both body parts/joints and different temporal scales, and takes advantage of ConvNets for action recognition. Sixth, it is proposed to extract and use scene flow for action recognition from RGB and depth data. Last, to exploit the joint information in multi-modal features arising from heterogeneous sources (RGB, depth), it is proposed to cooperatively train a single ConvNet (referred to as c-ConvNet) on both RGB features and depth features, and to deeply aggregate the two modalities to achieve robust action recognition.
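The bidirectional rank pooling mentioned in the abstract above can be illustrated with a small sketch. It assumes the closed-form "approximate rank pooling" weights commonly cited in the dynamic-image literature, not the thesis's own solver, and it works on toy frames stored as nested lists. The function names `rank_pool_weights`, `rank_pool` and `bidirectional_rank_pool` are hypothetical.

```python
def rank_pool_weights(num_frames):
    """Approximate rank pooling weights (an assumption, see lead-in):
    alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}), with H_0 = 0."""
    harmonic = [0.0]
    for k in range(1, num_frames + 1):
        harmonic.append(harmonic[-1] + 1.0 / k)
    total = harmonic[num_frames]
    return [
        2 * (num_frames - t + 1) - (num_frames + 1) * (total - harmonic[t - 1])
        for t in range(1, num_frames + 1)
    ]


def rank_pool(frames):
    """Collapse a sequence of equally sized 2D frames into one weighted-sum image."""
    if not frames:
        raise ValueError("need at least one frame")
    weights = rank_pool_weights(len(frames))
    rows, cols = len(frames[0]), len(frames[0][0])
    image = [[0.0] * cols for _ in range(rows)]
    for alpha, frame in zip(weights, frames):
        for r in range(rows):
            for c in range(cols):
                image[r][c] += alpha * frame[r][c]
    return image


def bidirectional_rank_pool(frames):
    """Pool the sequence in forward and in reversed temporal order."""
    return rank_pool(frames), rank_pool(frames[::-1])


if __name__ == "__main__":
    # Three toy 1x1 "depth maps"; real inputs would be full depth frames.
    toy_sequence = [[[1.0]], [[5.0]], [[2.0]]]
    forward_image, backward_image = bidirectional_rank_pool(toy_sequence)
    print(forward_image, backward_image)
```

In this sketch the weights sum to zero, so a static sequence pools to an all-zero image and only temporal change survives. Applying the pooling separately to body, part and joint regions, as the abstract describes, would require cropping the frames first, which is omitted here.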

    Detection of interpersonal physical violence in videos using deep learning techniques

    This work designs an architecture that combines the Xception convolutional neural network model with an LSTM to detect interpersonal physical violence in videos from surveillance systems. Given the rise in insecurity in the country, and as a preventive measure, the aim was to strengthen the video surveillance system, focusing on the need to integrate new technologies, such as computer vision, for monitoring public safety. The Hockey Fight Dataset and the Real Life Violence Situations Dataset were used to train, validate and test the proposed model architecture. On the Hockey Fight Dataset, the accuracy of our proposal surpassed that of all other methods. On the Real Life Violence Situations Dataset, which contains 2000 videos, in contrast to other datasets used for violence detection, good results were obtained, with accuracy above 90%. Peru. Universidad Nacional Mayor de San Marcos. Vicerrectorado de Investigación y Posgrado. Research Projects with Funding for Research Groups. PCONFIGI. Code: C21201361. Resolution: 005753-2021-R/UNMS

    Understanding egocentric human actions with temporal decision forests

    Understanding human actions is a fundamental task in computer vision with a wide range of applications including pervasive health-care, robotics and game control. This thesis focuses on the problem of egocentric action recognition from RGB-D data, wherein the world is viewed through the eyes of the actor whose hands describe the actions. The main contributions of this work are its findings regarding egocentric actions as described by hands in two application scenarios and a proposal of a new technique that is based on temporal decision forests. The thesis first introduces a novel framework to recognise fingertip writing in mid-air in the context of human-computer interaction. This framework detects whether the user is writing and tracks the fingertip over time to generate spatio-temporal trajectories that are recognised by using a Hough forest variant that encourages temporal consistency in prediction. A problem with using such a forest approach for action recognition is that the learning of temporal dynamics is limited to hand-crafted temporal features and temporal regression, which may break the temporal continuity and lead to inconsistent predictions. To overcome this limitation, the thesis proposes transition forests. In addition to any temporal information encoded in the feature space, the forest automatically learns the temporal dynamics during training and exploits them at inference time in an online and efficient manner, achieving state-of-the-art results. The last contribution of this thesis is its introduction of the first RGB-D benchmark to allow for the study of egocentric hand-object actions with both hand and object pose annotations. This study conducts an extensive evaluation of different baselines, state-of-the-art approaches and temporal decision forest models using colour, depth and hand pose features. Furthermore, it extends the transition forest model to incorporate data from different modalities and demonstrates the benefit of using hand pose features to recognise egocentric human actions. The thesis concludes by discussing and analysing the contributions and proposing a few ideas for future work. Open Acces

    Contextual Understanding of Sequential Data Across Multiple Modalities

    In recent years, progress in computing and networking has made it possible to collect large volumes of data for various applications in data mining and data analytics using machine learning methods. Data may come from different sources and in different shapes and forms depending on their inherent nature and the acquisition process. In this dissertation, we focus specifically on sequential data, which have been growing exponentially in recent years on platforms such as YouTube, social media and news agency sites. An important characteristic of sequential data is the inherent causal structure with latent patterns that can be discovered and learned from samples of the dataset. With this in mind, we target problems in two different domains, Computer Vision and Natural Language Processing, that deal with sequential data and share the common characteristics of such data. The first one is action recognition based on video data, which is a fundamental problem in computer vision. This problem aims to find generalized patterns from videos to recognize or predict human actions. A video contains two important sets of information, i.e. appearance and motion. These two kinds of information are complementary, and therefore accurate recognition or prediction of activities or actions in video data depends significantly on our ability to extract them both. However, effective extraction of this information is a non-trivial task due to several challenges, such as viewpoint changes, camera motions, and scale variations, to name a few. It is thus crucial to design effective and generalized representations of video data that learn these variations and/or are invariant to them. We propose different models that learn and extract spatio-temporal correlations from video frames by using deep networks that overcome these challenges. The second problem that we study in this dissertation in the context of sequential data analysis is text summarization in multi-document processing. Sentences consist of sequences of words that imply context. The summarization task requires learning and understanding the contextual information from each sentence in order to determine which subset of sentences best represents a given article. With the progress made by deep learning, better representations of words have been achieved, leading in turn to better contextual representations of sentences. We propose summarization methods that combine mathematical optimization, Determinantal Point Processes (DPPs), and deep learning models, and that outperform the state of the art in multi-document text summarization.
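To make the role of a Determinantal Point Process in sentence selection concrete, here is a minimal sketch under stated assumptions. It builds a kernel from per-sentence quality scores and pairwise similarities, then greedily adds the sentence that most increases the determinant of the selected sub-kernel. The greedy procedure, the toy scores and the names `determinant` and `greedy_dpp_select` are illustrative assumptions, not the dissertation's method.

```python
def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def greedy_dpp_select(quality, similarity, k):
    """Greedily pick up to k items that maximise det of the selected sub-kernel.

    Kernel entries are L[i][j] = quality[i] * similarity[i][j] * quality[j],
    so high-quality but mutually dissimilar items are favoured.
    """
    n = len(quality)
    kernel = [
        [quality[i] * similarity[i][j] * quality[j] for j in range(n)]
        for i in range(n)
    ]
    selected = []
    while len(selected) < min(k, n):
        best_item, best_det = None, 0.0
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sub = [[kernel[a][b] for b in idx] for a in idx]
            d = determinant(sub)
            if d > best_det:
                best_item, best_det = i, d
        if best_item is None:  # no remaining item adds positive volume
            break
        selected.append(best_item)
    return selected


if __name__ == "__main__":
    # Hypothetical scores: sentences 0 and 1 are near-duplicates, 2 is distinct.
    quality = [1.0, 0.95, 0.6]
    similarity = [
        [1.0, 0.99, 0.1],
        [0.99, 1.0, 0.1],
        [0.1, 0.1, 1.0],
    ]
    print(greedy_dpp_select(quality, similarity, k=2))
```

With these toy scores the second pick skips the higher-quality near-duplicate in favour of the distinct sentence, which is the diversity effect a DPP is meant to provide. In practice the quality and similarity values would come from learned sentence representations.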

    Drones Detection Using Smart Sensors

    Drones are a modern and sophisticated technology that has been used in numerous fields. Nowadays, many countries use them in exploration, reconnaissance operations, and espionage in military operations. Drones also have many everyday uses, such as home delivery and safety monitoring. However, the use of drones is a double-edged sword. Drones can be used for positive purposes to improve the quality of human lives, but they can also be used for criminal and other detrimental purposes. In fact, many countries have been attacked by terrorists using smart drones. Hence, drone detection is an active area of research that receives the attention of many scholars. Advanced drones are often difficult to detect and can therefore sometimes be life-threatening. Currently, most detection methods are based on video, sound, radar, temperature, radio frequency (RF), or Wi-Fi techniques. However, each detection method has several flaws that make it an imperfect choice for drone detection in sensitive areas. Our aim is to overcome the challenges that most existing drone detection techniques face. In this thesis, we propose two modeling techniques and compare them to produce an efficient system for drone detection. Specifically, we compare the two proposed models by investigating the risk assessments and the probability of success for each model.

    Recent advances in video analytics for rail network surveillance for security, trespass and suicide prevention— a survey

    Railway network systems are by design open and accessible to people, but this presents challenges in the prevention of events such as terrorism, trespass, and suicide fatalities. With the rapid advancement of machine learning, numerous computer vision methods have been developed in closed-circuit television (CCTV) surveillance systems for the purposes of managing public spaces. These methods are built on multiple types of sensors and are designed to automatically detect static objects and unexpected events, monitor people, and prevent potential dangers. This survey focuses on recently developed CCTV surveillance methods for rail networks, discusses the challenges they face and their advantages and disadvantages, and outlines a vision for future railway surveillance systems. State-of-the-art methods for object detection and behaviour recognition applied to rail network surveillance systems are introduced, and the ethics of handling personal data and the use of automated systems are also considered.

    Data-driven maintenance of military systems: Potential and challenges

    The success of military missions is largely dependent on the reliability and availability of the systems that are used. In modern warfare, data is considered an important weapon, both in offence and defence. However, collection and analysis of the proper data can also play a crucial role in reducing the number of system failures, thus increasing system availability and military performance considerably. In this chapter, the concept of data-driven maintenance is introduced. First, the various maturity levels, ranging from detection of failures and automated diagnostics to advanced condition monitoring and predictive maintenance, are presented. Then, the different types of data and associated decisions are discussed. Finally, six practical cases from the Dutch MoD are used to demonstrate the benefits of this concept and to discuss the challenges encountered in applying it in military practice.