13 research outputs found

    A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis

    Full text link
    Automatic analysis of scanned historical documents comprises a wide range of image analysis tasks, which are often challenging for machine learning due to a lack of human-annotated learning samples. With the advent of deep neural networks, a promising way to cope with the lack of training data is to pre-train models on images from a different domain and then fine-tune them on historical documents. In current research, a typical example of such cross-domain transfer learning is the use of neural networks that have been pre-trained on the ImageNet database for object recognition. It remains a largely open question whether or not this pre-training helps to analyse historical documents, which have fundamentally different image properties compared with ImageNet. In this paper, we present a comprehensive empirical survey of the effect of ImageNet pre-training on diverse historical document analysis tasks, including character recognition, style classification, manuscript dating, semantic segmentation, and content-based retrieval. While we obtain mixed results for pixel-level semantic segmentation, we observe a clear trend across different network architectures that ImageNet pre-training has a positive effect on classification as well as content-based retrieval.
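    To make the transfer-learning recipe the abstract refers to concrete, the sketch below loads an ImageNet pre-trained backbone and fine-tunes it on a document-image classification task. It is an illustrative PyTorch/torchvision example, not the authors' code; the data loader and the number of document classes are hypothetical placeholders.

    # Minimal sketch: ImageNet pre-training followed by fine-tuning on a
    # historical document classification task (assumes torchvision >= 0.13;
    # `doc_loader` and `num_doc_classes` are placeholders).
    import torch
    import torch.nn as nn
    from torchvision.models import resnet50, ResNet50_Weights

    num_doc_classes = 12                                         # placeholder, e.g. script styles
    model = resnet50(weights=ResNet50_Weights.DEFAULT)           # ImageNet pre-trained backbone
    model.fc = nn.Linear(model.fc.in_features, num_doc_classes)  # new task-specific head

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    def fine_tune_epoch(doc_loader):
        model.train()
        for images, labels in doc_loader:        # batches of document images
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()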

    Attention-Based Deep Learning Framework for Human Activity Recognition with User Adaptation

    Full text link
    Sensor-based human activity recognition (HAR) requires predicting the action of a person from sensor-generated time series data. HAR has attracted major interest in the past few years, thanks to the large number of applications enabled by modern ubiquitous computing devices. While several techniques based on hand-crafted feature engineering have been proposed, the current state-of-the-art is represented by deep learning architectures that automatically obtain high-level representations and use recurrent neural networks (RNNs) to extract temporal dependencies from the input. RNNs have several limitations, in particular in dealing with long-term dependencies. We propose a novel deep learning framework, \algname, based on a purely attention-based mechanism, that overcomes the limitations of the state-of-the-art. We show that our proposed attention-based architecture is considerably more powerful than previous approaches, with an average increment of more than 7% in F1 score over the previous best-performing model. Furthermore, we consider the problem of personalizing HAR deep learning models, which is of great importance in several applications. We propose a simple and effective transfer-learning-based strategy to adapt a model to a specific user, providing an average increment of 6% in F1 score on the predictions for that user. Our extensive experimental evaluation proves the significantly superior capabilities of our proposed framework over the current state-of-the-art and the effectiveness of our user adaptation technique. Comment: Accepted for publication in the IEEE Sensors Journal.
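    As a rough illustration of what a purely attention-based HAR model looks like, the sketch below applies a self-attention encoder to sensor windows and adds a head-only fine-tuning step as one plausible reading of the user-adaptation strategy. This is not the authors' \algname implementation; layer sizes and the adaptation rule are assumptions.

    # Illustrative attention-based classifier over sensor windows (batch, time, channels).
    import torch
    import torch.nn as nn

    class AttentionHAR(nn.Module):
        def __init__(self, channels=6, d_model=64, heads=4, layers=2, classes=6):
            super().__init__()
            self.embed = nn.Linear(channels, d_model)             # per-timestep embedding
            enc = nn.TransformerEncoderLayer(d_model, heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(enc, num_layers=layers)
            self.head = nn.Linear(d_model, classes)

        def forward(self, x):                                     # x: (batch, time, channels)
            h = self.encoder(self.embed(x))                       # self-attention over time
            return self.head(h.mean(dim=1))                       # average-pool the sequence

    # User adaptation as simple transfer learning: freeze the encoder and
    # fine-tune only the classification head on a few labelled windows from
    # the target user.
    model = AttentionHAR()
    for p in model.encoder.parameters():
        p.requires_grad = False
    optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)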

    Cross-modal data retrieval and generation using deep neural networks

    Get PDF
    The exponential growth of deep learning has helped solve problems across different fields of study. Convolutional neural networks have become a go-to tool for extracting features from images. Similarly, variants of recurrent neural networks, such as Long Short-Term Memory and Gated Recurrent Unit architectures, do a good job of extracting useful information from temporal data such as text and time series. Although these networks are good at extracting features for a particular modality, learning features across multiple modalities is still a challenging task. In this work, we develop a generative common vector space model in which similar concepts from different modalities are brought closer together in a common latent space representation while dissimilar concepts are pushed far apart in this same space. The developed model not only aims to solve the cross-modal retrieval problem but also uses the vector generated by the common vector space model to generate realistic-looking data. This work mainly focuses on the image and text modalities; however, it can be extended to other modalities as well. We train and evaluate the performance of the model on the Caltech CUB and Oxford-102 datasets.
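    The core idea of a common vector space can be sketched as two projection heads into a shared latent space trained with a hinge-style contrastive loss, so matched image-text pairs end up close and mismatched pairs far apart. The dimensions, margin, and exact loss form below are illustrative assumptions, not the values used in this work.

    # Shared latent space for image and text features plus a contrastive loss.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CommonSpace(nn.Module):
        def __init__(self, img_dim=2048, txt_dim=1024, latent_dim=256):
            super().__init__()
            self.img_proj = nn.Linear(img_dim, latent_dim)    # image-branch projection
            self.txt_proj = nn.Linear(txt_dim, latent_dim)    # text-branch projection

        def forward(self, img_feats, txt_feats):
            z_img = F.normalize(self.img_proj(img_feats), dim=-1)
            z_txt = F.normalize(self.txt_proj(txt_feats), dim=-1)
            return z_img, z_txt

    def contrastive_loss(z_img, z_txt, margin=0.2):
        sim = z_img @ z_txt.t()                               # cosine similarities (B, B)
        pos = sim.diag()                                      # similarities of matched pairs
        off_diag = 1.0 - torch.eye(sim.size(0), device=sim.device)
        # Every mismatched pair should score at least `margin` below its match.
        cost_img = (F.relu(margin + sim - pos.unsqueeze(1)) * off_diag).mean()
        cost_txt = (F.relu(margin + sim - pos.unsqueeze(0)) * off_diag).mean()
        return cost_img + cost_txt

    Cross-modal retrieval then reduces to nearest-neighbour search in the shared space.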

    Outlier Detection in Wearable Sensor Data for Human Activity Recognition (HAR) Based on DRNNs

    Get PDF
    Wearable sensors provide a user-friendly and non-intrusive mechanism to extract user-related data that paves the way to the development of personalized applications. Within those applications, human activity recognition (HAR) plays an important role in the characterization of the user context. Outlier detection methods focus on finding anomalous data samples that are likely to have been generated by a different mechanism. This paper combines outlier detection and HAR by introducing a novel algorithm that is able both to detect information from secondary activities inside the main activity and to extract data segments of a particular sub-activity from a different activity. Several machine learning algorithms have previously been used in the area of HAR based on the analysis of the time sequences generated by wearable sensors. Deep recurrent neural networks (DRNNs) have proven to be optimally adapted to the sequential characteristics of wearable sensor data in previous studies. A DRNN-based algorithm is proposed in this paper for outlier detection in HAR. The results are validated both for intra- and inter-subject cases and both for outlier detection and sub-activity recognition using two different datasets. A first dataset comprising 4 major activities (walking, running, climbing up, and climbing down) from 15 users is used to train and validate the proposal. Intra-subject outlier detection is able to detect all major outliers in the walking activity in this dataset, while inter-subject outlier detection only fails for one participant executing the activity in a peculiar way. Sub-activity detection has been validated by finding and extracting walking segments present in the other three activities in this dataset. A second dataset using four different users, a different setting, and different sensor devices is used to assess the generalization of the results. This work was supported by the "ANALYTICS USING SENSOR DATA FOR FLATCITY" Project (MINECO/ERDF, EU) funded in part by the Spanish Agencia Estatal de Investigación (AEI) under Grant TIN2016-77158-C4-1-R and in part by the European Regional Development Fund (ERDF).
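    A bare-bones sketch of a DRNN classifier of the kind described above is given below. The outlier rule shown (flagging windows whose predicted-class probability falls under a threshold) is a simplified stand-in for illustration, not the paper's exact criterion, and all sizes are assumptions.

    # Stacked-LSTM (deep recurrent) classifier over windowed wearable-sensor data.
    import torch
    import torch.nn as nn

    class DRNN(nn.Module):
        def __init__(self, channels=3, hidden=64, layers=2, classes=4):
            super().__init__()
            self.rnn = nn.LSTM(channels, hidden, num_layers=layers, batch_first=True)
            self.head = nn.Linear(hidden, classes)

        def forward(self, x):                    # x: (batch, time, channels)
            out, _ = self.rnn(x)
            return self.head(out[:, -1])         # classify from the last hidden state

    def flag_outlier_windows(model, windows, threshold=0.6):
        # Windows the model is unsure about are treated as candidate outliers /
        # segments of a different sub-activity (simplified illustration only).
        with torch.no_grad():
            probs = torch.softmax(model(windows), dim=-1)
        return probs.max(dim=-1).values < threshold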

    A deep learning approach to identifying people by the sound of their footsteps

    Get PDF
    With the advent of pervasive computing, technology has become part of everyday human life. Thus, in the construction of "intelligent" environments, systems are employed that are tied to the routines of the individuals who live there, so one of the main needs is the recognition of the individual present in the space. The objective of this work is to identify individuals in an intelligent environment through information obtained from the sound of their footsteps. To this end, different configurations of deep neural networks were developed and tested in order to classify the people who participated in an experiment conducted in Carvalho and Rosa (2010). The proposed architecture is composed of a chain of neural networks, and its results reach up to 98.57% accuracy.
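    As a hedged illustration of such a pipeline (the exact chain of networks used in the work is not specified here, so this is only a plausible sketch), footstep audio can be converted to a log-mel spectrogram and passed to a small CNN that predicts the person's identity; `num_people` and all layer sizes are placeholders.

    # Illustrative footstep-identification pipeline: log-mel spectrogram + CNN.
    # Assumes torchaudio is available.
    import torch
    import torch.nn as nn
    import torchaudio

    to_mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)
    to_db = torchaudio.transforms.AmplitudeToDB()

    class FootstepCNN(nn.Module):
        def __init__(self, num_people=10):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(32, num_people)

        def forward(self, waveform):                       # waveform: (batch, samples)
            spec = to_db(to_mel(waveform)).unsqueeze(1)    # (batch, 1, mels, frames)
            return self.fc(self.conv(spec).flatten(1))     # identity logits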

    Transforming sensor data to the image domain for deep learning — An application to footstep detection

    No full text