13 research outputs found
A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis
Automatic analysis of scanned historical documents comprises a wide range of
image analysis tasks, which are often challenging for machine learning due to a
lack of human-annotated learning samples. With the advent of deep neural
networks, a promising way to cope with the lack of training data is to
pre-train models on images from a different domain and then fine-tune them on
historical documents. In the current research, a typical example of such
cross-domain transfer learning is the use of neural networks that have been
pre-trained on the ImageNet database for object recognition. It remains a
mostly open question whether or not this pre-training helps to analyse
historical documents, which have fundamentally different image properties when
compared with ImageNet. In this paper, we present a comprehensive empirical
survey on the effect of ImageNet pre-training for diverse historical document
analysis tasks, including character recognition, style classification,
manuscript dating, semantic segmentation, and content-based retrieval. While we
obtain mixed results for semantic segmentation at pixel-level, we observe a
clear trend across different network architectures that ImageNet pre-training
has a positive effect on classification as well as content-based retrieval.
Attention-Based Deep Learning Framework for Human Activity Recognition with User Adaptation
Sensor-based human activity recognition (HAR) requires predicting the action
of a person based on sensor-generated time series data. HAR has attracted major
interest in the past few years, thanks to the large number of applications
enabled by modern ubiquitous computing devices. While several techniques based
on hand-crafted feature engineering have been proposed, the current
state-of-the-art is represented by deep learning architectures that
automatically obtain high level representations and that use recurrent neural
networks (RNNs) to extract temporal dependencies in the input. RNNs have
several limitations, in particular in dealing with long-term dependencies. We
propose a novel deep learning framework, \algname, based on a purely
attention-based mechanism, that overcomes the limitations of the
state-of-the-art. We show that our proposed attention-based architecture is
considerably more powerful than previous approaches, with an average increment,
of more than on the F1 score over the previous best performing model.
Furthermore, we consider the problem of personalizing HAR deep learning models,
which is of great importance in several applications. We propose a simple and
effective transfer-learning based strategy to adapt a model to a specific user,
providing an average increment of on the F1 score on the predictions for
that user. Our extensive experimental evaluation proves the significantly
superior capabilities of our proposed framework over the current
state-of-the-art and the effectiveness of our user adaptation technique.
Comment: Accepted for publication in the IEEE Sensors Journal
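To make the idea of a purely attention-based model over sensor time series concrete, here is a minimal sketch of generic scaled dot-product self-attention applied to one window of readings. It is not the architecture from the paper, which the abstract does not describe in detail. The identity query/key/value projections, the function names, and the toy `window` values are all assumptions made for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention with identity projections (assumption).

    seq: list of T feature vectors, each a list of d floats.
    Returns (outputs, weights): outputs is T x d, weights is T x T.
    """
    d = len(seq[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in seq:
        # Similarity of this time step to every time step in the window.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in seq]
        w = softmax(scores)
        # Weighted average of all time steps: any step can attend to any
        # other directly, regardless of how far apart they are in time.
        out = [sum(w[t] * seq[t][j] for t in range(len(seq))) for j in range(d)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Hypothetical window of four 3-axis accelerometer readings.
window = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [1.0, 0.0, 0.2], [0.0, 1.0, 0.1]]
outputs, weights = self_attention(window)
```

Because every time step is compared with every other one in a single operation, the distance between two steps does not affect how easily they interact, which is the property usually contrasted with recurrent processing.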
Cross-modal data retrieval and generation using deep neural networks
The exponential growth of deep learning has helped solve problems across different fields of study. Convolutional neural networks have become a go-to tool for extracting features from images. Similarly, variations of recurrent neural networks such as Long Short-Term Memory and Gated Recurrent Unit architectures do a good job of extracting useful information from sequential data such as text and time series. Although these networks are good at extracting features for a particular modality, learning features across multiple modalities is still a challenging task. In this work, we develop a generative common vector space model in which similar concepts from different modalities are brought closer in a common latent space representation, while dissimilar concepts are pushed far apart in this same space. The developed model not only aims at solving the cross-modal retrieval problem but also uses the vector generated by the common vector space model to generate realistic-looking data. This work mainly focuses on image and text modalities; however, it can be extended to other modalities as well. We train and evaluate the performance of the model on the Caltech CUB and Oxford-102 datasets.
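The abstract does not say which objective is used to pull similar concepts together and push dissimilar ones apart. One common choice for this kind of common-space learning is a triplet margin loss, sketched below as an assumption rather than as the method of this work. The function names, the margin value, and the two-dimensional embeddings are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Zero once the positive is closer to the anchor than the negative
    by at least `margin`; positive otherwise (assumed objective)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Hypothetical embeddings in a shared 2-D latent space.
image_vec = [0.9, 0.1]            # embedding of an image
matching_text_vec = [0.8, 0.2]    # embedding of its matching caption
unrelated_text_vec = [-0.7, 0.6]  # embedding of an unrelated caption

# Matching caption is near the image and the unrelated one is far: no penalty.
well_separated = triplet_margin_loss(image_vec, matching_text_vec, unrelated_text_vec)

# Roles swapped: the "positive" is far and the "negative" is near, so the loss is large.
poorly_separated = triplet_margin_loss(image_vec, unrelated_text_vec, matching_text_vec)
```

Minimizing such a loss over many image and text triplets moves matching pairs together and mismatched pairs apart, which is what makes nearest-neighbor search in the shared space usable for cross-modal retrieval.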
Outlier Detection in Wearable Sensor Data for Human Activity Recognition (HAR) Based on DRNNs
Wearable sensors provide a user-friendly and non-intrusive mechanism to extract
user-related data that paves the way to the development of personalized
applications. Within those applications, human activity recognition (HAR) plays
an important role in the characterization of the user context. Outlier
detection methods focus on finding anomalous data samples that are likely to
have been generated by a different mechanism. This paper combines outlier
detection and HAR by introducing a novel algorithm that is able both to detect
information from secondary activities inside the main activity and to extract
data segments of a particular sub-activity from a different activity. Several
machine learning algorithms have been previously used in the area of HAR based
on the analysis of the time sequences generated by wearable sensors. Deep
recurrent neural networks (DRNNs) have proven to be optimally adapted to the
sequential characteristics of wearable sensor data in previous studies. A
DRNN-based algorithm is proposed in this paper for outlier detection in HAR.
The results are validated both for intra- and inter-subject cases and both for
outlier detection and sub-activity recognition using two different datasets. A
first dataset comprising 4 major activities (walking, running, climbing up, and
down) from 15 users is used to train and validate the proposal. Intra-subject
outlier detection is able to detect all major outliers in the walking activity
in this dataset, while inter-subject outlier detection only fails for one
participant executing the activity in a peculiar way. Sub-activity detection
has been validated by finding out and extracting walking segments present in
the other three activities in this dataset. A second dataset using four
different users, a different setting and different sensor devices is used to
assess the generalization of results.
This work was supported by the "ANALYTICS USING SENSOR DATA FOR FLATCITY"
Project (MINECO/ERDF, EU) funded in part by the Spanish Agencia Estatal de
Investigación (AEI) under Grant TIN2016-77158-C4-1-R and in part by the
European Regional Development Fund (ERDF).
A deep learning approach for identifying people through the sound of their footsteps (Uma abordagem de aprendizagem profunda para identificação de pessoas através do som dos passos)
With the advent of pervasive computing, technology has become part of daily human life. Thus, in the construction of "intelligent" environments, systems are used that are linked to the routine of the individuals who live there. One of the main needs is therefore the recognition of the individual who lives in the space. The objective of this work is to identify individuals in an intelligent environment through information obtained from the sound of their footsteps. To this end, different configurations of deep neural networks were developed and tested in order to classify the people who participated in an experiment conducted in Carvalho and Rosa (2010). The proposed neural network architecture is composed of a chain of neural networks, and its results reached up to 98.57% accuracy.