13,612 research outputs found
A survey of comics research in computer science
Graphical novels such as comics and mangas are well known all over the world.
The digital transition started to change the way people are reading comics,
more and more on smartphones and tablets and less and less on paper. In the
recent years, a wide variety of research about comics has been proposed and
might change the way comics are created, distributed and read in future years.
Early work focuses on low level document image analysis: indeed comic books are
complex, they contains text, drawings, balloon, panels, onomatopoeia, etc.
Different fields of computer science covered research about user interaction
and content generation such as multimedia, artificial intelligence,
human-computer interaction, etc. with different sets of values. We propose in
this paper to review the previous research about comics in computer science, to
state what have been done and to give some insights about the main outlooks
Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean
A new tightly coupled speech and natural language integration model is
presented for a TDNN-based continuous possibly large vocabulary speech
recognition system for Korean. Unlike popular n-best techniques developed for
integrating mainly HMM-based speech recognition and natural language processing
in a {\em word level}, which is obviously inadequate for morphologically
complex agglutinative languages, our model constructs a spoken language system
based on a {\em morpheme-level} speech and language integration. With this
integration scheme, the spoken Korean processing engine (SKOPE) is designed and
implemented using a TDNN-based diphone recognition module integrated with a
Viterbi-based lexical decoding and symbolic phonological/morphological
co-analysis. Our experiment results show that the speaker-dependent continuous
{\em eojeol} (Korean word) recognition and integrated morphological analysis
can be achieved with over 80.6% success rate directly from speech inputs for
the middle-level vocabularies.Comment: latex source with a4 style, 15 pages, to be published in computer
processing of oriental language journa
Text Segmentation Using Exponential Models
This paper introduces a new statistical approach to partitioning text
automatically into coherent segments. Our approach enlists both short-range and
long-range language models to help it sniff out likely sites of topic changes
in text. To aid its search, the system consults a set of simple lexical hints
it has learned to associate with the presence of boundaries through inspection
of a large corpus of annotated data. We also propose a new probabilistically
motivated error metric for use by the natural language processing and
information retrieval communities, intended to supersede precision and recall
for appraising segmentation algorithms. Qualitative assessment of our algorithm
as well as evaluation using this new metric demonstrate the effectiveness of
our approach in two very different domains, Wall Street Journal articles and
the TDT Corpus, a collection of newswire articles and broadcast news
transcripts.Comment: 12 pages, LaTeX source and postscript figures for EMNLP-2 pape
Automatic Genre Classification of Latin Music Using Ensemble of Classifiers
This paper presents a novel approach to the task of automatic music genre classification which is based on ensemble learning. Feature vectors are extracted from three 30-second music segments from the beginning, middle and end of each music piece. Individual classifiers are trained to account for each music segment. During classification, the output provided by each classifier is combined with the aim of improving music genre classification accuracy. Experiments carried out on a dataset containing 600 music samples from two Latin genres (Tango and Salsa) have shown that for the task of automatic music genre classification, the features extracted from the middle and end music segments provide better results than using the beginning music segment. Furthermore, the proposed ensemble method provides better accuracy than using single classifiers and any individual segment
Substructure and Boundary Modeling for Continuous Action Recognition
This paper introduces a probabilistic graphical model for continuous action
recognition with two novel components: substructure transition model and
discriminative boundary model. The first component encodes the sparse and
global temporal transition prior between action primitives in state-space model
to handle the large spatial-temporal variations within an action class. The
second component enforces the action duration constraint in a discriminative
way to locate the transition boundaries between actions more accurately. The
two components are integrated into a unified graphical structure to enable
effective training and inference. Our comprehensive experimental results on
both public and in-house datasets show that, with the capability to incorporate
additional information that had not been explicitly or efficiently modeled by
previous methods, our proposed algorithm achieved significantly improved
performance for continuous action recognition.Comment: Detailed version of the CVPR 2012 paper. 15 pages, 6 figure
Learning Timbre Analogies from Unlabelled Data by Multivariate Tree Regression
This is the Author's Original Manuscript of an article whose final and definitive form, the Version of Record, has been published in the Journal of New Music Research, November 2011, copyright Taylor & Francis. The published article is available online at http://www.tandfonline.com/10.1080/09298215.2011.596938
Detecting User Engagement in Everyday Conversations
This paper presents a novel application of speech emotion recognition:
estimation of the level of conversational engagement between users of a voice
communication system. We begin by using machine learning techniques, such as
the support vector machine (SVM), to classify users' emotions as expressed in
individual utterances. However, this alone fails to model the temporal and
interactive aspects of conversational engagement. We therefore propose the use
of a multilevel structure based on coupled hidden Markov models (HMM) to
estimate engagement levels in continuous natural speech. The first level is
comprised of SVM-based classifiers that recognize emotional states, which could
be (e.g.) discrete emotion types or arousal/valence levels. A high-level HMM
then uses these emotional states as input, estimating users' engagement in
conversation by decoding the internal states of the HMM. We report experimental
results obtained by applying our algorithms to the LDC Emotional Prosody and
CallFriend speech corpora.Comment: 4 pages (A4), 1 figure (EPS
- ā¦