25 research outputs found

    Embedded Large-Scale Handwritten Chinese Character Recognition

    As handwriting input becomes more prevalent, the large symbol inventory required to support Chinese handwriting recognition poses unique challenges. This paper describes how the Apple deep learning recognition system can accurately handle up to 30,000 Chinese characters while running in real time across a range of mobile devices. To achieve acceptable accuracy, we paid particular attention to data collection conditions, representativeness of writing styles, and training regimen. We found that, with proper care, even larger inventories are within reach. Our experiments show that accuracy degrades only slowly as the inventory increases, as long as we use training data of sufficient quality and in sufficient quantity. Comment: 5 pages, 7 figures.

    Enhanced multiclass SVM with thresholding fusion for speech-based emotion classification

    As an essential approach to understanding human interactions, emotion classification is a vital component of behavioral studies as well as of the design of context-aware systems. Recent studies have shown that speech contains rich information about emotion, and numerous speech-based emotion classification methods have been proposed. However, classification performance still falls short of what is needed for these algorithms to be used in real systems. We present an emotion classification system that combines several one-against-all support vector machines through a thresholding fusion mechanism, which increases classification accuracy at the expense of rejecting some samples as unclassified. Results show that the proposed system outperforms three state-of-the-art methods and that the thresholding fusion mechanism effectively improves emotion classification, which matters for applications that require very high accuracy but do not require that every sample be classified. We evaluate the system in several challenging scenarios, including speaker-independent tests, tests on noisy speech signals, and tests using non-professional acted recordings, to demonstrate its performance and the effectiveness of the thresholding fusion mechanism in realistic conditions. Peer Reviewed. Preprint.
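    The accept-or-reject fusion step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the emotion labels, decision scores, and threshold values below are made-up assumptions.

```python
def fuse_with_threshold(scores, threshold):
    """Combine one-against-all classifier outputs with a rejection threshold.

    `scores` maps each emotion label to the decision value of its
    one-against-all classifier. The sample is accepted only when the
    best score clears `threshold`; otherwise it is rejected as
    unclassified, trading coverage for accuracy.
    """
    best_label = max(scores, key=scores.get)
    if scores[best_label] >= threshold:
        return best_label
    return None  # rejected: no classifier is confident enough

# Hypothetical decision values from four one-against-all SVMs.
sample = {"angry": 0.82, "happy": 0.10, "sad": -0.35, "neutral": 0.41}
print(fuse_with_threshold(sample, threshold=0.5))  # -> angry
print(fuse_with_threshold(sample, threshold=0.9))  # -> None (rejected)
```

    Raising the threshold rejects more samples but makes the accepted classifications more reliable, which is the trade-off the abstract describes.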

    A discriminative method for protein remote homology detection and fold recognition combining Top-n-grams and latent semantic analysis

    Background: Protein remote homology detection and fold recognition are central problems in bioinformatics. Currently, discriminative methods based on the support vector machine (SVM) are the most effective and accurate methods for solving these problems. A key step in improving the performance of SVM-based methods is finding a suitable representation of protein sequences. Results: In this paper, a novel building block of proteins called Top-n-grams is presented, which contains the evolutionary information extracted from protein sequence frequency profiles. The frequency profiles are calculated from the multiple sequence alignments output by PSI-BLAST and converted into Top-n-grams. The protein sequences are transformed into fixed-dimension feature vectors by the occurrence count of each Top-n-gram. The training vectors are used by an SVM to train classifiers, which then classify the test protein sequences. We demonstrate that the prediction performance of remote homology detection and fold recognition can be improved by combining Top-n-grams with latent semantic analysis (LSA), an efficient feature extraction technique from natural language processing. When tested on superfamily and fold benchmarks, the method combining Top-n-grams and LSA gives significantly better results than related methods. Conclusion: The method based on Top-n-grams significantly outperforms methods based on many other building blocks, including N-grams, patterns, motifs and binary profiles. Top-n-grams are therefore a good building block for protein sequences and can be widely used in many computational biology tasks, such as sequence alignment, domain boundary prediction, the design of knowledge-based potentials and the prediction of protein binding sites.
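    The pipeline of n-gram counting followed by LSA can be sketched with toy data. This is only an illustration of the feature-extraction idea: real Top-n-grams are derived from PSI-BLAST frequency profiles rather than raw residues, and the sequences and dimensions below are invented.

```python
import numpy as np

def ngram_vector(seq, vocab, n=2):
    """Count occurrences of each vocabulary n-gram within `seq`."""
    counts = np.zeros(len(vocab))
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if gram in vocab:
            counts[vocab[gram]] += 1
    return counts

# Toy residue sequences standing in for profile-derived Top-n-grams.
seqs = ["ACDAC", "CDACD", "AAAA"]
grams = sorted({s[i:i + 2] for s in seqs for i in range(len(s) - 1)})
vocab = {g: i for i, g in enumerate(grams)}
X = np.array([ngram_vector(s, vocab) for s in seqs])  # fixed-dimension vectors

# LSA step: project the count matrix onto its top-k left singular vectors.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_lsa = U[:, :k] * S[:k]  # low-dimensional latent representation
print(X_lsa.shape)  # (3, 2)
```

    In the paper's setting the rows of `X_lsa` (or the full-dimension vectors of `X`) would then be fed to an SVM classifier.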

    Multi-Domain Adapted Machine Translation Using Unsupervised Text Clustering

    Domain adaptation in machine translation means taking a machine translation system that is restricted to a specific context and enabling it to translate text from a different domain. The paper presents a two-step domain adaptation strategy: first, unlabeled training material is exploited through an unsupervised algorithm, the Self-Organizing Map, to create auxiliary language models; these models are then included dynamically in a machine translation pipeline.
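    A toy sketch of the unsupervised clustering step, assuming a batch-trained Self-Organizing Map over hypothetical sentence embeddings; the paper's actual SOM configuration, text features, and the language-model step are not reproduced here.

```python
import numpy as np

def train_som(data, grid=2, epochs=10, seed=0):
    """Batch-train a 1-D self-organizing map on the rows of `data`.

    Each epoch assigns every sample to its best-matching unit (BMU),
    then moves each unit to the neighbourhood-weighted mean of the
    samples; the Gaussian neighbourhood over grid positions shrinks
    each epoch, so training ends up behaving like k-means.
    """
    rng = np.random.default_rng(seed)
    units = data[rng.choice(len(data), size=grid, replace=False)].astype(float)
    for epoch in range(epochs):
        sigma = max(1.0 - epoch / epochs, 0.05)
        bmus = np.argmin(np.linalg.norm(data[:, None] - units[None], axis=2), axis=1)
        for j in range(grid):
            w = np.exp(-((bmus - j) ** 2) / (2 * sigma ** 2))
            units[j] = (w[:, None] * data).sum(axis=0) / w.sum()
    return units

# Hypothetical sentence embeddings drawn from two well-separated domains.
rng = np.random.default_rng(1)
docs = np.vstack([rng.normal(0.0, 0.1, (10, 4)), rng.normal(3.0, 0.1, (10, 4))])
units = train_som(docs, grid=2)
clusters = np.argmin(np.linalg.norm(docs[:, None] - units[None], axis=2), axis=1)
```

    In the paper's strategy, each resulting cluster would supply the text for one auxiliary language model, which the translation pipeline then weights dynamically at decoding time.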