3,470 research outputs found

Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation

Author: Andreas Stolcke
Dilek Hakkani-Tür
Elizabeth Shriberg
Grosz B.
Gökhan Tür
Hearst Marti A
Passonneau Rebecca J
Publication date: 01/01/2000

We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach on the Broadcast News corpus, using the DARPA-TDT evaluation metrics. Results show that the prosodic model alone is competitive with word-based segmentation methods. Furthermore, we achieve a significant reduction in error by combining the prosodic and word-based knowledge sources. Comment: 27 pages, 8 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Bilkent University Institutional Repository

Neural Networks for Complex Data

Author: Cottrell Marie
Olteanu Madalina
Rossi Fabrice
Rynkiewicz Joseph
Villa-Vialaneix Nathalie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/10/2012

Artificial neural networks are simple and efficient machine learning tools. Defined originally in the traditional setting of simple vector data, neural network models have evolved to address more and more difficulties of complex real world problems, ranging from time evolving data to sophisticated data structures such as graphs and functions. This paper summarizes advances on those themes from the last decade, with a focus on results obtained by members of the SAMM team of Université Paris

arXiv.org e-Print Archive

HAL-Paris1

Automatic Quality Estimation for ASR System Combination

Author: Falavigna Daniele
Jalalvand Shahab
Matassoni Marco
Negri Matteo
Turchi Marco
Publication venue: 'Elsevier BV'
Publication date: 22/06/2017

Recognizer Output Voting Error Reduction (ROVER) has been widely used for system combination in automatic speech recognition (ASR). In order to select the most appropriate words to insert at each position in the output transcriptions, some ROVER extensions rely on critical information such as confidence scores and other ASR decoder features. This information, which is not always available, highly depends on the decoding process and sometimes tends to overestimate the real quality of the recognized words. In this paper we propose a novel variant of ROVER that takes advantage of ASR quality estimation (QE) for ranking the transcriptions at "segment level" instead of: i) relying on confidence scores, or ii) feeding ROVER with randomly ordered hypotheses. We first introduce an effective set of features to compensate for the absence of ASR decoder information. Then, we apply QE techniques to perform accurate hypothesis ranking at segment level before starting the fusion process. The evaluation is carried out on two different tasks, in which we respectively combine hypotheses coming from independent ASR systems and multi-microphone recordings. In both tasks, it is assumed that the ASR decoder information is not available. The proposed approach significantly outperforms standard ROVER and is competitive with two strong oracles that exploit prior knowledge about the real quality of the hypotheses to be combined. Compared to standard ROVER, the absolute WER improvements in the two evaluation scenarios range from 0.5% to 7.3%

arXiv.org e-Print Archive

Archivio della ricerca - Fondazione Bruno Kessler

Advances in Character Recognition

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

This book presents advances in character recognition, and it consists of 12 chapters that cover a wide range of topics on different aspects of character recognition. Hopefully, this book will serve as a reference source for academic research, for professionals working in the character recognition field and for all interested in the subject

Directory of Open Access Books (DOAB)

Image Understanding by Hierarchical Symbolic Representation and Inexact Matching of Attributed Graphs

Author: Eshera Mohamed A.
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/1985

We study the symbolic representation of imagery information by a powerful global representation scheme in the form of Attributed Relational Graph (ARG), and propose new techniques for the extraction of such representation from spatial-domain images, and for performing the task of image understanding through the analysis of the extracted ARG representation. To achieve practical image understanding tasks, the system needs to comprehend the imagery information in a global form. Therefore, we propose a multi-layer hierarchical scheme for the extraction of global symbolic representation from spatial-domain images. The proposed scheme produces a symbolic mapping of the input data in terms of an output alphabet, whose elements are defined over global subimages. The proposed scheme uses a combination of model-driven and data-driven concepts. The model-driven principle is represented by a graph transducer, which is used to specify the alphabet at each layer in the scheme. A symbolic mapping is driven by the input data to map the input local alphabet into the output global alphabet. Through the iterative application of the symbolic transformational mapping at different levels of hierarchy, the system extracts a global representation from the image in the form of attributed relational graphs. Further processing and interpretation of the imagery information can then be performed on their ARG representation. We also propose an efficient approach for calculating a distance measure and finding the best inexact matching configuration between attributed relational graphs. For two ARGs, we define sequences of weighted error-transformations which, when performed on one ARG (or a subgraph of it), will produce the other ARG. A distance measure between two ARGs is defined as the weight of the sequence which possesses minimum total weight. Moreover, this minimum-total-weight sequence defines the best inexact matching configuration between the two ARGs.
The global minimization over the possible sequences is performed by a dynamic programming technique; the approach shows good results for ARGs of practical sizes. The proposed system possesses the capability to infer the alphabets of the ARG representation which it uses. In the inference phase, the hierarchical scheme is usually driven by the input data only, which normally consist of images of model objects. It extracts the global alphabet of the ARG representation of the models. The extracted model representation is then used in the operation phase of the system to perform the mapping in the multi-layer scheme. We present our experimental results for utilizing the proposed system for locating objects in complex scenes

Purdue E-Pubs

Automatic road network extraction in suburban areas from aerial images

Author: Grote Anne
Publication venue: Gottfried Wilhelm Leibniz Universität Hannover
Publication date: 01/01/2011

[no abstract]

Institutionelles Repositorium der Leibniz Universität Hannover

Digital Image Access & Retrieval

Author: Heidorn P. Bryan
Sandore Beth
Publication venue: Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
Publication date: 01/01/1997

The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval. Published or submitted for publication

Illinois Digital Environment for Access to Learning and Scholarship Repository

Implementation, integration, and optimization of a fuzzy foreground segmentation system

Author: Bowen Ryan M.
Publication venue: RIT Scholar Works
Publication date: 01/01/2013

Foreground segmentation is often an important preliminary step for various video processing systems. By improving the accuracy of the foreground segmentation process, the overall performance of a video processing system has the potential for improvement. This work introduces a Fuzzy Foreground Segmentation System (FFSS) that uses Mamdani-type Fuzzy Inference Systems (FIS) to control pixel-level accumulated statistics. The error of the FFSS is quantified by comparing its output with hand-segmented ground-truth images from a set of image sequences that specifically model canonical problems of foreground segmentation. Optimization of the FFSS parameters is achieved using a Real-Coded Genetic Algorithm (RCGA). Additionally, multiple central composite designed experiments are used to analyze the performance of the RCGA under selected schemes and their respective parameters. The RCGA schemes and parameters are chosen so as to reduce variation and execution time for a set of known multi-dimensional test functions. The selected multi-dimensional test functions represent assorted function landscapes. To demonstrate the accuracy of the FFSS and underscore the importance of the foreground segmentation process, the system is applied to real-time human detection from a single-camera security system. The Human Detection System (HDS) is composed of an IP camera networked to multiple heterogeneous computers for distributed parallel processing. The implementation of the HDS adheres to a System of Systems (SoS) architecture, which standardizes data/communication, reduces overall complexity, and maintains a high level of interoperability

RIT Scholar Works