8,893 research outputs found
An Efficient Hidden Markov Model for Offline Handwritten Numeral Recognition
Traditionally, the performance of ocr algorithms and systems is based on the
recognition of isolated characters. When a system classifies an individual
character, its output is typically a character label or a reject marker that
corresponds to an unrecognized character. By comparing output labels with the
correct labels, the number of correct recognition, substitution errors
misrecognized characters, and rejects unrecognized characters are determined.
Nowadays, although recognition of printed isolated characters is performed with
high accuracy, recognition of handwritten characters still remains an open
problem in the research arena. The ability to identify machine printed
characters in an automated or a semi automated manner has obvious applications
in numerous fields. Since creating an algorithm with a one hundred percent
correct recognition rate is quite probably impossible in our world of noise and
different font styles, it is important to design character recognition
algorithms with these failures in mind so that when mistakes are inevitably
made, they will at least be understandable and predictable to the person
working with theComment: 6pages, 5 figure
Prosody-Based Automatic Segmentation of Speech into Sentences and Topics
A crucial step in processing speech audio data for information extraction,
topic detection, or browsing/playback is to segment the input into sentence and
topic units. Speech segmentation is challenging, since the cues typically
present for segmenting text (headers, paragraphs, punctuation) are absent in
spoken language. We investigate the use of prosody (information gleaned from
the timing and melody of speech) for these tasks. Using decision tree and
hidden Markov modeling techniques, we combine prosodic cues with word-based
approaches, and evaluate performance on two speech corpora, Broadcast News and
Switchboard. Results show that the prosodic model alone performs on par with,
or better than, word-based statistical language models -- for both true and
automatically recognized words in news speech. The prosodic model achieves
comparable performance with significantly less training data, and requires no
hand-labeling of prosodic events. Across tasks and corpora, we obtain a
significant improvement over word-only models using a probabilistic combination
of prosodic and lexical information. Inspection reveals that the prosodic
models capture language-independent boundary indicators described in the
literature. Finally, cue usage is task and corpus dependent. For example, pause
and pitch features are highly informative for segmenting news speech, whereas
pause, duration and word-based cues dominate for natural conversation.Comment: 30 pages, 9 figures. To appear in Speech Communication 32(1-2),
Special Issue on Accessing Information in Spoken Audio, September 200
Automatic Segmentation of Multiparty Dialogue
In this paper, we investigate the problem of automatically predicting segment boundaries in spoken multiparty dialogue. We extend prior work in two ways. We first apply approaches that have been proposed for predicting top-level topic shifts to the problem of identifying subtopic boundaries. We then explore the impact on performance of using ASR output as opposed to human transcription. Examination of the effect of features shows that predicting top-level and predicting subtopic boundaries are two distinct tasks: (1) for predicting subtopic boundaries, the lexical cohesion-based approach alone can achieve competitive results, (2) for predicting top-level boundaries, the machine learning approach that combines lexical-cohesion and conversational features performs best, and (3) conversational cues, such as cue phrases and overlapping speech, are better indicators for the top-level prediction task. We also find that the transcription errors inevitable in ASR output have a negative impact on models that combine lexical-cohesion and conversational features, but do not change the general preference of approach for the two tasks
Learning Behavioural Context
The original publication is available at www.springerlink.co
Phonetic Searching
An improved method and apparatus is disclosed which uses probabilistic techniques to map an input search string with a prestored audio file, and recognize certain portions of a search string phonetically. An improved interface is disclosed which permits users to input search strings, linguistics, phonetics, or a combination of both, and also allows logic functions to be specified by indicating how far separated specific phonemes are in time.Georgia Tech Research Corporatio
Increase Apparent Public Speaking Fluency By Speech Augmentation
Fluent and confident speech is desirable to every speaker. But professional
speech delivering requires a great deal of experience and practice. In this
paper, we propose a speech stream manipulation system which can help
non-professional speakers to produce fluent, professional-like speech content,
in turn contributing towards better listener engagement and comprehension. We
propose to achieve this task by manipulating the disfluencies in human speech,
like the sounds 'uh' and 'um', the filler words and awkward long silences.
Given any unrehearsed speech we segment and silence the filled pauses and
doctor the duration of imposed silence as well as other long pauses
('disfluent') by a predictive model learned using professional speech dataset.
Finally, we output a audio stream in which speaker sounds more fluent,
confident and practiced compared to the original speech he/she recorded.
According to our quantitative evaluation, we significantly increase the fluency
of speech by reducing rate of pauses and fillers
Learning About Meetings
Most people participate in meetings almost every day, multiple times a day.
The study of meetings is important, but also challenging, as it requires an
understanding of social signals and complex interpersonal dynamics. Our aim
this work is to use a data-driven approach to the science of meetings. We
provide tentative evidence that: i) it is possible to automatically detect when
during the meeting a key decision is taking place, from analyzing only the
local dialogue acts, ii) there are common patterns in the way social dialogue
acts are interspersed throughout a meeting, iii) at the time key decisions are
made, the amount of time left in the meeting can be predicted from the amount
of time that has passed, iv) it is often possible to predict whether a proposal
during a meeting will be accepted or rejected based entirely on the language
(the set of persuasive words) used by the speaker
Radar HRRP Modeling using Dynamic System for Radar Target Recognition
High resolution range profile (HRRP) is being known as one of the most powerful tools for radar target recognition. The main problem with range profile for radar target recognition is its sensitivity to aspect angle. To overcome this problem, consecutive samples of HRRP were assumed to be identically independently distributed (IID) in small frames of aspect angles in most of the related works. Here, considering the physical circumstances of maneuver of an aerial target, we have proposed dynamic system which models the short dependency between consecutive samples of HRRP in segments of the whole HRRP sequence. Dynamic system (DS) is used to model the sequence of PCA (principal component analysis) coefficients extracted from the sequence of HRRPs. Considering this we have proposed a model called PCA+DS. We have also proposed a segmentation algorithm which segments the HRRP sequence reliably. Akaike information criterion (AIC) used to evaluate the quality of data modeling showed that our PCA+DS model outperforms factor analysis (FA) model. In addition, target recognition results using simulated data showed that our method based on PCA+DS achieves better recognition rates compared to the method based on FA
- âŠ