SKOPE: A connectionist/symbolic architecture of spoken Korean processing
Spoken language processing requires speech and natural language integration.
Moreover, spoken Korean calls for unique processing methodology due to its
linguistic characteristics. This paper presents SKOPE, a connectionist/symbolic
spoken Korean processing engine, which emphasizes that: 1) connectionist and
symbolic techniques must be selectively applied according to their relative
strengths and weaknesses, and 2) the linguistic characteristics of Korean must be
fully considered for phoneme recognition, speech and language integration, and
morphological/syntactic processing. The design and implementation of SKOPE
demonstrate how connectionist/symbolic hybrid architectures can be constructed
for spoken agglutinative language processing. SKOPE also introduces several new
ideas for speech and language processing. The phoneme recognition,
morphological analysis, and syntactic analysis experiments show that SKOPE is a
viable approach to spoken Korean processing. Comment: 8 pages, latex, uses aaai.sty & aaai.bst, bibfile: nlpsp.bib, to be
presented at IJCAI95 workshops on new approaches to learning for natural
language processing
Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean
A new tightly coupled speech and natural language integration model is
presented for a TDNN-based, continuous, possibly large-vocabulary speech
recognition system for Korean. Unlike popular n-best techniques developed for
integrating mainly HMM-based speech recognition and natural language processing
at the word level, which is obviously inadequate for morphologically
complex agglutinative languages, our model constructs a spoken language system
based on morpheme-level speech and language integration. With this
integration scheme, the spoken Korean processing engine (SKOPE) is designed and
implemented using a TDNN-based diphone recognition module integrated with
Viterbi-based lexical decoding and symbolic phonological/morphological
co-analysis. Our experimental results show that speaker-dependent continuous
eojeol (Korean word) recognition and integrated morphological analysis
can be achieved with a success rate of over 80.6% directly from speech inputs for
medium-sized vocabularies. Comment: LaTeX source with a4 style, 15 pages, to be published in the journal
Computer Processing of Oriental Languages
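SKOPE's lexical decoding is described only as Viterbi-based. As a rough illustration of that step, the Python sketch below runs textbook Viterbi decoding in log space over a small left-to-right model. The three states, the transition probabilities and the per-frame scores are all made up and merely stand in for TDNN diphone outputs, so this is not SKOPE's actual decoder or lexicon structure.

```python
import math

NEG_INF = float("-inf")  # log(0), used for disallowed transitions

def viterbi(log_init, log_trans, log_emit):
    """Return (best state path, its log score).

    log_init[s]     : log P(first state is s)
    log_trans[s][t] : log P(next state is t | current state is s)
    log_emit[f][s]  : log score of frame f under state s (e.g. a network output)
    """
    n = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n)]
    back = []  # back[f-1][t] = best predecessor of state t at frame f
    for f in range(1, len(log_emit)):
        new_score, ptr = [], []
        for t in range(n):
            # Best predecessor for state t, given scores up to frame f-1.
            best = max(range(n), key=lambda s: score[s] + log_trans[s][t])
            new_score.append(score[best] + log_trans[best][t] + log_emit[f][t])
            ptr.append(best)
        score = new_score
        back.append(ptr)
    # Trace the best final state back to the first frame.
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]

# Hypothetical 3-state left-to-right model (e.g. the diphones of one short unit):
# each state may repeat or advance by one; the last state only repeats.
half = math.log(0.5)
log_init = [0.0, NEG_INF, NEG_INF]
log_trans = [[half, half, NEG_INF],
             [NEG_INF, half, half],
             [NEG_INF, NEG_INF, 0.0]]
# Made-up per-frame posteriors standing in for TDNN outputs (5 frames x 3 states).
frames = [[0.8, 0.1, 0.1],
          [0.6, 0.3, 0.1],
          [0.1, 0.8, 0.1],
          [0.1, 0.2, 0.7],
          [0.1, 0.1, 0.8]]
log_emit = [[math.log(p) for p in row] for row in frames]

path, log_score = viterbi(log_init, log_trans, log_emit)
print(path, math.exp(log_score))
```

With these toy numbers the decoder returns the path [0, 0, 1, 2, 2] with probability 0.02688. Working in log space avoids underflow on long utterances, and -inf transitions encode the left-to-right topology.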
Enhancing dynamic symbolic execution via loop summarisation, segmented memory and pending constraints
Software has become ubiquitous and its impact is still increasing. The more software is
created, the more bugs get introduced into it. With software’s increasing omnipresence,
these bugs are increasingly likely to affect everyday life. There are many
efforts aimed at improving software correctness; among them is symbolic execution, a program
analysis technique that aims to systematically explore all program paths. In this thesis we
present three techniques for enhancing symbolic execution.
We first present a counterexample-guided inductive synthesis approach to summarising a
class of loops, called memoryless loops, in terms of standard library functions. Our approach can
summarise two thirds of the memoryless loops we gathered from a set of open-source programs.
These loop summaries can be used to: 1) enhance symbolic execution, 2) optimise native
code and 3) refactor code.
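A toy counterexample-guided inductive synthesis (CEGIS) loop can make the idea concrete. This Python sketch is not the thesis's implementation and rests on strong assumptions: the loop under analysis is a skip-leading-whitespace scan (whose iterations depend only on the current character), the candidate summaries are hypothetical Python emulations of C's strspn and strcspn over every character set drawn from a four-character alphabet, and "verification" is a bounded exhaustive check over short strings rather than anything symbolic.

```python
from itertools import product

ALPHABET = "ab \t"  # tiny alphabet so the bounded check below is exhaustive
MAX_LEN = 4         # assumed input-length bound for the verifier

def loop(s: str) -> int:
    """The loop being summarised: skip leading blanks/tabs, return the final index."""
    i = 0
    while i < len(s) and (s[i] == " " or s[i] == "\t"):
        i += 1
    return i

def strspn(s: str, accept: str) -> int:
    """Python emulation of C strspn: length of the prefix made only of chars in accept."""
    n = 0
    for ch in s:
        if ch not in accept:
            break
        n += 1
    return n

def strcspn(s: str, reject: str) -> int:
    """Python emulation of C strcspn: length of the prefix with no chars from reject."""
    n = 0
    for ch in s:
        if ch in reject:
            break
        n += 1
    return n

def candidates():
    """Candidate summaries: strspn, then strcspn, over every non-empty subset of ALPHABET."""
    subsets = ["".join(c for k, c in enumerate(ALPHABET) if (mask >> k) & 1)
               for mask in range(1, 1 << len(ALPHABET))]
    for cs in subsets:
        yield "strspn", cs, (lambda s, cs=cs: strspn(s, cs))
    for cs in subsets:
        yield "strcspn", cs, (lambda s, cs=cs: strcspn(s, cs))

def verify(f):
    """Bounded verifier: a counterexample string, or None if f matches loop up to MAX_LEN."""
    for n in range(MAX_LEN + 1):
        for chars in product(ALPHABET, repeat=n):
            s = "".join(chars)
            if f(s) != loop(s):
                return s
    return None

def cegis():
    examples = [""]  # inputs on which every accepted candidate must agree with loop
    while True:
        # Synthesis: first candidate consistent with all examples seen so far.
        choice = next(((name, cs, f) for name, cs, f in candidates()
                       if all(f(e) == loop(e) for e in examples)), None)
        if choice is None:
            return None  # nothing in the candidate space summarises the loop
        name, cs, f = choice
        cex = verify(f)
        if cex is None:
            return name, cs  # verified summary
        examples.append(cex)  # learn from the counterexample and try again

print(cegis())
```

The search settles on strspn with the accept set " \t". Over this tiny alphabet strcspn(s, "ab") would also pass the bounded check, which is why the enumeration order matters here and why a real tool needs a much stronger verifier than this.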
We then propose a technique that avoids expensive forking by using a segmented memory
model. In this model, we split memory into segments using pointer alias analysis, so that each
symbolic pointer refers to objects in a single segment. This results in a memory model where
forking due to symbolic pointer dereferences is reduced. We evaluate our segmented memory
model on benchmarks such as SQLite, m4 and make and observe significant decreases in
execution time and memory usage.
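The segmentation step can be sketched with a union-find over memory objects, assuming the alias analysis has already produced a may-point-to set for each pointer (the pointer and object names below are hypothetical). Objects that any single pointer may reference are merged into one segment. The sketch then compares how many execution paths one dereference creates: in a flat model the executor forks once per possible target object, while in the segmented model each pointer resolves to exactly one segment. This is an illustration of the idea, not the thesis's implementation.

```python
from collections import defaultdict

class UnionFind:
    """Disjoint-set forest used to merge objects that may alias."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def build_segments(points_to):
    """Partition objects so that every pointer's may-point-to set lies inside one segment."""
    uf = UnionFind()
    for targets in points_to.values():
        targets = sorted(targets)
        for obj in targets:
            uf.union(targets[0], obj)
    segments = defaultdict(set)
    for targets in points_to.values():
        for obj in targets:
            segments[uf.find(obj)].add(obj)
    return list(segments.values())

def paths_per_dereference(points_to, segments):
    """Execution paths created by dereferencing each symbolic pointer once.

    Flat model: one path per object the pointer may resolve to.
    Segmented model: one path per segment, which build_segments makes exactly 1.
    """
    seg_of = {obj: i for i, seg in enumerate(segments) for obj in seg}
    flat = {p: len(t) for p, t in points_to.items()}
    segmented = {p: len({seg_of[o] for o in t}) for p, t in points_to.items()}
    return flat, segmented

# Hypothetical alias-analysis output: pointer -> objects it may point to.
points_to = {"p": {"buf1", "buf2"}, "q": {"buf2", "buf3"}, "r": {"node"}}
segments = build_segments(points_to)
flat, segmented = paths_per_dereference(points_to, segments)
print(segments, flat, segmented)
```

Because p and q share buf2, all three buffers land in one segment, so dereferencing p or q no longer forks; the cost is that the solver now reasons about a larger segment with a symbolic offset.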
Finally, we present pending constraints, which can enhance the scalability of symbolic
execution by aggressively prioritising execution paths that are already known to be feasible
either via cached solver solutions or seeds. The execution of other paths is deferred until
no paths are known to be feasible without using the constraint solver. We evaluate our
technique on nine applications, including SQLite3, make and tcpdump, and show it can
achieve higher coverage for both seeded and non-seeded exploration.
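The scheduling policy described above can be sketched in Python as follows. Everything concrete here is an assumption for illustration: path constraints are Python predicates over an assignment to two integer variables, the "solver" is a brute-force search over a small domain, and the set of states is fixed up front instead of growing as execution forks. A state counts as known feasible when some cached assignment (a seed or an earlier solver model) already satisfies its constraints; the solver is consulted only when no such state remains.

```python
from itertools import product

DOMAIN = range(-4, 5)  # hypothetical small domain standing in for a real solver's search
VARS = ("x", "y")

def solve(constraints):
    """Stand-in for an SMT solver: brute-force search for a satisfying assignment."""
    for vals in product(DOMAIN, repeat=len(VARS)):
        a = dict(zip(VARS, vals))
        if all(c(a) for c in constraints):
            return a
    return None

class Scheduler:
    def __init__(self, seeds):
        self.cache = list(seeds)  # known solutions: seeds plus earlier solver models
        self.pending = []         # states whose feasibility is not yet known
        self.solver_calls = 0

    def add(self, state):
        self.pending.append(state)

    def known_feasible(self, state):
        # Cheap check: does any cached assignment already satisfy the path constraints?
        return any(all(c(a) for c in state["constraints"]) for a in self.cache)

    def next_state(self):
        # 1) Prefer a state proven feasible by a cached assignment (no solver call).
        for i, s in enumerate(self.pending):
            if self.known_feasible(s):
                return self.pending.pop(i)
        # 2) Otherwise fall back to the solver, discarding infeasible states.
        while self.pending:
            s = self.pending.pop(0)
            self.solver_calls += 1
            model = solve(s["constraints"])
            if model is not None:
                self.cache.append(model)  # reuse this model for later checks
                return s
        return None

sched = Scheduler(seeds=[{"x": 1, "y": 0}])
for name, cs in [
    ("A", [lambda a: a["x"] > 0]),
    ("B", [lambda a: a["x"] < 0, lambda a: a["y"] > 0]),
    ("C", [lambda a: a["x"] > 0, lambda a: a["x"] < 0]),  # infeasible
    ("D", [lambda a: a["x"] > 0, lambda a: a["y"] == 0]),
]:
    sched.add({"name": name, "constraints": cs})

order = []
while (s := sched.next_state()) is not None:
    order.append(s["name"])
print(order, sched.solver_calls)
```

States A and D are satisfied by the seed and run first without touching the solver; B and C are deferred, after which one solver call admits B and a second one discards C.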
Introduction of statistical information in a syntactic analyser for document image recognition
This paper presents an improvement to document layout analysis systems, offering a possible solution to Sayre's paradox (which states that an element must be recognized before it can be segmented, and must be segmented before it can be recognized). This improvement, based on stochastic parsing, allows statistical information obtained from recognizers to be integrated during syntactic layout analysis. We show how this fusion of numeric and symbolic information in a feedback loop can be applied to syntactic methods to improve the expressiveness of document descriptions. To limit combinatorial explosion during the exploration of solutions, we devised an operator that allows optional activation of the stochastic parsing mechanism. Our evaluation on 1250 handwritten business letters shows that this method improves global recognition scores.
Resources for Evaluation of Summarization Techniques
We report on two corpora to be used in the evaluation of component systems
for the tasks of (1) linear segmentation of text and (2) summary-directed
sentence extraction. We present characteristics of the corpora, methods used in
the collection of user judgments, and an overview of the application of the
corpora to the evaluation of the component systems. Finally, we discuss the problems
and issues in constructing the test set that apply broadly to the
construction of evaluation resources for language technologies. Comment: LaTeX source, 5 pages, US Letter, uses lrec98.st
ANN-based Innovative Segmentation Method for Handwritten text in Assamese
Artificial Neural Networks (ANNs) have been widely used for the recognition of optically scanned characters, partially emulating human thinking in the domain of Artificial Intelligence. Prior to recognition, however, the text must be segmented into sentences, words, and individual characters. Segmenting words into individual letters has been one of the major problems in handwriting recognition. Despite several successful works all over the world, the development of such tools for specific languages is still an ongoing process, especially in the Indian context. This work explores the application of ANNs as an aid to the segmentation of handwritten characters in Assamese, an important language in the north-eastern part of India. It examines the performance difference between an ANN-based dynamic segmentation algorithm and projection-based static segmentation. The algorithm first trains an ANN with individual handwritten characters recorded from different individuals. Handwritten sentences are then separated from the text using a static segmentation method. Individual characters are extracted from each segmented line by first over-segmenting the entire line; each of the resulting segments is then fed to the trained ANN. Wherever the ANN recognizes a segment, or a combination of several segments, as similar to a handwritten character, a segmentation boundary for that character is assumed to exist and segmentation is performed there. The segmented character is then compared with the best available match, and the segmentation boundary is confirmed.
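The dynamic segmentation loop can be sketched in Python under several assumptions: the trained ANN is replaced by a hypothetical lookup-table recognizer that returns a label and a confidence for a run of pieces, the over-segmented line is a list of hypothetical piece IDs, and the parameters max_merge and threshold, as well as the rule of keeping the most confident combination, are illustrative choices rather than the paper's.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A recognizer maps a run of pieces to (label or None, confidence).
Recognizer = Callable[[Sequence[str]], Tuple[Optional[str], float]]

def segment_line(pieces: List[str], recognize: Recognizer,
                 max_merge: int = 3, threshold: float = 0.7) -> List[Tuple[str, List[str]]]:
    """Greedy left-to-right merging of over-segmented pieces into characters."""
    result = []
    i = 0
    while i < len(pieces):
        best = None  # (confidence, number of pieces merged, label)
        # Try the piece alone, then combined with its next neighbours.
        for k in range(1, min(max_merge, len(pieces) - i) + 1):
            label, conf = recognize(pieces[i:i + k])
            if label is not None and conf >= threshold and (best is None or conf > best[0]):
                best = (conf, k, label)
        if best is None:
            # Nothing recognized: emit the single piece as unknown and move on.
            result.append(("?", pieces[i:i + 1]))
            i += 1
        else:
            conf, k, label = best
            result.append((label, pieces[i:i + k]))
            i += k  # the character boundary falls after the merged pieces
    return result

# Hypothetical recognizer: a lookup table standing in for the trained ANN.
TABLE: Dict[Tuple[str, ...], Tuple[str, float]] = {
    ("p1",): ("ka", 0.90),
    ("p2",): ("ba", 0.60),        # below threshold, so not accepted on its own
    ("p2", "p3"): ("kha", 0.85),
    ("p4",): ("ga", 0.80),
    ("p4", "p5"): ("ga", 0.95),   # merged pieces match better than p4 alone
}

def table_recognizer(run: Sequence[str]) -> Tuple[Optional[str], float]:
    return TABLE.get(tuple(run), (None, 0.0))

result = segment_line(["p1", "p2", "p3", "p4", "p5"], table_recognizer)
print(result)
```

Here p2 alone is rejected as too uncertain, p2 and p3 are merged into one character, and p4 and p5 are merged because the combination scores higher than p4 alone.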
Measures of Analysis of Time Series (MATS): A MATLAB Toolkit for Computation of Multiple Measures on Time Series Data Bases
In many applications, such as physiology and finance, large time series
databases must be analyzed, requiring the computation of linear, nonlinear and
other measures. Such measures have been developed and implemented in commercial
and freeware software rather selectively and independently. The Measures of
Analysis of Time Series (MATS) MATLAB toolkit is designed to handle
an arbitrarily large set of scalar time series and compute a large variety of
measures on them, allowing for the specification of varying measure parameters
as well. The variety of options, together with facilities for visualizing the
results, supports different settings of time series analysis, such as the
detection of dynamical changes in long data records, resampling (surrogate or
bootstrap) tests for independence and linearity with various test statistics,
and assessment of the discriminating power of different measures and of different
combinations of their parameters. The basic features of MATS are presented and the
implemented measures are briefly described. The usefulness of MATS is
illustrated on some empirical examples along with screenshots. Comment: 25 pages, 9 figures, two tables, the software can be downloaded at
http://eeganalysis.web.auth.gr/indexen.ht
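One of the resampling tests mentioned above, a surrogate test for independence, can be sketched in a few lines. MATS itself is a MATLAB toolkit; this Python sketch does not use or reproduce its interface. It assumes randomly shuffled surrogates (which keep the amplitude distribution but destroy temporal order), the absolute lag-1 autocorrelation as the test statistic, and a rank-based p-value; all of these are illustrative choices.

```python
import random

def autocorr(x, lag=1):
    """Sample autocorrelation at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var

def surrogate_test(x, n_surrogates=99, lag=1, seed=0):
    """Rank-based surrogate test of the null hypothesis of independence.

    Shuffled surrogates share the original amplitude distribution but have no
    temporal structure. Returns the p-value (count + 1) / (n_surrogates + 1),
    where count is the number of surrogates at least as extreme as the data.
    """
    rng = random.Random(seed)
    stat = abs(autocorr(x, lag))
    count = 0
    for _ in range(n_surrogates):
        s = list(x)
        rng.shuffle(s)
        if abs(autocorr(s, lag)) >= stat:
            count += 1
    return (count + 1) / (n_surrogates + 1)

# Example data: a strongly correlated AR(1) series and plain white noise.
gen = random.Random(1)
ar = [0.0]
for _ in range(499):
    ar.append(0.9 * ar[-1] + gen.gauss(0, 1))
noise = [gen.gauss(0, 1) for _ in range(500)]

p_ar = surrogate_test(ar)
p_noise = surrogate_test(noise)
print(p_ar, p_noise)
```

With 99 surrogates the smallest attainable p-value is 0.01, which the AR(1) series reaches because no shuffle comes close to its lag-1 autocorrelation; for white noise the p-value is typically much larger.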