Text fragment extraction using incremental evolving fuzzy grammar fragments learner
Additional structure within free texts can be utilized to assist in the identification of matching items and can benefit many intelligent text pattern recognition applications. This paper presents an incremental evolving fuzzy grammar (IEFG) method that focuses on learning underlying text fragment patterns and provides an efficient fuzzy grammar representation that exploits both syntactic and semantic properties. This notion is quantified via (i) fuzzy membership, which measures the degree of membership of a text fragment in a semantic grammar class; (ii) fuzzy grammar similarity, which estimates the similarity between two grammars; and (iii) grammar combination, which combines and generalizes grammars with minimal generalization. Terrorism incident data from the United States Worldwide Incidents Tracking System (WITS) are used in experiments and presented throughout the paper. A comparison with regular expression methods is made in the identification of text fragments representing times. The application of text fragment extraction using IEFG is demonstrated in event type, victim type, dead count, and wounded count detection, with WITS XML-tagged data used as the gold standard. Results show the efficiency and practicality of IEFG.
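The three operations the abstract enumerates can be sketched in miniature. The grammar representation below (sets of category sequences) and the membership, similarity, and combination formulas are illustrative assumptions for exposition, not the paper's actual definitions.

```python
# Toy sketch of the three IEFG operations: fuzzy membership,
# fuzzy grammar similarity, and grammar combination.

def membership(fragment, grammar):
    """Degree to which a tokenised fragment matches any pattern
    in the grammar: best positional overlap, in [0, 1]."""
    best = 0.0
    for pattern in grammar:
        if len(pattern) != len(fragment):
            continue
        hits = sum(1 for a, b in zip(fragment, pattern) if a == b)
        best = max(best, hits / len(pattern))
    return best

def similarity(g1, g2):
    """Fuzzy grammar similarity as Jaccard overlap of patterns."""
    if not g1 and not g2:
        return 1.0
    return len(g1 & g2) / len(g1 | g2)

def combine(g1, g2):
    """Minimal generalisation: keep every pattern of both grammars."""
    return g1 | g2

# A hypothetical grammar class for time expressions such as "10:30".
time_grammar = {("DIGIT", ":", "DIGIT"), ("DIGIT", "AM_PM")}
print(membership(("DIGIT", ":", "DIGIT"), time_grammar))  # 1.0
print(similarity(time_grammar, {("DIGIT", "AM_PM")}))     # 0.5
```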
Using Program Synthesis for Program Analysis
In this paper, we identify a fragment of second-order logic with restricted
quantification that is expressive enough to capture numerous static analysis
problems (e.g. safety proving, bug finding, termination and non-termination
proving, superoptimisation). We call this fragment the {\it synthesis
fragment}. Satisfiability of a formula in the synthesis fragment is decidable
over finite domains; specifically the decision problem is NEXPTIME-complete. If
a formula in this fragment is satisfiable, a solution consists of a satisfying
assignment from the second order variables to \emph{functions over finite
domains}. To concretely find these solutions, we synthesise \emph{programs}
that compute the functions. Our program synthesis algorithm is complete for
finite state programs, i.e. every \emph{function} over finite domains is
computed by some \emph{program} that we can synthesise. We can therefore use
our synthesiser as a decision procedure for the synthesis fragment of
second-order logic, which in turn allows us to use it as a powerful backend for
many program analysis tasks. To show the tractability of our approach, we
evaluate the program synthesiser on several static analysis problems.

Comment: 19 pages, to appear in LPAR 2015. arXiv admin note: text overlap with arXiv:1409.492
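A minimal illustration of why the synthesis fragment is decidable over finite domains (not the paper's actual algorithm): a second-order variable ranging over functions D^k -> D can be eliminated by brute-force enumeration of all |D|^(|D|^k) function tables, checking the first-order part on each.

```python
from itertools import product

def synthesise(domain, arity, spec):
    """Return a function table over `domain` satisfying `spec` on
    every input tuple, or None if the formula is unsatisfiable."""
    inputs = list(product(domain, repeat=arity))
    for outputs in product(domain, repeat=len(inputs)):
        table = dict(zip(inputs, outputs))
        f = lambda *args, t=table: t[args]
        if all(spec(f, *x) for x in inputs):
            return table
    return None

# Example spec: f must compute exclusive-or over the Boolean domain.
table = synthesise((0, 1), 2, lambda f, x, y: f(x, y) == (x ^ y))
print(table)  # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```

The enumerated table plays the role of the synthesised program: the search is complete for finite domains, which is the property the decision procedure rests on, though a practical synthesiser explores the space far more cleverly.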
Synthesizing Program Input Grammars
We present an algorithm for synthesizing a context-free grammar encoding the
language of valid program inputs from a set of input examples and black-box
access to the program. Our algorithm addresses shortcomings of existing grammar
inference algorithms, which both severely overgeneralize and are prohibitively
slow. Our implementation, GLADE, leverages the grammar synthesized by our
algorithm to fuzz test programs with structured inputs. We show that GLADE
substantially increases the incremental coverage on valid inputs compared to
two baseline fuzzers.
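The fuzzing step described above can be sketched as random derivation from a context-free grammar: once a grammar for valid inputs exists, structured test inputs are sampled by repeatedly expanding nonterminals. The tiny arithmetic grammar here is an invented example, not one synthesised by GLADE.

```python
import random

# Nonterminals map to lists of alternative right-hand sides.
GRAMMAR = {
    "<expr>": [["<num>"], ["(", "<expr>", "+", "<expr>", ")"]],
    "<num>": [["0"], ["1"], ["2"]],
}

def generate(symbol, rng, depth=0):
    """Expand a symbol into a terminal string, forcing the first
    (short) alternative past a depth bound to guarantee termination."""
    if symbol not in GRAMMAR:
        return symbol
    rules = GRAMMAR[symbol]
    rule = rules[0] if depth > 8 else rng.choice(rules)
    return "".join(generate(s, rng, depth + 1) for s in rule)

rng = random.Random(0)
samples = [generate("<expr>", rng) for _ in range(5)]
print(samples)  # five syntactically valid expressions
```

Because every sample is derived from the grammar, each one is a valid structured input by construction, which is what lets a grammar-based fuzzer reach coverage that random byte mutation cannot.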
Completability vs (In)completeness
In everyday conversation, no notion of “complete sentence” is required for syntactic licensing. However, so-called “fragmentary”, “incomplete”, and abandoned utterances are problematic for standard formalisms. When contextualised, such data show that (a) non-sentential utterances are adequate to underpin agent coordination, while (b) all linguistic dependencies can be systematically distributed across participants and turns. Standard models have problems accounting for such data because their notions of ‘constituency’ and ‘syntactic domain’ are independent of performance considerations. Concomitantly, we argue that no notion of “full proposition” or encoded speech act is necessary for successful interaction: strings, contents, and joint actions emerge in conversation without any single participant having envisaged in advance the outcome of their own or their interlocutors’ actions. Nonetheless, morphosyntactic and semantic licensing mechanisms need to apply incrementally and subsententially. We argue that, while a representational level of abstract syntax, divorced from conceptual structure and physical action, impedes natural accounts of subsentential coordination phenomena, a view of grammar as a “skill” employing domain-general mechanisms, rather than fixed form-meaning mappings, is needed instead. We provide a sketch of a predictive and incremental architecture (Dynamic Syntax) within which underspecification and time-relative update of meanings and utterances constitute the sole concept of “syntax”.
Bibliometric Survey on Incremental Learning in Text Classification Algorithms for False Information Detection
False information or misinformation on the web has severe effects on people, businesses, and society as a whole. The detection of misinformation has therefore become a research topic for many researchers. Detecting misinformation in textual articles is directly connected to the text classification problem. With the massive and dynamic generation of unstructured textual documents on the web, incremental learning in text classification has gained popularity. This survey explores recent advancements in incremental learning in text classification, reviews the research publications of the area from the Scopus, Web of Science, Google Scholar, and IEEE databases, and performs quantitative analysis using methods such as publication statistics, collaboration degree, research network analysis, and citation analysis. This study provides researchers with insights into the latest status of research on incremental learning in text classification through a literature survey, and helps them learn about the applications and techniques used recently in the field.
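One of the quantitative indicators the survey mentions can be sketched concretely. Assuming the common Subramanyam formulation, the degree of collaboration is C = Nm / (Nm + Ns), where Nm and Ns count multi- and single-authored publications; the publication list below is invented for illustration.

```python
def collaboration_degree(author_counts):
    """Fraction of publications with more than one author,
    i.e. Nm / (Nm + Ns) over the whole publication set."""
    multi = sum(1 for n in author_counts if n > 1)
    return multi / len(author_counts)

# Hypothetical data: number of authors on each surveyed publication.
papers = [1, 3, 2, 1, 4, 2]
print(round(collaboration_degree(papers), 2))  # 0.67
```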
Prediction of the MSCI EURO index based on fuzzy grammar fragments extracted from European central bank statements
We focus on predicting the movement of the MSCI EURO index based on European Central Bank (ECB) statements. For this purpose, we learn and extract fuzzy grammars from the text of the ECB statements. Based on a set of selected General Inquirer (GI) categories, the extracted fuzzy grammars are grouped around individual content categories. The frequencies at which these fuzzy grammars are encountered in the text constitute the input to a Fuzzy Inference System (FIS). The FIS maps these frequencies to the levels of the MSCI EURO index. Ultimately, the goal is to predict whether the MSCI EURO index will exhibit upward or downward movement based on the content of ECB statements, as quantified through the use of fuzzy grammars and GI content categories.
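The final mapping step can be sketched with a tiny fuzzy inference rule base. The membership functions, the single input (frequency of fragments from a "positive" GI category), and the two rules below are illustrative assumptions, not the authors' actual FIS.

```python
def mu_low(x):
    """Triangular 'low' membership on the frequency range [0, 1]."""
    return max(0.0, 1.0 - 2.0 * x)

def mu_high(x):
    """Triangular 'high' membership on the frequency range [0, 1]."""
    return max(0.0, 2.0 * x - 1.0)

def predict_movement(pos_freq):
    """Rule base: IF freq is high THEN up (+1);
    IF freq is low THEN down (-1).
    Defuzzify by the weighted average of rule outputs."""
    w_up, w_down = mu_high(pos_freq), mu_low(pos_freq)
    if w_up + w_down == 0.0:
        return 0.0  # no rule fires: predict no movement
    return (w_up * 1.0 + w_down * -1.0) / (w_up + w_down)

print(predict_movement(0.9))  # 1.0  (upward)
print(predict_movement(0.1))  # -1.0 (downward)
```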
SCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis Using Artificial Neural Networks
In this paper, we describe a so-called screening approach for learning robust
processing of spontaneously spoken language. A screening approach is a flat
analysis which uses shallow sequences of category representations for analyzing
an utterance at various syntactic, semantic and dialog levels. Rather than
using a deeply structured symbolic analysis, we use a flat connectionist
analysis. This screening approach aims at supporting speech and language
processing by using (1) data-driven learning and (2) robustness of
connectionist networks. In order to test this approach, we have developed the
SCREEN system which is based on this new robust, learned and flat analysis.
In this paper, we focus on a detailed description of SCREEN's architecture,
the flat syntactic and semantic analysis, the interaction with a speech
recognizer, and a detailed evaluation analysis of the robustness under the
influence of noisy or incomplete input. The main result of this paper is that
flat representations allow more robust processing of spontaneous spoken
language than deeply structured representations. In particular, we show how the
fault-tolerance and learning capability of connectionist networks can support a
flat analysis for providing more robust spoken-language processing within an
overall hybrid symbolic/connectionist framework.

Comment: 51 pages, Postscript. To be published in Journal of Artificial Intelligence Research 6(1), 199
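The contrast between flat and deeply structured analysis can be sketched in a few lines: a flat analysis annotates an utterance with a shallow sequence of category labels, one per word, rather than building a parse tree. The lexicon and category set here are invented, and SCREEN learns such mappings with connectionist networks rather than a lookup table.

```python
# Hypothetical word-to-category lexicon for a ticket-booking domain.
LEXICON = {
    "i": "PRONOUN", "would": "AUX", "like": "VERB",
    "a": "DET", "ticket": "NOUN", "uh": "PAUSE",
}

def flat_analysis(utterance):
    """One category label per word; unknown words get a robust
    fallback label instead of causing a parse failure."""
    return [LEXICON.get(w.lower(), "UNKNOWN") for w in utterance.split()]

print(flat_analysis("I would uh like a ticket"))
# ['PRONOUN', 'AUX', 'PAUSE', 'VERB', 'DET', 'NOUN']
```

Note how the hesitation "uh" and any out-of-vocabulary word simply receive a label of their own rather than breaking the analysis, which is the robustness property the abstract emphasises for spontaneous speech.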
Towards Incremental Parsing of Natural Language using Recursive Neural Networks
In this paper we develop novel algorithmic ideas for building a natural language
parser grounded upon the hypothesis of incrementality. Although widely accepted
and experimentally supported under a cognitive perspective as a model of the human
parser, the incrementality assumption has never been exploited for building automatic
parsers of unconstrained real texts. The essentials of the hypothesis are that words are
processed in a left-to-right fashion, and the syntactic structure is kept totally connected
at each step.
Our proposal relies on a machine learning technique for predicting the correctness of
partial syntactic structures that are built during the parsing process. A recursive neural
network architecture is employed for computing predictions after a training phase on
examples drawn from a corpus of parsed sentences, the Penn Treebank. Our results
indicate the viability of the approach and lay out the premises for a novel generation of
algorithms for natural language processing which more closely model human parsing.
These algorithms may prove very useful in the development of efficient parsers.
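The core idea above can be sketched as scoring a partial syntactic structure by recursively composing representations bottom-up, then letting the incremental parser keep the highest-scoring partial tree after each word. The scalar representations and fixed weight below are toy assumptions; the paper trains a recursive neural network on Penn Treebank parses.

```python
import math

def compose(left, right, w=0.6):
    """Combine two child representations into a parent representation
    with a fixed weight and a tanh nonlinearity."""
    return math.tanh(w * (left + right))

def score_tree(tree):
    """A tree is a float (leaf embedding) or a (left, right) pair.
    Returns the root representation, used here as a correctness score."""
    if isinstance(tree, tuple):
        return compose(score_tree(tree[0]), score_tree(tree[1]))
    return tree

# Two competing partial structures over the same three leaves,
# as an incremental parser would rank after reading the third word:
left_branching = ((0.5, 0.2), 0.9)
right_branching = (0.5, (0.2, 0.9))
best = max([left_branching, right_branching], key=score_tree)
print(best)
```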