    Text fragment extraction using incremental evolving fuzzy grammar fragments learner

    Additional structure within free texts can be utilized to assist in identification of matching items and can benefit many intelligent text pattern recognition applications. This paper presents an incremental evolving fuzzy grammar (IEFG) method that focuses on the learning of underlying text fragment patterns and provides an efficient fuzzy grammar representation that exploits both syntactic and semantic properties. This notion is quantified via (i) fuzzy membership, which measures the degree of membership of a text fragment in a semantic grammar class; (ii) fuzzy grammar similarity, which estimates the similarity between two grammars; and (iii) grammar combination, which combines and generalizes grammars at a minimal level of generalization. Terrorism incident data from the United States World Incidents Tracking System (WITS) are used in experiments and presented throughout the paper. A comparison with regular expression methods is made in the identification of text fragments representing times. The application of text fragment extraction using IEFG is demonstrated in event type, victim type, dead count and wounded count detection, with WITS XML-tagged data used as the gold standard. Results have shown the efficiency and practicality of IEFG.
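
The abstract above compares IEFG against regular expression methods for identifying text fragments that represent times. As a rough illustration of what such a baseline might look like, here is a minimal sketch. The pattern, the function name `extract_time_fragments`, and the sample sentence are all assumptions made for illustration; none of them are taken from the paper.

```python
import re
from typing import List

# Hypothetical baseline pattern (an assumption, not the paper's):
# matches clock times such as "9:30 am", "14:05", or "7pm".
TIME_PATTERN = re.compile(
    r"\b(?:[01]?\d|2[0-3]):[0-5]\d(?:\s?[ap]m\b)?"  # 9:30, 14:05, 9:30 am
    r"|\b(?:1[0-2]|0?[1-9])\s?[ap]m\b",             # 7pm, 10 am
    re.IGNORECASE,
)


def extract_time_fragments(text: str) -> List[str]:
    """Return every substring of `text` that the baseline pattern treats as a time."""
    return [match.group(0) for match in TIME_PATTERN.finditer(text)]


if __name__ == "__main__":
    sentence = "The attack occurred at 9:30 am; a second blast followed at 14:05."
    print(extract_time_fragments(sentence))
```

A fixed pattern like this only recognizes the surface forms it was written for, so phrasings such as "shortly after midnight" are missed. That limitation is the kind of gap a learned grammar representation is meant to address.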

    Using Program Synthesis for Program Analysis

    In this paper, we identify a fragment of second-order logic with restricted quantification that is expressive enough to capture numerous static analysis problems (e.g. safety proving, bug finding, termination and non-termination proving, superoptimisation). We call this fragment the synthesis fragment. Satisfiability of a formula in the synthesis fragment is decidable over finite domains; specifically the decision problem is NEXPTIME-complete. If a formula in this fragment is satisfiable, a solution consists of a satisfying assignment from the second order variables to functions over finite domains. To concretely find these solutions, we synthesise programs that compute the functions. Our program synthesis algorithm is complete for finite state programs, i.e. every function over finite domains is computed by some program that we can synthesise. We can therefore use our synthesiser as a decision procedure for the synthesis fragment of second-order logic, which in turn allows us to use it as a powerful backend for many program analysis tasks. To show the tractability of our approach, we evaluate the program synthesiser on several static analysis problems. Comment: 19 pages, to appear in LPAR 2015. arXiv admin note: text overlap with arXiv:1409.492

    Synthesizing Program Input Grammars

    We present an algorithm for synthesizing a context-free grammar encoding the language of valid program inputs from a set of input examples and blackbox access to the program. Our algorithm addresses shortcomings of existing grammar inference algorithms, which both severely overgeneralize and are prohibitively slow. Our implementation, GLADE, leverages the grammar synthesized by our algorithm to fuzz test programs with structured inputs. We show that GLADE substantially increases the incremental coverage on valid inputs compared to two baseline fuzzers.
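
The abstract above describes using a context-free grammar to fuzz test a program that expects structured inputs. The sketch below shows only the general idea of grammar-based input generation. It is not GLADE's algorithm and does not synthesize a grammar. The toy grammar `GRAMMAR`, the sampler `sample_input`, and the stand-in black-box check `program_accepts` are all hypothetical names introduced for illustration.

```python
import random
from typing import Dict, List

# Hypothetical toy grammar (balanced parentheses), written by hand for this
# sketch; a real grammar would come from an inference step.
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["(", "S", ")", "S"], []],
}


def sample_input(grammar: Dict[str, List[List[str]]], symbol: str,
                 rng: random.Random, depth: int = 0, max_depth: int = 8) -> str:
    """Randomly expand `symbol` into a string of terminals."""
    if symbol not in grammar:
        return symbol  # anything without productions is treated as a terminal
    productions = grammar[symbol]
    if depth >= max_depth:
        # Past the depth limit, take the shortest production so expansion stops.
        production = min(productions, key=len)
    else:
        production = rng.choice(productions)
    return "".join(
        sample_input(grammar, part, rng, depth + 1, max_depth) for part in production
    )


def program_accepts(text: str) -> bool:
    """Stand-in for the black-box program under test: accepts balanced parentheses."""
    open_count = 0
    for char in text:
        open_count += 1 if char == "(" else -1
        if open_count < 0:
            return False
    return open_count == 0


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        candidate = sample_input(GRAMMAR, "S", rng)
        print(repr(candidate), program_accepts(candidate))
```

Because every generated string is derived from the grammar, each one is a valid input for the stand-in program. Keeping inputs valid is what lets a grammar-based fuzzer reach code beyond the input parser.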

    Completability vs (In)completeness

    In everyday conversation, no notion of “complete sentence” is required for syntactic licensing. However, so-called “fragmentary”, “incomplete”, and abandoned utterances are problematic for standard formalisms. When contextualised, such data show that (a) non-sentential utterances are adequate to underpin agent coordination, while (b) all linguistic dependencies can be systematically distributed across participants and turns. Standard models have problems accounting for such data because their notions of ‘constituency’ and ‘syntactic domain’ are independent of performance considerations. Concomitantly, we argue that no notion of “full proposition” or encoded speech act is necessary for successful interaction: strings, contents, and joint actions emerge in conversation without any single participant having envisaged in advance the outcome of their own or their interlocutors’ actions. Nonetheless, morphosyntactic and semantic licensing mechanisms need to apply incrementally and subsententially. We argue that, while a representational level of abstract syntax, divorced from conceptual structure and physical action, impedes natural accounts of subsentential coordination phenomena, a view of grammar as a “skill” employing domain-general mechanisms, rather than fixed form-meaning mappings, is needed instead. We provide a sketch of a predictive and incremental architecture (Dynamic Syntax) within which underspecification and time-relative update of meanings and utterances constitute the sole concept of “syntax”

    Bibliometric Survey on Incremental Learning in Text Classification Algorithms for False Information Detection

    False information, or misinformation, on the web has severe effects on people, businesses and society as a whole. Therefore, detection of misinformation has become a topic of research among many researchers. Detecting misinformation in textual articles is directly connected to the text classification problem. With the massive and dynamic generation of unstructured textual documents on the web, incremental learning in text classification has gained popularity. This survey explores recent advancements in incremental learning in text classification, reviews the research publications in the area from the Scopus, Web of Science, Google Scholar, and IEEE databases, and performs quantitative analysis using methods such as publication statistics, collaboration degree, research network analysis, and citation analysis. The contribution of this study is to give researchers insight into the latest status of research on incremental learning in text classification through a literature survey, and to help them learn about the various applications and techniques used recently in the field.

    Prediction of the MSCI EURO index based on fuzzy grammar fragments extracted from European central bank statements

    We focus on predicting the movement of the MSCI EURO index based on European Central Bank (ECB) statements. For this purpose we learn and extract fuzzy grammars from the text of the ECB statements. Based on a set of selected General Inquirer (GI) categories, the extracted fuzzy grammars are grouped around individual content categories. The frequency at which these fuzzy grammars are encountered in the text constitutes the input to a Fuzzy Inference System (FIS). The FIS maps these frequencies to the levels of the MSCI EURO index. Ultimately, the goal is to predict whether the MSCI EURO index will exhibit upward or downward movement based on the content of ECB statements, as quantified through the use of fuzzy grammars and GI content categories.

    SCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis Using Artificial Neural Networks

    In this paper, we describe a so-called screening approach for learning robust processing of spontaneously spoken language. A screening approach is a flat analysis which uses shallow sequences of category representations for analyzing an utterance at various syntactic, semantic and dialog levels. Rather than using a deeply structured symbolic analysis, we use a flat connectionist analysis. This screening approach aims at supporting speech and language processing by using (1) data-driven learning and (2) robustness of connectionist networks. In order to test this approach, we have developed the SCREEN system which is based on this new robust, learned and flat analysis. In this paper, we focus on a detailed description of SCREEN's architecture, the flat syntactic and semantic analysis, the interaction with a speech recognizer, and a detailed evaluation analysis of the robustness under the influence of noisy or incomplete input. The main result of this paper is that flat representations allow more robust processing of spontaneous spoken language than deeply structured representations. In particular, we show how the fault-tolerance and learning capability of connectionist networks can support a flat analysis for providing more robust spoken-language processing within an overall hybrid symbolic/connectionist framework. Comment: 51 pages, Postscript. To be published in Journal of Artificial Intelligence Research 6(1), 199

    Towards Incremental Parsing of Natural Language using Recursive Neural Networks

    In this paper we develop novel algorithmic ideas for building a natural language parser grounded upon the hypothesis of incrementality. Although widely accepted and experimentally supported under a cognitive perspective as a model of the human parser, the incrementality assumption has never been exploited for building automatic parsers of unconstrained real texts. The essentials of the hypothesis are that words are processed in a left-to-right fashion, and the syntactic structure is kept totally connected at each step. Our proposal relies on a machine learning technique for predicting the correctness of partial syntactic structures that are built during the parsing process. A recursive neural network architecture is employed for computing predictions after a training phase on examples drawn from a corpus of parsed sentences, the Penn Treebank. Our results indicate the viability of the approach and lay out the premises for a novel generation of algorithms for natural language processing which more closely model human parsing. These algorithms may prove very useful in the development of efficient parsers.