3,332 research outputs found

    Certified Context-Free Parsing: A formalisation of Valiant's Algorithm in Agda

    Valiant (1975) developed an algorithm for recognition of context-free languages. As of today, it remains the algorithm with the best asymptotic complexity for this purpose. In this paper, we present an algebraic specification, implementation, and proof of correctness of a generalisation of Valiant's algorithm. The generalisation can be used for recognition, parsing, or generic calculation of the transitive closure of upper triangular matrices. The proof is certified by the Agda proof assistant. The certification is representative of state-of-the-art methods for specification and proofs in proof assistants based on type theory. As such, this paper can be read as a tutorial for the Agda system.
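
    To make the "transitive closure of upper triangular matrices" view concrete, here is a minimal Python sketch of recognition for a grammar in Chomsky normal form. It is an assumption-laden illustration, not the paper's Agda development: it fills the triangular table in the plain cubic (CYK-style) order rather than with Valiant's divide-and-conquer scheme, and the names `recognize`, `unary_rules`, and `binary_rules` are hypothetical.

```python
def recognize(word, unary_rules, binary_rules, start):
    """Return True if `start` derives `word` under a CNF grammar.

    unary_rules:  dict mapping a terminal to the set of nonterminals A
                  with a rule A -> terminal.
    binary_rules: dict mapping a pair (B, C) to the set of nonterminals A
                  with a rule A -> B C.
    """
    n = len(word)
    # Strictly upper triangular (n+1) x (n+1) matrix: table[i][j] holds the
    # nonterminals that derive the substring word[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]

    # First superdiagonal: one cell per input symbol.
    for i, symbol in enumerate(word):
        table[i][i + 1] = set(unary_rules.get(symbol, ()))

    # Closure step: each cell is the union over split points k of the
    # "product" of table[i][k] and table[k][j] under the binary rules.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b in table[i][k]:
                    for c in table[k][j]:
                        table[i][j] |= binary_rules.get((b, c), set())

    return n > 0 and start in table[0][n]


# Hypothetical example grammar for { a^n b^n : n >= 1 }:
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}
```

    Calling `recognize("aabb", UNARY, BINARY, "S")` walks the table diagonal by diagonal. Valiant's contribution, as the abstract describes it, is computing the same closure with better asymptotic complexity.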

    A Type-coherent, Expressive Representation as an Initial Step to Language Understanding

    A growing interest in tasks involving language understanding by the NLP community has led to the need for effective semantic parsing and inference. Modern NLP systems use semantic representations that do not quite fulfill the nuanced needs for language understanding: adequately modeling language semantics, enabling general inferences, and being accurately recoverable. This document describes underspecified logical forms (ULF) for Episodic Logic (EL), which is an initial form for a semantic representation that balances these needs. ULFs fully resolve the semantic type structure while leaving issues such as quantifier scope, word sense, and anaphora unresolved; they provide a starting point for further resolution into EL, and enable certain structural inferences without further resolution. This document also presents preliminary results of creating a hand-annotated corpus of ULFs for the purpose of training a precise ULF parser, showing a three-person pairwise interannotator agreement of 0.88 on confident annotations. We hypothesize that a divide-and-conquer approach to semantic parsing starting with derivation of ULFs will lead to semantic analyses that do justice to subtle aspects of linguistic meaning, and will enable construction of more accurate semantic parsers. Comment: Accepted for publication at the 13th International Conference on Computational Semantics (IWCS 2019).

    An Intelligent Text Extraction and Navigation System

    We present sppc, a high-performance system for intelligent text extraction and navigation from German free text documents. The main purpose of sppc is to extract as much linguistic structure as possible for performing domain-specific processing. sppc consists of a set of domain-independent shallow core components which are realized by means of cascaded weighted finite state machines and generic dynamic tries. All extracted information is represented uniformly in one data structure (called the text chart) in a highly compact and linked form in order to support indexing and navigation through the set of solutions. Germa

    A memory-efficient dynamic programming algorithm for optimal alignment of a sequence to an RNA secondary structure

    BACKGROUND: Covariance models (CMs) are probabilistic models of RNA secondary structure, analogous to profile hidden Markov models of linear sequence. The dynamic programming algorithm for aligning a CM to an RNA sequence of length N is O(N^3) in memory. This is only practical for small RNAs. RESULTS: I describe a divide and conquer variant of the alignment algorithm that is analogous to memory-efficient Myers/Miller dynamic programming algorithms for linear sequence alignment. The new algorithm has an O(N^2 log N) memory complexity, at the expense of a small constant factor in time. CONCLUSIONS: Optimal ribosomal RNA structural alignments that previously required up to 150 GB of memory now require less than 270 MB.

    Web Mediators for Accessible Browsing

    We present a highly accurate method for classifying web pages based on link percentage, which is the percentage of text characters that are parts of links normalized by the number of all text characters on a web page. K-means clustering is used to create unique thresholds to differentiate index pages and article pages on individual web sites. Index pages contain mostly links to articles and other indices, while article pages contain mostly text. We also present a novel link grouping algorithm using agglomerative hierarchical clustering that groups links in the same spatial neighborhood together while preserving link structure. Grouping allows users with severe disabilities to use a scan-based mechanism to tab through a web page and select items. In experiments, we saw up to a 40-fold reduction in the number of commands needed to click on a link with a scan-based interface, which shows that we can vastly improve the rate of communication for users with disabilities. We used web page classification and link grouping to alter web page display on an accessible web browser that we developed to make a usable browsing interface for users with disabilities. Our classification method consistently outperformed a baseline classifier even when using minimal data to generate article and index clusters, and achieved classification accuracy of 94.0% on web sites with well-formed or slightly malformed HTML, compared with 80.1% accuracy for the baseline classifier. National Science Foundation (IIS-0308213, IIS-039009, IIS-0093367, P200A01031, EIA-0202067)
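
    The link-percentage measure and the per-site threshold described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: text is counted with the standard-library `html.parser` (whitespace ignored, script and style content not filtered out), the threshold is taken as the midpoint of two 1-D k-means centroids, and the names `link_percentage`, `two_means_threshold`, and `classify` are hypothetical.

```python
from html.parser import HTMLParser


class LinkTextCounter(HTMLParser):
    """Counts non-whitespace text characters inside and outside <a> elements."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0
        self.link_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        n = len("".join(data.split()))  # drop all whitespace before counting
        self.total_chars += n
        if self.link_depth > 0:
            self.link_chars += n


def link_percentage(html):
    """Percentage of text characters that belong to links."""
    counter = LinkTextCounter()
    counter.feed(html)
    counter.close()
    if counter.total_chars == 0:
        return 0.0
    return 100.0 * counter.link_chars / counter.total_chars


def two_means_threshold(values, iterations=50):
    """1-D k-means with k=2; returns the midpoint between the two centroids."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        low_cluster = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high_cluster = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high_cluster:  # all values collapsed into one cluster
            break
        new_lo = sum(low_cluster) / len(low_cluster)
        new_hi = sum(high_cluster) / len(high_cluster)
        if (new_lo, new_hi) == (lo, hi):  # converged
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0


def classify(html, threshold):
    """Pages dominated by link text are treated as index pages."""
    return "index" if link_percentage(html) > threshold else "article"
```

    In use, one would compute `link_percentage` for a sample of pages from a single site, pass those values to `two_means_threshold`, and then call `classify` on new pages from the same site with the resulting threshold.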

    A Semantic Approach for Keyword Search on Relational Databases

    Today’s search engines make it easier for the user to browse and query the data available online. But when it comes to structured data, the queries have to be structured too in order to retrieve the data. This makes it difficult for novice users, with no knowledge of the underlying schema or query language, to access relational data. Therefore, to query structured data in the unstructured language of the web, there is a need to map user keyword queries to their equivalent SQL format. This research is intended to bridge the gap by introducing a framework named STRUCT. Unlike most of the existing work, which pays very little attention to the contextual information provided by the user, our approach takes these details into account to elucidate the implied structural information necessary for constructing the SQL clauses. One fundamental issue in keyword search on traditional databases is how to interpret users’ information needs behind the keywords they provide. A common approach in many prototype systems is to make such interpretation a designer’s choice (such as imposing AND or OR semantics, or a combination), leaving no choice to users. A much more meaningful approach would be to allow users themselves to specify the required semantics through contextual information. So can we build a system that keeps the simplicity of keyword search, yet can incorporate the contextual information provided in the user query? STRUCT answers this question by taking English language queries involving the intended keywords. Instead of resorting to full-fledged natural language processing, the unneeded words in the queries are discarded. Only the specific contextual information, along with the keywords containing database contents, is used to construct SQL queries. The contextual information is used to interpret the meaning of the queries, including the semantics involving AND, OR, and NOT.
    In this thesis we describe the architecture of STRUCT, the procedure of English query processing (parsing), the basic idea of the grouping algorithm, SQL query construction, and sample results of experiments.