
    Building and querying parallel treebanks

    This paper describes our work on building a trilingual parallel treebank. We have annotated constituent-structure trees from three text genres (a philosophical novel, economic reports, and a technical user manual). Our parallel treebank includes word and phrase alignments. The alignment information was manually checked using a graphical tool that lets the annotator view a pair of trees from parallel sentences. The tool comes with a powerful search facility that surpasses the expressive power of previous popular treebank query engines.
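    The core data model described above can be pictured with a minimal sketch in Python. The class and function names here are hypothetical illustrations, not the paper's actual tool: constituents carry token spans, and a phrase alignment pairs a source span with a target span, which is enough to query which target material corresponds to, say, every source NP.

```python
# Hypothetical sketch of aligned parallel trees (not the paper's actual tool).

class Node:
    def __init__(self, label, children=None, span=None):
        self.label = label              # constituent label, e.g. "NP"
        self.children = children or []  # child Nodes
        self.span = span                # (start, end) token indices

def walk(node):
    """Yield every node in the tree, pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)

def aligned_constituents(alignments, src_tree, label):
    """Return target spans aligned to source constituents with a given label."""
    spans = {n.span for n in walk(src_tree) if n.label == label}
    return [tgt for src, tgt in alignments if src in spans]

# Example: a source NP over tokens 0-2 is aligned to target tokens 0-3.
src = Node("S", [Node("NP", span=(0, 2)), Node("VP", span=(2, 5))], span=(0, 5))
alignments = [((0, 2), (0, 3)), ((2, 5), (3, 7))]
print(aligned_constituents(alignments, src, "NP"))  # [(0, 3)]
```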

    The TIGER Corpus Navigator

    Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 91-102. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/15891.

    Algorithmic And Computational Approaches For Improving The Efficiency Of Mobile Genomic Element Discovery, A Bioinformatics Framework

    This research showcases the application of computational approaches to discoveries in the life sciences. Our work not only focused on mobile genetic elements but also developed the computational methods that enabled these findings, combining biology and computer science in an essentially multidisciplinary effort. To that end, this research probed the role and implications of mobile genetic elements, emphasizing transposable elements. These dynamic components wield substantial influence over the structure, function, and evolutionary adaptation of genomic architecture. An integral component of our study is the computational tool Target/IGE Retriever (TIGER), employed to detect and map these mobile genetic elements. Given the pronounced impact of these elements on gene regulation and their involvement in various genetic diseases, their precise detection and mapping within a genome are crucial for understanding genetic dynamics and disease etiology. Addressing computational challenges, the study introduces three new algorithms to enhance TIGER's performance, tested using E. coli genomes. This testing aimed to determine the impact of database size reduction on result accuracy and performance. Findings indicate that while prophage yields are little affected by database size, non-phage islands show sensitivity to it, suggesting performance improvements with smaller databases. Furthermore, the research conducts a comparative analysis of TIGER and BLAST outputs, focusing on validating transposons identified in E. coli genomes. This involves cross-referencing with established databases and employing statistical methods for match categorization, strengthening confidence in transposon location identification.
Within this analytical process, particular attention is given to evaluating sequence alignment results and the quality of BLAST hits, focusing specifically on identifying direct repeats within insertion sequences. The study underscores TIGER's efficacy in transposon discovery and yields critical insights into its performance relative to BLAST. This research illuminates potential avenues for enhancing computational tools in bioinformatics, contributing to ongoing advancements in genomics and bioinformatics research. Our work deepens our understanding of the role and influence of mobile genetic elements on genomic architecture. Index Terms: computational biology, bioinformatics, mobile genetic elements, transposon, validation, database.
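    The direct-repeat check mentioned above can be illustrated with a small sketch. Mobile elements often duplicate a short target-site sequence on both flanks when they insert; a minimal test for that signal is shown below. This is an illustrative toy, with made-up function names and length bounds, not TIGER's actual algorithm.

```python
# Hedged sketch: check for a target-site duplication (direct repeat)
# flanking a candidate insertion region. Not TIGER's actual method.

def flanking_direct_repeat(genome, start, end, max_len=12, min_len=4):
    """Return the longest sequence immediately left of `start` that is
    duplicated immediately right of `end`, or None if there is none."""
    for k in range(max_len, min_len - 1, -1):
        left = genome[max(start - k, 0):start]
        right = genome[end:end + k]
        if len(left) == k and left == right:
            return left
    return None

# Toy genome: "ACGT" is duplicated on both sides of the inserted element (N's).
genome = "TTTACGT" + "NNNNNNNN" + "ACGTTTT"
print(flanking_direct_repeat(genome, 7, 15))  # ACGT
```

    A real pipeline would run this over BLAST-derived candidate boundaries and tolerate mismatches; exact string equality keeps the sketch short.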

    Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning

    Large Language Models (LLMs) have shown human-like reasoning abilities but still struggle with complex logical problems. This paper introduces a novel framework, Logic-LM, which integrates LLMs with symbolic solvers to improve logical problem-solving. Our method first utilizes LLMs to translate a natural language problem into a symbolic formulation. Afterward, a deterministic symbolic solver performs inference on the formulated problem. We also introduce a self-refinement module, which utilizes the symbolic solver's error messages to revise symbolic formalizations. We demonstrate Logic-LM's effectiveness on five logical reasoning datasets: ProofWriter, PrOntoQA, FOLIO, LogicalDeduction, and AR-LSAT. On average, Logic-LM achieves a significant performance boost of 39.2% over using LLMs alone with standard prompting and 18.4% over LLMs with chain-of-thought prompting. Our findings suggest that Logic-LM, by combining LLMs with symbolic logic, offers a promising avenue for faithful logical reasoning. Code and data are publicly available at https://github.com/teacherpeterpan/Logic-LLM. Comment: EMNLP 2023 (Findings, long paper).
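    The two-stage pipeline described in the abstract can be sketched in miniature. In the real system an LLM performs the translation step; here that step is hard-coded for a single example, and the "symbolic solver" is a tiny forward-chaining engine over ground Horn rules, so only the division of labor is faithful to Logic-LM.

```python
# Toy illustration of the Logic-LM division of labor (not the actual system):
# stage 1 translates text to symbols, stage 2 is a deterministic solver.

def forward_chain(facts, rules):
    """Deterministic symbolic inference: apply ground Horn rules to fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if set(body) <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

# Stage 1 (normally done by an LLM):
# "Socrates is a man. All men are mortal. Is Socrates mortal?"
facts = {("man", "socrates")}
rules = [([("man", "socrates")], ("mortal", "socrates"))]
query = ("mortal", "socrates")

# Stage 2: the symbolic solver answers the query faithfully.
print(query in forward_chain(facts, rules))  # True
```

    The point of the architecture is that stage 2 cannot hallucinate: once the formalization is right, the answer follows deterministically.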

    A Lightweight Framework for Universal Fragment Composition

    Domain-specific languages (DSLs) are useful tools for coping with complexity in software development. DSLs provide developers with appropriate constructs for specifying and solving the problems they are faced with. While the exact definition of DSLs can vary, they can roughly be divided into two categories: embedded and non-embedded. Embedded DSLs (E-DSLs) are integrated into general-purpose host languages (e.g. Java), while non-embedded DSLs (NE-DSLs) are standalone languages with their own tooling (e.g. compilers or interpreters). NE-DSLs can for example be found on the Semantic Web, where they are used for querying or describing shared domain models (ontologies). A common theme with DSLs is naturally their support of focused expressive power. However, in many cases they do not support non–domain-specific component-oriented constructs that can be useful for developers. Such constructs are standard in general-purpose languages (procedures, methods, packages, libraries etc.). While E-DSLs have access to such constructs via their host languages, NE-DSLs do not have this opportunity. Instead, to support such notions, each of these languages has to be extended and its tooling updated accordingly. Such modifications can be costly and must be done individually for each language. A solution method for one language cannot easily be reused for another. There currently exists no appropriate technology for tackling this problem in a general manner. Apart from identifying the need for a general approach to address this issue, we extend existing composition technology to provide a language-inclusive solution. We build upon fragment-based composition techniques and make them applicable to arbitrary (context-free) languages. We call this process the universalization of the composition techniques.
The techniques are called fragment-based since they view components (reusable software units with interfaces) as pieces of source code that conform to an underlying (context-free) language grammar. The universalization process is grammar-driven: given a base language grammar and a description of the compositional needs with respect to the composition techniques, an adapted grammar is created that corresponds to the specified needs. The resulting adapted grammar forms the foundation for defining and composing the desired fragments. We further build upon this grammar-driven universalization approach to allow developers to define the non–domain-specific component-oriented constructs that are needed for NE-DSLs. Developers are able to define both what those constructs should be, and how they are to be interpreted (via composition). Thus, developers can effectively define language extensions and their semantics. This solution is presented in a framework that can be reused for different languages, even if their notions of ‘components’ differ. To demonstrate the approach and show its applicability, we apply it to two Semantic Web related NE-DSLs that are in need of component-oriented constructs. We introduce modules to the rule-based Web query language Xcerpt and role models to the Web Ontology Language OWL.
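    The fragment idea can be made concrete with a deliberately tiny sketch: a fragment is a snippet of source text, and composition splices named fragments into slots of a base text. The slot syntax (`<<name>>`) and function names here are invented for illustration; the thesis's grammar-driven approach operates on grammars, not plain strings.

```python
# Hypothetical miniature of fragment composition: slots in a base text are
# replaced by named source fragments (the real approach is grammar-driven).
import re

def compose(base, fragments):
    """Replace each <<name>> slot in `base` with the fragment of that name."""
    return re.sub(r"<<(\w+)>>", lambda m: fragments[m.group(1)], base)

# An Xcerpt-like rule with a reusable query fragment spliced in.
base = "RULE answer := <<query>> ."
fragments = {"query": "SELECT ?x WHERE { ?x a :Module }"}
print(compose(base, fragments))
```

    The benefit claimed in the abstract is that this composition machinery, once universalized, does not have to be rebuilt inside each NE-DSL's own compiler or interpreter.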

    Annotation, exploitation and evaluation of parallel corpora

    Exchange between the translation studies and computational linguistics communities has traditionally not been very intense. Among other things, this is reflected by their different views on parallel corpora. While computational linguistics does not always strictly pay attention to the translation direction (e.g. when translation rules are extracted from (sub)corpora which actually consist only of translations), translation studies is concerned, among other things, with exactly comparing source and target texts (e.g. to draw conclusions on interference and standardization effects). However, there has recently been more exchange between the two fields – especially when it comes to the annotation of parallel corpora. This special issue brings together the different research perspectives. Its contributions show – from both perspectives – how the communities have come to interact in recent years.

    Proceedings

    Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 268 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt. Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/15891.

    Generics, laws and context


    The Semantics and Acquisition of Time in Language

    This dissertation is about the structure of temporal semantics and children’s acquisition of temporal language. It argues for the importance of investigating semantics both at the abstract level of linguistic structures and at the concrete level of the time-course of acquisition, as these two levels provide natural constraints for each other. With respect to semantics, it provides a computationally inspired analysis of tense, grammatical aspect and lexical aspect that uses finite state automata to dynamically calculate the progress of an event over a time interval. It is shown that the analysis can account for many well-known temporal phenomena, such as the different entailments of telic and atelic predicates in the imperfective aspect (the imperfective paradox), and the various unified and serial interpretations of sentences involving a cardinally quantified phrase, such as Three Ringlings visited Florida. With respect to children’s acquisition of temporal language, the dissertation investigates the Aspect First hypothesis which states that children initially use tense and grammatical aspect morphology to mark the lexical aspect property of telicity. Two forced-choice comprehension experiments were conducted with children aged 2.5 to 5 years old to test children’s understanding of tense and grammatical aspect morphology; in a control condition, open class cues were used to test children’s conceptual competence with tense and grammatical aspect information independently of their competence with the relevant morphology (e.g., in the middle of and in a few seconds were the open class cues for imperfective aspect and future tense, respectively). Results showed that even the youngest children understood the concepts underlying tense and grammatical aspect as measured by their performance with the open class cues but they did not demonstrate adult competence with the closed class morphology for grammatical aspect and did so only marginally for tense. 
Comprehension of tense morphology preceded that of grammatical aspect morphology; in particular, children showed an early facility with markers of the future tense.
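    The automaton-based analysis of event progress mentioned above can be illustrated with a toy model. The states, symbols, and helper below are invented for exposition, not the dissertation's actual formalism: a telic event culminates only when its automaton reaches a designated final state, which is why an in-progress (imperfective) telic event does not entail completion (the imperfective paradox).

```python
# Toy finite state automaton tracking an event's progress over an interval.
# States, symbols, and names are illustrative, not the dissertation's system.

def culminated(transitions, start, final, subevents):
    """Advance the automaton over the event's subinterval symbols and
    report whether the culmination (final) state was reached."""
    state = start
    for sym in subevents:
        state = transitions.get((state, sym), state)
    return state == final

# Telic "build a house": only the full sequence of steps reaches "done".
telic = {("s0", "step"): "s1", ("s1", "step"): "done"}
print(culminated(telic, "s0", "done", ["step"]))          # False: in progress
print(culminated(telic, "s0", "done", ["step", "step"]))  # True: completed
```

    An atelic predicate like "walk" would, in this picture, count as realized at every non-initial state, so truncating the interval preserves the entailment.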