1,268 research outputs found

    Optimization issues in machine learning of coreference resolution


    A Hybrid Environment for Syntax-Semantic Tagging

    The thesis describes the application of the relaxation labelling algorithm to NLP disambiguation. Language is modelled through context constraints inspired by Constraint Grammars. The constraints enable the use of a real value stating "compatibility". The technique is applied to POS tagging, Shallow Parsing and Word Sense Disambiguation. Experiments and results are reported. The proposed approach enables the use of multi-feature constraint models, the simultaneous resolution of several NL disambiguation tasks, and the collaboration of linguistic and statistical models.
    Comment: PhD Thesis, 120 pages
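
    As a rough illustration of the relaxation labelling idea described above, the sketch below iteratively re-weights each word's candidate labels according to pairwise compatibility values with its neighbours' current label distributions. The toy lexicon, the compatibility values and the function name are illustrative assumptions, not the thesis's actual constraint model.

```python
# A toy sketch of relaxation labelling for POS disambiguation, assuming simple
# pairwise compatibilities between adjacent words; data and names are illustrative.

def relaxation_labelling(candidates, compatibility, iterations=10):
    """candidates: one dict {label: initial weight} per word.
    compatibility: dict mapping (label, neighbour_label) to a real value
    expressing how compatible the two labels are (constraint strength)."""
    # Normalise initial weights into probability distributions.
    probs = [{lab: w / sum(cand.values()) for lab, w in cand.items()}
             for cand in candidates]

    for _ in range(iterations):
        new_probs = []
        for i, dist in enumerate(probs):
            support = {}
            for lab, p in dist.items():
                s = 0.0
                # Accumulate support from the neighbouring words' current labels.
                for j in (i - 1, i + 1):
                    if 0 <= j < len(probs):
                        for lab_j, p_j in probs[j].items():
                            s += compatibility.get((lab, lab_j), 0.0) * p_j
                support[lab] = max(p * (1.0 + s), 0.0)
            z = sum(support.values()) or 1.0
            new_probs.append({lab: v / z for lab, v in support.items()})
        probs = new_probs
    return [max(dist, key=dist.get) for dist in probs]

# "the saw": the DET context supports the NOUN reading of the ambiguous word.
candidates = [{"DET": 1.0}, {"NOUN": 0.5, "VERB": 0.5}]
compatibility = {("NOUN", "DET"): 0.8, ("VERB", "DET"): -0.5}
print(relaxation_labelling(candidates, compatibility))  # ['DET', 'NOUN']
```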

    Anaphora resolution for Arabic machine translation: a case study of nafs

    PhD Thesis. In the age of the internet, email, and social media there is an increasing need for processing online information, for example, to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest. This is reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text which is accurate to at least 90%, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria.
    Funder: Egyptian Government

    Evaluating parts-of-speech taggers for use in a text-to-scene conversion system

    This paper presents parts-of-speech tagging as a first step towards an autonomous text-to-scene conversion system. It categorizes some freely available taggers according to the techniques each uses to automatically identify word classes. In addition, the performance of each identified tagger is verified experimentally. The SUSANNE corpus is used for testing and reveals the complexity of working with different tagsets, resulting in substantially lower accuracies in our tests than in those reported by the developers of each tagger. The taggers are then grouped to form a voting system in an attempt to raise accuracy, but in no case do the combined results improve upon the individual accuracies. Additionally, a new metric, agreement, is tentatively proposed as an indication of confidence in the output of a group of taggers where such output cannot be validated.
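
    The voting scheme and the proposed agreement metric can be illustrated with a short sketch: the majority tag is taken per token, and agreement is computed as the average share of taggers that back the winning tag. The tagger outputs, tag names and function name below are hypothetical, not taken from the paper.

```python
# A minimal sketch of tagger voting plus an "agreement" score: the average
# fraction of taggers that assign the winning tag to each token.
from collections import Counter

def vote_with_agreement(tag_sequences):
    """tag_sequences: one list of tags per tagger, all aligned to the same tokens.
    Returns (voted_tags, agreement)."""
    n_taggers = len(tag_sequences)
    voted, per_token_agreement = [], []
    for token_tags in zip(*tag_sequences):
        counts = Counter(token_tags)
        tag, votes = counts.most_common(1)[0]   # majority (plurality) tag
        voted.append(tag)
        per_token_agreement.append(votes / n_taggers)
    agreement = sum(per_token_agreement) / len(per_token_agreement)
    return voted, agreement

# Three hypothetical taggers on the same four tokens.
outputs = [
    ["DET", "NOUN", "VERB", "NOUN"],
    ["DET", "NOUN", "NOUN", "NOUN"],
    ["DET", "ADJ",  "VERB", "NOUN"],
]
tags, agreement = vote_with_agreement(outputs)
print(tags, round(agreement, 2))  # ['DET', 'NOUN', 'VERB', 'NOUN'] 0.83
```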

    An Unsupervised Approach for Discovering Relevant Tutorial Fragments for APIs

    Developers increasingly rely on API tutorials to facilitate software development. However, it remains a challenging task for them to discover relevant API tutorial fragments explaining unfamiliar APIs. Existing supervised approaches suffer from the heavy burden of manually preparing corpus-specific annotated data and features. In this study, we propose a novel unsupervised approach, namely Fragment Recommender for APIs with PageRank and Topic model (FRAPT). FRAPT addresses the two main challenges in the task and effectively determines relevant tutorial fragments for APIs. In FRAPT, a Fragment Parser is proposed to identify APIs in tutorial fragments and replace ambiguous pronouns and variables with related ontologies and API names, so as to address the pronoun and variable resolution challenge. A Fragment Filter then employs a set of non-explanatory detection rules to remove non-explanatory fragments, thus addressing the non-explanatory fragment identification challenge. Finally, two correlation scores are computed and aggregated to determine relevant fragments for APIs, by applying both a topic model and the PageRank algorithm to the retained fragments. Extensive experiments over two publicly available tutorial corpora show that FRAPT improves the state-of-the-art approach by 8.77% and 12.32%, respectively, in terms of F-Measure. The effectiveness of key components of FRAPT is also validated.
    Comment: 11 pages, 8 figures, In Proc. of the 39th IEEE International Conference on Software Engineering (ICSE'17)
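
    The final ranking step, which aggregates a graph-centrality score over the retained fragments with a topical relevance score, can be sketched roughly as below. The Jaccard similarity used as a stand-in for the topic model, the weighting, and the example fragments are assumptions for illustration, not the paper's exact formulation.

```python
# A schematic sketch of combining PageRank over a fragment-similarity graph with a
# topical relevance score for an API, in the spirit of FRAPT's final ranking step.
import networkx as nx

def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_fragments(fragments, api, alpha=0.5, sim_threshold=0.1):
    # Build the fragment graph: edges between lexically similar fragments.
    g = nx.Graph()
    g.add_nodes_from(range(len(fragments)))
    for i in range(len(fragments)):
        for j in range(i + 1, len(fragments)):
            w = jaccard(fragments[i], fragments[j])
            if w > sim_threshold:
                g.add_edge(i, j, weight=w)
    centrality = nx.pagerank(g, weight="weight")
    # Stand-in "topic" score: lexical overlap between each fragment and the API name.
    relevance = {i: jaccard(frag, api) for i, frag in enumerate(fragments)}
    combined = {i: alpha * centrality[i] + (1 - alpha) * relevance[i]
                for i in range(len(fragments))}
    return sorted(combined, key=combined.get, reverse=True)

fragments = [
    "use the connection pool to reuse database connections",
    "the connection pool API creates and recycles connections",
    "install the tool with the package manager",
]
print(rank_fragments(fragments, "connection pool"))  # most relevant fragment first
```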

    On the Combination of Game-Theoretic Learning and Multi Model Adaptive Filters

    This paper casts coordination of a team of robots within the framework of game-theoretic learning algorithms. In particular, a novel variant of fictitious play is proposed in which multi-model adaptive filters are used to estimate the other players' strategies. The proposed algorithm can be used as a coordination mechanism between players when they must make decisions under uncertainty. Each player chooses an action after taking into account the actions of the other players and the uncertainty. Uncertainty can arise either from noisy observations or from various types of other players. In addition, in contrast to other game-theoretic and heuristic algorithms for distributed optimisation, it is not necessary to find the optimal parameters a priori: various parameter values can be used initially as inputs to different models, so the resulting decisions aggregate the contributions of all parameter values. Simulations are used to test the performance of the proposed methodology against other game-theoretic learning algorithms.
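
    A minimal sketch of the core idea, fictitious play in which the opponent's mixed strategy is estimated by a bank of filters with different forgetting factors that are re-weighted by predictive likelihood, is given below. The payoff matrix, forgetting factors and update rule are illustrative assumptions standing in for the paper's multi-model adaptive filters.

```python
# Fictitious play with a bank of exponential-forgetting estimators of the
# opponent's mixed strategy; the models are re-weighted by how well they
# predicted each observed action. All numbers here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
payoff = np.array([[1.0, 0.0],               # row player's payoff matrix
                   [0.0, 1.0]])
forgetting = [0.80, 0.95, 0.99]              # one model per forgetting factor
estimates = [np.array([0.5, 0.5]) for _ in forgetting]
model_weights = np.ones(len(forgetting)) / len(forgetting)

opponent_strategy = np.array([0.7, 0.3])     # unknown to the row player

for t in range(200):
    # Combine the models' estimates using their current weights.
    belief = sum(w * est for w, est in zip(model_weights, estimates))
    belief = belief / belief.sum()
    action = int(np.argmax(payoff @ belief))  # best response to the belief

    # Observe the opponent's action and update every model.
    opp_action = rng.choice(2, p=opponent_strategy)
    observed = np.eye(2)[opp_action]
    likelihoods = np.array([est[opp_action] for est in estimates])
    estimates = [lam * est + (1 - lam) * observed
                 for lam, est in zip(forgetting, estimates)]
    # Re-weight the models by how well they predicted the observation.
    model_weights = model_weights * (likelihoods + 1e-9)
    model_weights = model_weights / model_weights.sum()

print("estimated opponent strategy:", np.round(belief, 2))
```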

    A Model of the Network Architecture of the Brain that Supports Natural Language Processing

    For centuries, neuroscience has proposed models of the neurobiology of language processing that are static and localised to a few temporal and inferior frontal regions. Although existing models have offered some insight into the processes underlying lower-level language features, they have largely overlooked how language operates in the real world. Here, we investigated the network organisation of the brain and how it supports language processing in a naturalistic setting. We hypothesised that the brain is organised in a multiple core-periphery and dynamic modular architecture, with canonical language regions forming high-connectivity hubs. Moreover, we predicted that language processing would be distributed across much of the rest of the brain, allowing it to perform more complex tasks and to share information with other cognitive domains. To test these hypotheses, we collected the Naturalistic Neuroimaging Database of people watching full-length movies during functional magnetic resonance imaging. We applied network algorithms to capture the voxel-wise architecture of the brain in individual participants and inspected variations in activity distribution over different stimuli and over more complex language features. Our results confirmed the hypothesis that the brain is organised in a flexible multiple core-periphery architecture with large dynamic communities. Language processing was distributed across much of the rest of the brain, together forming multiple communities. Canonical language regions constituted hubs, explaining why they consistently appear in other neurobiology of language models. Moreover, language processing was supported by other regions, such as visual cortex and episodic memory regions, when processing more complex, context-specific language features. Overall, our flexible and distributed model of language comprehension and the brain points to additional brain regions and pathways that could be exploited for novel and more individualised therapies for patients suffering from speech impairments.
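
    The kind of network analysis described, building a graph over brain regions, detecting communities and flagging high-connectivity hubs, can be sketched on synthetic data as below. The random time series, correlation threshold and choice of community algorithm are illustrative assumptions, not the study's actual pipeline.

```python
# A toy sketch of correlation-graph analysis: threshold a correlation matrix,
# detect communities, and flag high-degree hubs. Data and thresholds are synthetic.
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

rng = np.random.default_rng(1)
n_regions = 60
signals = rng.standard_normal((n_regions, 500))   # stand-in for regional time series
corr = np.corrcoef(signals)

# Keep only the strongest correlations as edges.
threshold = np.quantile(corr[np.triu_indices(n_regions, k=1)], 0.9)
g = nx.Graph()
g.add_nodes_from(range(n_regions))
for i in range(n_regions):
    for j in range(i + 1, n_regions):
        if corr[i, j] >= threshold:
            g.add_edge(i, j, weight=float(corr[i, j]))

communities = greedy_modularity_communities(g, weight="weight")
degrees = dict(g.degree())
hub_cutoff = np.quantile(list(degrees.values()), 0.9)
hubs = [node for node, deg in degrees.items() if deg >= hub_cutoff]

print(f"{len(communities)} communities; hubs (top-decile degree): {hubs}")
```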