    Pattern Mining for Named Entity Recognition

    Many evaluation campaigns have shown that knowledge-based and data-driven approaches remain equally competitive for Named Entity Recognition (NER). Our research team has developed CasEN, a symbolic system based on finite state transducers, which achieved promising results during the Ester2 French-speaking evaluation campaign. Despite these encouraging results, manually extending the coverage of such a hand-crafted system is a difficult task. In this paper, we present a novel approach based on pattern mining, used both for NER and to supplement our system's knowledge base. The system, mXS, exhaustively searches for hierarchical sequential patterns that aim at detecting Named Entity boundaries. We assess their efficiency by using such patterns in a standalone mode and in combination with our existing system.

    A HMM POS Tagger for Micro-blogging Type Texts

    The high volume of communication via micro-blogging type messages has created an increased demand for text processing tools customised for the unstructured text genre. The performance of available text processing tools developed on structured texts has been shown to deteriorate significantly when they are used on unstructured, micro-blogging type texts. In this paper, we present the results of testing an HMM-based POS (Part-Of-Speech) tagging model customised for unstructured texts. We also evaluated the tagger against published CRF-based state-of-the-art POS tagging models customised for Tweet messages, using three publicly available Tweet corpora. Finally, we ran cross-validation tests with both taggers by training them on one Tweet corpus and testing them on another.
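
As a rough illustration of how an HMM tagger picks tags, the sketch below runs log-space Viterbi decoding over a toy model. Every probability, the three-tag tagset, and the `unk` fallback for unseen words are invented for illustration; they are not taken from the paper, whose model would be estimated from annotated tweets.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Most likely tag sequence under a first-order HMM (log-space Viterbi)."""
    if not words:
        return []

    def emit(tag, word):
        # Unseen (tag, word) pairs get a small constant probability (assumption).
        return math.log(emit_p[tag].get(word, unk))

    # scores[i][tag]: best log-probability of any tag path ending in `tag` at word i.
    scores = [{t: math.log(start_p[t]) + emit(t, words[0]) for t in tags}]
    back = []
    for word in words[1:]:
        row, ptr = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: scores[-1][p] + math.log(trans_p[p][t]))
            row[t] = scores[-1][best_prev] + math.log(trans_p[best_prev][t]) + emit(t, word)
            ptr[t] = best_prev
        scores.append(row)
        back.append(ptr)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: scores[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# Toy model (all numbers hypothetical).
TAGS = ["PRON", "VERB", "NOUN"]
START = {"PRON": 0.6, "VERB": 0.1, "NOUN": 0.3}
TRANS = {
    "PRON": {"PRON": 0.1, "VERB": 0.8, "NOUN": 0.1},
    "VERB": {"PRON": 0.3, "VERB": 0.1, "NOUN": 0.6},
    "NOUN": {"PRON": 0.1, "VERB": 0.5, "NOUN": 0.4},
}
EMIT = {
    "PRON": {"i": 0.9, "u": 0.1},
    "VERB": {"luv": 0.5, "tweet": 0.3, "tweets": 0.2},
    "NOUN": {"tweet": 0.3, "tweets": 0.6, "luv": 0.1},
}

tagged = viterbi(["i", "luv", "tweets"], TAGS, START, TRANS, EMIT)
```

With this toy model, `tagged` is `["PRON", "VERB", "NOUN"]`; the non-standard spelling "luv" is handled only because it appears in the invented emission table.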

    Growing a list

    It is easy to find expert knowledge on the Internet on almost any topic, but obtaining a complete overview of a given topic is not always easy: Information can be scattered across many sources and must be aggregated to be useful. We introduce a method for intelligently growing a list of relevant items, starting from a small seed of examples. Our algorithm takes advantage of the wisdom of the crowd, in the sense that there are many experts who post lists of things on the Internet. We use a collection of simple machine learning components to find these experts and aggregate their lists to produce a single complete and meaningful list. We use experiments with gold standards and open-ended experiments without gold standards to show that our method significantly outperforms the state of the art. Our method uses the clustering algorithm Bayesian Sets even when its underlying independence assumption is violated, and we provide a theoretical generalization bound to motivate its use.
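
The Bayesian Sets scoring step named in the abstract can be sketched for binary features as follows. This is a minimal sketch, not the authors' pipeline: the feature meaning ("item appears on list page j"), the item names, and the fixed Beta prior `alpha = beta = 2` are assumptions made for illustration. The score is the log ratio log p(x | seed) / p(x) under an independent Beta-Bernoulli model, which reduces to a constant plus a weighted sum of the item's features.

```python
import math

def bayesian_sets_scores(items, seed, alpha=2.0, beta=2.0):
    """Log Bayesian Sets score for every item, given binary feature vectors.

    items: dict mapping item name -> list of 0/1 features
    seed:  list of item names forming the seed set
    """
    n = len(seed)
    dims = len(next(iter(items.values())))
    const, weights = 0.0, []
    for j in range(dims):
        ones = sum(items[name][j] for name in seed)
        a_post = alpha + ones          # posterior Beta parameters for feature j
        b_post = beta + n - ones
        const += (math.log(alpha + beta) - math.log(alpha + beta + n)
                  + math.log(b_post) - math.log(beta))
        weights.append(math.log(a_post) - math.log(alpha)
                       - math.log(b_post) + math.log(beta))
    return {name: const + sum(w * x for w, x in zip(weights, feats))
            for name, feats in items.items()}

# Hypothetical data: feature j = 1 if the item appears on list page j.
ITEMS = {
    "apple":  [1, 1, 0],
    "banana": [1, 1, 0],
    "cherry": [1, 1, 0],
    "grape":  [1, 0, 0],
    "hammer": [0, 0, 1],
}
SEED = ["apple", "banana"]

scores = bayesian_sets_scores(ITEMS, SEED)
ranking = sorted((name for name in ITEMS if name not in SEED),
                 key=lambda name: scores[name], reverse=True)
```

Here `ranking` orders the candidates as cherry, grape, hammer: items sharing more list pages with the seed score higher.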

    Winding of planar gaussian processes

    We consider a smooth, rotationally invariant, centered Gaussian process in the plane, with arbitrary correlation matrix $C_{tt'}$. We study the winding angle $\phi_t$ around its center. We obtain a closed formula for the variance of the winding angle as a function of the matrix $C_{tt'}$. For most stationary processes $C_{tt'} = C(t-t')$ the winding angle exhibits diffusion at large time with diffusion coefficient $D = \int_0^\infty ds \, C'(s)^2/(C(0)^2 - C(s)^2)$. Correlations of $\exp(i n \phi_t)$ with integer $n$, the distribution of the angular velocity $\dot\phi_t$, and the variance of the algebraic area are also obtained. For smooth processes with stationary increments (random walks) the variance of the winding angle grows as $\frac{1}{2} (\ln t)^2$, with proper generalizations to the various classes of fractional Brownian motion. These results are tested numerically. Non-integer $n$ is studied numerically. Comment: 12 pages, 6 figures
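
For readers who want to measure a winding angle on a sampled trajectory, a minimal sketch is given below. It is not the authors' numerical procedure; it only shows the standard way to accumulate the signed angle between successive position vectors, assuming the path never passes through the origin and each step turns by less than $\pi$.

```python
import math

def winding_angle(points):
    """Total continuous winding angle (radians) of a planar path around the origin.

    points: sequence of (x, y) samples along the path.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cross = x0 * y1 - y0 * x1   # proportional to sin of the turn
        dot = x0 * x1 + y0 * y1     # proportional to cos of the turn
        total += math.atan2(cross, dot)  # signed turn in [-pi, pi]
    return total

# Check on a deterministic path: two counter-clockwise turns of the unit circle.
N_STEPS = 400
circle_twice = [
    (math.cos(4 * math.pi * k / N_STEPS), math.sin(4 * math.pi * k / N_STEPS))
    for k in range(N_STEPS + 1)
]
phi = winding_angle(circle_twice)
```

For this test path `phi` equals $4\pi$ up to floating-point error; reversing the path flips the sign.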

    Creativity and Autonomy in Swarm Intelligence Systems

    This work introduces two swarm intelligence algorithms, one mimicking the foraging behaviour of the ant species Leptothorax acervorum (a 'Stochastic Diffusion Search', SDS) and the other mimicking the behaviour of birds flocking (a 'Particle Swarm Optimiser', PSO), and outlines a novel integration strategy exploiting the local search properties of the PSO with global SDS behaviour. The resulting hybrid algorithm is used to sketch novel drawings of an input image, exploiting an artistic tension between the local behaviour of the 'birds flocking', as they seek to follow the input sketch, and the global behaviour of the 'ants foraging', as they seek to encourage the flock to explore novel regions of the canvas. The paper concludes by exploring the putative 'creativity' of this hybrid swarm system in the philosophical light of the 'rhizome' and Deleuze's well-known 'Orchid and Wasp' metaphor.

    Next big challenges in core AI technology

    Algorithms and the Foundations of Software technology

    Computational fact checking from knowledge networks

    Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem, this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.
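
The shortest-path idea can be sketched on a toy graph. The abstract does not define its proximity metric, so the cost used here is an assumption: each intermediate node on a path costs log(degree), which penalises routes through generic, highly connected concepts, and support is reported as 1 / (1 + cost). The graph and node names are hypothetical.

```python
import heapq
import math

def path_support(graph, source, target):
    """Return (support, path) for the cheapest path from source to target.

    graph: dict mapping node -> list of neighbouring nodes (undirected).
    Cost of a path = sum of log(degree) over its intermediate nodes (assumed metric).
    """
    if source not in graph or target not in graph:
        return 0.0, []
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt in graph[node]:
            # Stepping onto the target is free; any other node costs log(degree).
            step = 0.0 if nxt == target else math.log(len(graph[nxt]))
            new_cost = cost + step
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if target not in best:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return 1.0 / (1.0 + best[target]), path

# Toy knowledge graph: "Hub" is a generic concept linked to many nodes.
GRAPH = {
    "A": ["B", "Hub"],
    "B": ["A", "C"],
    "C": ["B", "Hub"],
    "Hub": ["A", "C", "D", "E"],
    "D": ["Hub"],
    "E": ["Hub"],
}

support, path = path_support(GRAPH, "A", "C")
```

In this example the path through the specific node B (degree 2) is preferred over the path through Hub (degree 4), and directly linked nodes receive the maximum support of 1.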

    Pharmacoeconomic analysis of adjuvant oral capecitabine vs intravenous 5-FU/LV in Dukes' C colon cancer: the X-ACT trial

    Oral capecitabine (Xeloda®) is an effective drug with favourable safety in adjuvant and metastatic colorectal cancer. Oxaliplatin-based therapy is becoming standard for Dukes' C colon cancer in patients suitable for combination therapy, but is not yet approved by the UK National Institute for Health and Clinical Excellence (NICE) in the adjuvant setting. Adjuvant capecitabine is at least as effective as 5-fluorouracil/leucovorin (5-FU/LV), with significant superiority in relapse-free survival and a trend towards improved disease-free and overall survival. We assessed the cost-effectiveness of adjuvant capecitabine from payer (UK National Health Service (NHS)) and societal perspectives. We used clinical trial data and published sources to estimate incremental direct and societal costs and gains in quality-adjusted life months (QALMs). Acquisition costs were higher for capecitabine than 5-FU/LV, but higher 5-FU/LV administration costs resulted in 57% lower chemotherapy costs for capecitabine. Adverse events associated with capecitabine required fewer medications and hospitalisations than those associated with 5-FU/LV (cost savings £3653). Societal costs, including patient travel/time costs, were reduced by >75% with capecitabine vs 5-FU/LV (cost savings £1318), with lifetime gain in QALMs of 9 months. Medical resource utilisation is significantly decreased with capecitabine vs 5-FU/LV, with cost savings to the NHS and society. Capecitabine is also projected to increase life expectancy vs 5-FU/LV. Cost savings and better outcomes make capecitabine a preferred adjuvant therapy for Dukes' C colon cancer. This pharmacoeconomic analysis strongly supports replacing 5-FU/LV with capecitabine in the adjuvant treatment of colon cancer in the UK.

    Understanding Democracy and Development Traps Using a Data-Driven Approach

    Methods from machine learning and data science are becoming increasingly important in the social sciences, providing powerful new ways of identifying statistical relationships in large data sets. However, these relationships do not necessarily offer an understanding of the processes underlying the data. To address this problem, we have developed a method for fitting nonlinear dynamical systems models to data related to social change. Here, we use this method to investigate how countries become trapped at low levels of socioeconomic development. We identify two types of traps. The first is a democracy trap, where countries with low levels of economic growth and/or citizen education fail to develop democracy. The second trap is in terms of cultural values, where countries with low levels of democracy and/or life expectancy fail to develop emancipative values. We show that many key developing countries, including India and Egypt, lie near the border of these development traps, and we investigate the time taken for these nations to transition toward higher democracy and socioeconomic well-being