Pattern Mining for Named Entity Recognition
Many evaluation campaigns have shown that knowledge-based and data-driven approaches remain equally competitive for Named Entity Recognition (NER). Our research team has developed CasEN, a symbolic system based on finite-state transducers, which achieved promising results during the Ester2 French-speaking evaluation campaign. Despite these encouraging results, manually extending the coverage of such a hand-crafted system is a difficult task. In this paper, we present a novel approach, based on pattern mining, for performing NER and for supplementing our system's knowledge base. The system, mXS, exhaustively searches for hierarchical sequential patterns that aim to detect Named Entity boundaries. We assess their efficiency by using such patterns in a standalone mode and in combination with our existing system.
A HMM POS Tagger for Micro-blogging Type Texts
The high volume of communication via micro-blogging type messages has created an increased demand for text processing tools customised for the unstructured text genre. The performance of available text processing tools developed on structured texts has been shown to deteriorate significantly when they are used on unstructured, micro-blogging type texts. In this paper, we present the results of testing an HMM-based POS (Part-Of-Speech) tagging model customised for unstructured texts. We also evaluated the tagger against published CRF-based state-of-the-art POS tagging models customised for Tweet messages, using three publicly available Tweet corpora. Finally, we did cross-validation tests with both taggers by training them on one Tweet corpus and testing them on another.
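As a rough illustration of how an HMM-based tagger picks tags, the sketch below runs Viterbi decoding over a first-order HMM. It is not the tagger described in the abstract: the function name `viterbi`, the toy tag set, every probability, and the `unk` fallback for unseen words are assumptions made up for this example.

```python
import math

def viterbi(tokens, tags, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence for tokens under a first-order HMM.

    unk is a hypothetical fallback emission probability for unseen words.
    """
    if not tokens:
        return []
    # best[i][t]: log-probability of the best path that ends in tag t at position i
    best = [{t: math.log(start_p[t]) + math.log(emit_p[t].get(tokens[0], unk))
             for t in tags}]
    back = [{}]
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for t in tags:
            # Choose the previous tag that maximises the path score into t
            prev, score = max(
                ((p, best[i - 1][p] + math.log(trans_p[p][t])) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + math.log(emit_p[t].get(tokens[i], unk))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy model with invented probabilities (illustration only)
TAGS = ["PRON", "VERB", "NOUN"]
START = {"PRON": 0.6, "VERB": 0.1, "NOUN": 0.3}
TRANS = {
    "PRON": {"PRON": 0.1, "VERB": 0.8, "NOUN": 0.1},
    "VERB": {"PRON": 0.3, "VERB": 0.1, "NOUN": 0.6},
    "NOUN": {"PRON": 0.2, "VERB": 0.5, "NOUN": 0.3},
}
EMIT = {
    "PRON": {"i": 0.9},
    "VERB": {"luv": 0.5, "love": 0.5},
    "NOUN": {"pizza": 0.8},
}

print(viterbi("i luv pizza".split(), TAGS, START, TRANS, EMIT))
```

Working in log space avoids numerical underflow on longer messages. In practice the three tables would be estimated from a tagged Tweet corpus rather than written by hand.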
Growing a list
It is easy to find expert knowledge on the Internet on almost any topic, but obtaining a complete overview of a given topic is not always easy: Information can be scattered across many sources and must be aggregated to be useful. We introduce a method for intelligently growing a list of relevant items, starting from a small seed of examples. Our algorithm takes advantage of the wisdom of the crowd, in the sense that there are many experts who post lists of things on the Internet. We use a collection of simple machine learning components to find these experts and aggregate their lists to produce a single complete and meaningful list. We use experiments with gold standards and open-ended experiments without gold standards to show that our method significantly outperforms the state of the art. Our method uses the clustering algorithm Bayesian Sets even when its underlying independence assumption is violated, and we provide a theoretical generalization bound to motivate its use.
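The Bayesian Sets scoring step mentioned above can be sketched for binary features under a Beta-Bernoulli model, where the log score of a candidate is linear in its features. This covers only the scoring of candidates against a seed, not the full list-growing pipeline. The function name, the feature sets, and the hyperparameters `alpha` and `beta` are assumptions chosen for illustration.

```python
import math

def bayesian_sets_scores(items, seed, alpha=1.0, beta=1.0):
    """Score every item against a seed set using binary-feature Bayesian Sets.

    items: dict mapping item name -> set of binary features it has
    seed:  list of item names forming the query set
    alpha, beta: Beta prior hyperparameters (illustrative values, shared by all features)
    """
    features = sorted(set().union(*items.values()))
    n = len(seed)
    const = 0.0
    weight = {}
    for f in features:
        count = sum(1 for s in seed if f in items[s])  # seed items that have feature f
        a_post = alpha + count
        b_post = beta + n - count
        const += (math.log(alpha + beta) - math.log(alpha + beta + n)
                  + math.log(b_post) - math.log(beta))
        weight[f] = (math.log(a_post) - math.log(alpha)
                     - math.log(b_post) + math.log(beta))
    # Log score is a constant plus the summed weights of the features the item has
    return {name: const + sum(weight[f] for f in feats)
            for name, feats in items.items()}

# Toy data (invented for illustration)
ITEMS = {
    "apple": {"fruit", "red"},
    "banana": {"fruit", "yellow"},
    "cherry": {"fruit", "red"},
    "hammer": {"tool"},
    "wrench": {"tool"},
}
scores = bayesian_sets_scores(ITEMS, seed=["apple", "banana"])
ranked = sorted((k for k in ITEMS if k not in ("apple", "banana")),
                key=scores.get, reverse=True)
print(ranked)
```

Each feature contributes independently to the score, which is the independence assumption the abstract refers to.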
Winding of planar gaussian processes
We consider a smooth, rotationally invariant, centered gaussian process in
the plane, with arbitrary correlation matrix . We study the winding
angle around its center. We obtain a closed formula for the variance
of the winding angle as a function of the matrix . For most stationary
processes the winding angle exhibits diffusion at large time
with diffusion coefficient .
Correlations of with integer , the distribution of the
angular velocity , and the variance of the algebraic area are also
obtained. For smooth processes with stationary increments (random walks) the
variance of the winding angle grows as , with proper
generalizations to the various classes of fractional Brownian motion. These
results are tested numerically. Non integer is studied numerically. Comment: 12 pages, 6 figures
Creativity and Autonomy in Swarm Intelligence Systems
This work introduces two swarm intelligence algorithms: Stochastic Diffusion Search (SDS), which mimics the foraging behaviour of one species of ants (Leptothorax acervorum), and a Particle Swarm Optimiser (PSO), which mimics the behaviour of birds flocking. It outlines a novel integration strategy exploiting the local search properties of the PSO with global SDS behaviour. The resulting hybrid algorithm is used to sketch novel drawings of an input image, exploiting an artistic tension between the local behaviour of the 'birds flocking', as they seek to follow the input sketch, and the global behaviour of the 'ants foraging', as they seek to encourage the flock to explore novel regions of the canvas. The paper concludes by exploring the putative 'creativity' of this hybrid swarm system in the philosophical light of the 'rhizome' and Deleuze's well-known 'Orchid and Wasp' metaphor.
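For readers unfamiliar with the PSO half of the hybrid, here is a minimal sketch of a plain particle swarm optimiser minimising a simple test function. It is not the hybrid SDS/PSO drawing system from the abstract. The function name, the inertia and acceleration coefficients, the swarm size, and the sphere objective are all assumptions picked for illustration.

```python
import random

def pso_minimise(f, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, bound=5.0, seed=0):
    """Minimise f over R^dim with a basic global-best particle swarm.

    w, c1, c2 are illustrative inertia / cognitive / social coefficients.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # each particle's best position so far
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position found by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull towards the particle's own best and the swarm's best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best, best_val = pso_minimise(sphere, dim=2)
print(best, best_val)
```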
Next big challenges in core AI technology
Algorithms and the Foundations of Software technology
Computational fact checking from knowledge networks
Traditional fact checking by expert journalists cannot keep up with the
enormous volume of information that is now generated online. Computational fact
checking may significantly enhance our ability to evaluate the veracity of
dubious information. Here we show that the complexities of human fact checking
can be approximated quite well by finding the shortest path between concept
nodes under properly defined semantic proximity metrics on knowledge graphs.
Framed as a network problem this approach is feasible with efficient
computational techniques. We evaluate this approach by examining tens of
thousands of claims related to history, entertainment, geography, and
biographical information using a public knowledge graph extracted from
Wikipedia. Statements independently known to be true consistently receive
higher support via our method than do false ones. These findings represent a
significant step toward scalable computational fact-checking methods that may
one day mitigate the spread of harmful misinformation.
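The idea of scoring a claim by the shortest path between two concept nodes can be sketched with Dijkstra's algorithm on a small undirected graph. The cost used here is an assumption made for this example and is not necessarily the metric used by the authors: each intermediate node costs the logarithm of its degree, so paths through generic, highly connected concepts give less support. The function names and the toy graph are also hypothetical.

```python
import heapq
import math

def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def support(graph, source, target):
    """Return a support value in (0, 1] for the claim linking source and target.

    Path cost = sum of log(degree) over intermediate nodes (illustrative choice);
    support = 1 / (1 + cost of the cheapest path), or 0.0 if no path exists.
    """
    if source == target:
        return 1.0
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return 1.0 / (1.0 + d)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        # Passing through an intermediate node costs log of its degree
        step = 0.0 if node == source else math.log(len(graph[node]))
        for nb in graph[node]:
            nd = d + step
            if nd < dist.get(nb, math.inf):
                dist[nb] = nd
                heapq.heappush(heap, (nd, nb))
    return 0.0

# Toy knowledge graph (hand-written for illustration)
G = build_graph([
    ("Barack Obama", "Honolulu"),
    ("Honolulu", "Hawaii"),
    ("Barack Obama", "United States"),
    ("United States", "Hawaii"),
    ("United States", "Texas"),
    ("United States", "Canada"),
])

print(support(G, "Barack Obama", "Hawaii"))
print(support(G, "Barack Obama", "Texas"))
```

In this toy graph the link to Hawaii runs through the specific node Honolulu, so it receives more support than the link to Texas, which is reachable only through the generic node United States.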
Pharmacoeconomic analysis of adjuvant oral capecitabine vs intravenous 5-FU/LV in Dukes' C colon cancer: the X-ACT trial
Oral capecitabine (Xeloda®) is an effective drug with favourable safety in adjuvant and metastatic colorectal cancer. Oxaliplatin-based therapy is becoming standard for Dukes' C colon cancer in patients suitable for combination therapy, but is not yet approved by the UK National Institute for Health and Clinical Excellence (NICE) in the adjuvant setting. Adjuvant capecitabine is at least as effective as 5-fluorouracil/leucovorin (5-FU/LV), with significant superiority in relapse-free survival and a trend towards improved disease-free and overall survival. We assessed the cost-effectiveness of adjuvant capecitabine from payer (UK National Health Service (NHS)) and societal perspectives. We used clinical trial data and published sources to estimate incremental direct and societal costs and gains in quality-adjusted life months (QALMs). Acquisition costs were higher for capecitabine than 5-FU/LV, but higher 5-FU/LV administration costs resulted in 57% lower chemotherapy costs for capecitabine. Capecitabine vs 5-FU/LV-associated adverse events required fewer medications and hospitalisations (cost savings £3653). Societal costs, including patient travel/time costs, were reduced by >75% with capecitabine vs 5-FU/LV (cost savings £1318), with lifetime gain in QALMs of 9 months. Medical resource utilisation is significantly decreased with capecitabine vs 5-FU/LV, with cost savings to the NHS and society. Capecitabine is also projected to increase life expectancy vs 5-FU/LV. Cost savings and better outcomes make capecitabine a preferred adjuvant therapy for Dukes' C colon cancer. This pharmacoeconomic analysis strongly supports replacing 5-FU/LV with capecitabine in the adjuvant treatment of colon cancer in the UK.
Understanding Democracy and Development Traps Using a Data-Driven Approach
Methods from machine learning and data science are becoming increasingly important in the social sciences, providing powerful new ways of identifying statistical relationships in large data sets. However, these relationships do not necessarily offer an understanding of the processes underlying the data. To address this problem, we have developed a method for fitting nonlinear dynamical systems models to data related to social change. Here, we use this method to investigate how countries become trapped at low levels of socioeconomic development. We identify two types of traps. The first is a democracy trap, where countries with low levels of economic growth and/or citizen education fail to develop democracy. The second trap is in terms of cultural values, where countries with low levels of democracy and/or life expectancy fail to develop emancipative values. We show that many key developing countries, including India and Egypt, lie near the border of these development traps, and we investigate the time taken for these nations to transition toward higher democracy and socioeconomic well-being