510 research outputs found

    Génération de résumés de mise à jour : Utilisation d'un algorithme de classification non supervisée pour détecter la nouveauté dans les articles de presse

    No full text
    Dans cet article, nous présentons un système de résumé automatique multi-documents, dédié au résumé de mise à jour – ou de nouveauté. Dans une première partie, nous présentons la méthode sur laquelle notre système est fondé, CBSEAS, et son adaptation à la tâche de résumé de mise à jour. Générer des résumés de mise à jour est une tâche plus compliquée que de générer des résumés « standard », et nécessite une évaluation spécifique. Nous décrivons ensuite la tâche « Résumé de mise à jour » de TAC 2009, à laquelle nous avons participé afin d'évaluer notre système. Cette campagne d'évaluation internationale nous a permis de confronter notre système à d'autres systèmes de résumé automatique. Finalement, nous présentons et discutons les résultats intéressants obtenus par notre système

    Precision Measurement of the Radiative B\Beta Decay of the Free Neutron

    Get PDF
    The standard model predicts that, in addition to a proton, an electron, and an antineutrino, a continuous spectrum of photons is emitted in the β\beta decay of the free neutron. We report on the RDK II experiment which measured the photon spectrum using two different detector arrays. An annular array of bismuth germanium oxide scintillators detected photons from 14 to 782~keV. The spectral shape was consistent with theory, and we determined a branching ratio of 0.00335 ±\pm 0.00005 [stat] ±\pm 0.00015 [syst]. A second detector array of large area avalanche photodiodes directly detected photons from 0.4 to 14~keV. For this array, the spectral shape was consistent with theory, and the branching ratio was determined to be 0.00582 ±\pm 0.00023 [stat] ±\pm 0.00062 [syst]. We report the first precision test of the shape of the photon energy spectrum from neutron radiative decay and a substantially improved determination of the branching ratio over a broad range of photon energies

    Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps

    Full text link
    Concept maps can be used to concisely represent important information and bring structure into large document collections. Therefore, we study a variant of multi-document summarization that produces summaries in the form of concept maps. However, suitable evaluation datasets for this task are currently missing. To close this gap, we present a newly created corpus of concept maps that summarize heterogeneous collections of web documents on educational topics. It was created using a novel crowdsourcing approach that allows us to efficiently determine important elements in large document collections. We release the corpus along with a baseline system and proposed evaluation protocol to enable further research on this variant of summarization.Comment: Published at EMNLP 201

    LQVSumm: a corpus of linguistic quality violations in multi-document summarization

    Get PDF
    We present LQVSumm, a corpus of about 2000 automatically created extractive multi-document summaries from the TAC 2011 shared task on Guided Summarization, which we annotated with several types of linguistic quality violations. Examples for such violations include pronouns that lack antecedents or ungrammatical clauses. We give details on the annotation scheme and show that inter-annotator agreement is good given the open-ended nature of the task. The annotated summaries have previously been scored for Readability on a numeric scale by human annotators in the context of the TAC challenge; we show that the number of instances of violations of linguistic quality of a summary correlates with these intuitively assigned numeric scores. On a system-level, the average number of violations marked in a system’s summaries achieves higher correlation with the Readability scores than current supervised state-of-the-art methods for assigning a single readability score to a summary. It is our hope that our corpus facilitates the development of methods that not only judge the linguistic quality of automatically generated summaries as a whole, but which also allow for detecting, labeling, and fixing particular violations in a text

    Finding answers to questions, in text collections or web, in open domain or specialty domains

    Get PDF
    International audienceThis chapter is dedicated to factual question answering, i.e. extracting precise and exact answers to question given in natural language from texts. A question in natural language gives more information than a bag of word query (i.e. a query made of a list of words), and provides clues for finding precise answers. We will first focus on the presentation of the underlying problems mainly due to the existence of linguistic variations between questions and their answerable pieces of texts for selecting relevant passages and extracting reliable answers. We will first present how to answer factual question in open domain. We will also present answering questions in specialty domain as it requires dealing with semi-structured knowledge and specialized terminologies, and can lead to different applications, as information management in corporations for example. Searching answers on the Web constitutes another application frame and introduces specificities linked to Web redundancy or collaborative usage. Besides, the Web is also multilingual, and a challenging problem consists in searching answers in target language documents other than the source language of the question. For all these topics, we present main approaches and the remaining problems

    What Makes a Top-Performing Precision Medicine Search Engine? Tracing Main System Features in a Systematic Way

    Full text link
    From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials. Despite lots of performance measurements carried out in these evaluation campaigns, the scientific community is still pretty unsure about the impact individual system features and their weights have on the overall system performance. In order to overcome this explanatory gap, we first determined optimal feature configurations using the Sequential Model-based Algorithm Configuration (SMAC) program and applied its output to a BM25-based search engine. We then ran an ablation study to systematically assess the individual contributions of relevant system features: BM25 parameters, query type and weighting schema, query expansion, stop word filtering, and keyword boosting. For evaluation, we employed the gold standard data from the three TREC-PM installments to evaluate the effectiveness of different features using the commonly shared infNDCG metric.Comment: Accepted for SIGIR2020, 10 page

    Generating Aspect-oriented Multi-document Summarization with Event-Aspect Model

    Get PDF
    In this paper, we propose a novel approach to automatic generation of aspect-oriented summaries from multiple documents. We first develop an event-aspect LDA model to cluster sentences into aspects. We then use extended LexRank algorithm to rank the sentences in each cluster. We use Integer Linear Programming for sentence selection. Key features of our method include automatic grouping of semantically related sentences and sentence ranking based on extension of random walk model. Also, we implement a new sentence compression algorithm which use dependency tree instead of parser tree. We compare our method with four baseline methods. Quantitative evaluation based on Rouge metric demonstrates the effectiveness and advantages of our method.

    Characterization of a Li-6 loaded liquid organic scintillator for fast neutron spectrometry and thermal neutron detection

    Full text link
    The characterization of a liquid scintillator incorporating an aqueous solution of enriched lithium chloride to produce a scintillator with 0.40% Li-6 is presented, including the performance of the scintillator in terms of its optical properties and neutron response. The scintillator was incorporated into a fast neutron spectrometer, and the light output spectra from 2.5 MeV, 14.1 MeV, and Cf-252 neutrons were measured using capture-gated coincidence techniques. The spectrometer was operated without coincidence to perform thermal neutron measurements. Possible improvements in spectrometer performance are discussed.Comment: Submitted to Applied Radiation and Isotopes. 11 pages, 7 figures, 3 tables. Revision addresses reviewers' comment
    corecore