
    Text Summarization Across High and Low-Resource Settings

    Natural language processing aims to build automated systems that can both understand and generate natural language text. As the amount of textual data available online has grown exponentially, so has the need for intelligent systems that can comprehend it and present it to the world. As a result, automatic text summarization, the process by which a text's salient content is automatically distilled into a concise form, has become a necessary tool. Automatic text summarization approaches and applications vary with the input being summarized, which may consist of single or multiple documents of different genres. The desired output style may likewise vary: extractive summarization selects sentences or sub-sentential units directly from the input, while abstractive summarization fuses and paraphrases the input document. Despite differences among these use cases, certain themes are common across them: the role of large-scale data in training these models, the application of summarization models in real-world scenarios, and the need to adequately evaluate and compare summaries. This dissertation presents novel data and modeling techniques for deep neural network-based summarization models trained in high-resource (thousands of supervised training examples) and low-resource (zero to hundreds of supervised training examples) data settings, together with a comprehensive evaluation of model and metric progress in the field. We examine both Recurrent Neural Network (RNN)-based and Transformer-based models for extracting and generating summaries from the input. To facilitate the training of large-scale networks, we introduce datasets for multi-document summarization (MDS) aimed at pedagogical applications and at news summarization. While high-resource settings allow models to advance state-of-the-art performance, the failure of such models to adapt to settings outside those in which they were initially trained requires smarter use of labeled data and motivates work in low-resource summarization. To this end, we propose unsupervised learning techniques for extractive summarization in question answering, abstractive summarization on distantly supervised data for community question-answering forums, and abstractive zero- and few-shot summarization across several domains. To measure the progress made along these axes, we revisit the evaluation of current summarization models. In particular, this dissertation addresses the following research objectives: 1) High-resource Summarization. We introduce datasets for multi-document summarization, focusing on pedagogical applications for NLP, news summarization, and Wikipedia topic summarization. Large-scale datasets allow models to achieve state-of-the-art performance on these tasks compared to prior modeling techniques, and we introduce a novel model to reduce redundancy. However, we also examine how models trained on these large-scale datasets fare when applied to new settings, showing the need for more generalizable models. 2) Low-resource Summarization. While high-resource summarization improves model performance, practical applications require data-efficient models. We propose a pipeline for creating synthetic training data for extractive question-answering models, a form of query-based extractive summarization with short-phrase summaries. In other work, we propose an automatic pipeline for training a multi-document answer summarizer on community question-answering forums without labeled data. Finally, we push the boundaries of abstractive summarization performance when little or no training data is available across several domains. 3) Automatic Summarization Evaluation. To understand the extent of progress made by recent modeling techniques, and to better understand current evaluation protocols, we compare summarization output quality across 12 metrics and 23 deep neural network models, propose better-motivated summarization evaluation guidelines, and point to open problems in summarization evaluation.
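    As a concrete illustration of the extractive setting described above, the minimal sketch below scores sentences by content-word frequency and keeps the top k in document order. It is a generic frequency baseline under assumed tokenization and stopword choices, not the RNN- or Transformer-based models the dissertation develops.

        # Minimal extractive summarization baseline: score each sentence by the
        # average corpus frequency of its content words, keep the top-k sentences,
        # and emit them in original document order.
        import re
        from collections import Counter

        # Illustrative stopword list; a real system would use a fuller one.
        STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "that"}

        def content_words(text):
            return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

        def extractive_summary(text, k=2):
            sentences = re.split(r"(?<=[.!?])\s+", text.strip())
            freq = Counter(content_words(text))
            def score(sent):
                tokens = content_words(sent)
                return sum(freq[w] for w in tokens) / (len(tokens) or 1)
            top = set(sorted(sentences, key=score, reverse=True)[:k])
            # Preserve the original document order of the chosen sentences.
            return " ".join(s for s in sentences if s in top)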

    Bose-Einstein Correlations of Three Charged Pions in Hadronic Z^0 Decays

    Bose-Einstein Correlations (BEC) of three identical charged pions were studied in 4 x 10^6 hadronic Z^0 decays recorded with the OPAL detector at LEP. The genuine three-pion correlations, corrected for the Coulomb effect, were separated from the known two-pion correlations by a new subtraction procedure. A significant genuine three-pion BEC enhancement near threshold was observed, with an emitter source radius of r_3 = 0.580 +/- 0.004 (stat.) +/- 0.029 (syst.) fm and a strength of \lambda_3 = 0.504 +/- 0.010 (stat.) +/- 0.041 (syst.). The Coulomb correction was found to increase the \lambda_3 value by ~9% and to reduce r_3 by ~6%. The measured \lambda_3 corresponds to a value of 0.707 +/- 0.014 (stat.) +/- 0.078 (syst.) once the purity of the three-pion sample is taken into account. A relation between the two-pion and three-pion source parameters is discussed.
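    For reference, genuine three-pion BEC are commonly quantified with a Goldhaber-type parameterization of the normalized three-pion correlation function; the exact fit function used in this analysis may differ, so the form below should be read as the generic one:

        C_3(Q_3) = N \left[ 1 + \lambda_3 \, e^{-r_3^2 Q_3^2} \right],
        \qquad Q_3^2 = Q_{12}^2 + Q_{13}^2 + Q_{23}^2,
        \qquad Q_{ij}^2 = -(p_i - p_j)^2,

    where N is a normalization constant, \lambda_3 measures the correlation strength, and r_3 is the radius of the emitting source.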

    Search for the standard model Higgs boson at LEP


    Systems medicine and integrated care to combat chronic noncommunicable diseases

    We propose an innovative, integrated, cost-effective health system to combat major non-communicable diseases (NCDs), including cardiovascular, chronic respiratory, metabolic, rheumatologic and neurologic disorders and cancers, which together are the predominant health problem of the 21st century. This proposed holistic strategy involves comprehensive patient-centered integrated care and multi-scale, multi-modal and multi-level systems approaches to tackle NCDs as a common group of diseases. Rather than studying each disease individually, it will take into account their intertwined gene-environment and socio-economic interactions, as well as the co-morbidities that lead to individual-specific complex phenotypes. It will implement a road map for predictive, preventive, personalized and participatory (P4) medicine based on a robust and extensive knowledge-management infrastructure that contains individual patient information. It will be supported by strategic partnerships involving all stakeholders, including general practitioners associated with patient-centered care. This systems-medicine strategy, which will take a holistic approach to disease, is designed to allow its results to be used globally, taking into account the needs and specificities of local economies and health systems.

    The Dark Energy Survey : more than dark energy – an overview

    This overview paper describes the legacy prospects and discovery potential of the Dark Energy Survey (DES) beyond cosmological studies, illustrating them with examples from the DES early data. DES is using a wide-field camera (DECam) on the 4 m Blanco Telescope in Chile to image 5000 sq deg of the sky in five filters (grizY). By its completion, the survey is expected to have generated a catalogue of 300 million galaxies with photometric redshifts and 100 million stars. In addition, a time-domain survey search over 27 sq deg is expected to yield a sample of thousands of Type Ia supernovae and other transients. The main goals of DES are to characterize dark energy and dark matter and to test alternative models of gravity; these goals will be pursued by studying large-scale structure, cluster counts, weak gravitational lensing and Type Ia supernovae. However, DES also provides a rich data set that allows us to study many other aspects of astrophysics. In this paper, we focus on additional science with DES, emphasizing areas where the survey makes a difference with respect to other current surveys. The paper illustrates, using early data (from ‘Science Verification’ and from the first, second and third seasons of observations), what DES can tell us about the Solar system, the Milky Way, galaxy evolution, quasars and other topics. In addition, we show that if the cosmological model is assumed to be Λ cold dark matter (ΛCDM), then important astrophysics can be deduced from the primary DES probes. Highlights from the DES early data include the discovery of 34 trans-Neptunian objects, 17 dwarf satellites of the Milky Way, one published z > 6 quasar (and more confirmed) and two published superluminous supernovae (and more confirmed).

    Differential cross section measurements for the production of a W boson in association with jets in proton–proton collisions at √s = 7 TeV

    Measurements are reported of differential cross sections for the production of a W boson, which decays into a muon and a neutrino, in association with jets, as a function of several variables, including the transverse momenta (pT) and pseudorapidities of the four leading jets, the scalar sum of jet transverse momenta (HT), and the difference in azimuthal angle between the directions of each jet and the muon. The data sample of pp collisions at a centre-of-mass energy of 7 TeV was collected with the CMS detector at the LHC and corresponds to an integrated luminosity of 5.0 fb^-1. The measured cross sections are compared to predictions from the Monte Carlo generators MadGraph + pythia and sherpa, and to next-to-leading-order calculations from BlackHat + sherpa. The differential cross sections are found to be in agreement with the predictions, apart from the pT distributions of the leading jets at high pT values, the HT distributions at high HT and low jet multiplicities, and the distribution of the difference in azimuthal angle between the leading jet and the muon at low values.
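    To make the measured observables concrete, the short sketch below computes HT (the scalar sum of jet pT) and the jet-muon azimuthal separation from reconstructed-object kinematics. The event values are hypothetical and the sketch is independent of the actual CMS analysis software.

        import math

        def delta_phi(phi1, phi2):
            """Azimuthal separation wrapped into [0, pi]."""
            dphi = abs(phi1 - phi2) % (2 * math.pi)
            return 2 * math.pi - dphi if dphi > math.pi else dphi

        def ht(jet_pts):
            """HT: scalar sum of jet transverse momenta (GeV)."""
            return sum(jet_pts)

        # Hypothetical event: pT (GeV) and azimuth (rad) of the four leading jets,
        # plus the muon azimuth. Values are illustrative only.
        jet_pts = [85.0, 52.0, 33.0, 21.0]
        jet_phis = [0.4, 2.9, -1.7, 1.1]
        mu_phi = -2.3
        print(ht(jet_pts))                                    # HT of the event
        print([delta_phi(phi, mu_phi) for phi in jet_phis])   # per-jet delta-phi to the muon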

    Measurement of the Production Rate of Charm Quark Pairs from Gluons in Hadronic Z^0 Decays

    The rate of secondary charm-quark-pair production has been measured in 4.4 million hadronic Z^0 decays collected by OPAL. By selecting events with three jets and tagging charmed hadrons in the gluon jet candidate using leptons and charged D* mesons, the average number of secondary charm-quark pairs per hadronic event is found to be (3.20 +/- 0.21 +/- 0.38) x 10^-2.