432,928 research outputs found

    Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

    In Automatic Text Summarization, preprocessing is an important phase to reduce the space of textual representation. Classically, stemming and lemmatization have been widely used for normalizing words. However, even when normalization is applied to large texts, the curse of dimensionality can degrade the performance of summarizers. This paper describes a new method for normalization of words to further reduce the space of representation. We propose to reduce each word to its initial letters, as a form of Ultra-stemming. The results show that Ultra-stemming not only preserves the content of summaries produced by this representation, but often dramatically improves the performance of the systems. Summaries on trilingual corpora were evaluated automatically with Fresa. Results confirm an increase in performance, regardless of the summarizer system used. Comment: 22 pages, 12 figures, 9 tables
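The abstract above describes reducing each word to its initial letters. A minimal sketch of that idea follows, not the authors' implementation: the function name `ultra_stem`, the prefix length parameter `n`, and the regex-based tokenization are all assumptions made for illustration.

```python
import re

# Hypothetical helper: the abstract does not specify tokenization,
# so a simple "runs of letters" tokenizer is assumed here.
def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[^\W\d_]+", text.lower())

# Hypothetical helper: `n` (how many initial letters to keep) is an
# assumed parameter, not a value taken from the paper.
def ultra_stem(text: str, n: int = 1) -> list[str]:
    """Reduce every word to its first n letters."""
    return [word[:n] for word in tokenize(text)]

if __name__ == "__main__":
    sample = "Summarizers summarize normalized texts"
    full_vocab = set(tokenize(sample))
    stem_vocab = set(ultra_stem(sample, n=1))
    # The truncated vocabulary can never be larger than the full one.
    print(len(full_vocab), len(stem_vocab))
```

Truncating to a fixed prefix shrinks the vocabulary at the cost of conflating unrelated words that share initial letters, which is the trade-off the abstract reports evaluating.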

    The Porter stemming algorithm: then and now

    Purpose: In 1980, Porter presented a simple algorithm for stemming English language words. This paper summarises the main features of the algorithm, and highlights its role not just in modern information retrieval research, but also in a range of related subject domains. Design: Review of literature and research involving use of the Porter algorithm. Findings: The algorithm has been widely adopted and extended so that it has become the standard approach to word conflation for information retrieval in a wide range of languages. Value: The 1980 paper in Program by Porter describing his algorithm has been highly cited. This paper provides a context for the original paper as well as an overview of its subsequent use.

    Stemming the Global Trade in Falsified and Substandard Medicines

    Drug safety and quality are an essential assumption of clinical medicine, but there is growing concern that this assumption is not always correct. Poor manufacturing and deliberate fraud occasionally compromise the drug supply in the United States, and the problem is far more common and serious in low- and middle-income countries with weak drug regulatory systems. An Institute of Medicine consensus committee report identified the causes of, and possible solutions to, the problem of falsified and substandard drugs around the world. The vocabulary people use to discuss the problem is itself a concern. The word counterfeit is often used innocuously to describe any drug that is not what it seems, but some NGOs and emerging manufacturing nations object to this term. These groups see hostility to generic pharmaceuticals in a discussion of counterfeit medicines. Precisely speaking, a counterfeit drug infringes on a registered trademark, and trademark infringement is not necessarily a problem of public health consequence. Instead of talking broadly about counterfeit drugs, the WHO and other stakeholders should consider two main categories of drug quality problems. Falsified medicines misrepresent the product’s identity or source or both. Substandard drugs fail to meet the national specifications given in an accepted pharmacopeia or the manufacturer’s dossier. In practice, there is often considerable overlap between categories. There is considerable uncertainty about the size of the falsified and substandard drug market. Improved pharmacovigilance, especially in developing countries, would give a better picture of the scope of the problem. In the United States, tighter regulatory controls on the wholesale market and a mandatory drug tracking system would improve drug safety.
    In developing countries, development finance organizations should invest in small- and medium-sized pharmaceutical manufacturers, and governments should use tools such as franchising, accreditation, low-interest loans, and task shifting to encourage private sector investment in drug retail. Finally, the WHO should work with stakeholders such as the UNODC and the WCO to develop an international code of practice on falsified and substandard drugs.

    Stemming the Tide

    Climate change has become one of the most significant and fastest-growing threats to cultural heritage around the globe. Yet cultural heritage sites and collections also serve as invaluable sources of resilience for communities to address climate change. In March 2020, the Smithsonian American Art Museum and the Smithsonian’s National Collections Program convened the symposium “Stemming the Tide: Global Strategies for Sustaining Cultural Heritage through Climate Change” to empower cultural heritage authorities, managers, and advocates to pursue more ambitious engagement and collaborative approaches to climate change. Speakers explored six categories of cultural heritage identified by the International Council on Monuments and Sites (ICOMOS): Cultural Landscapes and Historic Urban Landscapes, Archaeological Sites, Built Heritage (Buildings and Structures), Cultural Communities, Intangible Cultural Heritage, and Museums and Collections.