35,815 research outputs found

    Dublin City University at QA@CLEF 2008

    We describe our participation in Multilingual Question Answering at CLEF 2008, using German and English as our source and target languages respectively. The system was built using UIMA (Unstructured Information Management Architecture) as the underlying framework.

    Enhanced services for targeted information retrieval by event extraction and data mining

    Where Information Retrieval (IR) and Text Categorization deliver a set of (ranked) documents in response to a query, users of large document collections would rather receive answers. Question answering from text was already the goal of the Message Understanding Conferences. Since then, the task of text understanding has been reduced to several more tractable tasks, most prominently Named Entity Recognition (NER) and Relation Extraction. Now, the pieces can be put together to form enhanced services on top of an IR system. In this paper, we present a framework which combines standard IR with machine learning and (pre-)processing for NER in order to extract events from a large document collection. Some questions can already be answered by particular events. Other questions require an analysis of a set of events. Hence, the extracted events become input to another machine learning process which delivers the final answer to the user's question. Our case study is the public collection of minutes of plenary sessions of the German parliament and of petitions to the German parliament.
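
    The two-stage design sketched in this abstract (extract events from retrieved documents, then learn over the set of extracted events) can be illustrated with a toy pipeline. The sketch below is only an assumption-laden stand-in: retrieval, NER, and event extraction are reduced to keyword matching and a hard-coded pattern, and none of the names correspond to the authors' system.

        # Toy sketch of the two-stage pipeline described above. All rules, names,
        # and data are illustrative stand-ins, not the system from the paper.
        import re
        from collections import Counter
        from typing import NamedTuple

        class Event(NamedTuple):
            actor: str   # named entity (stand-in NER: a single capitalized token)
            action: str  # verb-like trigger following the entity
            doc_id: int

        def retrieve(query: str, corpus: list[str]) -> list[int]:
            """Stage 0: keyword IR stand-in -- ids of documents sharing a query term."""
            terms = set(re.findall(r"[a-z]+", query.lower()))
            return [i for i, doc in enumerate(corpus)
                    if terms & set(re.findall(r"[a-z]+", doc.lower()))]

        def extract_events(doc: str, doc_id: int) -> list[Event]:
            """Stage 1: pattern-based stand-in for the learned NER + event extractor."""
            return [Event(m.group(1), m.group(2), doc_id) for m in
                    re.finditer(r"\b([A-Z][a-z]+) (proposed|rejected|submitted)\b", doc)]

        def answer(actor: str, events: list[Event]) -> Counter:
            """Stage 2: aggregate over the event set (here: count actions per actor)."""
            return Counter(e.action for e in events if e.actor == actor)

        corpus = [
            "Mueller proposed a petition on data privacy. Schmidt rejected the motion.",
            "Schmidt submitted a report. Mueller proposed an amendment.",
        ]
        events = [e for i in retrieve("petition report motion", corpus)
                  for e in extract_events(corpus[i], i)]
        print(answer("Mueller", events))  # Counter({'proposed': 2})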

    Normalized Information Distance

    The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation. Comment: 33 pages, 12 figures; appears as the chapter "Normalized Information Distance" in: Information Theory and Statistical Learning, Eds. M. Dehmer, F. Emmert-Streib, Springer-Verlag, New York, to appear.
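
    The compression-based realization mentioned in this abstract is widely known as the normalized compression distance (NCD), defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the length of a compressed encoding. A minimal sketch, using Python's standard zlib compressor as a stand-in for any real compressor (the function names are illustrative, not taken from the chapter):

        import zlib

        def compressed_size(data: bytes) -> int:
            """Approximate the Kolmogorov complexity C(x) by the zlib-compressed length."""
            return len(zlib.compress(data, 9))

        def ncd(x: bytes, y: bytes) -> float:
            """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
            cx, cy = compressed_size(x), compressed_size(y)
            return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)

        # Toy usage: near-identical texts should score lower than unrelated ones.
        a = b"the quick brown fox jumps over the lazy dog " * 20
        b = b"the quick brown fox leaps over the lazy dog " * 20
        c = b"lorem ipsum dolor sit amet, consectetur elit " * 20
        print(round(ncd(a, b), 3))   # small distance (similar)
        print(round(ncd(a, c), 3))   # larger distance (dissimilar)

    The page-count realization follows the same pattern, with logarithms of search-engine hit counts taking the place of compressed lengths.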

    Argumentation Mining in User-Generated Web Discourse

    The goal of argumentation mining, an evolving research field in computational linguistics, is to design methods capable of analyzing people's argumentation. In this article, we go beyond the state of the art in several ways. (i) We deal with actual Web data and take up the challenges posed by the variety of registers, multiple domains, and unrestricted noisy user-generated Web discourse. (ii) We bridge the gap between normative argumentation theories and the argumentation phenomena encountered in actual data by adapting an argumentation model tested in an extensive annotation study. (iii) We create a new gold standard corpus (90k tokens in 340 documents) and experiment with several machine learning methods to identify argument components. We offer the data, source code, and annotation guidelines to the community under free licenses. Our findings show that argumentation mining in user-generated Web discourse is a feasible but challenging task. Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-179.
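
    As a purely generic illustration of the identification step (not the models, features, or data used in the cited article), a sentence-level baseline that labels text as claim, premise, or non-argumentative can be set up with a standard text-classification pipeline; the toy sentences and labels below are invented for demonstration only.

        # Generic sentence-level baseline for argument component identification.
        # Toy data; not the corpus, features, or models from the cited article.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        sentences = [
            "Homeschooling should be banned.",                       # claim
            "Children need to socialise with their peers.",          # premise
            "I read about this on a forum yesterday.",               # non-argumentative
            "Public schools offer qualified teachers.",              # premise
            "Therefore mainstream schooling is the better option.",  # claim
            "Anyway, thanks for reading my post.",                   # non-argumentative
        ]
        labels = ["claim", "premise", "none", "premise", "claim", "none"]

        model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                              LogisticRegression(max_iter=1000))
        model.fit(sentences, labels)
        print(model.predict(["Kids learn discipline in a regular classroom."]))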

    New Methods in Human Subjects Research: Do We Need a New Ethics?

    Online surveys and interviews, the observation of chat rooms or online games, data mining, knowledge discovery in databases (KDD), collecting biomarkers, employing biometrics, using RFID technology - even as implants in the human body - and other related processes all seem to be more promising, cheaper, faster, and more comprehensive than conventional methods of human subjects research. But at the same time these new means of gathering information may pose powerful threats to privacy, autonomy, and informed consent. Online research, particularly involving children and minors but also other vulnerable groups such as ethnic or religious minorities, is in urgent need of an adequate research ethics that can provide reasonable and morally justified constraints for human subjects research. The paper at hand seeks to provide some clarification of these new means of information gathering and the challenges they present to moral concepts like privacy, autonomy, informed consent, beneficence, and justice. Some existing codes of conduct and ethical guidelines are examined to determine whether they provide answers to those challenges and/or whether they can be helpful in the development of principles and regulations governing human subjects research. Finally, some conclusions and recommendations are presented that can help in the task of formulating an adequate research ethics for human subjects research. Keywords: Human Subjects Research, Online Research, Biomarkers, Biometrics, Autonomy, Privacy, Informed Consent, Research Ethics