
    Extracting and Attributing Quotes in Text and Assessing them as Opinions

    News articles often report on the opinions that salient people have about important issues. While it is possible to infer an opinion from a person's actions, it is much more common to demonstrate that a person holds an opinion by reporting on what they have said. These instances of speech are called reported speech, and in this thesis we set out to detect instances of reported speech, attribute them to their speakers, and identify which instances provide evidence of an opinion. We first focus on extracting reported speech, which involves finding all acts of communication that are reported in an article. Previous work has approached this task with rule-based methods; however, several factors confound these approaches. To demonstrate this, we build a corpus of 965 news articles in which we mark all instances of speech. We then show that a supervised token-based approach outperforms all of our rule-based alternatives, even in extracting direct quotes. Next, we examine the problem of finding the speaker of each quote. For this task we annotate the same 965 news articles with links from each quote to its speaker. Using this corpus, and three others, we develop new methods and features for quote attribution, which achieve state-of-the-art accuracy on our corpus and strong results on the others. Having extracted quotes and determined who spoke them, we move on to the opinion-mining part of our work. Most task definitions in opinion mining do not work well with opinions in news, so we define a new task in which the aim is to classify whether a quote demonstrates support, neutrality, or opposition to a given position statement. This formulation improved annotator agreement compared to our earlier annotation schemes. Using it, we build an opinion corpus of 700 news documents covering 7 topics. In this thesis we do not attempt this full task, but we do present preliminary results.
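    The supervised token-based framing described above can be sketched as standard BIO sequence labelling over reported-speech spans. The tag names and the helper below are illustrative assumptions, not the thesis's actual annotation scheme or model:

```python
# Hypothetical sketch: reported-speech extraction as BIO token labelling.
# B-SPEECH begins a reported-speech span, I-SPEECH continues it, O is outside.

def spans_to_bio(tokens, spans):
    """Convert token spans [(start, end), ...] (end exclusive) to BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = "B-SPEECH"
        for i in range(start + 1, end):
            tags[i] = "I-SPEECH"
    return tags

tokens = ['"', 'We', 'will', 'act', '"', ',', 'she', 'said', '.']
# The direct quote covers tokens 0..4, including the quotation marks.
tags = spans_to_bio(tokens, [(0, 5)])
```

    A token classifier trained on such tags can recover direct and indirect speech spans that fixed rules miss.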

    Towards Reversible Sessions

    In this work, we incorporate reversibility into structured communication-based programming, to allow parties of a session to automatically undo, in a rollback fashion, the effect of previously executed interactions. This permits taking different computation paths along the same session, as well as reverting the whole session and starting a new one. Our aim is to define a theoretical basis for examining the interplay in concurrent systems between reversible computation and session-based interaction. We thus enrich a session-based variant of pi-calculus with memory devices, dedicated to keep track of the computation history of sessions in order to reverse it. We discuss our initial investigation concerning the definition of a session type discipline for the proposed reversible calculus, and its practical advantages for static verification of safe composition in communication-centric distributed software performing reversible computations. Comment: In Proceedings PLACES 2014, arXiv:1406.331

    Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping

    We propose Quootstrap, a method for extracting quotations, as well as the names of the speakers who uttered them, from large news corpora. Whereas prior work has addressed this problem primarily with supervised machine learning, our approach follows a fully unsupervised bootstrapping paradigm. It leverages the redundancy present in large news corpora, more precisely, the fact that the same quotation often appears across multiple news articles in slightly different contexts. Starting from a few seed patterns, such as ["Q", said S.], our method extracts a set of quotation-speaker pairs (Q, S), which are in turn used for discovering new patterns expressing the same quotations; the process is then repeated with the larger pattern set. Our algorithm is highly scalable, which we demonstrate by running it on the large ICWSM 2011 Spinn3r corpus. Validating our results against a crowdsourced ground truth, we obtain 90% precision at 40% recall using a single seed pattern, with significantly higher recall values for more frequently reported (and thus likely more interesting) quotations. Finally, we showcase the usefulness of our algorithm's output for computational social science by analyzing the sentiment expressed in our extracted quotations. Comment: Accepted at the 12th International Conference on Web and Social Media (ICWSM), 201
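    The bootstrapping loop described above can be sketched in miniature. The pattern syntax (literal Q and S slots), the capitalised-name regex for speakers, and the fixed context window are simplifying assumptions for illustration, not Quootstrap's actual implementation:

```python
import re

def pattern_to_regex(pattern):
    # 'Q' marks the quotation slot, 'S' the speaker slot. Matching speakers
    # with a capitalised-word regex is a toy assumption; the real system
    # exploits redundancy across a large corpus instead.
    rx = re.escape(pattern)
    rx = rx.replace("Q", '(?P<q>[^"]+)')
    rx = rx.replace("S", r'(?P<s>[A-Z][a-z]+(?: [A-Z][a-z]+)*)')
    return re.compile(rx)

def extract_pairs(articles, patterns):
    """Apply every pattern to every article, harvesting (quote, speaker) pairs."""
    pairs = set()
    for text in articles:
        for pat in patterns:
            for m in pattern_to_regex(pat).finditer(text):
                pairs.add((m.group("q"), m.group("s")))
    return pairs

def discover_patterns(articles, pairs, width=20):
    """A known (quote, speaker) pair seen in a new context yields a new
    pattern: abstract the concrete strings back to the Q and S slots."""
    new = set()
    for text in articles:
        for q, s in pairs:
            i = text.find(q)
            if i != -1 and s in text:
                ctx = text[max(0, i - width): i + len(q) + width]
                if s in ctx:
                    new.add(ctx.replace(q, "Q").replace(s, "S"))
    return new

def quootstrap(articles, seeds, rounds=2):
    patterns, pairs = set(seeds), set()
    for _ in range(rounds):
        pairs |= extract_pairs(articles, patterns)
        patterns |= discover_patterns(articles, pairs)
    return pairs

articles = ['"We will act", said Alice.', 'Alice stated: "We will act".']
pairs = quootstrap(articles, ['"Q", said S.'])
```

    Here the quote extracted by the seed pattern lets the loop learn the new pattern [S stated: "Q".] from the second article, which is the redundancy effect the abstract describes.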

    CofeNet: Context and Former-Label Enhanced Net for Complicated Quotation Extraction

    Quotation extraction aims to extract quotations from written text. There are three components in a quotation: the source is the holder of the quotation, the cue is the trigger word(s), and the content is the main body. Existing solutions for quotation extraction mainly utilize rule-based approaches and sequence labeling models. While rule-based approaches often lead to low recall, sequence labeling models struggle with quotations that have complicated structures. In this paper, we propose the Context and Former-Label Enhanced Net (CofeNet) for quotation extraction. CofeNet is able to extract complicated quotations with components of variable lengths and complicated structures. On two public datasets (i.e., PolNeAR and Riqua) and one proprietary dataset (i.e., PoliticsZH), we show that CofeNet achieves state-of-the-art performance on complicated quotation extraction. Comment: Accepted by COLING 202
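    The three components named above can be made concrete with a minimal record type; the field names and example are illustrative, not CofeNet's data model:

```python
from dataclasses import dataclass

@dataclass
class Quotation:
    source: str   # holder of the quotation (who is being quoted)
    cue: str      # trigger word(s) introducing the quotation
    content: str  # main body of the quotation

# In the sentence below, the three components have very different lengths,
# which is the kind of structure sequence labelers find hard.
q = Quotation(source="The minister",
              cue="said",
              content='the budget "will be balanced by 2025"')
```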

    FRACAS: A FRench Annotated Corpus of Attribution relations in newS

    Quotation extraction is a widely useful task both from a sociological and from a Natural Language Processing perspective. However, very little data is available to study this task in languages other than English. In this paper, we present a manually annotated corpus of 1676 newswire texts in French for quotation extraction and source attribution. We first describe the composition of our corpus and the choices that were made in selecting the data. We then detail the annotation guidelines and annotation process, as well as a few statistics about the final corpus and the obtained balance between quote types (direct, indirect and mixed, the last being particularly challenging). We end by detailing the inter-annotator agreement between the 8 annotators who worked on the manual labelling, which is substantial for such a difficult linguistic phenomenon.
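    The abstract does not state which agreement coefficient was used; Fleiss' kappa is one standard choice when more than two annotators label the same items, and it can be computed as follows (a sketch, not the FRACAS evaluation code):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for multiple annotators.

    ratings[i][c] = number of annotators who assigned item i to category c;
    every row must sum to the same number of annotators n.
    """
    N = len(ratings)                 # number of items
    n = sum(ratings[0])              # annotators per item
    k = len(ratings[0])              # number of categories
    # Observed per-item agreement P_i, then mean observed agreement.
    P = [sum(r * (r - 1) for r in row) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P) / N
    # Expected agreement from the category marginals.
    p = [sum(row[c] for row in ratings) / (N * n) for c in range(k)]
    P_e = sum(pc * pc for pc in p)
    return (P_bar - P_e) / (1 - P_e)
```

    For example, with 8 annotators, perfect agreement on every item gives kappa = 1, while a 4–4 split on every item between two categories gives a negative kappa, i.e. worse than chance.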

    A Graphical Approach to Progress for Structured Communication in Web Services

    We investigate a graphical representation of session invocation interdependency in order to prove progress for the pi-calculus with sessions under the usual session typing discipline. We show that those processes whose associated dependency graph is acyclic can be brought to reduce. We call such processes transparent processes. Additionally, we prove that for well-typed processes where services contain no free names, such acyclicity is preserved by the reduction semantics. Our results encompass programs (processes containing neither free nor restricted session channels) and higher-order sessions (delegation). Furthermore, we give examples suggesting that transparent processes constitute a large enough class of processes with progress to have applications in modern session-based programming languages for web services.Comment: In Proceedings ICE 2010, arXiv:1010.530
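    The key check above, whether the session dependency graph is acyclic, can be performed with a standard topological-sort pass (Kahn's algorithm). The edge-list representation of the graph is an assumption for illustration, not the paper's formalism:

```python
from collections import defaultdict

def is_acyclic(edges):
    """Kahn's algorithm: a directed graph is acyclic iff a topological
    order can consume every node."""
    graph = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        graph[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    # Start from nodes with no incoming dependency edges.
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for v in graph[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return seen == len(nodes)   # leftovers imply a cycle
```

    A chain of session interdependencies like s1 -> s2 -> s3 passes the check, whereas a mutual wait s1 -> s2 -> s1 is rejected, mirroring the transparent/non-transparent distinction in the abstract.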

    Integrated electronic prescribing and robotic dispensing: a case study

    INTRODUCTION: To quantify the benefits of electronic prescribing directly linked to a robotic dispensing machine. CASE DESCRIPTION: Quantitative case study analysis is used on a single case. Hospital A (1,000 beds) has used an integrated electronic prescribing system for 10 years, and in 2009 linked two robotic dispensing machines to the system. The impact on dispensing error rates (quality) and efficiency (costs) was assessed. EVALUATION AND DISCUSSION: The implementation delivered staff efficiencies above expectation. For the out-patient department, this was 16% more than the business case had suggested. For the in-patients dispensary, four staff were released for re-deployment. Additionally, £500,000 in stockholding efficiency above that suggested by the business case was identified. Overall dispensing error rates were not adversely affected, and products dispensed by the electronic prescribing and robot system produced zero dispensing errors. Dispensing speed also increased, as the electronic prescribing and robot combination permitted almost instantaneous dispensing from the point of a doctor entering a prescription. CONCLUSION: It was significant that the combination of electronic prescribing and a robot eliminated dispensing errors. Any errors that did occur were not a result of the electronic prescribing and robotic system (i.e. the product was not stocked within the robot). The direct linking of electronic prescribing and robots as a dispensing system produces efficiencies and improves the quality of the dispensing process.

    Terminomics methodologies and the completeness of reductive dimethylation: A meta-analysis of publicly available datasets

    © 2019 by the authors. Methods for analyzing the terminal sequences of proteins have been refined over the previous decade; however, few studies have evaluated the quality of the data produced by those methodologies. While performing global N-terminal labelling on bacteria, we observed that the labelling was not complete and investigated whether this was a common occurrence. We assessed the completeness of labelling in a selection of existing, publicly available N-terminomics datasets and empirically determined that amine-based labelling chemistry does not achieve complete labelling and potentially has issues with labelling amine groups at sequence-specific residues. This finding led us to conduct a thorough review of the historical literature, which showed that this is not an unexpected finding, with numerous publications reporting incomplete labelling. These findings have implications for the quantitation of N-terminal peptides and the biological interpretations of these data.

    A case study

    UIDB/03213/2020 UIDP/03213/2020
    The modelling and encoding of polylexical units, i.e. recurrent sequences of lexemes that are perceived as independent lexical units, is a topic that has not been covered adequately and in sufficient depth by the Guidelines of the Text Encoding Initiative (TEI), a de facto standard for the digital representation of textual resources in the scholarly research community. In this paper, we use the Dictionary of the Portuguese Academy of Sciences as a case study for presenting our ongoing work on encoding polylexical units using TEI Lex-0, an initiative aimed at simplifying and streamlining the encoding of lexical data with TEI in order to improve interoperability. We introduce the notion of macro- and microstructural relevance to differentiate between polylexicals that serve as headwords for their own independent dictionary entries and those which appear inside entries for different headwords. We develop the notion of lexicographic transparency to distinguish between those units which are not accompanied by an explicit definition and those that are: the former are encoded as –like constructs, whereas the latter become –like constructs, which can have further constraints imposed on them (sense numbers, domain labels, grammatical labels etc.). We codify the use of attributes on to encode different kinds of labels for polylexicals (implicit, explicit and normalised), concluding that the interoperability of lexical resources would be significantly improved if dictionary encoders had access to an expressive but relatively simple typology of polylexical units.