    'Healthy' Coreference: Applying Coreference Resolution to the Health Education Domain

    This thesis investigates coreference and its resolution within the domain of health education. Coreference is the relationship between two linguistic expressions that refer to the same real-world entity, and resolution involves identifying this relationship among sets of referring expressions. The coreference resolution task is considered among the most difficult problems in Artificial Intelligence; in some cases, resolution is impossible even for humans. For example, "she" in the sentence "Lynn called Jennifer while she was on vacation" is genuinely ambiguous: the vacationer could be either Lynn or Jennifer. There are three primary motivations for this thesis. The first is that health education has never before been studied in this context; so far, the vast majority of coreference research has focused on news. The second is that achieving domain-independent resolution is unlikely without understanding the extent to which coreference varies across different genres. The third is that coreference pervades language and is an essential part of coherent discourse. Its effective use is a key component of easy-to-understand health education materials, where readability is paramount. No suitable corpus of health education materials existed, so our first step was to create one. The comprehensive analysis of this corpus, which required manual annotation of coreference, confirmed our hypothesis that the coreference used in health education differs substantially from that in previously studied domains. This analysis was then used to shape the design of a knowledge-lean algorithm for resolving coreference. This algorithm performed surprisingly well on this corpus, e.g., successfully resolving over 85% of all pronouns when evaluated on unseen data. Despite the importance of coreferentially annotated corpora, only a handful are known to exist, likely because of the difficulty and cost of reliably annotating coreference. The paucity of genres represented in these existing annotated corpora creates an implicit bias in domain-independent coreference resolution. In an effort to address these issues, we plan to make our health education corpus available to the wider research community, hopefully encouraging a broader focus in the future.

    Anaphora resolution for Arabic machine translation: a case study of nafs

    PhD Thesis. In the age of the internet, email, and social media, there is an increasing need for processing online information, for example, to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest. This is reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text which is at least 90% accurate, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria. Egyptian Government

    Anaphora Resolution and Text Retrieval

    Empirical approaches based on qualitative or quantitative methods of corpus linguistics have become a central paradigm within linguistics. The series takes account of this fact and provides a platform for approaches within synchronic linguistics, as well as interdisciplinary works with a linguistic focus, which devise new ways of working empirically and develop new data-based methods and theoretical models for empirical linguistic analyses.

    Reflexives and tree unification grammar

    Critical Discourse Analysis and Rhetorical Tropes in Donald Trump’s First Speech to the UN

    Language and politics go hand in hand, and to learn and comprehend the political genre is to learn a language created for codifying, extending, and transmitting political discourse in any text or talk. Drawing upon the theoretical framework of Fairclough’s CDA and rhetoric, the current study investigates Donald Trump’s first speech to the UN, delivered on Sep. 19, 2017, in terms of the frequency and functions of selected rhetorical strategies (parallelism, anaphora and the power of three, antithesis and expletive, etc.), nominalization, passivization, we-groups, and modality, together with lexical and textual analysis. Specifically, the study seeks to determine: (1) how President Trump succeeded in conveying his notions and assumptions to his intended audience, and in convincing and negotiating; (2) how he attempted, explicitly and implicitly, to pass his attitudes on to his targets; (3) how those orientations, intended notions, and assumptions were seamlessly presented to his addressees at the discoursal and lexico-grammatical levels; and (4) how, through this underlying trend, he achieved his own ends. The results of the study are intended to help enhance reading comprehension and writing in academic registers for EFL/ESL students.

    PersoNER: Persian named-entity recognition

    © 1963-2018 ACL. Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent difficulty of training an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by a population of over a hundred million people worldwide. We first present and provide ArmanPersoNERCorpus, the first manually annotated Persian NER corpus. Then, we introduce PersoNER, an NER pipeline for Persian that leverages a word embedding and a sequential max-margin classifier. The experimental results show that the proposed approach is capable of achieving interesting MUC7 and CoNLL scores while outperforming two alternatives based on a CRF and a recurrent neural network.

    America’s Quandary: Masking Injustice: Ideological Analyses of America’s Moves Towards its Promise | A Pedagogical Primer on Rhetoric

    By Ronald Jerry Walker, under the direction of George Pullman, PhD, and Elizabeth Sanders Lopez, PhD. ABSTRACT: Rhetoric (persuasive discourse) and rhetorical analysis (the art and science of scrutinizing rhetorical texts) are invaluable aspects of composition pedagogy. Rhetoric commands our world. This dissertation project manifests four features. First, it reveals America’s promise through rhetorical artifact texts. Second, the project presents an academic investigation: America’s moves towards its promise. Third, it recounts the continuing injustices suffered by women and peoples of color, all hidden behind (rhetorical) masks, that continue to plague America. And fourth, this collection altogether serves as a pedagogical primer on rhetoric. Founding documents and public monuments serve as a ruse that masks injustice and inequality. This is America’s quandary, a reality that unfortunately escapes journalistic focus. Masks enable the American hegemony of sexism, racism, homophobia, xenophobia, and classism to thrive. This nation’s affluent “founders” completely ignored and erased the vast majority of peoples co-inhabiting our boundless lands: millions of indigenous Native American nations; women, not mentioned anywhere among America’s founding documents; abducted African (forced) laborers commanded to toil generational lives, as chattel; and poor whites. Later immigrants, especially peoples of color, would also be denied their “liberty and justice for all.” Today, the USA is World Number One: in obesity, in opioid addiction, in female prison incarceration, in male prison incarceration, in military defense spending, in military weaponry (international spread), in war (today Afghanistan, Syria, Iraq, Yemen, Somalia, Libya, Niger), and in international intrigue (arrogance). America’s quandary is our lack of universal healthcare, the antiquated Electoral College, the rising 1%, the shrinking middle class, the widening gap between America’s rich and poor, and our obsessive desire to police and command the world. This project interrogates rhetoric “to unmask and demystify” America’s rhetorical hegemony of disadvantage and inequality, while America faces a bleak future. We English teachers are First Responders for our culture. Our democratic republic must have, to survive, a committed populace of engaged citizens whose critical thinking, analytical reasoning, and civic responsibility can invigorate American culture. If We are successful, America will be successful; if We fail, America will fail. Perhaps if We could just make America better, all would work out. INDEX WORDS: Rhetorical analysis, Rhetorical devices, Critical thinking, Argument essa