    Optimization issues in machine learning of coreference resolution

    Towards Entity Status

    Discourse entities are an important construct in computational linguistics. They introduce an additional level of representation between referring expressions and that which they refer to: the level of mental representation. In this thesis, I first explore some semiotic and communication-theoretic aspects of discourse entities. Then, I develop the concept of "entity status". Entity status is a meta-variable that combines two dimensions of information about a discourse entity: structural information about the role that the entity plays in a discourse, and management information about how the entity is created, accessed, and updated. Finally, the concept is applied to two case studies: the first focusses on the choice of referring expressions in radio news, while the second looks at the conditions under which a discourse entity can be mentioned as a pronoun.

    Demonstrative anaphora: forms and functions in full-text scientific articles

    This study examines the functions and characteristics of demonstrative anaphora (this, these, that, those) in a collection of full-text scientific documents, confirming that they play an important role in maintaining discourse focus and binding together cohesive sections of text. Unlike corpora in other subject domains, the Cystic Fibrosis database contains more demonstrative expressions than any other class of anaphora. As participants in intersentential reference, demonstratives often refer to complex propositions rather than simple noun phrases. While this tendency complicates automated resolution, our results yield some suggestions toward a resolution algorithm. Primarily, we argue for the incorporation of demonstrative form, since different types of demonstratives show different patterns regarding antecedent length and composition. Although further analysis is necessary, our findings provide a groundwork for future exploration.

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    Guidance on the principles of language accessibility in National Curriculum Assessments: research background

    This review accompanies the document Guidance on the Principles of Language Accessibility in National Curriculum Assessments, which describes the principles that should guide the development of clear assessment questions. The purpose of the review is to present and discuss in detail the research underpinning these principles. It begins from the standpoint that National Curriculum assessments, indeed any assessments, should be: appropriate to the age of the pupils; an effective measure of their abilities, skills and concept development; and fair to all irrespective of gender, language, religion, ethnic or social origin or disability (Ofqual, 2011). The Regulatory Framework for National Assessments: National Curriculum and Early Years Foundation Stage (Ofqual, 2011) sets out a number of common criteria which apply to all aspects of the development and implementation of National Assessments. One of these criteria refers to the need for assessment procedures to minimise bias: “The assessment should minimise bias, differentiating only on the basis of each learner’s ability to meet National Curriculum requirements” (Section 5.39, page 16). The Framework goes on to argue that: “Minimising bias is about ensuring that an assessment does not produce unreasonably adverse outcomes for particular groups of learners” (Annex 1, page 29). This criterion reinforces the guiding principle that any form of assessment should provide information about the knowledge and understanding of relevant content material. That is to say, the means through which this knowledge and understanding is examined, the design of the assessment and the language used should as far as possible be transparent, and should not adversely influence the performance of those being assessed. There are clearly many ways in which any given assessment task can be presented and in which questions can be asked. Some of these ways will make the task more accessible (that is, easier to complete successfully) and some will get in the way of successful completion.
    Section 26 of the Fair Access by Design (Ofqual, 2010) document lists a number of guiding principles for improving the accessibility of assessment questions, although the research basis for these principles is not made completely clear in that document. The aim of the current review is to examine the research background more closely in order to provide a more substantial basis for a renewed set of principles to underpin the concept of language accessibility. In the review, each section will be prefaced by a statement of the principles outlined in Guidance on the Principles of Language Accessibility in National Curriculum Assessments and then the research evidence underpinning these principles will be reviewed.

    Anaphora resolution for Arabic machine translation: a case study of nafs

    PhD Thesis. In the age of the internet, email, and social media there is an increasing need for processing online information, for example, to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest. This is reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text which is accurate to at least 90%, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria.
    Egyptian Government

    The cross-cultural use of sample surveys: problems of comparability

    'This article (first published in 1968) deals with the following problems of cross-cultural research: change in the identification of problem areas; question meaning and problems of verbal communication; equivalence of indicators; the respondent as a unit in design and analysis; the usage of 'culture' in cross-cultural surveys; administrative and diplomatic problems; and some social effects of comparative social research.' (author's abstract)