
    Linguistics and LIS: A Research Agenda

    Linguistics and Library and Information Science (LIS) are both interdisciplinary fields that draw from areas such as languages, psychology, sociology, cognitive science, computer science, anthropology, education, and management. The theories and methods of linguistic research can have significant explanatory power for LIS. This article presents a research agenda for LIS that proposes the use of linguistic analysis methods, including discourse analysis, typology, and genre theory.

    A role for genre-based pedagogy in academic writing instruction?: an EAP perspective

    In this paper I discuss the use of genre as a theoretical construct in academic writing instruction in the context of English for Academic Purposes (EAP) courses. I begin by considering the notion of discourse competence as a concept that accounts for the knowledge elements and skills employed by expert academic writers, and then consider genre as a way of operationalizing the different elements of discourse competence knowledge for the purpose of writing instruction. I briefly review the diversity of approaches to theorizing genre knowledge, and then present the dual social genre/cognitive genre approach that I have used as a basis for research and course design in an EAP context. I exemplify this model by summarizing the key elements of two studies of research genres in which I have used it. I conclude with a brief theoretical discussion of the issue of construct validity in relation to using the concept of genre in research that relates to writing instruction.

    Conventions and mutual expectations — understanding sources for web genres

    Genres can be understood in many different ways. They are often perceived as a primarily sociological construction, or, alternatively, as a stylostatistically observable objective characteristic of texts. The latter view is more common in the research field of information and language technology. These two views can be quite compatible and can inform each other; the present investigation discusses knowledge sources for studying genre variation and change by observing reader and author behaviour rather than performing analyses on the information objects themselves.

    SLIS Student Research Journal, Vol.7, Iss.1


    A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

    This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. In addition to being one of the largest corpora available for the task of NLI, at 433k examples, this corpus improves upon available resources in its coverage: it offers data from ten distinct genres of written and spoken English--making it possible to evaluate systems on nearly the full complexity of the language--and it offers an explicit setting for the evaluation of cross-genre domain adaptation. Comment: 10 pages, 1 figure, 5 tables. v2 corrects a misreported accuracy number for the CBOW model in the 'matched' setting. v3 adds a discussion of the difficulty of the corpus to the analysis section. v4 is the version that was accepted to NAACL201

    Nations in news: ordinary stereotypes in national TV news coverage of Spain and Germany

    This contribution investigates the stereotyping of nations in TV news text. It compares the headline appearances of the names Germany and Spain on each other's leading national evening TV news program during the peak of the European financial crisis (2011-13). The paper combines quantitative analysis of word frequency and topic distribution in a 621-headline corpus with in-depth case analysis of the news values underpinning 32 extracted headline examples. A discussion of literature in media anthropology and Critical Discourse Analysis concludes with the argument that intentions and consequences of media discourse should be separated, whereas differences between ordinary and official language should not be overvalued. The case study shows how the textual display of Germans and Spaniards supports the everyday imagining of national belonging, how othering works through the labelling of nations as “economies”, and how negativity, competition and relatedness are prevailing values underlying the examined news headlines.

    All mixed up? Finding the optimal feature set for general readability prediction and its application to English and Dutch

    Readability research has a long and rich tradition, but there has been too little focus on general readability prediction without targeting a specific audience or text genre. Moreover, though NLP-inspired research has focused on adding more complex readability features, there is still no consensus on which features contribute most to the prediction. In this article, we investigate in close detail the feasibility of constructing a readability prediction system for English and Dutch generic text using supervised machine learning. Based on readability assessments by both experts and a crowd, we implement different types of text characteristics, ranging from easy-to-compute superficial text characteristics to features requiring deep linguistic processing, resulting in ten different feature groups. Both a regression and a classification setup are investigated, reflecting the two possible readability prediction tasks: scoring individual texts or comparing two texts. We show that going beyond correlation calculations for readability optimization, using a wrapper-based genetic algorithm optimization approach, is a promising task which provides considerable insight into which feature combinations contribute to the overall readability prediction. Since we also have gold standard information available for those features requiring deep processing, we are able to investigate the true upper bound of our Dutch system. Interestingly, we observe that the performance of our fully automatic readability prediction pipeline is on par with the pipeline using gold-standard deep syntactic and semantic information.