
    A Formal Framework for Linguistic Annotation

    `Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions -- audio, video and/or physiological recordings -- or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, `named entity' identification, co-reference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats. Comment: 49 pages
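
    To make the annotation-graph idea concrete, here is a minimal sketch in Python: time-anchored nodes joined by labelled arcs, where each arc belongs to an annotation layer (word, part-of-speech, coreference, and so on). The class and field names are illustrative choices, not notation taken from the paper.

```python
# Illustrative sketch of an annotation graph; names are not taken from the paper.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    id: int
    time: Optional[float] = None   # offset into the audio/video signal, if anchored


@dataclass
class Arc:
    start: Node
    end: Node
    layer: str                     # annotation layer, e.g. "word", "pos", "coref"
    label: str                     # the annotation content itself


@dataclass
class AnnotationGraph:
    nodes: List[Node] = field(default_factory=list)
    arcs: List[Arc] = field(default_factory=list)

    def layer_arcs(self, layer: str) -> List[Arc]:
        """Return every arc belonging to one annotation layer."""
        return [a for a in self.arcs if a.layer == layer]


# The word "cat", tagged as a noun, spanning 0.5-0.9 seconds of the signal.
g = AnnotationGraph()
n1, n2 = Node(1, 0.5), Node(2, 0.9)
g.nodes.extend([n1, n2])
g.arcs.append(Arc(n1, n2, "word", "cat"))
g.arcs.append(Arc(n1, n2, "pos", "NN"))
print([a.label for a in g.layer_arcs("pos")])   # ['NN']
```

    Because several layers can span the same pair of nodes, transcription, tagging and higher-level annotation live in one structure, which is the sense in which the graph stays compatible with many alternative file formats.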

    IMAGINE Final Report


    A Topic-Agnostic Approach for Identifying Fake News Pages

    Fake news and misinformation have been increasingly used to manipulate popular opinion and influence political processes. To better understand fake news, how it is propagated, and how to counter its effect, it is necessary to first identify it. Recently, approaches have been proposed to automatically classify articles as fake based on their content. An important challenge for these approaches comes from the dynamic nature of news: as new political events are covered, topics and discourse constantly change, and thus a classifier trained on content from articles published at a given time is likely to become ineffective in the future. To address this challenge, we propose a topic-agnostic (TAG) classification strategy that uses linguistic and web-markup features to identify fake news pages. We report experimental results using multiple data sets which show that our approach attains high accuracy in the identification of fake news, even as topics evolve over time. Comment: Accepted for publication in the Companion Proceedings of the 2019 World Wide Web Conference (WWW'19 Companion). Presented in the 2019 International Workshop on Misinformation, Computational Fact-Checking and Credible Web (MisinfoWorkshop2019). 6 pages
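
    As a rough illustration of the topic-agnostic idea, the sketch below (Python with scikit-learn) replaces topic-bound content words with simple linguistic surface statistics and web-markup counts before training a standard classifier. The specific features and the random-forest model are assumptions made for illustration, not the paper's actual TAG feature set.

```python
# Illustrative sketch only; the paper's actual TAG features and model are not reproduced here.
import re
from typing import List, Tuple

from sklearn.ensemble import RandomForestClassifier


def linguistic_features(text: str) -> List[float]:
    # Topic-agnostic surface statistics instead of content words.
    words = re.findall(r"\w+", text)
    n = max(len(words), 1)
    return [
        float(len(words)),                       # text length
        sum(len(w) for w in words) / n,          # mean word length
        sum(w.isupper() for w in words) / n,     # proportion of all-caps words
        float(text.count("!") + text.count("?")),
    ]


def markup_features(html: str) -> List[float]:
    # Simple web-markup cues taken from the page source.
    return [float(html.count(tag)) for tag in ("<script", "<iframe", "<a ", "<img")]


def featurize(text: str, html: str) -> List[float]:
    return linguistic_features(text) + markup_features(html)


def train(pages: List[Tuple[str, str]], labels: List[int]) -> RandomForestClassifier:
    """pages: (article_text, page_html) pairs; labels: 1 = fake, 0 = legitimate."""
    X = [featurize(text, html) for text, html in pages]
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, labels)
    return clf
```

    Because none of these features depend on which entities or events a story mentions, a model trained this way degrades less as news topics shift, which is the property the TAG strategy targets.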

    Natural language processing and advanced information management

    Integrating diverse information sources and application software in a principled and general manner will require a very capable advanced information management (AIM) system. In particular, such a system will need a comprehensive addressing scheme to locate material in its docuverse. It will also need a natural language processing (NLP) system of great sophistication. It seems that the NLP system must serve three functions. First, it provides a natural language interface (NLI) for the users. Second, it serves as the core component that understands and makes use of the real-world interpretations (RWIs) contained in the docuverse. Third, it enables the reasoning specialists (RSs) to arrive at conclusions that can be transformed into procedures satisfying the users' requests. The best candidate for an intelligent agent that can make satisfactory use of RSs and transform documents (TDs) appears to be an object-oriented database (OODB): OODBs appear to have an inherent capacity to handle the large numbers of RSs and TDs that an AIM system will require, and to use them effectively.
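
    One way to picture the three NLP roles listed above is as a single abstract interface. The sketch below is purely illustrative: the class and method names (NLPSystem, interpret_request, ground_document, plan) are hypothetical stand-ins for the NLI, RWI and RS functions, not anything specified in the report.

```python
# Purely illustrative; method names are hypothetical, not drawn from the report.
from abc import ABC, abstractmethod
from typing import Dict, List


class NLPSystem(ABC):
    """The three roles the abstract assigns to the NLP component of an AIM system."""

    @abstractmethod
    def interpret_request(self, utterance: str) -> Dict:
        """NLI role: turn a user's natural-language request into a structured query."""

    @abstractmethod
    def ground_document(self, address: str) -> Dict:
        """RWI role: recover the real-world interpretation of an item in the docuverse."""

    @abstractmethod
    def plan(self, query: Dict, interpretations: List[Dict]) -> List[str]:
        """RS role: hand structured material to reasoning specialists and return
        a procedure (a list of steps) that satisfies the user's request."""
```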

    Handbook of Easy Languages in Europe

    The Handbook of Easy Languages in Europe describes what Easy Language is and how it is used in European countries. It demonstrates the great diversity of actors, instruments and outcomes related to Easy Language throughout Europe. All people, despite their limitations, have an equal right to information, inclusion, and social participation. This results in requirements for understandable language. The notion of Easy Language refers to modified forms of standard languages that aim to facilitate reading and language comprehension. This handbook describes the historical background, the principles and the practices of Easy Language in 21 European countries. Its topics include terminological definitions, legal status, stakeholders, target groups, guidelines, practical outcomes, education, research, and a reflection on future perspectives related to Easy Language in each country. Written in an academic yet interesting and understandable style, this Handbook of Easy Languages in Europe aims to find a wide audience.

    EFL teaching through English-Practice Work Stations (EPWS) to enhance participation and interaction in English for third-grade learners

    Thesis (English Pedagogy). Teachers all over the world are constantly searching for new activities, strategies and methodologies that can make their lessons more satisfying for their learners and enhance both the learning process and its results. This is a titanic effort for teachers who lack the time needed to plan their lessons and design their activities, although numerous websites and teachers on social networks are willing to share ideas. While carrying out the preliminary research for this thesis, the authors came across Debbie Diller's book "Literacy Work Stations: Making Centers Work" (2003), which explains how Literacy Work Stations (henceforth, LWS) operate. Several aspects of this method closely resemble the approach two of the authors experienced at pre-elementary school, where instruction was organized into stations that learners worked at for short periods before rotating, so that they could practise different subjects. The researchers believe this methodology produced a faster and deeper development of the four English language skills for them. The main purpose of this research is therefore to assess its usefulness and to propose an innovative strategy to enhance participation and interaction, not only in Spanish but mainly in English, within English as a Foreign Language (henceforth, EFL) lessons. The research concerns the implementation of a strategy for elementary grades, called English-Practice Work Stations, to enhance participation and interaction during EFL lessons.

    All mixed up? Finding the optimal feature set for general readability prediction and its application to English and Dutch

    Readability research has a long and rich tradition, but there has been too little focus on general readability prediction that does not target a specific audience or text genre. Moreover, though NLP-inspired research has focused on adding ever more complex readability features, there is still no consensus on which features contribute most to the prediction. In this article, we investigate in close detail the feasibility of constructing a readability prediction system for generic English and Dutch text using supervised machine learning. Based on readability assessments by both experts and a crowd, we implement different types of text characteristics, ranging from easy-to-compute superficial features to features requiring deep linguistic processing, resulting in ten feature groups. Both a regression and a classification setup are investigated, reflecting the two possible readability prediction tasks: scoring individual texts or comparing two texts. We show that going beyond correlation calculations and optimizing readability prediction with a wrapper-based genetic algorithm is promising: it provides considerable insight into which feature combinations contribute to the overall prediction. Since gold-standard information is also available for the features requiring deep processing, we are able to investigate the true upper bound of our Dutch system. Interestingly, the performance of our fully automatic readability prediction pipeline is on par with the pipeline using gold-standard deep syntactic and semantic information.
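
    As a rough sketch of the wrapper-based optimization described above, a small genetic algorithm can search over binary masks of feature groups, scoring each mask by the cross-validated performance of a readability regressor. The group count, the ridge regressor and the GA settings below are placeholder assumptions, not the article's actual setup.

```python
# Toy sketch only; group layout, regressor and GA settings are placeholder assumptions.
import random

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def fitness(mask, feature_groups, y):
    """Score a binary mask of feature groups by cross-validated R^2 of a ridge regressor."""
    selected = [X for keep, X in zip(mask, feature_groups) if keep]
    if not selected:
        return float("-inf")
    return cross_val_score(Ridge(), np.hstack(selected), y, cv=5, scoring="r2").mean()


def ga_select(feature_groups, y, pop_size=20, generations=30, seed=0):
    """feature_groups: list of (n_texts, n_features_i) arrays, one array per feature group."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in feature_groups] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, feature_groups, y), reverse=True)
        parents = pop[: pop_size // 2]                      # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))
            child = a[:cut] + b[cut:]                       # one-point crossover
            if rng.random() < 0.1:                          # mutation: flip one group bit
                flip = rng.randrange(len(child))
                child[flip] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, feature_groups, y))
```

    For the pairwise comparison task, the same wrapper can score a classifier on text pairs instead of a regressor on individual texts; only the fitness function changes.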