    Multilayer Network of Language: a Unified Framework for Structural Analysis of Linguistic Subsystems

    Recently, the focus of complex networks research has shifted from the analysis of isolated properties of a system toward a more realistic modeling of multiple phenomena: multilayer networks. Motivated by the success of the multilayer approach in social, transport or trade systems, we propose the introduction of multilayer networks for language. The multilayer network of language is a unified framework for modeling linguistic subsystems and their structural properties, enabling the exploration of their mutual interactions. Various aspects of natural language systems can be represented as complex networks, whose vertices depict linguistic units, while links model their relations. The multilayer network of language is defined by three aspects: the network construction principle, the linguistic subsystem and the language of interest. More precisely, we construct word-level (syntax, co-occurrence and its shuffled counterpart) and subword-level (syllables and graphemes) network layers from five variations of the original text (in the modeled language). The obtained results suggest that there are substantial differences between the network structures of different language subsystems, which are hidden during the exploration of an isolated layer. The word-level layers share structural properties regardless of the language (e.g. Croatian or English), while the syllabic subword level expresses more language-dependent structural properties. The preserved weighted overlap quantifies the similarity of word-level layers in weighted and directed networks. Moreover, the analysis of motifs reveals a close topological structure of the syntactic and syllabic layers for both languages. The findings corroborate that the multilayer network framework is a powerful, consistent and systematic approach to model several linguistic subsystems simultaneously and hence to provide a more unified view on language.

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Generating readable texts for readers with low basic skills

    Most NLG systems generate texts for readers with good reading ability, but SkillSum adapts its output for readers with poor literacy. Evaluation with low-skilled readers confirms that SkillSum's knowledge-based microplanning choices enhance readability. We also discuss future readability improvements.

    Automatic Detection of Online Jihadist Hate Speech

    We have developed a system that automatically detects online jihadist hate speech with over 80% accuracy, by using techniques from Natural Language Processing and Machine Learning. The system is trained on a corpus of 45,000 subversive Twitter messages collected from October 2014 to December 2016. We present a qualitative and quantitative analysis of the jihadist rhetoric in the corpus, examine the network of Twitter users, outline the technical procedure used to train the system, and discuss examples of use. Comment: 31 pages

    Metadata and ontologies for organizing students’ memories and learning: standards and convergence models for context awareness

    This paper focuses on ontologies supporting context awareness and Personal Information Management (PIM) and their applicability in the Memex Metadata (M2) project. M2 is a research project of the University of North Carolina at Chapel Hill to improve student digital memories using the tablet PC, Microsoft’s SenseCam technology, and other mobile technologies (e.g., a GPS device) to capture context. The M2 project offers new opportunities for studying students’ learning with digital technologies. This paper introduces the M2 project; discusses E-portfolios and current educational trends related to pervasive computing; reviews relevant ontologies and their relationship to the project’s CAF (context awareness framework); and concludes by identifying future research directions.