Multilayer Network of Language: a Unified Framework for Structural Analysis of Linguistic Subsystems
Recently, the focus of complex networks research has shifted from the
analysis of isolated properties of a system toward a more realistic modeling of
multiple phenomena: multilayer networks. Motivated by the success of the
multilayer approach in social, transport and trade systems, we propose
introducing multilayer networks for language. The multilayer network of
language is a unified framework for modeling linguistic subsystems and their
structural properties, enabling the exploration of their mutual interactions.
Various aspects of natural language systems can be represented as complex
networks, whose vertices depict linguistic units, while links model their
relations. The multilayer network of language is defined by three aspects: the
network construction principle, the linguistic subsystem and the language of
interest. More precisely, we construct word-level (syntax, co-occurrence and
its shuffled counterpart) and subword-level (syllables and graphemes) network
layers from five variations of the original text (in the modeled language). The
obtained results suggest that there are substantial differences between the
network structures of different language subsystems, which remain hidden when
an isolated layer is explored. The word-level layers share structural
properties regardless of the language (e.g. Croatian or English), while the
syllabic subword level expresses more language dependent structural properties.
The preserved weighted overlap quantifies the similarity of word-level layers
in weighted and directed networks. Moreover, the analysis of motifs reveals a
close topological structure of the syntactic and syllabic layers for both
languages. The findings corroborate that the multilayer network framework is a
powerful, consistent and systematic approach to modeling several linguistic
subsystems simultaneously and hence to providing a more unified view of language.
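To make the idea of a co-occurrence layer, its shuffled counterpart and a weighted layer-similarity score concrete, here is a minimal sketch. It rests on assumptions the abstract does not settle: co-occurrence is taken as "word v immediately follows word u" (window of 1), the shuffled counterpart is a word-level permutation of the whole text, and the similarity score is a plain weighted Jaccard index used as a stand-in. It is not the paper's "preserved weighted overlap", whose exact definition is not given here. All function names are hypothetical.

```python
import random
from collections import Counter


def cooccurrence_layer(tokens):
    """Directed, weighted co-occurrence layer (assumed window of 1):
    edge (u, v) has weight = number of times word v directly follows word u."""
    return Counter(zip(tokens, tokens[1:]))


def shuffled_layer(tokens, seed=0):
    """Same construction on a randomly permuted copy of the text.
    Word frequencies are kept, word order is destroyed (assumed shuffling scheme)."""
    shuffled = list(tokens)
    random.Random(seed).shuffle(shuffled)
    return cooccurrence_layer(shuffled)


def weighted_overlap(layer_a, layer_b):
    """Stand-in similarity (weighted Jaccard), NOT the paper's measure:
    sum of min edge weights divided by sum of max edge weights over the
    union of directed edges. Returns 0.0 when both layers are empty."""
    edges = set(layer_a) | set(layer_b)
    if not edges:
        return 0.0
    shared = sum(min(layer_a[e], layer_b[e]) for e in edges)
    total = sum(max(layer_a[e], layer_b[e]) for e in edges)
    return shared / total


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat slept"
    tokens = text.split()
    original = cooccurrence_layer(tokens)
    randomized = shuffled_layer(tokens, seed=1)
    # A layer compared with itself scores 1.0; the shuffled layer usually scores lower.
    print("self-overlap:", weighted_overlap(original, original))
    print("original vs shuffled:", weighted_overlap(original, randomized))
```

Comparing a layer with its shuffled counterpart in this way separates structure that comes from word order from structure that comes only from word frequencies, since shuffling keeps the total edge weight (number of adjacent word pairs) unchanged.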
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.
Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Generating readable texts for readers with low basic skills
Most natural language generation (NLG) systems generate texts for readers with good reading ability, but SkillSum adapts its output for readers with poor literacy. Evaluation with low-skilled readers confirms that SkillSum's knowledge-based microplanning choices enhance readability. We also discuss future readability improvements.
Automatic Detection of Online Jihadist Hate Speech
We have developed a system that automatically detects online jihadist hate
speech with over 80% accuracy using techniques from Natural Language
Processing and Machine Learning. The system is trained on a corpus of 45,000
subversive Twitter messages collected from October 2014 to December 2016. We
present a qualitative and quantitative analysis of the jihadist rhetoric in the
corpus, examine the network of Twitter users, outline the technical procedure
used to train the system, and discuss examples of use.
Comment: 31 pages
Metadata and ontologies for organizing students’ memories and learning: standards and convergence models for context awareness
This paper focuses on ontologies supporting context awareness and Personal
Information Management (PIM) and their applicability in the Memex Metadata (M2)
project. M2 is a research project of the University of North Carolina at Chapel
Hill to improve students' digital memories using the tablet PC, Microsoft's
SenseCam technology, and other mobile technologies (e.g., a GPS device) to
capture the context of learning. The M2 project offers new opportunities for
studying students' learning with digital technologies. This paper introduces the
M2 project; discusses e-portfolios and current educational trends related to
pervasive computing and other emerging technologies; reviews relevant ontologies
and their relationship to the project's CAF (Context Awareness Framework); and
concludes by identifying future research directions.