Introduction: Modeling, Learning and Processing of Text-Technological Data Structures
Researchers in many disciplines, sometimes working in close cooperation, have been concerned with modeling textual data in order to account for texts as the prime information unit of written communication. The list of disciplines includes computer science and linguistics as well as more specialized disciplines like computational linguistics and text technology. What many of these efforts have in common is the aim to model textual data by means of abstract data types or data structures that support at least the semi-automatic processing of texts in any area of written communication.
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.
Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Flexible RDF data extraction from Wiktionary - Leveraging the power of community build linguistic wikis
We present a declarative approach, implemented in a comprehensive open-source framework (based on DBpedia), to extract lexical-semantic resources (an ontology about language use) from Wiktionary. The data currently includes language, part of speech, senses, definitions, synonyms, taxonomies (hyponyms, hyperonyms, synonyms, antonyms) and translations for each lexical word. The main focus is on flexibility with respect to the loose schema and on configurability for the differing language editions of Wiktionary. This is achieved by a declarative mediator/wrapper approach. The goal is to allow the addition of languages by configuration alone, without the need for programming, thus enabling the swift and resource-conserving adaptation of wrappers by domain experts. The extracted data is as fine-grained as the source data in Wiktionary and additionally follows the lemon model. It enables use cases like disambiguation or machine translation. By offering a linked data service, we hope to extend DBpedia’s central role in the LOD infrastructure to the world of Open Linguistics.
Searching textual and model-based process descriptions based on a unified data format
Documenting business processes using process models is common practice in many organizations. However, not all process information is best captured in process models. Hence, many organizations complement these models with textual descriptions that specify additional details. The problem with this supplementary use of textual descriptions is that existing techniques for automatically searching process repositories are limited to process models. They cannot take the information from textual descriptions into account and therefore provide incomplete search results. In this paper, we address this problem and propose a technique that is capable of searching textual as well as model-based process descriptions. It automatically extracts activity-related and behavioral information from both description types and stores it in a unified data format. An evaluation with a large Austrian bank demonstrates that the additional consideration of textual descriptions allows us to identify more relevant processes from a repository.
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Lingmotif: A User-Focused Sentiment Analysis Tool
In this paper, we describe Lingmotif, a lexicon-based, linguistically-motivated, user-friendly, GUI-enabled, multi-platform Sentiment Analysis (SA) desktop application. Lingmotif can perform SA on any type of input text, regardless of its length and topic. The analysis is based on the identification of sentiment-laden words and phrases contained in the application's rich core lexicons, and employs context rules to account for sentiment shifters. It offers easy-to-interpret visual representations of quantitative data, as well as a detailed, qualitative analysis of the text in terms of its sentiment. Lingmotif can also take user-provided plugin lexicons in order to account for domain-specific sentiment expression. As of version 1.0, Lingmotif analyzes English and Spanish texts. Lingmotif thus aims to become a general-purpose Sentiment Analysis tool for discourse analysis, rhetoric, psychology, marketing, the language industries, and others.
This research was supported by Spain’s MINECO through the funding of project Lingmotif2 (FFI2016-78141-P).