467 research outputs found
Traitement automatique des données hétérogènes liées à l'aménagement des territoires
National audienceLa notion d'aménagement du territoire fait référence à différents concepts tels que les informations spatiales et temporelles, les acteurs, les opinions, l'histoire, la politique, etc. Aujourd'hui, avec le développement des technologies numériques (blogs, forums, réseaux sociaux, etc.), l'ensemble des acteurs impliqués s'expriment et tous les documents textuels ainsi produits constituent une source considérable d'informations qu'il est crucial d'analyser. Dans cet article, nous souhaitons poser les premières bases d'une méthode automatique d'extraction de connaissances permettant d'analyser le ressenti (opinion et/ou sentiment) des acteurs impliqués à partir d'un corpus de données totalement hétérogènes constitués spécifiquement pour un territoire. Une telle approche, qui se situe dans le domaine de la science des données, offrira aux décideurs et aux usagers d'un territoire un environnement leur permettant d'en obtenir les clefs de lecture et d'en mesurer tous les enjeux et les contours
Models of Scholarly Communication and Citation Analysis
Informetric/bibliometric analyses have to a large extent been relying on an assumption that research is essentially cumulative in its nature, which is not the least visible in the rational for using citation analyses to assess quality of research. However, when reviewing both the theoretical literature on how research is organized and studies analyzing the structures of research fields through informetric mapping methods, it becomes clear that cumulative organization is just one category of several ways of organizing research and scholarly communication, Consequently, the way the role of citations is interpreted in research assessment has to be revised. Based on the review of previous research, this paper suggests a model for categorizing different modes of scholarly communication. We test this model through three different kinds of semantic labelling analyses on abstracts and research papers from the fields of biomedicine, computer science and educational research. The model proposed suggests three main categories of scholarly communication: cumulative, negotiating and distinctive; and when matching the labels identified in the semantic analysis to the three categories, we find evidence of the three different ways of communicating research that supports the model
Recommended from our members
Neural approaches to discourse coherence: modeling, evaluation and application
Discourse coherence is an important aspect of text quality that refers to the way different textual units relate to each other. In this thesis, I investigate neural approaches to modeling discourse coherence. I present a multi-task neural network where the main task is to predict a document-level coherence score and the secondary task is to learn word-level syntactic features. Additionally, I examine the effect of using contextualised word representations in single-task and multi-task setups. I evaluate my models on a synthetic dataset where incoherent documents are created by shuffling the sentence order in coherent original documents. The results show the efficacy of my multi-task learning approach, particularly when enhanced with contextualised embeddings, achieving new state-of-the-art results in ranking the coherent documents higher than the incoherent ones (96.9%). Furthermore, I apply my approach to the realistic domain of people’s everyday writing, such as emails and online posts, and further demonstrate its ability to capture various degrees of coherence. In order to further investigate the linguistic properties captured by coherence models, I create two datasets that exhibit syntactic and semantic alterations. Evaluating different models on these datasets reveals their ability to capture syntactic perturbations but their inadequacy to detect semantic changes. I find that semantic alterations are instead captured by models that first build sentence representations from averaged word embeddings, then apply a set of linear transformations over input sentence pairs. Finally, I present an application for coherence models in the pedagogical domain. I first demonstrate that state of-the-art neural approaches to automated essay scoring (AES) are not robust to adversarially created, grammatical, but incoherent sequences of sentences. Accordingly, I propose a framework for integrating and jointly training a coherence model with a state-of-the-art neural AES system in order to enhance its ability to detect such adversarial input. I show that this joint framework maintains a performance comparable to the state-of-the-art AES system in predicting a holistic essay score while significantly outperforming it in adversarial detection
Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-‐it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted into its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-‐it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges
Recommended from our members
Social Network Extraction from Text
In the pre-digital age, when electronically stored information was non-existent, the only ways of creating representations of social networks were by hand through surveys, inter- views, and observations. In this digital age of the internet, numerous indications of social interactions and associations are available electronically in an easy to access manner as structured meta-data. This lessens our dependence on manual surveys and interviews for creating and studying social networks. However, there are sources of networks that remain untouched simply because they are not associated with any meta-data. Primary examples of such sources include the vast amounts of literary texts, news articles, content of emails, and other forms of unstructured and semi-structured texts.
The main contribution of this thesis is the introduction of natural language processing and applied machine learning techniques for uncovering social networks in such sources of unstructured and semi-structured texts. Specifically, we propose three novel techniques for mining social networks from three types of texts: unstructured texts (such as literary texts), emails, and movie screenplays. For each of these types of texts, we demonstrate the utility of the extracted networks on three applications (one for each type of text)
- …