Exploiting Text and Network Context for Geolocation of Social Media Users

Author: Baldwin Timothy
Cohn Trevor
Rahimi Afshin
Vu Duy
Publication date: 01/01/2015

Research on automatically geolocating social media users has conventionally been based on either the text content of a given user's posts or the user's social network, with very little crossover between the two and no benchmarking of the two approaches over comparable datasets. We bring the two threads of research together by first proposing a text-based method based on adaptive grids, followed by a hybrid network- and text-based method. Evaluating over three Twitter datasets, we show that the empirical difference between text- and network-based methods is not great, and that hybridisation of the two is superior to the component methods, especially in contexts where the user graph is not well connected. We achieve state-of-the-art results on all three datasets.
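As a rough illustration of what an adaptive grid can look like, the sketch below splits a set of (latitude, longitude) points at the median, alternating axes, until each cell holds at most a fixed number of points. This k-d-tree-style partition is an assumption made for illustration; the abstract does not say how the grid is built, and the function name `build_cells`, the parameter `bucket_size`, and the sample coordinates are all hypothetical.

```python
def build_cells(points, bucket_size, depth=0):
    """Split points at the median, alternating lat/lon, until cells are small.

    Illustrative k-d-tree-style partition (an assumption, not the paper's code).
    Dense regions end up covered by many small cells, sparse regions by few.
    """
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    if len(points) <= bucket_size:
        return [points]
    axis = depth % 2  # 0 = latitude, 1 = longitude
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (build_cells(ordered[:mid], bucket_size, depth + 1)
            + build_cells(ordered[mid:], bucket_size, depth + 1))


# Hypothetical (lat, lon) coordinates of training posts.
points = [
    (40.7, -74.0), (40.8, -73.9), (34.0, -118.2), (34.1, -118.3),
    (41.9, -87.6), (41.8, -87.7), (29.8, -95.4), (29.7, -95.3),
]
cells = build_cells(points, bucket_size=2)
```

Each resulting cell can then serve as a class label for a text classifier, so that predicting a user's location reduces to predicting a cell.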

arXiv.org e-Print Archive

University of Queensland eSpace

Controlling for Unobserved Confounds in Classification Using Correlational Constraints

Author: Culotta Aron
Landeiro Virgile
Publication date: 03/05/2017

As statistical classifiers become integrated into real-world applications, it is important to consider not only their accuracy but also their robustness to changes in the data distribution. In this paper, we consider the case where there is an unobserved confounding variable $z$ that influences both the features $\mathbf{x}$ and the class variable $y$. When the influence of $z$ changes from training to testing data, we find that the classifier accuracy can degrade rapidly. In our approach, we assume that we can predict the value of $z$ at training time with some error. The prediction for $z$ is then fed to Pearl's back-door adjustment to build our model. Because of the attenuation bias caused by measurement error in $z$, standard approaches to controlling for $z$ are ineffective. In response, we propose a method to properly control for the influence of $z$ by first estimating its relationship with the class variable $y$, then updating predictions for $z$ to match that estimated relationship. By adjusting the influence of $z$, we show that we can build a model that exceeds competing baselines on accuracy as well as on robustness over a range of confounding relationships.

Comment: 9 page
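For readers unfamiliar with the back-door adjustment named in the abstract above, the standard formula sums the confounder out: p(y | do(x)) = Σ_z p(y | x, z) p(z). The sketch below applies that formula to a single input with a binary confounder. It shows only the textbook adjustment, not the paper's correction for noisy predictions of z; the function name `backdoor_adjust` and all probabilities are hypothetical.

```python
def backdoor_adjust(p_y_given_xz, p_z):
    """Return p(y | do(x)) = sum_z p(y | x, z) * p(z) for one fixed input x.

    p_y_given_xz maps each confounder value z to a dict {label: p(y | x, z)}.
    p_z maps each confounder value z to its marginal probability p(z).
    """
    labels = next(iter(p_y_given_xz.values())).keys()
    return {
        y: sum(p_y_given_xz[z][y] * p_z[z] for z in p_z)
        for y in labels
    }


# Hypothetical numbers: a binary confounder z and two class labels.
p_z = {0: 0.5, 1: 0.5}
p_y_given_xz = {
    0: {"pos": 0.2, "neg": 0.8},
    1: {"pos": 0.6, "neg": 0.4},
}
adjusted = backdoor_adjust(p_y_given_xz, p_z)
```

Because the weights come from the marginal p(z) rather than from p(z | x), the prediction no longer depends on how strongly z and x happen to be correlated in the training data.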

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Predicting the Law Area and Decisions of French Supreme Court Cases

Author: Sulea Octavia-Maria
van Genabith Josef
Vela Mihaela
Zampieri Marcos
Publication date: 01/01/2017

In this paper, we investigate the application of text classification methods to predict the law area and the decision of cases judged by the French Supreme Court. We also investigate the influence of the time period in which a ruling was made on the textual form of the case description, and the extent to which it is necessary to mask the judge's motivation for a ruling to emulate a real-world test scenario. Using a linear Support Vector Machine (SVM) classifier trained on lexical features, we report F1 scores of 96% in predicting a case ruling, 90% in predicting the law area of a case, and 75.9% in estimating the time span in which a ruling was issued.

Comment: RANLP 201

arXiv.org e-Print Archive