Exploiting Text and Network Context for Geolocation of Social Media Users
Research on automatically geolocating social media users has conventionally
been based on the text content of posts from a given user or the social network
of the user, with very little crossover between the two, and no benchmarking
of the two approaches over comparable datasets. We bring the two threads of
research together in first proposing a text-based method based on adaptive
grids, followed by a hybrid network- and text-based method. Evaluating over
three Twitter datasets, we show that the empirical difference between text- and
network-based methods is not great, and that hybridisation of the two is
superior to the component methods, especially in contexts where the user graph
is not well connected. We achieve state-of-the-art results on all three
datasets.
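The abstract does not say how its adaptive grid is built, so the following is only a minimal sketch under an assumed scheme: a k-d-tree-style partition that splits user coordinates at the median, alternating between latitude and longitude, until no cell holds more than a chosen number of points. The names `adaptive_grid`, `cell_centre`, and `max_per_cell` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def adaptive_grid(points: List[Point], max_per_cell: int, depth: int = 0) -> List[List[Point]]:
    """Split points at the median, alternating axes, until each cell is small enough.

    Assumption: this median-split scheme is an illustration only; the paper's
    actual grid construction is not given in the abstract.
    """
    if max_per_cell < 1:
        raise ValueError("max_per_cell must be at least 1")
    if len(points) <= max_per_cell:
        return [points]
    axis = depth % 2  # 0 = latitude, 1 = longitude
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (adaptive_grid(ordered[:mid], max_per_cell, depth + 1)
            + adaptive_grid(ordered[mid:], max_per_cell, depth + 1))


def cell_centre(cell: List[Point]) -> Point:
    """Mean coordinate of a non-empty cell, usable as the location predicted for that cell."""
    n = len(cell)
    return (sum(p[0] for p in cell) / n, sum(p[1] for p in cell) / n)
```

Because cells are defined by point counts rather than fixed side lengths, dense regions end up with many small cells and sparse regions with a few large ones. A text classifier could then treat each cell as a class label and return its centre as the predicted location.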
Controlling for Unobserved Confounds in Classification Using Correlational Constraints
As statistical classifiers become integrated into real-world applications, it
is important to consider not only their accuracy but also their robustness to
changes in the data distribution. In this paper, we consider the case where
there is an unobserved confounding variable z that influences both the
features x and the class variable y. When the influence of z
changes from training to testing data, we find that the classifier accuracy can
degrade rapidly. In our approach, we assume that we can predict the value of
z at training time with some error. The prediction for z is then fed to
Pearl's back-door adjustment to build our model. Because of the attenuation
bias caused by measurement error in z, standard approaches to controlling for
z are ineffective. In response, we propose a method to properly control for
the influence of z by first estimating its relationship with the class
variable y, then updating predictions for z to match that estimated
relationship. By adjusting the influence of z, we show that we can build a
model that exceeds competing baselines on accuracy as well as on robustness
over a range of confounding relationships.
Comment: 9 page
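For orientation, the standard back-door adjustment mentioned in the abstract replaces p(y | x) with the sum over z of p(y | x, z) p(z), so that the confounder's training-time correlation with the features no longer drives the prediction. The toy sketch below shows only that sum for discrete variables. The probability tables are made-up numbers and the name `backdoor_adjust` is hypothetical. It does not implement the paper's correction for measurement error in the confounder.

```python
from typing import Dict, Tuple


def backdoor_adjust(
    p_y_given_xz: Dict[Tuple[int, int], float],
    p_z: Dict[int, float],
    x: int,
) -> float:
    """Return sum_z p(y=1 | x, z) * p(z) for a discrete confounder z.

    Assumption: both tables are known exactly; in practice they would be
    estimated from training data, and z itself may be observed with error.
    """
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z.items())


# Made-up illustrative tables: keys of p_y_given_xz are (x, z) pairs.
p_z = {0: 0.5, 1: 0.5}
p_y_given_xz = {(0, 0): 0.2, (0, 1): 0.6, (1, 0): 0.4, (1, 1): 0.8}

adjusted_x1 = backdoor_adjust(p_y_given_xz, p_z, x=1)  # 0.4*0.5 + 0.8*0.5
adjusted_x0 = backdoor_adjust(p_y_given_xz, p_z, x=0)  # 0.2*0.5 + 0.6*0.5
```

The adjusted probabilities weight each value of z by its marginal p(z) rather than by p(z | x), which is what removes the dependence on how strongly z and x happen to co-occur in the training data.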
Predicting the Law Area and Decisions of French Supreme Court Cases
In this paper, we investigate the application of text classification methods
to predict the law area and the decision of cases judged by the French Supreme
Court. We also investigate the influence of the time period in which a ruling
was made over the textual form of the case description and the extent to which
it is necessary to mask the judge's motivation for a ruling to emulate a
real-world test scenario. We report results of 96% F1 score in predicting a
case ruling, 90% F1 score in predicting the law area of a case, and 75.9% F1
score in estimating the time span in which a ruling was issued, using a linear
Support Vector Machine (SVM) classifier trained on lexical features.
Comment: RANLP 201