5,101 research outputs found

    Exploiting Text and Network Context for Geolocation of Social Media Users

    Research on automatically geolocating social media users has conventionally been based on the text content of posts from a given user or the social network of the user, with very little crossover between the two, and no benchmarking of the two approaches over comparable datasets. We bring the two threads of research together in first proposing a text-based method based on adaptive grids, followed by a hybrid network- and text-based method. Evaluating over three Twitter datasets, we show that the empirical difference between text- and network-based methods is not great, and that hybridisation of the two is superior to the component methods, especially in contexts where the user graph is not well connected. We achieve state-of-the-art results on all three datasets.

    Controlling for Unobserved Confounds in Classification Using Correlational Constraints

    As statistical classifiers become integrated into real-world applications, it is important to consider not only their accuracy but also their robustness to changes in the data distribution. In this paper, we consider the case where there is an unobserved confounding variable $z$ that influences both the features $\mathbf{x}$ and the class variable $y$. When the influence of $z$ changes from training to testing data, we find that the classifier accuracy can degrade rapidly. In our approach, we assume that we can predict the value of $z$ at training time with some error. The prediction for $z$ is then fed to Pearl's back-door adjustment to build our model. Because of the attenuation bias caused by measurement error in $z$, standard approaches to controlling for $z$ are ineffective. In response, we propose a method to properly control for the influence of $z$ by first estimating its relationship with the class variable $y$, then updating predictions for $z$ to match that estimated relationship. By adjusting the influence of $z$, we show that we can build a model that exceeds competing baselines on accuracy as well as on robustness over a range of confounding relationships. Comment: 9 page
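    The back-door adjustment named in this abstract can be illustrated on a toy discrete problem. The sketch below is not the paper's implementation: it assumes the confounder is fully observed, and the function names and synthetic counts are hypothetical. It applies the standard formula, the sum over z of p(y | x, z) p(z), and compares it with the unadjusted conditional p(y | x).

```python
from collections import Counter


def naive_conditional(samples, x, y):
    """Unadjusted estimate of p(y | x) from (x, y, z) tuples."""
    x_count = sum(1 for xi, _, _ in samples if xi == x)
    xy_count = sum(1 for xi, yi, _ in samples if xi == x and yi == y)
    return xy_count / x_count


def backdoor_adjusted(samples, x, y):
    """Back-door estimate: sum over z of p(y | x, z) * p(z)."""
    n = len(samples)
    z_counts = Counter(zi for _, _, zi in samples)
    xz_counts = Counter((xi, zi) for xi, _, zi in samples)
    xyz_counts = Counter(samples)
    total = 0.0
    for z, z_count in z_counts.items():
        denom = xz_counts[(x, z)]
        if denom == 0:
            # p(y | x, z) is undefined in an empty stratum; skipping it
            # is a simplifying assumption of this sketch.
            continue
        total += (xyz_counts[(x, y, z)] / denom) * (z_count / n)
    return total


# Hypothetical data as (x, y, z) tuples. Within each z stratum, y does not
# depend on x, but x is correlated with z, so z confounds x and y.
samples = (
    [(1, 1, 0)] * 2 + [(1, 0, 0)] * 8 + [(0, 1, 0)] * 8 + [(0, 0, 0)] * 32
    + [(1, 1, 1)] * 32 + [(1, 0, 1)] * 8 + [(0, 1, 1)] * 8 + [(0, 0, 1)] * 2
)

naive = naive_conditional(samples, x=1, y=1)
adjusted = backdoor_adjusted(samples, x=1, y=1)
print(f"naive p(y=1 | x=1)    = {naive:.2f}")
print(f"adjusted p(y=1 | x=1) = {adjusted:.2f}")
```

    On this synthetic data the naive estimate is 0.68, while the adjusted estimate is 0.50 for both x = 1 and x = 0, which reflects that x has no effect on y once z is held fixed. The abstract's setting is harder, since there z is only predicted with error at training time.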

    Predicting the Law Area and Decisions of French Supreme Court Cases

    In this paper, we investigate the application of text classification methods to predict the law area and the decision of cases judged by the French Supreme Court. We also investigate the influence of the time period in which a ruling was made on the textual form of the case description, and the extent to which it is necessary to mask the judge's motivation for a ruling to emulate a real-world test scenario. We report results of 96% F1 score in predicting a case ruling, 90% F1 score in predicting the law area of a case, and 75.9% F1 score in estimating the time span when a ruling was issued, using a linear Support Vector Machine (SVM) classifier trained on lexical features. Comment: RANLP 201