Legal Judgement Prediction for UK Courts
Legal Judgement Prediction (LJP) is the task of automatically predicting the outcome of a court case given only the case document. Over the last five years researchers have successfully attempted this task for the supreme courts of three jurisdictions: the European Union, France, and China. Motivating applications include a prediction system that can be used at the judgement-drafting stage, and the identification of the most important words and phrases within a judgement. The aim of our research was to build, for the first time, an LJP model for UK court cases. This required the creation of a labelled data set of UK court judgements and the subsequent application of machine learning models. We evaluated different feature representations and different algorithms. Our best performing model achieved 69.05% accuracy and a 69.02 F1 score. We demonstrate that LJP is a promising area of further research for UK courts, given the high model performance achieved and the ability to easily extract useful features.
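The abstract does not name the feature representations or algorithms evaluated, so the sketch below is only a generic illustration of outcome prediction framed as binary text classification: a hand-rolled multinomial Naive Bayes over bag-of-words features. All data, labels, and function names are invented for illustration.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Train a multinomial Naive Bayes text classifier.

    docs: list of token lists; labels: parallel list of class labels.
    Returns (class log-priors, per-class token log-likelihoods, vocabulary).
    """
    vocab = {tok for doc in docs for tok in doc}
    classes = set(labels)
    priors, likelihoods = {}, {}
    for c in classes:
        class_docs = [d for d, l in zip(docs, labels) if l == c]
        priors[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(tok for d in class_docs for tok in d)
        total = sum(counts.values())
        # Laplace smoothing so unseen tokens get non-zero probability.
        likelihoods[c] = {t: math.log((counts[t] + 1) / (total + len(vocab)))
                          for t in vocab}
    return priors, likelihoods, vocab

def predict_nb(model, doc):
    """Pick the class with the highest posterior log-probability."""
    priors, likelihoods, vocab = model
    scores = {c: priors[c] + sum(likelihoods[c][t] for t in doc if t in vocab)
              for c in priors}
    return max(scores, key=scores.get)

# Toy judgements: label 1 = claim allowed, 0 = claim dismissed.
train = [("appeal allowed damages awarded".split(), 1),
         ("appeal allowed costs awarded".split(), 1),
         ("claim dismissed no merit".split(), 0),
         ("appeal dismissed costs refused".split(), 0)]
model = train_nb([d for d, _ in train], [l for _, l in train])
print(predict_nb(model, "damages awarded".split()))  # → 1
```

A side benefit of linear or Naive Bayes models, echoed in the abstract, is that per-token weights make the most predictive words easy to read off.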
Distributed Robust Learning
We propose a framework for distributed robust statistical learning on big
contaminated data. The Distributed Robust Learning (DRL) framework can
reduce the computational time of traditional robust learning methods by
several orders of magnitude. We analyze the robustness property of DRL,
showing that DRL not only preserves the robustness of the base robust
learning method, but also tolerates contamination of a constant fraction of
the results from computing nodes (node failures). More precisely, even in
the presence of the most adversarial outlier distribution over computing
nodes, DRL still achieves a breakdown point that is at least a constant
fraction of the breakdown point of the corresponding centralized algorithm.
This is in stark contrast with a naive division-and-averaging
implementation, whose breakdown point may shrink in proportion to the
number of computing nodes used. We then specialize the DRL framework for
two concrete cases: distributed robust principal component analysis and
distributed robust regression. We demonstrate the efficiency and the
robustness advantages of DRL through comprehensive simulations and by
predicting image tags on a large-scale image set.

Comment: 18 pages, 2 figures
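The abstract does not give DRL's exact aggregation rule, so the sketch below only illustrates the general divide-then-robustly-aggregate principle with invented data: each node computes a local estimate on its shard, and taking a median across node estimates tolerates adversarial results from a minority of nodes, whereas naive averaging would be destroyed by a single corrupted node.

```python
import random
import statistics

def node_estimates(data_shards):
    """Each computing node returns a local estimate (here, a simple mean)."""
    return [sum(shard) / len(shard) for shard in data_shards]

def robust_aggregate(estimates):
    """Median across node estimates: unaffected as long as fewer than half
    of the node results are corrupted (adversarial outliers or failures)."""
    return statistics.median(estimates)

random.seed(0)
# Nine nodes, each holding 200 samples drawn around a true mean of 5.0.
shards = [[random.gauss(5.0, 1.0) for _ in range(200)] for _ in range(9)]
est = node_estimates(shards)
est[0] = 1e9    # two nodes return adversarial garbage...
est[1] = -1e9
print(robust_aggregate(est))  # ...yet the aggregate stays close to 5.0
```

Averaging the same nine estimates would return a value on the order of 1e8, which is the breakdown-by-a-factor-of-the-node-count failure mode the abstract contrasts DRL against.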
Personality in Computational Advertising: A Benchmark
In the last decade, new ways of shopping online have increased the
possibility of buying products and services more easily and faster than
ever. In this new context, personality is a key determinant in the decision
making of the consumer when shopping. A person's buying choices are
influenced by psychological factors like impulsiveness; indeed, some
consumers may be more susceptible to making impulse purchases than others.
Since affective metadata are more closely related to the user's experience
than generic parameters, accurate predictions reveal important aspects of
users' attitudes and social life, including the attitudes of others and
social identity. This work proposes innovative research that uses a
personality perspective to determine the unique associations between
consumers' buying tendencies and advert recommendations. In fact, the lack
of a publicly available benchmark for computational advertising does not
allow either the exploration of this intriguing research direction or the
evaluation of recent algorithms. We present the ADS Dataset, a publicly
available benchmark consisting of 300 real advertisements (i.e., Rich Media
Ads, Image Ads, Text Ads) rated by 120 unacquainted individuals, enriched
with users' Big-Five personality factors and 1,200 personal user pictures.
Fighting Authorship Linkability with Crowdsourcing
Massive amounts of contributed content -- including traditional literature,
blogs, music, videos, reviews and tweets -- are available on the Internet
today, with authors numbering in many millions. Textual information, such as
product or service reviews, is an important and increasingly popular type of
content that is being used as a foundation of many trendy community-based
reviewing sites, such as TripAdvisor and Yelp. Some recent results have shown
that, due partly to their specialized/topical nature, sets of reviews authored
by the same person are readily linkable based on simple stylometric features.
In practice, this means that individuals who author more than a few reviews
under different accounts (whether within one site or across multiple sites) can
be linked, which represents a significant loss of privacy.
In this paper, we start by showing that the problem is actually worse than
previously believed. We then explore ways to mitigate authorship linkability in
community-based reviewing. We first attempt to harness the global power of
crowdsourcing by engaging random strangers into the process of re-writing
reviews. As our empirical results (obtained from Amazon Mechanical Turk)
clearly demonstrate, crowdsourcing yields impressively sensible reviews that
reflect sufficiently different stylometric characteristics such that prior
stylometric linkability techniques become largely ineffective. We also consider
using machine translation to automatically re-write reviews. Contrary to what
was previously believed, our results show that translation decreases authorship
linkability as the number of intermediate languages grows. Finally, we explore
the combination of crowdsourcing and machine translation and report on the
results.
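As a rough illustration of the kind of stylometric linking this paper defends against (not the authors' actual attack or feature set), the sketch below links a review to its most stylistically similar candidate using cosine similarity over character trigram counts, one of the simple stylometric features the abstract alludes to. The example reviews and the nearest-match rule are invented for illustration.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-gram counts: a simple stylometric feature vector."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b))

def link(query, candidates):
    """Return the index of the candidate review most similar in style."""
    q = char_ngrams(query)
    sims = [cosine(q, char_ngrams(c)) for c in candidates]
    return max(range(len(sims)), key=sims.__getitem__)

reviews = ["The service was soooo good, loved it!!!",
           "The establishment provides adequate service."]
print(link("Food was soooo tasty, loved the place!!!", reviews))  # → 0
```

Crowdsourced or translated re-writes work precisely by perturbing these low-level character and word statistics until such similarity scores no longer single out the true author.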
A Measurement of Rb using a Double Tagging Method
The fraction of Z to bbbar events in hadronic Z decays has been measured by
the OPAL experiment using the data collected at LEP between 1992 and 1995. The
Z to bbbar decays were tagged using displaced secondary vertices, and high
momentum electrons and muons. Systematic uncertainties were reduced by
measuring the b-tagging efficiency using a double tagging technique. Efficiency
correlations between opposite hemispheres of an event are small, and are well
understood through comparisons between real and simulated data samples. A value
of Rb = 0.2178 +- 0.0011 +- 0.0013 was obtained, where the first error is
statistical and the second systematic. The uncertainty on Rc, the fraction of Z
to ccbar events in hadronic Z decays, is not included in the errors. The
dependence on Rc is Delta(Rb)/Rb = -0.056*Delta(Rc)/Rc where Delta(Rc) is the
deviation of Rc from the value 0.172 predicted by the Standard Model. The
result for Rb agrees with the value of 0.2155 +- 0.0003 predicted by the
Standard Model.

Comment: 42 pages, LaTeX, 14 eps figures included, submitted to European
Physical Journal
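Neglecting non-b backgrounds and hemisphere efficiency correlations (both of which the analysis above accounts for carefully), the core algebra of the double-tagging method can be sketched as follows: if f_single is the fraction of tagged hemispheres and f_double the fraction of events with both hemispheres tagged, then f_single = eps_b * Rb and f_double = eps_b**2 * Rb, so both the b-tagging efficiency eps_b and Rb can be extracted from data alone, without relying on a simulated efficiency. The function name and toy numbers below are invented for illustration.

```python
def double_tag_solve(f_single, f_double):
    """Solve for b-tag efficiency and Rb from single- and double-tag rates,
    neglecting non-b backgrounds and hemisphere correlations:

        f_single = eps_b * Rb        (tagged-hemisphere rate)
        f_double = eps_b**2 * Rb     (double-tagged-event rate)

    so eps_b = f_double / f_single and Rb = f_single**2 / f_double.
    """
    eps_b = f_double / f_single
    rb = f_single ** 2 / f_double
    return eps_b, rb

# Toy inputs generated from eps_b = 0.25 and Rb = 0.2178:
eps, rb = double_tag_solve(0.25 * 0.2178, 0.25 ** 2 * 0.2178)
print(eps, rb)  # recovers eps_b ≈ 0.25 and Rb ≈ 0.2178
```

In the real measurement the small hemisphere correlations enter as a correction factor on the double-tag rate, which is why understanding them through data/simulation comparisons matters for the systematic error.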
Evaluating two methods for Treebank grammar compaction
Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad-coverage grammars. In the simplest case, rules can simply be 'read off' the parse annotations of the corpus, producing either a simple or a probabilistic context-free grammar. Such grammars, however, can be very large, presenting problems for the subsequent computational costs of parsing under the grammar.
In this paper, we explore ways by which a treebank grammar can be reduced in size, or 'compacted', which involve the use of two kinds of technique: (i) thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which has both probabilistic and non-probabilistic variants. Our results show that by a combined use of these two techniques, a probabilistic context-free grammar can be reduced in size by 62% without any loss in parsing performance, and by 71% to give a gain in recall, but some loss in precision.
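The first compaction technique, frequency thresholding, is simple to sketch: read CFG rules off each parse tree, count them across the treebank, and keep only rules seen at least a threshold number of times. This is a minimal illustration under an invented tree encoding, not the paper's implementation; the helper names and toy trees are assumptions.

```python
from collections import Counter

def rules_from_tree(tree):
    """Extract CFG rules (parent -> children labels) from a nested tree,
    encoded as (label, [children]) with plain strings as leaf words."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            rules.extend(rules_from_tree(c))
    return rules

def compact(treebank, threshold):
    """Treebank grammar compaction by frequency thresholding: keep only
    rules observed at least `threshold` times across the treebank."""
    counts = Counter(r for t in treebank for r in rules_from_tree(t))
    return {r: n for r, n in counts.items() if n >= threshold}

t1 = ("S", [("NP", ["dogs"]), ("VP", ["bark"])])
t2 = ("S", [("NP", ["cats"]), ("VP", ["meow"])])
print(compact([t1, t2], 2))  # → {('S', ('NP', 'VP')): 2}
```

The singleton lexical rules are exactly the kind of low-frequency rules that thresholding removes, which is why the technique shrinks a treebank grammar so sharply while leaving the frequent structural rules intact.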