13,833 research outputs found
Designing Fair Ranking Schemes
Items from a database are often ranked based on a combination of multiple
criteria. A user may have the flexibility to accept combinations that weigh
these criteria differently, within limits. On the other hand, this choice of
weights can greatly affect the fairness of the produced ranking. In this paper,
we develop a system that helps users choose criterion weights that lead to
greater fairness.
We consider ranking functions that compute the score of each item as a
weighted sum of (numeric) attribute values, and then sort items on their score.
Each ranking function can be expressed as a vector of weights, or as a point in
a multi-dimensional space. For a broad range of fairness criteria, we show how
to efficiently identify regions in this space that satisfy these criteria.
Using this identification method, our system is able to tell users whether
their proposed ranking function satisfies the desired fairness criteria and, if
it does not, to suggest the smallest modification that does. We develop
user-controllable approximation that and indexing techniques that are applied
during preprocessing, and support sub-second response times during the online
phase. Our extensive experiments on real datasets demonstrate that our methods
are able to find solutions that satisfy fairness criteria effectively and
efficiently
Learning what matters - Sampling interesting patterns
In the field of exploratory data mining, local structure in data can be
described by patterns and discovered by mining algorithms. Although many
solutions have been proposed to address the redundancy problems in pattern
mining, most of them either provide succinct pattern sets or take the interests
of the user into account-but not both. Consequently, the analyst has to invest
substantial effort in identifying those patterns that are relevant to her
specific interests and goals. To address this problem, we propose a novel
approach that combines pattern sampling with interactive data mining. In
particular, we introduce the LetSIP algorithm, which builds upon recent
advances in 1) weighted sampling in SAT and 2) learning to rank in interactive
pattern mining. Specifically, it exploits user feedback to directly learn the
parameters of the sampling distribution that represents the user's interests.
We compare the performance of the proposed algorithm to the state-of-the-art in
interactive pattern mining by emulating the interests of a user. The resulting
system allows efficient and interleaved learning and sampling, thus
user-specific anytime data exploration. Finally, LetSIP demonstrates favourable
trade-offs concerning both quality-diversity and exploitation-exploration when
compared to existing methods.Comment: PAKDD 2017, extended versio
MLPerf Inference Benchmark
Machine-learning (ML) hardware and software system demand is burgeoning.
Driven by ML applications, the number of different ML inference systems has
exploded. Over 100 organizations are building ML inference chips, and the
systems that incorporate existing models span at least three orders of
magnitude in power consumption and five orders of magnitude in performance;
they range from embedded devices to data-center solutions. Fueling the hardware
are a dozen or more software frameworks and libraries. The myriad combinations
of ML hardware and ML software make assessing ML-system performance in an
architecture-neutral, representative, and reproducible manner challenging.
There is a clear need for industry-wide standard ML benchmarking and evaluation
criteria. MLPerf Inference answers that call. In this paper, we present our
benchmarking method for evaluating ML inference systems. Driven by more than 30
organizations as well as more than 200 ML engineers and practitioners, MLPerf
prescribes a set of rules and best practices to ensure comparability across
systems with wildly differing architectures. The first call for submissions
garnered more than 600 reproducible inference-performance measurements from 14
organizations, representing over 30 systems that showcase a wide range of
capabilities. The submissions attest to the benchmark's flexibility and
adaptability.Comment: ISCA 202
Finding k-Dissimilar Paths with Minimum Collective Length
Shortest path computation is a fundamental problem in road networks. However,
in many real-world scenarios, determining solely the shortest path is not
enough. In this paper, we study the problem of finding k-Dissimilar Paths with
Minimum Collective Length (kDPwML), which aims at computing a set of paths from
a source s to a target t such that all paths are pairwise dissimilar by at
least \theta and the sum of the path lengths is minimal. We introduce an exact
algorithm for the kDPwML problem, which iterates over all possible s-t paths
while employing two pruning techniques to reduce the prohibitively expensive
computational cost. To achieve scalability, we also define the much smaller set
of the simple single-via paths, and we adapt two algorithms for kDPwML queries
to iterate over this set. Our experimental analysis on real road networks shows
that iterating over all paths is impractical, while iterating over the set of
simple single-via paths can lead to scalable solutions with only a small
trade-off in the quality of the results.Comment: Extended version of the SIGSPATIAL'18 paper under the same titl
Inductive queries for a drug designing robot scientist
It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments
- …