Matching Code and Law: Achieving Algorithmic Fairness with Optimal Transport
Increasingly, discrimination by algorithms is perceived as a societal and
legal problem. As a response, a number of criteria for implementing algorithmic
fairness in machine learning have been developed in the literature. This paper
proposes the Continuous Fairness Algorithm (CFA) which enables a
continuous interpolation between different fairness definitions. More
specifically, we make three main contributions to the existing literature.
First, our approach allows the decision maker to continuously vary between
specific concepts of individual and group fairness. As a consequence, the
algorithm enables the decision maker to adopt intermediate ``worldviews'' on
the degree of discrimination encoded in algorithmic processes, adding nuance to
the extreme cases of ``we're all equal'' (WAE) and ``what you see is what you
get'' (WYSIWYG) proposed so far in the literature. Second, we use optimal
transport theory, and specifically the concept of the barycenter, to maximize
decision maker utility under the chosen fairness constraints. Third, the
algorithm is able to handle cases of intersectionality, i.e., of
multi-dimensional discrimination of certain groups on grounds of several
criteria. We discuss three main examples (credit applications; college
admissions; insurance contracts) and map out the legal and policy implications
of our approach. The explicit formalization of the trade-off between individual
and group fairness allows this post-processing approach to be tailored to
different situational contexts in which one or the other fairness criterion may
take precedence. Finally, we evaluate our model experimentally.
Comment: Vastly extended new version, now including computational experiment
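The interpolation idea in the abstract above can be illustrated in one dimension, where the Wasserstein barycenter of several score distributions has a quantile function equal to the weighted average of the group quantile functions. The following is only a minimal sketch under stated assumptions, not the paper's CFA: scores are scalar, distributions are empirical, groups are weighted by size, and the names `repair_score`, `scores_by_group`, and the interpolation parameter `t` are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Dict, List


def ecdf(sorted_scores: List[float], x: float) -> float:
    """Empirical CDF: fraction of scores less than or equal to x."""
    return bisect_right(sorted_scores, x) / len(sorted_scores)


def quantile(sorted_scores: List[float], p: float) -> float:
    """Empirical quantile: smallest score whose ECDF is at least p."""
    n = len(sorted_scores)
    # The small epsilon guards against floating-point overshoot in p * n.
    idx = min(n - 1, max(0, math.ceil(p * n - 1e-9) - 1))
    return sorted_scores[idx]


def repair_score(x: float, group: str,
                 scores_by_group: Dict[str, List[float]], t: float) -> float:
    """Move score x of `group` a fraction t of the way to the 1D barycenter.

    t = 0 leaves the score unchanged; t = 1 maps it to the barycenter,
    so that all groups share the same score distribution.
    """
    total = sum(len(v) for v in scores_by_group.values())
    p = ecdf(sorted(scores_by_group[group]), x)
    # Barycenter quantile = size-weighted average of the group quantiles at p.
    target = sum(len(v) / total * quantile(sorted(v), p)
                 for v in scores_by_group.values())
    return (1 - t) * x + t * target


# Hypothetical toy data: two groups with shifted score distributions.
scores = {"a": [1.0, 2.0, 3.0, 4.0], "b": [5.0, 6.0, 7.0, 8.0]}
half_repaired = repair_score(2.0, "a", scores, t=0.5)
```

Intermediate values of `t` correspond to intermediate positions between leaving raw scores untouched and fully equalizing the group distributions.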
Evaluation of research activities of universities of Ukraine and Belarus: a set of bibliometric indicators and its implementation
Monitoring the bibliometric indicators used in university rankings is
considered a task of the university library. To carry out a comparative
assessment of the research activities of the universities of Ukraine and
Belarus, the authors introduce a set of bibliometric indicators. A comparative
assessment of the research activities of the corresponding universities was
carried out; data on the leading universities are presented. The proposed set
has advantages over the one used in practice in the corresponding national
rankings: one of its indicators is sensitive to rapid changes in the research
activity of universities, and another is normalized across the fields of
science.
Mapping the UK Webspace: Fifteen Years of British Universities on the Web
This paper maps the national UK web presence on the basis of an analysis of
the .uk domain from 1996 to 2010. It reviews previous attempts to use web
archives to understand national web domains and describes the dataset. Next, it
presents an analysis of the .uk domain, including the overall number of links
in the archive and changes in the link density of different second-level
domains over time. We then explore changes over time within a particular
second-level domain, the academic subdomain .ac.uk, and compare linking
practices with variables, including institutional affiliation, league table
ranking, and geographic location. We do not detect institutional affiliation
affecting linking practices and find only partial evidence of league table
ranking affecting network centrality, but find a clear inverse relationship
between the density of links and the geographical distance between
universities. This echoes prior findings regarding offline academic activity,
which allows us to argue that real-world factors like geography continue to
shape academic relationships even in the Internet age. We conclude with
directions for future uses of web archive resources in this emerging area of
research.
Comment: To appear in the proceeding of WebSci 201
Meta-analysis of massively parallel reporter assays enables prediction of regulatory function across cell types.
Deciphering the potential of noncoding loci to influence gene regulation has been the subject of intense research, with important implications for understanding the genetic underpinnings of human diseases. Massively parallel reporter assays (MPRAs) can measure the regulatory activity of thousands of DNA sequences and their variants in a single experiment. With an increasing number of publicly available MPRA data sets, one can now develop data-driven models which, given a DNA sequence, predict its regulatory activity. Here, we performed a comprehensive meta-analysis of several MPRA data sets in a variety of cellular contexts. We first applied an ensemble of methods to predict MPRA output in each context and observed that the most predictive features are consistent across data sets. We then demonstrate that predictive models trained in one cellular context can be used to predict MPRA output in another, with loss of accuracy attributed to cell-type-specific features. Finally, we show that our approach achieves top performance in the Fifth Critical Assessment of Genome Interpretation "Regulation Saturation" Challenge for predicting effects of single-nucleotide variants. Overall, our analysis provides insights into how MPRA data can be leveraged to highlight functional regulatory regions throughout the genome and can guide effective design of future experiments by better prioritizing regions of interest.
Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization
In Automatic Text Summarization, preprocessing is an important phase to
reduce the space of textual representation. Classically, stemming and
lemmatization have been widely used for normalizing words. However, even using
normalization on large texts, the curse of dimensionality can disturb the
performance of summarizers. This paper describes a new method for normalization
of words to further reduce the space of representation. We propose to reduce
each word to its initial letters, as a form of Ultra-stemming. The results show
that Ultra-stemming not only preserves the content of summaries produced with
this representation, but can often dramatically improve the performance of the
systems. Summaries on trilingual corpora were evaluated automatically with
Fresa. Results confirm an increase in performance, regardless of the summarizer
system used.
Comment: 22 pages, 12 figures, 9 table
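The normalization described in the abstract above, reducing each word to its initial letters, can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: the function names, the regular-expression tokenizer, the lowercasing step, and the prefix length `n` are assumptions.

```python
import re
from collections import Counter
from typing import List


def ultra_stem(word: str, n: int = 1) -> str:
    """Reduce a word to its first n letters, lowercased."""
    return word.lower()[:n]


def ultra_stem_text(text: str, n: int = 1) -> List[str]:
    """Tokenize text into runs of letters and ultra-stem every token."""
    # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscore).
    tokens = re.findall(r"[^\W\d_]+", text)
    return [ultra_stem(tok, n) for tok in tokens]


sample = "Summaries summarize the summarized documents"
stems = ultra_stem_text(sample, n=3)
# Counting distinct stems shows how far the representation space shrinks.
vocabulary = Counter(stems)
```

Here five distinct surface forms collapse to three stems, which is the kind of reduction in the representation space the abstract refers to.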