Diversity and Inclusion Metrics in Subset Selection

Author: Baker Dylan
Denton Emily
Gebru Timnit
Hanna Alex
Hutchinson Ben
Mitchell Margaret
Moorosi Nyalleng
Morgenstern Jamie
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 08/02/2020

The ethical concept of fairness has recently been applied in machine learning (ML) settings to describe a wide range of constraints and objectives. When considering the relevance of ethical concepts to subset selection problems, the concepts of diversity and inclusion are additionally applicable in order to create outputs that account for social power and access differentials. We introduce metrics based on these concepts, which can be applied together, separately, and in tandem with additional fairness constraints. Results from human subject experiments lend support to the proposed criteria. Social choice methods can additionally be leveraged to aggregate and choose preferable sets, and we detail how these may be applied.

arXiv.org e-Print Archive

Feature weighting techniques for CBR in software effort estimation studies: A review and empirical evaluation

Author: Aha D. W.
Ashley K. D.
Bardsiri V. K.
Bareiss R.
Cain T.
Hedges L.
Higgins J.
Kirsopp C.
Mohri T.
Skalak D.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/09/2014

Context: Software effort estimation is one of the most important activities in the software development process. Unfortunately, estimates are often substantially wrong. Numerous estimation methods have been proposed, including Case-based Reasoning (CBR). In order to improve CBR estimation accuracy, many researchers have proposed feature weighting techniques (FWT). Objective: Our purpose is to systematically review the empirical evidence to determine whether FWT leads to improved predictions. In addition, we evaluate these techniques from the perspectives of (i) approach, (ii) strengths and weaknesses, (iii) performance, and (iv) experimental evaluation approach, including the data sets used. Method: We conducted a systematic literature review of published, refereed primary studies on FWT (2000-2014). Results: We identified 19 relevant primary studies. These reported a range of different techniques. 17 out of 19 make benchmark comparisons with standard CBR, and 16 out of 17 studies report improved accuracy. Using a one-sample sign test, this positive impact is significant (p = 0.0003). Conclusion: The actionable conclusion from this study is that our review of all relevant empirical evidence supports the use of FWTs, and we recommend that researchers and practitioners give serious consideration to their adoption.

Brunel University Research Archive
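
The sign-test figure quoted in the feature-weighting review above (16 of 17 benchmark comparisons favouring FWT, p = 0.0003) can be checked by hand. The sketch below is a minimal illustration, not the authors' analysis script: it assumes a two-sided exact test against a null of 0.5, which the abstract does not state explicitly, and the function name `sign_test_two_sided` is made up for this example.

```python
from math import comb


def sign_test_two_sided(successes: int, n: int) -> float:
    """Exact two-sided sign test p-value under H0: P(improvement) = 0.5.

    Assumption: the two-sided variant is used; the abstract only says
    "one-sample sign test".
    """
    # Work with the larger of the two counts so the tail is the upper tail.
    k = max(successes, n - successes)
    # Probability of k or more successes in n fair coin flips.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Binomial(n, 0.5) is symmetric, so doubling the tail gives the
    # two-sided p-value; cap at 1.0 for the perfectly balanced case.
    return min(1.0, 2 * tail)


# 16 of the 17 studies with a benchmark comparison reported improved accuracy.
p_value = sign_test_two_sided(16, 17)
print(f"{p_value:.4f}")
```

Under these assumptions the exact value is 36/131072, which rounds to the 0.0003 reported in the abstract. A one-sided test would give half that value.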

Controllability of Social Networks and the Strategic Use of Random Information

Author: Casamassima Francesca
Cremonini Marco
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2017

This work studies realistic social control strategies for social networks based on the introduction of random information into the state of selected driver agents. Deliberately exposing selected agents to random information is a technique already tried in recommender systems and search engines, and represents one of the few options for influencing the behavior of a social context that could be accepted as ethical, could be fully disclosed to members, and does not involve the use of force or of deception. Our research is based on a model of knowledge diffusion applied to a time-varying adaptive network, and considers two well-known strategies for influencing social contexts. One selects a few influencers and manipulates their actions in order to drive the whole network to a certain behavior; the other, instead, drives the network behavior by acting on the state of a large subset of ordinary, scarcely influential users. The two approaches have been studied in terms of network and diffusion effects. The network effect is analyzed through the changes induced on network average degree and clustering coefficient, while the diffusion effect is based on two ad-hoc metrics defined to measure the degree of knowledge diffusion and skill level, as well as the polarization of agent interests. The results, obtained through simulations on synthetic networks, show rich dynamics and strong effects on the communication structure and on the distribution of knowledge and skills, supporting our hypothesis that the strategic use of random information could represent a realistic approach to social network controllability, and that with both strategies, in principle, the control effect could be remarkable.

arXiv.org e-Print Archive

Directory of Open Access Journals

Daily Stress Recognition from Mobile Phone Data, Weather Conditions and Individual Traits

Author: Cohen S.
Faust V.
Garcia S.
John O. P.
Muaremi A.
Plarre K.
Scherer K. R.
Singh S. R.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/10/2014

Research has proven that stress reduces quality of life and causes many diseases. For this reason, several researchers have devised stress detection systems based on physiological parameters. However, these systems require obtrusive sensors to be carried continuously by the user. In our paper, we propose an alternative approach, providing evidence that daily stress can be reliably recognized based on behavioral metrics derived from the user's mobile phone activity and from additional indicators, such as the weather conditions (data pertaining to transitory properties of the environment) and the personality traits (data concerning permanent dispositions of individuals). Our multifactorial statistical model, which is person-independent, obtains an accuracy score of 72.28% for a 2-class daily stress recognition problem. The model is efficient to implement for most multimedia applications owing to its low-dimensional feature space (32 dimensions). Moreover, we identify and discuss the indicators which have strong predictive power. Comment: ACM Multimedia 2014, November 3-7, 2014, Orlando, Florida, US

arXiv.org e-Print Archive

Archivio della ricerca - Fondazione Bruno Kessler

Learning Determinantal Point Processes

Author: Kulesza Alex
Taskar Ben
Publication date: 01/01/2011

Determinantal point processes (DPPs), which arise in random matrix theory and quantum physics, are natural models for subset selection problems where diversity is preferred. Among many remarkable properties, DPPs offer tractable algorithms for exact inference, including computing marginal probabilities and sampling; however, an important open question has been how to learn a DPP from labeled training data. In this paper we propose a natural feature-based parameterization of conditional DPPs, and show how it leads to a convex and efficient learning formulation. We analyze the relationship between our model and binary Markov random fields with repulsive potentials, which are qualitatively similar but computationally intractable. Finally, we apply our approach to the task of extractive summarization, where the goal is to choose a small subset of sentences conveying the most important information from a set of documents. In this task there is a fundamental tradeoff between sentences that are highly relevant to the collection as a whole, and sentences that are diverse and not repetitive. Our parameterization allows us to naturally balance these two characteristics. We evaluate our system on data from the DUC 2003/04 multi-document summarization task, achieving state-of-the-art results.

arXiv.org e-Print Archive

CiteSeerX
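
To make the "diversity is preferred" property of DPPs in the record above concrete, here is a small, hedged sketch. It assumes an L-ensemble in which a subset's unnormalized probability is the determinant of the corresponding kernel submatrix, and a quality-times-similarity kernel L[i][j] = q_i * (phi_i . phi_j) * q_j. That kernel form, the toy feature vectors, and every function name are illustrative assumptions, not details taken from the abstract, and no learning is performed.

```python
import math


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def unit(vector):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def build_kernel(qualities, features):
    """Assumed kernel: L[i][j] = q_i * cosine(phi_i, phi_j) * q_j."""
    phis = [unit(f) for f in features]
    n = len(phis)
    return [
        [
            qualities[i] * sum(x * y for x, y in zip(phis[i], phis[j])) * qualities[j]
            for j in range(n)
        ]
        for i in range(n)
    ]


def subset_score(kernel, subset):
    """Unnormalized L-ensemble probability of a subset: det of its submatrix."""
    return det([[kernel[i][j] for j in subset] for i in subset])


# Three hypothetical items of equal quality; items 0 and 1 are near-duplicates.
qualities = [1.0, 1.0, 1.0]
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
kernel = build_kernel(qualities, features)

similar_pair = subset_score(kernel, [0, 1])
diverse_pair = subset_score(kernel, [0, 2])
print(similar_pair, diverse_pair)
```

With equal qualities, the near-duplicate pair scores far lower than the orthogonal pair, because the determinant shrinks as the selected feature vectors become more alike. Raising an item's quality raises the score of every subset containing it, which is the relevance side of the tradeoff.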

ASSESSING THE RELATIVE INFLUENCES OF ABIOTIC AND BIOTIC FACTORS ON A SPECIES’ DISTRIBUTION USING PSEUDO-ABSENCE AND FUNCTIONAL TRAIT DATA: A CASE STUDY WITH THE AMERICAN EEL (Anguilla rostrata)

Author: Woods Taylor E
Publication venue: VCU Scholars Compass
Publication date: 01/01/2018

Species’ distributions are influenced by abiotic and biotic factors but direct comparison of their relative importance is difficult, particularly when working with complex, multi-species datasets. Here, we present a flexible method to compare abiotic and biotic influences at common scales. First, data representing abiotic and biotic factors are collected using a combination of geographic information system, remotely sensed, and species’ functional trait data. Next, the relative influences of each predictor variable on the occurrence of a focal species are compared. Specifically, ‘sample’ data from sites of known occurrence are compared with ‘background’ data (i.e. pseudo-absence data collected at sites where occurrence is unknown, combined with sample data). Predictor variables that may have the strongest influence on the focal species are identified as those where sample data are clearly distinct from the corresponding background distribution. To demonstrate the method, effects of hydrology, physical habitat, and co-occurring fish functional traits are assessed relative to the contemporary (1950 – 1990) distribution of the American Eel (Anguilla rostrata) in six Mid-Atlantic (USA) rivers. We find that Eel distribution has likely been influenced by the functional characteristics of co-occurring fishes and by local dam density, but not by other physical habitat or hydrologic factors.

The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics

Author: Samoilenko Anna
Yasseri Taha
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/12/2013

Activity of modern scholarship creates online footprints galore. Along with traditional metrics of research quality, such as citation counts, online images of researchers and institutions increasingly matter in evaluating academic impact, decisions about grant allocation, and promotion. We examined 400 biographical Wikipedia articles on academics from four scientific fields to test if being featured in the world's largest online encyclopedia is correlated with higher academic notability (assessed through citation counts). We found no statistically significant correlation between Wikipedia article metrics (length, number of edits, number of incoming links from other articles, etc.) and academic notability of the mentioned researchers. We also did not find any evidence that the scientists with better WP representation are necessarily more prominent in their fields. In addition, we inspected the Wikipedia coverage of notable scientists sampled from the Thomson Reuters list of "highly cited researchers". In each of the examined fields, Wikipedia failed to cover notable scholars properly. Both findings imply that Wikipedia might be producing an inaccurate image of academics on the front end of science. By shedding light on how public perception of academic progress is formed, this study warns that a subjective element might have been introduced into the hitherto structured system of academic evaluation. Comment: To appear in EPJ Data Science. To have the Additional Files and Datasets e-mail the corresponding author

arXiv.org e-Print Archive

Springer - Publisher Connector

Learning Detection with Diverse Proposals

Author: Azadi Samaneh
Darrell Trevor
Feng Jiashi
Publication date: 11/04/2017

To predict a set of diverse and informative proposals with enriched representations, this paper introduces a differentiable Determinantal Point Process (DPP) layer that is able to augment object detection architectures. Most modern object detection architectures, such as Faster R-CNN, learn to localize objects by minimizing deviations from the ground truth but ignore correlation between multiple proposals and object categories. Non-Maximum Suppression (NMS), a widely used proposal pruning scheme, ignores label- and instance-level relations between object candidates, resulting in multi-labeled detections. In the multi-class case, NMS selects boxes with the largest prediction scores, ignoring the semantic relation between categories of potential election. In contrast, our trainable DPP layer, allowing for Learning Detection with Diverse Proposals (LDDP), considers both label-level contextual information and spatial layout relationships between proposals without increasing the number of parameters of the network, and thus substantially improves location and category specifications of final detected bounding boxes during both training and inference. Furthermore, we show that LDDP keeps its superiority over Faster R-CNN even if the number of proposals generated by LDDP is only ~30% as many as those for Faster R-CNN. Comment: Accepted to CVPR 201

arXiv.org e-Print Archive
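
For readers unfamiliar with the NMS baseline that the detection record above contrasts itself with, the following is a minimal sketch of standard greedy, class-agnostic non-maximum suppression. It is not the paper's DPP layer. The box format `(x1, y1, x2, y2)`, the 0.5 overlap threshold, and the toy boxes are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already kept, higher-scoring box
    by more than iou_threshold. Only scores and geometry are consulted,
    which is the limitation the abstract points out.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Toy proposals: boxes 0 and 1 overlap heavily, box 2 is far away.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

Here box 1 is suppressed because its overlap with the higher-scoring box 0 is 81/119, above the threshold, while box 2 survives. Nothing in the procedure looks at class labels or semantic relations between candidates.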