Diversity and Inclusion Metrics in Subset Selection
The ethical concept of fairness has recently been applied in machine learning
(ML) settings to describe a wide range of constraints and objectives. When
considering the relevance of ethical concepts to subset selection problems, the
concepts of diversity and inclusion are additionally applicable in order to
create outputs that account for social power and access differentials. We
introduce metrics based on these concepts, which can be applied together,
separately, and in tandem with additional fairness constraints. Results from
human subject experiments lend support to the proposed criteria. Social choice
methods can additionally be leveraged to aggregate and choose preferable sets,
and we detail how these may be applied.
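The abstract does not define its metrics, so the following is only a hypothetical stand-in that shows the general shape of a group-based diversity score over a selected subset: the fraction of known groups that have at least one member in the selection. The function name `group_coverage` and the metric itself are illustrative assumptions, not the paper's definitions.

```python
def group_coverage(selected_groups, all_groups):
    """Hypothetical diversity score (not the paper's metric): the fraction of
    all_groups that have at least one member among the selected items."""
    all_groups = set(all_groups)
    present = set(selected_groups) & all_groups
    return len(present) / len(all_groups)


# Toy example: group labels of four selected items, out of three possible groups
selection = ["A", "A", "B", "A"]
print(round(group_coverage(selection, {"A", "B", "C"}), 3))  # 0.667
```

A score like this ignores how many members each group contributes; it is meant only to make the idea of scoring a subset, rather than individual items, concrete.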
Feature weighting techniques for CBR in software effort estimation studies: A review and empirical evaluation
Context: Software effort estimation is one of the most important activities in the software development process. Unfortunately, estimates are often substantially wrong. Numerous estimation methods have been proposed, including Case-based Reasoning (CBR). In order to improve CBR estimation accuracy, many researchers have proposed feature weighting techniques (FWT). Objective: Our purpose is to systematically review the empirical evidence to determine whether FWT leads to improved predictions. In addition, we evaluate these techniques from the perspectives of (i) approach, (ii) strengths and weaknesses, (iii) performance, and (iv) experimental evaluation approach, including the data sets used. Method: We conducted a systematic literature review of published, refereed primary studies on FWT (2000-2014). Results: We identified 19 relevant primary studies. These reported a range of different techniques. Of these, 17 make benchmark comparisons with standard CBR, and 16 of those 17 report improved accuracy. Using a one-sample sign test, this positive impact is significant (p = 0.0003). Conclusion: The actionable conclusion from this study is that our review of all relevant empirical evidence supports the use of FWTs, and we recommend that researchers and practitioners give serious consideration to their adoption.
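The reported p-value can be checked with a short calculation. This sketch assumes an exact two-sided sign test with success probability 0.5 under the null hypothesis and no tied studies; the abstract does not say whether the test was one- or two-sided. With 16 of 17 studies reporting improvement, it gives 36 / 2^17 ≈ 0.00027, which rounds to the reported 0.0003.

```python
from math import comb


def sign_test_two_sided(successes: int, n: int) -> float:
    """Exact two-sided sign test: under H0, each study is equally likely
    to report better or worse accuracy (p = 0.5); ties are assumed absent."""
    k = max(successes, n - successes)
    # Probability of a result at least as extreme, in one tail
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    # Double it for a two-sided test, capped at 1
    return min(1.0, 2 * tail)


# 16 of the 17 benchmarked studies reported improved accuracy
p = sign_test_two_sided(16, 17)
print(f"{p:.4f}")  # 0.0003
```

A one-sided version would simply return `tail` (about 0.00014), so the reported value is also consistent with the one-sided test being significant.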
Controllability of Social Networks and the Strategic Use of Random Information
This work is aimed at studying realistic social control strategies for social
networks based on the introduction of random information into the state of
selected driver agents. Deliberately exposing selected agents to random
information is a technique already tried in recommender systems and
search engines, and represents one of the few options for influencing the
behavior of a social context that could be accepted as ethical, could be fully
disclosed to members, and does not involve the use of force or of deception.
Our research is based on a model of knowledge diffusion applied to a
time-varying adaptive network, and considers two well-known strategies for
influencing social contexts. One is the selection of a few influencers for
manipulating their actions in order to drive the whole network to a certain
behavior; the other, instead, drives the network behavior by acting on the state
of a large subset of ordinary users with little influence. The two approaches
have been studied in terms of network and diffusion effects. The network effect
is analyzed through the changes induced on network average degree and
clustering coefficient, while the diffusion effect is based on two ad-hoc
metrics defined to measure the degree of knowledge diffusion and skill level,
as well as the polarization of agent interests. The results, obtained through
simulations on synthetic networks, show rich dynamics and strong effects on
the communication structure and on the distribution of knowledge and skills,
supporting our hypothesis that the strategic use of random information could
represent a realistic approach to social network controllability, and that with
both strategies, in principle, the control effect could be remarkable.
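The two network-level quantities named above, average degree and clustering coefficient, can be computed directly from an adjacency structure. A minimal pure-Python sketch for an undirected graph, assuming the common convention that nodes with degree below 2 have a local clustering coefficient of 0; the helper names are illustrative, and the paper's time-varying adaptive network model is not reproduced here.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def average_clustering(adj):
    """Mean local clustering coefficient: for each node, the fraction of
    neighbor pairs that are themselves connected (0 if degree < 2)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # contributes 0 under the usual convention
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Toy graph: a triangle (0, 1, 2) plus a pendant node 3 attached to node 2
adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
print(average_degree(adj))                 # 2.0
print(round(average_clustering(adj), 4))   # 0.5833
```

Tracking how these two numbers change between simulation snapshots is one simple way to quantify a "network effect" of the kind the abstract describes.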
Daily Stress Recognition from Mobile Phone Data, Weather Conditions and Individual Traits
Research has shown that stress reduces quality of life and causes many
diseases. For this reason, several researchers have devised stress detection systems
based on physiological parameters. However, these systems require the user to
carry obtrusive sensors continuously. In our paper, we
propose an alternative approach providing evidence that daily stress can be
reliably recognized based on behavioral metrics, derived from the user's mobile
phone activity and from additional indicators, such as the weather conditions
(data pertaining to transitory properties of the environment) and the
personality traits (data concerning permanent dispositions of individuals). Our
multifactorial statistical model, which is person-independent, achieves an
accuracy of 72.28% on a 2-class daily stress recognition problem. The
model is efficient to implement for most multimedia applications thanks to its
highly reduced, low-dimensional feature space (32 dimensions). Moreover, we identify and
discuss the indicators that have strong predictive power.
Comment: ACM Multimedia 2014, November 3-7, 2014, Orlando, Florida, US
Learning Determinantal Point Processes
Determinantal point processes (DPPs), which arise in random matrix theory and
quantum physics, are natural models for subset selection problems where
diversity is preferred. Among many remarkable properties, DPPs offer tractable
algorithms for exact inference, including computing marginal probabilities and
sampling; however, an important open question has been how to learn a DPP from
labeled training data. In this paper we propose a natural feature-based
parameterization of conditional DPPs, and show how it leads to a convex and
efficient learning formulation. We analyze the relationship between our model
and binary Markov random fields with repulsive potentials, which are
qualitatively similar but computationally intractable. Finally, we apply our
approach to the task of extractive summarization, where the goal is to choose a
small subset of sentences conveying the most important information from a set
of documents. In this task there is a fundamental tradeoff between sentences
that are highly relevant to the collection as a whole, and sentences that are
diverse and not repetitive. Our parameterization allows us to naturally balance
these two characteristics. We evaluate our system on data from the DUC 2003/04
multi-document summarization task, achieving state-of-the-art results.
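To make the relevance/diversity tradeoff concrete, here is a small sketch of an L-ensemble DPP, where the probability of a subset Y is det(L_Y) / det(L + I). It assumes a quality/diversity decomposition L_ij = q_i (φ_i · φ_j) q_j with log-linear qualities q_i = exp(½ θ · f_i); this is our reading of a "feature-based parameterization", and the function names, toy features, and θ are hypothetical. Learning θ is not shown.

```python
import numpy as np


def build_kernel(features, theta, phi):
    """L_ij = q_i * <phi_i, phi_j> * q_j, with q_i = exp(0.5 * theta . f_i)."""
    q = np.exp(0.5 * features @ theta)                      # per-item quality
    phi = phi / np.linalg.norm(phi, axis=1, keepdims=True)  # unit-norm diversity features
    S = phi @ phi.T                                         # pairwise similarity
    return q[:, None] * S * q[None, :]


def dpp_log_prob(L, subset):
    """log P(Y) = log det(L_Y) - log det(L + I) for a non-empty subset Y."""
    idx = np.asarray(subset)
    sign_y, logdet_y = np.linalg.slogdet(L[np.ix_(idx, idx)])
    if sign_y <= 0:
        return -np.inf  # singular (or numerically non-positive) sub-kernel: probability 0
    _, logdet_z = np.linalg.slogdet(L + np.eye(L.shape[0]))
    return logdet_y - logdet_z


# Toy example (hypothetical): three items with equal quality; items 0 and 1
# are near-duplicates, while item 2 is orthogonal to item 0.
features = np.ones((3, 1))
theta = np.array([0.0])
phi = np.array([[1.0, 0.0], [0.99, np.sqrt(1 - 0.99**2)], [0.0, 1.0]])
L = build_kernel(features, theta, phi)
print(dpp_log_prob(L, [0, 2]) > dpp_log_prob(L, [0, 1]))  # True
```

With equal qualities, the diverse pair {0, 2} is about 50 times more likely than the redundant pair {0, 1}, because det(L_Y) shrinks as the items' feature vectors become parallel; raising an item's quality q_i scales up every subset containing it.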
ASSESSING THE RELATIVE INFLUENCES OF ABIOTIC AND BIOTIC FACTORS ON A SPECIES’ DISTRIBUTION USING PSEUDO-ABSENCE AND FUNCTIONAL TRAIT DATA: A CASE STUDY WITH THE AMERICAN EEL (Anguilla rostrata)
Species’ distributions are influenced by abiotic and biotic factors, but direct comparison of their relative importance is difficult, particularly when working with complex, multi-species datasets. Here, we present a flexible method to compare abiotic and biotic influences at common scales. First, data representing abiotic and biotic factors are collected using a combination of geographic information system, remotely sensed, and species’ functional trait data. Next, the relative influences of each predictor variable on the occurrence of a focal species are compared. Specifically, ‘sample’ data from sites of known occurrence are compared with ‘background’ data (i.e. pseudo-absence data collected at sites where occurrence is unknown, combined with sample data). Predictor variables that may have the strongest influence on the focal species are identified as those where sample data are clearly distinct from the corresponding background distribution. To demonstrate the method, effects of hydrology, physical habitat, and co-occurring fish functional traits are assessed relative to the contemporary (1950–1990) distribution of the American Eel (Anguilla rostrata) in six Mid-Atlantic (USA) rivers. We find that Eel distribution has likely been influenced by the functional characteristics of co-occurring fishes and by local dam density, but not by other physical habitat or hydrologic factors.
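The abstract does not say how "clearly distinct" is measured. One simple, swapped-in way to rank predictors by how far the sample distribution departs from the background is the two-sample Kolmogorov–Smirnov statistic (the largest gap between the two empirical CDFs). The sketch below assumes that choice; the predictor names and values are made-up toy data, not the study's.

```python
def ecdf(data, x):
    """Fraction of observations in data that are <= x."""
    return sum(1 for v in data if v <= x) / len(data)


def ks_statistic(sample, background):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the two ECDFs."""
    points = sorted(set(sample) | set(background))
    return max(abs(ecdf(sample, x) - ecdf(background, x)) for x in points)


# Hypothetical toy data: predictor values at occurrence sites ("sample") and at
# background sites, which (as in the abstract) also include the sample sites.
sample = {"predictor_a": [1, 2, 3], "predictor_b": [4, 5, 6]}
background = {
    "predictor_a": [1, 2, 3, 10, 11, 12],
    "predictor_b": [4, 5, 6, 4, 5, 6],
}

scores = {name: ks_statistic(sample[name], background[name]) for name in sample}
for name, d in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(d, 3))
# predictor_a 0.5
# predictor_b 0.0
```

Because the statistic is computed on each predictor's own values, predictors measured in different units (dam density, hydrology, trait scores) end up on a common 0 to 1 scale, which fits the abstract's goal of comparing influences at common scales.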
The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics
Modern scholarship leaves abundant online footprints. Along with
traditional metrics of research quality, such as citation counts, online images
of researchers and institutions increasingly matter in evaluating academic
impact, decisions about grant allocation, and promotion. We examined 400
biographical Wikipedia articles on academics from four scientific fields to
test if being featured in the world's largest online encyclopedia is correlated
with higher academic notability (assessed through citation counts). We found no
statistically significant correlation between Wikipedia article metrics
(length, number of edits, number of incoming links from other articles, etc.)
and the academic notability of the researchers concerned. We also did not find any
evidence that scientists with better Wikipedia representation are necessarily more
prominent in their fields. In addition, we inspected the Wikipedia coverage of
notable scientists sampled from the Thomson Reuters list of "highly cited
researchers". In each of the examined fields, Wikipedia failed to cover
notable scholars properly. Both findings imply that Wikipedia might be
producing an inaccurate image of academics on the front end of science. By
shedding light on how public perception of academic progress is formed, this
study warns that a subjective element might have been introduced into the
hitherto structured system of academic evaluation.
Comment: To appear in EPJ Data Science. To obtain the Additional Files and
Datasets, e-mail the corresponding author
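The abstract does not name the correlation measure used. For skewed quantities such as citation counts and edit counts, a rank correlation like Spearman's ρ is a common choice. A minimal sketch, assuming Spearman's ρ computed as the Pearson correlation of average ranks (so ties are handled); the article lengths and citation counts below are made-up toy values.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: article length (characters) vs. citation count
article_length = [1200, 3400, 800, 5000]
citations = [150, 400, 90, 60]
print(round(spearman(article_length, citations), 3))  # -0.2
```

Significance testing of ρ (needed for a claim like "no statistically significant correlation") is not shown here.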
Learning Detection with Diverse Proposals
To predict a set of diverse and informative proposals with enriched
representations, this paper introduces a differentiable Determinantal Point
Process (DPP) layer that is able to augment the object detection architectures.
Most modern object detection architectures, such as Faster R-CNN, learn to
localize objects by minimizing deviations from the ground-truth but ignore
correlation between multiple proposals and object categories. Non-Maximum
Suppression (NMS), a widely used proposal pruning scheme, ignores label- and
instance-level relations between object candidates, resulting in multi-labeled
detections. In the multi-class case, NMS selects boxes with the largest
prediction scores, ignoring the semantic relation between categories of
potential selections. In contrast, our trainable DPP layer, allowing for Learning
Detection with Diverse Proposals (LDDP), considers both label-level contextual
information and spatial layout relationships between proposals without
increasing the number of parameters of the network, and thus substantially
improves the localization and classification of the final detected bounding boxes
during both training and inference. Furthermore, we show that LDDP
retains its superiority over Faster R-CNN even when the number of proposals
generated by LDDP is only ~30% of the number used by Faster R-CNN.
Comment: Accepted to CVPR 201
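For contrast with the DPP layer, the following sketch shows standard greedy NMS as it is commonly implemented. It looks only at scores and box overlap, which is the limitation the abstract describes: nothing about labels or semantic relations enters the decision. The (x1, y1, x2, y2) box format and the IoU threshold of 0.5 are assumptions; this is the baseline, not the LDDP method.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap any already-kept box by more than `threshold`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes (IoU = 81/119 ≈ 0.68) and one separate box
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

Box 1 is suppressed purely because it overlaps a higher-scoring box; if it actually covered a different object or category, greedy NMS would still discard it.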