Weighted Random Walk Sampling for Multi-Relational Recommendation
In the information-overloaded web, personalized recommender systems are
essential tools to help users find the most relevant information. The most
heavily-used recommendation frameworks assume user interactions that are
characterized by a single relation. However, for many tasks, such as
recommendation in social networks, user-item interactions must be modeled as a
complex network of multiple relations rather than a single relation. Recently,
research on multi-relational factorization and hybrid recommender models has
shown that using extended meta-paths to capture additional information about
both users and items in the network can enhance the accuracy of recommendations
in such networks. Most of this work focuses on unweighted heterogeneous
networks, and to apply these techniques, weighted relations must be simplified
into binary ones. However, the information associated with weighted edges, such
as user ratings, which may be crucial for recommendation, is lost in such
binarization. In this paper, we explore a random walk sampling method in which
the frequency of edge sampling is a function of edge weight, and apply it to
generate extended meta-paths in weighted heterogeneous networks. With this
sampling technique, we demonstrate improved performance on multiple data sets,
both in terms of recommendation accuracy and model generation efficiency.
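The core idea of weight-aware edge sampling can be sketched in a few lines: at each step of the walk, the next edge is drawn with probability proportional to its weight, so a 5-star rating edge is traversed five times as often as a 1-star edge. This is a minimal illustration, not the paper's implementation; the toy graph and function name are assumptions.

```python
import random

def weighted_random_walk(adj, start, length, seed=None):
    """Random walk in which each outgoing edge is sampled with probability
    proportional to its weight (illustrative sketch of weight-aware sampling;
    `adj` maps node -> {neighbor: edge weight})."""
    rng = random.Random(seed)
    path = [start]
    node = start
    for _ in range(length):
        neighbors = adj.get(node)
        if not neighbors:  # dead end: stop the walk early
            break
        nodes, weights = zip(*neighbors.items())
        node = rng.choices(nodes, weights=weights, k=1)[0]
        path.append(node)
    return path

# Toy weighted heterogeneous graph: users u*, items i*, edge weights = ratings.
adj = {
    "u1": {"i1": 5.0, "i2": 1.0},
    "i1": {"u1": 5.0, "u2": 4.0},
    "i2": {"u1": 1.0},
    "u2": {"i1": 4.0},
}
walk = weighted_random_walk(adj, "u1", 4, seed=0)
```

Repeating such walks and concatenating the visited node types yields the extended meta-paths the abstract refers to, with heavily-weighted edges contributing more path instances.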
Fairness in Information Access Systems
Recommendation, information retrieval, and other information access systems
pose unique challenges for investigating and applying the fairness and
non-discrimination concepts that have been developed for studying other machine
learning systems. While fair information access shares many commonalities with
fair classification, the multistakeholder nature of information access
applications, the rank-based problem setting, the centrality of personalization
in many cases, and the role of user response complicate the problem of
identifying precisely what types and operationalizations of fairness may be
relevant, let alone measuring or promoting them.
In this monograph, we present a taxonomy of the various dimensions of fair
information access and survey the literature to date on this new and
rapidly-growing topic. We preface this with brief introductions to information
access and algorithmic fairness, to facilitate use of this work by scholars
with experience in one (or neither) of these fields who wish to learn about
their intersection. We conclude with several open problems in fair information
access, along with some suggestions for how to approach research in this space.
Flatter is better: Percentile Transformations for Recommender Systems
It is well known that explicit user ratings in recommender systems are biased toward high ratings and that users differ significantly in their usage of the rating scale. Implementers usually compensate for these issues through rating normalization or the inclusion of a user bias term in factorization models. However, these methods adjust only for the central tendency of users’ distributions. In this work, we demonstrate that a lack of flatness in rating distributions is negatively correlated with recommendation performance. We propose a rating transformation model that compensates for skew in the rating distribution as well as its central tendency by converting ratings into percentile values as a pre-processing step before recommendation generation. This transformation flattens the rating distribution, better compensates for differences in rating distributions, and improves recommendation performance. We also show that a smoothed version of this transformation can yield more intuitive results for users with very narrow rating distributions. A comprehensive set of experiments with state-of-the-art recommendation algorithms on four real-world datasets shows improved ranking performance for these percentile transformations.
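The pre-processing step described above can be sketched as follows: each rating in a user's profile is replaced by the fraction of that user's ratings at or below it. This is a minimal sketch of the percentile idea, not the authors' exact (or smoothed) formula.

```python
def percentile_transform(ratings):
    """Map each rating in one user's profile to its percentile within that
    profile, flattening the per-user rating distribution (illustrative
    sketch; ties are given the maximum rank)."""
    n = len(ratings)
    out = []
    for r in ratings:
        # fraction of this user's ratings at or below the current value
        rank = sum(1 for s in ratings if s <= r)
        out.append(rank / n)
    return out

# A user who rates almost everything 4 or 5: raw values bunch at the top of
# the scale, but the percentile values spread across (0, 1].
flat = percentile_transform([5, 4, 5, 3, 5])
```

The transformed values, rather than the raw ratings, would then be fed to the recommendation algorithm.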
Exploring Author Gender in Book Rating and Recommendation
Collaborative filtering algorithms find useful patterns in rating and consumption data and exploit these patterns to guide users to good items. Many of the patterns in rating datasets reflect important real-world differences between the various users and items in the data; other patterns may be irrelevant or possibly undesirable for social or ethical reasons, particularly if they reflect undesired discrimination, such as gender or ethnic discrimination in publishing. In this work, we examine the response of collaborative filtering recommender algorithms to the distribution of their input data with respect to a dimension of social concern, namely content creator gender. Using publicly-available book ratings data, we measure the distribution of the genders of the authors of books in user rating profiles and recommendation lists produced from this data. We find that common collaborative filtering algorithms differ in the gender distribution of their recommendation lists, and in the relationship of that output distribution to the user profile distribution.
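The measurement at the heart of this study reduces to a simple counting step: for a list of books (a user profile or a recommendation list), look up each author's gender label and report the resulting proportions. A minimal sketch, assuming hypothetical field names rather than the paper's code:

```python
from collections import Counter

def gender_distribution(book_ids, author_gender):
    """Proportion of each author-gender label among a list of books
    (illustrative; `author_gender` maps book id -> label, and books
    without a known label are counted as 'unknown')."""
    counts = Counter(author_gender.get(b, "unknown") for b in book_ids)
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}

# Hypothetical lookup table and recommendation list.
author_gender = {"b1": "female", "b2": "male", "b3": "female"}
dist = gender_distribution(["b1", "b2", "b3", "b4"], author_gender)
```

Comparing this distribution between a user's profile and the lists an algorithm produces for that user is what reveals the input-output relationship the abstract describes.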
Searching for transits in the Wide Field Camera Transit Survey with difference-imaging light curves
The Wide Field Camera Transit Survey is a pioneering program aimed at searching for extra-solar planets in the near-infrared. The images from the survey are processed by a data reduction pipeline, which uses aperture photometry to construct the light curves. We produce an alternative set of light curves using the difference-imaging method for the most complete field in the survey and carry out a quantitative comparison of the photometric precision achieved with the two methods. The results show that difference-photometry light curves present an important improvement for stars with J > 16. We report an implementation of the box-fitting transit detection algorithm that performs a trapezoid fit to the folded light curve, providing more accurate results than the box-fitting model. We describe and optimize a set of selection criteria to search for transit candidates, including the V-shape parameter calculated by our detection algorithm. The optimized selection criteria are applied to the aperture photometry and difference-imaging light curves, resulting in the automatic detection of the best 200 transit candidates from a sample of ~475 000 sources. We carry out a detailed analysis of the 18 best detections and classify them as transiting planet and eclipsing binary candidates. We present one planet candidate orbiting a late G-type star. No planet candidate around M-stars has been found, confirming the null detection hypothesis and the upper limits on the occurrence rate of short-period giant planets around M-dwarfs presented in a prior study. We extend the search for transiting planets to stars with J ≤ 18, which enables us to set a stricter upper limit of 1.1%. Furthermore, we present the detection of five faint extremely short-period eclipsing binaries and three M-dwarf/M-dwarf binary candidates. These detections demonstrate the benefits of using the difference-imaging light curves, especially when going to fainter magnitudes.
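The trapezoid refinement of the box model can be pictured with a simple flux model: flat out of transit, a flat bottom in transit, and linear ingress/egress ramps in between (the ramps are what capture the V-shape information a pure box lacks). A generic sketch with illustrative parameter names, not the survey pipeline:

```python
def trapezoid_model(phase, t0, depth, duration, ingress):
    """Relative flux of a trapezoid-shaped transit at a given orbital phase
    (illustrative sketch: `t0` is mid-transit phase, `duration` the total
    transit width, `ingress` the ramp width on each side)."""
    d = abs(phase - t0)
    half = duration / 2.0
    if d >= half:
        return 1.0                      # out of transit
    if d <= half - ingress:
        return 1.0 - depth              # flat transit bottom
    # linear ingress/egress ramp between contact points
    return 1.0 - depth * (half - d) / ingress

# Depth-1% transit centred at phase 0.5, total duration 0.1, ingress 0.02.
flux = [trapezoid_model(p / 100, 0.5, 0.01, 0.1, 0.02) for p in range(100)]
```

Fitting this model to the folded light curve and inspecting the ratio of ramp width to total duration gives a V-shape statistic: values near one half (pure triangle) flag likely eclipsing binaries rather than planets.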
Warp signatures of the Galactic disk as seen in mid infrared from Midcourse Space Experiment
The gross features in the distribution of stars, as well as of warm (T >~ 100 K)
interstellar dust, in the Galactic disk have been investigated using the recent
mid-infrared survey by the Midcourse Space Experiment (MSX) in the 8, 12, 14 and
21 micron bands. An attempt has been made to determine the location of the
Galactic mid-plane at various longitudes using two approaches: (i) fitting
exponential functions to the latitude profiles and (ii) statistical
indicators. The former method is successful for the inner Galaxy (-90 < l < 90),
and quantifies characteristic angular scales along latitude, which have been
translated to linear scale heights (z_h) and radial length scales (R_l) using a
geometric description of the Galactic disk. The distribution of warm dust in
the Galactic disk is found to be characterised by R_l < 6 kpc and 60 < z_h <~
100 pc, in agreement with other studies. The location of the Galactic mid-plane
as a function of longitude, for stars as well as warm dust, has been searched
for signatures of a warp-like feature in their distribution by fitting a
sinusoid with phase and amplitude as parameters. In every case, the warp
signature has been detected. An identical analysis of the DIRBE/COBE data in all
its ten bands covering the entire infrared spectrum (1.25-240 micron) also leads
to the detection of warp signatures with very similar phase as found from the
MSX data. Our results have been compared with those from other studies. To be
published in 'Astronomy and Astrophysics' (12 pages including 9 figures and 4 tables).
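Fitting a sinusoid with free amplitude and phase, as in the warp analysis above, has a closed-form least-squares solution: z(l) = A sin(l + phi) expands to a sin(l) + b cos(l), which is linear in (a, b), with A = sqrt(a^2 + b^2) and phi = atan2(b, a). A self-contained sketch of this standard technique (not the authors' pipeline):

```python
import math

def fit_sinusoid(longitudes_deg, z):
    """Fit z(l) = A * sin(l + phi) by linearizing to z = a*sin(l) + b*cos(l)
    and solving the 2x2 normal equations in closed form. Returns the
    amplitude A and phase phi (degrees)."""
    sl = [math.sin(math.radians(l)) for l in longitudes_deg]
    cl = [math.cos(math.radians(l)) for l in longitudes_deg]
    # Normal-equation sums for the two-parameter linear model
    sss = sum(s * s for s in sl)
    scc = sum(c * c for c in cl)
    ssc = sum(s * c for s, c in zip(sl, cl))
    szs = sum(s * y for s, y in zip(sl, z))
    szc = sum(c * y for c, y in zip(cl, z))
    det = sss * scc - ssc * ssc
    a = (szs * scc - szc * ssc) / det
    b = (szc * sss - szs * ssc) / det
    return math.hypot(a, b), math.degrees(math.atan2(b, a))

# Synthetic mid-plane offsets with a known warp: A = 50 pc, phi = 30 deg.
ls = list(range(0, 360, 10))
zs = [50.0 * math.sin(math.radians(l + 30.0)) for l in ls]
A, phi = fit_sinusoid(ls, zs)
```

Because the model is linear once rewritten, no iterative optimizer or starting guess is needed, which makes the fit robust when it is repeated band by band.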
Accuracy of optical spectroscopy for the detection of cervical intraepithelial neoplasia without colposcopic tissue information; a step toward automation for low resource settings
Optical spectroscopy has been proposed as an accurate and low-cost alternative for detection of cervical
intraepithelial neoplasia. We previously published an algorithm using optical spectroscopy as an adjunct to colposcopy
and found good accuracy (sensitivity ¼ 1.00 [95% confidence interval ðCIÞ ¼ 0.92 to 1.00], specificity ¼
0.71 [95% CI ¼ 0.62 to 0.79]). Those results used measurements taken by expert colposcopists as well as the colposcopy
diagnosis. In this study, we trained and tested an algorithm for the detection of cervical intraepithelial
neoplasia (i.e., identifying those patients who had histology reading CIN 2 or worse) that did not include the colposcopic
diagnosis. Furthermore, we explored the interaction between spectroscopy and colposcopy, examining
the importance of probe placement expertise. The colposcopic diagnosis-independent spectroscopy algorithm
had a sensitivity of 0.98 (95% CI ¼ 0.89 to 1.00) and a specificity of 0.62 (95% CI ¼ 0.52 to 0.71). The difference
in the partial area under the ROC curves between spectroscopy with and without the colposcopic diagnosis was
statistically significant at the patient level (p ¼ 0.05) but not the site level (p ¼ 0.13). The results suggest that the
device has high accuracy over a wide range of provider accuracy and hence could plausibly be implemented by
providers with limited training
Strategies and challenges associated with recruiting retirement village communities and residents into a group exercise intervention
Background: Randomized controlled trials (RCTs) provide the highest level of scientific evidence, but successful participant recruitment is critical to ensure the external and internal validity of results. This study describes the strategies associated with recruiting older adults at increased falls risk residing in retirement villages into an 18-month cluster RCT designed to evaluate the effects of a dual-task exercise program on falls and physical and cognitive function. Methods: Recruitment of adults aged ≥65 at increased falls risk residing within retirement villages (size 60–350 residents) was initially designed to occur over 12 months using two distinct cohorts. Recruitment occurred via a three-stage approach that included liaising with: 1) village operators, 2) independent village managers, and 3) residents. To recruit residents, a variety of approaches were used, including distribution of information packs, on-site presentations, free muscle and functional testing, and posters displayed in common areas. Results: Due to challenges with recruitment, three cohorts were established between February 2014 and April 2015 (14 months). Sixty retirement villages were initially invited, of which 32 declined or did not respond, leaving 28 villages that expressed interest. A total of 3947 individual letters of invitation were subsequently distributed to residents of these villages, from which 517 (13.1%) expressions of interest (EOI) were received. Across the three cohorts, for which different recruitment strategies were adopted, there were only modest differences in the proportion of EOI received (10.5% to 15.3%), which suggests that no particular recruitment approach was most effective. Following the initial screening of these residents, 398 (77.0%) participants were deemed eligible to participate, but a final sample of 300 (58.0% of the 517 EOI) consented and was randomized; 7.6% of the 3947 residents invited. Principal reasons for not participating, despite being eligible, were poor health, lack of time and no GP approval. Conclusion: This study highlights that there are significant challenges associated with recruiting sufficient numbers of older adults from independent-living retirement villages into an exercise intervention designed to improve health and well-being. Trial registration: Australian New Zealand Clinical Trials Registry: ACTRN12613001161718. Date registered: 23rd October 2013.
Implications of climate change for agricultural productivity in the early twenty-first century
This paper reviews recent literature concerning a wide range of processes through which climate change could potentially impact global-scale agricultural productivity, and presents projections of changes in relevant meteorological, hydrological and plant physiological quantities from a climate model ensemble to illustrate key areas of uncertainty. Few global-scale assessments have been carried out, and these are limited in their ability to capture the uncertainty in climate projections, and omit potentially important aspects such as extreme events and changes in pests and diseases. There is a lack of clarity on how climate change impacts on drought are best quantified from an agricultural perspective, with different metrics giving very different impressions of future risk. The dependence of some regional agriculture on remote rainfall, snowmelt and glaciers adds to the complexity. Indirect impacts via sea-level rise, storms and diseases have not been quantified. Perhaps most seriously, there is high uncertainty in the extent to which the direct effects of CO2 rise on plant physiology will interact with climate change in affecting productivity. At present, the aggregate impacts of climate change on global-scale agricultural productivity cannot be reliably quantified.
Seismic risk assessment for developing countries: Pakistan as a case study
Modern Earthquake Risk Assessment (ERA) methods usually require seismo-tectonic information for Probabilistic Seismic Hazard Assessment (PSHA) that may not be readily available in developing countries. To bypass this drawback, this paper presents a practical event-based PSHA method that uses instrumental seismicity, available historical seismicity, as well as limited information on geology and tectonic setting. Historical seismicity is integrated with instrumental seismicity to determine the long-term hazard. The tectonic setting is included by assigning seismic source zones associated with known major faults. Monte Carlo simulations are used to generate earthquake catalogues with randomized key hazard parameters. A case study region in Pakistan is selected to demonstrate the effectiveness of the method. The results indicate that the proposed method produces seismic hazard maps consistent with previous studies, thus being suitable for generating such maps in regions where limited data are available. The PSHA procedure is developed as an integral part of an ERA framework named EQRAM. The framework is also used to determine seismic risk in terms of annual losses for the study region.
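The Monte Carlo catalogue step can be illustrated with the standard Gutenberg-Richter magnitude model: magnitudes above a cutoff m_min follow an exponential distribution with rate beta = b * ln(10), so inverse-transform sampling gives M = m_min - ln(1 - u) / beta for uniform u. This is a generic sketch of one ingredient of such a simulation, not the EQRAM framework, which randomizes further hazard parameters.

```python
import math
import random

def synthetic_catalogue(n_events, b_value, m_min, seed=None):
    """Draw earthquake magnitudes from a truncated-at-below Gutenberg-Richter
    relation by inverse-transform sampling (illustrative sketch; real
    catalogues also cap the maximum magnitude and randomize locations)."""
    rng = random.Random(seed)
    beta = b_value * math.log(10.0)  # exponential rate for magnitudes
    return [m_min - math.log(1.0 - rng.random()) / beta
            for _ in range(n_events)]

# One synthetic catalogue: 10 000 events, b = 1.0, completeness cutoff M 4.0.
cat = synthetic_catalogue(10000, b_value=1.0, m_min=4.0, seed=42)
```

Repeating this draw many times yields an ensemble of plausible catalogues, from which ground-motion exceedance rates, and hence hazard maps, can be tabulated empirically.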