723 research outputs found

    Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment

    Automated data-driven decision making systems are increasingly being used to assist, or even replace, humans in many settings. These systems function by learning from historical decisions, often taken by humans. In order to maximize the utility of these systems (or classifiers), their training involves minimizing the errors (or misclassifications) over the given historical data. However, it is quite possible that the optimally trained classifier makes decisions for people belonging to different social groups with different misclassification rates (e.g., misclassification rates for females are higher than for males), thereby placing these groups at an unfair disadvantage. To account for and avoid such unfairness, in this paper, we introduce a new notion of unfairness, disparate mistreatment, which is defined in terms of misclassification rates. We then propose intuitive measures of disparate mistreatment for decision boundary-based classifiers, which can be easily incorporated into their formulation as convex-concave constraints. Experiments on synthetic as well as real-world datasets show that our methodology is effective at avoiding disparate mistreatment, often at a small cost in terms of accuracy. Comment: To appear in Proceedings of the 26th International World Wide Web Conference (WWW), 2017. Code available at: https://github.com/mbilalzafar/fair-classificatio
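As a rough illustration of what "different misclassification rates" across groups means, the sketch below computes per-group error rates from labels and predictions. It is not the paper's method or its constrained classifier; the function names, the choice to report false positive and false negative rates alongside the overall error rate, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def group_error_rates(y_true, y_pred, groups):
    """Return per-group error, false positive, and false negative rates.

    Labels are assumed to be binary (0 or 1). All names here are hypothetical.
    """
    counts = defaultdict(
        lambda: {"n": 0, "err": 0, "neg": 0, "fp": 0, "pos": 0, "fn": 0}
    )
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["err"] += int(t != p)
        if t == 1:
            c["pos"] += 1
            c["fn"] += int(p != 1)  # positive example predicted negative
        else:
            c["neg"] += 1
            c["fp"] += int(p == 1)  # negative example predicted positive
    rates = {}
    for g, c in counts.items():
        rates[g] = {
            "error": c["err"] / c["n"],
            "fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
            "fnr": c["fn"] / c["pos"] if c["pos"] else float("nan"),
        }
    return rates


def rate_gap(rates, group_a, group_b, key="error"):
    """Absolute difference in one rate between two groups."""
    return abs(rates[group_a][key] - rates[group_b][key])


# Toy data (made up): two groups of four examples each.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_error_rates(y_true, y_pred, groups)
print(rates)
print(rate_gap(rates, "a", "b", key="error"))
```

A gap near zero on a chosen rate would indicate that the two groups are misclassified at similar rates under that measure; which rate matters depends on the application.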

    Kernel density classification and boosting: an L2 sub analysis

    Kernel density estimation is a commonly used approach to classification. However, most of the theoretical results for kernel methods apply to estimation per se and not necessarily to classification. In this paper we show that when estimating the difference between two densities, the optimal smoothing parameters are increasing functions of the sample size of the complementary group, and we provide a small simulation study which examines the relative performance of kernel density methods when the final goal is classification. A relative newcomer to the classification portfolio is “boosting”, and this paper proposes an algorithm for boosting kernel density classifiers. We note that boosting is closely linked to a previously proposed method of bias reduction in kernel density estimation and indicate how it will enjoy similar properties for classification. We show that boosting kernel classifiers reduces the bias whilst only slightly increasing the variance, with an overall reduction in error. Numerical examples and simulations are used to illustrate the findings, and we also suggest further areas of research.
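For readers unfamiliar with kernel density classification, a minimal one-dimensional sketch follows. It estimates one Gaussian kernel density per class and assigns a point to the class with the larger prior-weighted density, which is equivalent to looking at the sign of the weighted density difference. It does not implement the boosting algorithm the paper proposes, and the function names, bandwidth values, and sample data are assumptions for illustration only.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


def kde_classify(x, sample0, sample1, h0, h1):
    """Classify x as 0 or 1 by comparing prior-weighted class densities.

    h0 and h1 are the per-class smoothing parameters (bandwidths); the values
    used below are arbitrary, not optimal choices.
    """
    n0, n1 = len(sample0), len(sample1)
    prior0 = n0 / (n0 + n1)
    prior1 = n1 / (n0 + n1)
    f0 = gaussian_kde(sample0, h0)
    f1 = gaussian_kde(sample1, h1)
    return 1 if prior1 * f1(x) > prior0 * f0(x) else 0


# Toy samples (made up): class 0 clusters near -2, class 1 near +2.
sample0 = [-2.1, -1.9, -2.0]
sample1 = [1.9, 2.0, 2.2]

print(kde_classify(-1.5, sample0, sample1, h0=0.5, h1=0.5))
print(kde_classify(1.5, sample0, sample1, h0=0.5, h1=0.5))
```

The two bandwidths are kept separate because the abstract discusses smoothing parameters per group; in practice they would be chosen by a data-driven rule.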

    Forecasting Player Behavioral Data and Simulating in-Game Events

    Understanding player behavior is fundamental in game data science. Video games evolve as players interact with the game, so being able to foresee player experience would help to ensure successful game development. In particular, game developers need to evaluate beforehand the impact of in-game events. Simulation optimization of these events is crucial to increase player engagement and maximize monetization. We present an experimental analysis of several methods to forecast game-related variables, with two main aims: to obtain accurate predictions of in-app purchases and playtime in an operational production environment, and to perform simulations of in-game events in order to maximize sales and playtime. Our ultimate purpose is to take a step towards the data-driven development of games. The results suggest that, even though the performance of traditional approaches such as ARIMA is still better, the outcomes of state-of-the-art techniques like deep learning are promising. Deep learning emerges as a well-suited general model that could be used to forecast a variety of time series with different dynamic behaviors.

    A 180 Kpc Tidal Tail in the Luminous Infrared Merger Arp 299

    We present VLA HI observations and UH88 deep optical B- and R-band observations of the IR luminous merger Arp 299 (= NGC 3690 + IC 694). These data reveal a gas-rich, optically faint tidal tail with a length of over 180 kpc. The size of this tidal feature necessitates an old interaction age for the merger (~750 Myr since first periapse), which is currently experiencing a very young starburst (~20 Myr). The observations reveal a most remarkable structure within the tidal tail: it appears to be composed of two parallel filaments separated by ~20 kpc. One of the filaments is gas rich with little if any starlight, while the other is gas poor. We believe that this bifurcation results from a warped disk in one of the progenitors. The quantities and kinematics of the tidal HI suggest that Arp 299 results from the collision of a retrograde Sab-Sb galaxy (IC 694) and a prograde Sbc-Sc galaxy (NGC 3690) that occurred 750 Myr ago and which will merge into a single object in ~60 Myr. We suggest that the present IR luminous phase in this system is due in part to the retrograde spin of IC 694. Finally, we discuss the apparent lack of tidal dwarf galaxies within the tail. Comment: LaTeX, 14 pages, 11 figures, 4 tables, uses emulateapj.sty. Accepted to AJ for July 1999. For version with full-resolution images see http://www.cv.nrao.edu/~jhibbard/a299/HIpaper/a299HI.htm

    Prediction of Dengue Incidence Using Search Query Surveillance

    Improvements in surveillance, prediction of outbreaks and the monitoring of the epidemiology of dengue virus in countries with underdeveloped surveillance systems are of great importance to ministries of health and other public health decision makers who are often constrained by budget or manpower. Google Flu Trends has proven successful in providing an early warning system for outbreaks of influenza weeks before case data are reported. We believe that there is greater potential for this technique for dengue, as the incidence of this pathogen can vary by a factor of ten in some settings, making prediction all the more important in public health planning. In this paper, we demonstrate the utility of Google search terms in predicting dengue incidence in Singapore and Bangkok, Thailand using several regression techniques. Incidence data were provided by the Singapore Ministry of Health and the Thailand Bureau of Epidemiology. We find our models predict incident cases well (correlation greater than 0.8) and periods of high incidence equally well (AUC greater than 0.95). All data and analysis code used in our study are available free online and can be adapted to other settings.

    Why Are Some Plant Genera More Invasive Than Others?

    Determining how biological traits are related to the ability of groups of organisms to become economically damaging when established outside of their native ranges is a major goal of population biology, and important in the management of invasive species. Among plants, little is known about why some taxonomic groups are more likely than others to become pests. We investigated traits that discriminate vascular plant genera, a level of taxonomic generality at which risk assessment and screening could be more effectively performed, according to the proportion of naturalized species which are pests. We focused on the United States and Canada, and, because our purpose is ultimately regulatory, considered species classified as weeds or noxious. Using contingency tables, we identified 11 genera of vascular plants that are disproportionately represented by invasive species. Results from boosted regression tree analyses show that these categories reflect biological differences. In summary, approximately 25% of variation in genus proportions of weeds or noxious species was explained by biological covariates. Key explanatory traits included genus means for wetland habitat affinity, chromosome number, and seed mass.