
    Improving fairness in machine learning systems: What do industry practitioners need?

    The potential for machine learning (ML) systems to amplify social inequities and unfairness is receiving increasing popular and academic attention. A surge of recent work has focused on the development of algorithmic tools to assess and mitigate such unfairness. If these tools are to have a positive impact on industry practice, however, it is crucial that their design be informed by an understanding of real-world needs. Through 35 semi-structured interviews and an anonymous survey of 267 ML practitioners, we conduct the first systematic investigation of commercial product teams' challenges and needs for support in developing fairer ML systems. We identify areas of alignment and disconnect between the challenges faced by industry practitioners and solutions proposed in the fair ML research literature. Based on these findings, we highlight directions for future ML and HCI research that will better address industry practitioners' needs. Comment: To appear in the 2019 ACM CHI Conference on Human Factors in Computing Systems (CHI 2019).
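    The fairness-assessment tools the abstract refers to typically start from simple group metrics. The sketch below is a minimal, hypothetical illustration (not taken from the paper): it computes a demographic parity difference, assuming binary predictions and a binary sensitive attribute; the function name and data are invented for the example.

    ```python
    import numpy as np

    def demographic_parity_difference(y_pred, group):
        """Absolute difference in positive-prediction rates between two groups.

        y_pred: array of binary predictions (0/1).
        group:  array of binary group labels (0/1), e.g. a sensitive attribute.
        """
        y_pred = np.asarray(y_pred)
        group = np.asarray(group)
        rate_0 = y_pred[group == 0].mean()  # selection rate for group 0
        rate_1 = y_pred[group == 1].mean()  # selection rate for group 1
        return abs(rate_0 - rate_1)

    # Toy example: group 1 receives positive predictions more often than group 0.
    preds  = [1, 0, 0, 1, 1, 1, 1, 0]
    groups = [0, 0, 0, 0, 1, 1, 1, 1]
    print(demographic_parity_difference(preds, groups))  # 0.25
    ```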

    Can lies be faked? Comparing low-stakes and high-stakes deception video datasets from a Machine Learning perspective

    Despite the great impact of lies on human societies and a meager 54% human accuracy for Deception Detection (DD), Machine Learning systems that perform automated DD are still not viable for real-life application due to data scarcity. Few publicly available DD datasets exist, and the creation of new datasets is hindered by the conceptual distinction between low-stakes and high-stakes lies. In theory, the two kinds of lies are so distinct that a dataset of one kind could not be used for applications targeting the other. Low-stakes deception data is easier to acquire because it can be simulated (faked) in controlled settings, but such lies lack the significance of genuine high-stakes lies, which are far harder to obtain yet are the ones of practical interest for automated DD systems. To investigate whether this distinction holds from a practical perspective, we design several experiments comparing a high-stakes DD dataset and a low-stakes DD dataset, evaluating their results with a Deep Learning classifier that works exclusively from video data. In our experiments, a network trained on low-stakes lies classified high-stakes deception more accurately than low-stakes deception, although using low-stakes lies as an augmentation strategy for the high-stakes dataset decreased its accuracy. Comment: 11 pages, 3 figures.
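    The cross-dataset protocol described here (train on one stakes level, evaluate on the other, and optionally augment the high-stakes training data with low-stakes samples) can be sketched as follows. This is a hypothetical illustration only: the study uses a deep classifier on video, whereas a logistic regression over placeholder feature vectors stands in for it here, and all array names and sizes are invented.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)

    # Placeholder per-video feature matrices and labels; in the study these would
    # come from a deep network applied to low-stakes and high-stakes video datasets.
    X_low,  y_low  = rng.normal(size=(200, 32)), rng.integers(0, 2, 200)  # low-stakes
    X_high, y_high = rng.normal(size=(80, 32)),  rng.integers(0, 2, 80)   # high-stakes

    # Protocol 1: train on low-stakes lies, evaluate on high-stakes lies (cross-dataset).
    clf = LogisticRegression(max_iter=1000).fit(X_low, y_low)
    print("low -> high accuracy:", accuracy_score(y_high, clf.predict(X_high)))

    # Protocol 2: augment a high-stakes training split with the low-stakes data,
    # then evaluate on the held-out high-stakes split.
    X_aug = np.vstack([X_high[:40], X_low])
    y_aug = np.concatenate([y_high[:40], y_low])
    clf_aug = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
    print("augmented accuracy:", accuracy_score(y_high[40:], clf_aug.predict(X_high[40:])))
    ```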

    9. Differential Item Functioning In Licensure Tests

    When test scores are used to make important decisions, as is typically the case with licensure tests, the validity of test score interpretations is critical. The validity of the decision (e.g., pass or fail the licensure examination) relies heavily on the validity of the test score that is used in making the licensure decision. So, although validity is always a critical component of test score interpretation, it takes on added importance when the score is used in high-stakes decision situations such as licensure testing. Issues in validity for licensure tests have been addressed in Chapter 4 of this volume. The focus of this chapter is on techniques that have been developed for identifying one source of test interpretation invalidity: differential item functioning (DIF) by identifiable groups. The chapter begins with a discussion of what constitutes differential item functioning and under what circumstances it poses a source of test interpretation invalidity. Next, various methods for identifying test items that function differentially are highlighted. This section focuses principally on multiple-choice test items, although a separate subsection on applications of DIF methods with constructed-response items is presented. The chapter ends with a conclusion section that makes recommendations for future developments in identifying test items that function inappropriately for different sub-populations. This chapter concentrates on the individual items that make up the test, not on administrative or other aspects of testing that also might influence examinee test performance. Specifically, it considers ways to identify items that function differentially for identifiable sub-populations. Other reasons for score performance differences (e.g., speeded conditions, administration medium, test anxiety, test-wiseness) are extremely important, but they are beyond the scope of this chapter. The focus here is on approaches that have promise for identifying items that function differentially in licensure tests. It is not the intent of this chapter to present step-by-step details on calculating the various statistics; the reader should consult other books that present formulas for such calculations, particularly Berk (1982), Camilli and Shepard (1994), and Holland and Wainer (1993). Further, this chapter is not designed to be a comprehensive resource for DIF methods; instead, it samples those techniques that are relevant or dominant in DIF analyses with licensure test applications.
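    Among the DIF techniques surveyed in sources such as Holland and Wainer (1993), the Mantel-Haenszel procedure is one of the most widely used with licensure data. The sketch below is a minimal illustration, not the chapter's own material: it computes the Mantel-Haenszel common odds ratio across total-score strata and the ETS delta-scale statistic MH D-DIF = -2.35 ln(alpha_MH); the variable names and toy data are assumptions made for the example.

    ```python
    import numpy as np

    def mantel_haenszel_dif(item, total, group):
        """Mantel-Haenszel common odds ratio and ETS delta-scale DIF statistic.

        item:  0/1 scores on the studied item.
        total: total test score used as the matching (stratifying) variable.
        group: 0 = reference group, 1 = focal group.
        """
        item, total, group = map(np.asarray, (item, total, group))
        num, den = 0.0, 0.0
        for k in np.unique(total):
            s = total == k
            A = np.sum((item == 1) & (group == 0) & s)  # reference, correct
            B = np.sum((item == 0) & (group == 0) & s)  # reference, incorrect
            C = np.sum((item == 1) & (group == 1) & s)  # focal, correct
            D = np.sum((item == 0) & (group == 1) & s)  # focal, incorrect
            N = A + B + C + D
            if N == 0:
                continue
            num += A * D / N
            den += B * C / N
        alpha = num / den                    # common odds ratio across strata
        return alpha, -2.35 * np.log(alpha)  # MH D-DIF on the ETS delta scale

    # Toy example: focal-group examinees answer the item correctly less often
    # than reference-group examinees with the same total score.
    item  = [1, 1, 0, 1, 0, 0, 1, 0]
    total = [3, 3, 3, 3, 3, 3, 3, 3]
    group = [0, 0, 0, 0, 1, 1, 1, 1]
    print(mantel_haenszel_dif(item, total, group))
    # alpha = 9.0, MH D-DIF ~ -5.16 (negative values flag DIF against the focal group)
    ```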

    Social transmission of leadership preference: knowledge of group membership and partisan media reporting moderates perceptions of leadership ability from facial cues to competence and dominance

    While first impressions of dominance and competence can influence leadership preference, the social transmission of leadership preference has received little attention. The capacity to transmit, store and compute information has increased greatly over recent history, and the new media environment may encourage partisanship (i.e. ‘echo chambers’), misinformation and rumour spreading to support political and social causes, and may be conducive both to emotive writing and to emotional contagion, which can shape voting behaviour. In our pre-registered experiment, we examined whether implicit associations between facial cues to dominance and competence (intelligence) and leadership ability are strengthened by partisan media and by knowledge that leaders support or oppose us on a socio-political issue of personal importance. Social information, in general, reduced well-established implicit associations between facial cues and leadership ability. However, as predicted, social knowledge of group membership reduced preferences for facial cues to high dominance and intelligence in out-group leaders. In the opposite direction to our original prediction, this ‘in-group bias’ was greater under less partisan versus partisan media, with partisan writing eliciting greater state anxiety across the sample. Partisanship also altered the salience of women’s facial appearance (i.e., cues to high dominance and intelligence) in out-group versus in-group leaders. Independent of the media environment, men and women displayed an in-group bias toward facial cues of dominance in same-sex leaders. Our findings reveal effects of minimal social information (facial appearance, group membership, media reporting) on leadership judgements, which may have implications for patterns of voting or socio-political behaviour at the local or national level.

    Toward a Holistic Model of Deception: Subject Matter Expert Validation

    Security challenges require greater insight into, and flexibility in, the way deception can be identified and responded to. Deception research in interactions has identified behaviors indicative of truth-telling and deceit. Research on deception in military environments has focused on planning deception, developing approaches to deceive others while neglecting counter-deception perspectives. To address these challenges, a holistic approach to deception is advocated. A literature review of deception was conducted, followed by validation interviews with Subject Matter Experts (SMEs). Explanatory thematic analysis of the interviews conducted with SMEs (n=19) led to the development of meta-themes related to the ‘deceiver’, their ‘intent’, the ‘strategies and tactics’ of deception, ‘interpretation’ by the target, and ‘target’ decision-making strengths and vulnerabilities. This led to the development of the Holistic Model of Deception (HMD), an approach in which deception strategies reflect context. The implications of this approach are considered alongside the limitations and the future directions required to validate the HMD.