17 research outputs found

    A Celebration of West Point Authors, July - December 2019

    Highlighting the 456 collected works of scholarship published and presented between July and December 2019.

    Achieving Causal Fairness in Recommendation

    Recommender systems provide personalized services for users seeking information and play an increasingly important role in online applications. While most research papers focus on inventing machine learning algorithms that fit user behavior data and maximize predictive performance in recommendation, it is also important to develop fairness-aware machine learning algorithms whose decisions are not only accurate but also meet desired fairness requirements. Although there are many works on fairness and discrimination in personalized recommendation, achieving user-side fairness in bandit recommendation from a causal perspective remains a challenging task. Moreover, deployed systems utilize user-item interaction data to train models and then generate new data through online recommendation; this feedback loop often results in various biases in the observational data. The goal of this dissertation is to address challenging issues in achieving causal fairness in recommender systems: achieving user-side fairness and counterfactual fairness in bandit-based recommendation, mitigating confounding and sample selection bias simultaneously in recommendation, and robustly improving the bandit learning process with biased offline data. In this dissertation, we developed the following algorithms and frameworks for research problems related to causal fairness in recommendation:
    • a contextual bandit algorithm to achieve group-level user-side fairness, and two UCB-based causal bandit algorithms to achieve counterfactual individual fairness for personalized recommendation (a simplified sketch of a fairness-aware contextual bandit follows this abstract);
    • sufficient and necessary graphical conditions for identifying and estimating three causal quantities in the presence of confounding and sample selection biases, and a framework for leveraging the causal bounds derived from confounded and selection-biased offline data to robustly improve the online bandit learning process;
    • a framework for discrimination analysis that exploits multiple causes of the outcome variable to deal with hidden confounding;
    • a new causal-based fairness notion and algorithms for determining whether an individual or a group of individuals is discriminated against in terms of equality of effort.
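    The following is a minimal, illustrative sketch of a fairness-aware contextual bandit in the spirit of the first contribution above. It is not the dissertation's algorithm: the LinUCB backbone, the two-group setup, and the heuristic of widening the exploration bonus for the currently under-served user group are all assumptions made purely for illustration.

```python
# Illustrative only: a LinUCB-style contextual bandit with a crude group-level
# user-side fairness heuristic. All class/parameter names are hypothetical and
# this is not the algorithm developed in the dissertation.
import numpy as np

class FairLinUCB:
    def __init__(self, n_arms, dim, alpha=1.0, fairness_weight=0.5):
        self.alpha = alpha                      # base exploration strength
        self.fairness_weight = fairness_weight  # strength of the fairness heuristic
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vectors
        self.group_reward = {0: 0.0, 1: 0.0}   # cumulative reward per user group
        self.group_count = {0: 1e-9, 1: 1e-9}  # interactions per user group

    def select(self, context, user_group):
        # Heuristic: if this user's group has received less average reward than
        # the other group so far, widen the confidence bonus (explore more) so
        # under-served users are not locked into stale exploitation.
        other = 1 - user_group
        gap = (self.group_reward[other] / self.group_count[other]
               - self.group_reward[user_group] / self.group_count[user_group])
        eff_alpha = self.alpha * (1.0 + self.fairness_weight * max(gap, 0.0))
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b  # per-arm ridge estimate of reward weights
            scores.append(context @ theta
                          + eff_alpha * np.sqrt(context @ A_inv @ context))
        return int(np.argmax(scores))

    def update(self, arm, context, reward, user_group):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
        self.group_reward[user_group] += reward
        self.group_count[user_group] += 1

# toy usage
bandit = FairLinUCB(n_arms=5, dim=8)
ctx = np.random.rand(8)
arm = bandit.select(ctx, user_group=0)
bandit.update(arm, ctx, reward=1.0, user_group=0)
```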

    A Data-Driven Approach for Modeling Agents

    Agents are commonly built on a set of simple rules driven by theories, hypotheses, and assumptions. Such a modeling premise makes limited use of real-world data and is challenged when modeling real-world systems due to the lack of empirical grounding. At the same time, the last decade has witnessed the production and availability of large-scale data from various sensors that carry behavioral signals. These data sources have the potential to change the way we create agent-based models: from simple rules to models driven by data. Despite this opportunity, the literature offers no modeling approach for generating granular agent behaviors from data. This dissertation proposes a novel data-driven approach for modeling agents to bridge that gap. The approach is composed of four detailed steps: data preparation, attribute model creation, behavior model creation, and integration. The connections between and within these steps are established using data flow diagrams. The practicality of the approach is demonstrated with a human mobility model that uses millions of location footprints collected from social media. In this model, the generation of movement behavior is tested with five machine learning/statistical modeling techniques covering a large number of model/data configurations; results show that Random Forest-based learning is the most effective for the mobility use case (a simplified sketch of this step follows the abstract). Furthermore, agent attribute values are obtained or generated with machine learning and translational assignment techniques. The proposed approach is evaluated in two ways. First, the use-case model is compared to a model developed using a state-of-the-art data-driven approach: its prediction performance is comparable, and the plausibility of its behaviors and model structure is found to be closer to the real world than that of the state-of-the-art model, indicating that the proposed approach produces realistic results. Second, a standard mobility dataset is used to drive the mobility model in place of social media data; despite its small size, it reproduces the results of the primary use case, indicating that different datasets can be used with the proposed approach.
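    As a rough illustration of the behavior-model step (the Random Forest found most effective for the mobility use case), the sketch below trains a Random Forest to predict an agent's next location from toy location footprints and exposes the prediction as an agent decision function. The feature layout, synthetic data, and function names are assumptions; the dissertation's actual pipeline (data preparation, attribute model creation, behavior model creation, integration) is considerably richer.

```python
# Hypothetical behavior-model sketch: next-location prediction with a Random
# Forest over synthetic "footprint" features. Not the dissertation's code.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# toy footprints: (hour_of_day, day_of_week, previous_location_id) -> next_location_id
n, n_locations = 5000, 20
X = np.column_stack([
    rng.integers(0, 24, n),           # hour of day
    rng.integers(0, 7, n),            # day of week
    rng.integers(0, n_locations, n),  # previous location id
])
# synthetic rule: agents tend to return to their previous location
y = np.where(rng.random(n) < 0.6, X[:, 2], rng.integers(0, n_locations, n))

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X[:4000], y[:4000])
print("held-out accuracy:", model.score(X[4000:], y[4000:]))

# at simulation time, each agent queries the trained model for its next move
def next_location(hour, weekday, prev_loc):
    return int(model.predict([[hour, weekday, prev_loc]])[0])

print(next_location(hour=8, weekday=1, prev_loc=3))
```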

    Assessing the Efficacy of the ELECTRA Pre-Trained Language Model for Multi-Class Sarcasm Subcategory Classification

    Sarcasm detection remains a challenging task in natural language processing, primarily due to the high levels of nuance, subjectivity, and context-sensitivity in the expression of the sentiment. Pre-trained large language models have been employed in a variety of sarcasm detection tasks, including binary sarcasm detection and the classification of sarcastic speech subcategories. However, such models remain compute-hungry solutions, and there has been a recent trend towards mitigating this through the creation of more lightweight models, including ELECTRA. This dissertation assesses the efficacy of the ELECTRA pre-trained language model, known for its computational efficiency and performant results in various natural language processing tasks, for multi-class sarcasm subcategory classification. The research proposes a partial fine-tuning approach to generalise on sarcastic data before the model is applied to the task in several configurations, while employing feature engineering techniques to remove overlap between hierarchical data categories. Preliminary results yield a macro F1 score of 0.0787 for 6-class classification and 0.2363 for 3-class classification, indicating potential for further improvement and application within the field.
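    A minimal sketch of partial fine-tuning with ELECTRA is shown below, assuming the Hugging Face transformers library: most of the encoder is frozen and only the top layers and the classification head receive gradients. The checkpoint name, the number of frozen layers, the six-label setup, and the example texts are assumptions for illustration, not the author's exact configuration.

```python
# Illustrative partial fine-tuning of ELECTRA for multi-class classification.
# Checkpoint, frozen-layer count, and labels are assumptions, not the thesis setup.
import torch
from transformers import ElectraTokenizerFast, ElectraForSequenceClassification

model_name = "google/electra-small-discriminator"
tokenizer = ElectraTokenizerFast.from_pretrained(model_name)
model = ElectraForSequenceClassification.from_pretrained(model_name, num_labels=6)

# freeze the embeddings and all but the top two transformer layers
for p in model.electra.embeddings.parameters():
    p.requires_grad = False
for layer in model.electra.encoder.layer[:-2]:
    for p in layer.parameters():
        p.requires_grad = False

texts = ["Oh great, another Monday.", "The weather is lovely today."]
labels = torch.tensor([3, 0])  # hypothetical sarcasm-subcategory ids

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)
outputs.loss.backward()  # only the unfrozen parameters receive gradients
```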

    Studying Users' Interactions and Behavior in Social Media Using Natural Language Processing

    Social media platforms have been growing at a rapid pace, attracting users' engagement with online content through the convenience of their many useful features. Such platforms provide users with interactive options such as likes and dislikes, as well as a way of expressing their opinions in the form of text (i.e., comments). As more people engage with different social media platforms, those platforms increase in both size and importance, and the resulting growth in social media data has become a vital new area for scholars and researchers exploring this new form of communication. This wealth of data has been a massive aid to researchers studying the public's behavior and opinions across different venues of social media research. Recent approaches to analyzing human language in social media have been powered mostly by Natural Language Processing (NLP) and deep learning techniques. NLP techniques are among the most promising methods used in social media analyses, including content categorization, topic discovery and modeling, and sentiment analysis. Such powerful methods have boosted the process of understanding human language by enabling researchers to aggregate data relating to events that touch on a range of social issues.

    The ability to post comments on these online platforms has allowed some users to post racist and obscene content and to spread hate. In some cases, this kind of toxic behavior turns the comment section from a space where users can share their views into a place where hate and profanity are spread. Such issues are observed across various social media platforms, and many users are exposed to these behaviors, which requires comment moderators to spend a lot of time filtering out inappropriate comments. Moreover, such inappropriate textual content can target users irrespective of age, concern a variety of topics beyond controversial ones, and be triggered by various events.

    Our work is primarily focused on studying, detecting, and analyzing users' exposure to this kind of toxicity on different social media platforms, utilizing state-of-the-art techniques from the deep learning and natural language processing areas and facilitated by exclusively collected and curated datasets that address various domains. These domains, or applications, benefit from a unified and versatile pipeline that can be applied to various scenarios (a simplified baseline sketch follows the abstract). The applications we target in this dissertation are: (1) the detection and measurement of kids' exposure to inappropriate comments posted on YouTube videos targeting young users; (2) the association between the topics of content covered by mainstream news media and the toxicity of users' comments and interactions; and (3) users' interaction with, sentiment toward, and general behavior around different topics discussed on social media platforms in light of major events (i.e., the outbreak of the COVID-19 pandemic).

    Our technical contribution is not limited to integrating various techniques from the deep learning and natural language processing literature into these new and emerging, socially relevant problem spaces; it also includes comprehensively studying various approaches to determine their feasibility and relevance to the discussed problems, coupled with insights on the integration and a rich set of conclusions, supported by systematic measurements and in-depth analyses, towards making the online space safer.
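    As a hedged illustration of the kind of comment-toxicity detection pipeline described above, the sketch below uses a TF-IDF plus logistic-regression baseline over a handful of toy comments. The dissertation relies on state-of-the-art deep learning and NLP models trained on curated datasets; the data, labels, and model choice here are placeholders only.

```python
# Illustrative baseline only: TF-IDF + logistic regression for comment toxicity.
# Toy data and labels; not the models or datasets used in the dissertation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "What a helpful and clear video, thanks!",
    "This is the worst garbage I have ever seen, you idiot.",
    "Great explanation, my kids loved it.",
    "Nobody wants you here, get lost.",
]
labels = [0, 1, 0, 1]  # 0 = appropriate, 1 = toxic (toy labels)

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),  # word and bigram features
    LogisticRegression(max_iter=1000),
)
clf.fit(comments, labels)

new_comments = ["Thanks for sharing!", "You are a complete moron."]
print(clf.predict(new_comments))        # predicted toxicity labels
print(clf.predict_proba(new_comments))  # per-class probabilities
```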