126 research outputs found

    Machine Learning for Synthetic Data Generation: A Review

    Data plays a crucial role in machine learning. However, real-world applications face several problems with data: the data may be of low quality; a limited number of data points can lead to under-fitting of the machine learning model; and the data can be hard to access due to privacy, safety, and regulatory concerns. Synthetic data generation offers a promising new avenue, as synthetic data can be shared and used in ways that real-world data cannot. This paper systematically reviews existing works that leverage machine learning models for synthetic data generation. Specifically, we discuss these works from several perspectives: (i) applications, including computer vision, speech, natural language, healthcare, and business; (ii) machine learning methods, particularly neural network architectures and deep generative models; and (iii) privacy and fairness issues. In addition, we identify the challenges and opportunities in this emerging field and suggest future research directions.

    2022 SDSU Data Science Symposium Presentation Abstracts

    This document contains abstracts for presentations and posters at the 2022 SDSU Data Science Symposium.

    2022 SDSU Data Science Symposium Program

    https://openprairie.sdstate.edu/ds_symposium_programs/1003/thumbnail.jp

    SoK: On the Impossible Security of Very Large Foundation Models

    Large machine learning models, or so-called foundation models, aim to serve as base models for application-oriented machine learning. Although these models showcase impressive performance, they have been empirically found to pose serious security and privacy issues. We may, however, wonder whether this is a limitation of the current models, or whether these issues stem from a fundamental intrinsic impossibility of the foundation model learning problem itself. This paper aims to systematize our knowledge supporting the latter. More precisely, we identify several key features of today's foundation model learning problem which, given the current understanding in adversarial machine learning, suggest incompatibility of high accuracy with both security and privacy. We begin by observing that high accuracy seems to require (1) very high-dimensional models and (2) huge amounts of data that can only be procured through user-generated datasets. Moreover, such data is fundamentally heterogeneous, as users generally have very specific (easily identifiable) data-generating habits. More importantly, users' data is filled with highly sensitive information, and may be heavily polluted by fake users. We then survey lower bounds on accuracy in privacy-preserving and Byzantine-resilient heterogeneous learning that, we argue, constitute a compelling case against the possibility of designing a secure and privacy-preserving high-accuracy foundation model. We further stress that our analysis also applies to other high-stakes machine learning applications, including content recommendation. We conclude by calling for measures to prioritize security and privacy, and to slow down the race for ever larger models. Comment: 13 pages

    Essays on behavioral responses to dishonest and anti-social decision making

    The first essay examines how people respond to dishonest behavior that benefits them (but hurts a third party) by examining their trust towards the dishonest decision-maker. The results of a laboratory experiment do not provide conclusive evidence, with people trusting honest and dishonest decision-makers to a similar extent. Instead, trust is higher when the moral dilemma is completely avoided by chance and the highest payoff is obtained without having to lie. This implies that moral dilemmas are better prevented than cured. The second essay assesses the extent to which parents engage in norm compliance and norm enforcement in front of their children, with the aim of teaching them the importance of social norms. The field experiment shows that, when their children are present, parents are more likely to punish a norm violation by a stranger and more likely to help a stranger in need. As such, this essay documents that parents teach their children not simply by modeling desired behavior to them, but also by punishing undesirable behavior. Finally, the third essay examines how anti-social incentives imposed by an employer, which encourage workers to harm an outside party to the benefit of the employer, may backfire and hurt the subsequent productivity of workers. In a laboratory experiment, subjects in the role of workers punish employers who imposed anti-social incentives by performing worse in a task that benefits the employer. This effect disappears when the employer had no control over the incentives imposed on the worker. Together, these results show that workers negatively reciprocate the psychological costs imposed on them intentionally and exemplify the importance of organizational leadership.

    Cultivating capacities in community-based researchers in low-resource settings: Lessons from a participatory study on violence and mental health in Sri Lanka

    Participatory methods, which rely heavily on community-based data collectors, are growing in popularity to deliver much-needed evidence on violence and mental health in low- and middle-income countries. These settings, along with local researchers, encounter the highest burden of violence and mental ill-health, with the fewest resources to respond. Despite increased focus on wellbeing for research participants and, to a lesser degree, professional researchers in such studies, the role-specific needs of community-based researchers receive scant attention. This co-produced paper draws insights from one group’s experience to identify rewards, challenges, and recommendations for supporting wellbeing and development of community-based researchers in sensitive participatory projects in low-resource settings. Twenty-one community-based researchers supporting a mixed-methods study on youth, violence and mental health in Sri Lanka submitted 63 reflexive structured journal entries across three rounds of data collection. We applied Attride-Stirling’s method for thematic analysis to explore peer researchers’ learning about research, violence and mental health; personal-professional boundaries; challenges in sensitive research; and experiences of support from the core team. Sri Lanka’s first study capturing experiences of diverse community-based researchers aims to inform the growing number of global health and development actors relying on such talent to deliver sensitive and emotionally difficult work in resource-limited and potentially volatile settings. Viewing participatory research as an opportunity for mutual learning among both community-based and professional researchers, we identify practice gaps and opportunities to foster respectful team dynamics and create generative and safe co-production projects for all parties. 
    Intentional choices around communication, training, human and consumable resources, project design, and navigating unstable research conditions can strengthen numerous personal and professional capacities across teams. Such individual and collective growth holds potential to benefit short- and long-term quality of evidence and inform action on critical issues, including violence and mental health, facing high-burden, low-resource contexts.

    Advances in Information Security and Privacy

    With the recent pandemic emergency, many people are spending their days in smart working and have increased their use of digital resources for both work and entertainment. As a result, the amount of digital information handled online has increased dramatically, and we can observe a significant increase in the number of attacks, breaches, and hacks. This Special Issue aims to establish the state of the art in protecting information by mitigating information risks. This objective is reached by presenting both surveys on specific topics and original approaches and solutions to specific problems. In total, 16 papers have been published in this Special Issue.