Machine Learning for Synthetic Data Generation: A Review
Data plays a crucial role in machine learning. However, in real-world
applications, there are several problems with data, e.g., data are of low
quality; a limited number of data points lead to under-fitting of the machine
learning model; it is hard to access the data due to privacy, safety and
regulatory concerns. Synthetic data generation offers a promising new avenue,
as it can be shared and used in ways that real-world data cannot. This paper
systematically reviews the existing works that leverage machine learning models
for synthetic data generation. Specifically, we discuss the synthetic data
generation works from several perspectives: (i) applications, including
computer vision, speech, natural language, healthcare, and business; (ii)
machine learning methods, particularly neural network architectures and deep
generative models; (iii) privacy and fairness issues. In addition, we identify
the challenges and opportunities in this emerging field and suggest future
research directions.
2022 SDSU Data Science Symposium Presentation Abstracts
This document contains abstracts for presentations and posters at the 2022 SDSU Data Science Symposium.
2022 SDSU Data Science Symposium Program
https://openprairie.sdstate.edu/ds_symposium_programs/1003/thumbnail.jp
SoK: On the Impossible Security of Very Large Foundation Models
Large machine learning models, or so-called foundation models, aim to serve
as base-models for application-oriented machine learning. Although these models
showcase impressive performance, they have been empirically found to pose
serious security and privacy issues. We may however wonder if this is a
limitation of the current models, or if these issues stem from a fundamental
intrinsic impossibility of the foundation model learning problem itself. This
paper aims to systematize our knowledge supporting the latter. More precisely,
we identify several key features of today's foundation model learning problem
which, given the current understanding in adversarial machine learning, suggest
incompatibility of high accuracy with both security and privacy. We begin by
observing that high accuracy seems to require (1) very high-dimensional models
and (2) huge amounts of data that can only be procured through user-generated
datasets. Moreover, such data is fundamentally heterogeneous, as users
generally have very specific (easily identifiable) data-generating habits. More
importantly, users' data is filled with highly sensitive information, and may be
heavily polluted by fake users. We then survey lower bounds on accuracy in
privacy-preserving and Byzantine-resilient heterogeneous learning that, we
argue, constitute a compelling case against the possibility of designing a
secure and privacy-preserving high-accuracy foundation model. We further stress
that our analysis also applies to other high-stakes machine learning
applications, including content recommendation. We conclude by calling for
measures to prioritize security and privacy, and to slow down the race for ever
larger models. Comment: 13 pages
Essays on behavioral responses to dishonest and anti-social decision making
The first essay examines how people respond to dishonest behavior that benefits them (but hurts a third party) by examining their trust towards the dishonest decision-maker. The results of a laboratory experiment do not provide conclusive evidence, with people trusting honest and dishonest decision-makers to a similar extent. Instead, trust is higher when the moral dilemma is completely avoided by chance and the highest payoff was obtained without having to lie. This implies that moral dilemmas are better prevented than cured. The second essay assesses the extent to which parents engage in norm compliance and norm enforcement in front of their children, with the aim of teaching them the importance of social norms. The field experiment shows that parents are more likely to punish a norm violation by a stranger in front of their children and that they are more likely to help a stranger in need in this case. As such, this essay documents that parents teach their children not simply by modeling desired behavior to them, but also by punishing undesirable behavior. Finally, the third essay examines how anti-social incentives, imposed by an employer, which encourage workers to harm an outside party to the benefit of the employer may backfire and hurt subsequent productivity of workers. In a laboratory experiment, subjects in the role of workers punish employers who imposed antisocial incentives by performing worse in a task that benefits the employer. This effect disappears when the employer had no control over the incentives imposed on the worker. Together, these results show that workers negatively reciprocate the psychological costs imposed on them intentionally and exemplify the importance of organizational leadership
Cultivating capacities in community-based researchers in low-resource settings: Lessons from a participatory study on violence and mental health in Sri Lanka
Participatory methods, which rely heavily on community-based data collectors, are growing in popularity to deliver much-needed evidence on violence and mental health in low- and middle-income countries. These settings, along with local researchers, encounter the highest burden of violence and mental ill-health, with the fewest resources to respond. Despite increased focus on wellbeing for research participants and, to a lesser degree, professional researchers in such studies, the role-specific needs of community-based researchers receive scant attention. This co-produced paper draws insights from one group’s experience to identify rewards, challenges, and recommendations for supporting wellbeing and development of community-based researchers in sensitive participatory projects in low-resource settings. Twenty-one community-based researchers supporting a mixed-methods study on youth, violence and mental health in Sri Lanka submitted 63 reflexive structured journal entries across three rounds of data collection. We applied Attride-Stirling’s method for thematic analysis to explore peer researchers’ learning about research, violence and mental health; personal-professional boundaries; challenges in sensitive research; and experiences of support from the core team. Sri Lanka’s first study capturing experiences of diverse community-based researchers aims to inform the growing number of global health and development actors relying on such talent to deliver sensitive and emotionally difficult work in resource-limited and potentially volatile settings. Viewing participatory research as an opportunity for mutual learning among both community-based and professional researchers, we identify practice gaps and opportunities to foster respectful team dynamics and create generative and safe co-production projects for all parties. 
Intentional choices around communication, training, human and consumable resources, project design, and navigating unstable research conditions can strengthen numerous personal and professional capacities across teams. Such individual and collective growth holds potential to benefit short- and long-term quality of evidence and inform action on critical issues, including violence and mental health, facing high-burden, low-resource contexts.
Advances in Information Security and Privacy
With the recent pandemic emergency, many people spend their days working remotely ("smart working") and have increased their use of digital resources for both work and entertainment. As a result, the amount of digital information handled online has increased dramatically, and we can observe a significant increase in the number of attacks, breaches, and hacks. This Special Issue aims to establish the state of the art in protecting information by mitigating information risks. This objective is reached by presenting both surveys on specific topics and original approaches and solutions to specific problems. In total, 16 papers have been published in this Special Issue.