111 research outputs found

    Prompting AI Art: An Investigation into the Creative Skill of Prompt Engineering

    Humankind is entering a novel era of creativity - an era in which anybody can synthesize digital content. The paradigm under which this revolution takes place is prompt-based learning (or in-context learning). This paradigm has found fruitful application in text-to-image generation, where it is used to synthesize digital images from zero-shot natural language prompts for the purpose of creating AI art. This activity is referred to as prompt engineering - the practice of iteratively crafting prompts to generate and improve images. In this paper, we investigate prompt engineering as a novel creative skill for creating prompt-based art. In three studies with participants recruited from a crowdsourcing platform, we explore whether untrained participants could 1) recognize the quality of prompts, 2) write prompts, and 3) improve their prompts. Our results indicate that participants could assess the quality of prompts and the respective images, and that this ability increased with their experience and interest in art. Participants were further able to write prompts in rich descriptive language. However, even though participants were specifically instructed to generate artworks, their prompts lacked the specific vocabulary needed to apply a particular style to the generated images. Our results suggest that prompt engineering is a learned skill that requires expertise and practice. Based on our findings and our experience running these studies with a paid crowd, we provide ten recommendations for conducting experimental research on text-to-image generation and prompt engineering. Our studies offer a deeper understanding of prompt engineering, thereby opening up avenues for research on its future. We conclude by speculating on four possible futures of prompt engineering. Comment: 29 pages, 10 figures
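    To make the missing style vocabulary concrete, the sketch below contrasts a plain subject-only prompt with one extended by style modifiers; the modifier terms are common text-to-image conventions assumed here for illustration, not examples drawn from the paper.

```python
# Hypothetical illustration: the same subject with and without style modifiers.
# The modifier terms are community conventions, not taken from the paper.
subject = "a lighthouse on a cliff at dusk"

plain_prompt = subject  # the kind of prompt untrained participants tended to write

styled_prompt = ", ".join([
    subject,
    "oil painting",         # medium
    "impressionist style",  # art movement
    "warm color palette",   # palette
    "dramatic lighting",    # lighting
])

print(styled_prompt)
```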

    How can a foundation be outlined for a successful serious game to increase reading engagement

    This study aims to support high school students' mandatory reading of a novella with a serious game. The study includes 41 students. The first class formed the experimental group, which used a serious game to read the novella; the second class served as the control group and engaged only in analog reading. The evaluation is based on a questionnaire with reading, user, and narrative engagement items, complemented by in-depth interviews with teachers and students. The findings show a positive effect on engagement in the experimental group; in particular, focused attention and reward were higher there. However, there was no difference in narrative engagement between the two groups, indicating that the story (digital or not) is conveyed equally well. The qualitative findings revealed positive comments, especially on reading engagement and the story world. The novelty of this study lies in the outlined game design process, guided by elements in the foundation, game design, prototyping, and implementation. For the game design, we outline how to transform the principles of Sweetser and Wyeth into applied design implementations. An important aspect was the portrayal of the protagonist, who has schizophrenia.

    Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques

    The growing use of voice user interfaces has led to a surge in the collection and storage of speech data. While data collection allows for the development of efficient tools powering most speech services, it also poses serious privacy issues for users, as centralized storage makes private personal speech data vulnerable to cyber threats. With the increasing use of voice-based digital assistants like Amazon's Alexa, Google's Home, and Apple's Siri, and with the increasing ease with which personal speech data can be collected, the risk of malicious use of voice cloning and of speaker, gender, or pathology recognition has increased. This thesis proposes solutions for anonymizing speech and for evaluating the degree of anonymization. In this work, anonymization refers to making personal speech data unlinkable to an identity while maintaining the usefulness (utility) of the speech signal (e.g., access to linguistic content). We start by identifying several challenges that evaluation protocols need to consider in order to properly evaluate the degree of privacy protection. We clarify how anonymization systems must be configured for evaluation purposes and highlight that many practical deployment configurations do not permit privacy evaluation. Furthermore, we study the most common voice conversion-based anonymization system, identify its weak points, and suggest new methods to overcome some of its limitations. We isolate all components of the anonymization system to evaluate the degree of speaker PPI (personally identifiable information) associated with each of them. Then, we propose several transformation methods for each component to reduce speaker PPI as much as possible while maintaining utility. We promote anonymization algorithms based on quantization-based transformations as an alternative to the most widely used and well-known noise-based approach. Finally, we develop a new attack method to invert the anonymization. Comment: PhD Thesis, Pierre Champion | Université de Lorraine - INRIA Nancy | for associated source code, see https://github.com/deep-privacy/SA-toolkit
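    As a rough illustration of the quantization idea (an assumption about its flavor, not the thesis's actual pipeline), frame-level speech features can be snapped to a small learned codebook, discarding the fine-grained variation that can carry speaker-identifying information while keeping coarse structure useful for linguistic content.

```python
# Illustrative sketch of a quantization-based transformation, assuming
# frame-level speech features as a (frames x dims) NumPy array.
import numpy as np
from sklearn.cluster import KMeans

def quantize_features(features: np.ndarray, n_codes: int = 48) -> np.ndarray:
    """Replace each frame vector with its nearest codebook centroid.

    Coarse quantization preserves broad structure (utility) while
    removing fine detail that may encode speaker identity (PPI).
    """
    codebook = KMeans(n_clusters=n_codes, n_init=10).fit(features)
    return codebook.cluster_centers_[codebook.predict(features)]

# Toy example: 1000 frames of 256-dimensional features.
rng = np.random.default_rng(0)
anonymized = quantize_features(rng.normal(size=(1000, 256)))
```

    In practice the codebook would be trained once on a large corpus and then applied per utterance, so the codebook size trades utility against residual speaker information.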

    Security and Privacy on Generative Data in AIGC: A Survey

    The advent of artificial intelligence-generated content (AIGC) represents a pivotal moment in the evolution of information technology. With AIGC, it is effortless to generate high-quality data that the public finds challenging to distinguish from authentic content. Nevertheless, the proliferation of generative data across cyberspace brings security and privacy issues, including privacy leakage for individuals and media forgery for fraudulent purposes. Consequently, both academia and industry have begun to emphasize the trustworthiness of generative data, successively providing a series of countermeasures for security and privacy. In this survey, we systematically review security and privacy for generative data in AIGC, analyzing them for the first time from the perspective of information security properties. Specifically, we survey the successful experiences of state-of-the-art countermeasures in terms of the foundational properties of privacy, controllability, authenticity, and compliance, respectively. Finally, we summarize the open challenges and potential exploration directions arising from each of these properties.

    Repeatable and reusable research - Exploring the needs of users for a Data Portal for Disease Phenotyping

    Background: Big data research in the field of health sciences is hindered by a lack of agreement on how to identify and define different conditions and their medications. This means that researchers and health professionals often have different phenotype definitions for the same condition. This lack of agreement makes it hard to compare different study findings and hinders the ability to conduct repeatable and reusable research. Objective: This thesis examines the requirements of various users, such as researchers, clinicians, machine learning experts, and managers, for both new and existing data portals for phenotypes (concept libraries). Methods: Exploratory sequential mixed methods were used to establish which concept libraries are available, how they are used, what their characteristics are, where the gaps lie, and what needs to be done in the future from the point of view of the people who use them. The thesis consists of three phases: 1) two qualitative studies, including one-to-one interviews with researchers, clinicians, machine learning experts, and senior research managers in health data science, as well as focus group discussions with researchers working with the Secure Anonymised Information Linkage (SAIL) Databank; 2) the creation of an email survey (the Concept Library Usability Scale); and 3) a quantitative study with researchers, health professionals, and clinicians. Results: Most of the participants thought that the prototype concept library would be a very helpful resource for conducting repeatable research, but they specified many requirements that would need to be met before its development. Although all the participants stated that they were aware of some existing concept libraries, most expressed negative perceptions of them. The participants mentioned several facilitators that would encourage them to 1) share their work, such as receiving citations from other researchers, and 2) reuse the work of others, such as saving the considerable time and effort they frequently spend creating new code lists from scratch. They also pointed out several barriers that could inhibit them from 1) sharing their work, such as concerns about intellectual property (e.g., that if they shared their methods before publication, other researchers would present them as their own), and 2) reusing others' work, such as a lack of confidence in the quality and validity of others' code lists. Participants suggested developments that would make research done with routine data more reproducible, such as a drive for more transparency in the documentation of research methods, including the publication of complete phenotype definitions and clear code lists. Conclusions: The findings of this thesis indicate that most participants valued a concept library for phenotypes. However, only half of the participants felt that they would contribute definitions to the concept library, and they reported many barriers to sharing their work on a publicly accessible platform such as the CALIBER research platform. Analysis of the interviews, focus group discussions, and qualitative studies revealed that different users have different requirements, facilitators, barriers, and concerns regarding concept libraries. This work also investigated whether concept libraries should be developed in Kuwait to facilitate improved data sharing; the thesis concludes that this would be unlikely to be cost-effective or highly valued by users, and that investment in open access research publications may be of more value to the Kuwaiti research and academic community.
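    For readers unfamiliar with code lists, the sketch below shows what a shareable phenotype definition might look like; the schema, author name, and validation note are hypothetical rather than the portal's actual format, although E11 is the real ICD-10 family for type 2 diabetes mellitus.

```python
# Hypothetical structure for a shareable phenotype definition; the schema
# is illustrative, not an actual concept library format.
phenotype = {
    "name": "Type 2 diabetes mellitus",
    "version": 1,
    "author": "example-research-group",  # provenance enables citation, a sharing facilitator
    "code_lists": {
        "ICD-10": ["E11"],               # E11.* family: type 2 diabetes mellitus
    },
    "validation": "reviewed by two clinicians",  # quality signal reusers said they need
}

print(phenotype["code_lists"]["ICD-10"])
```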

    Guideline for Trustworthy Artificial Intelligence -- AI Assessment Catalog

    Artificial Intelligence (AI) has made impressive progress in recent years and represents a key technology with a crucial impact on the economy and society. However, it is clear that AI, and the business models based on it, can only reach their full potential if AI applications are developed according to high quality standards and are effectively protected against new AI risks. For instance, AI bears the risk of unfair treatment of individuals when processing personal data, e.g., to support credit lending or staff recruitment decisions. The emergence of these new risks is closely linked to the fact that the behavior of AI applications, particularly those based on Machine Learning (ML), is essentially learned from large volumes of data and is not predetermined by fixed programmed rules. Thus, the trustworthiness of AI applications is a crucial issue and is the subject of numerous major publications by stakeholders in politics, business, and society. In addition, there is mutual agreement that the requirements for trustworthy AI, which are often described in an abstract way, must now be made clear and tangible. One challenge here is that the specific quality criteria for an AI application depend heavily on the application context, and the possible measures to fulfill them in turn depend heavily on the AI technology used. Lastly, practical assessment procedures are needed to evaluate whether specific AI applications have been developed according to adequate quality standards. This AI assessment catalog addresses exactly this point and is intended for two target groups: first, it provides developers with a guideline for systematically making their AI applications trustworthy; second, it guides assessors and auditors in examining AI applications for trustworthiness in a structured way.

    Breaking data silos with Federated Learning

    Federated learning has been recognized as a promising technology with the potential to revolutionize the field of Artificial Intelligence (AI). By leveraging its decentralized nature, it can overcome known barriers to AI, such as data acquisition and privacy, paving the way for unprecedented advances. This dissertation argues for the benefits of this technology as a catalyst for the breakthrough of AI in both the public and private sectors. Federated learning promotes cooperation among otherwise competitive entities by enabling joint efforts toward a common goal. In this dissertation, I investigate the goodness-of-fit of this technology in several contexts, with a focus on its application in power systems, financial institutions, and public administrations. The dissertation comprises five papers that investigate various aspects of federated learning in these contexts. The first two papers explore promising avenues in the energy sector, where federated learning offers a compelling solution for privately exploiting the vast amounts of data owned, in a decentralized way, by consumers. The third paper elaborates on another paradigmatic example, in which federated learning is used to foster cooperation among financial institutions to produce accurate credit risk models. The fourth paper forms a juxtaposition with the previous ones, which center on the private sector: it elaborates on use cases in which federated learning helps public administrations reduce barriers to cooperation. Lastly, the fifth article acts as the finale of the dissertation, compiling the earlier work and elaborating on the constraints and opportunities associated with adopting this technology, as well as a framework for doing so.
    Funding: R-AGR-3787 - EU 2020 - MDOT (01/07/2020 - 31/12/2023) - FRIDGEN Gilbert; R-AGR-3728 - PEARL/IS/13342933/DFS (01/01/2020 - 31/12/2024) - FRIDGEN Gilbert
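    A minimal sketch of the federated averaging idea underlying such cooperation, assuming a simple least-squares model and synthetic data (illustrative only, not code from the dissertation): each silo trains locally, and only parameter vectors, never raw data, leave the premises to be averaged.

```python
# Minimal federated-averaging sketch (illustrative; not the dissertation's code).
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: gradient steps on a least-squares loss."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fed_avg(weights, clients):
    """Average client updates, weighted by local dataset size; raw data stays local."""
    updates = [(local_update(weights, X, y), len(y)) for X, y in clients]
    total = sum(n for _, n in updates)
    return sum(w * (n / total) for w, n in updates)

# Three "silos" (e.g., utilities, banks, or agencies) holding private data.
rng = np.random.default_rng(1)
clients = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(3)]
global_w = np.zeros(4)
for _ in range(10):  # communication rounds
    global_w = fed_avg(global_w, clients)
```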

    Detection and Mitigation of Steganographic Malware

    A new attack trend concerns the use of some form of steganography and information hiding to make malware stealthier and able to elude many standard security mechanisms. This Thesis therefore addresses the detection and mitigation of this class of threats. In particular, it considers malware implementing covert communications within network traffic or cloaking malicious payloads within digital images. The first research contribution of this Thesis is in the detection of network covert channels. Unfortunately, the literature on the topic lacks real traffic traces and attack samples for performing precise tests or security assessments. Thus, a preparatory research activity was devoted to developing two ad-hoc tools: the first creates covert channels targeting the IPv6 protocol by eavesdropping on flows, whereas the second embeds secret data within arbitrary traffic traces that can be replayed to perform investigations under realistic conditions. The Thesis then presents a security assessment of the impact of hidden network communications in production-quality scenarios. Results were obtained by considering channels cloaking data in the most popular protocols (e.g., TLS, IPv4/v6, and ICMPv4/v6) and showed that de-facto standard intrusion detection systems and firewalls (i.e., Snort, Suricata, and Zeek) are unable to spot this class of hazards. Since malware can conceal information (e.g., commands and configuration files) in almost every protocol, traffic feature, or network element, configuring or adapting pre-existing security solutions may not be straightforward. Moreover, inspecting multiple protocols, fields, or conversations at the same time can lead to performance issues. Thus, a major effort was devoted to developing a suite based on the extended Berkeley Packet Filter (eBPF) to gain visibility over different network protocols and components and to efficiently collect various performance indicators and statistics using a single technology. This line of research made it possible to spot network covert channels targeting the header of the IPv6 protocol or the inter-packet time of generic network conversations. In addition, the eBPF-based approach turned out to be very flexible and also revealed hidden data transfers between two processes co-located on the same host. Another important contribution of this part of the Thesis concerns the deployment of the suite in realistic scenarios and its comparison with similar tools. Specifically, a thorough performance evaluation demonstrated that eBPF can be used to inspect traffic and reveal covert communications even under high loads, e.g., sustaining rates of up to 3 Gbit/s on commodity hardware. To further address the problem of revealing network covert channels in realistic environments, this Thesis also investigates malware targeting traffic generated by Internet of Things devices. In this case, an incremental ensemble of autoencoders was used to cope with the "unknown" location of the hidden data generated by a threat covertly exchanging commands with a remote attacker. The second research contribution of this Thesis is in the detection of malicious payloads hidden within digital images. In fact, the majority of real-world malware exploits hiding methods based on Least Significant Bit steganography and some of its variants, such as the Invoke-PSImage mechanism. Therefore, a substantial amount of research was devoted to detecting the presence of hidden data and classifying the payload (e.g., malicious PowerShell scripts or PHP fragments). To this aim, mechanisms leveraging Deep Neural Networks (DNNs) proved to be flexible and effective, since they can learn from raw low-level data and can be updated or retrained to handle unseen payloads or images with different features. To take realistic threat models into account, this Thesis studies malware targeting different types of images (i.e., favicons and icons) and various payloads (e.g., URLs and Ethereum addresses, as well as webshells). The obtained results showed that DNNs can be considered a valid tool for spotting hidden contents, since their detection accuracy remains above 90% even in the face of "elusion" mechanisms such as basic obfuscation techniques or alternative encoding schemes. Lastly, when detection or classification is not possible (e.g., due to resource constraints), approaches enforcing "sanitization" can be applied. Thus, this Thesis also considers autoencoders able to disrupt hidden malicious contents without degrading the quality of the image.
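    As a minimal sketch of the Least Significant Bit hiding that this class of malware exploits (illustrative code, not the Thesis's tooling), a payload can be written into, and read back from, the low-order bits of pixel values, leaving the image visually unchanged.

```python
# Minimal LSB steganography sketch (illustrative; not the Thesis's tooling).
# Each payload bit replaces the least significant bit of one pixel byte.
import numpy as np

def embed(pixels: np.ndarray, payload: bytes) -> np.ndarray:
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    stego = pixels.flatten().copy()
    stego[: bits.size] = (stego[: bits.size] & 0xFE) | bits  # overwrite LSBs
    return stego.reshape(pixels.shape)

def extract(pixels: np.ndarray, n_bytes: int) -> bytes:
    bits = pixels.flatten()[: n_bytes * 8] & 1               # read LSBs back
    return np.packbits(bits).tobytes()

cover = np.random.default_rng(2).integers(0, 256, size=(64, 64), dtype=np.uint8)
payload = b"cmd:connect 203.0.113.7"  # e.g., a hidden command (documentation IP)
stego = embed(cover, payload)
assert extract(stego, len(payload)) == payload
```

    Detectors such as the DNNs discussed above learn the statistical traces of exactly this low-order-bit manipulation, while sanitization (e.g., re-encoding the image through an autoencoder) destroys the fragile LSB plane without visibly altering the picture.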

    Autonomy, Efficiency, Privacy and Traceability in Blockchain-enabled IoT Data Marketplace

    Personal data generated from IoT devices is a new economic asset that individuals can trade to generate revenue on emerging data marketplaces. Blockchain technology can disrupt the data marketplace and make trading more democratic, trustworthy, transparent, and secure. Nevertheless, the adoption of blockchain to create an IoT data marketplace requires consideration of autonomy and efficiency, privacy, and traceability. Conventional centralized approaches are built around a trusted third party that conducts and controls all management operations, such as managing contracts, pricing, billing, and reputation mechanisms, raising the concern that providers lose control over their data. To tackle this issue, an efficient, autonomous, and fully functional marketplace system is needed, with no trusted third party involved in operational tasks. Moreover, inefficient allocation of buyers' demands on battery-operated IoT devices makes it challenging for providers to serve multiple buyers simultaneously in real time without violating their SLAs (service level agreements). Furthermore, a poor privacy decision that makes personal data accessible to unknown or arbitrary buyers may have adverse consequences for providers, including privacy violations. Lastly, a buyer could buy data from one marketplace and, without the knowledge of the provider, resell the bought data to users registered in other marketplaces; this may lead to monetary loss or privacy violations for the provider. To address such issues, a data ownership traceability mechanism is essential, one that can track changes in the ownership of data as it is traded within and across marketplace systems. However, data ownership traceability is hard because of ownership ambiguity, undisclosed reselling, and the dispersal of ownership across multiple marketplaces. This thesis makes the following novel contributions. First, we propose an autonomous and efficient IoT data marketplace, MartChain, offering key marketplace mechanisms that leverage smart contracts to record agreement details, participant ratings, and data prices on the blockchain without involving any mediator. Second, MartChain is underpinned by an Energy-aware Demand Selection and Allocation (EDSA) mechanism for optimally selecting and allocating buyers' demands on the provider's IoT devices while satisfying battery, quality, and allocation constraints. EDSA maximizes the provider's revenue while meeting the buyers' requirements and ensuring the completion of the selected demands without interruption. The proof-of-concept implementation on the Ethereum blockchain shows that our approach is viable and benefits both provider and buyer by creating an autonomous and efficient real-time data trading model. Next, we propose KYBChain, a Know-Your-Buyer mechanism for a privacy-aware decentralized IoT data marketplace that performs a multi-faceted assessment of various characteristics of buyers and evaluates their privacy rating. The privacy rating empowers providers to make privacy-aware, informed decisions about data sharing. A quantitative analysis of the utility of the privacy rating demonstrates that its use by providers results in a decrease in data leakage risk and in generated revenue, in line with the classical risk-utility trade-off. Evaluation results of KYBChain on Ethereum reveal that the overheads in terms of gas consumption, throughput, and latency introduced by the privacy rating mechanism, compared to a marketplace without one, are insignificant relative to its privacy gains. Finally, we propose TrailChain, which generates a trusted trade trail for tracking data ownership across multiple decentralized marketplaces. Our solution includes mechanisms for detecting unauthorized data reselling to prevent privacy violations, and a fair resell payment sharing scheme to distribute payments among data owners for authorized reselling. We performed qualitative and quantitative evaluations using four private Ethereum networks to demonstrate the effectiveness of TrailChain in tracking data ownership. The qualitative security analysis demonstrates that TrailChain is resilient against several malicious activities and security attacks. Simulations show that our method detects undisclosed reselling both within the same marketplace and across different marketplaces. It also identifies whether the provider has authorized the reselling and fairly distributes the revenue among the data owners, at marginal overhead.
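    A greedy sketch of energy-aware demand selection in the spirit of EDSA (an assumption about its flavor, with hypothetical names; the thesis formulates the actual mechanism as a constrained optimization): demands are ranked by revenue per unit of battery cost and accepted while the device's battery budget holds, so accepted demands can run to completion.

```python
# Hypothetical greedy sketch of energy-aware demand selection; EDSA itself
# is an optimization mechanism, so this only conveys the trade-off involved.
from dataclasses import dataclass

@dataclass
class Demand:
    buyer: str
    revenue: float       # payment offered for serving the data stream
    battery_cost: float  # energy the IoT device spends serving it

def select_demands(demands: list[Demand], battery_budget: float) -> list[Demand]:
    accepted, remaining = [], battery_budget
    for d in sorted(demands, key=lambda d: d.revenue / d.battery_cost, reverse=True):
        if d.battery_cost <= remaining:  # only accept demands the battery can finish
            accepted.append(d)
            remaining -= d.battery_cost
    return accepted

picked = select_demands(
    [Demand("buyer-a", 5.0, 2.0), Demand("buyer-b", 4.0, 3.0), Demand("buyer-c", 2.0, 0.5)],
    battery_budget=3.0,
)
print([d.buyer for d in picked])  # ['buyer-c', 'buyer-a'] under this toy budget
```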

    From BookTok to Bookshelf: Algorithms and Book Recommendations on TikTok

    TikTok, a social media platform focused on short-form videos, is gaining a reputation for renewing interest in books (Bateman 2022; Harris 2021). While reviewing and recommending books is not new, the ability to do so at scale used to be limited to a select group of critics. Social media allows readers to voice their opinions, and by gaining followings, these readers can exert influence on a scale similar to that of traditional reviewers. This raises various questions about how culture is created and curated. Today, this curation is done largely by algorithms through recommending and promoting content. The rise of BookTok emphasizes this, combining recommendations with TikTok's algorithm to boost the popularity of certain books. In particular, BookTok has made headlines by repeatedly lifting backlist books back onto the bestseller lists. This reinforces the shift from traditional curators of culture to a community of fellow readers, which can in turn popularize specific genres. Thus, the main question this thesis aims to answer is: what distinguishes BookTok from other digital platforms, enabling it to have a cultural impact that goes beyond the online book community? The BookTok phenomenon is explained using a mixed-method approach that looks at how creators use platform affordances, aesthetic features, and their algorithmic imaginaries to appeal to both users and the TikTok algorithm. The data used in this thesis consists of 148 BookTok videos gathered over a two-week period from the "For You" page. A content analysis was conducted to find patterns in the construction of the videos, the use of specific aesthetic features, and the selection of recommended book titles. Based on this data, it was possible to detect and describe different genres of BookTok videos and to identify the use of relevant platform affordances. This was complemented by a thematic analysis of interviews with three video creators, selected from the authors of the material in the dataset. The interviews gave insight into the creators' algorithmic imaginary and into how their construction of the algorithm informs the creative process. The analysis showed that while the algorithm is what makes the recommendations popular, by distributing them to a receptive audience, the TikTok format is what makes the recommendations memorable and has a positive impact on book sales. As the algorithm informs every aspect of the book recommendations, from the creator's decision to pick a certain book, to the decision of when to make the video, to whom the algorithm subsequently recommends the video, the book recommendations on BookTok can be examined as examples of algorithmic curation. By taking up the topic of literature and literary readership from a digital culture perspective, this thesis aims to contribute to the greater discussions on algorithms, personalization, and their effects on cultural production and curation. Master's Thesis in Digital Culture, DIKULT350, MAHF-DIKUL