
    Vision of a Visipedia

    The web is not perfect: while text is easily searched and organized, pictures (the vast majority of the bits found online) are not. To see how one could improve the web and make pictures its first-class citizens, I explore the idea of Visipedia, a visual interface for Wikipedia that is able to answer visual queries and enables experts to contribute and organize visual knowledge. Five distinct groups of humans would interact through Visipedia: users, experts, editors, visual workers, and machine vision scientists. The latter would gradually build automata able to interpret images. I explore some of the technical challenges involved in making Visipedia happen and argue that Visipedia will likely grow organically, combining state-of-the-art machine vision with human labor.
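
    To make the envisioned division of labour between automata and human visual workers concrete, here is a minimal Python sketch of a hybrid answering loop such a system might use; all names (`VisualQuery`, `route_query`, the confidence threshold) are hypothetical illustrations, not part of the paper.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VisualQuery:
    image_bytes: bytes   # the picture the user is asking about
    question: str        # e.g. "what species is this bird?"

@dataclass
class Answer:
    label: str
    confidence: float    # 0.0 .. 1.0
    source: str          # "automaton" or "human"

def route_query(query: VisualQuery,
                automaton: Callable[[VisualQuery], Answer],
                human_pool: Callable[[VisualQuery], Answer],
                threshold: float = 0.9) -> Answer:
    """Try the machine-vision automaton first; escalate to human
    visual workers when the model is not confident enough."""
    answer = automaton(query)
    if answer.confidence >= threshold:
        return answer
    # Low-confidence queries go to humans; the resulting (query, label)
    # pairs could later help vision scientists retrain the automaton.
    return human_pool(query)
```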

    Design requirements for generating deceptive content to protect document repositories

    For nearly 30 years, fake digital documents have been used to identify external intruders and malicious insider threats. Unfortunately, while fake files hold potential to assist in data theft detection, there is little evidence of their application outside of niche organisations and academic institutions. The barrier to wider adoption appears to be the difficulty of constructing deceptive content. The current generation of solutions principally: (1) use unrealistic random data; (2) output heavily formatted or specialised content that is difficult to apply to other environments; (3) require users to manually build the content, which is not scalable; or (4) employ an existing production file, which creates a protection paradox. This paper introduces a set of requirements for generating automated fake file content: (1) enticing, (2) realistic, (3) minimal disruption, (4) adaptive, (5) scalable protective coverage, (6) minimal sensitive artefacts and copyright infringement, and (7) no distinguishable characteristics. These requirements are drawn from the literature on natural science, magical performances, human deceit, military operations, intrusion detection, and previous fake file solutions. They guide the design of an automated fake file content construction system, providing an opportunity for the next generation of solutions to find greater commercial application and widespread adoption.
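
    As an illustration of what an automated construction system along these lines might look like, the following Python sketch generates decoy file records aimed at a few of the stated requirements (enticing, realistic, scalable, free of production data); the topics, templates, and parameters are invented for the example, not taken from the paper.

```python
import random
from datetime import datetime, timedelta
from typing import Optional

# Invented building blocks; a real system would derive topics and phrasing
# from the protected environment rather than hard-coding them.
TOPICS = ["Q3 budget forecast", "vendor contract renewal", "staff roster"]
TEMPLATES = [
    "Draft notes on {topic}. Action items to be confirmed at the next review.",
    "Summary of {topic}. Figures below are provisional and subject to change.",
]

def generate_decoy(now: Optional[datetime] = None) -> dict:
    """Produce one decoy file record: enticing (business-sounding topic),
    realistic (plausible, back-dated timestamp), scalable (fully automated),
    and containing no production data."""
    now = now or datetime.now()
    topic = random.choice(TOPICS)
    return {
        "name": topic.replace(" ", "_") + ".txt",
        "body": random.choice(TEMPLATES).format(topic=topic),
        # Back-date the file so it does not stand out as freshly planted.
        "mtime": now - timedelta(days=random.randint(30, 365)),
    }

for record in (generate_decoy() for _ in range(3)):
    print(record["name"], record["mtime"].date())
```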

    A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks

    Software testing activities scrutinize the artifacts and the behavior of a software product to find possible defects and ensure that the product meets its expected requirements. Recently, Deep Reinforcement Learning (DRL) has been successfully employed in complex testing tasks such as game testing, regression testing, and test case prioritization to automate the process and provide continuous adaptation. Practitioners can employ DRL by implementing a DRL algorithm from scratch or by using a DRL framework. DRL frameworks offer well-maintained implementations of state-of-the-art DRL algorithms that facilitate and speed up the development of DRL applications. Developers have widely used these frameworks to solve problems in various domains, including software testing. However, to the best of our knowledge, no study empirically evaluates the effectiveness and performance of the algorithms implemented in DRL frameworks. Moreover, the literature lacks guidelines that would help practitioners choose one DRL framework over another. In this paper, we empirically investigate the application of carefully selected DRL algorithms to two important software testing tasks: test case prioritization in the context of Continuous Integration (CI) and game testing. For the game testing task, we conduct experiments on a simple game and use DRL algorithms to explore the game and detect bugs. Results show that some of the selected DRL frameworks, such as Tensorforce, outperform recent approaches from the literature. To prioritize test cases, we run experiments in a CI environment where DRL algorithms from different frameworks are used to rank the test cases. Our results show that the performance difference between implemented algorithms can be considerable in some cases, motivating further investigation.
    Comment: Accepted for publication at EMSE (Empirical Software Engineering journal) 202
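
    For a concrete sense of the test-case prioritization setup, here is a minimal sketch using Gymnasium and Stable-Baselines3; the paper itself compares frameworks such as Tensorforce, so this particular pairing, the toy reward, and the per-test features are illustrative assumptions only.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class PrioritizationEnv(gym.Env):
    """Toy CI environment: each episode the agent orders N test cases;
    the reward favours scheduling failure-prone tests early."""
    def __init__(self, n_tests: int = 10):
        super().__init__()
        self.n_tests = n_tests
        # Observation: per-test features (historical failure rate,
        # already-scheduled flag), flattened into one vector.
        self.observation_space = spaces.Box(0.0, 1.0,
                                            shape=(n_tests * 2,),
                                            dtype=np.float32)
        self.action_space = spaces.Discrete(n_tests)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.fail_rate = self.np_random.random(self.n_tests).astype(np.float32)
        self.scheduled = np.zeros(self.n_tests, dtype=np.float32)
        self.step_idx = 0
        return self._obs(), {}

    def _obs(self):
        return np.concatenate([self.fail_rate, self.scheduled])

    def step(self, action):
        # Reward a not-yet-scheduled, failure-prone test chosen early.
        reward = 0.0 if self.scheduled[action] else \
            float(self.fail_rate[action]) * (self.n_tests - self.step_idx)
        self.scheduled[action] = 1.0
        self.step_idx += 1
        terminated = self.step_idx >= self.n_tests
        return self._obs(), reward, terminated, False, {}

model = PPO("MlpPolicy", PrioritizationEnv(), verbose=0)
model.learn(total_timesteps=10_000)
```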

    The AI Family: The Information Security Manager's Best Frenemy?

    In this exploratory study, we deliberately pull apart the Artificial from the Intelligence, the material from the human. We first assess the existing technological controls available to Information Security Managers (ISMs) to support their defense-in-depth strategies. Based on the AI Watch taxonomy, we then discuss each of the 15 technologies and their potential impact on the transformation of jobs in the field of security (i.e., AI trainers, AI explainers, and AI sustainers). Additionally, in a pilot study, we collect the evaluations and narratives of the employees (n=6) of a small financial institution in a focus group session. We focus in particular on their perception of the role of AI systems in the future of cyber security.

    A Machine Learning-oriented Survey on Tiny Machine Learning

    The emergence of Tiny Machine Learning (TinyML) has positively revolutionized the field of Artificial Intelligence by promoting the joint design of resource-constrained IoT hardware devices and their learning-based software architectures. TinyML plays an essential role within the fourth and fifth industrial revolutions, helping societies, economies, and individuals adopt effective AI-infused computing technologies (e.g., smart cities, automotive, and medical robotics). Given its multidisciplinary nature, the field of TinyML has been approached from many different angles: this comprehensive survey aims to provide an up-to-date overview focused on all the learning algorithms within TinyML-based solutions. The survey follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodological flow, allowing for a systematic and complete literature review. In particular, we first examine the three different workflows for implementing a TinyML-based system, i.e., ML-oriented, HW-oriented, and co-design. Second, we propose a taxonomy that covers the learning panorama under the TinyML lens, examining in detail the different families of model optimization and design, as well as the state-of-the-art learning techniques. Third, we present the distinct features of the hardware devices and software tools that represent the current state of the art for TinyML intelligent edge applications. Finally, we discuss the challenges and future directions.
    Comment: Article currently under review at IEEE Access
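
    As one concrete instance of the model-optimization family such a taxonomy covers, the sketch below applies post-training quantization with TensorFlow Lite, a common TinyML deployment step; the toy model is invented for the example, and the survey itself is not tied to this toolchain.

```python
import tensorflow as tf

# A tiny Keras model standing in for whatever the chosen TinyML
# workflow (ML-oriented, HW-oriented, or co-design) produced.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),
])

# Post-training quantization: shrink weights/activations so the model
# fits the flash and RAM budgets of a microcontroller-class device.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
print(f"Quantized model size: {len(tflite_model)} bytes")
```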

    Automating Society: Taking Stock of Automated Decision-Making in the EU

    This is the first comprehensive study of the state of automated decision-making in Europe. Experts have looked at the situation at the EU level and in 12 Member States: Belgium, Denmark, Finland, France, Germany, Italy, the Netherlands, Poland, Slovenia, Spain, Sweden and the UK. They assess not only the political discussions and initiatives in these countries, but also present a section, "ADM in Action", for all states, listing examples of automated decision-making already in use.

    Automating Society: Taking Stock of Automated Decision-Making in the EU. Bertelsmann Stiftung Studies 2019

    Imagine you're looking for a job. The company you are applying to says you can have a much easier application process if you provide them with the username and password for your personal email account. They can then simply scan all your emails and develop a personality profile based on the results. No need to waste time filling out a boring questionnaire; and because it's much harder to manipulate all your past emails than to give the 'correct' answers to a questionnaire, the results of the email scan will be far more accurate and truthful than any conventional personality profiling. Wouldn't that be great? Everyone wins: the company looking for new personnel, because they can recruit people on the basis of more accurate profiles; you, because you save time and effort and don't end up in a job you don't like; and the company offering the profiling service, because they have a cool new business model.

    Towards a set of metrics to guide the generation of fake computer file systems

    Fake file systems are used in the field of cyber deception to bait intruders and fool forensic investigators. File system researchers also frequently generate their own synthetic document repositories, due to the data privacy and copyright concerns associated with experimenting on real-world corpora. For both fields, realism is critical. Unfortunately, after creating a set of files and folders, there are no current testing standards that can be applied to validate their authenticity or, conversely, to reliably automate their detection. This paper reviews the previous 30 years of file system surveys of real-world corpora to identify a set of discrete measures for generating synthetic file systems. Statistical distributions, such as the size, age, and lifetime of files, common file types, compression and duplication ratios, and directory distribution and depth (and its relationship with the numbers of files and sub-directories), were identified, and their respective merits discussed. Additionally, this paper highlights notable absences from these surveys that could be beneficial, such as analysing text content distributions en masse, studying file-naming habits, and comparing file access times against traditional working hours.
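
    A Python sketch of what distribution-driven generation could look like follows; the lognormal size parameters, branching factors, and age ranges are placeholders, since in a real generator these values would come from the surveyed corpora rather than being hard-coded.

```python
import os
import random
import time
from typing import Optional

def build_tree(root: str, depth: int = 0, max_depth: int = 3,
               rng: Optional[random.Random] = None) -> None:
    """Generate a synthetic directory tree driven by statistical
    distributions; all parameters here are illustrative placeholders."""
    rng = rng or random.Random(42)
    os.makedirs(root, exist_ok=True)
    for i in range(rng.randint(1, 8)):
        # File sizes in real corpora are heavy-tailed; lognormal is a common fit.
        size = min(int(rng.lognormvariate(9.0, 2.0)), 1 << 20)  # cap at 1 MiB
        path = os.path.join(root, f"doc_{i:03d}.txt")
        with open(path, "wb") as f:
            # Random bytes are a placeholder: they are incompressible, so a
            # realistic generator would emit plausible text instead, given
            # that compression ratio is one of the identified measures.
            f.write(os.urandom(size))
        # Back-date timestamps to mimic surveyed file-age distributions.
        then = time.time() - rng.randint(0, 3 * 365) * 86_400
        os.utime(path, (then, then))
    # Directory fan-out typically shrinks with depth in surveyed systems.
    if depth < max_depth:
        for j in range(rng.randint(0, 3)):
            build_tree(os.path.join(root, f"dir_{j}"), depth + 1, max_depth, rng)

build_tree("synthetic_corpus")
```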