
    The Role of Evidence in Establishing Trust in Repositories

    This article arises from work by the Digital Curation Centre (DCC) Working Group examining mechanisms to roll out audit and certification services for digital repositories in the United Kingdom. Our attempt to develop a program for applying audit and certification processes and tools took as its starting point the RLG-NARA Audit Checklist for Certifying Digital Repositories. Our intention was to critically appraise the checklist and to conceive a means of applying its mechanics within a diverse range of repository environments. We were struck by the realization that while a great deal of effort has been invested in determining the characteristics of a 'trusted digital repository', far less effort has concentrated on the ways in which the presence of these attributes can be demonstrated and their qualities measured. With this in mind we sought to explore the role of evidence within the certification process, and to identify examples of the types of evidence (e.g., documentary, observational, and testimonial) that might be desirable during the course of a repository audit.

    Bringing self assessment home: repository profiling and key lines of enquiry within DRAMBORA

    Digital repositories are a manifestation of complex organizational, financial, legal, technological, procedural, and political interrelationships. Accompanying each of these are innate uncertainties, exacerbated by the relative immaturity of understanding prevalent within the digital preservation domain. Recent efforts have sought to identify core characteristics that must be demonstrable by successful digital repositories, expressed in the form of checklist documents intended to support the processes of repository accreditation and certification. In isolation, though, the available guidelines lack practical applicability; confusion over evidential requirements and difficulties associated with the diversity that exists among repositories (in terms of mandate, available resources, supported content and legal context) are particularly problematic. A gap exists between the available criteria and the ways and extent to which conformity can be demonstrated. The Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) is a methodology for undertaking repository self-assessment, developed jointly by the Digital Curation Centre (DCC) and DigitalPreservationEurope (DPE). DRAMBORA requires repositories to expose their organization, policies and infrastructures to rigorous scrutiny through a series of highly structured exercises, enabling them to build a comprehensive registry of their most pertinent risks, arranged into a structure that facilitates effective management. It draws on experiences accumulated throughout 18 evaluative pilot assessments undertaken in an internationally diverse selection of repositories, digital libraries and data centres (including institutions and services such as the UK National Digital Archive of Datasets, the National Archives of Scotland, Gallica at the National Library of France and the CERN Document Server). Other organizations, such as the British Library, have been using sections of DRAMBORA within their own risk assessment procedures. Despite the attractive benefits of a bottom-up approach, there are implicit challenges posed by neglecting a more objective perspective. Following a sustained period of pilot audits undertaken by DPE, DCC and the DELOS Digital Preservation Cluster aimed at evaluating DRAMBORA, it was noted that had the respective project members not been present to facilitate each assessment and contribute their objective, external perspectives, the results might have been less useful. Consequently, DRAMBORA has developed in a number of ways, to enable knowledge transfer from the responses of comparable repositories and to incorporate more opportunities for structured question sets, or key lines of enquiry, that provoke more comprehensive awareness of the applicability of particular threats and opportunities.
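
    To make the registry idea concrete, below is a minimal sketch of a risk registry in Python. The `Risk` fields, the 1-6 scales, and the probability-times-impact severity convention are illustrative assumptions, not DRAMBORA's actual schema or scoring.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    identifier: str    # e.g. "R01" (hypothetical labelling)
    description: str   # what could go wrong
    owner: str         # organisational role responsible for managing the risk
    probability: int   # illustrative 1-6 scale
    impact: int        # illustrative 1-6 scale

    @property
    def severity(self) -> int:
        # A common risk-assessment convention: severity = probability x impact.
        return self.probability * self.impact

def build_registry(risks: list[Risk]) -> list[Risk]:
    """Arrange risks so the most severe appear first, easing management."""
    return sorted(risks, key=lambda r: r.severity, reverse=True)

if __name__ == "__main__":
    registry = build_registry([
        Risk("R01", "Loss of funding for storage infrastructure", "Director", 3, 6),
        Risk("R02", "Metadata schema becomes obsolete", "Curator", 4, 3),
    ])
    for risk in registry:
        print(risk.identifier, risk.severity)
```

    Ordering the registry by a severity measure is one simple way to give the "structure that facilitates effective management" described above; an actual self-assessment would also record mitigation measures and review dates.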

    The Need of an Optimal QoS Repository and Assessment Framework in Forming a Trusted Relationship in Cloud: A Systematic Review

    © 2017 IEEE. Due to the cost-effectiveness and scalability of the cloud, demand for its services is increasing every day. Quality of Service (QoS) is one of the crucial factors in forming a viable Service Level Agreement (SLA) between a consumer and a provider, enabling them to establish and maintain a trusted relationship with each other. The SLA identifies and depicts the service requirements of the user and the level of service promised by the provider. The availability of an enormous number of service solutions makes it difficult for cloud users to select the right service provider, both in terms of price and the degree of promised service. On the other end, a service provider needs a centralized and reliable QoS repository and assessment framework that helps it offer an optimal amount of marginal resources to the requesting consumer. Although a number of existing studies assist the interacting parties in achieving their desired goals in some way, many gaps still need to be filled before a trusted relationship between them can be established and maintained. In this paper we identify the gaps that must be addressed for a trusted relationship between a service provider and a service consumer. The aim of this research is to present an overview of the existing literature and to compare it based on criteria such as QoS integration, QoS repository, QoS filtering, trusted relationship, and SLA.
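
    As an illustration of what a lookup against such a centralized QoS repository might involve, the sketch below filters providers whose advertised QoS meets a consumer's SLA targets. The `QoSRecord` fields, provider names, and thresholds are hypothetical and are not drawn from any of the surveyed frameworks.

```python
from dataclasses import dataclass

@dataclass
class QoSRecord:
    provider: str
    price_per_hour: float     # advertised price (currency units per hour)
    availability: float       # promised availability, e.g. 0.999
    response_time_ms: float   # promised mean response time

def filter_providers(repository, max_price, min_availability):
    """Return providers whose advertised QoS satisfies the consumer's SLA targets."""
    return [r for r in repository
            if r.price_per_hour <= max_price and r.availability >= min_availability]

# Hypothetical repository entries for illustration only.
repository = [
    QoSRecord("provider-a", 0.12, 0.999, 120.0),
    QoSRecord("provider-b", 0.09, 0.990, 180.0),
]
print(filter_providers(repository, max_price=0.10, min_availability=0.99))
```

    A real repository would also track measured (not just promised) QoS so that filtering can reflect a provider's past SLA compliance, which is part of what makes the trusted relationship possible.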

    Experience: Quality benchmarking of datasets used in software effort estimation

    Data is a cornerstone of empirical software engineering (ESE) research and practice. Data underpins numerous process and project management activities, including the estimation of development effort and the prediction of the likely location and severity of defects in code. Serious questions have been raised, however, over the quality of the data used in ESE. Data quality problems caused by noise, outliers, and incompleteness have been noted as being especially prevalent. Other quality issues, although also potentially important, have received less attention. In this study, we assess the quality of 13 datasets that have been used extensively in research on software effort estimation. The quality issues considered in this article draw on a taxonomy that we published previously, based on a systematic mapping of data quality issues in ESE. Our contributions are as follows: (1) an evaluation of the “fitness for purpose” of these commonly used datasets and (2) an assessment of the utility of the taxonomy in terms of dataset benchmarking. We also propose a template that could be used both to improve the ESE data collection/submission process and to evaluate other such datasets, contributing to enhanced awareness of data quality issues in the ESE community and, in time, to the availability and use of higher-quality datasets.
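
    The sketch below shows, under simplified assumptions, two checks of the kind such benchmarking involves (incompleteness and outliers) applied to a toy effort dataset. The column names, the interquartile-range rule, and the example data are illustrative and do not reproduce the article's taxonomy or the 13 benchmarked datasets.

```python
import pandas as pd

def quality_report(df: pd.DataFrame, effort_col: str = "effort") -> dict:
    """Illustrative checks for two common data quality issues: incompleteness and outliers."""
    missing_ratio = df.isna().mean().mean()          # share of missing cells
    effort = df[effort_col].dropna()
    q1, q3 = effort.quantile([0.25, 0.75])
    iqr = q3 - q1
    # Tukey's 1.5 * IQR rule as a simple, generic outlier flag.
    outliers = ((effort < q1 - 1.5 * iqr) | (effort > q3 + 1.5 * iqr)).sum()
    return {"missing_cell_ratio": float(missing_ratio),
            "effort_outliers": int(outliers)}

# Hypothetical effort data (person-hours); real studies would use the benchmarked datasets.
data = pd.DataFrame({"size_kloc": [12, 8, None, 40],
                     "effort": [520, 300, 410, 5000]})
print(quality_report(data))
```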

    Assessing the Sustainability and Trustworthiness of Federated Learning Models

    Artificial intelligence (AI) plays a pivotal role in various sectors, influencing critical decision-making processes in our daily lives. Within the AI landscape, novel AI paradigms, such as Federated Learning (FL), focus on preserving data privacy while collaboratively training AI models. In this context, a group of experts from the European Commission (AI-HLEG) has identified sustainable AI as one of the key elements that must be considered to provide trustworthy AI. While existing literature offers several taxonomies and solutions for assessing the trustworthiness of FL models, a significant gap exists in considering sustainability and the carbon footprint associated with FL. Thus, this work introduces the sustainability pillar into the most recent and comprehensive trustworthy FL taxonomy, making this work the first to address all AI-HLEG requirements. The sustainability pillar assesses the environmental impact of the FL system, incorporating notions and metrics for hardware efficiency, federation complexity, and energy grid carbon intensity. This work then designs and implements an algorithm for evaluating the trustworthiness of FL models by incorporating the sustainability pillar. Extensive evaluations with the FederatedScope framework, across scenarios that vary federation participants, complexity, hardware, and energy grids, demonstrate the usefulness of the proposed solution.
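
    A minimal sketch of how a sustainability score built from those three ingredients could be aggregated is given below. The normalisation constants, the equal weighting, and the function name are assumptions for illustration, not the metrics or the algorithm defined in the paper.

```python
def sustainability_score(hardware_efficiency: float,
                         federation_complexity: float,
                         carbon_intensity_gco2_kwh: float) -> float:
    """Illustrative aggregation: all terms mapped to [0, 1], higher is better."""
    # Hypothetical normalisations; the paper defines its own notions and metrics.
    complexity_penalty = min(federation_complexity / 100.0, 1.0)   # e.g. number of participants
    carbon_penalty = min(carbon_intensity_gco2_kwh / 500.0, 1.0)   # 500 gCO2/kWh as a rough ceiling
    terms = [hardware_efficiency, 1.0 - complexity_penalty, 1.0 - carbon_penalty]
    return sum(terms) / len(terms)

# Example: efficient hardware, 20 participants, a relatively clean energy grid.
print(round(sustainability_score(0.8, 20, 120), 3))
```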

    A Framework for Hybrid Intrusion Detection Systems

    Web application security breaches are a definite threat to the world’s information technology infrastructure. The Open Web Application Security Project (OWASP) generally defines a web application security violation as unauthorized or unintentional exposure, disclosure, or loss of personal information. These breaches occur without the company’s knowledge, and it often takes a while before the attack is revealed to the public, typically only once the security violations have been fixed. Driven by the need to protect their reputation, organizations have begun researching solutions to these problems. The most widely accepted solution is the use of an Intrusion Detection System (IDS). Such systems currently rely on either signatures of the attack used for the data breach or changes in the behavior patterns of the system to identify an intruder. These systems, whether signature-based or anomaly-based, are readily understood by attackers. Issues arise when attacks are not noticed by an existing IDS because the attack does not fit the pre-defined attack signatures the IDS is implemented to discover. Despite the capabilities of current IDSs, little research has identified a method to detect all potential attacks on a system. This thesis intends to address this problem. A particular emphasis is placed on detecting advanced attacks, such as those that take place at the application layer. These types of attacks are able to bypass existing IDSs, increasing the potential for a web application security breach to occur and go undetected. In particular, the attacks under study are all web application layer attacks: SQL injection, cross-site scripting, directory traversal and remote file inclusion. This work identifies common and existing data breach detection methods as well as the necessary improvements for IDS models. Ultimately, the proposed approach combines an anomaly detection technique, measured by cross entropy, with a signature-based attack detection framework utilizing a genetic algorithm. The proposed hybrid model for data breach detection benefits organizations by increasing security measures and allowing attacks to be identified in less time and more efficiently.
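
    As a rough illustration of the hybrid idea, the sketch below combines a handful of signature patterns with a cross-entropy score over a request's character distribution. The patterns, baseline request, and threshold are hypothetical, and the genetic-algorithm component of the proposed framework is not modelled here.

```python
import math
import re
from collections import Counter

# Illustrative signatures for the four attack classes studied (SQLi, XSS, traversal, RFI).
SIGNATURES = [r"(?i)union\s+select", r"(?i)<script", r"\.\./", r"(?i)php://"]

def char_distribution(text: str) -> dict:
    counts = Counter(text)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def cross_entropy(p: dict, q: dict, eps: float = 1e-9) -> float:
    """H(p, q) = -sum_x p(x) log q(x); higher means the request deviates from the baseline."""
    return -sum(px * math.log(q.get(x, eps)) for x, px in p.items())

def is_suspicious(request: str, baseline: dict, threshold: float = 6.0) -> bool:
    if any(re.search(sig, request) for sig in SIGNATURES):
        return True                                   # signature-based detection
    return cross_entropy(char_distribution(request), baseline) > threshold   # anomaly-based

baseline = char_distribution("GET /index.html?id=42 HTTP/1.1")   # stand-in for benign traffic
print(is_suspicious("GET /item?id=1 UNION SELECT password FROM users", baseline))
```

    In practice the baseline distribution would be estimated from a large corpus of benign requests and the threshold tuned on validation traffic rather than fixed by hand.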

    Mining and visualizing uncertain data objects and named data networking traffics by fuzzy self-organizing map

    Uncertainty is widely spread in real-world data. In computer science, uncertain data is typically found in the area of sensor networks, where sensors sense the environment with a certain degree of error. Mining and visualizing uncertain data is one of the new challenges facing uncertain databases. This paper presents a new intelligent hybrid algorithm that applies fuzzy set theory in the context of the Self-Organizing Map to mine and visualize uncertain objects. The algorithm is tested on some benchmark problems and on the uncertain traffic in Named Data Networking (NDN). Experimental results indicate that the proposed algorithm is precise and effective in terms of the applied performance criteria.
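
    The sketch below illustrates the general idea of letting fuzzy memberships replace a single winning neuron in the prototype update. It uses FCM-style memberships, omits the SOM grid neighbourhood and the paper's treatment of object uncertainty, and is not the authors' algorithm.

```python
import numpy as np

def fuzzy_memberships(x, prototypes, m=2.0, eps=1e-9):
    """FCM-style membership of sample x to each prototype (rows of `prototypes`)."""
    d = np.linalg.norm(prototypes - x, axis=1) + eps
    ratio = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)            # memberships sum to 1 across prototypes

def train_fuzzy_som(data, n_prototypes=4, lr=0.1, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    prototypes = rng.normal(size=(n_prototypes, data.shape[1]))
    for _ in range(epochs):
        for x in data:
            u = fuzzy_memberships(x, prototypes)
            # Every prototype moves toward the sample, in proportion to its membership.
            prototypes += lr * u[:, None] * (x - prototypes)
    return prototypes

# Two synthetic clusters as stand-in data.
data = np.vstack([np.random.default_rng(1).normal(0, 0.1, (20, 2)),
                  np.random.default_rng(2).normal(1, 0.1, (20, 2))])
print(train_fuzzy_som(data))
```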

    Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey

    Modern language models (LMs) have been successfully employed in source code generation and understanding, leading to a significant increase in research focused on learning-based code intelligence, such as automated bug repair and test case generation. Despite their great potential, language models for code intelligence (LM4Code) are susceptible to pitfalls that hinder realistic performance and further impact their reliability and applicability in real-world deployment. Such challenges drive the need for a comprehensive understanding: not just identifying these issues but also delving into their possible implications and existing solutions to build more reliable language models tailored to code intelligence. Based on a well-defined systematic research approach, we conducted an extensive literature review to uncover the pitfalls inherent in LM4Code, identifying 67 primary studies from top-tier venues. After carefully examining these studies, we designed a taxonomy of pitfalls in LM4Code research and conducted a systematic study to summarize the issues, implications, current solutions, and challenges of different pitfalls for LM4Code systems. We developed a comprehensive classification scheme that dissects pitfalls across four crucial aspects: data collection and labeling, system design and learning, performance evaluation, and deployment and maintenance. Through this study, we aim to provide a roadmap for researchers and practitioners, facilitating their understanding and use of LM4Code in reliable and trustworthy ways.
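
    As a small illustration of how such a classification scheme could be represented programmatically, the sketch below encodes the four aspects named above and tags example pitfalls with them. The example pitfalls and implications are hypothetical and are not taken from the survey.

```python
from dataclasses import dataclass
from enum import Enum

class PitfallAspect(Enum):
    """The four aspects the taxonomy dissects (names per the abstract)."""
    DATA_COLLECTION_AND_LABELING = "data collection and labeling"
    SYSTEM_DESIGN_AND_LEARNING = "system design and learning"
    PERFORMANCE_EVALUATION = "performance evaluation"
    DEPLOYMENT_AND_MAINTENANCE = "deployment and maintenance"

@dataclass
class Pitfall:
    name: str
    aspect: PitfallAspect
    implication: str

# Hypothetical catalogue entries for illustration only.
catalogue = [
    Pitfall("data leakage between train and test sets",
            PitfallAspect.DATA_COLLECTION_AND_LABELING,
            "inflated benchmark scores"),
    Pitfall("evaluation on a single benchmark",
            PitfallAspect.PERFORMANCE_EVALUATION,
            "poor generalisation estimates"),
]
by_aspect = {a: [p.name for p in catalogue if p.aspect is a] for a in PitfallAspect}
print(by_aspect)
```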

    Designing and Implementing an Advanced Algorithm to Measure the Trustworthiness Level of Federated Learning Models

    Artificial intelligence (AI) has permeated our daily lives and assists in the decision processes of critical sectors such as medicine and law. Therefore, it is now more important than ever that the AI systems we develop are reliable, ethical, and do not cause harm to humans. The High-Level Expert Group on AI (AI-HLEG) of the European Commission has laid the foundation by defining seven key requirements for trustworthy AI systems. To address concerns about the privacy risks associated with centralized learning approaches, federated learning (FL) has emerged as a promising and widely used alternative. FL allows multiple clients to collaboratively train machine learning models without the need to share private data. Because of the wide adoption of FL systems, ensuring that they are trustworthy is crucial. Previous research efforts have proposed a trustworthy FL taxonomy with six pillars, each comprehensively defined with notions and metrics. This taxonomy covers six of the seven requirements defined by the AI-HLEG. However, one notable aspect that has been largely overlooked by research is the requirement for environmental well-being in trustworthy AI/FL. This leaves a significant gap between the expectations set by governing bodies and the guidelines applied and measured by researchers. This master thesis addresses this gap by introducing the sustainability pillar into the trustworthy FL taxonomy, thus presenting the first taxonomy that comprehensively addresses all the requirements defined by the AI-HLEG. The sustainability pillar focuses on assessing the environmental impact of FL systems and incorporates three main aspects: hardware efficiency, federation complexity, and the carbon intensity of the energy grid, each with well-defined metrics. As a second contribution, this master thesis extends an existing prototype to evaluate the trustworthiness of FL systems with the sustainability pillar. The prototype is then extensively evaluated in various scenarios involving different federation configurations. The results shed light on the trustworthiness of different federation configurations in different settings, with varying complexities, hardware, and energy grids. Importantly, the sustainability pillar’s score adjusts the overall trust score, which now spans seven key pillars, by accounting for the environmental impact of FL systems. Thus, the proposed taxonomy and prototype are the first to comprehensively address all seven AI-HLEG requirements and lay the foundation for a more accurate trustworthiness assessment of FL systems.
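
    A minimal sketch of how per-pillar scores might be combined into an overall trust score is shown below. The pillar names, the equal weights, and the example scores are assumptions for illustration and do not reflect the thesis's actual taxonomy, metrics, or aggregation algorithm.

```python
# Hypothetical pillar names and equal weighting; the thesis defines its own pillars and weights.
PILLAR_WEIGHTS = {
    "privacy": 1.0, "robustness": 1.0, "fairness": 1.0, "explainability": 1.0,
    "accountability": 1.0, "federation": 1.0, "sustainability": 1.0,
}

def overall_trust(pillar_scores: dict) -> float:
    """Weighted mean of pillar scores in [0, 1]; a low sustainability score pulls the total down."""
    total_weight = sum(PILLAR_WEIGHTS[p] for p in pillar_scores)
    return sum(PILLAR_WEIGHTS[p] * s for p, s in pillar_scores.items()) / total_weight

# Example: a federation that scores well on most pillars but runs on a carbon-intensive grid.
scores = {"privacy": 0.9, "robustness": 0.8, "fairness": 0.85, "explainability": 0.7,
          "accountability": 0.75, "federation": 0.8, "sustainability": 0.4}
print(round(overall_trust(scores), 3))
```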