
    Transfer and Inventory Components of Developing Repository Services

    4th International Conference on Open Repositories. This presentation was part of the session: Conference Presentations. Date: 2009-05-19, 10:00 AM – 11:30 AM. At the Library of Congress, our most basic data management needs are not surprising: How do we know what we have, where it is, and who it belongs to? How do we get files, new and legacy, from where they are to where they need to be? And how do we record and track events in the life cycle of our files? This presentation describes current work at the Library in implementing tools to meet these needs as a set of modular services -- Transfer, Transport, and Inventory -- that will fit into a larger scheme of repository services to be developed. These modular services do not equate to everything needed to call a system a repository, but they cover many aspects of "ingest" and "archiving": the registry of a deposit activity, the controlled transfer and transport of files, and an inventory system that can be used to track files, record events in those files' life cycles, and provide basic file-level discovery and auditing. This is the first stage in the development of a suite of tools to help the Library ensure long-term stewardship of its digital assets.
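
    As an illustration only (not the Library of Congress implementation), the sketch below shows how a minimal inventory record might register a deposited file, store a fixity digest, and log life-cycle events. The names FileRecord, record_event, and register_transfer are hypothetical.

        import hashlib
        import datetime
        from dataclasses import dataclass, field

        @dataclass
        class FileRecord:
            """One inventoried file: identity, ownership, fixity, and a life-cycle event log."""
            path: str
            owner: str
            sha256: str
            events: list = field(default_factory=list)

            def record_event(self, action: str, note: str = "") -> None:
                # Append a timestamped life-cycle event (e.g. "received", "transported", "audited").
                self.events.append({
                    "time": datetime.datetime.utcnow().isoformat(),
                    "action": action,
                    "note": note,
                })

        def register_transfer(path: str, owner: str, data: bytes) -> FileRecord:
            """Register a deposited file: compute fixity and log the initial event."""
            record = FileRecord(path=path, owner=owner, sha256=hashlib.sha256(data).hexdigest())
            record.record_event("received", f"registered deposit from {owner}")
            return record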

    A personal data framework for exchanging knowledge about users in new financial services

    Personal data is a key asset for many companies, since it is the essence of providing personalized services. Not all companies, and specifically new entrants to the market, have the opportunity to access the data they need to run their business. In this paper, we describe a comprehensive personal data framework that allows service providers to share and exchange personal data and knowledge about users, while enabling users to decide who can access which data and why. We analyze the challenges related to personal data collection, integration, retrieval, and identity and privacy management, and present the framework architecture that addresses them. We also include the validation of the framework in a banking scenario, where social and financial data are collected and properly combined to generate new socio-economic knowledge about users, which is then used by a personal lending service.
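
    Purely as an illustrative sketch (not the framework's actual architecture), the code below models the user-facing decision of "who can access which data and why" as an explicit consent registry that is checked before a provider reads a data category. All names are hypothetical.

        from dataclasses import dataclass

        @dataclass(frozen=True)
        class Consent:
            """A user grant: which provider may read which data category, and for what purpose."""
            provider: str
            category: str   # e.g. "social", "financial"
            purpose: str    # e.g. "personal-lending"

        class ConsentRegistry:
            def __init__(self):
                self._grants: set[Consent] = set()

            def grant(self, provider: str, category: str, purpose: str) -> None:
                self._grants.add(Consent(provider, category, purpose))

            def is_allowed(self, provider: str, category: str, purpose: str) -> bool:
                # Access is permitted only if the user granted this exact combination.
                return Consent(provider, category, purpose) in self._grants

        # Example: a bank may combine social and financial data only for the lending purpose.
        registry = ConsentRegistry()
        registry.grant("bank", "financial", "personal-lending")
        registry.grant("bank", "social", "personal-lending")
        assert registry.is_allowed("bank", "social", "personal-lending")
        assert not registry.is_allowed("ad-network", "social", "marketing")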

    Evaluating Security Aspects for Building a Secure Virtual Machine

    One of the essential characteristics of cloud computing that revolutionized the IT business is the sharing of computing resources. Despite all the benefits, security is a major concern in a cloud virtualization environment. Among these security issues is securely managing the Virtual Machine (VM) images that contain operating systems, configured platforms, and data. The confidentiality, availability, and integrity of such images are major concerns, as they determine the overall security of the virtual machines. This paper identifies and discusses the attributes that define the degree of security in VM images. It addresses this problem by explaining the different methods and frameworks developed in the past for implementing secure VM images. Finally, this paper analyses the security issues and attributes and proposes a framework that includes an approach to help develop secure VM images. This work aims to enhance the security of cloud environments.
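
    As an illustration of one of the attributes discussed (integrity), the sketch below verifies a VM image against a trusted digest before it is used; it is not the framework proposed in the paper, and the function names are hypothetical.

        import hashlib

        def sha256_of_image(path: str, chunk_size: int = 1 << 20) -> str:
            """Stream the image file and compute its SHA-256 digest."""
            digest = hashlib.sha256()
            with open(path, "rb") as image:
                for chunk in iter(lambda: image.read(chunk_size), b""):
                    digest.update(chunk)
            return digest.hexdigest()

        def verify_image(path: str, expected_digest: str) -> bool:
            """Integrity check: only boot the image if it matches the trusted manifest."""
            return sha256_of_image(path) == expected_digest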

    Generator-Retriever-Generator: A Novel Approach to Open-domain Question Answering

    Open-domain question answering (QA) tasks usually require the retrieval of relevant information from a large corpus to generate accurate answers. We propose a novel approach called Generator-Retriever-Generator (GRG) that combines document retrieval techniques with a large language model (LLM), by first prompting the model to generate contextual documents based on a given question. In parallel, a dual-encoder network retrieves documents that are relevant to the question from an external corpus. The generated and retrieved documents are then passed to a second LLM, which generates the final answer. By combining document retrieval and LLM generation, our approach addresses the challenges of open-domain QA, such as generating informative and contextually relevant answers. GRG outperforms the state-of-the-art generate-then-read and retrieve-then-read pipelines (GENREAD and RFiD), improving their performance by at least +5.2, +4.2, and +1.6 on the TriviaQA, NQ, and WebQ datasets, respectively. We provide code, datasets, and checkpoints at https://github.com/abdoelsayed2016/GRG.
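
    The Python sketch below only paraphrases the pipeline as described in the abstract: one LLM call generates contextual documents, a retriever supplies documents from an external corpus, and a second LLM call answers from both. The callables generate and retrieve are placeholders, not the authors' API; their actual code is in the linked repository.

        from typing import Callable, List

        def grg_answer(
            question: str,
            generate: Callable[[str], str],             # LLM call: prompt -> text
            retrieve: Callable[[str, int], List[str]],  # dual-encoder retriever: question, k -> documents
            k_generated: int = 2,
            k_retrieved: int = 5,
        ) -> str:
            """Generator-Retriever-Generator flow: (1) prompt an LLM for contextual documents,
            (2) retrieve documents from a corpus, (3) have a second LLM call answer from both."""
            generated_docs = [
                generate(f"Write a short background passage answering: {question}")
                for _ in range(k_generated)
            ]
            retrieved_docs = retrieve(question, k_retrieved)

            context = "\n\n".join(generated_docs + retrieved_docs)
            prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
            return generate(prompt)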

    Large Language Models for Information Retrieval: A Survey

    As a primary means of information acquisition, information retrieval (IR) systems, such as search engines, have integrated themselves into our daily lives. These systems also serve as components of dialogue, question-answering, and recommender systems. The trajectory of IR has evolved dynamically from its origins in term-based methods to its integration with advanced neural models. While neural models excel at capturing complex contextual signals and semantic nuances, thereby reshaping the IR landscape, they still face challenges such as data scarcity, interpretability, and the generation of contextually plausible yet potentially inaccurate responses. This evolution requires a combination of both traditional methods (such as term-based sparse retrieval methods with rapid response) and modern neural architectures (such as language models with powerful language understanding capacity). Meanwhile, the emergence of large language models (LLMs), typified by ChatGPT and GPT-4, has revolutionized natural language processing due to their remarkable language understanding, generation, generalization, and reasoning abilities. Consequently, recent research has sought to leverage LLMs to improve IR systems. Given the rapid evolution of this research trajectory, it is necessary to consolidate existing methodologies and provide nuanced insights through a comprehensive overview. In this survey, we delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers. Additionally, we explore promising directions within this expanding field.
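
    As a rough illustration of how the roles surveyed here (query rewriter, retriever, reranker, reader) compose into a single pipeline, a minimal sketch follows; every callable is a placeholder under assumed signatures, not an API from the survey.

        from typing import Callable, List

        def llm_ir_pipeline(
            query: str,
            rewrite: Callable[[str], str],                  # query rewriter (often an LLM)
            retrieve: Callable[[str, int], List[str]],      # sparse or dense first-stage retriever
            rerank: Callable[[str, List[str]], List[str]],  # reranker ordering candidates by relevance
            read: Callable[[str, List[str]], str],          # reader generating the final answer
            k: int = 20,
            top: int = 5,
        ) -> str:
            """Compose the four roles: rewrite the query, retrieve candidates, rerank them, read an answer."""
            rewritten = rewrite(query)
            candidates = retrieve(rewritten, k)
            best = rerank(rewritten, candidates)[:top]
            return read(query, best)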

    Parallel querying of distributed ontologies with shared vocabulary

    Ontologies and various semantic repositories have become a convenient approach for implementing model-driven architectures of distributed systems on the Web, and SPARQL is the standard language for querying them. However, although SPARQL is a well-established standard for querying semantic repositories in RDF and OWL format, and there are commonly used APIs which support it, like Jena for Java, parallel querying is not incorporated in them. This article presents a complete framework consisting of an object algebra for parallel RDF and an index-based implementation of a parallel query engine capable of dealing with distributed RDF ontologies which share a common vocabulary. It has been implemented in Java and, for validation of the algorithms, has been applied to the problem of organizing virtual exhibitions on the Web.
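
    The paper's engine is an index-based Java implementation built on its own object algebra; purely as an illustration of the parallel-querying idea, the Python sketch below fans the same SPARQL query out to several repositories concurrently and unions the result rows, relying on the shared vocabulary to make them directly mergeable. The run_query callable is a placeholder for whatever endpoint client is used.

        from concurrent.futures import ThreadPoolExecutor
        from typing import Callable, Iterable, List, Tuple

        def parallel_query(
            endpoints: Iterable[str],
            sparql: str,
            run_query: Callable[[str, str], List[Tuple]],  # (endpoint, query) -> result rows
        ) -> List[Tuple]:
            """Issue the same SPARQL query against every repository concurrently and merge the rows."""
            with ThreadPoolExecutor() as pool:
                futures = [pool.submit(run_query, endpoint, sparql) for endpoint in endpoints]
                merged: List[Tuple] = []
                for future in futures:
                    merged.extend(future.result())
            # Deduplicate bindings that appear in more than one repository, preserving order.
            return list(dict.fromkeys(merged))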

    A methodology and a platform to measure and assess software windows of vulnerability

    Nowadays, it is impossible not to recognize how software solutions have changed the world and the crucial role they play in our daily lives. With their rapid spread, especially in Cloud and Internet of Things contexts, the security risks to which they are exposed have risen as well. Unfortunately, even if many techniques have been devised to protect infrastructures from attackers, they are not enough to achieve truly secure systems. Therefore, since the price of recovering from an outbreak can be enormous, organizations need a way to assess the security of the products they use. A useful and often overlooked metric in this situation is the software window of vulnerability, which is the amount of time a piece of software has been vulnerable to an attack. The main reason this metric is often neglected is that the information required to compute it is provided by heterogeneous sources, and there is no standard framework, or even a model, that can simplify the task. Hence, the aim of this thesis is to fill this gap, first by defining a model to evaluate software windows of vulnerability and then by implementing a platform able to compute this metric for software on different systems. Since keeping the approach fully general is not feasible outside of the theoretical model, the implementation step necessarily requires a system-specific choice. GNU/Linux systems were selected for two reasons: their recent rise in popularity in the previously mentioned fields and their software management policy (based on package managers), which makes the data required by the analysis easier to find.
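
    A minimal sketch of the metric itself (not the thesis platform): the window of vulnerability is the time between the start of exposure and the installation of the fix, with still-unpatched systems measured up to the present day. Function and parameter names are hypothetical.

        from datetime import date
        from typing import Optional

        def window_of_vulnerability(
            vulnerable_since: date,                 # when the vulnerable version became present
            fix_installed: Optional[date] = None,   # when the patched package was installed; None if never
            today: Optional[date] = None,
        ) -> int:
            """Days of exposure: from the start of exposure until the fix was installed,
            or until today if it never was."""
            end = fix_installed or today or date.today()
            return max((end - vulnerable_since).days, 0)

        # Example: vulnerable package present since 2023-03-01, fix installed 2023-03-20.
        print(window_of_vulnerability(date(2023, 3, 1), date(2023, 3, 20)))  # 19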