
    A benchmark of Spanish language datasets for computationally driven research

    In the domain of Galleries, Libraries, Archives and Museums (GLAM) institutions, creative and innovative tools and methodologies for content delivery and user engagement have recently gained international attention. New methods have been proposed to publish digital collections as datasets amenable to computational use. Standardised benchmarks can be useful to broaden the scope of machine-actionable collections and to promote cultural and linguistic diversity. In this article, we propose a methodology for selecting datasets for computationally driven research applied to Spanish text corpora. This work seeks to encourage Spanish and Latin American institutions to publish machine-actionable collections based on best practices and avoiding common mistakes. This research has been funded by the AETHER-UA (PID2020-112540RB-C43) Project from the Spanish Ministry of Science and Innovation.
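    As a loose illustration of what such a selection methodology might check, the following Python sketch validates a dataset's metadata against a handful of best-practice criteria; the field names and licence list are illustrative assumptions, not the article's actual benchmark.

        # Illustrative checklist for publishing a GLAM collection as a
        # machine-actionable dataset; criteria are assumptions, not the
        # article's actual benchmark.
        REQUIRED_FIELDS = ["title", "license", "format", "documentation_url"]
        OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0"}

        def check_dataset(metadata: dict) -> list[str]:
            """Return a list of problems; an empty list means all checks pass."""
            problems = [f"missing field: {f}" for f in REQUIRED_FIELDS
                        if f not in metadata]
            if metadata.get("license") not in OPEN_LICENSES:
                problems.append("license is not an open, machine-readable one")
            return problems

        print(check_dataset({"title": "Corpus X", "format": "plain text"}))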

    A Backend Platform for Supporting the Reproducibility of Computational Experiments

    In recent years, the research community has raised serious questions about the reproducibility of scientific work. In particular, since many studies include some kind of computing work, reproducibility is also a technological challenge, not only in computer science but in most research domains. Replicability and computational reproducibility are not easy to achieve, both because researchers have diverse proficiency in computing technologies and because of the variety of computational environments that can be used. Indeed, it is challenging to recreate the same environment using the same frameworks, code, data sources, programming languages, dependencies, and so on. In this work, we propose an Integrated Development Environment that allows the sharing, configuration, packaging, and execution of an experiment by setting the code and data used and defining the programming languages, code, dependencies, databases, or commands to execute, so as to achieve consistent results for each experiment. After the initial creation and configuration, the experiment can be executed any number of times, always producing exactly the same results. Furthermore, the experiment can be executed with a different associated dataset, making it possible to verify the reproducibility and replicability of the results. This allows the creation of a reproducible pack that can be re-executed by anyone on any other computer. Our platform aims to allow researchers in any field to create a reproducibility package for their science that can be re-executed on any other computer. To evaluate our platform, we used it to reproduce 25 experiments extracted from published papers. We were able to successfully reproduce 20 (80%) of these experiments, achieving the results reported in those works with minimal effort, thus showing that our approach is effective.
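    A minimal sketch of the idea, assuming a hypothetical package descriptor (the names and fields below are illustrative, not the platform's actual API): an experiment is declared once, with pinned dependencies and a fixed entry point, and can then be re-executed anywhere.

        # Hypothetical reproducibility-package descriptor (illustrative only).
        from dataclasses import dataclass
        import subprocess

        @dataclass
        class ExperimentPackage:
            name: str
            language: str            # e.g. "python3.10"
            dependencies: list[str]  # pinned versions for a consistent environment
            data_sources: list[str]  # datasets the experiment reads
            command: str             # entry point that reproduces the results

            def run(self) -> int:
                # A real platform would first build an isolated environment
                # (e.g. a container); here we only run the declared command.
                return subprocess.run(self.command, shell=True).returncode

        pkg = ExperimentPackage(
            name="example-experiment",
            language="python3.10",
            dependencies=["numpy==1.26.4", "scikit-learn==1.4.2"],
            data_sources=["data/train.csv"],
            command="python train.py --seed 42",
        )
        pkg.run()  # swapping data_sources allows replication on a new dataset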

    LIFEDATA - a framework for traceable active learning projects

    Active Learning has become a popular method for iteratively improving data-intensive Artificial Intelligence models. However, projects that handle large volumes of volatile data, as in an Active Learning loop, pose a significant traceability challenge. This paper introduces LIFEDATA, a Python-based framework designed to assist developers in implementing Active Learning projects with a focus on traceability. It supports seamless tracking of all artifacts, from data selection and labeling to model interpretation, thus promoting transparency throughout the entire model learning process and enhancing error-debugging efficiency while ensuring experiment reproducibility. To showcase its applicability, we present two life science use cases. Moreover, the paper proposes an algorithm that combines query strategies to demonstrate LIFEDATA’s ability to reduce data labeling effort.
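    As an illustration of what combining query strategies can look like, the sketch below fuses an uncertainty score with a diversity score via a weighted sum; the fusion rule is an assumption made for illustration, not LIFEDATA's actual algorithm.

        # Illustrative fusion of two query strategies (not LIFEDATA's algorithm).
        import numpy as np

        def uncertainty(probs: np.ndarray) -> np.ndarray:
            # Least-confidence score: 1 - max class probability per sample.
            return 1.0 - probs.max(axis=1)

        def diversity(X: np.ndarray, X_labeled: np.ndarray) -> np.ndarray:
            # Distance of each candidate to its nearest already-labeled sample.
            dists = np.linalg.norm(X[:, None, :] - X_labeled[None, :, :], axis=2)
            return dists.min(axis=1)

        def combined_query(probs, X, X_labeled, k=10, alpha=0.5):
            # Normalize both scores to [0, 1] and take a weighted sum.
            u, d = uncertainty(probs), diversity(X, X_labeled)
            u = (u - u.min()) / (np.ptp(u) + 1e-12)
            d = (d - d.min()) / (np.ptp(d) + 1e-12)
            return np.argsort(alpha * u + (1 - alpha) * d)[-k:]  # k best to label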

    A tidal disruption event coincident with a high-energy neutrino

    Cosmic neutrinos provide a unique window into the otherwise hidden mechanism of particle acceleration in astrophysical objects. The IceCube Collaboration recently reported the likely association of one high-energy neutrino with a flare from the relativistic jet of an active galaxy pointed towards the Earth. However, a combined analysis of many similar active galaxies revealed no excess from the broader population, leaving the vast majority of the cosmic neutrino flux unexplained. Here we present the likely association of a radio-emitting tidal disruption event, AT2019dsg, with a second high-energy neutrino. AT2019dsg was identified as part of our systematic search for optical counterparts to high-energy neutrinos with the Zwicky Transient Facility. The probability of finding any coincident radio-emitting tidal disruption event by chance is 0.5%, while the probability of finding one as bright in bolometric energy flux as AT2019dsg is 0.2%. Our electromagnetic observations can be explained through a multizone model, with radio analysis revealing a central engine, embedded in a UV photosphere, that powers an extended synchrotron-emitting outflow. This provides an ideal site for petaelectronvolt neutrino production. Assuming that the association is genuine, our observations suggest that tidal disruption events with mildly relativistic outflows contribute to the cosmic neutrino flux.

    Digital 3D Technologies for Humanities Research and Education: An Overview

    Digital 3D modelling and visualization technologies have been widely applied to support research in the humanities since the 1980s. Since technological backgrounds, project opportunities, and methodological considerations for application are widely discussed in the literature, one of the next tasks is to validate these techniques within a wider scientific community and establish them in the culture of academic disciplines. This article resulted from a postdoctoral thesis and is intended to provide a comprehensive overview of the use of digital 3D technologies in the humanities with regard to (1) scenarios, user communities, and epistemic challenges; (2) technologies, UX design, and workflows; and (3) framework conditions such as legislation, infrastructures, and teaching programs. Although the results are relevant for 3D modelling in all humanities disciplines, the focus of our studies is on the modelling of past architectural and cultural landscape objects via interpretative 3D reconstruction methods.

    Towards practicalization of blockchain-based decentralized applications

    Blockchain can be defined as an immutable ledger for recording transactions, maintained in a distributed network of mutually untrusting peers. Blockchain technology has been widely applied to various fields beyond its initial usage in cryptocurrency. However, blockchain itself is insufficient to meet all the desired security or efficiency requirements for diversified application scenarios. This dissertation focuses on two core functionalities that blockchain provides, i.e., robust storage and reliable computation. Three concrete application scenarios, including the Internet of Things (IoT), cybersecurity management (CSM), and peer-to-peer (P2P) content delivery networks (CDN), are used to elaborate the general design principles for these two main functionalities. Among them, the IoT and CSM applications involve the design of blockchain-based robust storage and management, while the P2P CDN requires reliable computation. Such general design principles, derived from disparate application scenarios, have the potential to realize the practicalization of many other blockchain-enabled decentralized applications.

    In the IoT application, blockchain-based decentralized data management is capable of handling faulty nodes, as designed in the cybersecurity application. An important issue, however, lies in the interaction between the external network and the blockchain network: external clients must rely on a relay node to communicate with the full nodes in the blockchain. Compromise of such relay nodes may result in a security breach and even block IoT sensors from the network. Therefore, a censorship-resistant blockchain-based decentralized IoT management system is proposed. Experimental results from a proof-of-concept implementation deployed in a real distributed environment show its feasibility and effectiveness in achieving censorship resistance.

    The CSM application incorporates blockchain to provide robust storage of historical cybersecurity data so that, with a certain level of cyber intelligence, a defender can determine whether a network has been compromised and to what extent. The CSM functions can be categorized into three classes: network-centric (N-CSM), tools-centric (T-CSM), and application-centric (A-CSM). The cyber intelligence identifies new attackers, victims, or defense capabilities. Moreover, a decentralized storage network (DSN) is integrated to reduce on-chain storage costs without undermining robustness. Experiments with the prototype implementation and real-world cyber datasets show that the blockchain-based CSM solution is effective and efficient.

    The P2P CDN application explores and utilizes the reliable computation that blockchain empowers. In particular, P2P CDNs promise benefits such as cost savings and scalable peak-demand handling compared with centralized CDNs. However, reliable P2P delivery requires proper enforcement of delivery fairness. Unfortunately, most existing studies on delivery fairness rest on non-cooperative game-theoretic assumptions that are arguably unrealistic in the ad hoc P2P setting. To address this issue, an expressive security requirement for the desired fair P2P content delivery is defined, and two efficient blockchain-based approaches, for P2P downloading and P2P streaming, are proposed. The proposed system guarantees fairness for each party even when all others collude to arbitrarily misbehave, and it achieves asymptotically optimal on-chain costs and optimal delivery communication.
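    To make the fairness idea concrete, here is a minimal sketch of a pay-per-chunk pattern with on-chain hash commitments; it is an illustrative simplification (per-chunk digests instead of Merkle proofs, no payments or signatures), not the dissertation's actual protocol.

        # Illustrative pay-per-chunk fair-delivery check (not the actual protocol).
        import hashlib

        def digest(chunk: bytes) -> str:
            return hashlib.sha256(chunk).hexdigest()

        def commit(chunks: list[bytes]) -> list[str]:
            # On-chain commitment: one digest per chunk. A Merkle root would
            # shrink this to a single hash at the cost of per-chunk proofs.
            return [digest(c) for c in chunks]

        def verify_before_paying(commitment: list[str], i: int, chunk: bytes) -> bool:
            # The receiver releases payment for chunk i only if it matches the
            # commitment, so neither side gains by cheating mid-transfer.
            return digest(chunk) == commitment[i]

        on_chain = commit([b"chunk-0", b"chunk-1", b"chunk-2"])
        assert verify_before_paying(on_chain, 1, b"chunk-1")      # honest delivery
        assert not verify_before_paying(on_chain, 1, b"garbage")  # cheat detected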