
    Assessment, Usability, and Sociocultural Impacts of DataONE

    DataONE, funded from 2009-2019 by the U.S. National Science Foundation, is an early example of a large-scale project that built both a cyberinfrastructure and a culture of data discovery, sharing, and reuse. DataONE used a Working Group model, where a diverse group of participants collaborated on targeted research and development activities to achieve broader project goals. This article summarizes the work carried out by two of DataONE’s working groups: Usability & Assessment (2009-2019) and Sociocultural Issues (2009-2014). The activities of these working groups provide a unique longitudinal look at how scientists, librarians, and other key stakeholders engaged in convergence research to identify and analyze practices around research data management through the development of boundary objects, an iterative assessment program, and reflection. Members of the working groups disseminated their findings widely in papers, presentations, and datasets, reaching international audiences through publications in 25 different journals and presentations to over 5,000 people at interdisciplinary venues. The working groups helped inform the DataONE cyberinfrastructure and influenced the evolving data management landscape. By studying working groups over time, the paper also presents lessons learned about the working group model for global large-scale projects that bring together participants from multiple disciplines and communities in convergence research.

    Interoperability and FAIRness through a novel combination of Web technologies

    Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task that does not scale. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.

    An Integrated Big and Fast Data Analytics Platform for Smart Urban Transportation Management

    Smart urban transportation management can be considered a multifaceted big data challenge. It relies strongly on the information collected from multiple, widespread, and heterogeneous data sources, as well as on the ability to extract actionable insights from them. Besides data, full-stack (from platform to services and applications) Information and Communications Technology (ICT) solutions need to be specifically adopted to address smart cities challenges. Smart urban transportation management is one of the key use cases addressed in the context of the EUBra-BIGSEA (Europe-Brazil Collaboration of Big Data Scientific Research through Cloud-Centric Applications) project. This paper focuses on the City Administration Dashboard, a public transport analytics application that has been developed on top of the EUBra-BIGSEA platform and used by the Municipality stakeholders of Curitiba, Brazil, to tackle urban traffic data analysis and planning challenges. The solution proposed in this paper combines a scalable big and fast data analytics platform, a flexible and dynamic cloud infrastructure, data quality and entity matching algorithms, and security and privacy techniques. By exploiting an interoperable programming framework based on a Python Application Programming Interface (API), it allows easy, rapid, and transparent development of smart cities applications.
    This work was supported by the European Commission through the Cooperation Programme under EUBra-BIGSEA Horizon 2020 Grant 690116. [This project results from the 3rd BR-EU Coordinated Call in Information and Communication Technologies (ICT), announced by the Ministry of Science, Technology and Innovation (MCTI).]
    Fiore, S.; Elia, D.; Pires, CE.; Mestre, DG.; Cappiello, C.; Vitali, M.; Andrade, N.... (2019). An Integrated Big and Fast Data Analytics Platform for Smart Urban Transportation Management. IEEE Access. 7:117652-117677. https://doi.org/10.1109/ACCESS.2019.2936941

    Using Text Similarity to Detect Social Interactions not Captured by Formal Reply Mechanisms

    In modeling social interaction online, it is important to understand when people are reacting to each other. Many systems have explicit indicators of replies, such as threading in discussion forums or replies and retweets on Twitter. However, these explicit indicators likely capture only part of people's reactions to each other, so computational social science approaches that use them to infer relationships or influence are likely to miss the mark. This paper explores the problem of detecting non-explicit responses, presenting a new approach that uses tf-idf similarity between a user's own tweets and recent tweets by people they follow. Based on a month's worth of posting data from 449 ego networks in Twitter, this method indicates that at least 11% of reactions are likely not captured by the explicit reply and retweet mechanisms. Further, these uncaptured reactions are not evenly distributed between users: some users, who create replies and retweets without using the official interface mechanisms, are much more responsive to followees than they appear. This suggests that detecting non-explicit responses is an important consideration in mitigating biases and building more accurate models when using these markers to study social interaction and information diffusion.
    Comment: A final version of this work was published in the 2015 IEEE 11th International Conference on e-Science (e-Science
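
The tf-idf comparison described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract states only that tf-idf similarity is computed between a user's own tweets and recent tweets by followees. The tokenizer, the smoothed idf weighting, the use of cosine similarity, the function names, and the `threshold` value of 0.3 are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document.

    Uses raw term counts and a smoothed idf, ln((1 + N) / (1 + df)) + 1;
    this weighting scheme is an assumption, not taken from the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def likely_implicit_responses(own_tweet, followee_tweets, threshold=0.3):
    """Return (score, tweet) pairs for followee tweets similar to own_tweet.

    `threshold` is a hypothetical cut-off chosen for illustration only.
    """
    vectors = tfidf_vectors([own_tweet] + list(followee_tweets))
    own_vector = vectors[0]
    scored = [
        (cosine(own_vector, vec), text)
        for vec, text in zip(vectors[1:], followee_tweets)
    ]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
```

Calling `likely_implicit_responses(own_tweet, recent_followee_tweets)` returns the followee tweets whose wording overlaps strongly with the user's tweet, highest score first. In practice the cut-off would need to be calibrated against labeled examples of known replies and retweets.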