
    Standing together for reproducibility in large-scale computing: report on reproducibility@XSEDE

    This is the final report on reproducibility@XSEDE, a one-day workshop held in conjunction with XSEDE14, the annual conference of the Extreme Science and Engineering Discovery Environment (XSEDE). The workshop's discussion-oriented agenda focused on reproducibility in large-scale computational research. Two important themes capture the spirit of the workshop submissions and discussions: (1) organizational stakeholders, especially supercomputer centers, are in a unique position to promote, enable, and support reproducible research; and (2) individual researchers should conduct each experiment as though someone will replicate it. Participants documented numerous issues, questions, technologies, practices, and potentially promising initiatives emerging from the discussion, but also highlighted four areas of particular interest to XSEDE: (1) documentation and training that promote reproducible research; (2) system-level tools that provide build- and run-time information at the level of the individual job; (3) the need to model best practices in research collaborations involving XSEDE staff; and (4) continued work on gateways and related technologies. In addition, an intriguing question emerged from the day's interactions: would there be value in establishing an annual award for excellence in reproducible research?
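    The second of those four areas, per-job capture of build- and run-time information, is concrete enough to sketch. The following hypothetical Python snippet records basic provenance for a single job; it is not XSEDE tooling, and the output file name, the recorded fields, and the scheduler variable prefixes are illustrative assumptions.

```python
# Hedged sketch of per-job build/run-time capture; field names and the
# output location are assumptions, not any actual XSEDE tool.
import json
import os
import platform
import subprocess
import sys
from datetime import datetime, timezone

def capture_job_metadata(outfile: str = "job_provenance.json") -> dict:
    """Record basic build- and run-time information for a single job."""
    meta = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "hostname": platform.node(),
        "platform": platform.platform(),
        "python": sys.version,
        # Keep only scheduler/module variables (SLURM, PBS, Lmod) as examples.
        "environment": {k: v for k, v in os.environ.items()
                        if k.startswith(("SLURM_", "PBS_", "MODULE"))},
    }
    # Record the source revision if the job runs from a git checkout.
    try:
        meta["git_commit"] = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        meta["git_commit"] = None
    with open(outfile, "w") as f:
        json.dump(meta, f, indent=2)
    return meta

if __name__ == "__main__":
    capture_job_metadata()
```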

    Data-Driven Meets Theory-Driven Research in the Era of Big Data: Opportunities and Challenges for Information Systems Research

    The era of big data provides many opportunities for conducting impactful research from both data-driven and theory-driven perspectives. However, data-driven and theory-driven research have progressed somewhat independently. In this paper, we develop a framework that articulates the important differences between these two perspectives and propose a role for information systems research at their intersection. The framework presents a set of pathways that combine the data-driven and theory-driven perspectives. From these pathways, we derive a set of challenges and show how they can be addressed by research in information systems. By doing so, we identify an important role that information systems research can play in advancing both data-driven and theory-driven research in the era of big data.

    Computational reproducibility of Jupyter notebooks from biomedical publications

    Jupyter notebooks facilitate the bundling of executable code with its documentation and output in one interactive environment, and they represent a popular mechanism to document and share computational workflows. The reproducibility of computational aspects of research is a key component of scientific reproducibility but has not yet been assessed at scale for Jupyter notebooks associated with biomedical publications. We address computational reproducibility at two levels. First, using fully automated workflows, we analyzed the computational reproducibility of Jupyter notebooks related to publications indexed in PubMed Central. We identified such notebooks by mining the articles' full text, locating them on GitHub, and re-running them in an environment as close to the original as possible. We documented reproduction success and exceptions and explored relationships between notebook reproducibility and variables related to the notebooks or publications. Second, this study represents a reproducibility attempt in and of itself, applying essentially the same methodology to PubMed Central twice, two years apart. Out of 27,271 notebooks from 2,660 GitHub repositories associated with 3,467 articles, 22,578 notebooks were written in Python, including 15,817 that had their dependencies declared in standard requirements files and that we attempted to re-run automatically. For 10,388 of these, all declared dependencies could be installed successfully, and we re-ran them to assess reproducibility. Of these, 1,203 notebooks ran through without any errors, including 879 that produced results identical to those reported in the original notebook and 324 for which our results differed from the originally reported ones. Running the other notebooks resulted in exceptions. We zoom in on common problems, highlight trends, and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.
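    The automated re-execution step described above can be illustrated with a small, hedged Python sketch: install a repository's declared dependencies, then re-run a notebook and record success or the failure mode. This is not the study's actual pipeline; the repository path, notebook name, requirements file name, and timeout are assumptions for illustration.

```python
# Hypothetical sketch of an automated notebook re-execution step, loosely
# modeled on the pipeline described above; paths and names are assumptions.
import subprocess
import sys

def rerun_notebook(repo_dir: str, notebook: str,
                   requirements: str = "requirements.txt"):
    """Install declared dependencies, then re-execute the notebook.

    Returns (True, "") on clean execution, or (False, detail) if dependency
    installation or execution fails.
    """
    # Install the dependencies declared in the repository's requirements file.
    install = subprocess.run(
        [sys.executable, "-m", "pip", "install", "-r",
         f"{repo_dir}/{requirements}"],
        capture_output=True, text=True,
    )
    if install.returncode != 0:
        return False, "dependency installation failed: " + install.stderr[-500:]

    # Re-execute the notebook into a new file, leaving the original untouched.
    run = subprocess.run(
        [sys.executable, "-m", "jupyter", "nbconvert", "--to", "notebook",
         "--execute", "--output", "rerun.ipynb",
         "--ExecutePreprocessor.timeout=600", f"{repo_dir}/{notebook}"],
        capture_output=True, text=True,
    )
    if run.returncode != 0:
        return False, "execution raised an exception: " + run.stderr[-500:]
    return True, ""

if __name__ == "__main__":
    ok, detail = rerun_notebook("some-cloned-repo", "analysis.ipynb")
    print("reproduced without errors" if ok else detail)
```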

    New Artifacts for the Knowledge Discovery via Data Analytics (KDDA) Process

    Recently, interest in the business application of analytics and data science has increased significantly. The popularity of data analytics and data science comes from the clear articulation of business problem solving as an end goal. To address limitations in the existing literature, this dissertation provides four novel design artifacts for Knowledge Discovery via Data Analytics (KDDA). The first artifact is a Snail Shell KDDA process model that extends existing knowledge discovery process models while addressing many of their limitations. At the top level, the KDDA process model highlights the iterative nature of KDDA projects and adds two new phases, namely Problem Formulation and Maintenance. At the second level, the generic tasks of the KDDA process model are presented in a comparative manner, highlighting the differences between the new KDDA process model and traditional knowledge discovery process models. Two case studies are used to demonstrate how to use the KDDA process model to guide real-world KDDA projects. The second artifact, a methodology for theory building based on quantitative data, is a novel application of the KDDA process model. The methodology is evaluated using a theory-building case from the public health domain. It is not only an instantiation of the Snail Shell KDDA process model but also makes theoretical contributions to theory building, demonstrating how analytical techniques can be used as quantitative gauges to assess important construct relationships during the formative phase of theory building. The third artifact is a data mining ontology, the DM3 ontology, which bridges the semantic gap between business users and KDDA experts and facilitates analytical model maintenance and reuse. The DM3 ontology is evaluated using both a criteria-based and a task-based approach. The fourth artifact is a decision support framework for MCDA software selection. The framework enables users to choose relevant MCDA software based on a specific decision-making situation (DMS). A DMS modeling framework is developed to structure the DMS based on the decision problem and the users' decision preferences. The framework is implemented in a decision support system and evaluated using application examples from the real-estate domain.

    Key skills for co-learning and co-inquiry in two open platforms: a massive portal (EDUCARED) and a personal environment (weSPOT)

    This paper presents a qualitative investigation of key skills for co-learning and co-inquiry in the digital age. The method applied was cyber-ethnography, with asynchronous observation (forum and wiki) and synchronous discussions (webconference) used to analyse the skills developed by a co-learning community. The study focuses on participants from different countries who interacted over nine months in two open platforms: the massive educational portal EDUCARED of the “7th International Conference on Education 2012-2013” and weSPOT, a European “Working Environment with Social Personal and Open Technologies for inquiry based learning”. As a result of this study, it was observed that the EDUCARED portal led to the development of more explicit digital literacies, possibly because of its simpler and more familiar interface (forum). In the weSPOT environment, by contrast, participants experienced with digital technologies had more opportunities to develop other skills related to Critical-Creative Thinking and Scientific Reasoning.

    Ontology based data warehousing for mining of heterogeneous and multidimensional data sources

    Heterogeneous and multidimensional big-data sources are prevalent in virtually all business environments, yet system and data analysts are often unable to access and exploit them quickly. A robust and versatile data warehousing system is developed, integrating domain ontologies from multidimensional data sources. For example, petroleum digital ecosystems and digital oil field solutions, derived from big-data petroleum (information) systems, are in increasing demand in multibillion-dollar resource businesses worldwide. This work has been recognized by the IEEE Industrial Electronics Society and has appeared in more than 50 international conference proceedings and journals.

    Business process management and digital innovations: a systematic literature review

    Emerging technologies have the capability to reshape business process management (BPM) from its traditional version into a more explorative variant. However, to exploit the full benefits of new IT, it is essential to reveal BPM’s research potential and to detect recent trends in practice. Therefore, this work presents a systematic literature review (SLR) of 231 recent academic articles (from 2014 until May 2019) that integrate BPM with digital innovations (DI). We position those articles against seven future BPM-DI trends that were inductively derived from an expert panel. By complementing the expected trends in practice with a state-of-the-art literature review, we are able to derive covered and uncovered themes and thereby help bridge the rigor-relevance gap. The major technological impacts within the BPM field seem to focus on value creation, customer engagement, and managing human-centric and knowledge-intensive business processes. Finally, our findings are categorized into specific calls for research and for action, to help scholars and organizations better prepare for future digital needs.

    Small fish in a big pond: an architectural approach to users' privacy, rights and security in the age of big data

    We focus on the challenges and issues associated with Big Data and propose a novel architecture that uses the principles of Separation of Concerns and distributed computing to overcome many of the challenges associated with storage, analysis and integrity. We address the issue of the asymmetrical distribution of power between the originators of data and the organizations and institutions that make use of that data by taking a systemic perspective that includes both sides in our architectural design, shifting from a customer-provider relationship to a more symbiotic one in which control over access to customer data resides with the customer. We illustrate the affordances of the proposed architecture by describing its application in the domain of Social Networking Sites, where we furnish a mechanism to address problems of privacy and identity and create the potential to open up online social networking to a richer set of possible applications.