9,696 research outputs found

    Automatic Detection of Public Development Projects in Large Open Source Ecosystems: An Exploratory Study on GitHub

    Full text link
    Hosting over 10 million of software projects, GitHub is one of the most important data sources to study behavior of developers and software projects. However, with the increase of the size of open source datasets, the potential threats to mining these datasets have also grown. As the dataset grows, it becomes gradually unrealistic for human to confirm quality of all samples. Some studies have investigated this problem and provided solutions to avoid threats in sample selection, but some of these solutions (e.g., finding development projects) require human intervention. When the amount of data to be processed increases, these semi-automatic solutions become less useful since the effort in need for human intervention is far beyond affordable. To solve this problem, we investigated the GHTorrent dataset and proposed a method to detect public development projects. The results show that our method can effectively improve the sample selection process in two ways: (1) We provide a simple model to automatically select samples (with 0.827 precision and 0.947 recall); (2) We also offer a complex model to help researchers carefully screen samples (with 63.2% less effort than manually confirming all samples, and can achieve 0.926 precision and 0.959 recall).Comment: Accepted by the SEKE2018 Conferenc

    A case study in open source innovation: developing the Tidepool Platform for interoperability in type 1 diabetes management.

    Get PDF
    OBJECTIVE:Develop a device-agnostic cloud platform to host diabetes device data and catalyze an ecosystem of software innovation for type 1 diabetes (T1D) management. MATERIALS AND METHODS:An interdisciplinary team decided to establish a nonprofit company, Tidepool, and build open-source software. RESULTS:Through a user-centered design process, the authors created a software platform, the Tidepool Platform, to upload and host T1D device data in an integrated, device-agnostic fashion, as well as an application ("app"), Blip, to visualize the data. Tidepool's software utilizes the principles of modular components, modern web design including REST APIs and JavaScript, cloud computing, agile development methodology, and robust privacy and security. DISCUSSION:By consolidating the currently scattered and siloed T1D device data ecosystem into one open platform, Tidepool can improve access to the data and enable new possibilities and efficiencies in T1D clinical care and research. The Tidepool Platform decouples diabetes apps from diabetes devices, allowing software developers to build innovative apps without requiring them to design a unique back-end (e.g., database and security) or unique ways of ingesting device data. It allows people with T1D to choose to use any preferred app regardless of which device(s) they use. CONCLUSION:The authors believe that the Tidepool Platform can solve two current problems in the T1D device landscape: 1) limited access to T1D device data and 2) poor interoperability of data from different devices. If proven effective, Tidepool's open source, cloud model for health data interoperability is applicable to other healthcare use cases

    Federated Robust Embedded Systems: Concepts and Challenges

    Get PDF
    The development within the area of embedded systems (ESs) is moving rapidly, not least due to falling costs of computation and communication equipment. It is believed that increased communication opportunities will lead to the future ESs no longer being parts of isolated products, but rather parts of larger communities or federations of ESs, within which information is exchanged for the benefit of all participants. This vision is asserted by a number of interrelated research topics, such as the internet of things, cyber-physical systems, systems of systems, and multi-agent systems. In this work, the focus is primarily on ESs, with their specific real-time and safety requirements. While the vision of interconnected ESs is quite promising, it also brings great challenges to the development of future systems in an efficient, safe, and reliable way. In this work, a pre-study has been carried out in order to gain a better understanding about common concepts and challenges that naturally arise in federations of ESs. The work was organized around a series of workshops, with contributions from both academic participants and industrial partners with a strong experience in ES development. During the workshops, a portfolio of possible ES federation scenarios was collected, and a number of application examples were discussed more thoroughly on different abstraction levels, starting from screening the nature of interactions on the federation level and proceeding down to the implementation details within each ES. These discussions led to a better understanding of what can be expected in the future federated ESs. In this report, the discussed applications are summarized, together with their characteristics, challenges, and necessary solution elements, providing a ground for the future research within the area of communicating ESs

    Multi-tenant Pub/Sub processing for real-time data streams

    Get PDF
    Devices and sensors generate streams of data across a diversity of locations and protocols. That data usually reaches a central platform that is used to store and process the streams. Processing can be done in real time, with transformations and enrichment happening on-the-fly, but it can also happen after data is stored and organized in repositories. In the former case, stream processing technologies are required to operate on the data; in the latter batch analytics and queries are of common use. This paper introduces a runtime to dynamically construct data stream processing topologies based on user-supplied code. These dynamic topologies are built on-the-fly using a data subscription model defined by the applications that consume data. Each user-defined processing unit is called a Service Object. Every Service Object consumes input data streams and may produce output streams that others can consume. The subscription-based programing model enables multiple users to deploy their own data-processing services. The runtime does the dynamic forwarding of data and execution of Service Objects from different users. Data streams can originate in real-world devices or they can be the outputs of Service Objects. The runtime leverages Apache STORM for parallel data processing, that combined with dynamic user-code injection provides multi-tenant stream processing topologies. In this work we describe the runtime, its features and implementation details, as well as we include a performance evaluation of some of its core components.This work is partially supported by the European Research Council (ERC) un- der the EU Horizon 2020 programme (GA 639595), the Spanish Ministry of Economy, Industry and Competitivity (TIN2015-65316-P) and the Generalitat de Catalunya (2014-SGR-1051).Peer ReviewedPostprint (author's final draft

    Classification of changes in API evolution

    Get PDF
    Applications typically communicate with each other, accessing and exposing data and features by using Application Programming Interfaces (APIs). Even though API consumers expect APIs to be steady and well established, APIs are prone to continuous changes, experiencing different evolutive phases through their lifecycle. These changes are of different types, caused by different needs and are affecting consumers in different ways. In this paper, we identify and classify the changes that often happen to APIs, and investigate how all these changes are reflected in the documentation, release notes, issue tracker and API usage logs. The analysis of each step of a change, from its implementation to the impact that it has on API consumers, will help us to have a bigger picture of API evolution. Thus, we review the current state of the art in API evolution and, as a result, we define a classification framework considering both the changes that may occur to APIs and the reasons behind them. In addition, we exemplify the framework using a software platform offering a Web API, called District Health Information System (DHIS2), used collaboratively by several departments of World Health Organization (WHO).Peer ReviewedPostprint (author's final draft
    corecore