49,378 research outputs found

    DHLP 1&2: Giraph based distributed label propagation algorithms on heterogeneous drug-related networks

    Full text link
    Background and Objective: Heterogeneous complex networks are large graphs consisting of different types of nodes and edges. The knowledge extraction from these networks is complicated. Moreover, the scale of these networks is steadily increasing. Thus, scalable methods are required. Methods: In this paper, two distributed label propagation algorithms for heterogeneous networks, namely DHLP-1 and DHLP-2 have been introduced. Biological networks are one type of the heterogeneous complex networks. As a case study, we have measured the efficiency of our proposed DHLP-1 and DHLP-2 algorithms on a biological network consisting of drugs, diseases, and targets. The subject we have studied in this network is drug repositioning but our algorithms can be used as general methods for heterogeneous networks other than the biological network. Results: We compared the proposed algorithms with similar non-distributed versions of them namely MINProp and Heter-LP. The experiments revealed the good performance of the algorithms in terms of running time and accuracy.Comment: Source code available for Apache Giraph on Hadoo

    Structured Review of the Evidence for Effects of Code Duplication on Software Quality

    Get PDF
    This report presents the detailed steps and results of a structured review of code clone literature. The aim of the review is to investigate the evidence for the claim that code duplication has a negative effect on code changeability. This report contains only the details of the review for which there is not enough place to include them in the companion paper published at a conference (Hordijk, Ponisio et al. 2009 - Harmfulness of Code Duplication - A Structured Review of the Evidence)

    Scalable Persistent Storage for Erlang

    Get PDF
    The many core revolution makes scalability a key property. The RELEASE project aims to improve the scalability of Erlang on emergent commodity architectures with 100,000 cores. Such architectures require scalable and available persistent storage on up to 100 hosts. We enumerate the requirements for scalable and available persistent storage, and evaluate four popular Erlang DBMSs against these requirements. This analysis shows that Mnesia and CouchDB are not suitable persistent storage at our target scale, but Dynamo-like NoSQL DataBase Management Systems (DBMSs) such as Cassandra and Riak potentially are. We investigate the current scalability limits of the Riak 1.1.1 NoSQL DBMS in practice on a 100-node cluster. We establish for the first time scientifically the scalability limit of Riak as 60 nodes on the Kalkyl cluster, thereby confirming developer folklore. We show that resources like memory, disk, and network do not limit the scalability of Riak. By instrumenting Erlang/OTP and Riak libraries we identify a specific Riak functionality that limits scalability. We outline how later releases of Riak are refactored to eliminate the scalability bottlenecks. We conclude that Dynamo-style NoSQL DBMSs provide scalable and available persistent storage for Erlang in general, and for our RELEASE target architecture in particular

    Building scalable digital library ingestion pipelines using microservices

    Get PDF
    CORE, a harvesting service offering access to millions of open access research papers from around the world, has shifted its harvesting process from following a monolithic approach to the adoption of a microservices infrastructure. In this paper, we explain how we rearranged and re-scheduled our old ingestion pipeline, present CORE's move to managing microservices and outline the tools we use in a new and optimised ingestion system. In addition, we discuss the ineffciencies of our old harvesting process, the advantages, and challenges of our new ingestion system and our future plans. We conclude that via the adoption of microservices architecture we managed to achieve a scalable and distributed system that would assist with CORE's future performance and evolution
    corecore