25,110 research outputs found
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of
methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the
past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially
render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to
discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no
reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic
partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated
load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer
Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed
approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable
for large-scale, multi-domain, heterogeneous environments, such as computational grids
Tupleware: Redefining Modern Analytics
There is a fundamental discrepancy between the targeted and actual users of
current analytics frameworks. Most systems are designed for the data and
infrastructure of the Googles and Facebooks of the world---petabytes of data
distributed across large cloud deployments consisting of thousands of cheap
commodity machines. Yet, the vast majority of users operate clusters ranging
from a few to a few dozen nodes, analyze relatively small datasets of up to a
few terabytes, and perform primarily compute-intensive operations. Targeting
these users fundamentally changes the way we should build analytics systems.
This paper describes the design of Tupleware, a new system specifically aimed
at the challenges faced by the typical user. Tupleware's architecture brings
together ideas from the database, compiler, and programming languages
communities to create a powerful end-to-end solution for data analysis. We
propose novel techniques that consider the data, computations, and hardware
together to achieve maximum performance on a case-by-case basis. Our
experimental evaluation quantifies the impact of our novel techniques and shows
orders of magnitude performance improvement over alternative systems
Control-data separation architecture for cellular radio access networks: a survey and outlook
Conventional cellular systems are designed to ensure ubiquitous coverage with an always present wireless channel irrespective of the spatial and temporal demand of service. This approach raises several problems due to the tight coupling between network and data access points, as well as the paradigm shift towards data-oriented services, heterogeneous deployments and network densification. A logical separation between control and data planes is seen as a promising solution that could overcome these issues, by providing data services under the umbrella of a coverage layer. This article presents a holistic survey of existing literature on the control-data separation architecture (CDSA) for cellular radio access networks. As a starting point, we discuss the fundamentals, concepts, and general structure of the CDSA. Then, we point out limitations of the conventional architecture in futuristic deployment scenarios. In addition, we present and critically discuss the work that has been done to investigate potential benefits of the CDSA, as well as its technical challenges and enabling technologies. Finally, an overview of standardisation proposals related to this research vision is provided
- …