
    NFU-Enabled FASTA: moving bioinformatics applications onto wide area networks

    Background: Advances in Internet technologies have allowed life science researchers to reach beyond the lab-centric research paradigm and form distributed collaborations. Among existing technologies that support distributed collaborations, none currently supports data storage and computation simultaneously as a shared network resource, such that the computational burden can be removed entirely from participating clients. Software built on the computation-enabled logistical networking components of the Internet Backplane Protocol provides a suitable means to accomplish these tasks. Here, we demonstrate software that enables this approach by distributing both the FASTA algorithm and the appropriate data sets within the framework of a wide area network.
    Results: For large data sets, computation-enabled logistical networks provide a significant reduction in FASTA running time over local and non-distributed logistical networking frameworks. We also find that stored data of genome-scale size adapts readily to logistical networks.
    Conclusion: The network function unit-enabled Internet Backplane Protocol effectively distributes FASTA computation over large data sets stored within the scalable network. Where a computation admits a parallel solution over very large data sets, this approach gives distributed collaborators access to a shared storage resource capable of storing the large volumes of data associated with modern life science, and it provides a computation framework that removes the burden of computation from the client and places it within the network.
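    As a rough illustration of the pattern this abstract describes, the sketch below partitions a sequence database across storage "depots" and runs the search where each partition lives, so that only small result sets travel back to the client. The depot layout, the similarity scoring, and the process pool standing in for network function units are all illustrative assumptions; the actual system executes the real FASTA code inside IBP NFUs.

    ```python
    # Sketch: compute moves to the data. Each "depot" holds one partition
    # of the database; a stand-in scorer runs against each partition in
    # parallel and only the top hits return to the client.
    from concurrent.futures import ProcessPoolExecutor
    from difflib import SequenceMatcher

    def search_partition(args):
        """Stand-in for one NFU invocation: score the query against one
        network-stored partition and return that partition's best hits."""
        query, partition = args
        hits = [(name, SequenceMatcher(None, query, seq).ratio())
                for name, seq in partition]
        return sorted(hits, key=lambda h: h[1], reverse=True)[:3]

    def distributed_search(query, depots):
        # Each depot computes over its own partition; the client merges
        # only the small per-depot result lists.
        with ProcessPoolExecutor() as pool:
            partial = pool.map(search_partition, [(query, p) for p in depots])
        merged = [hit for hits in partial for hit in hits]
        return sorted(merged, key=lambda h: h[1], reverse=True)

    if __name__ == "__main__":
        depots = [
            [("seqA", "ACGTACGTGACT"), ("seqB", "TTGACCGTAAGT")],
            [("seqC", "ACGTACGTAACT"), ("seqD", "GGCATTCGATCG")],
        ]
        print(distributed_search("ACGTACGTGACT", depots)[:3])
    ```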

    Integrating Protein Data Resources through Semantic Web Services

    Understanding the function of every protein is a major objective of bioinformatics. Currently, a large amount of information associated with protein function (e.g., sequence, structure, and dynamics) is being produced by experiments and predictions. Integrating these diverse data about protein sequence, structure, dynamics, and other protein features allows further exploration of the relationships between protein sequence, structure, dynamics, and function, and thereby supports control of the function of target proteins. However, information integration across protein data resources faces challenges at the technology level, in interfacing heterogeneous data formats and standards, and at the application level, in the semantic interpretation of dissimilar data and queries. This research proposes a semantic web services infrastructure, called Web Services for Protein data resources (WSP), for flexible and user-oriented integration of protein data resources. The infrastructure includes a method for modeling protein web services, a service publication algorithm, an efficient service discovery (matching) algorithm, and an optimal service chaining algorithm. Rather than relying on syntactic matching, the matching algorithm discovers services based on their similarity to the requested service, so users can locate services that semantically match their data requirements even when they are syntactically distinct. Furthermore, WSP supports a workflow-based approach to service integration: the chaining algorithm selects and chains services based on service accuracy and data interoperability, and generates a web services workflow that automatically integrates the results from the individual services. A number of experiments evaluate the performance of the matching algorithm; the results show that it discovers services with reasonable performance. A composite service that integrates protein dynamics and conservation is also demonstrated using the WSP infrastructure.
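    To make the similarity-based matching concrete, here is a toy sketch in which candidate services are ranked by how close their input and output concepts sit in a small ontology, rather than by exact name equality. The ontology, the scoring weights, and the service records are invented for illustration; WSP's actual service model and matching algorithm are richer.

    ```python
    # Toy semantic matcher: services whose concepts are near the requested
    # concepts in the ontology score higher than syntactic mismatches.

    # Parent links in a tiny protein-data ontology (child -> parent).
    ONTOLOGY = {"Sequence": "ProteinData", "Structure": "ProteinData",
                "Dynamics": "ProteinData", "ProteinData": None}

    def concept_similarity(a, b):
        """1.0 for an exact match, 0.5 for a parent/child pair, else 0."""
        if a == b:
            return 1.0
        if ONTOLOGY.get(a) == b or ONTOLOGY.get(b) == a:
            return 0.5
        return 0.0

    def match_score(request, service):
        # Average similarity over paired input and output concepts.
        pairs = list(zip(request["inputs"], service["inputs"])) + \
                list(zip(request["outputs"], service["outputs"]))
        return sum(concept_similarity(a, b) for a, b in pairs) / len(pairs)

    services = [
        {"name": "seq2struct", "inputs": ["Sequence"], "outputs": ["Structure"]},
        {"name": "seq2data",   "inputs": ["Sequence"], "outputs": ["ProteinData"]},
    ]
    request = {"inputs": ["Sequence"], "outputs": ["Structure"]}
    ranked = sorted(services, key=lambda s: match_score(request, s), reverse=True)
    print([(s["name"], match_score(request, s)) for s in ranked])
    ```

    Note that the second service still receives a partial score because its output concept is an ontological parent of the requested one; this is the sense in which semantic matching can surface services a purely syntactic match would miss.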

    DIET: new developments and recent results

    Among existing grid middleware approaches, one simple, powerful, and flexible approach consists of using servers available in different administrative domains through the classic client-server or Remote Procedure Call (RPC) paradigm. Network Enabled Servers (NES) implement this model, also called GridRPC. Clients submit computation requests to a scheduler whose goal is to find a server available on the grid. The aim of this paper is to give an overview of an NES middleware developed in the GRAAL team called DIET, and to describe recent developments. DIET (Distributed Interactive Engineering Toolbox) is a hierarchical set of components used for the development of applications based on computational servers on the grid.
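    The GridRPC model that DIET implements can be pictured with a minimal sketch: a client hands a computation request to a scheduling agent, which selects an available server and executes the call there. The least-loaded selection rule below is a deliberately naive stand-in for the performance-based scheduling of DIET's agent hierarchy.

    ```python
    # Minimal GridRPC-style pattern: client -> agent (scheduler) -> server.
    class Server:
        def __init__(self, name):
            self.name, self.load = name, 0
        def solve(self, func, *args):
            # Execute the request "remotely", tracking concurrent load.
            self.load += 1
            try:
                return func(*args)
            finally:
                self.load -= 1

    class Agent:
        """Scheduler: routes each request to the least-loaded known server."""
        def __init__(self, servers):
            self.servers = servers
        def submit(self, func, *args):
            server = min(self.servers, key=lambda s: s.load)
            print(f"dispatching {func.__name__} to {server.name}")
            return server.solve(func, *args)

    agent = Agent([Server("node-1"), Server("node-2")])
    result = agent.submit(pow, 2, 16)   # client-side call, server-side compute
    print(result)
    ```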

    Performance Improvement of Distributed Computing Framework and Scientific Big Data Analysis

    Analysis of Big data to gain better insights has been the focus of researchers in the recent past. Traditional desktop computers or database management systems may not be suitable for efficient and timely analysis, owing to the requirement of massive parallel processing. Distributed computing frameworks are being explored as a viable solution; for example, Google proposed MapReduce, which is becoming a de facto computing architecture for Big data solutions. However, scheduling in MapReduce is coarse-grained and remains a challenge for improvement. For the MapReduce scheduler configured over distributed clusters, we identify two issues: disruption of data locality and random assignment of non-local map tasks. We propose a network-aware scheduler that extends the existing rack awareness: tasks are scheduled in the order of node, rack, and any other rack within the same cluster, to achieve cluster-level data locality. The random assignment of non-local map tasks is handled by enhancing the scheduler to consider network parameters, such as delay, bandwidth, and packet loss between remote clusters. As part of Big data analysis in computational biology, we consider two major data-intensive applications: indexing genome sequences and de novo assembly, both of which deal with the massive amounts of data generated by DNA sequencers. We developed a scalable algorithm that constructs sub-trees of a suffix tree in parallel, addressing the huge memory requirements of indexing the human genome. For de novo assembly, we propose the Parallel Giraph-based Assembler (PGA) to address the challenges of assembling large genomes on commodity hardware. PGA uses the de Bruijn graph to represent the data generated by sequencers; its huge memory demands and performance expectations are addressed by parallel algorithms built on the distributed graph-processing framework Apache Giraph.
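    The locality ordering described above can be sketched as a ranking function: a map task prefers the node holding its data, then another node in the same rack, then any node in the same cluster, and only then a remote cluster, with remote candidates ordered by a network cost built from delay, bandwidth, and packet loss. The cost formula and its weights below are assumptions for illustration, not the dissertation's tuned model.

    ```python
    # Sketch of locality-ordered, network-aware placement for a map task.
    def network_cost(link):
        # Lower is better: penalize delay and loss, reward bandwidth.
        return link["delay_ms"] / link["bandwidth_mbps"] * (1 + 10 * link["loss"])

    def placement_rank(task, node):
        if node["name"] in task["data_nodes"]:
            return (0, 0)                         # node-local
        if node["rack"] == task["data_rack"]:
            return (1, 0)                         # rack-local
        if node["cluster"] == task["data_cluster"]:
            return (2, 0)                         # cluster-local
        return (3, network_cost(node["link"]))    # remote, by network cost

    task = {"data_nodes": {"n3"}, "data_rack": "r1", "data_cluster": "c1"}
    nodes = [
        {"name": "n7", "rack": "r2", "cluster": "c1", "link": None},
        {"name": "n9", "rack": "r9", "cluster": "c2",
         "link": {"delay_ms": 40, "bandwidth_mbps": 100, "loss": 0.01}},
        {"name": "n3", "rack": "r1", "cluster": "c1", "link": None},
    ]
    best = min(nodes, key=lambda n: placement_rank(task, n))
    print("schedule on", best["name"])   # -> n3, the node holding the data
    ```

    Using a tuple as the ranking key keeps the locality tiers strictly ordered: network cost only breaks ties among remote clusters and can never promote a remote node above a local one.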

    Performance Optimization and Dynamics Control for Large-scale Data Transfer in Wide-area Networks

    Transport control plays an important role in the performance of large-scale scientific and media streaming applications involving the transfer of large data sets, media streaming, online computational steering, interactive visualization, and remote instrument control. These applications have two distinct classes of transport requirements: large-scale scientific applications require high bandwidth to move bulk data across wide-area networks, while media streaming applications require stable bandwidth to ensure smooth playback. Unfortunately, the widely deployed Transmission Control Protocol is inadequate for such tasks due to its performance limitations. The purpose of this dissertation is to conduct a rigorous analytical study of the design and performance of transport solutions, and to develop an integrated transport solution, in a systematic way, that overcomes the limitations of current transport methods. One primary challenge is to explore and compose a set of feasible route options under multiple constraints. Another arises from the randomness inherent in wide-area networks, particularly the Internet: this randomness must be explicitly accounted for to achieve both goodput maximization and stabilization over the constructed routes, by suitably adjusting the source rate in response to both network and host dynamics. The superior and robust performance of the proposed transport solution is extensively evaluated in a simulated environment and further verified through real-life implementations and deployments, over both Internet and dedicated connections and under disparate network conditions, in comparison with existing transport methods.
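    The rate-adaptation problem can be illustrated with a toy controller: the sender probes its source rate upward, observes noisy goodput feedback, and backs off with a shrinking step whenever a probe hurts, so the rate settles near the hidden bottleneck. Both the bottleneck model and the hill-climbing update below are assumptions for illustration; the dissertation develops a rigorously analyzed controller, not this heuristic.

    ```python
    # Toy source-rate controller reacting to noisy goodput measurements.
    import random

    CAPACITY = 100.0    # hidden bottleneck rate (Mb/s), unknown to the sender

    def measure_goodput(rate):
        # Goodput rises with rate up to capacity, then collapses from loss;
        # Gaussian noise models wide-area network and host dynamics.
        base = rate if rate <= CAPACITY else max(0.0, 2 * CAPACITY - rate)
        return base + random.gauss(0, 2.0)

    rate, step = 20.0, 8.0
    prev = measure_goodput(rate)
    for _ in range(40):
        trial = rate + step
        g = measure_goodput(trial)
        if g > prev:                 # improvement: keep probing upward
            rate, prev = trial, g
        else:                        # worse: back off and shrink the probe
            rate = max(1.0, rate - step)
            step = max(0.5, step * 0.8)
            prev = measure_goodput(rate)
    print(f"converged near rate {rate:.1f} Mb/s (capacity {CAPACITY})")
    ```

    Shrinking the probe step on failures is what trades off the two goals in the abstract: large steps maximize goodput quickly, while small steps near the operating point stabilize it.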

    Modeling Television & Radio Broadcasting System Infrastructure within a Prototype Enterprise Geospatial Information System

    Since the beginning of the modern American Intelligence Community apparatus, understanding how people communicate via their media has been important. Government analysts have needed to locate television and radio infrastructure, and to determine the extent of its effects, in order to learn about the understandings, feelings, and values of other cultures and to support clear and thoughtful communication with other nations. The Open Source Center (OSC) within the Office of the Director of National Intelligence has monitored global media sources for almost seventy years, and has recently begun to locate and analyze the geographic extent and effects of these sources. This report describes the creation of a prototype geographic information system (GIS) that models the television and radio broadcast system infrastructure and broadcast areas of OSC sources. The project reviewed the social and interactive processes of media to learn about the spatial relationships between objects such as stations, towers, broadcast transmitters, and owners within the broadcasting system, and it prototyped a GIS that models these features and relationships in an ESRI-based geodatabase. The prototype used the ArcGIS suite of tools from ESRI to perform database management and analysis, and to implement Web-based Open Source Consortium Web services. The resulting prototype GIS provided OSC a platform from which it could learn and which it could later implement.
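    The feature relationships the geodatabase models can be sketched schematically: owners hold stations, stations broadcast through transmitters mounted on towers, and each transmitter implies a coverage area. The class names and the circular-coverage simplification below are illustrative assumptions; the prototype itself represents these as ESRI geodatabase feature classes and relationship classes.

    ```python
    # Schematic data model for broadcast infrastructure and coverage.
    from dataclasses import dataclass, field

    @dataclass
    class Tower:
        tower_id: str
        lon: float
        lat: float

    @dataclass
    class Transmitter:
        callsign: str
        tower: Tower
        power_kw: float
        def coverage_radius_km(self):
            # Crude placeholder: radius grows with the square root of power.
            return 30.0 * (self.power_kw ** 0.5)

    @dataclass
    class Station:
        name: str
        owner: str
        transmitters: list = field(default_factory=list)

    tower = Tower("T-001", lon=-77.04, lat=38.91)
    station = Station("WXYZ-FM", owner="Example Media Group")
    station.transmitters.append(Transmitter("WXYZ", tower, power_kw=50.0))
    for tx in station.transmitters:
        print(tx.callsign, "covers ~", round(tx.coverage_radius_km()), "km")
    ```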