27 research outputs found

    Using Fine-Grained Cycle Stealing to Improve Throughput, Efficiency and Response Time on a Dedicated Cluster while Maintaining Quality of Service

    For various reasons, a dedicated cluster is not always fully utilized even when all of its processors are allocated to jobs. This occurs any time a running job does not use 100% of each of the processors allocated to it. Keeping in mind the needs of both the cluster’s system administrators and its users, we would like to increase the throughput and efficiency of the cluster while maintaining or improving the average turnaround time of the jobs and the quality of service of the “primary” jobs originally scheduled on the cluster. To increase the throughput and efficiency of the cluster, we schedule background jobs to run concurrently with the primary jobs. To maintain or improve the average turnaround time of the jobs and the quality of service of the primary jobs, we investigate two methods of prioritizing the CPU usage of the primary and background jobs. The first method uses the existing “nice” mechanism in the 2.4 Linux kernel to give background processes a lower priority than primary processes. The second method modifies the 2.4 Linux kernel’s CPU scheduler to create a new guest process priority that prevents guest processes from running whenever primary processes are runnable. Our results come from empirical investigations using real production applications; production runs of these applications are regularly performed in the dedicated cluster environment we used for testing. Measurements of statistics such as wall time and CPU time were taken directly from test runs of these production applications, which allowed comparison with results from models and synthetic applications. We found that the existing nice mechanism significantly improves the throughput, efficiency, and average turnaround time of the cluster, but only at the expense of the quality of service of the primary jobs (primary job running times increased 5-25%). The guest process priority, in contrast, yields similar improvements in throughput, efficiency, and average turnaround time without significantly impacting the quality of service of the primary jobs (primary job running times changed by less than 1%).
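
    The first method can be approximated from user space; the guest-priority method required patching the 2.4 scheduler and has no unprivileged equivalent. Below is a minimal Python sketch of launching a guest job at the lowest nice level (the script name is a hypothetical placeholder):

```python
import os
import subprocess

def launch_guest(cmd):
    """Start a background ("guest") job at the lowest nice level (19).

    This approximates only the paper's first method: under the stock
    scheduler a nice-19 process still receives a small CPU share while
    primary processes are runnable. The guest-priority method required
    a patch to the 2.4 kernel scheduler and cannot be reproduced from
    user space.
    """
    return subprocess.Popen(
        cmd,
        preexec_fn=lambda: os.nice(19),  # lower priority in the child only
    )

if __name__ == "__main__":
    # Hypothetical guest workload; substitute any real background job.
    guest = launch_guest(["python3", "background_analysis.py"])
    guest.wait()
```

    Because a nice-19 process still gets a small CPU share while primary processes are runnable, this user-space approach carries exactly the 5-25% quality-of-service penalty the abstract reports for the first method.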

    Genome modeling system: A knowledge management platform for genomics

    In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome, and transcriptome sequencing data from a breast cancer cell line (HCC1395) and a matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials, and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.
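
    The “reliable and repeatable” pipelines described above imply that re-running the same inputs through the same processing profile should never silently produce a different result. The sketch below illustrates that bookkeeping idea in hypothetical form; it is not the actual GMS API (the real system tracks models, processing profiles, and results in PostgreSQL through its own command-line tooling):

```python
import hashlib
import json
from pathlib import Path

# Hypothetical illustration of repeatable-pipeline bookkeeping, not the
# GMS implementation.
RESULTS = Path("results")

def result_key(inputs: dict, processing_profile: dict) -> str:
    """Deterministic key: identical inputs and profile give the same key."""
    payload = json.dumps({"inputs": inputs, "profile": processing_profile},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]

def run_pipeline(inputs, processing_profile, step):
    """Run `step` once per unique (inputs, profile) pair, reuse thereafter."""
    out = RESULTS / result_key(inputs, processing_profile)
    if out.exists():                       # already computed: reuse rather
        return out                         # than silently recompute
    out.mkdir(parents=True)
    step(inputs, processing_profile, out)  # the actual analysis pipeline
    return out
```

    Keying results on both the inputs and the processing profile is what lets a team share analyses safely: changing either produces a new key rather than overwriting an existing result.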

    CMS physics technical design report : Addendum on high density QCD with heavy ions

    Peer reviewed

    Improving cluster utilization through intelligent processor sharing

    A dedicated cluster is often not fully utilized even when all of its processors are allocated to jobs. This occurs any time a running job does not use 100% of each of the processors allocated to it. We increase the throughput and efficiency of the cluster by scheduling background jobs to run concurrently with the “primary” jobs originally scheduled on the cluster. We do this while maintaining the quality of service provided to the primary jobs. Our results come from empirical measurements using production applications.

    Solving hub arc location problems on a cluster of workstations

    Many modern transportation systems rely on a network of hub facilities to concentrate flows of freight or passengers and exploit economies of scale in transportation. The design of a hub network, including the location of the hub facilities, is a key determinant of the cost and competitiveness of a transportation and logistics system. This paper reports on a parallel implementation of an algorithm for the hub arc location model used to design such a network. Computational work was performed on a cluster of workstations with data for air passenger traffic in the United States and postal operations in Sydney, Australia.
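
    The abstract does not give implementation details, but the natural master-worker decomposition is to score candidate hub-arc sets independently across processors. Below is a toy sketch under that assumption, using Python’s multiprocessing on a single machine in place of a workstation cluster; flow_cost is a hypothetical stand-in for the discounted-flow objective of the hub arc location model:

```python
from itertools import combinations
from multiprocessing import Pool

def flow_cost(arc_set, flows, discount=0.7):
    """Toy objective: flow routed on a chosen hub arc is discounted,
    all other origin-destination flow pays full cost."""
    return sum(w * (discount if arc in arc_set else 1.0)
               for arc, w in flows.items())

def best_hub_arcs(nodes, flows, q=2, workers=4):
    """Exhaustively score every set of q candidate hub arcs in parallel."""
    arcs = list(combinations(sorted(nodes), 2))
    candidates = [frozenset(c) for c in combinations(arcs, q)]
    with Pool(workers) as pool:            # one task per candidate set
        costs = pool.starmap(flow_cost, ((c, flows) for c in candidates))
    return min(zip(costs, candidates), key=lambda pair: pair[0])

if __name__ == "__main__":
    # Hypothetical demand data between three cities.
    flows = {("ORD", "STL"): 120.0, ("ORD", "SYD"): 80.0,
             ("STL", "SYD"): 45.0}
    print(best_hub_arcs({"ORD", "STL", "SYD"}, flows, q=1))
```

    The same decomposition maps naturally onto a cluster of workstations, with each machine scoring its own slice of the candidate list.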

    Circos plot of HCC1395 tumor/normal comparison.

    Circos is a popular tool for summarizing genomic events in a tumor genome, and this plot is one of many automatically generated visualizations made possible by the GMS. In this example, the WGS, exome, and RNA-seq data for HCC1395 are displayed in several tracks, along with additional visualizations illustrating individual events. Moving inwards, SNVs and Indels are plotted on the outermost track, then highly expressed genes, CNVs, and finally chromosomal translocations at the center. For events predicted to affect protein-coding genes, additional plots are auto-generated to display the mutation position relative to protein domains and previously reported mutations from the COSMIC database, as illustrated in the topmost plot. Moving clockwise, a screenshot of IGV demonstrates one of the somatic deletions identified; IGV XML sessions are automatically generated to allow rapid manual review of all predicted events. Next, a histogram illustrates the expression of a single highly expressed gene relative to the distribution of expression for all genes. Then, a CNV plot is shown for an amplified portion of one chromosome. Finally, the coverage and supporting reads for a chromosomal translocation are depicted.

    HCC1395 (“TST1”) example input, models, and outputs.

    A test dataset for the HCC1395 cell line is provided with the GMS software to allow testing of the software installation and to facilitate further development; it is also used to illustrate much of the current functionality of the GMS. DNA and RNA samples from the HCC1395 tumor and the corresponding HCC1395BL ‘normal’ cell line were sequenced by whole genome, exome, and RNA-seq methods, producing six sets of instrument data for input to the various GMS pipelines. Additional required inputs for the pipelines include a reference genome (e.g., GRCh37), gene annotations (e.g., Ensembl 67_37l), and variant databases (e.g., dbSNP37). Different versions (processing profiles) of the reference alignment pipeline were used to align WGS and exome DNA reads to the reference genome; a separate RNA-seq pipeline similarly aligns RNA reads. Alternate versions of the somatic variation pipeline are used to call various types of variants from exome and WGS data by comparing tumor and normal reference alignments. A differential expression pipeline identifies significantly altered transcript expression levels by comparing the tumor and normal RNA-seq alignments. Finally, the MedSeq pipeline summarizes all upstream pipelines into a single convenient result set, including a multitude of reports and visualizations for single nucleotide variants (SNVs), Indels (insertions and deletions), SVs (structural variants), CNVs (copy number variations), transcript fusions, differentially expressed genes, alternatively expressed isoforms, and much more. Data types are further integrated to, for example, identify which variants at the DNA level are expressed at the RNA level and which events affect known cancer driver genes or druggable targets. The overall dataflow is sketched below.
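
    The dataflow in this caption is a small DAG: three alignment pipelines feed somatic variation and differential expression, which in turn feed MedSeq. The sketch below captures just that ordering; stage names paraphrase the caption, the tumor/normal pairing is collapsed for brevity, and the real GMS delegates scheduling and monitoring to its workflow engine:

```python
# Dependency graph distilled from the caption; execution order is a
# plain topological sort.
from graphlib import TopologicalSorter  # standard library, Python 3.9+

PIPELINES = {  # stage -> prerequisite stages
    "ref_align_wgs":           {"instrument_data", "reference_genome"},
    "ref_align_exome":         {"instrument_data", "reference_genome"},
    "rna_seq_align":           {"instrument_data", "reference_genome"},
    "somatic_variation_wgs":   {"ref_align_wgs"},
    "somatic_variation_exome": {"ref_align_exome"},
    "differential_expression": {"rna_seq_align"},
    "medseq":                  {"somatic_variation_wgs",
                                "somatic_variation_exome",
                                "differential_expression"},
}

for stage in TopologicalSorter(PIPELINES).static_order():
    print(stage)  # inputs come first, the MedSeq summary last
```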

    Overview of the GMS.

    The Genome Modeling System (GMS) is implemented to use a federated disk SAN, with metadata stored in a PostgreSQL relational database. Sample management tools allow the import of new samples and instrument data. Data are then processed through various analysis pipelines (e.g., reference alignment, somatic variation detection) that in turn are managed and monitored by a workflow system (Box 1: http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004274#box001). Stand-alone GMS tools, not part of the automated pipelines, are available through a common tool tree. Most components of the system can be accessed through an Ubuntu Linux command-line interface or a Ruby-on-Rails web interface.

    Somatic variation processing profile and workflow.

    To illustrate key GMS concepts, the processing profiles and workflow for the somatic variation pipeline are shown. Abbreviations: copy number variant (CNV), copy number amplification (CNA), Genome Analysis Toolkit (GATK), insertion/deletion (Indel), loss of heterozygosity (LOH), mapping quality (MQ), single nucleotide variant (SNV), structural variant (SV), variant allele frequency (VAF).