    Enabling and Performance Benchmarking of a Next-generation Sequencing Data Analysis Pipeline

    The development of Next Generation Sequencing (NGS) technology has led to the rapid accumulation of large amounts of sequencing data that require data mining. A variety of variant-calling software packages and pipelines have emerged. The Genome Analysis Toolkit (GATK) and its Best Practices quickly became the industry gold standard for variant calling because of their speed, high accuracy, and throughput. GATK is updated continually. The latest and most capable version is GATK4, which enables parallelization and cloud infrastructure optimization via Apache Spark. The Broad Institute currently cooperates with many cloud providers to deploy the GATK Best Practices on cloud platforms. However, no benchmarking data have been released for GATK4, and there is no cooperation with the CSC (CSC – IT Center for Science Ltd) cPouta IaaS (Infrastructure as a Service) cloud. We optimized the WDL (Workflow Description Language) script of the germline SNP and indel short-variant discovery workflow from the Best Practices and ran it with the Cromwell execution engine on a cPouta virtual machine featuring a 24-core Intel(R) Xeon(R) CPU E5-2680 v3 with hyper-threading. In addition, we benchmarked execution times for five separate pipelines of this workflow with three 30X WGS (Whole Genome Sequencing) datasets (NA12878, NA12891, and NA12892) and explored optimized runtime parameters for GATK4 tools, PairHMM thread scalability in HaplotypeCaller, GATK4 thread scalability for PGC in MarkDuplicates, and execution-time comparisons of GATK4 SortSam vs. SortSamSpark and MarkDuplicates vs. MarkDuplicatesSpark. We found that the real execution time for similar WGS datasets of different sizes and features was consistent, and that execution time and dataset size were roughly positively correlated. The optimal thread count for GATK4 HaplotypeCaller in ERC mode is 12, giving a 12.4% speed-up. The optimal PGC thread count for GATK4 MarkDuplicates is 2. Multi-threading with the Spark local runner greatly sped up GATK4 tool execution: SortSamSpark with 16 local cores gave a speed-up of 83.6%, and MarkDuplicatesSpark with 16 local cores gave speed-ups of 22.2% and 37.3% with and without writing a metrics file, respectively. With a detailed virtual machine setup, optimized parameters, and GATK4 performance benchmarking data, this thesis serves as a guide for implementing the GATK4 Best Practices germline SNP and indel short-variant discovery workflow on the CSC cPouta cloud platform.

    Increasing Compression Ratio of Low Complexity Compressive Sensing Video Encoder with Application-Aware Configurable Mechanism

    With the development of embedded video acquisition nodes and wireless video surveillance systems, traditional video coding methods can no longer meet the demands for low computational complexity and low power consumption. This paper therefore proposes a low-complexity compressive sensing video encoder framework with an application-aware configurable mechanism, in which novel encoding methods are chosen according to the practical purpose of the application in order to reduce coding complexity and improve the compression ratio (CR). Moreover, the group of processing (GOP) size and the measurement matrix size can be configured on the encoder side according to the post-analysis requirements of an example application, object tracking, to increase the encoder's CR as far as possible. Simulations show that the proposed encoder framework can achieve a CR of 60X while the tracking success rate (SR) remains above 90%. Comment: 5 pages with 6 figures and 1 table, conferenc

    Thermal Bremsstrahlung Radiation in a Two-Temperature Plasma

    In a normal one-temperature plasma, the motion of ions is usually neglected when calculating the plasma's Bremsstrahlung radiation. Here we calculate the Bremsstrahlung radiation of a two-temperature plasma, taking the motion of ions into account. Our results show that the total radiation power is always lower when the motion of ions is considered. We also apply the two-temperature Bremsstrahlung radiation mechanism to an analytical Advection-Dominated Accretion Flow (ADAF) model and find that the two-temperature correction to the total Bremsstrahlung radiation for the ADAF is negligible. Comment: 5 pages, 4 figures, accepted for publication in CHJAA. Some discussions and references adde

    High-Field Shubnikov-de Haas Oscillations in the Topological Insulator Bi$_2$Te$_2$Se

    We report measurements of the surface Shubnikov-de Haas (SdH) oscillations on crystals of the topological insulator Bi$_2$Te$_2$Se. In crystals with large bulk resistivity ($\sim 4\ \Omega$cm at 4 K), we observe $\sim 15$ surface SdH oscillations (to the $n = 1$ Landau level) in magnetic fields $B$ up to 45 Tesla. Extrapolating to the limit $1/B \to 0$, we confirm the $\frac{1}{2}$-shift expected from a Dirac spectrum. The results are consistent with a very small surface Landé $g$-factor. Comment: Text expanded, slight changes in text, final version; Total 6 pages, 6 figure
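The extrapolation to $1/B \to 0$ mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common index-plot convention $n = F/B_n - \gamma$, where $F$ is the oscillation frequency and $\gamma = \frac{1}{2}$ for a Dirac spectrum; the abstract does not state which convention the authors use. The data below are synthetic, and the frequency `F_HYPOTHETICAL` and the helper `fit_index_plot` are illustrative names, not values or code from the paper.

```python
def fit_index_plot(inv_b, n):
    """Ordinary least-squares line n = slope * (1/B) + intercept."""
    m = len(n)
    mean_x = sum(inv_b) / m
    mean_y = sum(n) / m
    sxx = sum((x - mean_x) ** 2 for x in inv_b)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inv_b, n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic index plot (assumption: not measured data).
F_HYPOTHETICAL = 60.0   # oscillation frequency in tesla, chosen for illustration
GAMMA_DIRAC = 0.5       # the 1/2-shift expected for a Dirac spectrum

indices = list(range(1, 16))  # Landau level indices n = 1 .. 15
# Invert n = F/B_n - gamma to get the 1/B position of each level.
inv_fields = [(k + GAMMA_DIRAC) / F_HYPOTHETICAL for k in indices]

slope, intercept = fit_index_plot(inv_fields, indices)

# The slope recovers F; the intercept at 1/B -> 0 recovers -gamma.
print(f"fitted frequency F = {slope:.3f} T")
print(f"intercept at 1/B -> 0 = {intercept:.3f}")
```

Under this convention, the intercept of the fitted line at $1/B = 0$ gives $-\gamma$, so an intercept near $-\frac{1}{2}$ (rather than 0) is the signature of the Dirac shift. With real data the intercept carries a fitting uncertainty, which is why reaching low Landau indices at high field matters for the extrapolation.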