Enabling and Performance Benchmarking of a Next-generation Sequencing Data Analysis Pipeline
The development of Next Generation Sequencing (NGS) technology has resulted in the rapid accumulation of large amounts of sequencing data that demand data mining, and a variety of variant calling software tools and pipelines have emerged. The Genome Analysis Toolkit (GATK) and its Best Practices quickly became the industry gold standard for variant calling because of their speed, high accuracy and throughput. GATK is continually updated; the latest and most capable version, GATK4, enables parallelization and cloud infrastructure optimization via Apache Spark. The Broad Institute currently cooperates with many cloud providers to deploy the GATK Best Practices on their cloud platforms. However, no benchmarking data have been released for GATK4, and there is no such cooperation with the cPouta IaaS (Infrastructure as a Service) cloud of CSC (CSC – IT Center for Science Ltd.).
We optimized the WDL (Workflow Description Language) script of the germline SNPs and Indels short variant discovery workflow from the Best Practices. We ran it with the Cromwell execution engine on a cPouta virtual machine with a 24-core Intel(R) Xeon(R) CPU E5-2680 v3 with hyper-threading. In addition, we benchmarked the execution times of the five separate pipelines of this workflow on three 30X WGS (Whole Genome Sequencing) datasets: NA12878, NA12891 and NA12892. We also explored optimized runtime parameters for GATK4 tools, PairHMM thread scalability in HaplotypeCaller, and GATK4 thread scalability for PGC in MarkDuplicates. Finally, we compared the execution times of GATK4 SortSam vs. SortSamSpark and MarkDuplicates vs. MarkDuplicatesSpark.
We found that real execution times for similar WGS datasets of different sizes and features were consistent, and that execution time was roughly positively correlated with dataset size. The optimal number of threads for GATK4 HaplotypeCaller in ERC mode is 12, giving a 12.4% speed-up. The optimal number of PGC threads for GATK4 MarkDuplicates is 2. Multi-threading with the Spark local runner greatly sped up GATK4 tool execution. SortSamSpark with 16 local cores gave a speed-up of 83.6%. MarkDuplicatesSpark with 16 local cores gave speed-ups of 22.2% and 37.3% with and without writing the metrics file, respectively.
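The speed-up figures above compare wall-clock times. This is a minimal sketch of how a thread-scalability sweep can be summarized. It assumes that speed-up means the relative reduction in wall-clock time; the thesis may define it differently, for example as a time ratio. The timings in the usage block are hypothetical placeholders, not measurements from the thesis.

```python
"""Sketch: percentage speed-up and optimal thread count from wall-clock timings.

Assumption (not from the thesis): speed-up is the relative reduction in
wall-clock time, (t_baseline - t_new) / t_baseline * 100.
"""


def speedup_percent(t_baseline: float, t_new: float) -> float:
    """Relative reduction in execution time, in percent."""
    if t_baseline <= 0:
        raise ValueError("baseline time must be positive")
    return (t_baseline - t_new) / t_baseline * 100.0


def best_thread_count(timings: dict[int, float]) -> tuple[int, float]:
    """Return (threads, speed-up %) for the fastest run in a sweep.

    `timings` maps thread count -> wall-clock seconds; the run with the
    smallest thread count serves as the baseline.
    """
    baseline = timings[min(timings)]
    fastest = min(timings, key=timings.get)
    return fastest, speedup_percent(baseline, timings[fastest])


if __name__ == "__main__":
    # Hypothetical timings in seconds, NOT measurements from the thesis.
    sweep = {1: 1000.0, 4: 950.0, 12: 880.0, 24: 900.0}
    threads, gain = best_thread_count(sweep)
    print(f"fastest: {threads} threads, {gain:.1f}% faster than 1 thread")
```

Choosing the fastest run, rather than the highest thread count, reflects the finding above that more threads do not always help.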
With a detailed virtual machine setup, optimized parameters and GATK4 performance benchmarking data, this thesis serves as a guide for implementing the GATK4 Best Practices germline SNPs and Indels short variant discovery workflow on the CSC cPouta cloud platform.
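To make the tuned values concrete, the following sketch only assembles GATK4 command lines. It does not run them. The values it uses are:

- 12 PairHMM threads for HaplotypeCaller in GVCF (ERC) mode.
- 2 garbage-collection threads for MarkDuplicates.
- A 16-core local Spark master for MarkDuplicatesSpark.

File names are placeholders. Reading "PGC" as the JVM option `-XX:ParallelGCThreads` is an assumption, and the optimized WDL in the thesis may pass these settings differently.

```python
"""Sketch: GATK4 command lines using the tuned parameters reported above.

Assumptions: file names are placeholders; "PGC" is read as the JVM flag
-XX:ParallelGCThreads. Commands are only built and printed, never executed.
"""
import shlex


def haplotypecaller_cmd(ref: str, bam: str, out_gvcf: str,
                        pairhmm_threads: int = 12) -> list[str]:
    # -ERC GVCF runs HaplotypeCaller in reference-confidence (ERC) mode.
    return ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", out_gvcf,
            "-ERC", "GVCF", "--native-pair-hmm-threads", str(pairhmm_threads)]


def markduplicates_cmd(bam: str, out_bam: str, metrics: str,
                       gc_threads: int = 2) -> list[str]:
    # --java-options forwards JVM flags through the gatk wrapper script.
    return ["gatk", "--java-options", f"-XX:ParallelGCThreads={gc_threads}",
            "MarkDuplicates", "-I", bam, "-O", out_bam, "-M", metrics]


def markduplicates_spark_cmd(bam: str, out_bam: str,
                             cores: int = 16) -> list[str]:
    # local[N] runs the Spark tool on N local cores of the VM.
    return ["gatk", "MarkDuplicatesSpark", "-I", bam, "-O", out_bam,
            "--spark-master", f"local[{cores}]"]


if __name__ == "__main__":
    for cmd in (haplotypecaller_cmd("ref.fasta", "sample.bam", "sample.g.vcf.gz"),
                markduplicates_cmd("sample.bam", "marked.bam", "metrics.txt"),
                markduplicates_spark_cmd("sample.bam", "marked.bam")):
        # shlex.join quotes arguments such as local[16] for a POSIX shell.
        print(shlex.join(cmd))
```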
Increasing Compression Ratio of Low Complexity Compressive Sensing Video Encoder with Application-Aware Configurable Mechanism
With the development of embedded video acquisition nodes and wireless video
surveillance systems, traditional video coding methods can no longer meet the
demand for low computational complexity or the pressing need for low power
consumption. Therefore, this paper proposes a low-complexity compressive sensing
video encoder framework with an application-aware configurable mechanism, in which
novel encoding methods are exploited based on the practical purposes of real
applications to reduce coding complexity effectively and improve the
compression ratio (CR). Moreover, the group of processing (GOP) size and the
measurement matrix size can be configured on the encoder side according to the
post-analysis requirements of an example application, object tracking, to
maximize the CR of the encoder. Simulations show that the proposed encoder
framework can achieve a CR of 60X while keeping the tracking success rate
(SR) above 90%.
Comment: 5 pages with 6 figures and 1 table, conference
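A minimal sketch of why configurable GOP and measurement-matrix sizes change the CR. It relies on a hypothetical scheme that is not the paper's encoder:

- Each flattened N-pixel block is measured as y = Φx with a random ±1 (Bernoulli) M × N matrix.
- Within a GOP, one key frame keeps `m_key` measurements per block and each remaining frame keeps `m_nonkey`.

The names `m_key`, `m_nonkey` and `gop_compression_ratio` are illustrative only.

```python
"""Sketch: block compressive-sensing measurement and GOP-level compression ratio.

Hypothetical scheme, not the paper's encoder: each N-pixel block is measured
as y = Phi x with a random +/-1 (Bernoulli) M x N matrix; a key frame keeps
m_key measurements per block and every other frame in the GOP keeps m_nonkey.
"""
import random


def bernoulli_matrix(m: int, n: int, seed: int = 0) -> list[list[float]]:
    """Random M x N measurement matrix with +/-1 entries (M < N compresses)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(m)]


def measure(phi: list[list[float]], block: list[float]) -> list[float]:
    """Compressive measurement y = Phi x for one flattened pixel block."""
    return [sum(p * v for p, v in zip(row, block)) for row in phi]


def gop_compression_ratio(n: int, m_key: int, m_nonkey: int, gop_size: int) -> float:
    """Pixels at one block position over the GOP divided by measurements kept."""
    pixels = gop_size * n
    measurements = m_key + (gop_size - 1) * m_nonkey
    return pixels / measurements


if __name__ == "__main__":
    n = 16 * 16  # one 16x16 block, flattened (hypothetical block size)
    block = [random.random() for _ in range(n)]
    y = measure(bernoulli_matrix(32, n), block)
    print(len(y), "measurements for", n, "pixels")
    # With m_nonkey < m_key, larger GOPs amortize the denser key frame.
    for gop in (2, 4, 8):
        print(gop, round(gop_compression_ratio(n, 64, 16, gop), 1))
```

Under these assumptions, shrinking the measurement matrix or lengthening the GOP raises the CR. An application-aware encoder would stop where the downstream task, here tracking, starts to degrade.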
Thermal Bremsstrahlung Radiation in a Two-Temperature Plasma
In the usual one-temperature plasma, the motion of ions is neglected
when calculating the Bremsstrahlung radiation of the plasma. Here we calculate
the Bremsstrahlung radiation of a two-temperature plasma, taking into account
the motion of ions. Our results show that the total radiation power is
always lower when the motion of ions is considered. We also apply the
two-temperature Bremsstrahlung radiation mechanism to an analytical
Advection-Dominated Accretion Flow (ADAF) model and find that the two-temperature
correction to the total Bremsstrahlung radiation for ADAF is negligible.
Comment: 5 pages, 4 figures, accepted for publication in CHJAA. Some
discussions and references added
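For reference, the following is the standard textbook baseline that the two-temperature calculation corrects, not the paper's own formula. The one-temperature result treats ions as stationary scatterers, so the total electron-ion free-free emissivity depends only on the electron temperature T. The expression is in cgs units, with ion charge Z, number densities n_e and n_i, and frequency-averaged Gaunt factor ḡ_B:

```latex
\varepsilon^{\mathrm{ff}}
  = \left(\frac{2\pi k T}{3 m_e}\right)^{1/2}
    \frac{2^{5}\pi e^{6}}{3 h m_e c^{3}}\, Z^{2} n_e n_i\, \bar{g}_B
  \approx 1.4\times10^{-27}\, T^{1/2}\, n_e n_i Z^{2}\, \bar{g}_B
  \quad \mathrm{erg\,s^{-1}\,cm^{-3}}
```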
High-Field Shubnikov-de Haas Oscillations in the Topological Insulator Bi₂Te₂Se
We report measurements of the surface Shubnikov-de Haas oscillations (SdH) on
crystals of the topological insulator Bi₂Te₂Se. In crystals with large
bulk resistivity (4 Ω cm at 4 K), we observe 15 surface SdH
oscillations (to the ν = 1 Landau level) in magnetic fields up to 45
tesla. Extrapolating to the limit 1/B → 0, we confirm the 1/2-shift
expected from a Dirac spectrum. The results are consistent with a very small
surface Landé g-factor.
Comment: Text expanded, slight changes in text, final version; Total 6 pages,
6 figures
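The 1/B → 0 extrapolation is a Landau-level fan diagram. The index n of each oscillation extremum is plotted against 1/B and fitted with a straight line: the slope is the oscillation frequency F, and the intercept is the phase offset. A Berry phase of π from a Dirac spectrum shifts that intercept by 1/2 relative to a conventional parabolic band.

The least-squares sketch below assumes the form n = F/B + β. Papers differ in sign convention and in whether integer indices sit at resistance minima or maxima. The data are synthetic, not the measured values.

```python
"""Sketch: Landau-level fan diagram fit, n = F * (1/B) + beta.

Assumed convention (papers differ): integer indices n are assigned to one
kind of oscillation extremum, the slope F is the SdH frequency in tesla and
the 1/B -> 0 intercept beta is the phase offset compared against 1/2.
"""


def fan_fit(inv_b: list[float], index: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line through (1/B, n); returns (F, beta)."""
    n = len(inv_b)
    if n < 2 or n != len(index):
        raise ValueError("need at least two matching (1/B, n) points")
    mean_x = sum(inv_b) / n
    mean_y = sum(index) / n
    sxx = sum((x - mean_x) ** 2 for x in inv_b)
    if sxx == 0:
        raise ValueError("1/B values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inv_b, index))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Synthetic, noise-free data (hypothetical): F = 20 T, offset 1/2,
    # Landau indices 1..15 placed at 1/B = (n - beta) / F.
    f_true, beta_true = 20.0, 0.5
    indices = [float(k) for k in range(1, 16)]
    inv_b = [(k - beta_true) / f_true for k in indices]
    f_fit, beta_fit = fan_fit(inv_b, indices)
    print(f"F = {f_fit:.2f} T, intercept = {beta_fit:.3f}")
```

With real data, the intercept's uncertainty grows with the distance of the lowest measured index from 1/B = 0. This is why reaching the ν = 1 level at high field helps the extrapolation.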