60 research outputs found
GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species
Whole-genome sequencing projects covering millions of subjects produce enormous genotype datasets, imposing a heavy memory and computation-time burden. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods at accessing and managing compressed large-scale genotypes while maintaining a competitive compression ratio. We also show that conventional analyses would be substantially sped up if built on GBC to access the genotypes of a large population. GBC's data structure and algorithms are valuable for accelerating large-scale genomic research.
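To make the idea of an addressable byte encoding concrete, here is a minimal sketch. It is not GBC's actual format, which the abstract does not describe; the 2-bit genotype codes, the four-per-byte layout, and the function names pack_genotypes and get_genotype are all assumptions chosen for illustration. It shows how a fixed-width encoding lets one sample's genotype be read without decoding its neighbours.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes (integers 0..3) four per byte.

    Hypothetical layout: genotype i occupies bits 2*(i % 4) and up
    of byte i // 4. This is an illustration, not GBC's format.
    """
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if not 0 <= g <= 3:
            raise ValueError("genotype code must be in 0..3")
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def get_genotype(packed, i):
    """Read genotype i directly from its byte, without unpacking the rest."""
    return (packed[i // 4] >> (2 * (i % 4))) & 0b11


codes = [0, 1, 2, 3, 1]
block = pack_genotypes(codes)
restored = [get_genotype(block, i) for i in range(len(codes))]
```

Because every genotype sits at a computable byte and bit offset, random access costs a constant number of operations regardless of block size.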
Emergent electric field control of phase transformation in oxide superlattices.
Electric fields can transform materials with respect to their structure and properties, enabling various applications ranging from batteries to spintronics. Recently electrolytic gating, which can generate large electric fields and voltage-driven ion transfer, has been identified as a powerful means to achieve electric-field-controlled phase transformations. The class of transition metal oxides provides many potential candidates that present a strong response under electrolytic gating. However, very few show a reversible structural transformation at room temperature. Here, we report the realization of a digitally synthesized transition metal oxide that shows a reversible, electric-field-controlled transformation between distinct crystalline phases at room temperature. In superlattices composed of alternating one-unit-cell layers of SrIrO3 and La0.2Sr0.8MnO3, we find a reversible phase transformation with a 7% lattice change and dramatic modulation in chemical, electronic, magnetic and optical properties, mediated by the reversible transfer of oxygen and hydrogen ions. Strikingly, this phase transformation is absent in the constituent oxides, solid solutions and larger period superlattices. Our findings open up this class of materials for voltage-controlled functionality.
XCloud-VIP: Virtual Peak Enables Highly Accelerated NMR Spectroscopy and Faithful Quantitative Measures
Background: Nuclear Magnetic Resonance (NMR) spectroscopy is an important
bio-engineering tool for determining metabolic concentrations, molecular
structures, and so on. The data acquisition time, however, is very long in
multi-dimensional NMR. Non-uniform sampling is an effective way to accelerate
data acquisition, but it may suffer severe spectral distortions and
unfaithful quantitative measures when the acceleration factor is high.
Objective: To reconstruct high-fidelity spectra from highly accelerated NMR and
achieve much better quantitative measures. Methods: A virtual peak (VIP)
approach is proposed to self-learn the prior spectral information, such as the
central frequency and peak lineshape, and then feed this information into the
reconstruction. The proposed method is further implemented with cloud computing
to facilitate online, open, and easy access. Results: Results on synthetic and
experimental data demonstrate that, compared with the state-of-the-art method,
the new approach provides much better reconstruction of low-intensity peaks and
significantly improves the quantitative measures, including the regression of
peak intensity, the distances between nuclear pairs, and concentrations of
metabolites in mixtures. Conclusion: Self-learning prior peak information can
improve the reconstruction and quantitative measures of spectra. Significance:
This approach enables highly accelerated NMR and may promote time-consuming
applications such as quantitative and time-resolved NMR experiments
Crystal structure of the N domain of Lon protease from Mycobacterium avium complex.
Lon protease is evolutionarily conserved in prokaryotes and eukaryotic organelles. The primary function of Lon is to selectively degrade abnormal and certain regulatory proteins to maintain homeostasis in vivo. Lon mainly consists of three functional domains, and the N-terminal domain is required for substrate selection and recognition. However, the precise contribution of the N-terminal domain remains elusive. Here, we determined the crystal structure of the N-terminal 192-residue construct of Lon protease from Mycobacterium avium complex at 2.4 Å resolution, and measured NMR-relaxation parameters of the backbone. This structure consists of two subdomains, the β-strand rich N-terminal subdomain and the five-helix bundle of the C-terminal subdomain, connected by a flexible linker, and is similar to the overall structure of the N domain of Escherichia coli Lon even though their sequence identity is only 26%. The obtained NMR-relaxation parameters reveal two stabilized loops involved in the structural packing of the compact N domain and a turn structure formation. The homology comparison suggests that structural and sequence variations in the N domain may be closely related to the substrate selectivity of Lon variants. Our results provide the structural and dynamic characterization of a new Lon N domain, and will help to define the precise contribution of the Lon N-terminal domain to substrate recognition.
The effect of turbulent intermittency on the deflagration to detonation transition in SN Ia explosions
We examine the effects of turbulent intermittency on the deflagration to
detonation transition (DDT) in Type Ia supernovae. The Zel'dovich mechanism for
DDT requires the formation of a nearly isothermal region of mixed ash and fuel
that is larger than a critical size. We primarily consider the hypothesis by
Khokhlov et al. and Niemeyer and Woosley that the nearly isothermal, mixed
region is produced when the flame makes the transition to the distributed
regime. We use two models for the distribution of the turbulent velocity
fluctuations to estimate the probability as a function of the density in the
exploding white dwarf that a given region of critical size is in the
distributed regime due to strong local turbulent stretching of the flame
structure. We also estimate lower limits on the number of such regions as a
function of density. We find that the distributed regime, and hence perhaps
DDT, occurs in a local region of critical size at a density at least a factor
of 2-3 larger than predicted for mean conditions that neglect intermittency.
This factor brings the transition density to be much larger than the empirical
value from observations in most situations. We also consider the intermittency
effect on the more stringent conditions for DDT by Lisewski et al. and Woosley.
We find that a turbulent velocity of cm/s in a region of size cm,
required by Lisewski et al., is rare. We expect intermittency to have a
weaker effect on the Woosley model, whose criterion is stronger. The predicted
transition density from this criterion remains below g/cm after
accounting for intermittency using our intermittency models.
Comment: 31 pages, accepted for publication in Ap
Genetic variability and population divergence of Rhododendron platypodum Diels in China in the context of conservation
Genetic diversity in endangered species is of special significance in the face of escalating global climate change and alarming biodiversity declines. Rhododendron platypodum Diels, an endangered species endemic to China, is distinguished by its restricted geographical range. This study aimed to explore genetic diversity and differentiation among its populations, gathering samples from all four distribution sites: Jinfo Mountain (JFM), Zhaoyun Mountain (ZYM), Baima Mountain (BMM), and Mao’er Mountain (MEM). We employed 18 pairs of Simple Sequence Repeat (SSR) primers to ascertain the genetic diversity and structural characteristics of these samples and further utilized 19 phenotypic data points to corroborate the differentiation observed among the populations. These primers detected 52 alleles, with the average number of observed alleles (Na) being 2.89, the average number of effective alleles (Ne) being 2.12, the average observed heterozygosity (Ho) being 0.57, and the expected heterozygosity (He) being 0.50. This array of data demonstrates the efficacy of the primers in reflecting R. platypodum’s genetic diversity. SSR-based genetic analysis of the populations yielded Ho, He, and Shannon index (I) values ranging from 0.47 to 0.65, 0.36 to 0.46, and 0.53 to 0.69, respectively. Notably, the ZYM population emerged as the most genetically diverse. Further analysis, incorporating molecular variance, principal component analysis, UPGMA cluster analysis, and structure analysis, highlighted significant genetic differentiation between the Chongqing (BMM, JFM, ZYM) and Guangxi (MEM) populations. Morphological data analysis corroborated these findings. Additionally, marked genetic and morphological distinctions were evident among the three Chongqing populations (BMM, JFM, and ZYM). This suggests that, despite the observed regional differentiation, R. platypodum’s overall genetic diversity is relatively constrained compared to other species within the Rhododendron genus. Consequently, R. platypodum conservation hinges critically on preserving its genetic diversity and protecting its distinct populations.
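For readers unfamiliar with the per-locus statistics quoted above (Ne, He, I), the following sketch computes them from allele frequencies using the textbook definitions: Ne = 1 / sum(p^2), He = 1 - sum(p^2), and I = -sum(p ln p). The abstract does not state which software or estimator the authors used, so this is an assumption; in particular, no small-sample correction is applied to He, and the function name diversity_stats is hypothetical.

```python
import math


def diversity_stats(allele_freqs):
    """Return (Ne, He, I) for one locus.

    allele_freqs: allele frequencies at the locus, summing to 1.
    Uses the uncorrected textbook formulas (an assumption; the
    abstract does not name the estimator).
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    ne = 1.0 / homozygosity          # effective number of alleles
    he = 1.0 - homozygosity          # expected heterozygosity
    shannon = -sum(p * math.log(p) for p in allele_freqs if p > 0)
    return ne, he, shannon


# Two equally frequent alleles at a single locus
ne, he, shannon = diversity_stats([0.5, 0.5])
```

Study-level values such as those in the abstract are averages of these per-locus quantities over all loci.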
OCRDetector: Accurately Detecting Open Chromatin Regions via Plasma Cell-Free DNA Sequencing Data
Open chromatin regions (OCRs) are special regions of the human genome that can be accessed by DNA regulatory elements. Several studies have reported that a series of OCRs are associated with mechanisms involved in human diseases, such as cancers. Identifying OCRs using ATAC-seq or DNase-seq is often expensive. It has become popular to detect OCRs from plasma cell-free DNA (cfDNA) sequencing data, because both the fragmentation modes of cfDNA and the sequencing coverage in OCRs are significantly different from those in other regions. However, it is a challenging computational problem to accurately detect OCRs from plasma cfDNA-seq data, as multiple factors—e.g., sequencing and mapping bias, insufficient read depth, etc.—often mislead the computational model. In this paper, we propose a novel bioinformatics pipeline, OCRDetector, for detecting OCRs from whole-genome cfDNA sequencing data. The pipeline calculates the window protection score (WPS) waveform and the cfDNA sequencing coverage. To validate the proposed pipeline, we compared the percentage overlap of our OCRs with those obtained by other methods. The experimental results show that 81% of the TSS regions of housekeeping genes are detected, and our results have obvious tissue specificity. In addition, the overlap percentage between our OCRs and the high-confidence OCRs obtained by ATAC-seq or DNase-seq is greater than 70%
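The window protection score mentioned above can be sketched as follows, using its usual definition: the number of fragments that completely span a window centred on a position, minus the number of fragments with an endpoint inside that window. This is a minimal illustration, not OCRDetector's code; the 120 bp default window, the inclusive (start, end) fragment representation, and the function name are assumptions.

```python
def window_protection_score(fragments, position, window=120):
    """WPS at one genomic position.

    fragments: iterable of (start, end) pairs, inclusive coordinates.
    window: window width in bp (120 is an assumed default).
    """
    half = window // 2
    w_start, w_end = position - half, position + half
    spanning = 0
    ending_inside = 0
    for start, end in fragments:
        if start <= w_start and end >= w_end:
            # Fragment covers the whole window: evidence of protection
            spanning += 1
        elif w_start <= start <= w_end or w_start <= end <= w_end:
            # Fragment starts or ends inside the window: evidence of a cut
            ending_inside += 1
    return spanning - ending_inside


frags = [(0, 300), (80, 220), (100, 400), (0, 50)]
score = window_protection_score(frags, position=150)
```

Evaluating the score at every position along a region gives the WPS waveform the pipeline works from.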