Online Linear Extractors for Independent Sources
In this work, we characterize online linear extractors. In other words, given a matrix $A \in \mathbb{F}_2^{n \times n}$, we study the convergence of the iterated process $\mathbf{s} \leftarrow A\mathbf{s} \oplus X$, where $X \sim D$ is repeatedly sampled independently from some fixed (but unknown) distribution $D$ with (min-)entropy at least $k$. Here, we think of $\mathbf{s} \in \{0,1\}^n$ as the state of an online extractor, and $X \in \{0,1\}^n$ as its input.
As our main result, we show that the state $\mathbf{s}$ converges to the uniform distribution for all input distributions $D$ with entropy $k > 0$ if and only if the matrix $A$ has no non-trivial invariant subspace (i.e., a non-zero subspace $V \subsetneq \mathbb{F}_2^n$ such that $AV \subseteq V$). In other words, a matrix $A$ yields an online linear extractor if and only if $A$ has no non-trivial invariant subspace. For example, the linear transformation corresponding to multiplication by a generator of the field $\mathbb{F}_{2^n}$ yields a good online linear extractor. Furthermore, for any such matrix, convergence takes at most $\widetilde{O}(n^2(k+1)/k^2)$ steps.
We also study the more general notion of condensing, that is, we ask when this process converges to a distribution with entropy at least $\ell$ when the input distribution has entropy greater than $k$. (Extractors correspond to the special case $\ell = n$.) We show that a matrix gives a good condenser if there are relatively few vectors $\mathbf{w} \in \mathbb{F}_2^n$ such that $\mathbf{w}, A^T\mathbf{w}, \ldots, (A^T)^{n-k}\mathbf{w}$ are linearly dependent. As an application, we show that the very simple cyclic rotation transformation $A(x_1, \ldots, x_n) = (x_n, x_1, \ldots, x_{n-1})$ condenses to $\ell = n-1$ bits for any $k > 1$ if $n$ is a prime satisfying a certain simple number-theoretic condition.
Our proofs are Fourier-analytic and rely on a novel lemma, which gives a tight bound on the product of certain Fourier coefficients of any entropic distribution.
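To make the iterated process concrete, here is a minimal Python sketch (not the authors' code) that computes the exact distribution of the state under s ← As ⊕ X for n = 8, starting from s = 0. It compares two hypothetical choices of A: multiplication by x modulo the irreducible polynomial x^8 + x^4 + x^3 + x + 1 (the AES polynomial, picked only for illustration), whose matrix has no non-trivial invariant subspace, and a one-bit cyclic rotation, which leaves the subspace {0x00, 0xFF} invariant. `SOURCE` is an assumed 1-bit min-entropy distribution supported on exactly that subspace; all names are ours.

```python
N = 8
MASK = (1 << N) - 1
POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over F_2 (illustrative choice)


def mul_x(s: int) -> int:
    """Multiply s, viewed as an element of F_2[x]/(POLY), by x (an invertible linear map)."""
    s <<= 1
    if s & (1 << N):
        s ^= POLY  # reduce modulo POLY, which clears bit N
    return s


def rotate(s: int) -> int:
    """Cyclic rotation by one bit; span{0xFF} is a non-trivial invariant subspace."""
    return ((s << 1) | (s >> (N - 1))) & MASK


def step(dist: list[float], A, source: dict[int, float]) -> list[float]:
    """Distribution of A(s) XOR X, where s ~ dist and X ~ source are independent."""
    new = [0.0] * (1 << N)
    for s, p in enumerate(dist):
        if p == 0.0:
            continue
        a = A(s)
        for x, q in source.items():
            new[a ^ x] += p * q
    return new


def distance_to_uniform(dist: list[float]) -> float:
    """Statistical (total variation) distance between dist and the uniform distribution."""
    u = 1.0 / len(dist)
    return 0.5 * sum(abs(p - u) for p in dist)


def run(A, source: dict[int, float], steps: int) -> float:
    """Start in the all-zero state, apply `steps` updates, return the distance to uniform."""
    dist = [0.0] * (1 << N)
    dist[0] = 1.0
    for _ in range(steps):
        dist = step(dist, A, source)
    return distance_to_uniform(dist)


# Hypothetical 1-bit min-entropy source supported on the rotation-invariant subspace.
SOURCE = {0x00: 0.5, 0xFF: 0.5}

if __name__ == "__main__":
    for t in (1, 4, 8, 16):
        print(f"t={t:2d}  field: {run(mul_x, SOURCE, t):.4f}  rotation: {run(rotate, SOURCE, t):.4f}")
```

With this source the rotation keeps the state inside {0x00, 0xFF} forever, so its distance to uniform stays at 1 − 2/256. The field multiplication makes the state exactly uniform after 8 steps, because for c = 0xFF the vectors c, Ac, …, A⁷c span F_2^8. This toy case only illustrates the invariant-subspace dichotomy; it says nothing about the general convergence rate.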
No Time to Hash: On Super Efficient Entropy Accumulation
Real-world random number generators (RNGs) cannot afford to use (slow) cryptographic hashing every time they refresh their state $R$ with a new entropic input $X$. Instead, they use "superefficient" simple entropy-accumulation procedures, such as
$$R \leftarrow \mathrm{rot}_{\alpha,n}(R) \oplus X,$$
where $\mathrm{rot}_{\alpha,n}$ rotates an $n$-bit state $R$ by some fixed number $\alpha$. For example, Microsoft's RNG uses $\alpha = 5$ for $n = 32$ and $\alpha = 19$ for $n = 64$. Where do these numbers come from? Are they good choices?
Should rotation be replaced by a better permutation $\pi$ of the input bits?
In this work we initiate a rigorous study of these pragmatic questions, by modeling the sequence of successive entropic inputs $X_1, X_2, \ldots$ as independent (but otherwise adversarial) samples from some natural distribution family $\mathcal{D}$. Our contribution is as follows.
* We define $2$-monotone distributions as a rich family $\mathcal{D}$ that includes relevant real-world distributions (Gaussian, exponential, etc.), but avoids trivial impossibility results.
* For any $\alpha$ with $\gcd(\alpha, n) = 1$, we show that rotation accumulates $\Omega(n)$ bits of entropy from $n$ independent samples $X_1, \ldots, X_n$ from any (unknown) $2$-monotone distribution with entropy $k > 1$.
* However, we also show that some choices of $\alpha$ perform much better than others for a given $n$. E.g., we show $\alpha = 19$ is one of the best choices for $n = 64$; in contrast, $\alpha = 5$ is good, but generally worse than $\alpha = 7$, for $n = 32$.
* More generally, given a permutation $\pi$ and $k \ge 1$, we define a simple parameter, the covering number $C_{\pi,k}$, and show that it characterizes the number of steps before the rule
$$(R_1, \ldots, R_n) \leftarrow (R_{\pi(1)}, \ldots, R_{\pi(n)}) \oplus X$$
accumulates nearly $n$ bits of entropy from independent, $2$-monotone samples of min-entropy $k$ each.
* We build a simple permutation $\pi^*$, which achieves nearly optimal $C_{\pi^*,k} \approx n/k$ for all values of $k$ simultaneously, and experimentally validate that it compares favorably with all rotations $\mathrm{rot}_{\alpha,n}$.
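The two update rules discussed above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper: states are stored as n-bit integers, bits are indexed from 0 rather than 1, rotation is taken to the left, and all function names are hypothetical. Rotation is the special case π(i) = (i − α) mod n of the general permutation rule. `is_single_cycle` checks whether the bit positions form one cycle under π, which for rotation holds exactly when gcd(α, n) = 1; reading the gcd condition as "every input bit eventually visits every state position" is our own intuition, and the sketch computes neither the covering number nor any entropy bound.

```python
from math import gcd


def rot(r: int, alpha: int, n: int) -> int:
    """Rotate the n-bit state r left by alpha positions (bit 0 is least significant)."""
    alpha %= n
    mask = (1 << n) - 1
    return ((r << alpha) | (r >> (n - alpha))) & mask


def permute(r: int, pi: list[int]) -> int:
    """Return the state whose bit i equals bit pi[i] of r, i.e. R_i <- R_pi(i)."""
    out = 0
    for i, j in enumerate(pi):
        out |= ((r >> j) & 1) << i
    return out


def accumulate_rot(r: int, x: int, alpha: int, n: int) -> int:
    """One step of R <- rot_{alpha,n}(R) XOR X."""
    return rot(r, alpha, n) ^ x


def accumulate_perm(r: int, x: int, pi: list[int]) -> int:
    """One step of (R_1, ..., R_n) <- (R_pi(1), ..., R_pi(n)) XOR X, 0-indexed."""
    return permute(r, pi) ^ x


def rotation_as_permutation(alpha: int, n: int) -> list[int]:
    """Left rotation by alpha: new bit i comes from old bit (i - alpha) mod n."""
    return [(i - alpha) % n for i in range(n)]


def is_single_cycle(pi: list[int]) -> bool:
    """True if following i -> pi[i] from position 0 visits every position."""
    seen, i = set(), 0
    while i not in seen:
        seen.add(i)
        i = pi[i]
    return len(seen) == len(pi)


if __name__ == "__main__":
    for alpha, n in [(5, 32), (7, 32), (19, 64), (4, 32)]:
        pi = rotation_as_permutation(alpha, n)
        print(alpha, n, gcd(alpha, n), is_single_cycle(pi))
```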
Photoactivation of Cu Centers in Metal-Organic Frameworks for Selective CO2 Conversion to Ethanol.
CO2 hydrogenation to ethanol is of practical importance but poses a significant challenge because one C-C bond must be formed while one C-O bond is kept intact. Cu(I) centers can selectively catalyze CO2-to-ethanol conversion, but Cu(I) catalytic sites are unstable under reaction conditions. Here we report the use of low-intensity light to generate Cu(I) species in the cavities of a metal-organic framework (MOF) for catalytic CO2 hydrogenation to ethanol. X-ray photoelectron and transient absorption spectroscopies indicate the generation of Cu(I) species via single-electron transfer from photoexcited [Ru(bpy)₃]²⁺-based ligands on the MOF to Cu(II) centers in the cavities and from Cu(0) centers to the photoexcited [Ru(bpy)₃]²⁺-based ligands. Upon light activation, this Cu-Ru-MOF hybrid selectively hydrogenates CO2 to EtOH with an activity of 9650 μmol gCu⁻¹ h⁻¹ under 2 MPa of H2/CO2 (3:1) at 150 °C. Low-intensity light thus generates and stabilizes Cu(I) species for sustained EtOH production.
Effect of additives and moisture on the fermentation quality and bacterial community of high moisture ear corn
Maize (Zea mays L.) is one of the most widely cultivated crops used as energy feed. The aim of this study was to evaluate the effects of two lactic acid bacteria additives on the fermentation quality and bacterial community of high moisture ear corn (HMEC) silage at different moisture levels. The study used corn kernels and cobs harvested at full ripeness as the primary material. The ears were crushed and divided into three treatment groups: an untreated control group (CK), a group treated with a mixture of Lactobacillus plantarum and Lactobacillus brucei (TQ), and a group treated with a mixture of Lactococcus lactis and Lactobacillus brucei (KT). Moisture contents were adjusted to 37.5% (L), 42.5% (M), or 47.5% (H), and the material was then ensiled for 180 days. Compared with CK, TQ and KT increased the dry matter, crude protein, starch, lactic acid, and acetic acid contents of HMEC and reduced the pH and the neutral detergent fiber, acid detergent fiber, and ammonia nitrogen contents (p < 0.05). Although both additives improved the bacterial community structure after fermentation, KT produced the greater improvement. At the phylum and genus levels, KT had a higher relative abundance of Firmicutes and Lactobacillus, respectively. Compared with the 37.5% (L) moisture group, the 42.5% (M) and 47.5% (H) moisture groups had higher lactic acid, acetic acid, and ammonia nitrogen concentrations and a lower pH (p < 0.05). In conclusion, adding TQ and KT at an appropriate moisture content may help produce high-quality HMEC. Among the three moisture contents, 42.5% (M) provided the best silage quality.
Genetic characteristics of common variable immunodeficiency patients with autoimmunity
Background: The pathogenesis of common variable immunodeficiency disorder (CVID) is complex, especially when combined with autoimmunity. Genetic factors may explain this complexity, and whole genome sequencing (WGS) provides a basis for exploring them.
Methods: Genetic information from patients with CVID and autoimmunity, together with their first-degree relatives, was collected through WGS. The association between genetic factors and clinical phenotypes was studied using sporadic-case and pedigree-based genetic analysis strategies.
Results: We collected 42 blood samples for WGS (16 CVID patients and 26 healthy first-degree relatives as controls). Through pedigree and sporadic screening strategies and screening for low-frequency deleterious variants associated with rare diseases, we obtained 9,148 mutation sites, including 8,171 single-nucleotide variants (SNVs) and 977 insertion-deletions (InDels). Finally, we obtained a total of 28 candidate genes (32 loci), of which the most commonly mutated was LRBA. The most common autoimmune disease in the 16 patients was systemic lupus erythematosus. Through KEGG pathway enrichment, we identified the top ten signaling pathways, including “primary immunodeficiency”, “JAK-STAT signaling pathway”, and “T-cell receptor signaling pathway”. We used PyMOL to predict and analyse the three-dimensional structures of the proteins encoded by NFKB1, RAG1, TIRAP, NCF2, and MYB. In addition, we constructed a protein-protein interaction (PPI) network via the STRING database by combining the candidate mutant genes with CVID-associated genes from the OMIM database.
Conclusion: The genetic background of CVID includes not only monogenic origins but also oligogenic effects. Our study showed that immunodeficiency and autoimmunity may overlap in genetic backgrounds.
Clinical Trial Registration: identifier ChiCTR210004403
Impact of AlphaFold on Structure Prediction of Protein Complexes: The CASP15-CAPRI Experiment
We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homo-dimers, 3 homo-trimers, 13 hetero-dimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21,941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their 5 best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets compared to 8% two years earlier, a remarkable improvement resulting from the wide use of the AlphaFold2 and AlphaFold-Multimer software. Groups made creative use of the deep learning inference engines to sample a much larger number of models and to enrich the multiple sequence alignments with sequences from various sources. Wide use was also made of the AlphaFold confidence metrics to rank models, permitting top performing groups to exceed the results of the public AlphaFold-Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.
Impact of AlphaFold on structure prediction of protein complexes: The CASP15-CAPRI experiment
We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homodimers, 3 homo-trimers, 13 heterodimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21 941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their five best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets compared to 8% two years earlier. This remarkable improvement is due to the wide use of the AlphaFold2 and AlphaFold2-Multimer software and the confidence metrics they provide. Notably, expanded sampling of candidate solutions by manipulating these deep learning inference engines, enriching multiple sequence alignments, or integrating advanced modeling tools enabled top performing groups to exceed the performance of a standard AlphaFold2-Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.
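The DockQ score mentioned in this abstract consolidates three CAPRI measures: the fraction of native contacts (Fnat), the interface RMSD (iRMS) and the ligand RMSD (LRMS). A minimal Python sketch of the DockQ formula follows. It assumes the three measures have already been computed from a model and its reference structure (the hard part, not shown), and the quality bands are the commonly quoted DockQ cut-offs, which approximate but are not identical to CAPRI's own classification rules. Function names and the example numbers are hypothetical.

```python
def scaled_rms(rms: float, d: float) -> float:
    """Map an RMSD (Å) into (0, 1]: 1 at 0 Å, 0.5 at d Å."""
    return 1.0 / (1.0 + (rms / d) ** 2)


def dockq(fnat: float, irms: float, lrms: float) -> float:
    """Mean of Fnat, scaled iRMS (d = 1.5 Å) and scaled LRMS (d = 8.5 Å)."""
    return (fnat + scaled_rms(irms, 1.5) + scaled_rms(lrms, 8.5)) / 3.0


def quality_band(score: float) -> str:
    """Commonly used DockQ cut-offs; an approximation of the CAPRI quality classes."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


if __name__ == "__main__":
    # Hypothetical model: 60% native contacts, 1.2 Å interface RMSD, 3.0 Å ligand RMSD.
    s = dockq(0.60, 1.2, 3.0)
    print(f"DockQ = {s:.3f} ({quality_band(s)})")
```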
Protein contact distance and structure prediction driven by deep learning
Proteins, the fundamental building blocks of living organisms, play a crucial role in various biological processes. Understanding protein structure is essential for unraveling protein function and designing therapeutics. However, determining protein structures experimentally is time-consuming and expensive, which motivates the development of computational methods. The prediction of protein tertiary structure relies to a certain extent on the accurate prediction of protein secondary structure and protein contact/distance maps. High-quality contact and distance prediction is crucial for constructing an accurate protein tertiary structure. Similarly, accurate prediction of distances between protein chains aids the construction of higher-quality protein complex structures, also known as quaternary structures. In recent years, advances in deep learning techniques and the continuous expansion of protein sequence databases have significantly improved the accuracy of protein contact and distance prediction, and consequently the prediction of protein tertiary and quaternary structures. This dissertation presents four contributions. First, DNSS2, an innovative approach based on one-dimensional deep convolutional networks, is proposed for the accurate prediction of protein secondary structure. Second, DeepDist introduces a multi-task deep learning framework for predicting real-valued distances between residues. Third, DeepDist2 is an enhanced version of this deep learning-based protein distance prediction tool. Finally, CDPred, a 2D attention-based deep neural network, is developed to predict inter-chain distances in protein complexes. All methods are freely available to the scientific community as software tools or web servers.
Includes bibliographical references.
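To illustrate what a contact/distance map is, here is a minimal Python sketch that computes a residue distance map from one representative atom per residue and thresholds it into a binary contact map. It assumes the usual CASP contact definition (Cβ-Cβ distance below 8 Å, Cα for glycine) and a minimum sequence separation of 6 residues to skip trivially close neighbours; the separation cut-off, function names and toy coordinates are illustrative assumptions. This is not code from DNSS2, DeepDist or CDPred, which predict such maps from sequence rather than computing them from a known structure.

```python
import math

Coord = tuple[float, float, float]


def distance_map(coords: list[Coord]) -> list[list[float]]:
    """Pairwise Euclidean distances (Å) between per-residue representative atoms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_map(
    dmap: list[list[float]], threshold: float = 8.0, min_separation: int = 6
) -> list[list[int]]:
    """1 where residues i and j are >= min_separation apart in sequence and closer than threshold."""
    n = len(dmap)
    return [
        [1 if abs(i - j) >= min_separation and dmap[i][j] < threshold else 0 for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy, not physically realistic coordinates: residues 0 and 6 lie 5 Å apart.
    coords = [(0.0, 0.0, 0.0)] + [(100.0 * k, 100.0, 0.0) for k in range(1, 6)] + [(5.0, 0.0, 0.0)]
    cmap = contact_map(distance_map(coords))
    print([(i, j) for i in range(len(cmap)) for j in range(len(cmap)) if cmap[i][j]])
```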
Improving AlphaFold2-based protein tertiary structure prediction with MULTICOM in CASP15
Since the 14th Critical Assessment of Techniques for Protein Structure Prediction (CASP14), AlphaFold2 has become the standard method for protein tertiary structure prediction. One remaining challenge is to further improve its predictions. We developed a new version of the MULTICOM system that samples diverse multiple sequence alignments (MSAs) and structural templates to improve the input AlphaFold2 uses to generate structural models. The models are then ranked by both pairwise model similarity and AlphaFold2's self-reported model quality score. The top-ranked models are refined by a novel structure alignment-based refinement method powered by Foldseek. Moreover, for a monomer target that is a subunit of a protein assembly (complex), MULTICOM integrates tertiary and quaternary structure predictions to account for tertiary structural changes induced by protein-protein interaction. The system participated in the tertiary structure prediction category of the 2022 CASP15 experiment. Our server predictor MULTICOM_refine ranked 3rd among 47 CASP15 server predictors, and our human predictor MULTICOM ranked 7th among all 132 human and server predictors. The average GDT-TS score and TM-score of the first structural models that MULTICOM_refine predicted for 94 CASP15 domains are ~0.80 and ~0.92, respectively, which are 9.6% and 8.2% higher than those of the standard AlphaFold2 predictor (~0.73 and ~0.85).
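The two quality scores reported here can be sketched as follows. This is a simplified illustration, not the official GDT or TM-score programs: both real tools search over many superpositions and keep the best, whereas this sketch assumes one fixed superposition and takes per-residue Cα deviations (Å) as input. Target residues absent from `deviations` count as unmatched. Clamping d0 at 0.5 Å for very short chains is an assumption of this sketch, and all function names are hypothetical.

```python
def gdt_ts(deviations: list[float], target_length: int) -> float:
    """Average fraction of target residues within 1, 2, 4 and 8 Å."""
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    fractions = [sum(1 for d in deviations if d <= c) / target_length for c in cutoffs]
    return sum(fractions) / len(cutoffs)


def tm_score(deviations: list[float], target_length: int) -> float:
    """Length-normalized TM-score for one fixed superposition."""
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)  # assumption: floor d0 for very short chains
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in deviations) / target_length


def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old` (0.096 means 9.6%)."""
    return new / old - 1.0


if __name__ == "__main__":
    devs = [0.5, 1.5, 3.0, 6.0, 10.0]  # hypothetical per-residue deviations
    print(gdt_ts(devs, 5), tm_score(devs, 5))
    print(relative_gain(0.80, 0.73), relative_gain(0.92, 0.85))
```

Plugging the rounded averages from the abstract into `relative_gain` gives about 9.6% and 8.2%, consistent with the reported improvements over the standard AlphaFold2 predictor.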