38 research outputs found

    Online Linear Extractors for Independent Sources

    In this work, we characterize online linear extractors. In other words, given a matrix $A \in \mathbb{F}_2^{n \times n}$, we study the convergence of the iterated process $\mathbf{S} \leftarrow A\mathbf{S} \oplus \mathbf{X}$, where $\mathbf{X} \sim D$ is repeatedly sampled independently from some fixed (but unknown) distribution $D$ with (min-)entropy at least $k$. Here, we think of $\mathbf{S} \in \{0,1\}^n$ as the state of an online extractor, and $\mathbf{X} \in \{0,1\}^n$ as its input. As our main result, we show that the state $\mathbf{S}$ converges to the uniform distribution for all input distributions $D$ with entropy $k > 0$ if and only if the matrix $A$ has no non-trivial invariant subspace (i.e., a non-zero subspace $V \subsetneq \mathbb{F}_2^n$ such that $AV \subseteq V$). In other words, a matrix $A$ yields an online linear extractor if and only if $A$ has no non-trivial invariant subspace. For example, the linear transformation corresponding to multiplication by a generator of the field $\mathbb{F}_{2^n}$ yields a good online linear extractor. Furthermore, for any such matrix, convergence takes at most $\widetilde{O}(n^2(k+1)/k^2)$ steps. We also study the more general notion of condensing: that is, we ask when this process converges to a distribution with entropy at least $\ell$, when the input distribution has entropy greater than $k$. (Extractors correspond to the special case $\ell = n$.) We show that a matrix gives a good condenser if there are relatively few vectors $\mathbf{w} \in \mathbb{F}_2^n$ such that $\mathbf{w}, A^T\mathbf{w}, \ldots, (A^T)^{n-k-1}\mathbf{w}$ are linearly dependent. As an application, we show that the very simple cyclic rotation transformation $A(x_1,\ldots,x_n) = (x_n,x_1,\ldots,x_{n-1})$ condenses to $\ell = n-1$ bits for any $k > 1$ if $n$ is a prime satisfying a certain simple number-theoretic condition. Our proofs are Fourier-analytic and rely on a novel lemma, which gives a tight bound on the product of certain Fourier coefficients of any entropic distribution.
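The iterated process $\mathbf{S} \leftarrow A\mathbf{S} \oplus \mathbf{X}$ can be illustrated on a toy instance. The sketch below is not taken from the paper: it assumes $n = 3$, takes $A$ to be multiplication by $x$ in $\mathbb{F}_{2^3}$ with the (assumed) modulus $x^3 + x + 1$, and picks an arbitrary two-point input distribution `D`. All function names are hypothetical. It tracks the exact distribution of the state and compares it with the identity map, which does have non-trivial invariant subspaces.

```python
def mul_by_x(s, n=3, poly=0b1011):
    """Multiply s by x in GF(2^n); poly = x^3 + x + 1 is an assumed modulus."""
    s <<= 1
    if s >> n:          # degree reached n, so reduce modulo poly
        s ^= poly
    return s

def step(p, D, A):
    """One exact update of the state distribution under S <- A(S) xor X."""
    q = [0.0] * len(p)
    for s, ps in enumerate(p):
        for x, dx in D.items():
            q[A(s) ^ x] += ps * dx
    return q

def distance_from_uniform(p):
    """Statistical (total variation) distance between p and uniform."""
    u = 1.0 / len(p)
    return 0.5 * sum(abs(v - u) for v in p)

def run(A, D, steps, n=3):
    """Start from the all-zero state and iterate the process `steps` times."""
    p = [0.0] * (1 << n)
    p[0] = 1.0
    for _ in range(steps):
        p = step(p, D, A)
    return distance_from_uniform(p)

# Assumed toy input: X is 000 with prob. 0.75 and 001 with prob. 0.25,
# so its min-entropy is -log2(0.75), which is positive but below one bit.
D = {0b000: 0.75, 0b001: 0.25}

field_dist = run(mul_by_x, D, 70)          # map without invariant subspaces
identity_dist = run(lambda s: s, D, 70)    # identity map, for contrast
```

In this toy setting `field_dist` becomes negligible, while `identity_dist` stays large because the state never leaves the two values 000 and 001. This only illustrates the statement of the main result on one small case and says nothing about the convergence rate proved in the paper.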

    No Time to Hash: On Super Efficient Entropy Accumulation

    Real-world random number generators (RNGs) cannot afford to use (slow) cryptographic hashing every time they refresh their state $R$ with a new entropic input $X$. Instead, they use "superefficient" simple entropy-accumulation procedures, such as $R \leftarrow \mathsf{rot}_{\alpha,n}(R) \oplus X$, where $\mathsf{rot}_{\alpha,n}$ rotates an $n$-bit state $R$ by some fixed number $\alpha$. For example, Microsoft's RNG uses $\alpha=5$ for $n=32$ and $\alpha=19$ for $n=64$. Where do these numbers come from? Are they good choices? Should rotation be replaced by a better permutation $\pi$ of the input bits? In this work we initiate a rigorous study of these pragmatic questions, by modeling the sequence of successive entropic inputs $X_1,X_2,\ldots$ as independent (but otherwise adversarial) samples from some natural distribution family ${\mathcal D}$. Our contribution is as follows.
    * We define $2$-monotone distributions as a rich family ${\mathcal D}$ that includes relevant real-world distributions (Gaussian, exponential, etc.), but avoids trivial impossibility results.
    * For any $\alpha$ with $\gcd(\alpha,n)=1$, we show that rotation accumulates $\Omega(n)$ bits of entropy from $n$ independent samples $X_1,\ldots,X_n$ from any (unknown) $2$-monotone distribution with entropy $k > 1$.
    * However, we also show that some choices of $\alpha$ perform much better than others for a given $n$. E.g., we show $\alpha=19$ is one of the best choices for $n=64$; in contrast, $\alpha=5$ is good, but generally worse than $\alpha=7$, for $n=32$.
    * More generally, given a permutation $\pi$ and $k\ge 1$, we define a simple parameter, the covering number $C_{\pi,k}$, and show that it characterizes the number of steps before the rule $(R_1,\ldots,R_n)\leftarrow (R_{\pi(1)},\ldots,R_{\pi(n)})\oplus X$ accumulates nearly $n$ bits of entropy from independent, $2$-monotone samples of min-entropy $k$ each.
    * We build a simple permutation $\pi^*$, which achieves nearly optimal $C_{\pi^*,k}\approx n/k$ for all values of $k$ simultaneously, and experimentally validate that it compares favorably with all rotations $\mathsf{rot}_{\alpha,n}$.
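The update rule $R \leftarrow \mathsf{rot}_{\alpha,n}(R) \oplus X$ is simple enough to sketch directly. The code below is an illustration under assumptions, not the procedure of any real RNG: the rotation direction (left) is assumed, and the names `rot`, `accumulate` and `positions_reached` are hypothetical. The last helper only shows the elementary fact behind the $\gcd(\alpha,n)=1$ condition, namely that a bit entering at position 0 visits all $n$ positions exactly when $\alpha$ and $n$ are coprime. It does not compute the covering number defined in the paper.

```python
def rot(r, alpha, n):
    """Rotate the n-bit word r left by alpha bits (direction is an assumption)."""
    alpha %= n
    mask = (1 << n) - 1
    return ((r << alpha) | (r >> (n - alpha))) & mask

def accumulate(inputs, alpha, n, r=0):
    """Apply R <- rot(R) xor X once for each input X, starting from state r."""
    for x in inputs:
        r = rot(r, alpha, n) ^ x
    return r

def positions_reached(alpha, n):
    """Bit positions visited by a single bit that enters at position 0,
    over n consecutive rotations."""
    r, seen = 1, set()
    for _ in range(n):
        seen.add(r.bit_length() - 1)   # index of the single set bit
        r = rot(r, alpha, n)
    return seen

# Number of distinct positions reached for a few rotation amounts with n = 32.
coverage = {alpha: len(positions_reached(alpha, 32)) for alpha in (4, 5, 7)}
```

Here `coverage` shows that $\alpha = 5$ and $\alpha = 7$ spread an input bit over all 32 positions, whereas $\alpha = 4$ reaches only 8 of them. Which coprime $\alpha$ is actually better is the question the paper answers and is not captured by this sketch.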

    Photoactivation of Cu Centers in Metal-Organic Frameworks for Selective CO2 Conversion to Ethanol.

    CO2 hydrogenation to ethanol is of practical importance but poses a significant challenge due to the need to form one C-C bond while keeping one C-O bond intact. Cu(I) centers could selectively catalyze CO2-to-ethanol conversion, but the Cu(I) catalytic sites were unstable under reaction conditions. Here we report the use of low-intensity light to generate Cu(I) species in the cavities of a metal-organic framework (MOF) for catalytic CO2 hydrogenation to ethanol. X-ray photoelectron and transient absorption spectroscopies indicate the generation of Cu(I) species via single-electron transfer from photoexcited [Ru(bpy)3]2+-based ligands on the MOF to Cu(II) centers in the cavities and from Cu(0) centers to the photoexcited [Ru(bpy)3]2+-based ligands. Upon light activation, this Cu-Ru-MOF hybrid selectively hydrogenates CO2 to EtOH with an activity of 9650 μmol g_Cu^-1 h^-1 under 2 MPa of H2/CO2 = 3:1 at 150 °C. Low-intensity light thus generates and stabilizes Cu(I) species for sustained EtOH production.

    Effect of additives and moisture on the fermentation quality and bacterial community of high moisture ear corn

    Maize (Zea mays L.) is one of the most widely cultivated crops used as energy feed. The aim of this study was to evaluate the effects of two lactic acid bacteria additives on the fermentation quality and bacterial community of high moisture ear corn (HMEC) silage at different moisture levels. The study utilized corn kernels and cobs harvested at the stage of complete ripeness as the primary material. The cob was crushed and divided into three treatment groups: an untreated control group (CK), a group treated with a mixture of Lactobacillus plantarum and Lactobacillus brucei (TQ), and a group treated with a mixture of Lactococcus lactis and Lactobacillus brucei (KT). Moisture contents were adjusted to 37.5% (L), 42.5% (M) or 47.5% (H), and the material was then ensiled for 180 days. Compared to CK, TQ and KT elevated the dry matter, crude protein, starch, lactic acid and acetic acid content of HMEC and reduced the pH, neutral detergent fiber, acid detergent fiber and ammonia nitrogen content (p < 0.05). Although both additives improved the bacterial community structure after fermentation, KT showed the greater improvement. At the phylum and genus levels, KT had the higher relative abundance of Firmicutes and Lactobacillus, respectively. Compared with the 37.5% (L) moisture group, the 42.5% (M) and 47.5% (H) moisture groups had increased lactic acid, acetic acid and ammonia nitrogen concentrations and a reduced pH (p < 0.05). In conclusion, the addition of TQ and KT at the appropriate moisture content might be helpful for producing high-quality HMEC. Among the three moisture contents, 42.5% (M) provided the best silage quality.

    Genetic characteristics of common variable immunodeficiency patients with autoimmunity

    Background: The pathogenesis of common variable immunodeficiency disorder (CVID) is complex, especially when combined with autoimmunity. Genetic factors may be potential explanations for this complex situation, and whole genome sequencing (WGS) provides the basis for this potential. Methods: Genetic information of patients with CVID with autoimmunity, together with their first-degree relatives, was collected through WGS. The association between genetic factors and clinical phenotypes was studied using genetic analysis strategies such as sporadic and pedigree. Results: We collected 42 blood samples for WGS (16 CVID patients and 26 first-degree relatives of healthy controls). Through pedigree, sporadic screening strategies and low-frequency deleterious screening of rare diseases, we obtained 9,148 mutation sites, including 8,171 single-nucleotide variants (SNVs) and 977 insertion-deletions (InDels). Finally, we obtained a total of 28 candidate genes (32 loci), of which the most commonly mutated was LRBA. The most common autoimmune disease in the 16 patients was systemic lupus erythematosus. Through KEGG pathway enrichment, we identified the top ten signaling pathways, including "primary immunodeficiency", "JAK-STAT signaling pathway", and "T-cell receptor signaling pathway". We used PyMOL to predict and analyse the three-dimensional protein structures of the NFKB1, RAG1, TIRAP, NCF2, and MYB genes. In addition, we constructed a PPI network by combining candidate mutants with genes associated with CVID in the OMIM database via the STRING database. Conclusion: The genetic background of CVID includes not only monogenic origins but also oligogenic effects. Our study showed that immunodeficiency and autoimmunity may overlap in genetic backgrounds. Clinical Trial Registration: identifier ChiCTR210004403

    Impact of AlphaFold on Structure Prediction of Protein Complexes: The CASP15-CAPRI Experiment

    We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homo-dimers, 3 homo-trimers, 13 hetero-dimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21,941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their 5 best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets compared to 8% two years earlier, a remarkable improvement resulting from the wide use of the AlphaFold2 and AlphaFold-Multimer software. Creative use was made of the deep learning inference engines, affording the sampling of a much larger number of models and enriching the multiple sequence alignments with sequences from various sources. Wide use was also made of the AlphaFold confidence metrics to rank models, permitting top performing groups to exceed the results of the public AlphaFold-Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.

    Impact of AlphaFold on structure prediction of protein complexes: The CASP15-CAPRI experiment

    We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homodimers, 3 homotrimers, 13 heterodimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatic servers, submitted models for each target. A total of 21 941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their five best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets compared to 8% two years earlier. This remarkable improvement is due to the wide use of the AlphaFold2 and AlphaFold2-Multimer software and the confidence metrics they provide. Notably, expanded sampling of candidate solutions by manipulating these deep learning inference engines, enriching multiple sequence alignments, or integrating advanced modeling tools enabled top performing groups to exceed the performance of a standard AlphaFold2-Multimer version used as a yardstick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem.

    Protein contact distance and structure prediction driven by deep learning

    Proteins, fundamental building blocks of living organisms, play a crucial role in various biological processes. Understanding protein structure is essential for unraveling their functions and designing therapeutics. However, experimentally determining protein structures is time-consuming and expensive, motivating the development of computational methods. The prediction of protein tertiary structure relies to a certain extent on the accurate prediction of protein secondary structure and the protein contact/distance map. A high-quality contact distance prediction is crucial in constructing an ideal protein tertiary structure. Similarly, accurate prediction of distances between protein chains aids in the construction of higher-quality protein complex structures, also known as quaternary structures. In recent years, the advancement of deep learning techniques and the continuous expansion of protein sequence databases have significantly improved the accuracy of protein contact distance prediction, consequently impacting the prediction of protein tertiary and quaternary structures. This dissertation presents four contributions. First, DNSS2, an innovative approach based on one-dimensional deep convolutional networks, is proposed for the accurate prediction of protein secondary structure. Second, DeepDist introduces a multi-task deep learning framework that facilitates the prediction of real-valued distances between residues. Third, DeepDist2 represents an enhanced version of the deep learning-based protein distance prediction tool. Finally, CDPred, a 2D attention-based deep neural network, is developed to predict inter-chain distances in protein complexes. All the methods are freely available to the scientific community as software tools or web servers. Includes bibliographical references.

    Improving AlphaFold2-based protein tertiary structure prediction with MULTICOM in CASP15

    Since the 14th Critical Assessment of Techniques for Protein Structure Prediction (CASP14), AlphaFold2 has become the standard method for protein tertiary structure prediction. One remaining challenge is to further improve its prediction. We developed a new version of the MULTICOM system to sample diverse multiple sequence alignments (MSAs) and structural templates to improve the input for AlphaFold2 to generate structural models. The models are then ranked by both the pairwise model similarity and the AlphaFold2 self-reported model quality score. The top ranked models are refined by a novel structure alignment-based refinement method powered by Foldseek. Moreover, for a monomer target that is a subunit of a protein assembly (complex), MULTICOM integrates tertiary and quaternary structure predictions to account for tertiary structural changes induced by protein-protein interaction. The system participated in the tertiary structure prediction category of the 2022 CASP15 experiment. Our server predictor MULTICOM_refine ranked 3rd among 47 CASP15 server predictors and our human predictor MULTICOM ranked 7th among all 132 human and server predictors. The average GDT-TS score and TM-score of the first structural models that MULTICOM_refine predicted for 94 CASP15 domains are ~0.80 and ~0.92, which are 9.6% and 8.2% higher than the ~0.73 and ~0.85 of the standard AlphaFold2 predictor, respectively.