166 research outputs found

    Protein Design and Chromatin Structure: Novel Computational Approaches

    No full text
    Constructing fitness landscapes has broad implications in molecular evolution, cellular epigenetic states, and protein design. We studied the problem of constructing the fitness landscape of inverse protein folding. Computational inverse protein folding, or protein design, aims to generate amino acid sequences that fold into an a priori determined structural fold in order to engineer novel or enhanced biochemistry. For this task, a function describing the fitness landscape of sequences is critical for identifying the sequences that fold into the desired structure. In this study, we showed that nonlinear fitness functions for protein design can be significantly improved. Using a rectangular kernel with a basis set of proteins and decoys chosen a priori, we obtained a simplified nonlinear kernel function via a finite Newton method. The full landscape for a large number of protein folds can be captured using only 480 native proteins and 3,200 non-protein decoys. A blind test of a simplified version of sequence design was carried out to discriminate simultaneously 428 native sequences, none homologous to any training protein, from 11 million challenging protein-like decoys. This simplified fitness function correctly classified 408 native sequences (20 misclassifications, a 95% correct rate), outperforming several statistical linear scoring functions and an optimized linear function. Its performance is also comparable with results obtained from a far more complex nonlinear fitness function with more than 5,000 terms. Our results further suggest that, for the task of global sequence design of the 428 selected proteins, the search space of protein shape and sequence can be effectively parametrized with a carefully chosen basis set of only about 3,680 proteins and decoys, and we showed in addition that the overall landscape is not overly sensitive to the specific choice of this set. Our results can be generalized to constructing other fitness landscapes.
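The abstract names a rectangular kernel over an a priori basis set and a finite Newton method, but it gives no formulas. The Python below is a minimal sketch under stated assumptions. It builds a kernel matrix between all training points and a small basis set, so the matrix is rectangular. It then fits weights with a generalized-Newton solver for the squared-hinge (L2) SVM objective, which is one standard finite Newton formulation. The Gaussian kernel, the numeric feature vectors, the parameters `gamma` and `nu`, and every function name are hypothetical and are not taken from the study.

```python
import numpy as np


def rectangular_kernel(A, B, gamma=0.5):
    """Gaussian kernel between rows of A (m x d) and a basis set B (k x d).

    Assumption: a Gaussian kernel; the study's kernel may differ. The result is
    m x k, so model size is set by the basis rather than by the training set.
    """
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def finite_newton_l2svm(K, y, nu=10.0, tol=1e-8, max_iter=50):
    """Minimise 0.5*||w||^2 + 0.5*nu*||max(0, 1 - y * (X @ w))||^2, X = [K, -1].

    y holds +1 (native) or -1 (decoy). Each iteration takes a Newton step with
    the generalised Hessian I + nu * Xa^T Xa over margin-violating rows Xa,
    damped by an Armijo backtracking line search.
    """
    X = np.hstack([K, -np.ones((K.shape[0], 1))])
    w = np.zeros(X.shape[1])

    def objective(v):
        r = np.maximum(0.0, 1.0 - y * (X @ v))
        return 0.5 * v @ v + 0.5 * nu * r @ r

    for _ in range(max_iter):
        r = 1.0 - y * (X @ w)
        active = r > 0                      # rows currently violating the margin
        Xa = X[active]
        grad = w - nu * Xa.T @ (y[active] * r[active])
        if np.linalg.norm(grad) < tol:
            break
        hess = np.eye(X.shape[1]) + nu * Xa.T @ Xa
        step = -np.linalg.solve(hess, grad)
        t, f0, slope = 1.0, objective(w), grad @ step
        while objective(w + t * step) > f0 + 1e-4 * t * slope and t > 1e-10:
            t *= 0.5                        # backtrack until sufficient decrease
        w = w + t * step
    return w[:-1], w[-1]                    # basis weights u and offset b


def fitness(x, basis, u, b, gamma=0.5):
    """Score rows of x; a positive score means 'predicted native'."""
    return rectangular_kernel(x, basis, gamma) @ u - b
```

As a usage example, take two toy point clouds standing in for native (+1) and decoy (-1) feature vectors, with a basis of 10 natives and 20 decoys chosen in advance. The fit should then classify nearly all training points correctly. For scale, the abstract's own figures give 480 + 3,200 = 3,680 basis vectors, and 408 of 428 natives is a 95.3% correct rate.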
Chromosome Conformation Capture (3C)-based technologies are used to detect pairs of loci, located on the same chromosome or on different chromosomes, that are in close spatial proximity. Several biases may affect the 3C-based experimental procedure, including the non-alternative primer design and the distance between restriction sites. To overcome these biases, we propose a general, novel constrained self-avoiding chromatin (C-SAC) model to remove non-specific physical interactions. We also develop a sequential importance sampling algorithm to rebuild 3D chromatin structures from 5C experiments. We apply this approach to the ENCODE region ENm008, the α-globin gene domain on human chromosome 16, in the lymphoblastoid cell line GM12878 and the chronic myelogenous leukemia cell line K562. Using the random ensemble generated by the C-SAC model, we successfully removed non-specific physical interactions from the 5C reads for both cell lines. We found that the α-globin gene domain is a compact globule in GM12878 cells, whereas it forms two separate domains in K562 cells. We not only recover most of the proximity interactions indicated by 5C, but also find new proximity interactions that 5C experiments cannot detect. Compared with ChIA-PET measurements, we obtained 77% coverage of interactions. Based on the ensemble of reconstructed 3D conformations, we also propose a mechanism that may explain why the α-globin gene is inactive in GM12878 cells and active in K562 cells.
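The abstract only names the sequential importance sampling used by the C-SAC model. The sketch below illustrates the general idea with a classic Rosenbluth-style sampler. A chain is grown bead by bead on a simple cubic lattice, and each bead is placed uniformly on a free neighbouring site. At each step, the chain's importance weight is multiplied by the number of free sites that were available. The lattice, the absence of 5C-derived constraints and of confinement, the contact cutoff, and all function names are assumptions for illustration; this is not the published C-SAC algorithm.

```python
import numpy as np

# Six nearest-neighbour moves on a simple cubic lattice (illustrative assumption).
NEIGHBOURS = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                       [0, -1, 0], [0, 0, 1], [0, 0, -1]])


def grow_chain(n_beads, rng):
    """Grow one self-avoiding chain bead by bead (Rosenbluth-style SIS).

    Returns (coords, log_weight); log_weight is -inf if the chain got trapped.
    """
    coords = [np.zeros(3, dtype=int)]
    occupied = {(0, 0, 0)}
    log_w = 0.0
    for _ in range(n_beads - 1):
        free = [coords[-1] + d for d in NEIGHBOURS
                if tuple(coords[-1] + d) not in occupied]
        if not free:
            return np.array(coords), -np.inf
        log_w += np.log(len(free))           # importance-weight correction
        nxt = free[rng.integers(len(free))]  # uniform proposal among free sites
        coords.append(nxt)
        occupied.add(tuple(nxt))
    return np.array(coords), log_w


def contact_frequencies(n_beads, n_chains, cutoff=1.0, seed=0):
    """Weighted frequency with which beads i and j lie within `cutoff` units."""
    rng = np.random.default_rng(seed)
    log_ws, contacts = [], []
    for _ in range(n_chains):
        xyz, lw = grow_chain(n_beads, rng)
        if np.isfinite(lw):                  # discard trapped chains
            d = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
            log_ws.append(lw)
            contacts.append(d <= cutoff)
    w = np.exp(np.array(log_ws) - max(log_ws))  # stabilised importance weights
    w /= w.sum()
    freq = np.zeros((n_beads, n_beads))
    for wi, c in zip(w, contacts):
        freq += wi * c
    return freq
```

Weighted contact frequencies from an ensemble of such unconstrained chains can serve as a random (null) ensemble, against which observed interaction frequencies are compared.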

    Amplification of Hofmeister Effect by Alcohols

    No full text
    We have demonstrated that the Hofmeister effect can be amplified by adding alcohols to aqueous solutions. The lower critical solution temperature behavior of poly(<i>N</i>-isopropylacrylamide) was employed as the model system to study this amplification. The ability of the alcohols to amplify the Hofmeister effect increases in the series methanol < ethanol < 1-propanol < 2-propanol for the monohydric alcohols and in the series D-sorbitol ≈ xylitol ≈ meso-erythritol < glycerol < ethylene glycol < methanol for the polyhydric alcohols. Our study reveals that the relative extent of amplification of the Hofmeister effect is determined by the stability of the water/alcohol complex, which depends strongly on the chemical structure of the alcohol. A more stable solvent complex, formed via stronger hydrogen bonds, can more effectively differentiate the anions through anion–solvent complex interactions, resulting in a stronger amplification of the Hofmeister effect. This study provides a method, other than varying salt concentration, for tuning the relative strength of the Hofmeister effect.

    The nuclear architecture of budding yeast and the mC-SAC model genome.

    No full text
    <p><b>(A)</b> Schematic representation of the nucleus and nuclear landmarks of budding yeast, with their corresponding coordinates and dimensions (not to scale). <b>(B)</b> An example 3D structure of an mC-SAC genome confined in the cell nucleus. <b>(C)</b> Correlation between genome-wide chromatin conformation capture interaction frequencies and interaction frequencies measured from the fully-constrained ensemble of model yeast genomes. <b>(D)</b> Heat map of interaction frequencies measured in the fully-constrained ensemble. Darker color indicates higher interaction frequency. <b>(E)</b> Heat map of interaction frequencies from the experimental measurements [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005658#pcbi.1005658.ref016" target="_blank">16</a>]. <b>(F)</b> Heat map of simulated interactions from the fully-constrained ensemble, showing only interactions between restriction fragments of the genome-wide 3C experiment [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005658#pcbi.1005658.ref016" target="_blank">16</a>] for direct comparison. <b>(G)</b> Heat map of interaction frequencies of the fully-constrained ensemble, corrected by removing the expected interaction frequencies obtained from an ensemble generated using only nuclear confinement and excluded volume as constraints. <b>(H)</b> Heat map of interaction frequencies of the genome-wide 3C experiments, corrected by removing the expected interaction frequencies. <b>(I)</b> Correlation of interaction frequencies between the genome-wide 3C data and the fully-constrained ensemble, after removal of the expected interactions obtained from an ensemble generated using only nuclear confinement and excluded volume as constraints.</p>
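Panels (C), (F), (G) and (I) compare contact maps, optionally masked to a subset of pairs, and correct them with a null ensemble. Below is a minimal numpy sketch of those two operations. The caption does not say how expected frequencies are "removed". This sketch assumes an observed/expected ratio, computed by dividing by the null ensemble's frequencies. The helper names and the pseudocount are hypothetical.

```python
import numpy as np


def upper_triangle(m, k=1):
    """Flatten the off-diagonal upper triangle of a symmetric contact map."""
    i, j = np.triu_indices_from(m, k=k)
    return m[i, j]


def expected_corrected(freq, null_freq, pseudocount=1e-6):
    """Observed / expected ratio against a null (confinement + excluded volume) ensemble.

    Assumption: 'removal of expected frequencies' is taken to be a ratio here.
    """
    return (freq + pseudocount) / (null_freq + pseudocount)


def map_correlation(a, b, mask=None):
    """Pearson r between two contact maps over off-diagonal pairs.

    `mask` (same shape, truthy where kept) restricts the comparison, e.g. to
    pairs of restriction fragments actually probed by the 3C experiment.
    """
    x, y = upper_triangle(a), upper_triangle(b)
    if mask is not None:
        keep = upper_triangle(mask).astype(bool)
        x, y = x[keep], y[keep]
    return np.corrcoef(x, y)[0, 1]
```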

    Interactions among fragile sites and their distribution in the budding yeast genome.

    No full text
    <p><b>(A)</b> Mean interaction frequency between fragile sites (thick green line) and the histogram of mean interaction frequencies within 10,000 sets of 95 random sites. <b>(B)</b> The distribution of fragile sites across the 16 chromosomes. <b>(C)</b> Heat map of interaction frequencies between fragile sites as computed from the fully-constrained ensemble. The length of each chromosome is proportional to the number of fragile sites it contains. Apart from those on the diagonal, all high-frequency interactions (red) are predicted to occur between different chromosomes. <b>(D)</b> The distribution of fragile sites by their genomic distances to the corresponding centromeres.</p>
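Panel (A) compares the mean interaction frequency among the 95 fragile sites with the same statistic for 10,000 random sets of 95 sites. Below is a sketch of such an empirical, permutation-style test. It assumes that sites are given as row indices into a symmetric bin-level frequency matrix. It also uses a one-sided p-value with the usual +1 correction. Neither of these choices is stated in the caption, and the function names are hypothetical.

```python
import numpy as np


def mean_pairwise_frequency(freq, sites):
    """Mean interaction frequency over all distinct pairs within `sites`."""
    sub = freq[np.ix_(sites, sites)]
    i, j = np.triu_indices(len(sites), k=1)
    return sub[i, j].mean()


def permutation_test(freq, sites, n_sets=10_000, seed=0):
    """Compare the observed mean with means of random site sets of equal size."""
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_frequency(freq, sites)
    null = np.array([
        mean_pairwise_frequency(
            freq, rng.choice(freq.shape[0], size=len(sites), replace=False))
        for _ in range(n_sets)
    ])
    # One-sided empirical p-value with the +1 correction.
    p = (1 + np.sum(null >= observed)) / (n_sets + 1)
    return observed, null, p
```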

    Relationship between genomic and spatial positions of eight genes.

    No full text
    <p><b>(A)</b> The correlation between the relative positions of these genes measured by electron microscopy [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005658#pcbi.1005658.ref006" target="_blank">6</a>] (<i>x</i>-axis) and by the fully-constrained ensemble (<i>y</i>-axis). <b>(B)</b> The relationship between the experimentally measured relative spatial positions of the important genes and their distances to the corresponding centromeres. The two genes that correlate poorly are located on Chr12 and near a telomere, and are subject to nucleolus and telomere attachment constraints, respectively. <b>(C)</b> The same relationship as seen in the computationally generated fully-constrained ensemble. <b>(D)</b> Heat map of interaction frequencies of Artificial Genome 1 (AG1), with 16 chromosomes in total. <b>(E)</b> Heat map of interaction frequencies of Artificial Genome 2 (AG2), with 12 chromosomes in total. <b>(F)</b> The correlation between the relative positions of the genes measured experimentally and measured from the AG1 (blue) and AG2 (red) ensembles. <b>(G)</b> The relationship between the relative positions of the genes measured from the AG1 (blue) and AG2 (red) ensembles and their distances to the corresponding centromeres. In the artificial nuclei, these genes' distances to their centromeres differ from each other. They also all differ from the corresponding distances in real yeast nuclei, because we assign random genomic coordinates to the centromeres in the artificial nuclei. <b>(H)</b> The correlation between the relative positions of the genes measured by electron microscopy [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005658#pcbi.1005658.ref006" target="_blank">6</a>] and by the “with only centromere” ensemble. 
<b>(I)</b> The same correlation between the positions measured by electron microscopy [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005658#pcbi.1005658.ref006" target="_blank">6</a>] and in the “without centromere” ensemble.</p>

    Detection and Quantification of Bacterial Spoilage in Milk and Pork Meat Using MALDI-TOF-MS and Multivariate Analysis

    No full text
    Microbiological safety is one of the cornerstones of quality control in the food industry. Identification and quantification of spoilage bacteria in pasteurized milk and meat currently rely on accurate and sensitive yet time-consuming techniques that give retrospective values for microbial contamination. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS), a proven technique for protein and peptide identification and quantification, may be a valuable alternative for the rapid assessment of microbial spoilage. In this work we therefore developed MALDI-TOF-MS as a novel analytical approach for food assessment that, when combined with chemometrics, allows the detection and quantification of spoilage bacteria in milk and pork meat. To develop this approach, pasteurized milk and raw pork meat samples were left to spoil naturally at 15 °C and at room temperature, respectively. Samples were collected for MALDI-TOF-MS analysis (which took 4 min per sample) at regular time intervals throughout the spoilage process. Reference total viable counts were determined concurrently by traditional microbiological methods (which took 2 days). Multivariate statistical techniques such as principal component discriminant function analysis, canonical correlation analysis, partial least-squares (PLS) regression, and kernel PLS (KPLS) were used to analyze the data. MALDI-TOF-MS combined with PLS or KPLS gave excellent bacterial quantification for both milk and meat spoilage, with typical root mean squared errors of prediction on test spectra between 0.53 and 0.79 log unit. Overall, these novel findings strongly indicate that MALDI-TOF-MS combined with chemometric approaches would be a useful adjunct for routine use in the milk and meat industry, as a fast and accurate method for detecting and quantifying viable bacteria.
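PLS regression maps each spectrum to a predicted log10 total viable count. The quoted 0.53 to 0.79 log-unit figures are root mean squared errors of prediction (RMSEP) on test spectra. The numpy sketch below implements PLS1 with the NIPALS algorithm and adds an RMSEP helper. It is a generic illustration, not the authors' pipeline, and it does not cover kernel PLS. It assumes that spectra have already been preprocessed into a samples-by-m/z-bins matrix. The function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 regression via NIPALS. Returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        c = (yk @ t) / tt               # y loading
        Xk = Xk - np.outer(t, p)        # deflate X and y
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # B = W (P^T W)^-1 q
    return coef, x_mean, y_mean


def pls1_predict(X, coef, x_mean, y_mean):
    """Predict y (e.g. log10 viable counts) for new spectra."""
    return (X - x_mean) @ coef + y_mean


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction, in the units of y (log units here)."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))
```

In practice, the number of components would be chosen by cross-validation on the training spectra rather than fixed in advance.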

    tRNA gene interactions and differentiating biologically specific interactions from non-specific interactions arising from polymer effects.

    No full text
    <p><b>(A)</b> Distribution of frequencies of interactions enriched in the fully-constrained ensemble but absent from the genome-wide 3C data. The 14 novel interactions with significantly higher interaction frequencies are circled. The <i>x</i>-axis values are the interaction frequencies, and the <i>y</i>-axis values are the numbers of interactions observed at each frequency. <b>(B)</b> Histogram of the enrichment factor of RNAPIII and TFIIS binding. The mean enrichment of the predicted interactions is shown as the solid green line, along with the histogram of enrichment of 10,000 random sets of 14 interactions. <b>(C)</b> Distribution of mean spatial distances between tRNA genes, grouped according to their genomic distances to centromeres. <b>(D)</b> The distribution of tRNA genes by their genomic distances to the corresponding centromeres. <b>(E)</b> Interaction propensities of the genome-wide 3C data (<i>x</i>-axis) and the fully-constrained ensemble (<i>y</i>-axis), calculated using a random ensemble as the null model. Interactions enriched in the genome-wide 3C data over the fully-constrained ensemble are enclosed in the black circle.</p>
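Panel (A) singles out interactions that are frequent in the fully-constrained ensemble yet absent from the 3C data. The caption does not give the selection rule, so the sketch below applies a hypothetical z-score cut to the model frequencies of the absent pairs. The names `detected_3c` and `z_cut` and the threshold value are illustrative assumptions, not the study's criterion.

```python
import numpy as np


def novel_high_frequency_pairs(freq_model, detected_3c, z_cut=3.0):
    """Pairs absent from the 3C data whose model frequency is an upper outlier.

    freq_model  : symmetric (n x n) interaction frequencies from the ensemble
    detected_3c : boolean (n x n) mask, True where 3C reports an interaction
    z_cut       : hypothetical threshold; the study's own criterion may differ
    """
    i, j = np.triu_indices_from(freq_model, k=1)
    absent = ~detected_3c[i, j]
    f = freq_model[i, j][absent]
    z = (f - f.mean()) / f.std()        # standardise among absent pairs only
    hit = z > z_cut
    pairs = list(zip(i[absent][hit].tolist(), j[absent][hit].tolist()))
    return pairs, f[hit]
```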

    Effects of confinement on the overall folding behavior of budding yeast genome.

    No full text
    <p><b>(A)</b> Overall correlation coefficient of the frequencies between genome-wide 3C measurements and the modeled ensemble. As the nuclear size increases, the correlation generally decreases. <b>(B)</b> Effects of nuclear size and chromosomal arm length on the median distances between telomeres. Shown are the relationships between arm length and median telomere distance at different nuclear sizes for the fully-constrained ensemble, with different telomeres as references. The two linear regimes merge into one as <i>D</i> increases from 2 <i>μ</i>m to 4 <i>μ</i>m and to 16 <i>μ</i>m.</p>
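Panels (A) and (B) vary the nuclear size <i>D</i>. The toy sketch below shows the basic geometric effect of confinement. It grows random chains whose beads must stay inside a sphere of diameter <i>D</i>, redrawing any proposal that leaves the sphere. It then reports the median end-to-end distance over an ensemble. Per-step rejection of this kind is a simple illustration rather than an unbiased sampler. It also omits centromere, telomere and nucleolus constraints. Treating <i>D</i> as a diameter in μm, the 0.1 μm bond length, and all names are assumptions.

```python
import numpy as np


def confined_chain(n_beads, diameter, bond=0.1, rng=None, max_tries=1000):
    """Grow a random chain whose beads all stay inside a sphere of given diameter.

    Each new bead is placed `bond` units from the previous one in a random
    direction; proposals outside the sphere are redrawn. This per-step
    rejection is illustrative only, not an unbiased sampler.
    """
    rng = np.random.default_rng() if rng is None else rng
    radius = diameter / 2.0
    xyz = np.zeros((n_beads, 3))            # first bead at the sphere's centre
    for k in range(1, n_beads):
        for _ in range(max_tries):
            v = rng.normal(size=3)
            cand = xyz[k - 1] + bond * v / np.linalg.norm(v)
            if np.linalg.norm(cand) <= radius:
                xyz[k] = cand
                break
        else:
            raise RuntimeError("could not place bead inside the sphere")
    return xyz


def median_end_distance(n_beads, diameter, n_chains=200, seed=0):
    """Median distance between the two chain ends across an ensemble."""
    rng = np.random.default_rng(seed)
    dists = [np.linalg.norm(c[-1] - c[0])
             for c in (confined_chain(n_beads, diameter, rng=rng)
                       for _ in range(n_chains))]
    return float(np.median(dists))
```

Because every bead lies inside the sphere, no pairwise distance in an ensemble can exceed <i>D</i>. This hard geometric bound becomes less limiting as <i>D</i> grows.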

    Spatial organization of the budding yeast genome in the cell nucleus and identification of specific chromatin interactions from multi-chromosome constrained chromatin model

    No full text
    <div><p>Nuclear landmarks and biochemical factors play important roles in the organization of the yeast genome. The interaction pattern of budding yeast, as measured by genome-wide 3C studies, is largely recapitulated by model polymer genomes subject to landmark constraints. However, the origin of inter-chromosomal interactions, the specific roles of individual landmarks, and the roles of biochemical factors in yeast genome organization remain unclear. Here we describe a multi-chromosome constrained self-avoiding chromatin model (mC-SAC) to gain an understanding of budding yeast genome organization. With significantly improved sampling of genome structures, our model accurately captures both intra- and inter-chromosomal interaction patterns from genome-wide 3C studies, at higher resolution than previous studies. We show that nuclear confinement is a key determinant of intra-chromosomal interactions, and that centromere tethering is responsible for inter-chromosomal interactions. In addition, important genomic elements such as fragile sites and tRNA genes are found to be clustered spatially, largely due to centromere tethering. We uncovered previously unknown interactions that were not captured by genome-wide 3C studies; these are enriched in tRNA genes and in RNAPIII and TFIIS binding. Moreover, we identified specific high-frequency genome-wide 3C interactions that are not accounted for by polymer effects under landmark constraints. These interactions are enriched in important genes and likely play biological roles.</p></div>

    Portable, Quantitative Detection of <i>Bacillus</i> Bacterial Spores Using Surface-Enhanced Raman Scattering

    No full text
    Portable, rapid detection of pathogenic bacteria such as <i>Bacillus</i> is highly desirable for safety in food manufacture and in view of the current heightened risk of biological terrorism. Surface-enhanced Raman scattering (SERS) is becoming the preferred analytical technique for bacterial detection owing to its speed of analysis and high sensitivity. However, in seeking methods offering the lowest limits of detection, current research has tended toward highly confocal, microscopy-based analysis, which requires somewhat bulky instrumentation and precisely synthesized SERS substrates. By contrast, in this study we have improved SERS for bacterial analyses using silver colloidal substrates, which are easily and cheaply synthesized in bulk and which, as we demonstrate, permit analysis with portable instrumentation. All analyses were conducted in triplicate to assess the reproducibility of this approach, which was excellent. We demonstrate that SERS can rapidly detect and quantify the dipicolinate (DPA) biomarker for <i>Bacillus</i> spores at 5 ppb (29.9 nM). This level is significantly lower than those previously reported for SERS and well below the infective dose of 10<sup>4</sup> <i>B. anthracis</i> cells for inhalation anthrax. Finally, we show the potential of multivariate data analysis to improve detection levels in complex DPA extracts from viable spores.
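The two concentration figures are consistent with each other. Dipicolinic acid (pyridine-2,6-dicarboxylic acid, C7H5NO4) has a molar mass of about 167.12 g/mol. Taking 5 ppb in dilute aqueous solution as 5 μg/L, it corresponds to roughly 29.9 nM. The short Python sketch below performs that conversion. For quantification, it also fits a linear calibration and estimates a limit of detection as 3.3 × residual standard deviation / slope. This is one common convention, not necessarily the one used in this study, and the function names are hypothetical.

```python
import numpy as np

DPA_MOLAR_MASS = 167.12  # g/mol, pyridine-2,6-dicarboxylic acid (C7H5NO4)


def ppb_to_nanomolar(ppb, molar_mass):
    """Convert ppb (taken as ug/L for dilute aqueous samples) to nmol/L."""
    grams_per_litre = ppb * 1e-6
    return grams_per_litre / molar_mass * 1e9


def calibration_lod(conc, signal):
    """Linear calibration; LOD = 3.3 * residual SD / slope (one common convention)."""
    conc, signal = np.asarray(conc, float), np.asarray(signal, float)
    slope, intercept = np.polyfit(conc, signal, 1)
    resid = signal - (slope * conc + intercept)
    sd = np.std(resid, ddof=2)           # two fitted parameters
    return slope, intercept, 3.3 * sd / slope
```

For example, `ppb_to_nanomolar(5, DPA_MOLAR_MASS)` gives about 29.92 nM, which rounds to the 29.9 nM quoted above.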