    Cooperativity among Short Amyloid Stretches in Long Amyloidogenic Sequences

    Amyloid fibrillar aggregates of polypeptides are associated with many neurodegenerative diseases. Short peptide segments in protein sequences may trigger aggregation. Identifying these stretches and examining their behavior in longer protein segments is critical for understanding these diseases and developing potential therapies. In this study, we combined machine learning and structure-based energy evaluation to examine and predict amyloidogenic segments. Our feature selection method discovered that windows consisting of long amino acid segments of ∼30 residues, instead of the commonly used short hexapeptides, provided the highest accuracy. Weighted contributions of an amino acid at each position in a 27-residue window revealed three cooperative regions of short stretches, resembling the β-strand-turn-β-strand motif in Aβ peptide amyloid and the β-solenoid structure of the HET-s(218–289) prion (C). Using an in-house energy evaluation algorithm, the interaction energy between two short stretches in a long segment is computed and incorporated as an additional feature. The algorithm successfully predicted and classified amyloid segments with an overall accuracy of 75%. Our study revealed that genome-wide amyloid segments depend not only on short high-propensity stretches but also on nearby residues.
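
    The 27-residue windowing mentioned in this abstract can be illustrated with a minimal sketch. It only shows how fixed-width windows might be cut from a sequence; the function name `sliding_windows` and the toy sequence are assumptions for illustration, and the authors' feature extraction and energy evaluation are not reproduced here.

```python
def sliding_windows(sequence, width=27):
    """Yield (start_index, window) for every full-length window in the sequence."""
    if width <= 0:
        raise ValueError("width must be positive")
    # A sequence shorter than the window yields nothing.
    for start in range(len(sequence) - width + 1):
        yield start, sequence[start:start + width]


# Hypothetical 30-residue toy sequence, not taken from the paper.
toy = "MKVLAAGIVGLLLAQPSFAADKKEGYSVNA"
windows = list(sliding_windows(toy))
for start, window in windows:
    print(start, window)
```

    Each window would then be turned into position-specific features before classification; that step depends on details the abstract does not give.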

    Spatial and topological organization of DNA chains induced by gene co-localization

    Transcriptional activity has been shown to relate to the organization of chromosomes in the eukaryotic nucleus and in the bacterial nucleoid. In particular, highly transcribed genes, RNA polymerases and transcription factors gather into discrete spatial foci called transcription factories. However, the mechanisms underlying the formation of these foci and the resulting topological order of the chromosome remain to be elucidated. Here we consider a thermodynamic framework based on a worm-like chain model of chromosomes where sparse designated sites along the DNA are able to interact whenever they are spatially close. This is motivated by recurrent evidence that there exist physical interactions between genes that operate together. Three important results come out of this simple framework. First, the resulting formation of transcription foci can be viewed as a micro-phase separation of the interacting sites from the rest of the DNA. In this respect, a thermodynamic analysis suggests that transcription factors are appropriate candidates for mediating the physical interactions between genes. Next, numerical simulations of the polymer reveal a rich variety of phases that are associated with different topological orderings, each providing a way to increase the local concentrations of the interacting sites. Finally, the numerical results show that both one-dimensional clustering and periodic location of the binding sites along the DNA, which have been observed in several organisms, make the spatial co-localization of multiple families of genes particularly efficient. Comment: Figures and Supplementary Material freely available on http://dx.doi.org/10.1371/journal.pcbi.100067

    Prediction of Protein Modification Sites of Pyrrolidone Carboxylic Acid Using mRMR Feature Selection and Analysis

    Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor of PCA modification sites based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features belonging to 7 kinds of protein properties to predict the modification sites, including sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acid during evolution, propensity of amino acid to be conserved at protein-protein interface and protein surface, and deviation of side chain carbon atom number. Among these 727 features, 244 were selected by mRMR and IFS as the optimized features for the prediction, with which the prediction model achieved a maximum MCC of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that the features derived from PCA's surrounding sites contributed more to the determination of PCA sites than other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of PCA formation and guide relevant experimental validations.
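
    The MCC reported above is the Matthews correlation coefficient, computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula; the function name and the example counts are hypothetical and are not the paper's data.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts chosen only to exercise the formula.
example = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(round(example, 4))
```

    MCC ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect prediction), which makes it a common choice when modified and unmodified sites are imbalanced.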

    Position-Specific Analysis and Prediction of Protein Pupylation Sites Based on Multiple Features

    A study of carboxylic ester hydrolases: structural classification, properties, and database

    The carboxylic ester hydrolases (CEHs) are enzymes that hydrolyze an ester bond to form a carboxylic acid and an alcohol. They are among the enzyme groups most explored industrially, with applications in the food, flavor, pharmaceutical, organic synthesis, and detergent industries. We classified CEHs into families and clans according to their amino acid sequences (primary structures) and three-dimensional structures (tertiary structures). Our work has established the systematic structural classification of the CEHs. Primary structures of family members are similar to each other, and their active sites and reaction mechanisms are conserved. The tertiary structures of members of each clan, which is composed of different families, remain very similar, although amino acid sequences of members of different families are not similar. CEHs were divided into 127 families by use of BLAST, with 67 families being grouped into seven clans. Multiple sequence alignment and tertiary structure superposition were used, and active sites and reaction mechanisms were analyzed. Python and Shell scripts were implemented to automate the process of comparing CEH primary and tertiary structures. A comprehensive database, CASTLE (CArboxylic eSTer hydroLasEs), may be constructed to provide the primary and tertiary structures of CEHs. This database would be available at www.castle.enzyme.iastate.edu and will be accessible to the entire biology community.

    Hierarchical representation for PPI sites prediction

    Background: Protein–protein interactions have pivotal roles in life processes, and aberrant interactions are associated with various disorders. Interaction site identification is key for understanding disease mechanisms and designing new drugs. Effective and efficient computational methods for PPI prediction are of great value given the overall cost of experimental methods. Promising results have been obtained using machine learning methods and deep learning techniques, but their effectiveness depends on protein representation and feature selection. Results: We define a new abstraction of the protein structure, called hierarchical representations, that considers and quantifies spatial and sequential neighboring among amino acids. We also investigate the effect of molecular abstractions using the Graph Convolutional Networks technique to classify amino acids as interface or non-interface. Our study takes into account three abstractions (hierarchical representations, contact map, and the residue sequence) and considers the eight functional classes of proteins extracted from the Protein–Protein Docking Benchmark 5.0. The performance of our method, evaluated using standard metrics, is compared to that obtained with some state-of-the-art protein interface predictors. The analysis of the performance values shows that our method outperforms the considered competitors when the considered molecules are structurally similar. Conclusions: The hierarchical representation can capture the structural properties that promote the interactions and can be used to represent proteins with unknown structures by codifying only their sequential neighboring. Analyzing the results, we conclude that classes should be arranged according to their architectures rather than functions.

    Three-dimensional Folding of Eukaryotic Genomes

    Chromatin packages eukaryotic genomes via a hierarchical series of folding steps, encrypting multiple layers of epigenetic information, which are capable of regulating nuclear transactions in response to complex signals in the environment. Beyond the 1-dimensional chromatin landscape, such as nucleosome positioning and histone modifications, little is known about secondary chromatin structures and their functional consequences for transcriptional regulation and DNA replication. The family of chromosomal conformation capture (3C) assays has revolutionized our understanding of large-scale chromosome folding with the ability to measure relative interaction probability between genomic loci in vivo. However, the suboptimal resolution of typical 3C techniques leaves the level of nucleosome interactions or 30 nm structures inaccessible, and also restricts their applicability to studying gene-level chromatin folding in small-genome organisms such as yeasts, worms, and plants. To uncover this “blind spot” of chromatin organization, I developed an innovative method called Micro-C and an improved protocol, Micro-C XL, which enable mapping of chromatin structures at all scales, from single nucleosomes to the entire genome. Several fine-scale aspects of chromatin folding in budding and fission yeasts have been identified by Micro-C, including histone tail-mediated tri-/tetra-nucleosome stackings, gene crumples/globules, and chromosomally-interacting domains (CIDs). CIDs are spatially demarcated by boundaries, which colocalize with the promoters of actively transcribed genes and histone marks for active transcription or turnover. The level of chromatin compaction is regulated in both transcription-dependent and transcription-independent manners: either perturbations of transcription or mutations of chromatin regulators strongly affect global chromatin folding. Taken together, Micro-C further reveals chromatin folding behaviors below the sub-kilobase scale and opens an avenue to study chromatin organization in many biological systems.

    Alternative Splicing and Protein Structure Evolution

    In recent years, the amount of available experimental data has risen dramatically across many areas of biology. For the first time, these data allow a detailed analysis of how cellular components such as genes and proteins function, of how they are linked in cellular networks, and of their evolutionary history. Bioinformatics in particular plays an important role here in preparing the data and interpreting them biologically. This doctoral thesis investigates two important areas of current bioinformatics research: the analysis of protein structure evolution and of similarities between protein structures, and the analysis of alternative splicing, an integral process in eukaryotic cells that contributes to functional diversity. In particular, this work introduces the idea of a combined analysis of the two mechanisms (structure evolution and splicing). We show that a combined view yields new insights into how structure evolution and alternative splicing, as well as a coupling of the two mechanisms, contribute to functional and structural complexity in higher organisms. The methods, hypotheses, and results presented in the thesis can contribute to our understanding of how structure evolution and alternative splicing operate in the emergence of complex organisms, so that these two traditionally separate areas of bioinformatics can benefit from each other in the future.

    Nondestructive Multivariate Classification of Codling Moth Infested Apples Using Machine Learning and Sensor Fusion

    Apples are the most consumed fruit in the United States. The increasing market demand for high-quality apples and the need for fast and effective quality evaluation techniques have prompted research into the development of nondestructive evaluation methods. Codling moth (CM), Cydia pomonella L. (Lepidoptera: Tortricidae), is the most devastating pest of apples. Therefore, this dissertation focuses on the development of nondestructive methods for the detection and classification of CM-infested apples. Objective one of this study was to identify and characterize the source of detectable vibro-acoustic signals coming from CM-infested apples. A novel approach was developed to correlate larval activities with low-frequency vibro-acoustic signals, by capturing the larval activities using a digital camera while simultaneously registering the signal patterns observed by contact piezoelectric sensors on the apple surface. While larval crawling was characterized by low-amplitude, higher-frequency (around 4 Hz) signals, the chewing signals had greater amplitude and lower frequency (around 1 Hz). In objectives two and three, vibro-acoustic and acoustic impulse methods were developed to classify CM-infested and healthy apples. In the first approach, the identified vibro-acoustic patterns from the infested apples were used for the classification of the CM-infested and healthy signal data. The classification accuracy was as high as 95.94% for 5 s signaling time. For the acoustic impulse method, a knocking test was performed to measure the vibration/acoustic response of the infested apple fruit to a pre-defined impulse in comparison to that of a healthy sample. The classification rate obtained was 99% for a short signaling time of 60-80 ms. In objective four, shortwave near infrared hyperspectral imaging (SWNIR HSI) in the wavelength range of 900-1700 nm was applied to detect CM infestation at the pixel level for the three apple cultivars, reaching an accuracy of up to 97.4%. In objective five, the physicochemical characteristics of apples were predicted using the HSI method. The results showed correlation coefficients of prediction (Rp) of up to 0.90, 0.93, 0.97, and 0.91 for SSC, firmness, pH and moisture content, respectively. Furthermore, the effect of long-term storage (20 weeks) at three different storage conditions (0 °C, 4 °C, and 10 °C) on CM infestation and the detectability of the infested apples was studied. At a constant storage temperature, the detectability of infested samples remained the same for the first three months, then improved in the fourth month, followed by a decrease until the end of storage. Finally, a sensor data fusion method was developed that improved classification performance compared to the individual methods. These findings indicate a high potential of acoustic and NIR HSI methods for detecting and classifying CM infestation in different apple cultivars.
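
    The frequency contrast described in this abstract (crawling around 4 Hz, chewing around 1 Hz) can be illustrated by estimating the dominant frequency of a signal. The sketch below is an assumption-laden illustration, not the dissertation's method: it uses a naive discrete Fourier transform on synthetic sine waves, and the sample rate, duration, and helper names are hypothetical.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC component (naive DFT)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def sine(freq_hz, sample_rate=32, seconds=4):
    """Synthetic test tone; stands in for a recorded sensor trace."""
    n = sample_rate * seconds
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


crawl_like = dominant_frequency(sine(4.0), 32)  # synthetic 4 Hz tone
chew_like = dominant_frequency(sine(1.0), 32)   # synthetic 1 Hz tone
print(crawl_like, chew_like)
```

    Real piezoelectric recordings would be noisy and non-stationary, so a practical pipeline would likely need windowing, filtering, and amplitude features in addition to a single spectral peak.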