
    Complex networks theory for analyzing metabolic networks

    One of the main tasks of post-genomic informatics is to systematically investigate all molecules and their interactions within a living cell, so as to understand how these molecules and their interactions relate to the function of the organism; networks are an appropriate abstract description of all such interactions. In the past few years, great progress has been made in developing the theory of complex networks to reveal the organizing principles that govern the formation and evolution of various complex biological, technological, and social networks. This paper reviews the accomplishments in constructing genome-based metabolic networks and describes how the theory of complex networks is applied to analyze them. (Comment: 13 pages, 2 figures)
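    A minimal illustration of the kind of network statistics this review discusses, degree distribution and shortest path length, computed on a toy substrate graph. The metabolite names and edges below are illustrative, not a real genome-based reconstruction:

```python
from collections import deque

# Toy substrate graph: nodes are metabolites, edges link substrates and
# products that co-occur in a reaction (illustrative names only).
edges = [("glucose", "g6p"), ("g6p", "f6p"), ("f6p", "fbp"),
         ("fbp", "g3p"), ("g3p", "pyruvate"), ("g6p", "6pg"),
         ("6pg", "ribulose5p"), ("glucose", "pyruvate")]

graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

# Degree distribution: the quantity whose heavy tail marks scale-free networks.
degree = {v: len(nb) for v, nb in graph.items()}

def shortest_path_len(src, dst):
    """BFS shortest path length, the basis of 'small world' diagnostics."""
    seen, q = {src}, deque([(src, 0)])
    while q:
        v, d = q.popleft()
        if v == dst:
            return d
        for w in graph[v]:
            if w not in seen:
                seen.add(w)
                q.append((w, d + 1))
    return None

print(degree)
print(shortest_path_len("glucose", "ribulose5p"))  # 3 hops via g6p and 6pg
```

    On a genome-scale network the same two quantities, computed over thousands of metabolites, are what reveal the power-law degree distribution and small-world property discussed in the review.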

    Insight into the Stability of Cross-β Amyloid Fibril from VEALYL Short Peptide with Molecular Dynamics Simulation

    Amyloid fibrils are found in many fatal neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, type II diabetes, and prion disease. The VEALYL short peptide from insulin has been confirmed to aggregate into amyloid-like fibrils. However, the aggregation mechanism of amyloid fibrils is poorly understood. Here, we used molecular dynamics simulation to analyze the stability of the VEALYL hexamer. The statistical results indicate that hydrophobic residues play key roles in stabilizing the VEALYL hexamer. Single-point and two-linkage mutants confirmed that Val1, Leu4, and Tyr5 of VEALYL are key residues. The consistency of the results for the VEALYL oligomer suggests that the intermediate states might be the trimer (3-0) and the pentamer (3-2). These results provide insight into the aggregation mechanism of amyloid fibrils, and the methods can be used to study the stability of amyloid fibrils formed from other short peptides.

    Prediction of Protein Modification Sites of Pyrrolidone Carboxylic Acid Using mRMR Feature Selection and Analysis

    Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor of PCA modification sites based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features belonging to 7 kinds of protein properties, including sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acid during evolution, propensity of amino acid to be conserved at the protein-protein interface and protein surface, and deviation of side-chain carbon atom number. Among these 727 features, 244 were selected by mRMR and IFS as the optimized feature set for the prediction, with which the prediction model achieved a maximum MCC of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that features derived from the sites surrounding the PCA site contributed more to the determination of PCA sites than those from other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of PCA formation and guide relevant experimental validations.
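    The mRMR-plus-IFS pipeline described above can be sketched as follows. This is not the authors' code: it is a minimal illustration on synthetic data, using a plug-in mutual-information estimate, a greedy mRMR ranking, and leave-one-out 1-NN evaluation scored by MCC. The data sizes and feature counts are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 10 features; features 0 and 1 carry the signal.
n, d = 200, 10
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, d))
X[:, 0] += 2.0 * y          # informative feature
X[:, 1] -= 1.5 * y          # informative feature

def discretize(x, bins=5):
    """Quantile-bin a continuous feature for mutual-information estimation."""
    edges = np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(x, edges)

def mutual_info(a, b):
    """Plug-in mutual information between two discrete vectors (in nats)."""
    mi = 0.0
    for av in np.unique(a):
        pa = np.mean(a == av)
        for bv in np.unique(b):
            pab = np.mean((a == av) & (b == bv))
            if pab > 0:
                mi += pab * np.log(pab / (pa * np.mean(b == bv)))
    return mi

def mrmr_rank(Xb, y, k):
    """Greedy mRMR: maximize relevance to y minus mean redundancy to chosen set."""
    rel = np.array([mutual_info(Xb[:, j], y) for j in range(Xb.shape[1])])
    chosen = [int(np.argmax(rel))]
    while len(chosen) < k:
        rest = [j for j in range(Xb.shape[1]) if j not in chosen]
        scores = [rel[j] - np.mean([mutual_info(Xb[:, j], Xb[:, s]) for s in chosen])
                  for j in rest]
        chosen.append(rest[int(np.argmax(scores))])
    return chosen

def mcc(t, p):
    """Matthews correlation coefficient from a binary confusion matrix."""
    tp = np.sum((t == 1) & (p == 1)); tn = np.sum((t == 0) & (p == 0))
    fp = np.sum((t == 0) & (p == 1)); fn = np.sum((t == 1) & (p == 0))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def loo_1nn_mcc(X, y, cols):
    """Leave-one-out 1-NN prediction on a feature subset, scored by MCC."""
    Z = X[:, cols]
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    pred = y[np.argmin(dist, axis=1)]
    return mcc(y, pred)

Xb = np.column_stack([discretize(X[:, j]) for j in range(d)])
ranking = mrmr_rank(Xb, y, k=6)

# IFS: grow the feature set along the mRMR ranking, keep the best-scoring prefix.
scores = [loo_1nn_mcc(X, y, ranking[:k]) for k in range(1, len(ranking) + 1)]
best_k = int(np.argmax(scores)) + 1
print("ranking:", ranking, "best subset size:", best_k, "MCC: %.3f" % max(scores))
```

    The same ranking-then-prefix-search structure underlies the 727-to-244 feature reduction reported in the abstract, only at a far larger scale and with the authors' richer feature encodings.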

    Gene-Centric Characteristics of Genome-Wide Association Studies

    BACKGROUND: High-throughput genotyping chips have contributed greatly to genome-wide association (GWA) studies that identify novel disease susceptibility single nucleotide polymorphisms (SNPs). The high-density chips are designed using two different SNP selection approaches: the direct gene-centric approach, and the indirect quasi-random SNP or linkage disequilibrium (LD)-based tagSNP approaches. Although all these approaches can provide high genome coverage and ascertain variants in genes, it is not clear to what extent they capture common genic variants. It is also important to characterize and compare the differences between these approaches. METHODOLOGY/PRINCIPAL FINDINGS: In our study, using both the Phase II HapMap data and the disease variants extracted from OMIM, a gene-centric evaluation was first performed to assess the ability of the approaches to capture disease variants in the Caucasian population. The distribution patterns of SNPs were then characterized in genic regions, evolutionarily conserved introns and nongenic regions, ontologies, and pathways. The results show that, no matter which SNP selection approach is used, the current high-density SNP chips provide very high coverage in genic regions and can capture most known common disease variants within the HapMap framework. The results also show that the differences between the direct and the indirect approaches are relatively small: both have similar SNP distribution patterns across these gene-centric characteristics. CONCLUSIONS/SIGNIFICANCE: This study suggests that the indirect approaches not only have the advantage of high coverage but are also useful for studies focusing on various functional SNPs, either in genes or in the conserved regions that the direct approach targets. The study and the annotation of characteristics will be helpful for designing and analyzing GWA studies that aim to identify genetic risk factors involved in common diseases, especially variants in genes and conserved regions.

    Differences between Human Plasma and Serum Metabolite Profiles

    BACKGROUND: Human plasma and serum are widely used matrices in clinical and biological studies. However, different collection procedures and the coagulation cascade influence the concentrations of both proteins and metabolites in these matrices. The effects on metabolite concentration profiles have not been fully characterized. METHODOLOGY/PRINCIPAL FINDINGS: We analyzed the concentrations of 163 metabolites in plasma and serum samples collected simultaneously from 377 fasting individuals. To ensure data quality, 41 metabolites with low measurement stability were excluded from further analysis. In addition, plasma and corresponding serum samples from 83 individuals were re-measured in the same plates; the mean correlation coefficients (r) of all metabolites between the duplicates were 0.83 in plasma and 0.80 in serum, indicating significantly better stability of plasma compared to serum (p = 0.01). Metabolite profiles from plasma and serum were clearly distinct, with 104 metabolites showing significantly higher concentrations in serum. In particular, 9 metabolites showed relative concentration differences larger than 20%. Despite differences in absolute concentration between the two matrices, for most metabolites the overall correlation was high (mean r = 0.81±0.10), which reflects a proportional change in concentration. Furthermore, when two groups of individuals with different phenotypes were compared using both matrices, more metabolites with significantly different concentrations could be identified in serum than in plasma. For example, when 51 type 2 diabetes (T2D) patients were compared with 326 non-T2D individuals, 15 additional significantly different metabolites were found in serum, beyond the 25 common to both matrices. CONCLUSIONS/SIGNIFICANCE: Our study shows that reproducibility was good in both plasma and serum, and better in plasma. Furthermore, as long as the same blood preparation procedure is used, either matrix should generate similar results in clinical and biological studies. The higher metabolite concentrations in serum, however, may provide more sensitive results in biomarker detection.
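    The paired plasma-versus-serum comparison reduces to per-metabolite statistics on matched samples: a between-matrix correlation, a mean relative difference, and a paired test statistic. A hedged sketch on simulated paired data; the metabolite names, the 10% serum shift, and the noise level are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the paired design: 100 "individuals" measured in both
# plasma and serum for 5 hypothetical metabolites.
n = 100
metabolites = ["Met%d" % i for i in range(5)]
true_level = rng.lognormal(mean=1.0, sigma=0.5, size=(n, 5))
plasma = true_level * rng.normal(1.0, 0.05, size=(n, 5))
serum = true_level * 1.1 * rng.normal(1.0, 0.05, size=(n, 5))  # ~10% higher

rs, pcts = [], []
for j, name in enumerate(metabolites):
    r = np.corrcoef(plasma[:, j], serum[:, j])[0, 1]   # between-matrix correlation
    diff = serum[:, j] - plasma[:, j]
    pct = 100 * np.mean(diff / plasma[:, j])            # mean relative difference
    t = np.mean(diff) / (np.std(diff, ddof=1) / np.sqrt(n))  # paired t statistic
    rs.append(r); pcts.append(pct)
    print(f"{name}: r={r:.2f}, serum higher by {pct:.1f}%, paired t={t:.1f}")
```

    High r with a systematic positive shift is exactly the pattern the abstract reports: proportional agreement between matrices, but consistently higher serum concentrations.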

    Prediction of Deleterious Non-Synonymous SNPs Based on Protein Interaction Network and Hybrid Properties

    Non-synonymous SNPs (nsSNPs), also known as Single Amino acid Polymorphisms (SAPs), account for the majority of human inherited diseases. It is important to distinguish deleterious SAPs from neutral ones. Most traditional computational methods for classifying SAPs are based on sequential or structural features. However, these features cannot fully explain the association between a SAP and the observed pathophysiological phenotype. We believe a better rationale for deleterious SAP prediction is: if a SAP lies in a protein with important functions and severely changes the protein sequence and structure, it is more likely to be related to disease. We therefore established a method to predict deleterious SAPs based on both the protein interaction network and traditional hybrid properties. Each SAP is represented by 472 features, including sequential, structural, and network features. The Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) were applied to obtain the optimal feature set, and the Nearest Neighbor Algorithm (NNA) was used as the prediction model. In jackknife cross-validation, 83.27% of SAPs were correctly predicted when the optimized 263 features were used. The optimized predictor with 263 features was also tested on an independent dataset, where the accuracy was still 80.00%. In contrast, SIFT, a widely used predictor of deleterious SAPs based on sequential features, achieved a prediction accuracy of 71.05% on the same dataset. In our study, network features were found to be the most important for accurate prediction and significantly improved the prediction performance. Our results suggest that the protein interaction context could provide important clues to better elucidate a SAP's functional association. This research will facilitate post-genome-wide association studies.

    A Reliable Method to Remove Batch Effects Maintaining Group Differences in Lymphoma Methylation Case Study

    The amount of biological data is increasing, and its analysis is becoming one of the most challenging topics in the information sciences. Before starting an analysis, it is important to remove unwanted variability due to factors such as year of sequencing, laboratory conditions, and the use of different protocols. This is a crucial step: if the variability is not evaluated before the analysis of interest, the results may be misleading and the conclusions may not hold. The literature suggests several valid mathematical models, but experience shows that applying them to high-throughput data with a non-uniform study design is not straightforward and in many cases may introduce a false signal. It is therefore necessary to develop models that remove the effects that can negatively influence the study while preserving biological meaning. In this paper we report a new case study on lymphoma methylation data and propose a suitable pipeline for its analysis.
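    One common way to remove a batch effect while preserving group differences, in the spirit of the pipeline proposed here (the paper's actual pipeline is not reproduced; this is a generic linear-model sketch on simulated data), is to fit group and batch effects jointly per feature and subtract only the estimated batch component:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy methylation-like matrix: 60 samples x 4 features, a biological group
# of interest overlapping with two processing batches (illustrative setup).
n = 60
group = rng.integers(0, 2, n)        # biological group of interest
batch = rng.integers(0, 2, n)        # nuisance batch label
signal = 0.3 * group                 # true group difference
shift = 0.5 * batch                  # unwanted batch shift
Y = signal[:, None] + shift[:, None] + rng.normal(0, 0.1, size=(n, 4))

def remove_batch(Y, group, batch):
    """Fit y = intercept + group + batch per feature (least squares) and
    subtract only the estimated batch component, keeping group differences."""
    D = np.column_stack([np.ones(len(group)), group, batch]).astype(float)
    beta, *_ = np.linalg.lstsq(D, Y, rcond=None)   # coefficients per feature
    return Y - np.outer(batch, beta[2])            # drop batch effect only

Yc = remove_batch(Y, group, batch)

# The batch shift should shrink; the group difference should survive.
before_gap = abs(Y[batch == 1].mean() - Y[batch == 0].mean())
batch_gap = abs(Yc[batch == 1].mean() - Yc[batch == 0].mean())
group_gap = abs(Yc[group == 1].mean() - Yc[group == 0].mean())
print(f"batch gap: {before_gap:.3f} -> {batch_gap:.3f}, group gap: {group_gap:.3f}")
```

    Including the group covariate in the fit is the key design choice: naive per-batch mean-centering without it would partially remove the biological signal whenever groups are unevenly distributed across batches, which is exactly the non-uniform-design pitfall the abstract warns about.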