303 research outputs found
Predict collagen hydroxyproline sites using support vector machines.
Author address: School of Biosciences, University of Exeter, Exeter, United Kingdom. Type: Journal Article. This is a copy of an article published in the Journal of Computational Biology © 2009 copyright Mary Ann Liebert, Inc.; the Journal of Computational Biology is available online at: http://online.liebertpub.com. Collagen hydroxyproline is an important posttranslational modification because of its close relationship with various diseases and signaling activities. However, no study to date has constructed models for predicting collagen hydroxyproline sites. In this study, support vector machines with two kernel functions (the identity kernel function and the bio-kernel function) are used to construct such models. The models are built from 37 sequences collected from NCBI. Peptide data are generated by scanning the sequences with sliding windows of various sizes. Fivefold cross-validation is used for model evaluation. The best model has a specificity of 70% and a sensitivity of 90%
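The sliding-window step in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `proline_windows`, the choice to centre each window on a proline, and the `-` padding at the sequence ends are all assumptions, since the abstract only says that windows of various sizes scan the sequences.

```python
def proline_windows(sequence, half_width):
    """Return (position, peptide) pairs, one per proline (P) in the sequence.

    Each peptide has length 2 * half_width + 1 and is centred on the proline.
    Positions beyond either end of the sequence are filled with '-'
    (an assumed convention, not taken from the paper).
    """
    pad = "-" * half_width
    padded = pad + sequence + pad
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "P":
            # Residue i sits at padded index i + half_width, so this slice
            # is centred on it.
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


# Toy sequence, not one of the 37 NCBI sequences.
peptides = proline_windows("GPPGAPGK", half_width=2)
```

Each peptide would then be encoded numerically and labelled as hydroxylated or not before training a classifier. Changing `half_width` gives the "various sizes" mentioned in the abstract.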
Prediction and Analysis of Protein Hydroxyproline and Hydroxylysine
BACKGROUND: Hydroxylation is an important post-translational modification that is closely related to various diseases. Besides biotechnology experiments, in silico prediction methods offer an alternative way to identify potential hydroxylation sites. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we developed a novel sequence-based method for identifying the two main types of hydroxylation sites: hydroxyproline and hydroxylysine. First, feature selection was performed on three kinds of features: amino acid indices (AAindex), which include various physicochemical and biochemical properties of amino acids; Position-Specific Scoring Matrices (PSSM), which represent the evolutionary information of amino acids; and the structural disorder of amino acids, in a sliding window with a length of 13 amino acids. The prediction models were then built using the incremental feature selection method. The resulting prediction accuracies, evaluated by jackknife cross-validation, are 76.0% on the hydroxyproline dataset and 82.1% on the hydroxylysine dataset. Feature analysis suggested that the physicochemical properties, biochemical properties and evolutionary information of amino acids contribute much to the identification of protein hydroxylation sites, while structural disorder had little relation to protein hydroxylation. It was also found that the amino acid adjacent to the hydroxylation site tends to exert more influence on hydroxylation determination than other sites. CONCLUSIONS/SIGNIFICANCE: These findings may provide useful insights for exploiting the mechanisms of hydroxylation
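Incremental feature selection, as named in the abstract above, is commonly implemented by adding ranked features one at a time and keeping the prefix that scores best. The sketch below assumes that reading. The feature names and the `toy_score` function are hypothetical stand-ins for a real ranking and a real cross-validated accuracy.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Grow the feature set one ranked feature at a time; keep the best prefix.

    ranked_features: feature names ordered from most to least relevant.
    evaluate: callable mapping a list of features to a score (higher is better).
    """
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


def toy_score(subset):
    # Hypothetical scorer: rewards two "useful" features, penalises set size.
    useful = {"aaindex_1", "pssm_7"}
    return len(useful & set(subset)) - 0.1 * len(subset)


ranked = ["aaindex_1", "pssm_7", "disorder_3", "disorder_4"]
subset, score = incremental_feature_selection(ranked, toy_score)
```

In practice `evaluate` would train a model on the chosen features and return a cross-validated accuracy, such as the jackknife estimate the abstract reports.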
A Study of Raman Spectroscopy as a Clinical Diagnostic Tool for the Detection of Lynch Syndrome/Hereditary NonPolyposis Colorectal Cancer (HNPCC)
Lynch syndrome also known as hereditary non-polyposis colorectal cancer (HNPCC) is a highly penetrant hereditary form of colorectal cancer that accounts for approximately 3% of all cases. It is caused by mutations in DNA mismatch repair resulting in accelerated adenoma to carcinoma progression. The current clinical guidelines used to identify Lynch Syndrome (LS) are known to be too stringent resulting in overall underdiagnoses. Raman spectroscopy is a powerful analytical tool used to probe the molecular vibrations of a sample to provide a unique chemical fingerprint. The potential of using Raman as a diagnostic tool for discriminating LS from sporadic adenocarcinoma is explored within this thesis. A number of experimental parameters were initially optimized for use with formalin fixed paraffin embedded colonic tissue (FFPE). This has resulted in the development of a novel cost-effective backing substrate shown to be superior to the conventionally used calcium fluoride (CaF2). This substrate is a form of silanized super mirror stainless steel that was found to have a much lower Raman background, enhanced Raman signal and complete paraffin removal from FFPE tissues. Performance of the novel substrate was compared against CaF2 by acquiring large high resolution Raman maps from FFPE rat and human colonic tissue. All of the major histological features were discerned from steel mounted tissue with the benefit of clear lipid signals without paraffin obstruction. Biochemical signals were comparable to those obtained on CaF2 with no detectable irregularities. By using principal component analysis to reduce the dimensionality of the dataset it was then possible to use linear discriminant analysis to build a classification model for the discrimination of normal colonic tissue (n=10) from two pathological groups: LS (n=10) and sporadic adenocarcinoma (n=10). 
Leave-one-map-out cross-validation of the model classifier showed that LS was predicted with a sensitivity of 63% and a specificity of 89%, values that are competitive with classification techniques applied routinely in clinical practice
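Leave-one-map-out cross-validation holds out all spectra from one Raman map at a time, so spectra from the same map never appear in both training and test sets. The sketch below shows the splitting and the sensitivity/specificity calculation under that assumption. The map identifiers, labels and predictions are invented, and the PCA and linear discriminant analysis steps are omitted.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_indices, test_indices) for each distinct group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def sensitivity_specificity(y_true, y_pred, positive):
    """Return (sensitivity, specificity), treating `positive` as the target class.

    Assumes both classes occur in y_true; otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical map identifiers: one entry per spectrum.
map_ids = ["map_a", "map_a", "map_b", "map_b", "map_c"]
splits = list(leave_one_group_out(map_ids))

# Hypothetical pooled predictions across all held-out maps.
y_true = ["LS", "LS", "LS", "other", "other", "other", "other"]
y_pred = ["LS", "LS", "other", "other", "other", "other", "LS"]
sens, spec = sensitivity_specificity(y_true, y_pred, positive="LS")
```

Grouping by map matters because neighbouring spectra within one map are highly correlated. A random per-spectrum split would tend to overstate performance.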
Engineering Electron Transfer Processes in Oxidoreductases: Applications in Biocatalysis
As the demand for cost-efficient and environmentally friendly processes increases in the chemical industry, the impact of biocatalysis, which is the use of enzymes and whole microorganisms for the production of fine chemicals, has become more prominent. From pharmaceuticals to cosmetics, biocatalysts are widely used in various sectors, and their significance has intensified dramatically since the introduction of the first protein engineering techniques in the 1980s. As the field of protein engineering has evolved over the last few decades, its integration with other disciplines such as process engineering and synthetic biology has become more critical for establishing non-natural pathways and reactions to produce a broader range of chemicals. In the development of an interdisciplinary approach, a few strategies have emerged as more prevalent: (i) better integration of biocatalysts with (nano)devices, and (ii) the use of protein-based scaffolds for creating novel synthetic multienzyme cascades. Throughout this doctoral thesis, the applicability of these ideas to oxidoreductases was investigated. Oxidoreductases are a class of under-utilized enzymes that catalyze electron transfer between different metabolites while using cofactors (NAD(P)(H), molecular oxygen, etc.) as the electron supplier. In Chapter 2, the electron transfer mechanism of a monooxygenase, cytochrome P450 27B1 (CYP27B1), was mimicked for electrochemical sensing of a vitamin D form (25(OH)D) in solution. The natural electron transfer pathway of this enzyme uses NADPH and two electron transfer proteins to convert 25(OH)D to its product. Inspired by this mechanism, the enzyme was mixed with an artificial redox mediator and immobilized on an electrode surface. As a result of rigorous experiments, the CYP27B1-modified electrode was found to detect 25(OH)D in its physiological range. 
This is a significant result as it opens a new way for development of a vitamin D biosensor that can diminish the amount of required cost and time for testing. In the next chapter of the thesis, effects of changing the size of cofactor on catalysis of dehydrogenases were studied in detail. Natural cofactors of two different redox enzymes were chemically modified with PEG, and kinetic experiments were conducted in order to better understand the relation between transport phenomena and biocatalysis. It was found that when the size of the cofactor was increased, two enzymes were affected differently; while efficiency of one enzyme was not altered significantly, that of the other dropped dramatically. Through comprehensive analysis, dominant impact of PEGylation was determined to be due to the differences in the interactions of PEGylated cofactors and enzymes. This study showed that protein engineering methods can be utilized to gain insights into better understanding of the relationship between mass transfer and catalysis in engineered bioprocesses and biocatalytic cascades. In Chapter 4, PEGylated cofactors were used to create artificial multienzyme complexes. In this study, SpyCatcher-SpyTag scaffold was utilized for wiring two redox enzymes and by tethering with PEGylated cofactors, a new biocatalyst with self-contained redox chemistry was obtained. Detailed kinetic analysis showed that this new multienzyme cascade was able to catalyze a reaction that was thermodynamically downhill but kinetically very slow in the absence of any enzyme. This also proved that attached cofactor acts as a ‘swing-arm’, carrying electrons from one enzyme to another; similar to the unique mechanism of pyruvate dehydrogenase complex. Generality of this methodology was investigated by constructing an immobilized three-enzyme-containing biocatalyst, which was hypothesized to catalyze an industrially important reaction under very mild conditions. 
This work is a significant contribution to the field, and a good demonstration of the use of protein engineering for process engineering applications. Chapter 5 concludes this thesis with a study that investigates the practicability of a collagen mimetic peptide as a novel way of constructing multiprotein cascades. Collagen mimetic peptides are composed of three individual strands that might (homotrimer) or might not (heterotrimer) have identical sequences, and in this work, we have utilized recently designed hydroxyproline-free sequences of a heterotrimeric collagen mimetic peptide. Individual strands were attached to different proteins by genetic fusion, and optimum experimental conditions for self-assembly of a multiprotein complex were investigated. Initial results suggested formation of such a complex, but further experiments are required to confirm it. The new collagen-based platform studied in this chapter is a crucial step towards the development of cofactorless multienzyme cascades. Finally, this doctoral thesis demonstrates the prominence of protein engineering in biocatalysis applications by utilizing various strategies together with the electron transfer mechanisms of oxidoreductases. By expanding and building upon these methodologies, it is possible to obtain improved biosensors and functional artificial multienzyme cascades with industrial applications. Hence, this study is a promising example of the impact of an interdisciplinary approach on industrial biotechnology
Fully Atomistic Modelling of Collagen Cross-linking
The extracellular matrix (ECM) undergoes progressive age-related stiffening and loss of proteolytic digestibility due to an increase in the concentration of advanced glycation end products (AGEs). Detrimental collagen stiffening is believed to play a significant role in several age-related diseases such as osteoporosis and cardiovascular disease. Currently little is known about where covalently cross-linked AGEs may form within collagen molecules, and there are no reports on how the respective cross-link sites affect the physical and biochemical properties of collagen. Using fully atomistic molecular dynamics (MD) simulations we have identified preferential sites for exothermic formation of two lysine-arginine derived AGEs, glucosepane and DOGDIC. Identification of these favourable sites enables us to align collagen cross-linking with experimentally observed changes to the ECM. For example, formation of both AGEs was found to be energetically favourable in close proximity to the Matrix Metalloproteinase-1 (MMP1) binding site, which could potentially disrupt collagen degradation. With the aid of a number of dynamic analysis techniques we have provided an explanation for the site specificity of the two AGE cross-links. The mechanical properties of collagen were also investigated using steered MD to determine the effect of the presence of the cross-links. Additionally, the effect of the sequence on the mechanical properties of collagen was investigated, owing to the heterogeneous response of collagen to an applied load. A homology model for the Homo sapiens sequence was developed from the Rattus norvegicus crystal structure and was shown to produce stable simulations. Through the use of the homology model and implementation of a novel simulation technique we attempted to ascertain the orientations of the collagen molecules within a fibril, which are currently below the resolution limit of experimental techniques
Interpretation of Mutations, Expression, Copy Number in Somatic Breast Cancer: Implications for Metastasis and Chemotherapy
Breast cancer (BC) patient management has been transformed over the last two decades due to the development and application of genome-wide technologies. The vast amounts of data generated by these assays, however, create new challenges for accurate and comprehensive analysis and interpretation. This thesis describes novel methods for fluorescence in-situ hybridization (FISH), array comparative genomic hybridization (aCGH), and next generation DNA- and RNA-sequencing, to improve upon current approaches used for these technologies. An ab initio algorithm was implemented to identify genomic intervals of single copy and highly divergent repetitive sequences that were applied to FISH and aCGH probe design. FISH probes with higher resolution than commercially available reagents were developed and validated on metaphase chromosomes. An aCGH microarray was developed that had improved reproducibility compared to the standard Agilent 44K array, which was achieved by placing oligonucleotide probes distant from conserved repetitive sequences.
Splicing mutations are currently underrepresented in genome-wide sequencing analyses, and there are limited methods to validate genome-wide mutation predictions. This thesis describes Veridical, a program developed to statistically validate aberrant splicing caused by a predicted mutation. Splicing mutation analysis was performed on a large subset of BC patients previously analyzed by the Cancer Genome Atlas. This analysis revealed an elevated number of splicing mutations in genes involved in NCAM pathways in basal-like and HER2-enriched lymph node positive tumours. Genome-wide technologies were leveraged further to develop chemosensitivity models that predict BC response to paclitaxel and gemcitabine. A type of machine learning, called support vector machines (SVM), was used to create predictive models from small sets of biologically-relevant genes to drug disposition or resistance. SVM models generated were able to predict sensitivity in two groups of independent patient data.
High variability between individuals requires more accurate and higher resolution genomic data. However the data themselves are insufficient; also needed are more insightful analytical methods to fully exploit these data. This dissertation presents both improvements in data quality and accuracy as well as analytical procedures, with the aim of detecting and interpreting critical genomic abnormalities that are hallmarks of BC subtypes, metastasis and therapy response
Machine learning classification models for fetal skeletal development performance prediction using maternal bone metabolic proteins in goats
Background: In developing countries, maternal undernutrition is the major intrauterine environmental factor contributing to fetal development and adverse pregnancy outcomes. Maternal nutrition restriction (MNR) in gestation has been shown to impact overall growth, bone development, and the proliferation and metabolism of mesenchymal stem cells in offspring. However, an efficient method for elucidating fetal bone development performance through maternal bone metabolic biochemical markers remains elusive. Methods: We adapted goats to elucidate the fetal bone development state with maternal serum bone metabolic proteins under malnutrition conditions in mid- and late-gestation stages. We used the experimental data to create 72 datasets by mixing different input features such as one-hot encoding of experimental conditions, metabolic original data, experimental-centered features and experimental condition probabilities. Seven Machine Learning methods were used to predict six fetal bone parameters (weight, length, and diameter of femur/humerus). Results: The results indicated that MNR influences fetal bone development (femur and humerus), fetal bone metabolic protein levels (C-terminal telopeptides of collagen I, CTx, in middle-gestation and N-terminal telopeptides of collagen I, NTx, in late-gestation), and maternal bone metabolites (low bone alkaline phosphatase, BALP, in middle-gestation and high BALP in late-gestation). The results show the importance of encoding the experimental conditions (ECs) by mixing this information with the serum metabolic data. The best classification models obtained for femur weight (Fw) and length (Fl), and humerus weight (Hw), are Support Vector Machine classifiers with a leave-one-out cross-validation accuracy of 1. The remaining accuracies are 0.98, 0.946 and 0.696 for the diameter of the femur (Fd) and the diameter and length of the humerus (Hd, Hl), respectively. 
According to the feature importance analysis, the moving averages mixed with ECs are generally the more important features for the majority of the models. The moving average of parathyroid hormone (PTH) within nutritional conditions (MA-PTH-experim) is important for the Fd, Hd and Hl prediction models, but its removal enhances the performance of the Fw, Fl and Hw models. Further, using one-feature models, it is possible to obtain even more accurate models than those from the feature importance analysis. In conclusion, machine learning is an efficient method to confirm the important role of PTH and BALP mixed with nutritional conditions for the fetal bone growth performance of goats. All the Python scripts, including results and comments, are available in an open repository at https://gitlab.com/muntisa/goat-bones-machine-learning
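One of the input encodings named in the abstract above, one-hot encoding of experimental conditions combined with serum measurements, can be sketched in a few lines. This is an illustrative sketch only: the condition labels, the serum values and the `one_hot` helper are hypothetical and are not taken from the linked repository.

```python
def one_hot(values):
    """Return (categories, rows), with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    index = {category: j for j, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical experimental conditions and one serum measurement per animal.
conditions = ["control", "restricted", "control"]
serum = [[1.2], [0.8], [1.1]]

categories, encoded = one_hot(conditions)
# Append the condition columns to the serum data to form mixed features.
features = [s + e for s, e in zip(serum, encoded)]
```

The resulting rows put the measured values and the condition indicators side by side, which is one simple way to mix experimental conditions with metabolic data before fitting a classifier.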
Elucidating Structure and Dynamics of Extracellular Matrix Collagen Using Solid-State NMR
In recent years, solid-state Nuclear Magnetic Resonance (NMR) has emerged as an established spectroscopic method that affords detailed structural information on native cellular and extracellular components at atomic-scale resolution. Fibrillar collagens are the most common component of the extracellular matrix (ECM), comprising up to 20% by weight of the human body, and are found in most tissues. Due to their diverse structures and compositions, collagens serve many functions, providing structural and mechanical support for surrounding cells, and playing important roles in cell-to-cell communication. Although collagen is at first glance a simple protein, formed by three homologous polypeptide chains of repeating three-amino-acid triads trimerised into a triple helix, it is a highly versatile and complex system. Due to the complexity and size of the triple helix, the scientific community still lacks understanding of collagen structure, flexibility and dynamics at the atomic level, in spite of today’s advances in technology. The combination of 13C, 15N-labelled amino acid enrichment of in-vitro or in-vivo materials with two-dimensional solid-state NMR spectroscopy potentially provides a more detailed understanding of the complex collagen structure and dynamics at atomic resolution. Furthermore, our knowledge of undesirable structural changes within the extracellular matrix, such as non-enzymatic glycation reactions with reducing sugars, is limited. Glycation-modified ECM leads to abnormal cell behaviour and widespread cell necrosis, potentially causing numerous health complications, e.g. in diabetic patients. Solid-state NMR is a powerful probe to study these structural changes.
The work presented in this thesis demonstrates how solid-state NMR can be used to study the effects of genetic and glycation chemistry on the molecular structure and dynamics of collagen. We employed a selection of synthetic model peptides that contain variations of the native sequence, representing normal and defective collagen triple-helical compositions, to assess the backbone motions via 15N T1 relaxation. Further, we use U-13C,15N-isotopically enriched collagen ECM samples to investigate the conformational and dynamic changes after glycation of the hydrophilic and hydrophobic regions of the collagen fibrils. Finally, we propose a methodology that can be employed to probe different sites (gap and overlap zones) of the collagen fibrils in their native state, which can be exploited to detect less abundant species found in the collagen protein. EPSR
Evaluation of chronic wounds by raman spectroscopy and image processing
Diabetic foot ulcer has become a major healthcare problem as the prevalence of diabetes and the related complications increase globally. Due to the underlying pathological abnormalities in diabetic patients, these ulcers do not heal in a timely and orderly fashion as acute wounds do. Objective and accurate assessment of wound healing status is needed to deliver better wound care to patients. In this research, we utilize near-infrared Raman spectroscopy to study tissue samples from diabetic foot ulcers on a small cohort of patients. We categorized wounds as healing or non-healing, harvested samples from wound debridement and collected Raman spectra from cryosectioned samples. The average spectrum of samples from healing wounds shows higher intensities at bands associated with collagen and other proteins while the non-healing group shows higher intensities at bands associated with red blood cells. Significant spectral features such as individual band intensities and pairwise intensity ratios were identified by performing unpaired t-tests between these two groups. Supervised classification using a support vector machine (SVM) classifier was conducted to classify the spectra or samples based on the spectral features. The trained SVM classifier is able to predict a spectrum’s category with 85.2% accuracy. The prediction of whether a sample is from a healing or non-healing wound can be as accurate as 95.7% when the average spectrum of the sample is fed to the SVM classifier. Since the quantification of the wound area is a common clinical practice, we also applied image processing techniques to accurately detect the wound boundary in digital images of the wound. Our method derives from a combination of color-based image analysis algorithms, and the method is validated by comparing the performance with manually traced boundaries of wounds in animal models and human wounds of diverse patients. 
Images were taken by an inexpensive digital camera under variable lighting conditions. Approximately 100 patient images and 50 animal images were analyzed and high overlap was achieved between manual tracings and calculated wound areas by our method. The simplicity of our method combined with its robustness suggests that it can be a valuable tool in clinical wound evaluations. Ph.D., Biomedical Engineering -- Drexel University, 201
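The spectral feature step in the abstract above, pairwise intensity ratios compared between healing and non-healing groups with unpaired t-tests, could look roughly like the sketch below. The band names and intensities are invented, and Welch's unequal-variance statistic is an assumption, because the abstract does not say which form of the unpaired t-test was used.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def pairwise_ratios(band_intensities):
    """Map each band pair (a, b), in sorted name order, to intensity[a] / intensity[b]."""
    return {
        (a, b): band_intensities[a] / band_intensities[b]
        for a, b in combinations(sorted(band_intensities), 2)
    }


def welch_t(group_a, group_b):
    """Unpaired t statistic with unequal variances (Welch form, an assumption)."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical band intensities for a single spectrum.
ratios = pairwise_ratios({"band_1": 2.0, "band_2": 4.0, "band_3": 8.0})

# Hypothetical values of one feature in the two wound groups.
t_stat = welch_t([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

A full analysis would also convert each statistic to a p-value and account for the number of features tested. Those steps are left out here.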
Identification of biomarkers for the prediction of radiation toxicity in prostate cancer patients
The success of radiotherapy in tumour control depends on the total dose given. However, the tolerance of the normal tissues surrounding the tumour limits this dose. It is not known why some patients develop radiation toxicity and, currently, it is not possible to predict before treatment which patients will experience adverse effects. Thus, there is an unmet clinical need for a new test to identify patients at risk of radiation toxicity. The aim of this study was to determine if spectral variations in blood lymphocytes from PCa patients may suggest Raman spectral bands that could be used in future research to identify spectral features associated with radiosensitivity.
Blood samples were collected retrospectively from 42 patients enrolled on the Cancer Trials Ireland ICORG 08-17 study who had undergone radiotherapy for prostate cancer and had shown either severe or no/minimal late radiation toxicity in follow-up. Radiation response was assessed following in-vitro irradiation using Raman micro-spectroscopy in addition to the G2 chromosomal radiosensitivity assay and the γH2AX DNA damage assay. A partial least squares discriminant analysis model was developed to classify patients using known radiation toxicity scores. Following this retrospective study, blood samples were collected prospectively from 51 patients also enrolled on the ICORG 08-17 study. These samples were collected prior to radiotherapy and these patients were categorised based on severe or no/minimal late radiation toxicity in follow-up. Radiation response was assessed following in-vitro irradiation using Raman micro-spectroscopy in addition to the G2 chromosomal radiosensitivity assay and the γH2AX DNA damage assay.
A partial least squares discriminant analysis model was developed to predict radiation toxicity. Finally, blood samples were collected prospectively prior to radiotherapy from another 30 patients enrolled on the Northern Ireland Cancer Trials Centre SPORT study for prostate cancer and these patients were also categorised based on severe or no/minimal late radiation toxicity in follow-up. Radiation response was assessed following in-vitro irradiation using Raman micro-spectroscopy in addition to the citrulline assay. A partial least squares discriminant analysis model was again developed to predict radiation toxicity.
Prediction of radiation toxicity outcome could not be achieved based on late radiation toxicity in the cohort of prostate cancer patients enrolled on the ICORG 08-17 study, but some success in predicting radiation toxicity was achieved in the cohort of prostate cancer patients enrolled on the Northern Ireland Cancer Trials Centre SPORT study. However, the patients from the ICORG 08-17 study will be followed up at 6-monthly intervals until Year 9, and those from the SPORT study will be followed up every 6 months for up to 5 years, with a minimum of annual follow-up from 5 to 10 years, allowing the models to be updated as patient clinical status changes. In the future, this technology may have the potential to lead to individualized patient radiotherapy by identifying patients who are at risk of radiation toxicity