303 research outputs found

    Predict collagen hydroxyproline sites using support vector machines.

    Addresses: School of Biosciences, University of Exeter, Exeter, United Kingdom. Journal Article. This is a copy of an article published in the Journal of Computational Biology © 2009 Mary Ann Liebert, Inc.; the Journal of Computational Biology is available online at http://online.liebertpub.com. Hydroxylation of collagen proline is an important posttranslational modification because of its close relationship with various diseases and signaling activities. However, no study to date has constructed models for predicting collagen hydroxyproline sites. In this study, support vector machines with two kernel functions (the identity kernel function and the bio-kernel function) are used to construct such models. The models are based on 37 sequences collected from NCBI. Peptide data are generated by scanning the sequences with sliding windows of various sizes. Fivefold cross-validation is used for model evaluation. The best model has a specificity of 70% and a sensitivity of 90%.
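
The sliding-window step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how windows are aligned or how sequence ends are handled, so the choices here are assumptions: each window is centred on a proline (`P`), the window size is odd, and sequence ends are padded with a placeholder residue `X`. The function name and the example sequence are hypothetical.

```python
def sliding_window_peptides(sequence, window_size, target="P", pad="X"):
    """Return (position, peptide) pairs centred on each target residue.

    Assumptions (not taken from the abstract): windows are centred on the
    candidate residue and sequence ends are padded with `pad`.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the target sits at the centre")
    half = window_size // 2
    # Pad both ends so residues near the termini still get a full window.
    padded = pad * half + sequence + pad * half
    peptides = []
    for i, residue in enumerate(sequence):
        if residue == target:
            # Index i in `sequence` is index i + half in `padded`,
            # so the window starting at i is centred on the target.
            peptides.append((i, padded[i:i + window_size]))
    return peptides


# Toy collagen-like fragment (hypothetical, not from the 37 NCBI sequences).
example = sliding_window_peptides("GPPGAPGPK", window_size=5)
```

Each peptide could then be encoded numerically and labelled as hydroxylated or not before being passed to a classifier; repeating the scan with several `window_size` values corresponds to the "various sizes" in the abstract.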

    Prediction and Analysis of Protein Hydroxyproline and Hydroxylysine

    BACKGROUND: Hydroxylation is an important post-translational modification and is closely related to various diseases. Besides biotechnology experiments, in silico prediction methods are an alternative way to identify potential hydroxylation sites. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we developed a novel sequence-based method for identifying the two main types of hydroxylation sites: hydroxyproline and hydroxylysine. First, feature selection was performed on three kinds of features: amino acid indices (AAindex), which include various physicochemical and biochemical properties of amino acids; Position-Specific Scoring Matrices (PSSM), which represent evolutionary information of amino acids; and structural disorder of amino acids in the sliding window of length 13 amino acids. The prediction model was then built using the incremental feature selection method. As a result, the prediction accuracies are 76.0% and 82.1%, evaluated by jackknife cross-validation on the hydroxyproline dataset and hydroxylysine dataset, respectively. Feature analysis suggested that the physicochemical properties, biochemical properties, and evolutionary information of amino acids contribute much to the identification of protein hydroxylation sites, while structural disorder had little relation to protein hydroxylation. It was also found that the amino acid adjacent to the hydroxylation site tends to exert more influence than other sites on hydroxylation determination. CONCLUSIONS/SIGNIFICANCE: These findings may provide useful insights for exploiting the mechanisms of hydroxylation.

    A Study of Raman Spectroscopy as a Clinical Diagnostic Tool for the Detection of Lynch Syndrome/Hereditary NonPolyposis Colorectal Cancer (HNPCC)

    Lynch syndrome, also known as hereditary non-polyposis colorectal cancer (HNPCC), is a highly penetrant hereditary form of colorectal cancer that accounts for approximately 3% of all cases. It is caused by mutations in DNA mismatch repair, resulting in accelerated adenoma to carcinoma progression. The current clinical guidelines used to identify Lynch Syndrome (LS) are known to be too stringent, resulting in overall underdiagnosis. Raman spectroscopy is a powerful analytical tool used to probe the molecular vibrations of a sample to provide a unique chemical fingerprint. The potential of using Raman as a diagnostic tool for discriminating LS from sporadic adenocarcinoma is explored within this thesis. A number of experimental parameters were initially optimized for use with formalin-fixed paraffin-embedded (FFPE) colonic tissue. This has resulted in the development of a novel cost-effective backing substrate shown to be superior to the conventionally used calcium fluoride (CaF2). This substrate is a form of silanized super mirror stainless steel that was found to give a much lower Raman background, enhanced Raman signal and complete paraffin removal from FFPE tissues. Performance of the novel substrate was compared against CaF2 by acquiring large high resolution Raman maps from FFPE rat and human colonic tissue. All of the major histological features were discerned from steel mounted tissue with the benefit of clear lipid signals without paraffin obstruction. Biochemical signals were comparable to those obtained on CaF2 with no detectable irregularities. By using principal component analysis to reduce the dimensionality of the dataset it was then possible to use linear discriminant analysis to build a classification model for the discrimination of normal colonic tissue (n=10) from two pathological groups: LS (n=10) and sporadic adenocarcinoma (n=10). Leave-one-map-out cross-validation of the model classifier showed that LS was predicted with a sensitivity of 63% and a specificity of 89%, values that are competitive with classification techniques applied routinely in clinical practice.
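
The evaluation scheme named in the abstract above (hold out every spectrum from one Raman map, train on the rest, then report sensitivity and specificity) can be sketched as follows. This is an illustrative sketch only: the thesis uses PCA followed by linear discriminant analysis on full spectra, whereas the classifier here is a deliberately simple nearest-class-mean rule on a single made-up feature, and all data values, map identifiers and function names are hypothetical.

```python
def leave_one_group_out(groups):
    """Yield (train_idx, test_idx), holding out each distinct group once."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield train, test


def nearest_mean_predict(train_x, train_y, x):
    """Stand-in classifier (not PCA-LDA): pick the class with the closest mean."""
    means = {}
    for label in set(train_y):
        values = [v for v, y in zip(train_x, train_y) if y == label]
        means[label] = sum(values) / len(values)
    return min(means, key=lambda label: abs(x - means[label]))


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: one feature per spectrum, two spectra per map.
features = [0.9, 1.1, 1.0, 0.45, 0.2, 0.1, 0.3, 0.6]
labels = [1, 1, 1, 1, 0, 0, 0, 0]          # 1 = LS, 0 = other (made up)
maps = ["a", "a", "b", "b", "c", "c", "d", "d"]

predictions = [None] * len(features)
for train_idx, test_idx in leave_one_group_out(maps):
    train_x = [features[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    for i in test_idx:
        predictions[i] = nearest_mean_predict(train_x, train_y, features[i])

sensitivity, specificity = sensitivity_specificity(labels, predictions)
```

Grouping the folds by map keeps spectra from the same tissue section out of both the training and test sets at once, which is the usual motivation for this kind of split over plain leave-one-spectrum-out.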

    Fully Atomistic Modelling of Collagen Cross-linking

    The extracellular matrix (ECM) undergoes progressive age-related stiffening and loss of proteolytic digestibility due to an increase in concentration of advanced glycation end products (AGEs). Detrimental collagen stiffening properties are believed to play a significant role in several age-related diseases such as osteoporosis and cardiovascular disease. Currently little is known of the potential location of covalently cross-linked AGE formation within collagen molecules; neither are there reports on how the respective cross-link sites affect the physical and biochemical properties of collagen. Using fully atomistic molecular dynamics (MD) simulations we have identified preferential sites for exothermic formation of two lysine-arginine derived AGEs, glucosepane and DOGDIC. Identification of these favourable sites enables us to align collagen cross-linking with experimentally observed changes to the ECM. For example, formation of both AGEs was found to be energetically favourable within close proximity of the Matrix Metalloproteinase-1 (MMP1) binding site, which could potentially disrupt collagen degradation. With the aid of a number of dynamic analysis techniques we have provided an explanation for the site specificity of the two AGE cross-links. The mechanical properties of collagen were also investigated through the use of steered MD to determine the effect of the cross-links' presence. Additionally, the effect of the sequence on the collagen mechanical properties was investigated, owing to the heterogeneous response of collagen to an applied load. A homology model for the Homo sapiens sequence was developed from the Rattus norvegicus crystal structure and was shown to produce stable simulations. Through the use of the homology model and implementation of a novel simulation technique we attempted to ascertain the orientations of the collagen molecules within a fibril, which are currently below the resolution limit of experimental techniques.

    Interpretation of Mutations, Expression, Copy Number in Somatic Breast Cancer: Implications for Metastasis and Chemotherapy

    Breast cancer (BC) patient management has been transformed over the last two decades due to the development and application of genome-wide technologies. The vast amounts of data generated by these assays, however, create new challenges for accurate and comprehensive analysis and interpretation. This thesis describes novel methods for fluorescence in-situ hybridization (FISH), array comparative genomic hybridization (aCGH), and next generation DNA- and RNA-sequencing, to improve upon current approaches used for these technologies. An ab initio algorithm was implemented to identify genomic intervals of single copy and highly divergent repetitive sequences that were applied to FISH and aCGH probe design. FISH probes with higher resolution than commercially available reagents were developed and validated on metaphase chromosomes. An aCGH microarray was developed that had improved reproducibility compared to the standard Agilent 44K array, which was achieved by placing oligonucleotide probes distant from conserved repetitive sequences. Splicing mutations are currently underrepresented in genome-wide sequencing analyses, and there are limited methods to validate genome-wide mutation predictions. This thesis describes Veridical, a program developed to statistically validate aberrant splicing caused by a predicted mutation. Splicing mutation analysis was performed on a large subset of BC patients previously analyzed by the Cancer Genome Atlas. This analysis revealed an elevated number of splicing mutations in genes involved in NCAM pathways in basal-like and HER2-enriched lymph node positive tumours. Genome-wide technologies were leveraged further to develop chemosensitivity models that predict BC response to paclitaxel and gemcitabine. A type of machine learning, called support vector machines (SVM), was used to create predictive models from small sets of genes biologically relevant to drug disposition or resistance. The SVM models generated were able to predict sensitivity in two groups of independent patient data. High variability between individuals requires more accurate and higher resolution genomic data. However, the data themselves are insufficient; also needed are more insightful analytical methods to fully exploit these data. This dissertation presents both improvements in data quality and accuracy as well as analytical procedures, with the aim of detecting and interpreting critical genomic abnormalities that are hallmarks of BC subtypes, metastasis and therapy response.

    Machine learning classification models for fetal skeletal development performance prediction using maternal bone metabolic proteins in goats

    Background: In developing countries, maternal undernutrition is the major intrauterine environmental factor contributing to fetal development and adverse pregnancy outcomes. Maternal nutrition restriction (MNR) in gestation has been shown to affect overall growth, bone development, and proliferation and metabolism of mesenchymal stem cells in offspring. However, an efficient method for elucidating fetal bone development performance through maternal bone metabolic biochemical markers remains elusive. Methods: We used goats to elucidate the fetal bone development state with maternal serum bone metabolic proteins under malnutrition conditions in mid- and late-gestation stages. We used the experimental data to create 72 datasets by mixing different input features such as one-hot encoding of experimental conditions, metabolic original data, experimental-centered features and experimental condition probabilities. Seven Machine Learning methods have been used to predict six fetal bone parameters (weight, length, and diameter of femur/humerus). Results: The results indicated that MNR influences fetal bone development (femur and humerus) and fetal bone metabolic protein levels (C-terminal telopeptides of collagen I, CTx, in middle-gestation and N-terminal telopeptides of collagen I, NTx, in late-gestation), and maternal bone metabolites (low bone alkaline phosphatase, BALP, in middle-gestation and high BALP in late-gestation). The results show the importance of encoding experimental conditions (ECs) by mixing this information with the serum metabolic data. The best classification models obtained for femur weight (Fw) and length (Fl), and humerus weight (Hw) are Support Vector Machines classifiers with a leave-one-out cross-validation accuracy of 1. The rest of the accuracies are 0.98, 0.946 and 0.696 for the diameter of femur (Fd), diameter and length of humerus (Hd, Hl), respectively. In the feature importance analysis, the moving averages mixed with ECs are generally more important for the majority of the models. The moving average of parathyroid hormone (PTH) within nutritional conditions (MA-PTH-experim) is important for the Fd, Hd and Hl prediction models, but its removal enhances the performance of the Fw, Fl and Hw models. Further, using one-feature models, it is possible to obtain even more accurate models compared with the feature importance analysis models. In conclusion, machine learning is an efficient method to confirm the important role of PTH and BALP mixed with nutritional conditions for fetal bone growth performance of goats. All the Python scripts, including results and comments, are available in an open repository at https://gitlab.com/muntisa/goat-bones-machine-learning

    Evaluation of chronic wounds by Raman spectroscopy and image processing

    Diabetic foot ulcer has become a major healthcare problem as the prevalence of diabetes and its related complications increases globally. Due to the underlying pathological abnormalities in diabetic patients, these ulcers do not heal in a timely and orderly fashion as acute wounds do. Objective and accurate assessment of wound healing status is needed to deliver better wound care to patients. In this research, we utilize near-infrared Raman spectroscopy to study tissue samples from diabetic foot ulcers in a small cohort of patients. We categorized wounds as healing or non-healing, harvested samples from wound debridement and collected Raman spectra from cryosectioned samples. The average spectrum of samples from healing wounds shows higher intensities at bands associated with collagen and other proteins, while the non-healing group shows higher intensities at bands associated with red blood cells. Significant spectral features such as individual band intensities and pairwise intensity ratios were identified by performing unpaired t-tests between these two groups. Supervised classification using a support vector machine (SVM) classifier was conducted to classify the spectra or samples based on the spectral features. The trained SVM classifier is able to predict a spectrum’s category with 85.2% accuracy. The prediction of whether a sample is from a healing or non-healing wound can be as accurate as 95.7% when the average spectrum of the sample is fed to the SVM classifier. Since the quantification of the wound area is a common clinical practice, we also applied image processing techniques to accurately detect the wound boundary in digital images of the wound. Our method derives from a combination of color based image analysis algorithms, and the method is validated by comparing its performance with manually traced boundaries of wounds in animal models and human wounds of diverse patients. Images were taken by an inexpensive digital camera under variable lighting conditions. Approximately 100 patient images and 50 animal images were analyzed, and high overlap was achieved between manual tracings and the wound areas calculated by our method. The simplicity of our method combined with its robustness suggests that it can be a valuable tool in clinical wound evaluations. Ph.D., Biomedical Engineering -- Drexel University, 201
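
The comparison between manually traced and automatically detected wound areas described above can be made concrete with an overlap score. The abstract does not state which overlap measure was used, so the two shown here (Jaccard index and Dice coefficient) are assumptions chosen because they are common for comparing segmentation masks. The masks and the function name are hypothetical.

```python
def overlap_scores(mask_a, mask_b):
    """Return (jaccard, dice) for two binary masks given as lists of rows."""
    # Collect the coordinates of foreground (wound) pixels in each mask.
    a = {(r, c) for r, row in enumerate(mask_a) for c, v in enumerate(row) if v}
    b = {(r, c) for r, row in enumerate(mask_b) for c, v in enumerate(row) if v}
    union = len(a | b)
    if union == 0:
        # Both masks empty: treat as perfect agreement.
        return 1.0, 1.0
    intersection = len(a & b)
    jaccard = intersection / union
    dice = 2 * intersection / (len(a) + len(b))
    return jaccard, dice


# Hypothetical 3x3 masks: 1 marks a wound pixel.
manual_tracing = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
automatic_mask = [
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
]

jaccard, dice = overlap_scores(manual_tracing, automatic_mask)
```

Both scores range from 0 (no shared pixels) to 1 (identical masks); Dice weights the shared region more heavily, so it is never lower than Jaccard for the same pair of masks.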

    Identification of biomarkers for the prediction of radiation toxicity in prostate cancer patients

    The success of radiotherapy in tumour control depends on the total dose given. However, the tolerance of the normal tissues surrounding the tumour limits this dose. It is not known why some patients develop radiation toxicity and, currently, it is not possible to predict before treatment which patients will experience adverse effects. Thus, there is an unmet clinical need for a new test to identify patients at risk of radiation toxicity. The aim of this study was to determine whether spectral variations in blood lymphocytes from prostate cancer (PCa) patients may suggest Raman spectral bands that could be used in future research to identify spectral features associated with radiosensitivity. Blood samples were collected retrospectively from 42 patients enrolled on the Cancer Trials Ireland ICORG 08-17 study who had undergone radiotherapy for prostate cancer and had shown either severe or no/minimal late radiation toxicity in follow-up. Radiation response was assessed following in-vitro irradiation using Raman micro-spectroscopy in addition to the G2 chromosomal radiosensitivity assay and the γH2AX DNA damage assay. A partial least squares discriminant analysis model was developed to classify patients using known radiation toxicity scores. Following this retrospective study, blood samples were collected prospectively from 51 patients also enrolled on the ICORG 08-17 study. These samples were collected prior to radiotherapy and these patients were categorised based on severe or no/minimal late radiation toxicity in follow-up. Radiation response was assessed following in-vitro irradiation using Raman micro-spectroscopy in addition to the G2 chromosomal radiosensitivity assay and the γH2AX DNA damage assay. A partial least squares discriminant analysis model was developed to predict radiation toxicity. Finally, blood samples were collected prospectively prior to radiotherapy from another 30 patients enrolled on the Northern Ireland Cancer Trials Centre SPORT study for prostate cancer, and these patients were also categorised based on severe or no/minimal late radiation toxicity in follow-up. Radiation response was assessed following in-vitro irradiation using Raman micro-spectroscopy in addition to the citrulline assay. A partial least squares discriminant analysis model was again developed to predict radiation toxicity. Prediction of radiation toxicity outcome could not be achieved based on late radiation toxicity in the cohort of prostate cancer patients enrolled on the ICORG 08-17 study, but some success in predicting radiation toxicity could be achieved based on late radiation toxicity in the cohort of prostate cancer patients enrolled on the Northern Ireland Cancer Trials Centre SPORT study. The patients from the ICORG 08-17 study will, however, be followed up at 6 monthly intervals until Year 9, and those from the SPORT study will be followed up every 6 months for up to 5 years with a minimum annual follow-up from 5-10 years, allowing the models to be updated as patient clinical status changes. In the future, this technology may have potential to lead to individualized patient radiotherapy by identifying patients that are at risk of radiation toxicity.