
    Doctor of Philosophy

    Dissertation. Rapidly evolving technologies such as chip arrays and next-generation sequencing are uncovering human genetic variants at an unprecedented pace. Unfortunately, this ever-growing collection of gene sequence variation has limited clinical utility without clear association to disease outcomes. As electronic medical records begin to incorporate genetic information, gene variant classification and accurate interpretation of gene test results play a critical role in customizing patient therapy. To verify the functional impact of a given gene variant, laboratories rely on confirming evidence such as previous literature reports, patient history and disease segregation in a family. By definition, variants of uncertain significance (VUS) lack this supporting evidence, and in such cases computational tools are often used to evaluate the predicted functional impact of a gene mutation. This study evaluates leveraging high-quality genotype-phenotype disease variant data from 20 genes and 3986 variants to develop gene-specific predictors utilizing a combination of changes in primary amino acid sequence, amino acid properties as descriptors of mutation severity, and Naïve Bayes classification. A Primary Sequence Amino Acid Properties (PSAAP) prediction algorithm was then combined with well-established predictors in a weighted Consensus sum in the context of gene-specific reference intervals for known phenotypes. PSAAP and Consensus were also used to evaluate known variants of uncertain significance in the RET proto-oncogene as a model gene. The PSAAP algorithm was successfully extended to many genes and diseases. Gene-specific algorithms typically outperform generalized prediction tools. Characteristic mutation properties of a given gene and disease may be lost when diluted into genome-wide data sets.
A reliable computational phenotype classification framework with quantitative metrics and disease-specific reference ranges allows objective evaluation of novel or uncertain gene variants and augments decision making when confirming clinical information is limited.
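The idea of a weighted consensus sum read against gene-specific reference intervals can be illustrated with a minimal sketch. It is not the dissertation's implementation: the predictor names other than PSAAP, the weights, the scores and the interval bounds are all hypothetical, and scores are assumed to be scaled to the range 0 to 1.

```python
from typing import Dict, Tuple


def consensus_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-predictor scores (assumed to lie in [0, 1])."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * score for name, score in scores.items()) / total_weight


def classify(score: float,
             benign_interval: Tuple[float, float],
             pathogenic_interval: Tuple[float, float]) -> str:
    """Place a consensus score relative to two reference intervals.

    The pathogenic interval is checked first, so it wins if the
    (hypothetical) intervals overlap.
    """
    low_p, high_p = pathogenic_interval
    low_b, high_b = benign_interval
    if low_p <= score <= high_p:
        return "pathogenic-like"
    if low_b <= score <= high_b:
        return "benign-like"
    return "uncertain"


# Hypothetical scores and weights for one variant; not taken from the study.
scores = {"PSAAP": 0.9, "predictor_b": 0.7, "predictor_c": 0.8}
weights = {"PSAAP": 2.0, "predictor_b": 1.0, "predictor_c": 1.0}

score = consensus_score(scores, weights)
label = classify(score, benign_interval=(0.0, 0.3), pathogenic_interval=(0.7, 1.0))
print(round(score, 3), label)
```

A score that falls between the two intervals is reported as uncertain rather than forced into a class, which mirrors the role of reference ranges described in the abstract.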

    Classification and Analysis of Regulatory Pathways Using Graph Property, Biochemical and Physicochemical Property, and Functional Property

    Given a regulatory pathway system consisting of a set of proteins, can we predict which pathway class it belongs to? Such a problem is closely related to the biological function of the pathway in cells and hence is quite fundamental and essential in systems biology and proteomics. This is also an extremely difficult and challenging problem due to its complexity. To address this problem, a novel approach was developed that can be used to predict query pathways among the following six functional categories: (i) “Metabolism”, (ii) “Genetic Information Processing”, (iii) “Environmental Information Processing”, (iv) “Cellular Processes”, (v) “Organismal Systems”, and (vi) “Human Diseases”. The prediction method was established through the following procedures: (i) according to the general form of pseudo amino acid composition (PseAAC), each of the pathways concerned is formulated as a 5570-D (dimensional) vector; (ii) each of the components in the 5570-D vector was derived by a series of feature extractions from the pathway system according to its graphic property, biochemical and physicochemical property, as well as functional property; (iii) the minimum redundancy maximum relevance (mRMR) method was adopted to operate the prediction. A cross-validation by the jackknife test on a benchmark dataset consisting of 146 regulatory pathways indicated that an overall success rate of 78.8% was achieved by our method in identifying query pathways among the above six classes, indicating the outcome is quite promising and encouraging. To the best of our knowledge, the current study represents the first effort in attempting to identify the type of a pathway system or its biological function. It is anticipated that our report may stimulate a series of follow-up investigations in this new and challenging area.
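The jackknife test mentioned above is leave-one-out cross-validation: each sample is held out in turn, predicted from all the others, and the fraction of correct predictions is reported as the overall success rate. The sketch below shows that loop only. The 1-nearest-neighbour classifier and the two-dimensional toy vectors are stand-ins chosen for brevity, not the paper's classifier or its 5570-D features.

```python
from math import dist
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def nearest_label(query: Sequence[float], training: List[Sample]) -> str:
    """Label of the training sample closest to the query (Euclidean distance)."""
    return min(training, key=lambda sample: dist(query, sample[0]))[1]


def jackknife_success_rate(samples: List[Sample]) -> float:
    """Hold out each sample once, predict it from the rest, return the hit rate."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        if nearest_label(features, training) == label:
            correct += 1
    return correct / len(samples)


# Toy data (hypothetical); the last point is deliberately hard to classify.
toy_pathways: List[Sample] = [
    ((0.0, 0.1), "Metabolism"),
    ((0.1, 0.0), "Metabolism"),
    ((1.0, 1.1), "Human Diseases"),
    ((1.1, 1.0), "Human Diseases"),
    ((0.6, 0.2), "Human Diseases"),
]

rate = jackknife_success_rate(toy_pathways)
print(f"overall success rate: {rate:.1%}")
```

Because every sample is tested exactly once and the split is deterministic, the jackknife rate for a given dataset and classifier is unique, which is one reason it is often reported for small benchmark sets such as the 146 pathways here.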

    The detection of meningococcal disease through identification of antimicrobial peptides using an in silico model creation

    Philosophiae Doctor - PhD. Neisseria meningitidis (the meningococcus), the causative agent of meningococcal disease (MD), was identified in 1887, and despite effective antibiotics and partially effective vaccines, Neisseria meningitidis (N. meningitidis) is the leading cause worldwide of meningitis and rapidly fatal sepsis, usually in otherwise healthy individuals. Over 500 000 meningococcal cases occur every year. These numbers have made bacterial meningitis a top ten infectious cause of death worldwide. MD primarily affects children under 5 years of age, although in epidemic outbreaks there is a shift in disease to older children, adolescents and adults. MD is also associated with marked morbidity including limb loss, hearing loss, cognitive dysfunction, visual impairment, educational difficulties, developmental delays, motor nerve deficits, seizure disorders and behavioural problems. Antimicrobial peptides (AMPs) are molecules that provide protection against environmental pathogens, acting against a large number of microorganisms, including bacteria, fungi, yeast and viruses. AMP production is a major component of innate immunity against infection. The chemical properties of AMPs allow them to insert into the anionic cell wall and phospholipid membranes of microorganisms or bind to the bacteria, making them easily detectable for diagnostic purposes. AMPs can be exploited for the generation of novel antibiotics, as biomarkers in the diagnosis of inflammatory conditions, for the manipulation of the inflammatory process, wound healing, autoimmunity and in the combat of tumour cells. Due to the severity of meningitis, early detection and identification of the strain of N. meningitidis is vital. Rapid and accurate diagnosis is essential for optimal management of patients. A major problem for MD is its diagnostic difficulty, and experts conclude that with early intervention the patient's prognosis will be much improved.
It is becoming increasingly difficult to confirm the diagnosis of meningococcal infection by conventional methods. Although polymerase chain reaction (PCR) has the potential advantage of providing more rapid confirmation of the presence of the bacterium than culturing, it is still time-consuming as well as costly. Introduction of AMPs to bind to N. meningitidis receptors could provide a less costly and less time-consuming solution to the current diagnostic problems. World Health Organization (WHO) meningococcal meningitis program activities encourage laboratory strengthening to ensure prompt and accurate diagnosis to rapidly confirm the presence of MD. This study aimed to identify a list of putative AMPs showing antibacterial activity against N. meningitidis to be used as ligands against receptors uniquely expressed by the bacterium, and for the identified AMPs to be used in a Lateral Flow Device (LFD) for the rapid and accurate diagnosis of MD.

    Pretata: predicting TATA binding proteins with novel features and dimensionality reduction strategy

    Background: It is necessary and essential to discover protein function from novel primary sequences. Wet lab experimental procedures are not only time-consuming but also costly, so predicting protein structure and function reliably based only on amino acid sequence has significant value. TATA-binding protein (TBP) is a kind of DNA-binding protein that plays a key role in transcription regulation. Our study proposed an automatic approach for identifying TATA-binding proteins efficiently, accurately, and conveniently. This method would serve as a guide for identifying specific proteins with computational intelligence strategies. Results: Firstly, we proposed novel fingerprint features for TBP based on pseudo amino acid composition, physicochemical properties, and secondary structure. Secondly, hierarchical feature dimensionality reduction strategies were employed to further improve performance. Currently, Pretata achieves 92.92% TATA-binding protein prediction accuracy, which is better than all other existing methods. Conclusions: The experiments demonstrate that our method could greatly improve the prediction accuracy and speed, thus allowing large-scale NGS data prediction to be practical. A web server has been developed to assist other researchers and can be accessed at http://server.malab.cn/preTata/
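Sequence-derived features of this kind usually start from plain amino acid composition, the 20-dimensional frequency vector that pseudo amino acid composition extends with sequence-order terms. The sketch below computes only that baseline vector. It is not the Pretata feature set, and the example peptide is hypothetical.

```python
from typing import List

# The 20 standard amino acids in one-letter code, in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the 20-D vector of residue frequencies for a protein sequence.

    Characters outside the 20 standard residues (for example 'X') are ignored.
    """
    residues = [char for char in sequence.upper() if char in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


# Hypothetical six-residue peptide, used only to show the output shape.
composition = amino_acid_composition("MKKLAA")
for aa, frequency in zip(AMINO_ACIDS, composition):
    if frequency > 0:
        print(aa, round(frequency, 3))
```

The vector has a fixed length regardless of sequence length, which is what lets sequences of different sizes be fed to the same classifier.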

    Prediction of lung tumor types based on protein attributes by machine learning algorithms


    PUEPro : A Computational Pipeline for Prediction of Urine Excretory Proteins

    This work is supported by the National Natural Science Foundation of China (Grant Nos. 81320108025, 61402194, 61572227), Development Project of Jilin Province of China (20140101180JC) and China Postdoctoral Science Foundation (2014T70291).

    Deriving a mutation index of carcinogenicity using protein structure and protein interfaces

    With the advent of Next Generation Sequencing, the identification of mutations in the genomes of healthy and diseased tissues has become commonplace. While much progress has been made to elucidate the aetiology of disease processes in cancer, the contributions to disease that many individual mutations make remain to be characterised and their downstream consequences on cancer phenotypes remain to be understood. Missense mutations commonly occur in cancers and their consequences remain challenging to predict. However, this knowledge is becoming more vital, both for assessing disease progression and for stratifying drug treatment regimes. Coupled with structural data, comprehensive genomic databases of mutations such as the 1000 Genomes project and COSMIC give an opportunity to investigate general principles of how cancer mutations disrupt proteins and their interactions at the molecular and network level. We describe a comprehensive comparison of cancer and neutral missense mutations; by combining features derived from structural and interface properties we have developed a carcinogenicity predictor, InCa (Index of Carcinogenicity). Upon comparison with other methods, we observe that InCa can predict mutations that might not be detected by other methods. We also discuss general limitations shared by all predictors that attempt to predict driver mutations and discuss how this could impact high-throughput predictions. A web interface to a server implementation is publicly available at http://inca.icr.ac.uk/

    Identification of Bacterial Cell Wall Lyases via Pseudo Amino Acid Composition


    Determine the Classification of COVID-19 by Combining the Encoding of Amino Acids with Machine-Learning Models

    In the ongoing battle against COVID-19, a novel approach integrating the encoding of amino acids with advanced machine-learning models offers a promising avenue for enhancing the classification accuracy of the virus strains. The relentless evolution of the virus necessitates robust and adaptable diagnostic tools capable of capturing the genetic intricacies that underpin the disease's transmission and virulence. This study addresses the critical need for refined classification techniques, pinpointing a significant gap in existing methodologies that often overlook the potential of amino acid sequences as predictive biomarkers. Employing a sophisticated feature selection mechanism, this research harnesses the power of Information Gain (IG) and Analysis of Variance (ANOVA) to distill essential features from the amino acid sequences. This process not only illuminates the sequences' predictive capacity but also reduces computational complexity, paving the way for more efficient model training and validation. The dataset, derived from the National Genomics Data Center (NGDC), encompasses a comprehensive array of amino acid sequences associated with various COVID-19 strains, providing a fertile ground for model evaluation through 10-fold cross-validation. The study meticulously evaluates the performance of two machine-learning classifiers: Decision Trees (DT) and Random Forest (RF). Utilizing IG, the RF classifier demonstrated exceptional proficiency, achieving an accuracy of 98.69%, with similarly high metrics across sensitivity, specificity, and precision. This starkly contrasts with the DT classifier, which, while respectable, lagged behind with an overall accuracy of 89.23%. A parallel examination using ANOVA echoed these findings, with RF maintaining superior performance, albeit with a narrower margin of distinction between the two classifiers. 
This comparative analysis underscores the RF classifier's robustness, attributable to its ensemble nature, which aggregates insights from multiple decision trees to mitigate overfitting and enhance predictive accuracy. The integration of amino acid encoding with RF, informed by targeted feature selection through IG and ANOVA, presents a potent methodology for COVID-19 strain classification.
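Information Gain, one of the two feature-selection criteria named above, scores a feature by how much knowing its value reduces the entropy of the class labels: IG(Y; X) = H(Y) - H(Y | X). The sketch below computes it for discrete features. It assumes the encoded amino acid features have already been discretised, which the abstract does not specify, and the strain labels and feature values are made up.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        raise ValueError("labels must not be empty")
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """H(labels) minus the entropy of labels conditioned on the feature value."""
    if len(feature_values) != len(labels):
        raise ValueError("feature_values and labels must have the same length")
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    total = len(labels)
    conditional = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - conditional


# Hypothetical strain labels and two discretised features.
strains = ["strain_1", "strain_1", "strain_2", "strain_2"]
informative = ["x", "x", "y", "y"]    # separates the two strains perfectly
uninformative = ["x", "y", "x", "y"]  # says nothing about the strain

ig_informative = information_gain(informative, strains)
ig_uninformative = information_gain(uninformative, strains)
print(ig_informative, ig_uninformative)
```

Ranking features by this score and keeping the top ones is the usual way IG is used for selection. How many features the study retained is not stated in the abstract.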