12 research outputs found

    Statistical Relational Learning for Proteomics: Function, Interactions and Evolution

    Get PDF
    In recent years, the field of Statistical Relational Learning (SRL) [1, 2] has produced new, powerful learning methods that are explicitly designed to solve complex problems, such as collective classification, multi-task learning and structured output prediction, which natively handle relational data, noise, and partial information. Statistical-relational methods rely on some First- Order Logic as a general, expressive formal language to encode both the data instances and the relations or constraints between them. The latter encode background knowledge on the problem domain, and are use to restrict or bias the model search space according to the instructions of domain experts. The new tools developed within SRL allow to revisit old computational biology problems in a less ad hoc fashion, and to tackle novel, more complex ones. Motivated by these developments, in this thesis we describe and discuss the application of SRL to three important biological problems, highlighting the advantages, discussing the trade-offs, and pointing out the open problems. In particular, in Chapter 3 we show how to jointly improve the outputs of multiple correlated predictors of protein features by means of a very gen- eral probabilistic-logical consistency layer. The logical layer — based on grounding-specific Markov Logic networks [3] — enforces a set of weighted first-order rules encoding biologically motivated constraints between the pre- dictions. The refiner then improves the raw predictions so that they least violate the constraints. Contrary to canonical methods for the prediction of protein features, which typically take predicted correlated features as in- puts to improve the output post facto, our method can jointly refine all predictions together, with potential gains in overall consistency. In order to showcase our method, we integrate three stand-alone predictors of corre- lated features, namely subcellular localization (Loctree[4]), disulfide bonding state (Disulfind[5]), and metal bonding state (MetalDetector[6]), in a way that takes into account the respective strengths and weaknesses. The ex- perimental results show that the refiner can improve the performance of the underlying predictors by removing rule violations. In addition, the proposed method is fully general, and could in principle be applied to an array of heterogeneous predictions without requiring any change to the underlying software. In Chapter 4 we consider the multi-level protein–protein interaction (PPI) prediction problem. In general, PPIs can be seen as a hierarchical process occurring at three related levels: proteins bind by means of specific domains, which in turn form interfaces through patches of residues. Detailed knowl- edge about which domains and residues are involved in a given interaction has extensive applications to biology, including better understanding of the bind- ing process and more efficient drug/enzyme design. We cast the prediction problem in terms of multi-task learning, with one task per level (proteins, domains and residues), and propose a machine learning method that collec- tively infers the binding state of all object pairs, at all levels, concurrently. Our method is based on Semantic Based Regularization (SBR) [7], a flexible and theoretically sound SRL framework that employs First-Order Logic con- straints to tie the learning tasks together. Contrarily to most current PPI prediction methods, which neither identify which regions of a protein actu- ally instantiate an interaction nor leverage the hierarchy of predictions, our method resolves the prediction problem up to residue level, enforcing con- sistent predictions between the hierarchy levels, and fruitfully exploits the hierarchical nature of the problem. We present numerical results showing that our method substantially outperforms the baseline in several experi- mental settings, indicating that our multi-level formulation can indeed lead to better predictions. Finally, in Chapter 5 we consider the problem of predicting drug-resistant protein mutations through a combination of Inductive Logic Programming [8, 9] and Statistical Relational Learning. In particular, we focus on viral pro- teins: viruses are typically characterized by high mutation rates, which allow them to quickly develop drug-resistant mutations. Mining relevant rules from mutation data can be extremely useful to understand the virus adaptation mechanism and to design drugs that effectively counter potentially resistant mutants. We propose a simple approach for mutant prediction where the in- put consists of mutation data with drug-resistance information, either as sets of mutations conferring resistance to a certain drug, or as sets of mutants with information on their susceptibility to the drug. The algorithm learns a set of relational rules characterizing drug-resistance, and uses them to generate a set of potentially resistant mutants. Learning a weighted combination of rules allows to attach generated mutants with a resistance score as predicted by the statistical relational model and select only the highest scoring ones. Promising results were obtained in generating resistant mutations for both nucleoside and non-nucleoside HIV reverse transcriptase inhibitors. The ap- proach can be generalized quite easily to learning mutants characterized by more complex rules correlating multiple mutations

    A Bioinformatics Study of Protein Conformational Flexibility and Misfolding: a Sequence, Structure and Dynamics Approach

    Get PDF
    This PhD Thesis titled "A Bioinformatics Study of Protein Conformational Flexibility and Misfolding: a Sequence, Structure and Dynamics Approach" comprises the results and conclusions obtained by us from the study of three different but somehow related research projects, covering aspects of the phenomenon of protein local conformational instability, its relationship with protein function, evolvability and aggregation, and the effect of genetic variations on protein conformational instability related to Conformational Diseases. These projects include the prediction of putative prion proteins in complete proteomes and the study of prion biology from a genomic perspective, the prediction of conformationally unstable protein regions and the existence of a structural framework for linking conformational instability to folding and function, and the establishment of a rationale for assessing the connection among mutations and disease phenotypes in Conformational Diseases.Esta tesis doctoral comprende los resultados y conclusiones obtenidos por nosotros a partir del estudio de tres proyectos de investigación diferentes pero de alguna manera relacionados, cubriendo los aspectos del fenómeno de la inestabilidad conformacional local de la proteína, su relación con la función de la proteína, la capacidad de evolución y agregación, y el efecto de las variaciones genéticas en la inestabilidad conformacional de la proteína relacionados con las enfermedades conformacionales. Estos proyectos incluyen la predicción de presuntas proteínas priónicas en proteomas complejos y el estudio de la biología de priones desde una perspectiva genómica, la predicción de las regiones de proteínas conformacionalmente inestables y la existencia de un marco estructural para la vinculación de la inestabilidad conformacional del plegado y la función, y el establecimiento de una razón fundamental para la evaluación de la relación entre las mutaciones y fenotipos de la enfermedad en enfermedades conformacionales

    Protein family classification using multiple-class neural networks.

    Get PDF
    The objective of genomic sequence analysis is to retrieve important information from the vast amount of genomic sequence data, such as DNA, RNA and protein sequences. The main task includes the interpretation of the function of DNA sequence on a genomic scale, the comparisons among genomes to gain insight into the universality of biological mechanisms and into the details of gene structure and function, the determination of the structure of all proteins and protein family classification. With its many features and capabilities for recognition, generalization and classification, artificial neural network technology is well suited for sequence analysis. At the state of the art, many methods have been devised to determine if a given protein sequence is member of a given protein superfamily. This is a binary classification problem, and efficient neural network techniques are mentioned in literature for solving such problem. In this Master\u27s thesis, we consider the problem of classifying given protein sequences into one among at least three protein families using neural networks, and, propose two methods: Pair-wise Multiple Classification Approach and Single Network Approach for this problem. In Pair-wise Multiple Classification Approach , several sub-networks are employed to perform the task whereas a compact network system is used in Single Network Approach . We performed experiments, using SNNS and UOWNNS neural network simulator on our NNs with different input/output representation, and reported accuracies as high as 95%. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .Z54. Source: Masters Abstracts International, Volume: 43-01, page: 0248. Adviser: Alioune Ngom. Thesis (M.Sc.)--University of Windsor (Canada), 2004

    MILCS: A mutual information learning classifier system

    Get PDF
    This paper introduces a new variety of learning classifier system (LCS), called MILCS, which utilizes mutual information as fitness feedback. Unlike most LCSs, MILCS is specifically designed for supervised learning. MILCS's design draws on an analogy to the structural learning approach of cascade correlation networks. We present preliminary results, and contrast them to results from XCS. We discuss the explanatory power of the resulting rule sets, and introduce a new technique for visualizing explanatory power. Final comments include future directions for this research, including investigations in neural networks and other systems. Copyright 2007 ACM

    Removal of antagonistic spindle forces can rescue metaphase spindle length and reduce chromosome segregation defects

    Get PDF
    Regular Abstracts - Tuesday Poster Presentations: no. 1925Metaphase describes a phase of mitosis where chromosomes are attached and oriented on the bipolar spindle for subsequent segregation at anaphase. In diverse cell types, the metaphase spindle is maintained at a relatively constant length. Metaphase spindle length is proposed to be regulated by a balance of pushing and pulling forces generated by distinct sets of spindle microtubules and their interactions with motors and microtubule-associated proteins (MAPs). Spindle length appears important for chromosome segregation fidelity, as cells with shorter or longer than normal metaphase spindles, generated through deletion or inhibition of individual mitotic motors or MAPs, showed chromosome segregation defects. To test the force balance model of spindle length control and its effect on chromosome segregation, we applied fast microfluidic temperature-control with live-cell imaging to monitor the effect of switching off different combinations of antagonistic forces in the fission yeast metaphase spindle. We show that spindle midzone proteins kinesin-5 cut7p and microtubule bundler ase1p contribute to outward pushing forces, and spindle kinetochore proteins kinesin-8 klp5/6p and dam1p contribute to inward pulling forces. Removing these proteins individually led to aberrant metaphase spindle length and chromosome segregation defects. Removing these proteins in antagonistic combination rescued the defective spindle length and, in some combinations, also partially rescued chromosome segregation defects. Our results stress the importance of proper chromosome-to-microtubule attachment over spindle length regulation for proper chromosome segregation.postprin

    Psr1p interacts with SUN/sad1p and EB1/mal3p to establish the bipolar spindle

    Get PDF
    Regular Abstracts - Sunday Poster Presentations: no. 382During mitosis, interpolar microtubules from two spindle pole bodies (SPBs) interdigitate to create an antiparallel microtubule array for accommodating numerous regulatory proteins. Among these proteins, the kinesin-5 cut7p/Eg5 is the key player responsible for sliding apart antiparallel microtubules and thus helps in establishing the bipolar spindle. At the onset of mitosis, two SPBs are adjacent to one another with most microtubules running nearly parallel toward the nuclear envelope, creating an unfavorable microtubule configuration for the kinesin-5 kinesins. Therefore, how the cell organizes the antiparallel microtubule array in the first place at mitotic onset remains enigmatic. Here, we show that a novel protein psrp1p localizes to the SPB and plays a key role in organizing the antiparallel microtubule array. The absence of psr1+ leads to a transient monopolar spindle and massive chromosome loss. Further functional characterization demonstrates that psr1p is recruited to the SPB through interaction with the conserved SUN protein sad1p and that psr1p physically interacts with the conserved microtubule plus tip protein mal3p/EB1. These results suggest a model that psr1p serves as a linking protein between sad1p/SUN and mal3p/EB1 to allow microtubule plus ends to be coupled to the SPBs for organization of an antiparallel microtubule array. Thus, we conclude that psr1p is involved in organizing the antiparallel microtubule array in the first place at mitosis onset by interaction with SUN/sad1p and EB1/mal3p, thereby establishing the bipolar spindle.postprin

    Smoking and Second Hand Smoking in Adolescents with Chronic Kidney Disease: A Report from the Chronic Kidney Disease in Children (CKiD) Cohort Study

    Get PDF
    The goal of this study was to determine the prevalence of smoking and second hand smoking [SHS] in adolescents with CKD and their relationship to baseline parameters at enrollment in the CKiD, observational cohort study of 600 children (aged 1-16 yrs) with Schwartz estimated GFR of 30-90 ml/min/1.73m2. 239 adolescents had self-report survey data on smoking and SHS exposure: 21 [9%] subjects had “ever” smoked a cigarette. Among them, 4 were current and 17 were former smokers. Hypertension was more prevalent in those that had “ever” smoked a cigarette (42%) compared to non-smokers (9%), p\u3c0.01. Among 218 non-smokers, 130 (59%) were male, 142 (65%) were Caucasian; 60 (28%) reported SHS exposure compared to 158 (72%) with no exposure. Non-smoker adolescents with SHS exposure were compared to those without SHS exposure. There was no racial, age, or gender differences between both groups. Baseline creatinine, diastolic hypertension, C reactive protein, lipid profile, GFR and hemoglobin were not statistically different. Significantly higher protein to creatinine ratio (0.90 vs. 0.53, p\u3c0.01) was observed in those exposed to SHS compared to those not exposed. Exposed adolescents were heavier than non-exposed adolescents (85th percentile vs. 55th percentile for BMI, p\u3c 0.01). Uncontrolled casual systolic hypertension was twice as prevalent among those exposed to SHS (16%) compared to those not exposed to SHS (7%), though the difference was not statistically significant (p= 0.07). Adjusted multivariate regression analysis [OR (95% CI)] showed that increased protein to creatinine ratio [1.34 (1.03, 1.75)] and higher BMI [1.14 (1.02, 1.29)] were independently associated with exposure to SHS among non-smoker adolescents. These results reveal that among adolescents with CKD, cigarette use is low and SHS is highly prevalent. The association of smoking with hypertension and SHS with increased proteinuria suggests a possible role of these factors in CKD progression and cardiovascular outcomes
    corecore