
    Automatic discovery of cross-family sequence features associated with protein function

    BACKGROUND: Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict whether or not a protein belongs to predefined functional categories or subcellular locations. A potential drawback of this approach is that human-designated functional classes may not accurately reflect the underlying biology, so important sequence-to-function relationships may be missed. RESULTS: We show that a self-supervised data mining approach can find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required: the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution, with genetic programming, of amino acid-based regular expressions and keyword-based logical expressions. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships concern targeting to various cellular compartments, an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles that can also be correlated with sequence features, including inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of them. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location.
CONCLUSION: We have developed a novel and useful approach for knowledge discovery in annotated sequence data. The technique is able to identify functionally important sequence features and does not require expert knowledge. By viewing protein function from a sequence perspective, the approach is also suitable for discovering unexpected links between biological processes, such as the recently discovered role of ubiquitination in transcription.
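The abstract does not state the fitness function used to co-evolve the two kinds of expression. As a minimal sketch, assuming Matthews correlation as the measure of association (an assumption, not necessarily the authors' choice), one candidate pair, an amino acid regular expression plus a keyword logical expression, could be scored against annotated sequences as below. The names `eval_keywords` and `pair_fitness`, the nested-tuple encoding of logical expressions and the toy sequences are all hypothetical; the genetic programming loop that mutates and selects pairs is not shown.

```python
import math
import re
from typing import FrozenSet, List, Tuple, Union

# Hypothetical encoding of a keyword logical expression: a bare keyword, or a
# nested tuple ("AND", e1, e2), ("OR", e1, e2) or ("NOT", e).
KeywordExpr = Union[str, Tuple]


def eval_keywords(expr: KeywordExpr, keywords: FrozenSet[str]) -> bool:
    """Return True if a protein's annotation keywords satisfy the expression."""
    if isinstance(expr, str):
        return expr in keywords
    op = expr[0]
    if op == "NOT":
        return not eval_keywords(expr[1], keywords)
    if op == "AND":
        return eval_keywords(expr[1], keywords) and eval_keywords(expr[2], keywords)
    if op == "OR":
        return eval_keywords(expr[1], keywords) or eval_keywords(expr[2], keywords)
    raise ValueError(f"unknown operator: {op}")


def pair_fitness(pattern: str, expr: KeywordExpr,
                 data: List[Tuple[str, FrozenSet[str]]]) -> float:
    """Matthews correlation between 'regex matches the sequence' and
    'annotation satisfies the keyword expression' (assumed fitness)."""
    regex = re.compile(pattern)
    tp = fp = fn = tn = 0
    for seq, kws in data:
        hit = regex.search(seq) is not None
        label = eval_keywords(expr, kws)
        if hit and label:
            tp += 1
        elif hit:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Made-up toy data, constructed only to exercise the code.
toy_data = [
    ("MS" + "Q" * 8 + "AL", frozenset({"Transcription", "Nucleus"})),
    ("MK" + "Q" * 9 + "T", frozenset({"Transcription"})),
    ("MALWKRPLLV", frozenset({"Nucleus"})),
    ("MGSSHHHHHH", frozenset({"Secreted"})),
]
print(pair_fitness(r"Q{8,}", "Transcription", toy_data))  # 1.0
print(pair_fitness(r"Q{8,}", "Nucleus", toy_data))  # 0.0
```

A GP run would evaluate many such pairs per generation and keep the fittest. In this toy data the polyglutamine pattern `Q{8,}` correlates perfectly with "Transcription" and not at all with "Nucleus" only because the data was built that way; it is not a result from the study.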

    Towards defining the nuclear proteome

    Direct evidence is reported for 2,568 mammalian proteins within the nuclear proteome, amounting to at least 14% of the entire proteome.

    Prediction of Protein Subcellular Localization: A Machine Learning Approach

    Subcellular localization is a key functional characteristic of proteins. Optimally combining available information is one of the key challenges in today's knowledge-based approaches to subcellular localization prediction. This study explores machine learning approaches for predicting protein subcellular localization that draw on Gene Ontology and secondary structure information. Using the spectrum kernel for feature representation of amino acid sequences and secondary structures, we explore an SVM-based learning method that classifies proteins into six subcellular localization sites: endoplasmic reticulum, extracellular, Golgi, membrane, mitochondria, and nucleus.
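The spectrum kernel compares two strings by the k-length substrings (k-mers) they share: it is the inner product of their k-mer count vectors. A minimal sketch follows; the function names and the default k = 3 are illustrative assumptions, not values taken from the study. The same code applies unchanged to secondary structure strings, for example over a helix/strand/coil alphabet.

```python
from collections import Counter
from typing import List


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Count every contiguous length-k substring (k-mer) of the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    sx, sy = kmer_spectrum(x, k), kmer_spectrum(y, k)
    if len(sx) > len(sy):
        sx, sy = sy, sx  # iterate over the smaller spectrum
    # Counter returns 0 for absent k-mers, so unshared k-mers contribute nothing.
    return sum(count * sy[kmer] for kmer, count in sx.items())


def gram_matrix(seqs: List[str], k: int = 3) -> List[List[int]]:
    """Pairwise kernel (Gram) matrix over a list of sequences."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]


print(gram_matrix(["MKMK", "KMKM"], k=2))  # [[5, 4], [4, 5]]
```

Because the kernel is an explicit inner product, its Gram matrix can be handed to any SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`; the abstract does not say which SVM software the study used.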

    Automated retrieval and extraction of training course information from unstructured web pages

    Web Information Extraction (WIE) is the discipline dealing with the discovery, processing and extraction of specific pieces of information from semi-structured or unstructured web pages. The World Wide Web comprises billions of web pages, and there is a strong need for systems that can locate and extract this information and integrate the acquired knowledge into organisations' practices. Some commercial automated web extraction software packages exist; however, their success depends on heavily involving their users in finding the relevant web pages, preparing the system to recognise items of interest on those pages, and manually evaluating and storing the extracted results. This research has explored WIE, specifically with regard to automating the extraction and validation of online training information. The work also includes research and development in automated Web Information Retrieval (WIR), more specifically in Web Searching (or Crawling) and Web Classification. Different technologies were considered; after much consideration, Naïve Bayes Networks were chosen as the most suitable for developing the classification system. The extraction part of the system used Genetic Programming (GP) to generate web extraction solutions. Specifically, GP was used to evolve Regular Expressions, which were then used to extract specific training course information from the web, such as course names, prices, dates and locations. The experimental results indicate that all three components of this research perform very well: the Web Crawler outperforms existing crawling systems, the Web Classifier achieves an accuracy of over 95% and a precision of over 98%, and the Web Extractor achieves an accuracy of over 94% for extracting course titles and just under 67% for other course attributes such as dates, prices and locations.
Furthermore, the work is of great significance to the sponsoring company, as it simplifies and improves the existing time-consuming, labour-intensive and error-prone manual techniques, as will be discussed in this thesis. The prototype developed in this research runs in the background and requires very little, often no, human assistance.
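The abstract says GP evolved regular expressions for extraction but does not give the fitness measure. One plausible fitness evaluation, sketched here under the assumption that candidates are scored by F1 against hand-labelled pages, is shown below. The names `regex_fitness` and `select_best` and the toy course snippets are hypothetical, and the crossover and mutation operators of a real GP run are omitted.

```python
import re
from typing import List, Tuple

# Hypothetical labelled example: (page text, strings that should be extracted).
Example = Tuple[str, List[str]]


def regex_fitness(pattern: str, examples: List[Example]) -> float:
    """F1 score of the strings a candidate regex extracts against gold strings.

    Syntactically invalid patterns, which random variation easily produces,
    receive a fitness of 0.0 instead of crashing the run.
    """
    try:
        regex = re.compile(pattern)
    except re.error:
        return 0.0
    tp = fp = fn = 0
    for text, gold in examples:
        remaining = list(gold)
        for match in regex.finditer(text):
            found = match.group(0)
            if found in remaining:
                remaining.remove(found)  # each gold string is matched once
                tp += 1
            else:
                fp += 1
        fn += len(remaining)  # gold strings the regex never extracted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_best(population: List[str], examples: List[Example]) -> str:
    """Return the fittest candidate, as a GP selection step might."""
    return max(population, key=lambda p: regex_fitness(p, examples))


# Made-up toy pages with gold-standard prices.
examples = [
    ("Excel Basics, price $120, starts 3 May", ["$120"]),
    ("Advanced SQL course: $450 per delegate", ["$450"]),
]
population = [r"\d+", r"\$", r"\$\d+", "[unclosed"]
print(regex_fitness(r"\$\d+", examples))  # 1.0
print(select_best(population, examples))
```

Scoring by F1 rather than plain accuracy penalises both over-general patterns (such as `\d+`, which also grabs dates) and over-specific ones that miss gold strings.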

    Multiscale image analysis of calcium dynamics in cardiac myocytes

    Cardiac myocytes constitute a unique physiological system. They are the muscle cells that make up heart tissue and provide the force to pump blood by contracting synchronously at every beat. This contraction is regulated by the concentration of calcium, among other ions, whose behaviour is very complex and rich in dynamical states at the molecular, cellular and tissue levels. Details of these dynamical patterns are closely related to the mechanisms responsible for cardiac function and also for cardiac disease, which is the leading cause of death in the modern world. The emerging field of translational cardiology studies how such mechanisms connect and influence each other across spatial and temporal scales, ultimately giving rise to a given clinical condition. To study these patterns, we benefit from recent and very important advances in experimental cell physiology. In particular, fluorescence microscopy allows us to observe the distribution of calcium in the cell with sub-micron spatial resolution and a temporal resolution of around a millisecond, providing very accurate monitoring of calcium fluxes in the cell. This thesis is the result of over five years' work on biological signal and digital image processing of cardiac cells. During this period the aim has been to develop computational techniques for extracting physiologically relevant quantitative data from microscopy images at different scales. The two main subjects covered in the thesis are image segmentation and classification methods applied to fluorescence microscopy imaging of cardiac myocytes. These methods are applied to a variety of problems involving different spatial and temporal scales, such as the localisation of molecular receptors, the detection and characterisation of spontaneous calcium-release events, and the propagation of calcium waves across a culture of cardiac cells.
The experimental images and data have been provided by four internationally renowned collaborators in the field. It is thanks to them and their teams that this thesis has been possible. They are Dr. Leif Hove-Madsen from the Institut de Ciències Cardiovasculars de Catalunya in Barcelona, Prof. S. R. Wayne Chen from the Department of Physiology and Pharmacology in the Libin Cardiovascular Institute of Alberta, University of Calgary, Dr. Peter P. Jones from the Department of Physiology at the University of Otago, and Prof. Glen Tibbits from the Department of Biomedical Physiology & Kinesiology at Simon Fraser University in Vancouver. The work belongs to the discipline of biomedical engineering, taking an engineering perspective by applying physics and mathematics to solve biomedical problems. Specifically, we frame our contributions within computational translational cardiology, attempting to connect molecular mechanisms in cardiac cells to cardiac disease by developing signal-processing, image-processing and machine-learning methods that scale across the different levels of analysis. This computational approach allows a quantitative, robust and reproducible analysis of the experimental data and yields results that would not be attainable with traditional manual methods. The results of the thesis provide specific insight into different cell mechanisms that have a non-negligible impact at the clinical level. In particular, we gain a deeper knowledge of cell mechanisms related to cardiac arrhythmia, fibrillation phenomena, the emergence of alternans and anomalies in calcium handling due to cell ageing.
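The abstract does not describe how spontaneous calcium-release events are detected. A common generic approach, sketched here only as an illustration and not as the thesis's method, normalises a fluorescence trace to ΔF/F0 and reports runs of frames above a threshold. The median-based baseline F0, the threshold of 0.5, the two-frame minimum duration, the function names and the toy trace are all assumptions; detection on real line-scan or 2-D images would likely also need denoising and spatial criteria.

```python
from statistics import median
from typing import List, Optional, Tuple


def delta_f_over_f0(trace: List[float]) -> List[float]:
    """Normalise a trace to (F - F0) / F0, taking the median as baseline F0."""
    f0 = median(trace)
    return [(f - f0) / f0 for f in trace]


def detect_events(trace: List[float], threshold: float = 0.5,
                  min_frames: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, in which dF/F0 stays
    above `threshold` for at least `min_frames` consecutive frames."""
    events: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first frame of the current supra-threshold run
    for i, value in enumerate(delta_f_over_f0(trace)):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start >= min_frames:  # discard runs that are too short
                events.append((start, i))
            start = None
    # Close a run that is still above threshold at the end of the trace.
    if start is not None and len(trace) - start >= min_frames:
        events.append((start, len(trace)))
    return events


# Made-up fluorescence trace (arbitrary units): one sustained rise, one blip.
toy_trace = [100, 101, 99, 100, 180, 190, 120, 100, 160, 100, 100]
print(detect_events(toy_trace))  # [(4, 6)]
```

The minimum-duration rule is what separates the two-frame rise at frames 4 to 5 from the single-frame blip at frame 8; lowering `min_frames` to 1 would report both.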

    Bioinformatic and Proteomic Investigation of Chloroplast Transit Peptide Motifs and Genesis

    The eukaryotic mitochondrion was formed by the endosymbiotic association of an α-proteobacterium and a primordial phagocytic eukaryote. A second, later endosymbiosis between the eukaryote and a cyanobacterium gave rise to the chloroplast of plants. Following each of these events, most of the organellar DNA was transferred to the nucleus. A system evolved wherein proteins produced on cytosolic ribosomes are targeted to organelle protein translocators by N-terminal targeting sequences. Protein sorting between the chloroplast and the mitochondrion in the plant cell by the general import pathways shows remarkable fidelity, despite a lack of sequence conservation among transit peptides and presequences and despite very little sequence difference between these two kinds of targeting peptide. There is evidence for a hydrophobic recognition motif in mitochondrial presequences, and a similar motif has been proposed for the chloroplast transit peptide. We have developed novel motif-finding methods and applied them to our own chloroplast proteome data and to mitochondrial data from the literature. We fail to find a hydrophobic motif that discriminates between the chloroplast and the mitochondrion. Another little-understood phenomenon of organelle protein trafficking is how the targeting sequence is acquired after transfer of organelle DNA to the nucleus. It has been hypothesized that the transit peptide is acquired by exon shuffling. We find no correlation between transit peptide lengths and exon boundaries. Furthermore, using highly expressed cyanobacterial proteins conserved in plants, we find that the transit peptide appears as likely to be attached within the primordial sequence as outside it, indicating a more stochastic process for the origin of the transit peptide.
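The authors' motif-finding methods are not described in the abstract. As a generic illustration of how N-terminal hydrophobicity can be quantified, the sketch below computes a sliding-window profile with the Kyte-Doolittle hydropathy scale and reports the most hydrophobic window near the N-terminus. The window length of 5, the 60-residue N-terminal region, the function names and the toy sequence are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq: str, window: int = 5) -> List[float]:
    """Mean hydropathy of each run of `window` consecutive residues.

    Raises KeyError for non-standard residue letters such as X or B.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def most_hydrophobic_window(seq: str, window: int = 5,
                            n_terminal: int = 60) -> Tuple[int, float]:
    """Start index and mean score of the most hydrophobic window within the
    first `n_terminal` residues, where targeting peptides are located."""
    profile = window_hydropathy(seq[:n_terminal], window)
    if not profile:
        raise ValueError("sequence region is shorter than the window")
    best = max(range(len(profile)), key=profile.__getitem__)
    return best, profile[best]


# Made-up toy sequence with a hydrophobic leucine stretch at positions 4-8.
print(most_hydrophobic_window("SSSSLLLLLSSSS"))
```

Comparing such profiles between chloroplast transit peptides and mitochondrial presequences is one simple way to ask whether hydrophobicity alone could discriminate the two classes.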