10 research outputs found

    Non-coding RNA annotation of the genome of Trichoplax adhaerens

    A detailed annotation of non-protein-coding RNAs is typically missing in initial releases of newly sequenced genomes. Here we report on a comprehensive ncRNA annotation of the genome of Trichoplax adhaerens, presumably the most basal metazoan whose genome has been published to date. Since BLAST identified only a small fraction of the best-conserved ncRNAs (in particular rRNAs, tRNAs and some snRNAs), we developed a semi-global dynamic programming tool, GotohScan, to increase the sensitivity of the homology search. It successfully identified the full complement of major and minor spliceosomal snRNAs, the genes for RNase P and MRP RNAs, the SRP RNA, as well as several small nucleolar RNAs. We did not find any microRNA candidates homologous to known eumetazoan sequences. Interestingly, most ncRNAs, including the pol-III transcripts, appear as single-copy genes or with very small copy numbers in the Trichoplax genome.
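
    To illustrate what a semi-global dynamic programming search does, here is a minimal Python sketch. It is not the GotohScan implementation: the function name, the scoring parameters and the affine gap model (in the style of Gotoh's algorithm) are assumptions chosen for illustration. The whole query must be aligned, while leading and trailing stretches of the target (the genome) are free.

```python
NEG_INF = float("-inf")


def semiglobal_affine(query, target, match=2, mismatch=-1,
                      gap_open=-3, gap_extend=-1):
    """Score the best alignment of the whole query to any substring of target.

    Returns (best_score, end), where end is the 1-based target position at
    which the best alignment ends. A gap of length k costs
    gap_open + (k - 1) * gap_extend. All scores are illustrative assumptions.
    """
    n, m = len(query), len(target)
    # M: alignment ends in a match/mismatch column
    # X: alignment ends with a query base opposite a gap
    # Y: alignment ends with a target base opposite a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    # Semi-global: starting anywhere in the target costs nothing.
    for j in range(m + 1):
        M[0][j] = 0
    # Consuming query bases before any target base is a regular gap.
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == target[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)

    # Semi-global: the alignment may end anywhere in the target.
    def final(j):
        return max(M[n][j], X[n][j], Y[n][j])

    end = max(range(m + 1), key=final)
    return final(end), end
```

    With these assumed scores, `semiglobal_affine("ACGT", "TTACGTTT")` finds the exact occurrence ending at target position 6. A genome scan would apply the same recurrence to each query ncRNA against the genome and keep hits above a score threshold.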

    Identification and analysis of patterns in DNA sequences, the genetic code and transcriptional gene regulation

    The present cumulative work consists of six articles linked by the topic "Identification and Analysis of Patterns in DNA sequences, the Genetic Code and Transcriptional Gene Regulation". We applied a binary coding to find patterns within nucleotide sequences efficiently. In the first and second parts of my work, a single bit is used to encode all four nucleotides. The three possible one-bit codings are: keto (G,U) vs. amino (A,C) bases, strong (G,C) vs. weak (A,U) bases, and purines (G,A) vs. pyrimidines (C,U). We found that the clearest patterns emerge with the purine-pyrimidine coding. Applying this coding, we found a new representation of the genetic code, which has been published under the titles "A New Classification Scheme of the Genetic Code" in "Journal of Molecular Biology" and "A Purine-Pyrimidine Classification Scheme of the Genetic Code" in "BIOForum Europe". This new representation reduces the common table of the genetic code from 64 to 32 fields while maintaining the same information content. All known patterns of the genetic code, and even new ones, can easily be recognized in this new scheme. Furthermore, our new representation allows speculation about the origin and evolution of the translation machinery and the genetic code. Thus, we found a possible explanation for the contemporary codon-amino acid assignment and wide support for an early doublet code. These explanations have been published in "Journal of Bioinformatics and Computational Biology" under the title "The New Classification Scheme of the Genetic Code, its Early Evolution, and tRNA Usage". Expecting to find these purine-pyrimidine patterns at the DNA level itself, we examined DNA binding sites for the occurrence of binary patterns. Comprehensive statistics on the largest class of restriction enzymes (type II) revealed a very distinctive purine-pyrimidine pattern. 
Moreover, we observed a higher G+C content in the protein binding sequences. For both observations we provided and discussed several explanations, published under the title "Common Patterns in Type II Restriction Enzyme Binding Sites" in "Nucleic Acids Research". The identified patterns may help to understand how a protein finds its binding site. In the last part of my work, two submitted articles on the analysis of Boolean functions are presented. Boolean functions are used to describe and analyze complex dynamic processes, and they make it easier to find binary patterns within biochemical interaction networks. It is well known that not all functions are necessary to describe biologically relevant gene interaction networks. In the article entitled "Boolean Networks with Biologically Relevant Rules Show Ordered Behavior", submitted to "BioSystems", we showed that the class of required Boolean functions can be strongly restricted. Furthermore, we calculated the exact number of hierarchically canalizing functions, which are known to be biologically relevant. In our work "The Decomposition Tree for Analysis of Boolean Functions", submitted to "Journal of Complexity", we introduced an efficient data structure for the classification and analysis of Boolean functions. This permits the recognition of biologically relevant Boolean functions in polynomial time.
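
    The three one-bit codings named in this abstract can be sketched in a few lines of Python. The function and table names are hypothetical, and the choice of which class maps to 1 and which to 0 is an arbitrary assumption; only the grouping of the bases comes from the abstract. DNA input is handled by treating T as U.

```python
# One-bit nucleotide codings. The base groupings follow the abstract;
# assigning 1 to the first-named class is an assumption for illustration.
CODINGS = {
    "purine_pyrimidine": {"G": 1, "A": 1, "C": 0, "U": 0},
    "strong_weak": {"G": 1, "C": 1, "A": 0, "U": 0},
    "keto_amino": {"G": 1, "U": 1, "A": 0, "C": 0},
}


def encode(seq, coding="purine_pyrimidine"):
    """Return the sequence as a string of bits under the chosen coding."""
    table = CODINGS[coding]
    rna = seq.upper().replace("T", "U")  # treat DNA thymine like uracil
    return "".join(str(table[base]) for base in rna)


# Example: GAATTC, the recognition site of the type II enzyme EcoRI.
site = "GAATTC"
patterns = {name: encode(site, name) for name in CODINGS}
```

    Under these assumptions the site GAATTC becomes `111000` in the purine-pyrimidine coding, `100001` in the strong-weak coding and `100110` in the keto-amino coding, which shows how the same sequence yields different binary patterns depending on the coding.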

    Incorporating Boltzmann Machine Priors for Semantic Labeling in Images and Videos

    Semantic labeling is the task of assigning category labels to regions in an image. For example, a scene may consist of regions corresponding to categories such as sky, water, and ground, or parts of a face such as eyes, nose, and mouth. Semantic labeling is an important mid-level vision task for grouping and organizing image regions into coherent parts. Labeling these regions allows us to better understand the scene itself as well as properties of the objects in the scene, such as their parts, location, and interaction within the scene. Typical approaches for this task include the conditional random field (CRF), which is well suited to modeling local interactions among adjacent image regions. However, the CRF is limited in dealing with complex, global (long-range) interactions between regions in an image, and between frames in a video. This thesis presents approaches to modeling long-range interactions within images and videos, for use in semantic labeling. To model these long-range interactions, we incorporate priors based on the restricted Boltzmann machine (RBM). The RBM is a generative model that has demonstrated the ability to learn the shape of an object, and the CRBM is a temporal extension that can learn the motion of an object. Although the CRF is a good baseline labeler, we show how the RBM and CRBM can be added to the architecture to model both the global object shape within an image and the temporal dependencies of the object from previous frames in a video. We demonstrate the labeling performance of our models for the parts of complex face images from the Labeled Faces in the Wild database (for images) and the YouTube Faces Database (for videos). Our hybrid models produce results that are both quantitatively and qualitatively better than the baseline CRF alone for both images and videos.

    HIERARCHICAL ENSEMBLE METHODS FOR ONTOLOGY-BASED PREDICTIONS IN COMPUTATIONAL BIOLOGY

    The standardized annotation of biological entities, such as genes and proteins, has strongly promoted the organization of biological concepts into controlled vocabularies, i.e. ontologies that make it possible to index consistently the relationships between the different functional classes, organized according to a predefined hierarchy. Examples of biological ontologies in which functional terms are structured as a directed acyclic graph (DAG) are the Gene Ontology (GO) and the Human Phenotype Ontology (HPO). These hierarchical taxonomies are used by the scientific community, respectively, to systematize the protein functions of all living organisms from Archaea to Metazoa and to categorize the phenotypic abnormalities associated with human diseases. By offering a well-defined classification space, these bio-ontologies have favored the development of learning methods for the automated prediction of protein function and of gene-pathological phenotype associations in humans. The aim of these methodologies is to "direct" "in-vitro" research so as to reduce costs and make more effective use of research funds. From a machine learning standpoint, the problem of predicting protein function or gene-pathological phenotype associations in humans can be modeled as a structured multi-label classification problem, in which the predictions associated with each example (i.e., gene or protein) are sub-graphs organized according to a given structure (tree or DAG). Because of the complexity of the classification problem, the most commonly used prediction approach to date is the "flat" one, which consists of training a classifier separately for each ontology term without considering the hierarchical relationships between the functional classes. 
The use of this approach is justified not only by the reduction in the computational complexity of the learning problem, but also by the "unstable" nature of the terms that make up the ontology itself. Indeed, these terms are updated monthly through an expert-curated process based both on the biomedical scientific literature and on experimental data obtained from "in-vitro" or "in-silico" experiments. In this context, two general classes of classifiers have been proposed in the literature. On the one hand, there are machine learning methods that predict functional classes in a "flat" way, that is, without exploring the intrinsic structure of the annotation space. On the other hand, there are hierarchical approaches which, by explicitly considering the hierarchical relationships between the functional terms of the ontology, guarantee that the predicted annotations respect the "true-path-rule", the biological rule that governs the ontologies. Among hierarchical methods, two different categories of approaches have been proposed in the literature. The first is based on kernelized methods for structured-output prediction, the second on hierarchical ensemble methods. Both have drawbacks. The former are computationally heavy and do not scale well when applied to biological ontologies. The latter were mostly conceived for tree-structured taxonomies, and the few approaches specifically designed for DAG-structured ontologies are in most cases unable to improve on the prediction performance of "flat" methods. To overcome these limitations, this thesis proposes new hierarchical ensemble methods capable of providing predictions consistent with the hierarchical structure of the ontology. 
These approaches, on the one hand, extend methods originally developed for tree-structured ontologies to ontologies organized as a DAG and, on the other hand, significantly improve predictions over the "flat" approach regardless of the type of classifier used. In their most general form, hierarchical ensemble approaches are highly modular, in the sense that they adopt a two-step learning strategy. In the first step, the functional classes of the ontology are learned independently of one another; in the second step, the "flat" predictions are suitably combined, taking into account the hierarchy among the ontology classes. The main contributions of this thesis are both methodological and experimental. From a methodological point of view, the following new hierarchical ensemble methods are proposed: a) HTD-DAG (Hierarchical Top-Down for DAG-structured taxonomies); b) TPR-DAG (True-Path-Rule for DAGs) with several algorithmic variants; c) ISO-TPR (True-Path-Rule with Isotonic Regression), a new hierarchical algorithm that combines the True-Path-Rule with isotonic regression methods. For all the hierarchical ensemble methods, the consistency of the predictions has been formally proved, i.e. the proposed approaches have been shown to provide predictions that respect the hierarchical relationships between classes. 
From an experimental point of view, results at the level of the whole genome of model organisms and of humans, and across all the classes included in the biological ontologies, show that the proposed methodological approaches: a) are competitive with state-of-the-art structured-output prediction algorithms; b) are able to improve "flat" classifiers, provided that the predictions supplied by the classifier are not random; c) are able to predict new associations between human genes and pathological phenotypes, a crucial step for discovering new genes associated with human genetic diseases and cancer; d) scale well on datasets consisting of tens of thousands of examples (i.e., proteins or genes) and on taxonomies consisting of thousands of functional classes. Finally, the methods proposed in this thesis have been implemented in a software library written in R, HEMDAG (Hierarchical Ensemble Methods for DAG), which is public, freely downloadable and available for the Linux, Windows and Macintosh operating systems.
The standardized annotation of biomedical objects, often organized in dedicated catalogues, strongly promoted the organization of biological concepts into controlled vocabularies, i.e. ontologies in which related terms of the underlying biological domain are structured according to a predefined hierarchy. Indeed, large ontologies have been developed by the scientific community to structure and organize the gene and protein taxonomy of all living organisms from Archaea to Metazoa, i.e. the Gene Ontology, as well as human-specific ontologies such as the Human Phenotype Ontology, which provides a structured taxonomy of the abnormal human phenotypes associated with diseases. 
These ontologies, offering a coded and well-defined classification space for biological entities such as genes and proteins, favor the development of machine learning methods able to predict features of biological objects, such as the association between a human gene and a disease, with the aim of driving wet-lab research, reducing costs and allowing a more effective use of the available research funds. Despite the soundness of these objectives, the resulting multi-label classification problems raise machine learning issues so complex that until recently by far the most common approach was "flat" prediction, i.e. simply training a classifier for each term in the controlled vocabulary and ignoring the relationships between terms. This approach was justified not only by the need to reduce the computational complexity of the learning task, but also by the somewhat "unstable" nature of the terms composing the controlled vocabularies, because they were (and are) updated on a monthly basis in a process performed by expert curators and based on biomedical literature and on wet-lab and in-silico experiments. In this context, two main general classes of classifiers have been proposed in the literature. On the one hand, "hierarchy-unaware" learning methods predict labels in a "flat" way without exploiting the inherent structure of the annotation space. On the other hand, "hierarchy-aware" learning methods can improve the accuracy and the precision of the predictions by considering the hierarchical relationships between ontology terms. Moreover, these methods can guarantee the consistency of the predicted labels according to the "true path rule", that is, the biological and logical rule that governs the internal coherence of biological ontologies. 
To properly handle the hierarchical relationships linking the ontology terms, two main classes of structured output methods have been proposed in the literature: the first is based on kernelized methods for structured output spaces, the second on hierarchical ensemble methods for ontology-based predictions. However, both approaches suffer from significant drawbacks. The kernel-based methods for structured output spaces are computationally intensive and do not scale well when applied to complex multi-label bio-ontologies. Most hierarchical ensemble methods have been conceived for tree-structured taxonomies, and the few specifically developed for prediction in DAG-structured output spaces are, in most cases, unable to improve prediction performance over flat methods. To overcome these limitations, this thesis develops novel "ontology-aware" ensemble methods that can handle DAG-structured ontologies, leveraging previous results obtained with "true-path-rule"-based hierarchical learning algorithms. These methods are highly modular in the sense that they adopt a "two-step" learning strategy: in the first step they learn each term of the ontology separately using flat methods, and in the second they combine the flat predictions according to the hierarchy of the classes. The main contributions of this thesis are both methodological and experimental. From a methodological standpoint, novel hierarchical ensemble methods are proposed, including: a) HTD (Hierarchical Top-Down algorithm for DAG-structured ontologies); b) TPR-DAG (True Path Rule ensemble for DAG) with several variants; c) ISO-TPR, a novel ensemble method that combines the True Path Rule approach with Isotonic Regression. For all these methods a formal proof of their consistency, i.e. the guarantee of providing predictions that "respect" the hierarchical relationships between classes, is provided. 
From an experimental standpoint, extensive genome-wide and ontology-wide results show that the proposed methods: a) are competitive with state-of-the-art prediction algorithms; b) are able to improve flat machine learning classifiers, provided the base learners give non-random predictions; c) are able to predict new associations between genes and abnormal human phenotypes, a crucial step in discovering novel genes associated with human diseases ranging from genetic disorders to cancer; d) scale well to large datasets and bio-ontologies. Finally, HEMDAG, a novel R library implementing the proposed hierarchical ensemble methods, has been developed and publicly released.
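
    The two-step strategy described in this abstract can be illustrated with a small Python sketch of a top-down correction on a DAG. This is an illustration under stated assumptions, not the HEMDAG API or the thesis's exact algorithm: the function name and the input format (one flat score per term, plus a dictionary listing each term's parents) are hypothetical, and the rule used here simply caps each term's score at the smallest corrected score among its parents, so that no term scores higher than any of its ancestors.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def top_down_correct(flat_scores, parents):
    """Make flat per-term scores consistent with a DAG hierarchy.

    flat_scores: dict mapping each term to its flat classifier score.
    parents: dict mapping a term to an iterable of its parent terms
             (root terms map to an empty iterable or are absent).
    Assumes every term in the hierarchy has an entry in flat_scores.
    """
    graph = {term: tuple(parents.get(term, ())) for term in flat_scores}
    corrected = {}
    # static_order() yields every parent before any of its children.
    for term in TopologicalSorter(graph).static_order():
        term_parents = graph.get(term, ())
        if term_parents:
            cap = min(corrected[p] for p in term_parents)
            corrected[term] = min(flat_scores[term], cap)
        else:
            corrected[term] = flat_scores[term]
    return corrected


# Toy ontology: "c" has two parents, "a" and "b".
toy_parents = {"root": [], "a": ["root"], "b": ["root"], "c": ["a", "b"]}
toy_flat = {"root": 0.9, "a": 0.4, "b": 0.7, "c": 0.8}
toy_corrected = top_down_correct(toy_flat, toy_parents)
```

    In the toy example the flat score of `c` (0.8) exceeds that of its parent `a` (0.4), so it is lowered to 0.4, while the scores of `root`, `a` and `b` are already consistent and stay unchanged.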

    Interactive graph drawing with constraints

    This thesis investigates the requirements for graph drawing stemming from practical applications, and presents both theoretical and practical results and approaches to handle them. Many approaches exist for computing graph layouts in various drawing styles, but the results are often not sufficient for use in practice. Drawing conventions, graphical notation standards, and user-defined requirements restrict the set of admissible drawings. These restrictions can be formalized as constraints for the layout computation. We investigate the requirements and give an overview and categorization of the corresponding constraints. The number of edge crossings is of primary importance for the readability of a graph drawing. If the graph is planar, it should be drawn without crossings; otherwise we should aim for the minimum possible number of crossings. However, several types of constraints may restrict the way the graph can be embedded in the plane. These restrictions may have a strong impact on crossing minimization. For two types of such constraints we present specific solutions for handling them in layout computation. First, we introduce the class of so-called embedding constraints, which restrict the order of the edges around a vertex. For embedding constraints we describe approaches for planarity testing, embedding, and edge insertion with the minimum number of crossings. These problems can be solved in linear time with our approaches. The second constraint type that we tackle is clusters. Clusters describe a hierarchical grouping of the graph's vertices that has to be reflected in the drawing. The complexity of the corresponding clustered planarity testing problem is unknown so far. We describe a technique to compute a maximum clustered planar subgraph of a clustered graph. 
Our solution is based on an Integer Linear Program (ILP) formulation and also includes the first practical clustered planarity test for general clustered graphs. The resulting subgraph can be used within the first step of the planarization approach for clustered graphs. In addition, we describe how to improve the performance of pure clustered planarity testing by employing a branch-and-price approach. Large and complex graphs nowadays arise in many application domains. These graphs require interaction and navigation techniques to allow exploration of the underlying data. The corresponding concepts are presented and solutions for three practical applications are proposed: First, we describe Scaffold Hunter, a tool for the exploration of chemical space. We show how to use a hierarchical classification of molecules for visual navigation in chemical space. The resulting visualization is embedded into an interactive environment that allows visual analysis of chemical compound databases. Finally, two interactive visualization approaches for two types of biological networks, protein-domain networks and residue interaction networks, are presented.
In numerous application areas, information is modeled as graphs and visualized by means of these graphs. A clear layout helps with analysis and supports understanding when information is presented in graph-based diagrams. Besides general aesthetic criteria, such a drawing must meet requirements that arise from the characteristics of the data, established drawing conventions and the specific question at hand. In addition, users often want to adapt the drawing individually. These requirements can be formulated as constraints for the layout computation. 
Despite the multitude of different requirements from numerous application areas, most requirements can be formulated through a few generic constraints. In this thesis we investigate the requirements arising in practice and describe a mapping to constraints for the layout computation. We give an overview of the current state of the art in handling constraints in graph drawing and categorize them by their fundamental properties. Of particular importance for the quality of a drawing is the number of crossings. Planar graphs should be drawn without crossings; for non-planar graphs the minimum number of crossings should be achieved. Some constraints, however, restrict the ways in which the graph can be embedded in the plane. This can have a strong effect on the result of crossing minimization. Two important types of such constraints are examined more closely in this thesis. With embedding constraints we introduce a class of constraints that restrict the possible order of the edges around a vertex. For this class we present linear-time algorithms for planarity testing and for optimal edge insertion subject to the embedding restrictions. The second type of constraint is clusters, which prescribe a hierarchical grouping of vertices. The complexity of testing cluster planarity under such constraints is so far unknown. We describe a method for computing a maximum cluster-planar subgraph. We use a formulation as an integer linear program together with a branch-and-cut approach to solve it. The method also allows cluster planarity to be decided and is thus the first practical approach for testing general clustered graphs. 
In addition, we describe an improvement for the case in which only cluster planarity has to be tested and the maximum cluster-planar subgraph is not of interest. For this scenario we give a simplified formulation and present a solution method based on a branch-and-price approach. In practice, very large or complex graphs often have to be examined. This requires appropriate interaction and navigation methods. We describe the corresponding concepts and present solutions for three application areas: First, we describe Scaffold Hunter, a software tool for navigating chemical structure space. Scaffold Hunter uses a hierarchical classification of molecules as the basis for visual navigation. The visualization is embedded in an interactive interface that allows visual analysis of chemical structure databases. For two types of biological networks, protein-domain networks and residue interaction networks, we present approaches for interactive visualization. The corresponding layout methods are subject to a number of constraints for a meaningful drawing.

    Anotación funcional de proteínas basada en representación relacional en el entorno de la biología de sistemas

    Functional annotation is an open and important research topic in Molecular Biology. Defining function at the level of terminology is difficult, since function spans many levels for a single protein and there is no unified criterion. Given these difficulties, the way to determine the function of a protein is to annotate it with different terms from different vocabularies. Proteins carry out their function in cooperation with other proteins, forming complexes. These interactions are represented in a network made up of experimentally demonstrated interactions between proteins. Analyzing and using the interaction network is a task of interest because of the large number of existing associations and the many ways in which one protein can influence the function of others. This thesis therefore focuses on network-based prediction of functional annotation. Clearly, this complex scenario cannot be tackled without computational tools. Indeed, there is considerable activity in Computational Biology devoted specifically to this topic. This thesis is part of that effort to apply computational methods to biological problems in the area of Systems Biology. The approach fits within Systems Biology because function is not analyzed in isolation for each molecule but at the system level, taking into account all the existing relationships between genes and proteins connected at different levels. To exploit all these biological relationships while preserving their structural semantics, this thesis proposes using Relational Representation, as it is particularly well suited to this purpose. 
Starting from this representation, multiple transformations and Artificial Intelligence techniques are applied to extract knowledge from the related proteins and to propose new functions through the prediction of functional associations between proteins. The general proposal of this thesis is the characterization of the function of proteins and genes based on network information, through Relational Representation and Machine Learning. Specifically, starting from a relational representation for functional annotation, we seek the computational design needed to solve two specific, different and interesting problems in Biology. One is the prediction of functional associations between pairs of proteins in E. coli, and the other is the extension of biological pathways in humans. Both are evaluated in computational terms and in terms of biological interpretation. New functional annotations of proteins are also proposed, to be verified experimentally. In addition, various approaches to knowledge representation and learning techniques are explored, proposing specific strategies for solving other bioinformatics problems, especially those influenced by relational information and multi-class and multi-label learning.
Functional annotation is an open and interesting research topic in Molecular Biology. Defining function in terms of terminology is a hard task, owing to the lack of a unified criterion and because function spans many levels for the same protein. Given these difficulties, the way to determine a protein's function is to annotate it with several terms from different vocabularies. Proteins carry out their function together with other proteins, as part of protein complexes. 
These interactions are represented in a network of experimentally verified protein-protein interactions. Analyzing and using the interaction network is a task of interest because of the great number of associations and the multiple ways in which a protein can influence the function of others. Therefore, this thesis focuses on the prediction of functional annotation based on networks. It is apparent that this complex scenario cannot be faced without computational techniques. In fact, in Computational Biology there is considerable activity specifically devoted to this topic. This thesis is part of this effort to apply computational methods to biological problems in the Systems Biology area. The approach belongs to the Systems Biology context because it does not analyze function in isolation for each molecule, but at the system level, taking into account all the relations among genes and proteins linked at different levels. To take advantage of all these biological relations, and to preserve their structured semantics, this thesis proposes using Relational Representation, since it is particularly suitable for this domain. On top of this representation, multiple transformations and Artificial Intelligence techniques are applied to retrieve implicit knowledge from the related proteins, and to propose new functions through the prediction of functional associations between proteins. The main proposal of this thesis is to characterize the function of proteins and genes based on networks, through Relational Representation and Machine Learning. Specifically, starting from a relational representation specific to functional annotation, we look for the computational design needed to solve two specific, different and biologically interesting problems. The former consists of predicting functional associations between pairs of proteins in E. coli, and the latter of expanding pathways in humans. 
We assess both in terms of computational performance and biological interpretation. We also propose new putative protein functional annotations to be experimentally verified. In addition, the thesis investigates diverse approaches to knowledge representation and learning techniques, suggesting specific strategies to tackle other biological problems, especially where relational data or multi-class and multi-label targets are present.

    A cumulative index to Aeronautical Engineering: A special bibliography

    This publication is a cumulative index to the abstracts contained in NASA SP-7037 (80) through NASA SP-7037 (91) of Aeronautical Engineering: A Special Bibliography. NASA SP-7037 and its supplements have been compiled through the cooperative efforts of the American Institute of Aeronautics and Astronautics (AIAA) and the National Aeronautics and Space Administration (NASA). This cumulative index includes subject, personal author, corporate source, contract, and report number indexes.