738 research outputs found

    Structural network analysis of biological networks for assessment of potential disease model organisms

    Model organisms provide opportunities to design research experiments focused on disease-related processes (e.g., using genetically engineered populations that produce phenotypes of interest). For some diseases, there may be non-obvious model organisms that can help in the study of underlying disease factors. In this study, an approach is presented that leverages knowledge about human diseases and associated biological interaction networks to identify potential model organisms for a given disease category. The approach starts with the identification of functional and interaction patterns of diseases within genetic pathways. Next, these characteristic patterns are matched to interaction networks of candidate model organisms to identify similar subsystems that have characteristic patterns for diseases of interest. The quality of a candidate model organism is then determined by the degree to which the identified subsystems match genetic pathways from validated knowledge. The results of this study suggest that non-obvious model organisms may be identified through the proposed approach.

    Region based gene expression via reanalysis of publicly available microarray data sets.

    A DNA microarray is a high-throughput technology used to identify relative gene expression. One of the most widely used platforms is the Affymetrix® GeneChip® technology, which detects gene expression levels based on probe sets composed of twenty-five-nucleotide probes designed to hybridize with specific gene targets. Given a particular Affymetrix® GeneChip® platform, the design of the probes is fixed. However, the method of analysis is dynamic in nature due to the ability to annotate and group probes into uniquely defined groupings. This is particularly important since publicly available repositories of microarray datasets, such as ArrayExpress and NCBI’s Gene Expression Omnibus (GEO), have made millions of samples readily available to be reanalyzed computationally without the need for new biological experiments. One way in which the analysis can dynamically change is by correcting the mapping between probe sets and targets by creating custom Chip Description Files (CDFs) to arrange which probes belong to which probe set based on the latest genomic information or specific annotations of interest. Since default probe sets in Affymetrix® GeneChip® platforms are specific for a gene, transcript or exon, the analyses are limited to profiling differential expression at the gene, transcript or individual exon level. However, it has been revealed that untranslated regions (UTRs) of mRNA have important impacts on the regulation of proteins. We therefore developed a new probe mapping protocol that addresses three issues of Affymetrix® GeneChip® data analyses: removing nonspecific probes, updating probe target mapping based on the latest genome information and grouping the probes into region (UTR, individual exon), gene and transcript level targets of interest to support a better understanding of the effect of UTRs and individual exons on gene expression levels. Furthermore, we developed an R package, affyCustomCdf, for users to dynamically create custom CDFs.
The affyCustomCdf tool takes annotations in a General/Gene Transfer Format (GTF) file, aligns probes to gene annotations via Nested Containment List (NCList) indexing and generates a custom Chip Description File (CDF) to regroup probes into probe sets at the region (UTR and individual exon), transcript or gene level. Our results indicate that removing probes that no longer align to the genome without mismatches, or that align to multiple locations, can help to reduce false-positive differential expression, as can removal of probes in regions overlapping multiple genes. Moreover, our region-based method can detect changes that would have been missed by gene- and transcript-level analysis. It also allows for a better understanding of 3’ UTR dynamics through the reanalysis of publicly available data.
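The probe filtering and regrouping described above can be illustrated with a minimal Python sketch. It is not the affyCustomCdf implementation (which is an R package): the `Region` and `Alignment` types, the function name and the sample coordinates are all hypothetical, and a plain linear scan stands in for NCList indexing to keep the example short.

```python
from collections import defaultdict
from typing import NamedTuple


class Region(NamedTuple):
    name: str    # hypothetical label, e.g. "G1:3UTR"
    gene: str
    chrom: str
    start: int   # inclusive
    end: int     # inclusive


class Alignment(NamedTuple):
    probe_id: str
    chrom: str
    start: int
    end: int
    mismatches: int


def build_probe_sets(alignments, regions):
    """Group probes into region-level probe sets after filtering."""
    # Keep only perfect-match alignments, collected per probe.
    perfect = defaultdict(list)
    for aln in alignments:
        if aln.mismatches == 0:
            perfect[aln.probe_id].append(aln)

    probe_sets = defaultdict(list)
    for probe_id, hits in perfect.items():
        if len(hits) != 1:
            continue  # aligns perfectly to multiple locations: drop
        aln = hits[0]
        # Linear scan for containing regions (assumption: replaces NCList).
        containing = [
            r for r in regions
            if r.chrom == aln.chrom and r.start <= aln.start and aln.end <= r.end
        ]
        if len({r.gene for r in containing}) != 1:
            continue  # unannotated, or in a region shared by several genes
        for r in containing:
            probe_sets[r.name].append(probe_id)
    return dict(probe_sets)


# Hypothetical toy data: G1's 3' UTR overlaps the first exon of G2.
regions = [
    Region("G1:exon1", "G1", "chr1", 100, 200),
    Region("G1:3UTR", "G1", "chr1", 300, 400),
    Region("G2:exon1", "G2", "chr1", 370, 500),
]
alignments = [
    Alignment("p1", "chr1", 110, 134, 0),  # kept in G1:exon1
    Alignment("p2", "chr1", 310, 334, 0),  # kept in G1:3UTR
    Alignment("p3", "chr1", 372, 396, 0),  # overlaps G1 and G2: dropped
    Alignment("p4", "chr1", 120, 144, 0),  # multi-mapping: dropped
    Alignment("p4", "chr2", 50, 74, 0),
    Alignment("p5", "chr1", 150, 174, 1),  # mismatch: dropped
]
probe_sets = build_probe_sets(alignments, regions)
```

With this toy input, only `p1` and `p2` survive, each in its own region-level probe set. An interval index such as NCList would matter for real arrays, where hundreds of thousands of probes are matched against genome-wide annotations.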

    A comparative genomic framework for the in silico design and assessment of molecular typing methods using whole-genome sequence data with application to Listeria monocytogenes

    xiii, 100 leaves : ill. ; 29 cm. Although increased genome sequencing efforts have improved our understanding of genomic variability within many bacterial species, there has been limited application of this knowledge towards assessing current molecular typing methods and developing novel molecular typing methods. This thesis reports a novel in silico comparative genomic framework where the performance of typing methods is assessed on the basis of the discriminatory power of the method as well as the concordance of the method with a whole-genome phylogeny. Using this framework, we designed a comparative genomic fingerprinting (CGF) assay for Listeria monocytogenes through optimized molecular marker selection. In silico validation and assessment of the CGF assay against two other molecular typing methods for L. monocytogenes (multilocus sequence typing (MLST) and multiple virulence locus sequence typing (MVLST)) revealed that the CGF assay had better performance than these typing methods. Hence, optimized molecular marker selection can be used to produce highly discriminatory assays with high concordance to whole-genome phylogenies. The framework described in this thesis can be used to assess current molecular typing methods against whole-genome phylogenies and design the next generation of high-performance molecular typing methods from whole-genome sequence data.
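The abstract does not say how discriminatory power was quantified. A common choice for typing methods is Simpson's index of diversity in the Hunter-Gaston form, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where N is the number of isolates and n_j the number assigned to type j. The sketch below assumes that measure; the function name and the example type labels are hypothetical.

```python
from collections import Counter


def discriminatory_index(type_assignments):
    """Hunter-Gaston discriminatory index for a list of type labels.

    Returns the probability that two isolates drawn at random
    (without replacement) are assigned to different types.
    """
    n = len(type_assignments)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_assignments)
    # Number of ordered isolate pairs that share a type.
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - same_type_pairs / (n * (n - 1))


# Hypothetical example: four isolates, two of which share type "A".
example_index = discriminatory_index(["A", "A", "B", "C"])
```

A value of 1.0 means every isolate receives its own type, and 0.0 means the method cannot separate any of them. Concordance with a whole-genome phylogeny would need a separate measure and is not covered here.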

    Clustering of scientific fields by integrating text mining and bibliometrics.

    The growing dissemination of scientific and technological publications via the internet, and their availability in large-scale bibliographic databases, creates enormous opportunities for mapping science and technology. The continuing growth of available computing power and the development of new algorithms contribute as well. Important challenges nevertheless remain. This dissertation confirms the hypothesis that the accuracy of both clustering scientific fields and classifying publications can be further improved by integrating text mining and bibliometrics. The textual and the bibliometric approach each have advantages and disadvantages, and each offers a different view of a corpus of scientific publications or patents. On the one hand, such documents contain a wealth of textual information; on the other hand, the citations between them form large networks that provide additional information. We integrate both viewpoints and show how existing textual and bibliometric methods can be improved. The dissertation consists of three parts. First, we discuss the use of text mining techniques for information retrieval and for mapping the knowledge contained in texts. We introduce and demonstrate the text mining framework, as well as the use of agglomerative hierarchical clustering. We also examine the relationship between clustering performance on the one hand and the desired number of clusters and the number of factors in latent semantic indexing on the other. In addition, we describe a combined, semi-automatic strategy for determining the number of clusters in a document collection. Second, we address networks consisting of citations between scientific documents and networks arising from collaborations among authors.
Such networks can be analyzed with techniques from bibliometrics and graph theory, with the aim of ranking relevant entities, clustering, and discovering communities. Third, we demonstrate the complementarity of text mining and bibliometrics and propose ways to integrate both worlds properly. The performance of unsupervised clustering and of classification improves significantly when the textual content of scientific publications is combined with the structure of citation networks. A method based on statistical meta-analysis achieves the best results and outperforms methods based solely on text or citations. Our integrated, or hybrid, strategies for information retrieval and clustering are demonstrated in two domain studies. The aim of the first study is to unravel and visualize the concept structure of information science and to assess the added value of the hybrid method. The second study covers the cognitive structure, bibliometric properties, and dynamics of bioinformatics. We develop a method for dynamic and integrated clustering of evolving bibliographic corpora. This method compares and tracks clusters over time. In summary, for the complementary text and network worlds we design a hybrid clustering method that takes both paradigms into account simultaneously. We also show that the integrated view yields a better understanding of the structure and evolution of scientific fields.
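The abstract does not name the meta-analysis technique used for the integration. One classic candidate is Fisher's method for combining p-values, sketched below purely as an illustration. It assumes that, for a given pair of documents, a text-based and a citation-based dissimilarity have each already been converted to a p-value; that conversion step, the function name and the example values are all hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom under the null, and
    for even degrees of freedom its upper tail has the closed form
    exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    if not p_values or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical document pair: similar in text (p = 0.05) and
# in citations (p = 0.05); the combined value is smaller than either.
combined = fisher_combined_p([0.05, 0.05])
```

Under these assumptions, the combined p-value could act as an integrated dissimilarity fed to a standard clustering algorithm, such as the agglomerative hierarchical clustering mentioned in the abstract.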

    Plant-parasitic nematodes: from genomics to functional analysis of parasitism genes

    Nematodes (roundworms) belong to the largest phylum on earth. The numerous species inhabit practically all ecological niches, including plants. Plant-parasitic species live on plant roots, causing substantial damage to the plant and hampering its development. As such, they cause enormous economic losses in crop production. We used a molecular approach to analyze the plant-parasitic nematode Radopholus similis by generating expressed sequence tags (ESTs). The most striking discovery was tags corresponding to a Wolbachia-like endosymbiont, which was subsequently located in the ovaries of R. similis. Numerous tags corresponding to parasitism genes with potential roles in, amongst other things, host localisation, detoxification, cell wall modification, and even putative host transcriptional reprogramming were identified. In addition, a tool to explore all available nematode EST data is presented in this study. The ‘nematode EST exploration tool’ (NEXT) (http://zion.ugent.be/joachim/next) extends the usefulness of these data by extracting and storing temporal and spatial information from all publicly available nematode EST libraries. Some members of the transthyretin-like gene family of R. similis were characterized. All stages except developing embryos express the analyzed genes, and expression is localized to the ventral nerve cord and tissues surrounding the vulva. The predicted secondary structure suggests a capacity to bind an as yet unknown ligand. Further, the annotation of the complete mitochondrial (mt) genome of R. similis is reported. The mt genome has the expected gene content, but shows many aberrant features, such as a considerably smaller 16S rRNA with reduced structures, two large repeat regions, the lack of stop codons on many genes and a unique codon reassignment from UAA:Stop to UAA:Tyrosine. The aberrant features in the mt genome could be related to this codon reassignment, but results are ambiguous and require further research.
The last part of the study reports on the response of the plant to nematode infection. Signaling of two plant hormones involved in plant defense is measured during early phases of parasitism. In addition, the role of flavonoid compounds produced by the plant is analyzed by infection tests on several mutants.

    Bioinformatics applied to human genomics and proteomics: development of algorithms and methods for the discovery of molecular signatures derived from omic data and for the construction of co-expression and interaction networks

    [EN] The present PhD dissertation develops and applies bioinformatic methods and tools to address key current problems in the analysis of human omic data. The dissertation is organised by main objective into four chapters focused on: (i) development of an algorithm for the analysis of changes and heterogeneity in large-scale omic data; (ii) development of a method for non-parametric feature selection; (iii) integration and analysis of human protein-protein interaction networks; and (iv) integration and analysis of human co-expression networks derived from tissue expression data and evolutionary profiles of proteins. In the first chapter, we developed and tested a new robust algorithm in R, called DECO, for the discovery of subgroups of features and samples within large-scale omic datasets. It explores feature differences and possible heterogeneity by integrating both data dispersion and predictor-response information in a new statistical parameter called h (heterogeneity score). In the second chapter, we present a simple non-parametric statistic to measure the cohesiveness of categorical variables along any quantitative variable, applicable to feature selection in all types of big data sets. In the third chapter, we describe an analysis of the human interactome integrating two global datasets from high-quality proteomics technologies: HuRI (a human protein-protein interaction network generated by a systematic experimental screening based on Yeast-Two-Hybrid technology) and Cell-Atlas (a comprehensive map of subcellular localization of human proteins generated by antibody imaging). This analysis aims to create a framework for characterizing subcellular localization supported by the human protein-protein interactome.
In the fourth chapter, we developed a full integration of three high-quality proteome-wide resources (Human Protein Atlas, OMA and TimeTree) to generate a robust human co-expression network across tissues, assigning each human protein a position along the evolutionary timeline. In this way, we investigate how old in evolution and how correlated the different human proteins are, and we place them all in a common interaction network. As a general comment, all the work presented in this PhD uses and develops a wide variety of bioinformatic and statistical tools for the analysis, integration and elucidation of molecular signatures and biological networks using human omic data. Most of these data correspond to sample cohorts generated in recent biomedical studies on specific human diseases.
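As an illustration of what a co-expression network across tissues involves, the following minimal Python sketch links two proteins when their expression profiles over a shared set of tissues are strongly correlated. The abstract does not state the correlation measure or cutoff used in the dissertation; Pearson correlation, the 0.8 threshold, the function names and the toy profiles are all assumptions made for the example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.8):
    """Return (protein_a, protein_b, r) for pairs with r >= threshold.

    `expression` maps a protein name to its expression values,
    one value per tissue, in the same tissue order for all proteins.
    """
    edges = []
    for a, b in combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        if r >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical profiles over four tissues.
expression = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.0, 6.0, 8.0],   # rises with P1
    "P3": [4.0, 3.0, 2.0, 1.0],   # falls as P1 rises
}
edges = coexpression_edges(expression)
```

Here only the P1 and P2 pair forms an edge. Annotating each node with an evolutionary age, as the chapter describes, would be an additional attribute layered on top of this graph.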