8 research outputs found

    Improving reproducibility and reuse of modelling results in the life sciences

    Research results are complex and include a variety of heterogeneous data. This entails major computational challenges: (i) to manage simulation studies, (ii) to ensure model exchangeability, stability and validity, and (iii) to foster communication between partners. I describe techniques to improve the reproducibility and reuse of modelling results. First, I introduce a method to characterise differences in computational models. Second, I present approaches to obtain shareable and reproducible research results. Altogether, my methods and tools foster exchange and reuse of modelling results.
    The distributed development of complex simulation studies poses a large number of information-technology challenges: (i) models must be managed; (ii) the reproducibility, stability and validity of results must be ensured; and (iii) communication between partners must be improved. I present techniques to improve the reproducibility and reusability of modelling results. My implementations have been successfully integrated into international applications and promote the sharing of scientific results.

    Data Standards for the Genomes to Life Program


    The use of mobile phones as service-delivery devices in sign language machine translation system

    Masters of Science. This thesis investigates the use of mobile phones as service-delivery devices in a sign language machine translation system. Four sign language visualization methods were evaluated on mobile phones, three of which were synthetic. Three factors were considered: the intelligibility of sign language, as rendered by the method; the power consumption; and the bandwidth usage associated with each method. The average intelligibility rate was 65%, with some methods achieving intelligibility rates of up to 92%. The average size was 162 KB and, on average, the power consumption increased to 180% of the idle state, across all methods. This research forms part of the Integration of Signed and Verbal Communication: South African Sign Language Recognition and Animation (SASL) project at the University of the Western Cape and serves as an integration platform for the group's research. In order to perform this research, a machine translation system that uses mobile phones as service-delivery devices was developed, as well as a 3D avatar for mobile phones. It was concluded that mobile phones are suitable service-delivery platforms for sign language machine translation systems. South Afric

    Bioinformatics assisted breeding, from QTL to candidate genes

    Over the last decade, the amount of data generated by a single run of an NGS sequencer has come to exceed what days of work with Sanger sequencing produce. Metabolomics, proteomics and transcriptomics technologies are likewise producing more and more information at an ever faster rate. In addition, the number of databases available to biologists and breeders is increasing every year. The challenge for them is two-fold: to cope with the increased amount of data produced by these new technologies, and to cope with the distribution of the information across the Web. An example of a study with a large amount of -omics data is described in Chapter 2, where more than 600 peaks were measured using liquid chromatography mass spectrometry (LCMS) in the peel and flesh of a segregating F1 apple population. In total, 669 mQTL were identified in this study. The number of mQTL identified is vast and almost overwhelming. Extracting meaningful information from such an experiment requires appropriate data filtering and data visualization techniques. The visualization of the distribution of the mQTL on the genetic map led to the discovery of QTL hotspots on linkage groups 1, 8, 13 and 16. The mQTL hotspot on linkage group 16 was further investigated and mainly contained compounds involved in the phenylpropanoid pathway. The apple genome sequence and its annotation were used to gain insight into genes potentially regulating this QTL hotspot. This led to the identification of the structural gene leucoanthocyanidin reductase (LAR1) as well as seven genes encoding transcription factors as putative candidates regulating the phenylpropanoid pathway, and thus candidates for the biosynthesis of health-beneficial compounds. However, this study also indicated bottlenecks in the availability of biologist-friendly tools to visualize large-scale QTL mapping results and smart ways to mine genes underlying QTL intervals.
In this thesis, we provide bioinformatics solutions that allow regions of interest on the genome to be explored more efficiently. In Chapter 3, we describe MQ2, a tool to visualize the results of large-scale QTL mapping experiments. It allows biologists and breeders to use their favorite QTL mapping tool, such as MapQTL or R/qtl, and then use MQ2 to visualize the distribution of the resulting QTL over the genetic map used in the analysis. MQ2 provides the distribution of the QTL over the markers of the genetic map for a few hundred traits. MQ2 is accessible online via its web interface but can also be used locally via its command line interface. In Chapter 4, we describe Marker2sequence (M2S), a tool to filter out genes of interest from all the genes underlying a QTL. M2S returns the list of genes for a specific genome interval and provides a search function to filter out genes related to the provided keyword(s) by their annotation. Genome annotations often contain cross-references to resources such as the Gene Ontology (GO) or proteins of the UniProt database. Via these annotations, additional information can be gathered about each gene. By integrating information from different resources and offering a way to mine the list of genes present in a QTL interval, M2S provides a way to reduce a list of hundreds of genes to possibly tens of genes or fewer that are potentially related to the trait of interest. Using semantic web technologies, M2S integrates multiple resources and has the flexibility to extend this integration to more resources as they become available to these technologies. Besides the importance of efficient bioinformatics tools to analyze and visualize data, the work in Chapter 2 also revealed the importance of regulatory elements controlling key genes of pathways. The limitation of M2S is that it only considers genes within the interval.
In genome annotations, transcription factors are not linked to the trait (keyword) or to the genes they control, and these relationships will therefore not be considered. By integrating information about the gene regulatory network of the organism into Marker2sequence, it should become possible to include in its list genes that lie outside of the QTL interval but are regulated by elements present within it. In tomato, the genome annotation already lists a number of transcription factors; however, it does not provide any information about their targets. In Chapter 5, we describe how we combined transcriptomics information with six genotypes from an Introgression Line (IL) population to find genes that are differentially expressed while being in a similar genomic background (i.e. outside of any introgression segments) as the reference genotype (with no introgression). These genes may be differentially expressed as a result of a regulatory element present in an introgression. The promoter regions of these genes have been analyzed for DNA motifs, and putative transcription factor binding sites have been found. The approaches taken in M2S (Chapter 4) are focused on a specific region of the genome, namely the QTL interval. In Chapter 6, we generalized this approach to develop Annotex. Annotex provides a simple way to browse the cross-references existing between biological databases (ChEBI, Rhea, UniProt, GO) and genome annotations. The main concept of Annotex is that, from any type of data present in the databases, one can navigate the cross-references to retrieve the desired type of information. This thesis has resulted in the production of three tools that biologists and breeders can use to speed up their research and build new hypotheses on. This thesis also revealed the state of bioinformatics with regard to data integration.
It also reveals the need to integrate more ontologies into annotations (for example, genome annotations, protein annotations, and pathway annotations) than just the Gene Ontology (GO) currently used. Multiple platforms are arising to build these new ontologies, but the process of integrating them into existing resources remains to be done. It also confirms the state of the data in plants, where multiple resources may contain overlapping information. Finally, this thesis shows what can be achieved when the data is made inter-operable, which should be an incentive to the community to work together and build inter-operable, non-overlapping resources, creating a bioinformatics Web for plant research.
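
The two-step filtering that the abstract above attributes to Marker2sequence (list the genes in a genome interval, then keep those whose annotation matches a keyword) can be illustrated with a minimal sketch. This is not the M2S implementation, which the abstract says relies on semantic web technologies; the `Gene` record, the function names, and the toy gene list are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; the field names are assumptions."""
    gene_id: str
    chromosome: str
    start: int
    end: int
    annotation: str


def genes_in_interval(genes, chromosome, start, end):
    """Return genes on `chromosome` that overlap the interval [start, end]."""
    return [
        g for g in genes
        if g.chromosome == chromosome and g.start <= end and g.end >= start
    ]


def filter_by_keywords(genes, keywords):
    """Keep genes whose annotation contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [g for g in genes if any(k in g.annotation.lower() for k in lowered)]


# Toy, made-up annotation data (not taken from any real genome).
GENES = [
    Gene("gene_a", "chr16", 1_000, 2_000, "leucoanthocyanidin reductase"),
    Gene("gene_b", "chr16", 5_000, 6_500, "MYB transcription factor"),
    Gene("gene_c", "chr16", 9_000, 9_800, "ribosomal protein"),
    Gene("gene_d", "chr01", 1_200, 1_900, "transcription factor"),
]

# Step 1: restrict to the (hypothetical) QTL interval; step 2: keyword filter.
in_interval = genes_in_interval(GENES, "chr16", 500, 7_000)
candidates = filter_by_keywords(in_interval, ["transcription factor"])
print([g.gene_id for g in candidates])
```

As in the limitation noted in the abstract, a filter of this shape only ever sees genes inside the interval: `gene_d` is annotated as a transcription factor but is excluded because it lies on another chromosome.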

    MIME Media Type for the Systems Biology Markup Language (SBML)
