Bioinformatics assisted breeding, from QTL to candidate genes
Over the last decade, sequencing output has grown so much that a single run of an NGS sequencer now yields more data than days of work with Sanger sequencing. Metabolomics, proteomics and transcriptomics technologies are also producing more and more information at an ever faster rate. In addition, the number of databases available to biologists and breeders is increasing every year. The challenge for them is two-fold: to cope with the increased amount of data produced by these new technologies, and to cope with the distribution of the information across the Web. An example of a study with a lot of ~omics data is described in Chapter 2, where more than 600 peaks were measured using liquid chromatography mass-spectrometry (LCMS) in peel and flesh of a segregating F1 apple population. In total, 669 mQTL were identified in this study. The number of mQTL identified is vast and almost overwhelming. Extracting meaningful information from such an experiment requires appropriate data filtering and data visualization techniques. Visualizing the distribution of the mQTL on the genetic map led to the discovery of QTL hotspots on linkage groups 1, 8, 13 and 16. The mQTL hotspot on linkage group 16 was further investigated and mainly contained compounds involved in the phenylpropanoid pathway. The apple genome sequence and its annotation were used to gain insight into genes potentially regulating this QTL hotspot. This led to the identification of the structural gene leucoanthocyanidin reductase (LAR1) as well as seven genes encoding transcription factors as putative candidates regulating the phenylpropanoid pathway, and thus candidates for the biosynthesis of health beneficial compounds. However, this study also indicated bottlenecks in the availability of biologist-friendly tools to visualize large-scale QTL mapping results and smart ways to mine genes underlying QTL intervals.
In this thesis, we provide bioinformatics solutions that allow regions of interest on the genome to be explored more efficiently. In Chapter 3, we describe MQ2, a tool to visualize results of large-scale QTL mapping experiments. It allows biologists and breeders to use their favorite QTL mapping tool, such as MapQTL or R/qtl, and then visualize with MQ2 the distribution of the resulting QTL along the genetic map used in the analysis. MQ2 provides the distribution of the QTL over the markers of the genetic map for a few hundred traits. MQ2 is accessible online via its web interface but can also be used locally via its command line interface. In Chapter 4, we describe Marker2sequence (M2S), a tool to filter out genes of interest from all the genes underlying a QTL. M2S returns the list of genes for a specific genome interval and provides a search function to filter out genes related to the provided keyword(s) by their annotation. Genome annotations often contain cross-references to resources such as the Gene Ontology (GO), or proteins of the UniProt database. Via these annotations, additional information can be gathered about each gene. By integrating information from different resources and offering a way to mine the list of genes present in a QTL interval, M2S provides a way to reduce a list of hundreds of genes to possibly tens of genes or fewer that are potentially related to the trait of interest. Using semantic web technologies, M2S integrates multiple resources and has the flexibility to extend this integration to more resources as they become available to these technologies. Besides the importance of efficient bioinformatics tools to analyze and visualize data, the work in Chapter 2 also revealed the importance of regulatory elements controlling key genes of pathways. The limitation of M2S is that it only considers genes within the interval.
In genome annotations, transcription factors are not linked to the trait (keyword) or to the genes they control, and these relationships will therefore not be considered. By integrating information about the gene regulatory network of the organism into Marker2sequence, it should be possible to include in its list of genes those that lie outside of the QTL interval but are regulated by elements present within the QTL interval. In tomato, the genome annotation already lists a number of transcription factors; however, it does not provide any information about their targets. In Chapter 5, we describe how we combined transcriptomics information with six genotypes from an Introgression Line (IL) population to find genes differentially expressed while being in a similar genomic background (i.e. outside of any introgression segments) as the reference genotype (with no introgression). These genes may be differentially expressed as a result of a regulatory element present in an introgression. The promoter regions of these genes have been analyzed for DNA motifs, and putative transcription factor binding sites have been found. The approaches taken in M2S (Chapter 4) are focused on a specific region of the genome, namely the QTL interval. In Chapter 6, we generalized this approach to develop Annotex. Annotex provides a simple way to browse the cross-references existing between biological databases (ChEBI, Rhea, UniProt, GO) and genome annotations. The main concept of Annotex is that, from any type of data present in the databases, one can navigate the cross-references to retrieve the desired type of information. This thesis has resulted in the production of three tools that biologists and breeders can use to speed up their research and build new hypotheses. This thesis also revealed the state of bioinformatics with regard to data integration.
It also reveals the need to integrate more ontologies into annotations (for example, genome annotations, protein annotations, and pathway annotations) than just the Gene Ontology (GO) currently used. Multiple platforms are arising to build these new ontologies, but the process of integrating them into existing resources remains to be done. It also confirms the state of the data in plants, where multiple resources may overlap. Finally, this thesis also shows what can be achieved when the data is made inter-operable, which should be an incentive to the community to work together and build inter-operable, non-overlapping resources, creating a bioinformatics Web for plant research.
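The interval-plus-keyword filtering that this abstract attributes to Marker2sequence can be illustrated with a minimal Python sketch. It is not the tool's implementation, which the abstract says relies on semantic web technologies; the `Gene` record, the two helper functions and the toy annotation entries are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str      # hypothetical identifier
    chromosome: str
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    annotation: str   # free-text functional description


def genes_in_interval(genes, chromosome, start, end):
    """Return genes on `chromosome` that overlap the interval [start, end]."""
    return [
        g for g in genes
        if g.chromosome == chromosome and g.start <= end and g.end >= start
    ]


def filter_by_keywords(genes, keywords):
    """Keep genes whose annotation contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [g for g in genes if any(k in g.annotation.lower() for k in lowered)]


# Toy, invented annotation records; not taken from any real genome.
TOY_GENES = [
    Gene("gene_A", "chr16", 1_000, 2_500, "leucoanthocyanidin reductase"),
    Gene("gene_B", "chr16", 3_000, 4_200, "MYB transcription factor"),
    Gene("gene_C", "chr16", 9_000, 9_800, "ribosomal protein L5"),
    Gene("gene_D", "chr01", 1_500, 2_000, "anthocyanidin synthase"),
]

# Step 1: restrict to the (hypothetical) QTL interval; step 2: keyword search.
candidates = filter_by_keywords(
    genes_in_interval(TOY_GENES, "chr16", 500, 5_000),
    ["anthocyanidin", "transcription factor"],
)
print([g.gene_id for g in candidates])  # ['gene_A', 'gene_B']
```

A plain substring match like this only sees the annotation text of genes inside the interval, which mirrors the limitation the abstract notes: regulators or targets located outside the interval are never considered.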
DNA Copy Number Changes in Human Malignant Fibrous Histiocytomas by Array Comparative Genomic Hybridisation
BACKGROUND: Malignant fibrous histiocytomas (MFHs), or undifferentiated pleomorphic sarcomas, are in general high-grade tumours with extensive chromosomal aberrations. In order to identify recurrent chromosomal regions of gain and loss, as well as novel gene targets of potential importance for MFH development and/or progression, we have analysed DNA copy number changes in 33 MFHs using microarray-based comparative genomic hybridisation (array CGH). PRINCIPAL FINDINGS: In general, the tumours showed numerous gains and losses of large chromosomal regions. The most frequent minimal recurrent regions of gain were 1p33-p32.3, 1p31.3-p31.2 and 1p21.3 (all gained in 58% of the samples), as well as 1q21.2-q21.3 and 20q13.2 (both 55%). The most frequent minimal recurrent regions of loss were 10q25.3-q26.11, 13q13.3-q14.2 and 13q14.3-q21.1 (all lost in 64% of the samples), as well as 2q36.3-q37.2 (61%), 1q41 (55%) and 16q12.1-q12.2 (52%). Statistical analyses revealed that gain of 1p33-p32.3 and 1p21.3 was significantly associated with better patient survival (P = 0.021 and 0.046, respectively). Comparison with similar array CGH data from 44 leiomyosarcomas identified seven chromosomal regions (1p36.32-p35.2, 1p21.3-p21.1, 1q32.1-q42.13, 2q14.1-q22.2, 4q33-q34.3, 6p25.1-p21.32 and 7p22.3-p13) that were significantly different in copy number between the MFHs and leiomyosarcomas. CONCLUSIONS: A number of recurrent regions of gain and loss have been identified, some of which were associated with better patient survival. Several specific chromosomal regions with significant differences in copy number between MFHs and leiomyosarcomas were identified, and these aberrations may be used as additional tools for the differential diagnosis of MFHs and leiomyosarcomas.
Using semantic web technology to accelerate plant breeding
One goal within plant breeding is to find the causal gene(s) explaining a given phenotype. Semantic web technology brings opportunities to integrate data and information across dispersed data sources. Chebi2gene and Marker2sequence are two applications relying on this semantic web technology to integrate genes, proteins, metabolites, pathways and literature. Their web-based interface allows biologists to use and explore this network of information.
Organ specificity and transcriptional control of metabolic routes revealed by expression QTL profiling of source-sink tissues in a segregating potato population
Background: With the completion of genome sequences belonging to some of the major crop plants, new challenges arise to utilize this data for crop improvement and increased food security. The field of genetical genomics has the potential to identify genes displaying heritable differential expression associated with important phenotypic traits. Here we describe the identification of expression QTLs (eQTLs) in two different tissues of a segregating potato population and query the potato genome sequence to differentiate between cis- and trans-acting eQTLs in relation to gene subfunctionalization. Results: Leaf and tuber samples were analysed and screened for the presence of conserved and tissue dependent eQTLs. Expression QTLs present in both tissues are predominantly cis-acting, whilst for tissue specific QTLs the percentage of trans-acting QTLs increases. Tissue dependent eQTLs were assigned to functional classes and visualized in metabolic pathways. We identified a potential regulatory network on chromosome 10 involving genes crucial for maintaining circadian rhythms and controlling clock output genes. In addition, we show that the type of genetic material screened and the sampling strategy applied can have a high impact on the output of genetical genomics studies. Conclusions: Identification of tissue dependent regulatory networks based on mapped differential expression not only gives us insight into tissue dependent gene subfunctionalization but also brings new insights into key biological processes and delivers targets for future haplotyping and genetic marker development.
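The cis/trans distinction drawn in this abstract can be sketched in Python under a commonly used convention: an eQTL is called cis-acting when it maps close to the physical position of the gene whose expression it explains, and trans-acting otherwise. The abstract does not state the criterion the authors used, so the 5 Mb window, the function name and the toy records below are all assumptions made for illustration.

```python
from collections import Counter


def classify_eqtl(gene_chrom, gene_pos, qtl_chrom, qtl_pos, window=5_000_000):
    """Label an eQTL 'cis' if it lies on the gene's chromosome within
    `window` base pairs of the gene, otherwise 'trans'.

    The default window is an arbitrary illustrative value, not a published
    threshold.
    """
    if gene_chrom == qtl_chrom and abs(gene_pos - qtl_pos) <= window:
        return "cis"
    return "trans"


# Toy, invented records: (gene, gene_chrom, gene_pos, qtl_chrom, qtl_pos).
TOY_EQTLS = [
    ("gene_1", "chr10", 12_000_000, "chr10", 12_400_000),  # close by
    ("gene_2", "chr10", 12_000_000, "chr10", 48_000_000),  # same chrom, far
    ("gene_3", "chr03", 5_000_000, "chr10", 12_400_000),   # other chrom
]

labels = {
    gene: classify_eqtl(g_chrom, g_pos, q_chrom, q_pos)
    for gene, g_chrom, g_pos, q_chrom, q_pos in TOY_EQTLS
}
counts = Counter(labels.values())
print(labels)
print(counts["cis"], counts["trans"])  # 1 2
```

Running the same classification separately on eQTLs shared between tissues and on tissue-specific eQTLs, then comparing the two `counts`, is one way to express the kind of comparison the Results section reports.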
Crop Ontology: Vocabulary For Crop-related Concepts
A recurrent issue for data integration is the lack of a common and structured vocabulary used by different parties to describe their data sets. The Crop Ontology (www.cropontology.org) project aims to provide a central place where the crop community can gather to generate such standardized vocabularies and structure them into ontologies. Having standardized ontologies opens the world of the Semantic Web to data integration between different data providers. Crop Ontology is a community-based project, providing a central place for the creation of crop-related ontologies, but it can also be integrated into third-party tools through its Application Programming Interface, providing retrieval of specific terms or a more generic search functionality for all terms. The ontologies are available in RDF format, described using the OWL and RDFS standards, allowing them to be consumed by popular semantic reasoners. We believe that Crop Ontology will lead to better description of crop-related data, improve collaboration between partners, and serve as an example for other scientific fields.
MED12 Alterations in Both Human Benign and Malignant Uterine Soft Tissue Tumors
The relationship between benign uterine leiomyomas and their malignant counterparts, i.e. leiomyosarcomas and smooth muscle tumors of uncertain malignant potential (STUMP), is still poorly understood. The idea that a leiomyosarcoma could derive from a leiomyoma is still controversial. Recently, MED12 mutations have been reported in uterine leiomyomas. In this study we asked whether such mutations could also be involved in leiomyosarcoma and STUMP oncogenesis. For this purpose we examined 33 uterine mesenchymal tumors by sequencing the hot-spot mutation region of MED12. We determined that MED12 is altered in 66.6% of typical leiomyomas, as previously reported, but also in 11% of STUMP and 20% of leiomyosarcomas. The mutated allele is predominantly expressed in leiomyomas and STUMP. Interestingly, all classical leiomyomas exhibit MED12 protein expression, while 40% of atypical leiomyomas, 50% of STUMP and 80% of leiomyosarcomas (among them the two mutated ones) do not express MED12. All these tumors without protein expression exhibit complex genomic profiles. No mutations and no expression loss were identified in an additional series of 38 non-uterine leiomyosarcomas. MED12 mutations are not exclusive to leiomyomas but seem to be specific to uterine malignancies. A previous study has suggested that MED12 mutations in leiomyomas could lead to Wnt/β-catenin pathway activation; however, our immunohistochemistry results show that there is no association between MED12 status and β-catenin nuclear/cytoplasmic localization. Collectively, our results show that subgroups of benign and malignant tumors share common genetics. We propose here that MED12 alterations could be implicated in the development of smooth muscle tumors and that its expression could be inhibited in malignant tumors.
Gene expression profiling identifies distinct molecular subgroups of leiomyosarcoma with clinical relevance
Background: Soft tissue sarcomas are heterogeneous and a major complication in their management is that the existing classification scheme is not definitive and is still evolving. Leiomyosarcomas, a major histologic category of soft tissue sarcomas, are malignant tumours displaying smooth muscle differentiation. Although defined as a single group, they exhibit a wide range of clinical behaviour. We aimed to carry out molecular classification to identify new molecular subgroups with clinical relevance.
Methods: We used gene expression profiling on 20 extra-uterine leiomyosarcomas and cross-study analyses for molecular classification of leiomyosarcomas. Clinical significance of the subgroupings was investigated.
Results: We have identified two distinct molecular subgroups of leiomyosarcomas. One group was characterised by high expression of 26 genes that included many genes from the sub-classification gene cluster proposed by Nielsen et al. These sub-classification genes include genes that have importance structurally, as well as in cell signalling. Notably, we found a statistically significant association of the subgroupings with tumour grade. Further refinement led to a group of 15 genes that could recapitulate the tumour subgroupings in our data set and in a second independent sarcoma set. Remarkably, cross-study analyses suggested that these molecular subgroups could be found in four independent data sets, providing strong support for their existence.
Conclusions: Our study strongly supported the existence of distinct leiomyosarcoma molecular subgroups, which have clinical association with tumour grade. Our findings will aid in advancing the classification of leiomyosarcomas and lead to more individualised and better management of the disease.
Alexander Boag Sarcoma Fund
The Neural Crest Migrating into the Twenty-First Century
From the initial discovery of the neural crest over 150 years ago to the seminal studies of Le Douarin and colleagues in the latter part of the twentieth century, understanding of the neural crest has moved from the descriptive to the experimental. Now, in the twenty-first century, neural crest research has migrated into the genomic age. Here, we reflect upon the major advances in neural crest biology and the open questions that will continue to make research on this incredible vertebrate cell type an important subject in developmental biology for the century to come.
ChIP-seq Defined Genome-Wide Map of TGFβ/SMAD4 Targets: Implications with Clinical Outcome of Ovarian Cancer
Deregulation of the transforming growth factor-β (TGFβ) signaling pathway in epithelial ovarian cancer has been reported, but the precise mechanism underlying disrupted TGFβ signaling in the disease remains unclear. We performed chromatin immunoprecipitation followed by sequencing (ChIP-seq) to screen genome-wide for TGFβ-induced SMAD4 binding in epithelial ovarian cancer. Following TGFβ stimulation of the A2780 epithelial ovarian cancer cell line, we identified 2,362 SMAD4 binding loci and 318 differentially expressed SMAD4 target genes. Comprehensive examination of SMAD4-bound loci revealed four distinct binding patterns: 1) Basal; 2) Shift; 3) Stimulated Only; 4) Unstimulated Only. TGFβ-stimulated SMAD4-bound loci were primarily classified as either Stimulated Only (74%) or Shift (25%), indicating that TGFβ stimulation alters SMAD4 binding patterns in epithelial ovarian cancer cells. Furthermore, based on gene regulatory network analysis, we determined that the TGFβ-induced, SMAD4-dependent regulatory network was strikingly different in ovarian cancer compared to normal cells. Importantly, the TGFβ/SMAD4 target genes identified in the A2780 epithelial ovarian cancer cell line were predictive of patient survival, based on in silico mining of publicly available patient databases. In conclusion, our data highlight the utility of next generation sequencing technology to identify genome-wide SMAD4 target genes in epithelial ovarian cancer and link aberrant TGFβ/SMAD signaling to ovarian tumorigenesis. Furthermore, the identified SMAD4 binding loci, combined with gene expression profiling and in silico data mining of patient cohorts, may provide a powerful approach to determine potential gene signatures for biological and future translational research in ovarian and other cancers.