1,699 research outputs found

    maigesPack: A Computational Environment for Microarray Data Analysis

    Microarray technology is still an important way to assess gene expression in molecular biology, mainly because it measures expression profiles for thousands of genes simultaneously, which makes it a good option for some studies focused on systems biology. One of its main problems is the complexity of the experimental procedure, which presents several sources of variability and hinders statistical modeling. So far, there is no standard protocol for the generation and evaluation of microarray data. To ease the analysis process, this paper presents an R package, named maigesPack, that helps with data organization. It also makes the data analysis process more robust, reliable and reproducible. In addition, maigesPack aggregates several data analysis procedures reported in the literature, for instance: cluster analysis, differential expression, supervised classifiers, relevance networks and functional classification of gene groups or gene networks.

    Development of Biclustering Techniques for Gene Expression Data Modeling and Mining

    Next-generation sequencing technologies can generate large-scale biological data with higher resolution, better accuracy, and lower technical variation than their array-based counterparts. RNA sequencing (RNA-Seq) can generate genome-scale gene expression data in biological samples at a given moment, facilitating a better understanding of cell functions at genetic and cellular levels. The abundance of gene expression datasets provides an opportunity to identify genes with similar expression patterns across multiple conditions, i.e., co-expression gene modules (CEMs). Genome-scale identification of CEMs can be modeled and solved by biclustering, a two-dimensional data mining technique that clusters the rows and columns of a gene expression matrix simultaneously. Compared with traditional clustering, which targets global patterns, biclustering can predict local patterns. This unique feature makes biclustering very useful when applied to big gene expression data, since genes that participate in a cellular process are only active in specific conditions and are thus usually co-expressed under a subset of all conditions. The combination of biclustering and large-scale gene expression data holds promising potential for condition-specific functional pathway/network analysis. However, existing biclustering tools do not perform satisfactorily on high-resolution RNA-Seq data, mainly due to the lack of (i) a consideration of the high sparsity of RNA-Seq data, especially scRNA-Seq data, and (ii) an understanding of the underlying transcriptional regulation signals of the observed gene expression values. QUBIC2, a novel biclustering algorithm, is designed for large-scale bulk RNA-Seq and single-cell RNA-Seq (scRNA-Seq) data analysis. 
Critical novelties of the algorithm include (i) a truncated model to handle the unreliable quantification of genes with low or moderate expression; (ii) the Gaussian mixture distribution and an information-divergence objective function to capture shared transcriptional regulation signals among a set of genes; (iii) a Dual strategy to expand the core biclusters, aiming to save dropouts from the background; and (iv) a statistical framework to evaluate the significance of all the identified biclusters. Method validation on comprehensive data sets suggests that QUBIC2 had superior performance in functional module detection and cell type classification. Applications to temporal and spatial data demonstrated that QUBIC2 could derive meaningful biological information from scRNA-Seq data. Also presented in this dissertation is QUBICR. This R package is, on average, 82% more efficient than the source C code of QUBIC. It provides a set of comprehensive functions to facilitate biclustering-based biological studies, including the discretization of expression data, query-based biclustering, bicluster expansion, bicluster comparison, heatmap visualization of any identified biclusters, and co-expression network elucidation. In the end, a systematic summary is provided regarding the primary applications of biclustering for biological data and more advanced applications for biomedical data. This will help researchers analyze their big data effectively and generate valuable biological knowledge and novel insights with higher efficiency.
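The contrast between global and local patterns described in this abstract can be illustrated with a small sketch. It is not QUBIC2 or any biclustering algorithm; it only shows, on made-up numbers, how two genes can correlate perfectly on a subset of conditions while showing no positive correlation across all conditions. The gene values, the `subset` indices, and the `pearson` helper are all assumptions introduced for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical expression of two genes across six conditions (made-up numbers).
gene1 = [1.0, 2.0, 3.0, 5.0, 1.0, 4.0]
gene2 = [2.0, 4.0, 6.0, 1.0, 5.0, 2.0]

# Conditions in which both genes are assumed to be active together.
subset = [0, 1, 2]

# Global view: correlation over every condition.
global_r = pearson(gene1, gene2)

# Local view: correlation restricted to the condition subset.
local_r = pearson([gene1[i] for i in subset], [gene2[i] for i in subset])
```

A clustering method that scores genes over all conditions would see only `global_r`; a biclustering method searches for row and column subsets like the one hard-coded here.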

    limma powers differential expression analyses for RNA-sequencing and microarray studies

    limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

    Application of knowledge discovery and data mining methods in livestock genomics for hypothesis generation and identification of biomarker candidates influencing meat quality traits in pigs

    Recent advancements in genomics and genome profiling technologies have led to an increase in the amount of data available in livestock genomics. Yet most studies in livestock genomics have followed a reductionist approach, and very few have either followed data mining or knowledge discovery concepts or made use of the wealth of information available in the public domain to gain new knowledge. The goals of this thesis were: (i) the adoption of existing analysis strategies or the development of novel approaches in livestock genomics for integrative data analysis following the principles of data mining and knowledge discovery and (ii) demonstrating the application of such approaches in livestock genomics for hypothesis generation and biomarker discovery. A pig meat quality trait, androstenone measurement in backfat, was selected as the target phenotype for the experiments. Two experiments were performed as part of this thesis. The first followed a knowledge-driven approach, merging high-throughput expression data with a metabolic interaction network. Based on the results from this experiment, several novel biomarker candidates and a hypothesis regarding different mechanisms regulating androstenone synthesis in porcine testis samples with divergent androstenone measurements in backfat were proposed. The model proposed that the elevated levels of androstenone synthesis in the sample population could be due to the combined effect of cAMP/PKA signaling, elevated levels of fatty acid metabolism and anti-lipid-peroxidation activity of members of the glutathione metabolic pathway. The second experiment followed a data-driven approach and integrated gene expression data from multiple porcine populations to identify similarities in gene expression patterns related to hepatic androstenone metabolism. 
The results indicated that one of the low-androstenone phenotype-specific co-expression clusters was functionally enriched in pathways related to androgen and androstenone metabolism and that the members of this cluster exhibited weak co-expression in the high-androstenone phenotype. Based on the results from this experiment, this co-expression cluster was proposed as a signature cluster for hepatic androstenone metabolism in boars with low androstenone content in backfat. The results from these experiments indicate that integrative analysis approaches following data mining and knowledge discovery concepts can be used to generate new knowledge from existing data in livestock genomics. However, limited data availability is a hindrance to the extensive use of such analysis methods in the livestock genomics field. In conclusion, this study aimed to demonstrate the capabilities of data mining and knowledge discovery methods and integrative analysis approaches to generate new knowledge in livestock genomics using existing datasets. The results from the experiments hint at the possibility of further exploring such methods for knowledge generation in this field. Although the application of such methods in livestock genomics is currently limited by data availability, the increase in data availability due to evolving high-throughput technologies and the decrease in data generation costs should aid the widespread use of such methods in livestock genomics in the future.

    Annotation of gene function in citrus using gene expression information and co-expression networks

    Background: The genus Citrus encompasses major cultivated plants such as sweet orange, mandarin, lemon and grapefruit, among the world’s most economically important fruit crops. With increasing volumes of transcriptomics data available for these species, Gene Co-expression Network (GCN) analysis is a viable option for predicting gene function at a genome-wide scale. GCN analysis is based on a “guilt-by-association” principle whereby genes encoding proteins involved in similar and/or related biological processes may exhibit similar expression patterns across diverse sets of experimental conditions. While bioinformatics resources such as GCN analysis are widely available for efficient gene function prediction in model plant species including Arabidopsis, soybean and rice, in citrus these tools are not yet developed. Results: We have constructed a comprehensive GCN for citrus inferred from 297 publicly available Affymetrix Genechip Citrus Genome microarray datasets, providing gene co-expression relationships at a genome-wide scale (33,000 transcripts). The comprehensive citrus GCN consists of a global GCN (condition-independent) and four condition-dependent GCNs that survey the sweet orange species only, all citrus fruit tissues, all citrus leaf tissues, or stress-exposed plants. All of these GCNs are clustered using genome-wide, gene-centric (guide) and graph clustering algorithms for flexibility of gene function prediction. For each putative cluster, gene ontology (GO) enrichment and gene expression specificity analyses were performed to enhance gene function, expression and regulation pattern prediction. 
The guide-gene approach was used to infer novel roles of genes involved in disease susceptibility and vitamin C metabolism, and graph-clustering approaches were used to investigate isoprenoid/phenylpropanoid metabolism in citrus peel, and citric acid catabolism via the GABA shunt in citrus fruit. Conclusions: Integration of citrus gene co-expression networks, functional enrichment analysis and gene expression information provides opportunities to infer gene function in citrus. We present a publicly accessible tool, Network Inference for Citrus Co-Expression (NICCE, http://citrus.adelaide.edu.au/nicce/home.aspx), for gene co-expression analysis in citrus.
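The "guilt-by-association" idea behind a guide-gene query can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used to build NICCE: it assumes Pearson correlation as the similarity measure and a fixed cutoff of 0.9, and the gene names, expression values, and the `coexpressed_with` helper are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpressed_with(guide, expression, threshold=0.9):
    """Return genes whose profile correlates with the guide gene at or above the threshold."""
    ref = expression[guide]
    return sorted(
        gene for gene, profile in expression.items()
        if gene != guide and pearson(ref, profile) >= threshold
    )

# Hypothetical expression values across five conditions (made-up numbers).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # tracks geneA closely
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # unrelated pattern
}

# Genes returned here would be candidates for sharing the guide gene's function.
neighbours = coexpressed_with("geneA", expression)
```

In practice the cutoff and the similarity measure are design choices; a real network would also need many more conditions than this toy example to give stable correlations.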

    An investigation into the molecular determinants of salmon louse (Lepeophtheirus salmonis (Krøyer, 1837)) susceptibility to the antiparasitic drug emamectin benzoate.

    Caligid copepods, also called sea lice, are ectoparasites of marine fish, with Lepeophtheirus salmonis (Krøyer, 1837) emerging as a problem for mariculture of Atlantic salmon (Salmo salar Linnaeus, 1758) in the northern hemisphere. The annual cost of sea lice to global salmon farming was estimated to be in excess of €300 million in 2006, with the majority of this accounted for through expenses accrued from chemical treatments. Only a limited range of anti-sea louse drugs are available and licensed for the treatment of fish, and the continued use of only a few compounds creates a situation potentially favouring the development of drug resistance. Emamectin benzoate (EMB) is currently used as a salmon delousing agent, being employed as a 0.2 % in-feed pre-mix (SLICE®). Atlantic salmon farmers have reported increased incidence of reduced L. salmonis sensitivity to SLICE®, which has highlighted the requirement for further research into the molecular mechanisms controlling salmon louse resistance to EMB. Genomic and transcriptomic research concerning L. salmonis drug resistance mechanisms has not often been reported, with previous transcriptomic studies using candidate gene approaches and genetic studies focussing on population genetics. Drug resistance in ecdysozoan invertebrates is associated with a variety of molecular mechanisms including target site mutations and changes in the expression of components in drug detoxification pathways. The research reported in this thesis was aimed at the exploration of mechanisms employed by L. salmonis to reduce the toxicity of EMB exposure, following a transcriptomic approach that utilised custom oligonucleotide (oligo) microarrays and a genetic approach that utilised restriction-site associated DNA sequencing (RAD-seq) to identify single nucleotide polymorphism (SNP) markers. An EMB-resistant (PT) and a drug-susceptible (S) L. 
salmonis laboratory-maintained strain were used as a model for this research, as these two strains differ in EMB susceptibility (~7-fold) and show stable susceptibility profiles through multiple generations, suggesting that this drug resistance phenotype may be a heritable trait. Sequence resources available for salmon lice are limited as an annotated L. salmonis genome is currently under construction. Therefore, a significant amount of this study involved creating new resources to facilitate the analysis of EMB susceptibility. Suppression subtractive hybridisation (SSH) was used to enrich for transcripts that were differentially expressed between strains PT and S, which provided sufficient target sequence for the development of 15K oligo microarrays when combined with sequences assembled from existing L. salmonis ESTs. Additionally, transcripts were generated through sequencing a pooled sample representing key developmental stages of the L. salmonis life cycle, which were later used in the construction of a 44K oligo microarray. The toxicity of EMB and other avermectins (AVMs) against ecdysozoan invertebrates is reported to be based mainly on their interaction with ligand-gated ion channels (LGIC), specifically glutamate-gated chloride channels (GluCl). However, γ-aminobutyric acid (GABA)-gated chloride channels (GABA-Cls) are also believed to be targeted by AVMs, and neuronal acetylcholine receptors (nAChRs) can be allosterically modulated by the AVM compound ivermectin. Transcriptional responses in PT and S salmon lice were investigated using custom 15K L. salmonis oligo microarrays. In the absence of EMB exposure, 359 targets differed in transcript abundance between the two strains. GABA-Cl and nAChR subunits showed significantly lower transcript levels in PT compared to S lice, which was estimated at ~1.4-fold for GABA-Cl and ~2.8-fold for nAChR using RT-qPCR, suggesting their involvement in AVM toxicity in caligids. 
However, salmon lice from the PT strain showed few transcriptional responses following acute exposure (1 or 3 h) to 200 µg L⁻¹ of EMB, a drug concentration tolerated by PT lice but toxic for S lice. RAD-seq analysis of both sexes from L. salmonis strains S and PT identified 15 RAD markers that show complete association with salmon louse strain, although these preliminary results will need further analysis to confirm marker association with reduced EMB susceptibility. Additionally, RAD marker Lsa101901 showed complete association with sex for all individuals analysed, being heterozygous in females and homozygous in males. Using an allele-specific PCR assay, this SNP association pattern was further confirmed for three unrelated salmon louse strains. Marker Lsa101901 was located in the coding region of the prohibitin-2 gene, which showed sex-dependent differential expression, with mRNA levels determined by RT-qPCR about 1.8-fold higher in adult female than adult male salmon lice. In conclusion, the identification of decreased transcript abundances for LGIC subunits in EMB-resistant salmon lice, and polymorphic SNP markers showing complete association with L. salmonis strains S or PT, provides suitable candidates for further investigation into their association with reduced EMB susceptibility. Further analysis will also be required to confirm that EMB-induced mechanisms are not associated with reduced EMB susceptibility in L. salmonis. Additionally, the identification of sex-linked SNP Lsa101901 suggests that sex determination in the salmon louse is genetic and follows a female heterozygous system, with marker Lsa101901 providing a tool to determine the genetic sex of salmon lice. Improved knowledge of L. salmonis biology and the mechanisms potentially involved in EMB resistance, obtained during this study, may provide molecular markers that contribute to successful monitoring and management of this commercially important parasite of Atlantic salmon.