Can Zipf's law be adapted to normalize microarrays?
BACKGROUND: Normalization is the process of removing non-biological sources of variation between array experiments. Recent investigations of data in gene expression databases for varying organisms and tissues have shown that the majority of expressed genes exhibit a power-law distribution with an exponent close to -1 (i.e. they obey Zipf's law). Based on the observation that our single channel and two channel microarray data sets also followed a power-law distribution, we were motivated to develop a normalization method based on this law and to examine how it compares with existing published techniques. A computationally simple and intuitively appealing technique based on this observation is presented. RESULTS: Using pairwise comparisons based on MA plots (log ratio vs. log intensity), we compared this novel method to previously published normalization techniques, namely global normalization to the mean, the quantile method, and a variation on the loess normalization method designed specifically for boutique microarrays. Results indicated that, for single channel microarrays, the quantile method was superior with regard to eliminating intensity-dependent effects (banana curves), but Zipf's law normalization does minimize this effect by rotating the data distribution such that the maximal number of data points lie on the zero of the log ratio axis. For two channel boutique microarrays, the Zipf's law normalizations performed as well as, or better than, existing techniques. CONCLUSION: Zipf's law normalization is a useful tool where the quantile method cannot be applied, as is the case with microarrays containing functionally specific gene sets (boutique arrays).
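The abstract's core idea of "rotating the data distribution" can be illustrated with a minimal sketch: rank each array's intensities, fit log-intensity against log-rank, and rescale in log space so the fitted slope becomes -1 (Zipf's law). This is an assumption-laden simplification, not the paper's published procedure, whose fitting details may differ.

```python
import numpy as np

def zipf_normalize(intensities, target_exponent=-1.0):
    """Rescale one array's intensities so that log-intensity vs. log-rank
    follows a power law with the target exponent (Zipf's law).
    A simplified sketch; the published method may fit and rescale differently."""
    x = np.asarray(intensities, dtype=float)
    order = np.argsort(-x)                 # rank 1 = highest intensity
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(x) + 1)
    logx, logr = np.log(x), np.log(ranks)
    slope, intercept = np.polyfit(logr, logx, 1)   # observed power-law fit
    # Rotate the distribution in log space: replace the fitted slope with the
    # target exponent while preserving each point's residual about the fit.
    logx_norm = logx + (target_exponent - slope) * logr
    return np.exp(logx_norm)
```

Applying this per array puts all arrays on a common power-law scale, which is what allows the MA-plot comparisons described in the results.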
Broad Epigenetic Signature of Maternal Care in the Brain of Adult Rats
BACKGROUND: Maternal care is associated with long-term effects on behavior and with epigenetic programming of the NR3C1 (GLUCOCORTICOID RECEPTOR) gene in the hippocampus of both rats and humans. In the rat, these effects are reversed by cross-fostering, demonstrating that they are defined by epigenetic rather than genetic processes. However, epigenetic changes at a single gene promoter are unlikely to account for the range of outcomes and the persistent change in expression of hundreds of additional genes in adult rats in response to differences in maternal care. METHODOLOGY/PRINCIPAL FINDINGS: Here we use high-density oligonucleotide arrays to examine the state of DNA methylation, histone acetylation, and gene expression across a 7-million-base-pair region of chromosome 18 containing the NR3C1 gene in the hippocampus of adult rats. Natural variations in maternal care are associated with coordinated epigenetic changes spanning over a hundred kilobase pairs. The adult offspring of high- compared to low-maternal-care mothers show epigenetic changes in promoters, exons, and gene ends associated with higher transcriptional activity across many genes within the locus examined. Other genes in this region remain unchanged, indicating a clustered yet specific and patterned response. Interestingly, the chromosomal region containing the protocadherin-α, -β, and -γ (Pcdh) gene families implicated in synaptogenesis shows the highest differential response to maternal care. CONCLUSIONS/SIGNIFICANCE: These results suggest for the first time that the epigenetic response to maternal care is coordinated in clusters across broad genomic areas. The data indicate that this response involves not only single candidate-gene promoters but also transcriptional and intragenic sequences, as well as sequences residing distant from transcription start sites.
These epigenetic and transcriptional profiles constitute the first tiling microarray data set exploring the relationship between epigenetic modifications and RNA expression across both protein-coding and non-coding regions of a chromosomal locus in the mammalian brain.
Insights into distributed feature ranking
This version of the article: Bolón-Canedo, V., Sechidis, K., Sánchez-Maroño, N., Alonso-Betanzos, A., & Brown, G. (2019). ‘Insights into distributed feature ranking’ has been accepted for publication in Information Sciences, 496, 378–398. The Version of Record is available online at https://doi.org/10.1016/j.ins.2018.09.045. [Abstract]: In an era in which the volume and complexity of datasets are continuously growing, feature selection techniques have become indispensable for extracting useful information from huge amounts of data. However, existing algorithms may not scale well when dealing with huge datasets, and a possible solution is to distribute the data across several nodes. In this work we explore different ways of distributing the data (by features and by samples) and evaluate to what extent it is possible to obtain results similar to those obtained with the whole dataset. To address the challenge of distributing the feature ranking process, we performed experiments with different aggregation methods and feature rankers, and also evaluated the effect of distributing the feature ranking process on the subsequent classification performance.
This research has been economically supported in part by the Spanish Ministerio de Economía y Competitividad and FEDER funds of the European Union through the research project TIN2015-65069-C2-1-R; and by the Consellería de Industria of the Xunta de Galicia through the research project GRC2014/035. Financial support from the Xunta de Galicia (Centro singular de investigación de Galicia accreditation 2016-2019) and the European Union (European Regional Development Fund - ERDF) is gratefully acknowledged (research project ED431G/01). V. Bolón-Canedo acknowledges support of the Xunta de Galicia under postdoctoral Grant code ED481B 2014/164-0.
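The distributed-by-samples scheme the abstract describes can be sketched as: partition the samples across nodes, rank features independently on each partition, and combine the per-node rankings with an aggregation method. The ranker below (absolute Pearson correlation) and the aggregator (mean rank) are illustrative stand-ins; the paper evaluates several of each.

```python
import numpy as np

def rank_features(X, y):
    """Score each feature by absolute Pearson correlation with the target,
    then return each feature's rank position (0 = most relevant).
    A stand-in for any univariate feature ranker."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    scores = np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)
    return np.argsort(np.argsort(-scores))

def distributed_ranking(X, y, n_nodes=4, seed=0):
    """Partition samples across nodes, rank features on each partition,
    and aggregate by mean rank (one simple aggregation method)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    parts = np.array_split(idx, n_nodes)
    all_ranks = np.stack([rank_features(X[p], y[p]) for p in parts])
    return np.argsort(all_ranks.mean(axis=0))  # features ordered best-first
```

The question the paper studies is precisely how close such an aggregated ranking comes to the one computed on the whole, undistributed dataset.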
New Trends in Artificial Intelligence: Applications of Particle Swarm Optimization in Biomedical Problems
Optimization is the process of finding the most effective element or solution from a set of all possible resources or solutions. Many biological problems, ranging from biomolecule structure prediction to drug discovery, can benefit from a standard optimization protocol. Particle swarm optimization (PSO), proposed by Dr. Eberhart and Dr. Kennedy in 1995, is a population-based stochastic optimization technique. The researchers designed the method after being inspired by the social behavior of flocking birds and schooling fish. It shares numerous resemblances with evolutionary computation procedures such as genetic algorithms (GA). Since PSO is easy to implement and requires the adjustment of only a few parameters, it has gained more attention and advantages over other population-based algorithms. Hence, PSO is widely used in various research fields, ranging from artificial neural network training to other areas where GA can be applied.
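The core PSO loop is compact: each particle updates its velocity from an inertia term plus attraction toward its personal best and the swarm's global best, then moves. A minimal sketch minimizing an arbitrary function over a box (parameter values here are common defaults, not prescribed by the reviewed applications):

```python
import random

def pso(f, dim, n_particles=30, iters=200, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=42):
    """Minimal particle swarm optimization minimizing f over a box.
    Velocity update: inertia + cognitive (personal best) + social (global best)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In the biomedical settings surveyed, `f` would be a problem-specific cost such as a docking score or a neural network's training error.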
Intelligent techniques using molecular data analysis in leukaemia: an opportunity for personalized medicine support system
The use of intelligent techniques in medicine has brought a ray of hope for treating leukaemia patients. Personalized treatment uses a patient's genetic profile to select a mode of treatment. This process makes use of molecular technology and machine learning to determine the most suitable approach to treating a leukaemia patient. Until now, no reviews have been published from a computational perspective concerning the development of personalized-medicine intelligent techniques for leukaemia patients using molecular data analysis. This review studies the published empirical research on personalized medicine in leukaemia and synthesizes findings across studies related to intelligent techniques in leukaemia, with specific attention to particular categories of these studies, to help identify opportunities for further research into personalized medicine support systems in chronic myeloid leukaemia. A systematic search was carried out to identify studies using intelligent techniques in leukaemia and to categorize these studies based on leukaemia type as well as the task, data source, and purpose of the studies. Most studies used molecular data analysis for personalized medicine, but future advancement for leukaemia patients requires molecular models that use advanced machine-learning methods to automate decision-making in treatment management and deliver supportive medical information to the patient in clinical practice.
Haneen Banjar, David Adelson, Fred Brown, and Naeem Chaudhr
Meta-analysis of massively parallel reporter assays enables prediction of regulatory function across cell types.
Deciphering the potential of noncoding loci to influence gene regulation has been the subject of intense research, with important implications for understanding the genetic underpinnings of human diseases. Massively parallel reporter assays (MPRAs) can measure the regulatory activity of thousands of DNA sequences and their variants in a single experiment. With an increasing number of publicly available MPRA data sets, one can now develop data-driven models which, given a DNA sequence, predict its regulatory activity. Here, we performed a comprehensive meta-analysis of several MPRA data sets in a variety of cellular contexts. We first applied an ensemble of methods to predict MPRA output in each context and observed that the most predictive features are consistent across data sets. We then demonstrate that predictive models trained in one cellular context can be used to predict MPRA output in another, with the loss of accuracy attributed to cell-type-specific features. Finally, we show that our approach achieves top performance in the Fifth Critical Assessment of Genome Interpretation "Regulation Saturation" Challenge for predicting the effects of single-nucleotide variants. Overall, our analysis provides insights into how MPRA data can be leveraged to highlight functional regulatory regions throughout the genome and can guide effective design of future experiments by better prioritizing regions of interest.
Making open data work for plant scientists
Despite the clear demand for open data sharing, its implementation within plant science is still limited. This is, at least in part, because open data sharing raises several unanswered questions and challenges to current research practices. In this commentary, some of the challenges encountered by plant researchers at the bench when generating, interpreting, and attempting to disseminate their data are highlighted. The difficulties involved in sharing sequencing, transcriptomics, proteomics, and metabolomics data are reviewed. The benefits and drawbacks of three data-sharing venues currently available to plant scientists are identified and assessed: (i) journal publication; (ii) university repositories; and (iii) community and project-specific databases. It is concluded that community and project-specific databases are the most useful to researchers interested in effective data sharing, since these databases are explicitly created to meet the researchers’ needs, support extensive curation, and embody a heightened awareness of what it takes to make data reusable by others. Such bottom-up and community-driven approaches need to be valued by the research community, supported by publishers, and provided with long-term sustainable support by funding bodies and government. At the same time, these databases need to be linked to generic databases where possible, in order to be discoverable to the majority of researchers and thus promote effective and efficient data sharing. As we look forward to a future that embraces open access to data and publications, it is essential that data policies, data curation, data integration, data infrastructure, and data funding are linked together so as to foster data access and research productivity.