2,703 research outputs found

    A Grid-based solution for management and analysis of microarrays in distributed experiments

    Get PDF
    Several systems have been presented in the last years in order to manage the complexity of large microarray experiments. Although good results have been achieved, most systems tend to lack in one or more fields. A Grid based approach may provide a shared, standardized and reliable solution for storage and analysis of biological data, in order to maximize the results of experimental efforts. A Grid framework has been therefore adopted due to the necessity of remotely accessing large amounts of distributed data as well as to scale computational performances for terabyte datasets. Two different biological studies have been planned in order to highlight the benefits that can emerge from our Grid based platform. The described environment relies on storage services and computational services provided by the gLite Grid middleware. The Grid environment is also able to exploit the added value of metadata in order to let users better classify and search experiments. A state-of-art Grid portal has been implemented in order to hide the complexity of framework from end users and to make them able to easily access available services and data. The functional architecture of the portal is described. As a first test of the system performances, a gene expression analysis has been performed on a dataset of Affymetrix GeneChip® Rat Expression Array RAE230A, from the ArrayExpress database. The sequence of analysis includes three steps: (i) group opening and image set uploading, (ii) normalization, and (iii) model based gene expression (based on PM/MM difference model). Two different Linux versions (sequential and parallel) of the dChip software have been developed to implement the analysis and have been tested on a cluster. From results, it emerges that the parallelization of the analysis process and the execution of parallel jobs on distributed computational resources actually improve the performances. Moreover, the Grid environment have been tested both against the possibility of uploading and accessing distributed datasets through the Grid middleware and against its ability in managing the execution of jobs on distributed computational resources. Results from the Grid test will be discussed in a further paper

    GPU cards as a low cost solution for efficient and fast classification of high dimensional gene expression datasets

    Get PDF
    The days when bioinformatics tools will be so reliable to become a standard aid in routine clinical diagnostics are getting very close. However, it is important to remember that the more complex and advanced bioinformatics tools become, the more performances are required by the computing platforms. Unfortunately, the cost of High Performance Computing (HPC) platforms is still prohibitive for both public and private medical practices. Therefore, to promote and facilitate the use of bioinformatics tools it is important to identify low-cost parallel computing solutions. This paper presents a successful experience in using the parallel processing capabilities of Graphical Processing Units (GPU) to speed up classification of gene expression profiles. Results show that using open source CUDA programming libraries allows to obtain a significant increase in performances and therefore to shorten the gap between advanced bioinformatics tools and real medical practic

    Survival Online: a web-based service for the analysis of correlations between gene expression and clinical and follow-up data

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Complex microarray gene expression datasets can be used for many independent analyses and are particularly interesting for the validation of potential biomarkers and multi-gene classifiers. This article presents a novel method to perform correlations between microarray gene expression data and clinico-pathological data through a combination of available and newly developed processing tools.</p> <p>Results</p> <p>We developed Survival Online (available at <url>http://ada.dist.unige.it:8080/enginframe/bioinf/bioinf.xml</url>), a Web-based system that allows for the analysis of Affymetrix GeneChip microarrays by using a parallel version of dChip. The user is first enabled to select pre-loaded datasets or single samples thereof, as well as single genes or lists of genes. Expression values of selected genes are then correlated with sample annotation data by uni- or multi-variate Cox regression and survival analyses. The system was tested using publicly available breast cancer datasets and GO (Gene Ontology) derived gene lists or single genes for survival analyses.</p> <p>Conclusion</p> <p>The system can be used by bio-medical researchers without specific computation skills to validate potential biomarkers or multi-gene classifiers. The design of the service, the parallelization of pre-processing tasks and the implementation on an HPC (High Performance Computing) environment make this system a useful tool for validation on several independent datasets.</p

    A Web-based and Grid-enabled dChip version for the analysis of large sets of gene expression data

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Microarray techniques are one of the main methods used to investigate thousands of gene expression profiles for enlightening complex biological processes responsible for serious diseases, with a great scientific impact and a wide application area. Several standalone applications had been developed in order to analyze microarray data. Two of the most known free analysis software packages are the R-based Bioconductor and dChip. The part of dChip software concerning the calculation and the analysis of gene expression has been modified to permit its execution on both cluster environments (supercomputers) and Grid infrastructures (distributed computing).</p> <p>This work is not aimed at replacing existing tools, but it provides researchers with a method to analyze large datasets without any hardware or software constraints.</p> <p>Results</p> <p>An application able to perform the computation and the analysis of gene expression on large datasets has been developed using algorithms provided by dChip. Different tests have been carried out in order to validate the results and to compare the performances obtained on different infrastructures. Validation tests have been performed using a small dataset related to the comparison of HUVEC (Human Umbilical Vein Endothelial Cells) and Fibroblasts, derived from same donors, treated with IFN-α.</p> <p>Moreover performance tests have been executed just to compare performances on different environments using a large dataset including about 1000 samples related to Breast Cancer patients.</p> <p>Conclusion</p> <p>A Grid-enabled software application for the analysis of large Microarray datasets has been proposed. DChip software has been ported on Linux platform and modified, using appropriate parallelization strategies, to permit its execution on both cluster environments and Grid infrastructures. The added value provided by the use of Grid technologies is the possibility to exploit both computational and data Grid infrastructures to analyze large datasets of distributed data. The software has been validated and performances on cluster and Grid environments have been compared obtaining quite good scalability results.</p

    Supporting Regularized Logistic Regression Privately and Efficiently

    Full text link
    As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are contingent upon strict privacy regulations. Increasing concerns over data privacy make it more and more difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely-used machine learning model in various disciplines while at the same time has not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributing computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluation on several studies validated the privacy guarantees, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc

    Novel design and controls for focused DNA microarrays: applications in quality assurance/control and normalization for the Health Canada ToxArray™

    Get PDF
    BACKGROUND: Microarray normalizations typically apply methods that assume absence of global transcript shifts, or absence of changes in internal control features such as housekeeping genes. These normalization approaches are not appropriate for focused arrays with small sets of genes where a large portion may be expected to change. Furthermore, many microarrays lack control features that can be used for quality assurance (QA). Here, we describe a novel external control series integrated with a design feature that addresses the above issues. RESULTS: An EC dilution series that involves spike-in of a single concentration of the A. thaliana chlorophyll synthase gene to hybridize against spotted dilutions (0.000015 to 100 μM) of a single complimentary oligonucleotide representing the gene was developed. The EC series is printed in duplicate within each subgrid of the microarray and covers the full range of signal intensities from background to saturation. The design and placement of the series allows for QA examination of frequently encountered problems in hybridization (e.g., uneven hybridizations) and printing (e.g., cross-spot contamination). Additionally, we demonstrate that the series can be integrated with a LOWESS normalization to improve the detection of differential gene expression (improved sensitivity and predictivity) over LOWESS normalization on its own. CONCLUSION: The quality of microarray experiments and the normalization methods used affect the ability to measure accurate changes in gene expression. Novel methods are required for normalization of small focused microarrays, and for incorporating measures of performance and quality. We demonstrate that dilution of oligonucleotides on the microarray itself provides an innovative approach allowing the full dynamic range of the scanner to be covered with a single gene spike-in. The dilution series can be used in a composite normalization to improve detection of differential gene expression and to provide quality control measures
    corecore