
2,703 research outputs found

A Grid-based solution for management and analysis of microarrays in distributed experiments

Author: Corradi Luca
Fato Marco
Papadimitropoulos Adam
Porro Ivan
Scaglione Silvia
Schenone Andrea
Torterolo Livia
Viti Federica
Publication venue: BioMed Central
Publication date: 01/01/2007

Several systems have been presented in recent years to manage the complexity of large microarray experiments. Although good results have been achieved, most systems fall short in one or more areas. A Grid-based approach may provide a shared, standardized and reliable solution for storage and analysis of biological data, maximizing the results of experimental efforts. A Grid framework has therefore been adopted because of the need to access large amounts of distributed data remotely and to scale computational performance for terabyte datasets. Two different biological studies have been planned to highlight the benefits that can emerge from our Grid-based platform. The described environment relies on storage and computational services provided by the gLite Grid middleware. The Grid environment can also exploit the added value of metadata to let users better classify and search experiments. A state-of-the-art Grid portal has been implemented to hide the complexity of the framework from end users and to let them easily access available services and data. The functional architecture of the portal is described. As a first test of system performance, a gene expression analysis has been performed on a dataset of Affymetrix GeneChip® Rat Expression Array RAE230A, from the ArrayExpress database. The analysis sequence includes three steps: (i) group opening and image set uploading, (ii) normalization, and (iii) model-based gene expression (based on the PM/MM difference model). Two different Linux versions (sequential and parallel) of the dChip software have been developed to implement the analysis and have been tested on a cluster. The results show that parallelizing the analysis process and executing parallel jobs on distributed computational resources improve performance. Moreover, the Grid environment has been tested both for uploading and accessing distributed datasets through the Grid middleware and for its ability to manage the execution of jobs on distributed computational resources. Results from the Grid test will be discussed in a further paper.
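Step (iii) of the abstract above mentions a model based on the PM/MM difference. As a rough illustration only, the sketch below assumes this refers to the multiplicative Li-Wong model commonly associated with dChip, in which the difference for array i and probe j is modelled as theta_i * phi_j with the probe effects constrained so that the sum of phi_j squared equals the number of probes. The function name `fit_mbei`, the fixed iteration count and the toy intensities are hypothetical, and the outlier handling a production tool would perform is omitted.

```python
import math


def fit_mbei(pm, mm, iterations=50):
    """Fit y[i][j] = PM - MM ~ theta[i] * phi[j] for one probe set.

    pm, mm: one row per array, one column per probe.
    Uses alternating least squares with sum(phi_j ** 2) == number of probes.
    Assumes the differences are not all zero (no guard against division by zero).
    """
    y = [[p - m for p, m in zip(pm_row, mm_row)] for pm_row, mm_row in zip(pm, mm)]
    n_probes = len(y[0])
    phi = [1.0] * n_probes

    def update_theta(phi_now):
        # Least-squares expression index per array, given the probe effects.
        phi_sq = sum(f * f for f in phi_now)
        return [sum(v * f for v, f in zip(row, phi_now)) / phi_sq for row in y]

    for _ in range(iterations):
        theta = update_theta(phi)
        # Least-squares probe effect per probe, given the expression indexes.
        theta_sq = sum(t * t for t in theta)
        phi = [sum(row[j] * t for row, t in zip(y, theta)) / theta_sq
               for j in range(n_probes)]
        # Re-impose the identifiability constraint on the probe effects.
        scale = math.sqrt(n_probes / sum(f * f for f in phi))
        phi = [f * scale for f in phi]

    return update_theta(phi), phi


if __name__ == "__main__":
    # Toy data: three arrays, two probes, constant mismatch signal.
    pm = [[190.0, 70.0], [330.0, 90.0], [610.0, 130.0]]
    mm = [[50.0, 50.0], [50.0, 50.0], [50.0, 50.0]]
    theta, phi = fit_mbei(pm, mm)
    print("expression indexes:", theta)
    print("probe effects:", phi)
```

Each probe set is fitted independently of the others in this formulation, which is one plausible reason such an analysis lends itself to parallel execution; the abstract does not say how the parallel dChip version actually partitions the work.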

Crossref

PubMed Central

Archivio istituzionale della ricerca - Università di Genova

GPU cards as a low cost solution for efficient and fast classification of high dimensional gene expression datasets

Author: Benso Alfredo
Di Carlo Stefano
Politano Gianfranco Michele Maria
Savino Alessandro
Scionti A.
Publication venue: SRAIT
Publication date: 01/01/2010

The days when bioinformatics tools will be reliable enough to become a standard aid in routine clinical diagnostics are getting very close. However, it is important to remember that the more complex and advanced bioinformatics tools become, the more performance is required of the computing platforms. Unfortunately, the cost of High Performance Computing (HPC) platforms is still prohibitive for both public and private medical practices. Therefore, to promote and facilitate the use of bioinformatics tools, it is important to identify low-cost parallel computing solutions. This paper presents a successful experience in using the parallel processing capabilities of Graphics Processing Units (GPUs) to speed up classification of gene expression profiles. Results show that using open source CUDA programming libraries yields a significant increase in performance and therefore narrows the gap between advanced bioinformatics tools and real medical practice.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Survival Online: a web-based service for the analysis of correlations between gene expression and clinical and follow-up data

Author: Corradi Luca
Fato Marco
Mirisola Valentina
Pfeffer Ulrich
Porro Ivan
Romano Paolo
Torterolo Livia
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Complex microarray gene expression datasets can be used for many independent analyses and are particularly interesting for the validation of potential biomarkers and multi-gene classifiers. This article presents a novel method to correlate microarray gene expression data with clinico-pathological data through a combination of available and newly developed processing tools. Results: We developed Survival Online (available at http://ada.dist.unige.it:8080/enginframe/bioinf/bioinf.xml), a Web-based system that allows for the analysis of Affymetrix GeneChip microarrays by using a parallel version of dChip. The user first selects pre-loaded datasets or single samples thereof, as well as single genes or lists of genes. Expression values of selected genes are then correlated with sample annotation data by uni- or multi-variate Cox regression and survival analyses. The system was tested using publicly available breast cancer datasets and GO (Gene Ontology) derived gene lists or single genes for survival analyses. Conclusion: The system can be used by bio-medical researchers without specific computation skills to validate potential biomarkers or multi-gene classifiers. The design of the service, the parallelization of pre-processing tasks and the implementation on an HPC (High Performance Computing) environment make this system a useful tool for validation on several independent datasets.
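To make the Cox regression step mentioned in this abstract more concrete, here is a minimal sketch of a univariate Cox proportional hazards fit by Newton-Raphson on the partial likelihood, with tied event times handled in the Breslow style. It is not the Survival Online implementation; the function names, the iteration count and the toy follow-up data (a binary "high expression" flag per sample) are all hypothetical.

```python
import math


def partial_score_and_info(beta, times, events, x):
    """Score (gradient) and information of the Cox partial log-likelihood."""
    score, info = 0.0, 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored samples only contribute through risk sets
        s0 = s1 = s2 = 0.0
        for t_k, x_k in zip(times, x):
            if t_k >= t_i:  # sample k is still at risk at time t_i
                w = math.exp(beta * x_k)
                s0 += w
                s1 += w * x_k
                s2 += w * x_k * x_k
        mean = s1 / s0
        score += x_i - mean
        info += s2 / s0 - mean * mean
    return score, info


def fit_cox_univariate(times, events, x, iterations=25):
    """Return (beta, hazard_ratio) for a single covariate."""
    beta = 0.0
    for _ in range(iterations):
        score, info = partial_score_and_info(beta, times, events, x)
        if info == 0.0:
            break  # covariate carries no information in any risk set
        beta += score / info  # Newton-Raphson step
    return beta, math.exp(beta)


if __name__ == "__main__":
    # Hypothetical follow-up times, event flags (1 = event, 0 = censored)
    # and a binary expression flag for eight samples.
    times = [2, 3, 5, 7, 8, 11, 13, 14]
    events = [1, 1, 1, 1, 0, 1, 1, 0]
    high_expression = [1, 1, 0, 1, 1, 0, 0, 0]
    beta, hazard_ratio = fit_cox_univariate(times, events, high_expression)
    print("beta:", beta, "hazard ratio:", hazard_ratio)
```

A positive coefficient (hazard ratio above 1) would indicate that samples with the flag set tend to experience the event earlier. A real analysis would use a vetted statistics package, which also provides confidence intervals, multivariate models and safeguards against non-convergence.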

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Università di Genova

A Web-based and Grid-enabled dChip version for the analysis of large sets of gene expression data

Author: C Li
F Beltrame
FF Millenaar
I Porro
Ivan Porro
Livia Torterolo
Luca Corradi
LX Qin
Marco Fato
RA Irizarry
RC Gentleman
S Tuecke
Silvia Scaglione
U Pfeffer
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Microarray techniques are one of the main methods used to investigate thousands of gene expression profiles in order to shed light on complex biological processes responsible for serious diseases, with great scientific impact and a wide application area. Several standalone applications have been developed to analyze microarray data. Two of the best-known free analysis software packages are the R-based Bioconductor and dChip. The part of the dChip software concerning the calculation and analysis of gene expression has been modified to permit its execution on both cluster environments (supercomputers) and Grid infrastructures (distributed computing). This work is not aimed at replacing existing tools; rather, it provides researchers with a method to analyze large datasets without hardware or software constraints. Results: An application able to perform the computation and analysis of gene expression on large datasets has been developed using algorithms provided by dChip. Different tests have been carried out to validate the results and to compare the performance obtained on different infrastructures. Validation tests have been performed using a small dataset related to the comparison of HUVEC (Human Umbilical Vein Endothelial Cells) and fibroblasts, derived from the same donors, treated with IFN-α. Moreover, performance tests have been executed to compare performance on different environments using a large dataset including about 1000 samples related to breast cancer patients. Conclusion: A Grid-enabled software application for the analysis of large microarray datasets has been proposed. The dChip software has been ported to the Linux platform and modified, using appropriate parallelization strategies, to permit its execution on both cluster environments and Grid infrastructures. The added value provided by the use of Grid technologies is the possibility to exploit both computational and data Grid infrastructures to analyze large sets of distributed data. The software has been validated, and performance on cluster and Grid environments has been compared, obtaining quite good scalability results.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Università di Genova

Identification of Relevant Genes with a Multi-Agent System using Gene Expression Data

Author: Ana Espinosa
Christian Lemaitre
Edna Márquez
Jaime Berumen
Jesús Savage
Ron Leder
Publication venue: IntechOpen
Publication date: 01/04/2011

IntechOpen

Supporting Regularized Logistic Regression Privately and Efficiently

Author: Li Wenfa
Liu Hongzhe
Xie Wei
Yang Peng
Publication venue: Public Library of Science (PLoS)
Publication date: 30/09/2015

As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are subject to strict privacy regulations. Increasing concerns over data privacy make it more and more difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a machine learning model that is widely used in various disciplines yet has not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluation on several studies validated the privacy guarantees, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc.
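For readers unfamiliar with the underlying model, the sketch below fits an L2-regularized logistic regression by batch gradient descent. It deliberately shows only the plain, non-private model: the distributed and cryptographic protocol described in the abstract is not reproduced here. The function names, the learning rate, the regularization strength `lam` and the toy data are hypothetical choices made for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_l2(rows, labels, lam=0.1, lr=0.5, epochs=2000):
    """Minimise mean log-loss + (lam / 2) * ||w||^2; the intercept is not penalised."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for features, y in zip(rows, labels):
            # Prediction error for this sample drives both gradients.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b) - y
            for j in range(d):
                grad_w[j] += err * features[j]
            grad_b += err
        # Data gradient is averaged; the penalty adds lam * w to the weight gradient.
        w = [wi - lr * (g / n + lam * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    rows = [[0.0], [1.0], [2.0], [3.0]]
    labels = [0, 0, 1, 1]
    w, b = fit_logistic_l2(rows, labels)
    print("weights:", w, "intercept:", b)
```

Increasing `lam` shrinks the weights toward zero, which is the usual trade-off between fit and model complexity that regularization controls.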

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Novel design and controls for focused DNA microarrays: applications in quality assurance/control and normalization for the Health Canada ToxArray™

Author: Berndt Lynn M
Boucher Sherri
Dong Hongyan
Douglas George R
Lambert Iain B
Parfett Craig L
Rowan-Carroll Andrea
Williams Andrew
Yauk Carole L
Zheng Jenny L
Zhou Gu
Publication venue: BioMed Central
Publication date: 01/10/2006

BACKGROUND: Microarray normalizations typically apply methods that assume absence of global transcript shifts, or absence of changes in internal control features such as housekeeping genes. These normalization approaches are not appropriate for focused arrays with small sets of genes where a large portion may be expected to change. Furthermore, many microarrays lack control features that can be used for quality assurance (QA). Here, we describe a novel external control (EC) series integrated with a design feature that addresses the above issues. RESULTS: We developed an EC dilution series in which a single concentration of the A. thaliana chlorophyll synthase gene is spiked in to hybridize against spotted dilutions (0.000015 to 100 μM) of a single complementary oligonucleotide representing the gene. The EC series is printed in duplicate within each subgrid of the microarray and covers the full range of signal intensities from background to saturation. The design and placement of the series allow for QA examination of frequently encountered problems in hybridization (e.g., uneven hybridizations) and printing (e.g., cross-spot contamination). Additionally, we demonstrate that the series can be integrated with a LOWESS normalization to improve the detection of differential gene expression (improved sensitivity and predictivity) over LOWESS normalization on its own. CONCLUSION: The quality of microarray experiments and the normalization methods used affect the ability to measure accurate changes in gene expression. Novel methods are required for normalization of small focused microarrays, and for incorporating measures of performance and quality. We demonstrate that dilution of oligonucleotides on the microarray itself provides an innovative approach allowing the full dynamic range of the scanner to be covered with a single gene spike-in. The dilution series can be used in a composite normalization to improve detection of differential gene expression and to provide quality control measures.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central