maigesPack: A Computational Environment for Microarray Data Analysis
Microarray technology is still an important way to assess gene expression in
molecular biology, mainly because it measures expression profiles for thousands
of genes simultaneously, which makes it a good option for some studies focused
on systems biology. One of its main problems is the complexity of the
experimental procedure, which presents several sources of variability and
hinders statistical modeling. So far, there is no standard protocol for the
generation and evaluation of microarray data. To ease the analysis process,
this paper presents an R package, named maigesPack, that helps with data
organization. In addition, it makes the data analysis process more robust,
reliable and reproducible. maigesPack also aggregates several data analysis
procedures reported in the literature, for instance: cluster analysis,
differential expression, supervised classifiers, relevance networks and
functional classification of gene groups or gene networks.
Multiple hypothesis testing and clustering with mixtures of non-central t-distributions applied in microarray data analysis
Multiple testing analysis based on clustering methodologies is usually applied in microarray data analysis for comparisons between pairs of groups. In this paper, we generalize this methodology to deal with multiple comparisons among more than two groups obtained from microarray gene expression data. Assuming normal data, we define a statistic that depends on sample means and sample variances and follows a non-central t-distribution. Because we consider multiple comparisons among groups, a mixture of non-central t-distributions is derived. The mixture components are estimated via a Bayesian approach, and the model is applied to a multiple comparison problem from a microarray experiment on gorilla, bonobo and human cultured fibroblasts.
Keywords: Clustering, MCMC computation, Microarray analysis, Mixture distributions, Multiple hypothesis testing, Non-central t-distribution
On clustering procedures and nonparametric mixture estimation
This paper deals with nonparametric estimation of conditional densities in
mixture models in the case when additional covariates are available. The
proposed approach consists of performing a preliminary clustering algorithm on
the additional covariates to guess the mixture component of each observation.
Conditional densities of the mixture model are then estimated using kernel
density estimates applied separately to each cluster. We investigate the
expected L1-error of the resulting estimates and derive optimal rates of
convergence over classical nonparametric density classes provided the
clustering method is accurate. Performances of clustering algorithms are
measured by the maximal misclassification error. We obtain upper bounds of this
quantity for a single linkage hierarchical clustering algorithm. Lastly,
applications of the proposed method to mixture models involving electricity
distribution data and simulated data are presented.
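The two-step idea in this abstract (cluster the covariates first, then fit a kernel density estimate inside each cluster) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the covariate is one-dimensional, the kernel is Gaussian with a hand-picked bandwidth `h`, and the function names `single_linkage`, `gaussian_kde` and `cluster_then_kde` are hypothetical.

```python
import math

def single_linkage(points, k):
    """Merge 1-D points into k clusters, always joining the two closest clusters
    (cluster distance = smallest pairwise distance, i.e. single linkage)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(abs(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def gaussian_kde(sample, h):
    """Return a Gaussian kernel density estimate built from `sample` (bandwidth h)."""
    norm = len(sample) * h * math.sqrt(2 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm
    return density

def cluster_then_kde(covariates, observations, k, h):
    """Step 1: guess each observation's component by clustering its covariate.
    Step 2: estimate one density per cluster from that cluster's observations."""
    labels = single_linkage(covariates, k)
    densities = {
        c: gaussian_kde([y for y, l in zip(observations, labels) if l == c], h)
        for c in set(labels)
    }
    return labels, densities

# Toy data (made up for illustration): two well-separated covariate groups.
covariates = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
observations = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
labels, densities = cluster_then_kde(covariates, observations, k=2, h=0.5)
```

The brute-force linkage search is kept for readability; it is far too slow for real data, where a dedicated hierarchical clustering routine would be the natural choice.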
Improving energy modeling of large building stock through the development of archetype buildings
12th Conference of International Building Performance Simulation Associatio
Automatic characterization and interpretation of conceptual descriptions in ill-structured domains using numerical variables
The fundamental objective of the research presented in this
project is to establish a formal methodology for the automatic
generation of conceptual descriptions of classes built in real,
complex domains of a continuous nature, called Ill-Structured
Domains.
Although the methodology takes the study of the multiple
boxplot as its starting point, formalizing the visual
interpretation procedure involves determining the values of
each variable at which the distribution changes and building
the table of frequencies conditioned on those intervals. This
gives rise to a fuzzy representation of the degrees of
membership of the variable's values in the different classes,
which provides a convenient basis for automatically
characterizing and interpreting the conceptual descriptions of
the classes.
The methodology contributes a system for characterizing
classes from a semantic point of view, in comparison with other
clustering methods, when applied to data from an Ill-Structured
Domain; a new approach to discretizing the space of
quantitative attributes into intervals of variable length,
which serves as the basis of the methodology; and contributions
to the validation of a classification, regarding its
representation and quality, in the sense that a classification
is valid if the classes obtained can be shown to be meaningful
or useful, and to the automatic generation of the resulting
classes as a basis for the prediction and/or diagnosis process.
The methodology represents a new way of extracting knowledge
that is useful and understandable to the user by combining
tools from statistics (multiple boxplot, data analysis),
artificial intelligence (machine learning, knowledge-based
systems) and fuzzy logic (fuzzy models and reasoning). As a
case study, it has been applied to a database from a wastewater
treatment plant, described in Chapter 4, using quantitative
attributes. The results obtained are promising and constitute a
first step toward establishing a formal methodology for
automatically obtaining conceptual interpretations of classes
on the basis of quantitative attributes describing the objects
(days, in this case study).
Finally, our work covers all phases of the KDD (Knowledge
Discovery in Databases) process described by Fayyad et al.,
emphasizing the phase of automatic generation of
interpretations, in our case of the classes resulting from a
reference partition.
Postprint (published version
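As a rough illustration of the interval-conditioned frequency table described in this abstract, the following Python sketch cuts one numerical variable at the minimum and maximum observed within each class (the values a multiple boxplot would display) and reports, for each resulting interval, the share of observations belonging to each class. The cut rule, the treatment of interval boundaries and the function names `cut_points` and `conditional_frequencies` are assumptions made for illustration; the abstract does not specify them.

```python
from bisect import bisect_right

def cut_points(values, classes):
    """Assumed rule: cut at the per-class minimum and maximum of the variable."""
    cuts = set()
    for c in set(classes):
        vals = [v for v, k in zip(values, classes) if k == c]
        cuts.update((min(vals), max(vals)))
    return sorted(cuts)

def conditional_frequencies(values, classes):
    """For each interval between consecutive cuts, return the proportion of
    observations in each class; these proportions can be read as degrees of
    membership of that interval's values in each class."""
    cuts = cut_points(values, classes)
    if len(cuts) < 2:
        raise ValueError("need at least two distinct cut points")
    n_intervals = len(cuts) - 1
    counts = [dict() for _ in range(n_intervals)]
    for v, k in zip(values, classes):
        # Intervals are [cuts[i], cuts[i+1]); the last one also includes its right end.
        i = min(bisect_right(cuts, v) - 1, n_intervals - 1)
        counts[i][k] = counts[i].get(k, 0) + 1
    table = []
    for i, row in enumerate(counts):
        total = sum(row.values())
        shares = {k: n / total for k, n in row.items()} if total else {}
        table.append(((cuts[i], cuts[i + 1]), shares))
    return table

# Toy data (made up): class A spans 1.0 to 4.0, class B spans 3.5 to 6.0.
values = [1.0, 2.0, 3.0, 4.0, 3.5, 5.0, 6.0]
classes = ["A", "A", "A", "A", "B", "B", "B"]
table = conditional_frequencies(values, classes)
```

Intervals in which a single class holds the whole share characterize that class unambiguously, while mixed intervals yield partial membership degrees.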
Class Discovery and Prediction of Tumor with Microarray Data
Current microarray technology is able to take a single tissue sample and construct an Affymetrix oligonucleotide array containing (estimated) expression levels of thousands of different genes for that tissue. The objective is to develop a more systematic approach to cancer classification based on Affymetrix oligonucleotide microarrays. For this purpose, I studied published colon cancer microarray data. Colon cancer, with 655,000 deaths worldwide per year, has become the fourth most common form of cancer in the United States and the third leading cause of cancer-related death in the Western world. This research focuses on two areas: class discovery, which means using a variety of clustering algorithms to discover clusters among samples and genes; and class prediction, which refers to the process of developing a multi-gene predictor of the class label for a sample using its gene expression profile. The accuracy of a predictor is also assessed by using it to predict the class of already known samples.
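Assessing a predictor on samples whose class is already known can be sketched as follows. The abstract does not name the classifier or the validation scheme, so this example assumes a nearest-centroid predictor evaluated by leave-one-out: each sample is held out in turn, the predictor is built from the rest, and the held-out label is compared with the prediction. The function names and the toy expression values are hypothetical.

```python
def centroid(rows):
    """Mean expression profile (one value per gene) of a group of samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        c = centroid([x for x, y in zip(train_x, train_y) if y == label])
        d = sum((a - b) ** 2 for a, b in zip(sample, c))
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def leave_one_out_accuracy(xs, ys):
    """Hold out each known sample once and count correct predictions."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            hits += 1
    return hits / len(xs)

# Toy expression profiles (made up): two genes, three samples per class.
xs = [[1.0, 0.2], [1.1, 0.1], [0.9, 0.3],
      [5.0, 4.0], [5.2, 4.1], [4.8, 3.9]]
ys = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
accuracy = leave_one_out_accuracy(xs, ys)
```

Holding each sample out before predicting it avoids the optimistic bias of scoring a predictor on the same samples it was built from.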