
    maigesPack: A Computational Environment for Microarray Data Analysis

    Microarray technology remains an important way to assess gene expression in molecular biology, mainly because it measures expression profiles for thousands of genes simultaneously, which makes it a good option for some studies focused on systems biology. One of its main problems is the complexity of the experimental procedure, which presents several sources of variability and hinders statistical modeling. So far, there is no standard protocol for the generation and evaluation of microarray data. To ease the analysis process, this paper presents an R package, named maigesPack, that helps with data organization. In addition, it makes the data analysis process more robust, reliable and reproducible. maigesPack also aggregates several data analysis procedures reported in the literature, for instance: cluster analysis, differential expression, supervised classifiers, relevance networks and functional classification of gene groups or gene networks.

    Multiple hypothesis testing and clustering with mixtures of non-central t-distributions applied in microarray data analysis

    Multiple testing analysis based on clustering methodologies is usually applied in microarray data analysis for comparisons between pairs of groups. In this paper, we generalize this methodology to deal with multiple comparisons among more than two groups obtained from microarray gene expression data. Assuming normal data, we define a statistic that depends on sample means and sample variances and follows a non-central t-distribution. Because we consider multiple comparisons among groups, a mixture of non-central t-distributions is derived. The mixture components are estimated via a Bayesian approach, and the model is applied to a multiple comparison problem from a microarray experiment on gorilla, bonobo and human cultured fibroblasts.
    Keywords: Clustering, MCMC computation, Microarray analysis, Mixture distributions, Multiple hypothesis testing, Non-central t-distribution

    On clustering procedures and nonparametric mixture estimation

    This paper deals with nonparametric estimation of conditional densities in mixture models when additional covariates are available. The proposed approach consists of running a preliminary clustering algorithm on the additional covariates to guess the mixture component of each observation. The conditional densities of the mixture model are then estimated using kernel density estimates applied separately to each cluster. We investigate the expected L1-error of the resulting estimates and derive optimal rates of convergence over classical nonparametric density classes, provided the clustering method is accurate. The performance of a clustering algorithm is measured by its maximal misclassification error. We obtain upper bounds on this quantity for a single linkage hierarchical clustering algorithm. Lastly, applications of the proposed method to mixture models involving electricity distribution data and simulated data are presented.
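
    The two-step procedure described in this abstract (cluster the covariates, then estimate a density separately within each cluster) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the Gaussian kernel, the fixed bandwidth, the function names and the toy data are all assumptions made here for the example.

```python
import math
from itertools import combinations


def single_linkage(points, k):
    """Merge the closest clusters (single linkage) until k clusters remain."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Visit point pairs by increasing distance and merge their clusters.
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda p: math.dist(points[p[0]], points[p[1]]))
    n_clusters = len(points)
    for i, j in pairs:
        if n_clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            n_clusters -= 1
    roots = sorted({find(i) for i in range(len(points))})
    return [roots.index(find(i)) for i in range(len(points))]


def gaussian_kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate built from `sample`."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)

    def density(y):
        return sum(math.exp(-0.5 * ((y - s) / bandwidth) ** 2)
                   for s in sample) / norm

    return density


def cluster_then_kde(covariates, observations, k, bandwidth):
    """Guess each observation's component from its covariate, then fit one KDE per cluster."""
    labels = single_linkage(covariates, k)
    groups = {}
    for label, y in zip(labels, observations):
        groups.setdefault(label, []).append(y)
    return labels, {label: gaussian_kde(ys, bandwidth)
                    for label, ys in groups.items()}


# Toy data (assumed): two well-separated covariate groups.
covariates = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
observations = [1.0, 1.2, 0.9, 4.0, 4.1, 3.8]
labels, densities = cluster_then_kde(covariates, observations, k=2, bandwidth=0.5)
```

    Here `densities[c](y)` evaluates the estimated conditional density of cluster `c` at `y`. The quality of these estimates depends on how accurately the clustering step recovers the true components, which is the condition the abstract attaches to its convergence rates.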

    Automatic characterization and interpretation of conceptual descriptions in ill-structured domains using numerical variables

    The main objective of the research presented in this project is to establish a formal methodology for automatically generating conceptual descriptions of classes built in real, complex domains of a continuous nature, called ill-structured domains. Although the methodology starts from the study of the multiple boxplot, formalizing the visual interpretation procedure involves determining the values of each variable at which the distribution changes and building the table of frequencies conditioned on the resulting intervals. This yields a fuzzy representation of the degrees of membership of the variable's values in the different classes, which provides convenient support for automatically characterizing and interpreting the conceptual descriptions of the classes. The methodology contributes a system for characterizing classes from a semantic point of view, in comparison with other clustering methods, when applied to data from an ill-structured domain; a new approach to discretizing the space of quantitative attributes into intervals of variable length, which serves as the basis of the methodology; and contributions to the validation of a classification, in terms of its representation and quality, in the sense that a classification is valid if the classes obtained can be shown to be meaningful or useful, and to the automatic generation of the resulting classes as a basis for prediction and/or diagnosis. The methodology represents a new way to extract knowledge that is useful and understandable to the user by combining tools from statistics (multiple boxplot, data analysis), artificial intelligence (machine learning, knowledge-based systems) and fuzzy logic (fuzzy models and reasoning).
    As a case study, the methodology was applied to a database from a wastewater treatment plant, described in Chapter 4, using quantitative attributes. The results obtained are promising and constitute a first step toward a formal methodology for automatically obtaining conceptual interpretations of classes on the basis of quantitative attributes describing the objects (days, in this case study). Finally, our work covers all phases of the KDD (Knowledge Discovery in Databases) process described by Fayyad et al., with emphasis on the phase of automatically generating interpretations, in our case of the classes resulting from a reference partition.
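
    The table of class frequencies conditioned on intervals, which this abstract uses as the basis for fuzzy membership degrees, can be sketched as follows. This is only an illustration under assumptions: the cut points are supplied by hand rather than derived from a multiple boxplot, reading the relative frequency of a class within an interval as its membership degree is an interpretation made here, and the function name and data are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def conditional_frequencies(values, classes, cut_points):
    """Relative frequency of each class within each interval of the variable.

    `cut_points` must be sorted; interval i collects the values v with
    cut_points[i - 1] <= v < cut_points[i].
    """
    counts = defaultdict(Counter)
    for value, cls in zip(values, classes):
        interval = bisect_right(cut_points, value)  # index of the interval holding value
        counts[interval][cls] += 1
    return {
        interval: {cls: n / sum(counter.values()) for cls, n in counter.items()}
        for interval, counter in counts.items()
    }


# Toy data (assumed): one numerical variable, two classes, two cut points.
values = [1.0, 1.5, 2.5, 3.0, 3.5, 4.5, 5.0]
classes = ["A", "A", "A", "B", "A", "B", "B"]
cut_points = [2.0, 4.0]
table = conditional_frequencies(values, classes, cut_points)
```

    In this toy table the outer intervals each contain a single class, while the middle interval is shared between the two classes, so a value falling there would belong to both with partial degrees.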

    Class Discovery and Prediction of Tumor with Microarray Data

    Current microarray technology can take a single tissue sample and construct an Affymetrix oligonucleotide array containing (estimated) expression levels of thousands of different genes for that tissue. The objective is to develop a more systematic approach to cancer classification based on Affymetrix oligonucleotide microarrays. For this purpose, I studied published colon cancer microarray data. Colon cancer, with 655,000 deaths worldwide per year, has become the fourth most common form of cancer in the United States and the third leading cause of cancer-related death in the Western world. This research focuses on two areas: class discovery, which means using a variety of clustering algorithms to discover clusters among samples and genes; and class prediction, which refers to developing a multi-gene predictor of the class label of a sample from its gene expression profile. The accuracy of a predictor is also assessed by using it to predict the class of already known samples.