11 research outputs found

    Pareto-Optimal Methods for Gene Ranking

    The massive scale and variability of microarray gene data create new and challenging problems of signal extraction, gene clustering, and data mining, especially for temporal gene profiles. Many data mining methods for finding interesting gene expression patterns are based on thresholding single discriminants, e.g. the ratio of between-class to within-class variation or correlation to a template. Here a different approach is introduced for extracting information from gene microarrays. The approach is based on multiple objective optimization and we call it Pareto front analysis (PFA). This method establishes a ranking of genes according to estimated probabilities that each gene is Pareto-optimal, i.e., that it lies on the Pareto front of the multiple objective scattergram. Both a model-driven Bayesian Pareto method and a data-driven non-parametric Pareto method, based on rank-order statistics, are presented. The methods are illustrated for two gene microarray experiments. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/41339/1/11265_2005_Article_5273219.pd
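To make the notion of Pareto optimality in the abstract above concrete, the following is a minimal sketch of finding the Pareto front of a small set of multi-objective scores. It is not the paper's method: the probabilistic ranking (Bayesian or rank-order based) is not reproduced here. The names `dominates`, `pareto_front`, and `scores`, the toy data, and the convention that smaller values are better on every objective are all assumptions made for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (assumes minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical two-objective scores for five items (e.g. genes).
scores = [(1.0, 5.0), (2.0, 2.0), (3.0, 4.0), (5.0, 1.0), (4.0, 4.5)]
front = pareto_front(scores)
print(front)
```

Items on the front are those for which no other item is at least as good on both objectives; no weighting of the objectives is needed to identify them.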

    Multi-criteria Anomaly Detection using Pareto Depth Analysis

    We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. In most anomaly detection algorithms, the dissimilarity between data samples is calculated by a single criterion, such as Euclidean distance. However, in many cases there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such a case, multiple criteria can be defined, and one can test for anomalies by scalarizing the multiple criteria using a linear combination of them. If the importance of the different criteria is not known in advance, the algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we introduce a novel non-parametric multi-criteria anomaly detection method using Pareto depth analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach scales linearly in the number of criteria and is provably better than linear combinations of the criteria. Comment: Removed an unnecessary line from Algorithm
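The abstract above does not spell out the algorithm, so the following is only a rough sketch of the layer-peeling idea that the term "Pareto depth" suggests: repeatedly remove the current Pareto front and record which layer each point falls in. The function names, the toy dissimilarity vectors, and the minimization convention are assumptions for illustration; how PDA turns depths into an anomaly score is not shown.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere
    (assumes smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_depths(points: Sequence[Sequence[float]]) -> List[int]:
    """Assign each point the index (starting at 1) of the successive
    Pareto front it belongs to."""
    remaining = set(range(len(points)))
    depth = {}
    level = 1
    while remaining:
        # Points in the remaining set that nothing else in the set dominates.
        front = {
            i
            for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        }
        for i in front:
            depth[i] = level
        remaining -= front
        level += 1
    return [depth[i] for i in range(len(points))]


# Hypothetical vectors of two dissimilarity criteria per item.
dissimilarities = [(0.1, 0.2), (0.3, 0.1), (0.4, 0.4), (0.2, 0.5), (0.9, 0.8)]
depths = pareto_depths(dissimilarities)
print(depths)
```

Because the layers depend only on the dominance relation, no weights over the criteria have to be chosen, which is the property the abstract contrasts with linear scalarization.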

    Temporal Dynamics of Host Molecular Responses Differentiate Symptomatic and Asymptomatic Influenza A Infection

    Exposure to influenza viruses is necessary, but not sufficient, for healthy human hosts to develop symptomatic illness. The host response is an important determinant of disease progression. In order to delineate host molecular responses that differentiate symptomatic and asymptomatic Influenza A infection, we inoculated 17 healthy adults with live influenza (H3N2/Wisconsin) and examined changes in host peripheral blood gene expression at 16 timepoints over 132 hours. Here we present distinct transcriptional dynamics of host responses unique to asymptomatic and symptomatic infections. We show that symptomatic hosts simultaneously invoke multiple pattern recognition receptor-mediated antiviral and inflammatory responses that may relate to virus-induced oxidative stress. In contrast, asymptomatic subjects tightly regulate these responses and exhibit elevated expression of genes that function in antioxidant responses and cell-mediated responses. We reveal an ab initio molecular signature that strongly correlates to symptomatic clinical disease and biomarkers whose expression patterns best discriminate early from late phases of infection. Our results establish a temporal pattern of host molecular responses that differentiates symptomatic from asymptomatic infections and reveals an asymptomatic host-unique non-passive response signature, suggesting novel putative molecular targets for both prognostic assessment and ameliorative therapeutic intervention in seasonal and pandemic influenza.

    Expression of alcohol dehydrogenase 1 β and its effect on the viability of the A549 lung cancer cell line.

    Lung cancer is the type of cancer with the highest mortality rate worldwide, and current treatments for it remain suboptimal. Greater knowledge of the most significant molecular alterations found in this type of cancer has led to better therapies. The ADH1 (alcohol dehydrogenase 1) gene has been identified as downregulated at the mRNA level in lung carcinoma tissue and in other cancer types; however, this has not been confirmed at the protein level, and the role of the ADH1 enzyme in the pathophysiology of lung cancer is unknown. To confirm its underexpression, ADH1 protein expression was evaluated by Western blot in the lung carcinoma cell line A549 and compared with ADH1 expression in the normal lung cell line MRC5. Once low ADH1 protein expression in the A549 cell line was confirmed, the next step was to evaluate whether increasing its expression, through transfection with the pCMV6-ADH1 vector, could affect the metabolism, viability, and production of reactive oxygen species (ROS) of A549 cells. To that end, metabolic capacity was assessed with an MTT assay, ROS production was then assessed with the DCFDA assay, and finally phosphatidylserine exposure was assessed with the Annexin V assay. The results indicate that the A549 line underexpresses the ADH1 enzyme at the protein level compared with the MRC5 line, and that A549 cells transfected with the pCMV6-ADH1 vector, which express ADH1 exogenously, exhibit decreased metabolic capacity together with increased levels of ROS and of cells with exposed phosphatidylserine.
    These results suggest that exogenous expression of ADH1 in the A549 cell line induces as yet unknown cellular mechanisms that reduce the relative viability of the cells, promote exposure of phosphatidylserine on the outer membrane of the cells, and at the same time induce greater ROS production. This study is the first to demonstrate ADH1 underexpression in the A549 lung cancer cell line and to show that increasing its expression plays a role in the cell biology of lung cancer.

    Combining Disparate Information for Machine Learning.

    This thesis considers information fusion for four different types of machine learning problems: anomaly detection, information retrieval, collaborative filtering and structure learning for time series, and focuses on a common theme -- the benefit of combining disparate information, resulting in improved algorithm performance. In this dissertation, several new algorithms and applications to real-world datasets are presented. In Chapter II, a novel approach called Pareto Depth Analysis (PDA) is proposed for combining different dissimilarity metrics for anomaly detection. PDA is applied to video-based anomaly detection of pedestrian trajectories. Following a similar idea, in Chapter III we propose a Pareto Front method for a multiple-query information retrieval problem in which different queries represent different semantic concepts. Pareto Front information retrieval is applied to multiple query image retrieval. In Chapter IV, we extend a recently proposed collaborative retrieval approach to incorporate complementary social network information, an approach we call Social Collaborative Retrieval (SCR). SCR is applied to a music recommendation system that combines both user history and friendship network information to improve recall and weighted recall performance. In Chapter V, we propose a framework that combines time series data at different time scales and offsets for more accurate estimation of multiple precision matrices. We propose a general fused graphical lasso approach to jointly estimate these precision matrices. The framework is applied to modeling financial time series data. PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/108878/1/coolmark_1.pd