Pareto-Optimal Methods for Gene Ranking
The massive scale and variability of microarray gene data create new and challenging problems of signal extraction, gene clustering, and data mining, especially for temporal gene profiles. Many data mining methods for finding interesting gene expression patterns are based on thresholding a single discriminant, e.g. the ratio of between-class to within-class variation or the correlation to a template. Here a different approach is introduced for extracting information from gene microarrays. The approach is based on multiple objective optimization and we call it Pareto front analysis (PFA). This method establishes a ranking of genes according to estimated probabilities that each gene is Pareto-optimal, i.e., that it lies on the Pareto front of the multiple objective scattergram. Both a model-driven Bayesian Pareto method and a data-driven non-parametric Pareto method, based on rank-order statistics, are presented. The methods are illustrated for two gene microarray experiments.
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/41339/1/11265_2005_Article_5273219.pd
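The notion of a gene lying on the Pareto front of a multi-objective scattergram can be illustrated with a minimal Python sketch. It assumes each gene has already been scored on two criteria where larger is better. The function names, the toy scores, and the noise-perturbation frequency are hypothetical stand-ins for illustration; they are not the Bayesian or rank-order estimators described in the abstract.

```python
import numpy as np

def pareto_front_mask(scores):
    """Boolean mask of non-dominated rows, assuming larger is better on every criterion."""
    scores = np.asarray(scores, dtype=float)
    on_front = np.ones(scores.shape[0], dtype=bool)
    for i in range(scores.shape[0]):
        # Another row dominates row i if it is >= on every criterion and > on at least one.
        at_least_as_good = np.all(scores >= scores[i], axis=1)
        strictly_better = np.any(scores > scores[i], axis=1)
        if np.any(at_least_as_good & strictly_better):
            on_front[i] = False
    return on_front

def front_frequency(scores, noise_sd, n_draws=200, seed=0):
    """Fraction of noisy replicates in which each row is on the front.

    Illustrative assumption: Gaussian perturbation of the scores stands in
    for the uncertainty that a proper probability estimate would model.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    counts = np.zeros(scores.shape[0])
    for _ in range(n_draws):
        perturbed = scores + rng.normal(0.0, noise_sd, size=scores.shape)
        counts += pareto_front_mask(perturbed)
    return counts / n_draws

# Toy data: five hypothetical genes scored on two criteria.
genes = np.array([[3.0, 1.0], [2.0, 2.0], [1.0, 3.0], [1.0, 1.0], [2.5, 0.5]])
mask = pareto_front_mask(genes)
freq = front_frequency(genes, noise_sd=0.3)
ranking = np.argsort(-freq)  # genes ordered by how often they reach the front
```

In this toy example the first three genes are mutually non-dominated, while the last two are each dominated by another gene. Ranking by `freq` rather than by `mask` gives a graded ordering instead of a hard in-or-out decision.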
Multi-criteria Anomaly Detection using Pareto Depth Analysis
We consider the problem of identifying patterns in a data set that exhibit
anomalous behavior, often referred to as anomaly detection. In most anomaly
detection algorithms, the dissimilarity between data samples is calculated by a
single criterion, such as Euclidean distance. However, in many cases there may
not exist a single dissimilarity measure that captures all possible anomalous
patterns. In such a case, multiple criteria can be defined, and one can test
for anomalies by scalarizing the multiple criteria using a linear combination
of them. If the importance of the different criteria is not known in advance,
the algorithm may need to be executed multiple times with different choices of
weights in the linear combination. In this paper, we introduce a novel
non-parametric multi-criteria anomaly detection method using Pareto depth
analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies
under multiple criteria without having to run an algorithm multiple times with
different choices of weights. The proposed PDA approach scales linearly in the
number of criteria and is provably better than linear combinations of the
criteria.
Comment: Removed an unnecessary line from Algorithm
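The idea of using Pareto optimality instead of a weighted sum of criteria can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the algorithm from the paper: each pair of samples is treated as a point whose coordinates are its dissimilarities under each criterion, successive Pareto fronts are peeled off (smaller dissimilarity is better), and a sample is scored by the mean front depth of the pairs linking it to its nearest neighbours. The per-coordinate absolute difference used as the criterion, the neighbour count `k`, and all function names are hypothetical choices. The pairwise dominance check is naive and is only practical for small inputs.

```python
import numpy as np

def pareto_depths(points):
    """Peel successive Pareto fronts (smaller is better); return each row's front index, starting at 1."""
    points = np.asarray(points, dtype=float)
    depth = np.zeros(points.shape[0], dtype=int)
    remaining = np.arange(points.shape[0])
    level = 1
    while remaining.size > 0:
        sub = points[remaining]
        # leq[a, b]: row a is <= row b on every criterion; lt[a, b]: row a is < row b on some criterion.
        leq = np.all(sub[:, None, :] <= sub[None, :, :], axis=2)
        lt = np.any(sub[:, None, :] < sub[None, :, :], axis=2)
        dominated = np.any(leq & lt, axis=0)  # row b is dominated by at least one row a
        depth[remaining[~dominated]] = level
        remaining = remaining[dominated]
        level += 1
    return depth

def anomaly_scores(samples, k=3):
    """Mean Pareto depth of the pairs joining each sample to its k nearest neighbours per criterion."""
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[0]
    iu, ju = np.triu_indices(n, k=1)
    # Assumed criteria: absolute difference in each coordinate, one criterion per column.
    dyads = np.abs(samples[iu] - samples[ju])
    depth = pareto_depths(dyads)
    depth_matrix = np.zeros((n, n))
    depth_matrix[iu, ju] = depth
    depth_matrix[ju, iu] = depth
    scores = np.zeros(n)
    for i in range(n):
        others = np.delete(np.arange(n), i)
        neighbours = set()
        for c in range(samples.shape[1]):
            dist_c = np.abs(samples[others, c] - samples[i, c])
            neighbours.update(others[np.argsort(dist_c)[:k]].tolist())
        scores[i] = depth_matrix[i, sorted(neighbours)].mean()
    return scores

# Small check of the peeling step, then a toy data set with one obvious outlier.
toy_depths = pareto_depths([[1, 1], [2, 2], [1, 3], [3, 3]])
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 1.0, size=(20, 2)), [[10.0, 10.0]]])
scores = anomaly_scores(data)
most_anomalous = int(np.argmax(scores))
```

No weights are chosen anywhere in this sketch: a pair is shallow if no other pair is better on every criterion at once, so a sample whose neighbour pairs all sit in deep fronts stands out regardless of how the criteria would have been weighted.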
Temporal Dynamics of Host Molecular Responses Differentiate Symptomatic and Asymptomatic Influenza A Infection
Exposure to influenza viruses is necessary, but not sufficient, for healthy human hosts to develop symptomatic illness. The host response is an important determinant of disease progression. In order to delineate host molecular responses that differentiate symptomatic and asymptomatic Influenza A infection, we inoculated 17 healthy adults with live influenza (H3N2/Wisconsin) and examined changes in host peripheral blood gene expression at 16 timepoints over 132 hours. Here we present distinct transcriptional dynamics of host responses unique to asymptomatic and symptomatic infections. We show that symptomatic hosts simultaneously invoke multiple pattern recognition receptor-mediated antiviral and inflammatory responses that may relate to virus-induced oxidative stress. In contrast, asymptomatic subjects tightly regulate these responses and exhibit elevated expression of genes that function in antioxidant responses and cell-mediated responses. We reveal an ab initio molecular signature that strongly correlates with symptomatic clinical disease, as well as biomarkers whose expression patterns best discriminate early from late phases of infection. Our results establish a temporal pattern of host molecular responses that differentiates symptomatic from asymptomatic infections and reveal an asymptomatic host-unique non-passive response signature, suggesting novel putative molecular targets for both prognostic assessment and ameliorative therapeutic intervention in seasonal and pandemic influenza.
Expression of alcohol dehydrogenase 1 β and its effect on the viability of the A549 lung cancer cell line.
Lung cancer is the cancer type with the highest mortality rate worldwide, and current treatments for this disease remain suboptimal. Greater knowledge of the most significant molecular alterations found in this type of cancer has led to better therapies. The ADH1 (alcohol dehydrogenase 1) gene has been identified as downregulated at the mRNA level in lung carcinoma tissue and in other cancer types; however, this has not been confirmed at the protein level, and the role of the ADH1 enzyme in the pathophysiology of lung cancer is unknown. To confirm its underexpression, ADH1 protein expression was assessed by Western blot in the A549 lung carcinoma cell line and compared with ADH1 expression in the normal lung cell line MRC5. Once the low protein-level expression of ADH1 in the A549 cell line was confirmed, we evaluated whether increasing its expression could affect the metabolism, viability, and production of reactive oxygen species (ROS) of A549 cells with higher ADH1 expression resulting from transfection with the pCMV6-ADH1 vector. To this end, metabolic capacity was assessed with an MTT assay; ROS production was then assessed with the DCFDA assay; and finally phosphatidylserine exposure was assessed with the Annexin V assay. The results indicate that the A549 line underexpresses the ADH1 enzyme at the protein level compared with the MRC5 line, and that A549 cells transfected with the pCMV6-ADH1 vector, which express ADH1 exogenously, show decreased metabolic capacity, an increased level of ROS, and an increased number of cells with exposed phosphatidylserine.
These results suggest that exogenous expression of ADH1 in the A549 cell line induces as yet unknown cellular mechanisms that lead to a decrease in the relative viability of the cells, promote exposure of phosphatidylserine on the outer membrane of the cells, and at the same time induce greater ROS production. This study is the first to demonstrate the underexpression of ADH1 in the A549 lung cancer cell line and to demonstrate that increasing its expression plays a role in the cell biology of lung cancer.
Combining Disparate Information for Machine Learning.
This thesis considers information fusion for four different types of machine learning problems: anomaly detection, information retrieval, collaborative filtering, and structure learning for time series. It focuses on a common theme: the benefit of combining disparate information to improve algorithm performance.
In this dissertation, several new algorithms and applications to real-world datasets are presented. In Chapter II, a novel approach called Pareto Depth Analysis (PDA) is proposed for combining different dissimilarity metrics for anomaly detection. PDA is applied to video-based anomaly detection of pedestrian trajectories. Following a similar idea, in Chapter III we propose a Pareto front method for a multiple-query information retrieval problem in which the queries represent different semantic concepts. Pareto front information retrieval is applied to multiple-query image retrieval. In Chapter IV, we extend a recently proposed collaborative retrieval approach to incorporate complementary social network information, an approach we call Social Collaborative Retrieval (SCR). SCR is applied to a music recommendation system that combines user history and friendship network information to improve recall and weighted recall performance. In Chapter V, we propose a framework that combines time series data at different time scales and offsets for more accurate estimation of multiple precision matrices. We propose a general fused graphical lasso approach to jointly estimate these precision matrices. The framework is applied to modeling financial time series data.
PhD
Electrical Engineering: Systems
University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/108878/1/coolmark_1.pd