
    Differential modeling for cancer microarray data

    Capturing the changes between two biological phenotypes is a crucial task in understanding the mechanisms of various diseases. Most existing computational approaches test the change in the expression level of each gene individually. In this work, we propose novel computational approaches to identify the differential genes between two phenotypes. These approaches aim to quantitatively characterize the differences between two phenotypes and can provide better insight into the mechanisms of various diseases. The purpose of this thesis is threefold. First, we review the state-of-the-art approaches for differential analysis of gene expression data. Second, we propose a novel differential network analysis approach composed of two algorithms, DiffRank and DiffSubNet, which identify differential hubs and differential subnetworks, respectively. In this approach, the two datasets are represented as two networks, and the problem of identifying differential genes is transformed into the problem of comparing the two networks to identify their most differential components. Studying such networks can provide valuable knowledge about the data. The DiffRank algorithm ranks the nodes of the two networks by their differential behavior, using two novel measures computed for each node: differential connectivity and differential betweenness centrality. These measures are propagated through the network and optimized to capture both local and global structural changes between the two networks. We then integrate the results of this algorithm into the proposed differential subnetwork algorithm, DiffSubNet, which aims to identify sets of differentially connected nodes. 
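The abstract does not give formulas for the two differential measures. As a minimal sketch, assuming differential connectivity is the absolute change in a node's degree between the two networks and differential betweenness is the absolute change in its betweenness centrality (computed here with Brandes' algorithm for unweighted, undirected graphs), the raw per-node scores could be obtained as follows. The names `betweenness` and `differential_scores`, the toy networks, and the sum used for ranking are all hypothetical; the propagation and optimization steps that DiffRank applies are not reproduced.

```python
from collections import deque

def betweenness(adj):
    """Brandes' betweenness centrality for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search counts shortest paths (sigma) from s.
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected path was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}

def differential_scores(edges_a, edges_b):
    """Per-node (|degree change|, |betweenness change|) between two networks."""
    nodes = {u for edge in edges_a + edges_b for u in edge}

    def build(edges):
        adj = {v: set() for v in nodes}
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        return adj

    adj_a, adj_b = build(edges_a), build(edges_b)
    bc_a, bc_b = betweenness(adj_a), betweenness(adj_b)
    return {
        v: (abs(len(adj_a[v]) - len(adj_b[v])), abs(bc_a[v] - bc_b[v]))
        for v in nodes
    }

# Hypothetical toy networks for two phenotypes.
network_a = [("g1", "g2"), ("g1", "g3"), ("g1", "g4")]  # star centred on g1
network_b = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]  # path g1-g2-g3-g4
scores = differential_scores(network_a, network_b)
ranking = sorted(scores, key=lambda v: sum(scores[v]), reverse=True)
```

In this toy case g1 is a hub in one network and a leaf in the other, so it receives the highest scores on both measures (degree change 2, betweenness change 3.0), while g4 has identical connectivity and betweenness in both networks and ranks last. Summing the two measures is only an illustrative way to combine them.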
We demonstrate the effectiveness of these algorithms on synthetic datasets and real-world applications and show that they identify meaningful and valuable information compared to several baseline methods that can be used for such a task. Third, we propose a novel differential co-clustering approach to efficiently find arbitrarily positioned differential (or discriminative) co-clusters in large datasets. The goal of this approach is to discover distinguishing groups of genes whose expression patterns are highly correlated in a subset of the samples (subspace co-expression) in one phenotype but not in the other. This approach is useful when the biological samples are assumed to be heterogeneous or to comprise multiple subtypes. To achieve this goal, we propose a novel co-clustering algorithm, Ranking-based Arbitrarily Positioned Overlapping Co-Clustering (RAPOCC), to efficiently extract significant co-clusters. This algorithm optimizes a novel ranking-based objective function to find arbitrarily positioned co-clusters, and it can extract large, overlapping co-clusters containing both positively and negatively correlated genes. We then extend this algorithm to discover discriminative co-clusters by incorporating class information into the co-cluster search process. The resulting discriminative co-clustering algorithm, Discriminative RAPOCC (Di-RAPOCC), efficiently extracts discriminative co-clusters from labeled datasets. We also characterize discriminative co-clusters and propose three novel measures that can be used to evaluate the performance of any discriminative subspace algorithm. We evaluated the proposed algorithms on several synthetic and real gene expression datasets, and the experimental results show that they outperform several existing algorithms from the literature. 
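The abstract does not specify the ranking-based objective that RAPOCC optimizes. To illustrate what "correlated in a subset of the samples, in one phenotype but not in the other" means, the sketch below swaps in a simpler score: the mean absolute pairwise Pearson correlation of a candidate gene set over chosen samples, so that negatively correlated genes count as co-expressed too, plus the gap between the two phenotypes. The functions `pearson`, `coherence`, and `discriminative_gap` and the toy data are hypothetical; this is neither the RAPOCC/Di-RAPOCC objective nor one of the three proposed evaluation measures.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def coherence(data, genes, samples):
    """Mean |Pearson r| over all gene pairs, restricted to the chosen samples.

    Using the absolute value lets negatively correlated genes count as
    co-expressed, alongside positively correlated ones.
    """
    pairs = list(combinations(genes, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for g, h in pairs:
        x = [data[g][s] for s in samples]
        y = [data[h][s] for s in samples]
        total += abs(pearson(x, y))
    return total / len(pairs)

def discriminative_gap(data_a, data_b, genes, samples_a, samples_b):
    """Coherence of a gene set in phenotype A minus its coherence in phenotype B."""
    return coherence(data_a, genes, samples_a) - coherence(data_b, genes, samples_b)

# Hypothetical expression values: gene -> one value per sample.
phenotype_a = {
    "g0": [1, 2, 3, 4, 9],
    "g1": [2, 4, 6, 8, 0],   # positively correlated with g0 on samples 0-3
    "g2": [4, 3, 2, 1, 5],   # negatively correlated with g0 on samples 0-3
}
phenotype_b = {
    "g0": [1, 2, 3, 4],
    "g1": [1, -1, -1, 1],
    "g2": [1, -1, 1, -1],
}
genes = ["g0", "g1", "g2"]
gap = discriminative_gap(phenotype_a, phenotype_b, genes, [0, 1, 2, 3], [0, 1, 2, 3])
```

In phenotype A the three genes are perfectly (anti-)correlated over samples 0 to 3 but not over sample 4, which is why the candidate co-cluster uses only a subset of the samples. In phenotype B the same genes are only weakly correlated, so the gap is large. A real search would also have to choose the genes and samples, which is the hard part that RAPOCC addresses.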
The shift from single-gene analysis to differential gene network analysis and differential co-clustering can play a crucial role in future analyses of gene expression and can help in understanding the mechanisms of various diseases.

    Analyse et fouille de données de trajectoires d'objets mobiles (Analysis and mining of moving object trajectory data)

    In this thesis, we explore two problems related to managing and mining moving object trajectories. First, we study the problem of sampling trajectory data streams. Storing the entirety of the trajectories provided by modern location-aware devices can entail severe storage and processing overheads. Therefore, adapted sampling techniques are needed to discard unneeded positions and reduce the size of the trajectories while preserving their key spatiotemporal features. In streaming environments, this process must be conducted "on the fly", since the data are transient and arrive continuously. To this end, we introduce a new sampling algorithm called spatiotemporal stream sampling (STSS). This algorithm is computationally efficient and guarantees an upper bound on the approximation error introduced during sampling. Experimental results show that STSS performs well and can compete with more sophisticated and costly approaches. The second problem we study is clustering trajectory data in road network environments. We present three approaches to clustering such data: the first discovers clusters of trajectories that traveled along the same parts of the road network; the second is segment-oriented and groups together road segments based on the trajectories they have in common; the third combines both aspects and simultaneously clusters trajectories and road segments. We show how these approaches can be used to reveal useful knowledge about flow dynamics and to characterize traffic in road networks. We also provide experimental results evaluating the performance of the proposed approaches.
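The abstract does not describe how STSS works internally. To illustrate what "on-the-fly" sampling with a guaranteed error bound can look like, the sketch below uses a different, well-known technique, dead reckoning: each point is read once, and it is kept only if it deviates by more than `epsilon` from the position extrapolated from the last kept point and the velocity stored with it. Every discarded point is therefore within `epsilon` of its reconstruction. The names `Kept`, `dead_reckoning_sample`, and `reconstruct`, and the toy trajectory, are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Kept:
    """A retained sample: timestamp, position, and estimated velocity."""
    t: float
    x: float
    y: float
    vx: float
    vy: float

def dead_reckoning_sample(stream, epsilon):
    """Lazily yield kept points from an iterable of (t, x, y) tuples.

    A point is discarded only if it lies within epsilon of the position
    predicted from the last kept point and its stored velocity.
    """
    last = None  # most recent kept point
    prev = None  # previous raw point, used to estimate velocity
    for t, x, y in stream:
        if last is None:
            last = Kept(t, x, y, 0.0, 0.0)
            prev = (t, x, y)
            yield last
            continue
        dt = t - last.t
        px, py = last.x + last.vx * dt, last.y + last.vy * dt
        if math.hypot(x - px, y - py) > epsilon:
            # Prediction failed: keep this point with a fresh velocity estimate.
            pt, pxr, pyr = prev
            span = t - pt
            vx = (x - pxr) / span if span > 0 else 0.0
            vy = (y - pyr) / span if span > 0 else 0.0
            last = Kept(t, x, y, vx, vy)
            yield last
        prev = (t, x, y)

def reconstruct(kept, t):
    """Extrapolate the position at time t from the latest kept point at or before t."""
    ref = max((k for k in kept if k.t <= t), key=lambda k: k.t)
    dt = t - ref.t
    return ref.x + ref.vx * dt, ref.y + ref.vy * dt

# Hypothetical trajectory: east at unit speed, then a turn north at t = 6.
stream = [(t, t, 0) for t in range(6)] + [(t, 5, t - 5) for t in range(6, 11)]
kept = list(dead_reckoning_sample(stream, 0.5))
```

Of the eleven input points, only those at t = 0, 1 and 6 are retained: the first point, the point where a velocity estimate first becomes available, and the point where the turn breaks the prediction. The per-point work is constant, which suits a continuous stream; the trade-off is that each kept record also stores a velocity.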