4 research outputs found

    HAMSTER: visualizing microarray experiments as a set of minimum spanning trees

    Background: Visualization tools allow researchers to obtain a global view of the interrelationships between the probes or experiments of a gene expression (e.g., microarray) data set. Some existing methods include hierarchical clustering and k-means. In recent years, others have proposed applying minimum spanning trees (MST) for microarray clustering. Although MST-based clustering is formally equivalent to the dendrograms produced by hierarchical clustering under certain conditions, visually they can be quite different.
    Methods: HAMSTER (Helpful Abstraction using Minimum Spanning Trees for Expression Relations) is an open source system for generating a set of MSTs from the experiments of a microarray data set. While previous works have generated a single MST from a data set for data clustering, we recursively merge experiments and repeat this process to obtain a set of MSTs for data visualization. Depending on the parameters chosen, each tree is analogous to a snapshot of one step of the hierarchical clustering process. We scored and ranked these trees using one of three proposed schemes. HAMSTER is implemented in C++ and makes use of Graphviz for laying out each MST.
    Results: We report on the running time of HAMSTER and demonstrate, using data sets from the NCBI Gene Expression Omnibus (GEO), that the images created by HAMSTER offer insights that differ from the dendrograms of hierarchical clustering. In addition to the C++ program, which is available as open source, we also provide a web-based version (HAMSTER+) that allows users to apply our system through a web browser without any computer programming knowledge.
    Conclusion: Researchers may find it helpful to include HAMSTER in their microarray analysis workflow, as it can offer insights that differ from hierarchical clustering. We believe that HAMSTER would be useful for certain types of gradient data sets (e.g., time-series data) and data that indicate relationships between cells/tissues. Both the source and the web server variant of HAMSTER are available from http://hamster.cbrc.jp/.
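To make the basic building block of the abstract above concrete, here is a minimal sketch of computing a single minimum spanning tree over a handful of experiments. It is not HAMSTER's implementation (which is in C++): the function names `euclidean` and `minimum_spanning_tree` are hypothetical, Euclidean distance between expression profiles is an assumption, and the recursive merging and tree-scoring steps described in the abstract are not reproduced.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression profiles.

    Assumption: the abstract does not state which distance measure is used.
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(profiles):
    """Prim's algorithm on the complete graph over the given profiles.

    Each profile is one experiment (a list of expression values).
    Returns a list of (i, j, weight) edges; n profiles yield n - 1 edges.
    """
    n = len(profiles)
    if n == 0:
        return []

    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known edge into each vertex
    best_from = [-1] * n        # tree vertex at the other end of that edge
    best_dist[0] = 0.0          # start growing the tree from experiment 0
    edges = []

    for _ in range(n):
        # Pick the vertex outside the tree that is cheapest to attach.
        u = min(
            (i for i in range(n) if not in_tree[i]),
            key=lambda i: best_dist[i],
        )
        in_tree[u] = True
        if best_from[u] != -1:
            edges.append((best_from[u], u, best_dist[u]))

        # Update the cheapest attachment cost of the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(profiles[u], profiles[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u

    return edges


if __name__ == "__main__":
    # Four toy "experiments" with two expression values each.
    toy = [[0, 0], [1, 0], [5, 0], [6, 0]]
    for i, j, w in minimum_spanning_tree(toy):
        print(f"experiment {i} -- experiment {j}: distance {w:.2f}")
```

The resulting edge list could be handed to a layout tool such as Graphviz, which the abstract names as HAMSTER's layout engine; how HAMSTER itself passes trees to Graphviz is not described in the abstract.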

    Unsupervised host behavior classification from connection patterns

    A novel host behavior classification approach is proposed as a preliminary step toward traffic classification and anomaly detection in network communication. Though many attempts described in the literature were devoted to flow or application classification, these approaches are not always adaptable to the operational constraints of traffic monitoring (which is expected to work even without packet payload, without bidirectionality, on high-speed networks, or from flow reports only). Instead, the classification proposed here relies on the leading idea that traffic is relevantly analyzed in terms of typical host behaviors: typical connection patterns of both legitimate applications (data sharing, downloading, etc.) and anomalous (possibly aggressive) behaviors are obtained by profiling traffic at the host level using unsupervised statistical classification. Classification at the host level is not reducible to flow or application classification, and neither is the contrary: they are different operations which might have complementary roles in network management. The proposed host classification is based on a nine-dimensional feature space evaluating host Internet connectivity, dispersion and exchanged traffic content. A Minimum Spanning Tree (MST) clustering technique is developed that does not require any supervised learning step to produce a set of statistically established typical host behaviors. Not relying on a priori defined classes of known behaviors enables the procedure to discover new host behaviors that may never have been observed before. This procedure is applied to traffic collected over the entire year 2008 on a transpacific (Japan/USA) link. A cross-validation of this unsupervised classification against a classical port-based inspection and a state-of-the-art method provides an assessment of the meaningfulness and relevance of the obtained classes for host behaviors.
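As a rough illustration of what MST-based clustering of hosts can look like, the sketch below groups feature vectors by running Kruskal's algorithm and refusing to add edges longer than a cutoff, which is equivalent to building the full MST and cutting its long edges. This is a generic textbook variant, not the technique developed in the paper: the function name `mst_clusters`, the `max_edge` parameter, the Euclidean distance, and the toy two-dimensional features are all assumptions (the abstract describes a nine-dimensional feature space and does not specify how the tree is cut).

```python
import math
from itertools import combinations


def mst_clusters(points, max_edge):
    """Cluster points by cutting MST edges longer than max_edge.

    Runs Kruskal's algorithm with a union-find structure and stops once
    the next candidate edge exceeds max_edge. The connected components at
    that moment are the clusters. Returns sorted lists of point indices.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component root, compressing the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, shortest first (Kruskal's edge order).
    pairs = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    for d, i, j in pairs:
        if d > max_edge:
            break  # every remaining edge is longer, so it would be cut
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri  # this edge belongs to the MST; merge components

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Toy host feature vectors (two hypothetical features per host).
    hosts = [(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)]
    print(mst_clusters(hosts, max_edge=2.0))
```

No class labels are supplied anywhere, which matches the unsupervised setting described above; a host that ends up alone in its cluster, like the last toy point, is the kind of case one might inspect as a previously unseen behavior.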

    Transcription factor expression dynamics of early T-lymphocyte specification and commitment

    Mammalian T lymphocytes are a prototype for development from adult pluripotent stem cells. While T-cell specification is driven by Notch signaling, T-lineage commitment is only finalized after prolonged Notch activation. However, no T-lineage-specific regulatory factor has been reported that mediates commitment. We used a gene-discovery approach to identify additional candidate T-lineage transcription factors and characterized expression of > 100 regulatory genes in early T-cell precursors using real-time RT-PCR. These regulatory genes were also monitored in multilineage precursors as they entered T-cell or non-T-cell pathways in vitro; in non-T cells ex vivo; and in later T-cell developmental stages after lineage commitment. At least three major expression patterns were observed. Transcription factors in the largest group are expressed at relatively stable levels throughout T-lineage specification as a legacy from prethymic precursors, with some continuing while others are downregulated after commitment. Another group is highly expressed in the earliest stages only, and is downregulated before or during commitment. Genes in a third group undergo upregulation at one of three distinct transitions, suggesting a positive regulatory cascade. However, the transcription factors induced during commitment are not T-lineage specific. Different members of the same transcription factor family can follow opposite trajectories during specification and commitment, while factors co-expressed early can be expressed in divergent patterns in later T-cell development. Some factors reveal new regulatory distinctions between αβ and γδ T-lineage differentiation. These results show that T-cell identity has an essentially complex regulatory basis and provide a detailed framework for regulatory network modeling of T-cell specification.