13 research outputs found

    Principal Graph and Structure Learning Based on Reversed Graph Embedding

    © 2017 IEEE. Many scientific datasets are high-dimensional, and their analysis usually requires retaining the most important structures of the data. The principal curve is a widely used approach for this purpose. However, many existing methods work only for data whose structure can be formulated mathematically as a curve, which is quite restrictive for real applications. A few methods overcome this problem, but they either require complicated hand-crafted rules for a specific task and adapt poorly to other tasks, or cannot obtain explicit structures of the data. To address these issues, we develop a novel principal graph and structure learning framework that captures the local information of the underlying graph structure based on reversed graph embedding. As showcases, models that can learn a spanning tree or a weighted undirected ℓ1 graph are proposed, and a new learning algorithm is developed that simultaneously learns a set of principal points and a graph structure from data. The new algorithm is simple, with guaranteed convergence. We then extend the proposed framework to deal with large-scale data. Experimental results on various synthetic datasets and six real-world datasets show that the proposed method compares favorably with baselines and can uncover the underlying structure correctly.
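To make the idea of "principal points plus a spanning tree" concrete, here is a minimal sketch. It is not the paper's algorithm: the abstract describes learning the points and the graph simultaneously, whereas this example uses a simpler two-stage stand-in (plain k-means for the points, then Prim's minimum spanning tree over them). The function names, the toy sine-shaped data, and all parameter values are assumptions made for illustration only.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means; the k centroids act as stand-in principal points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def minimum_spanning_tree(nodes):
    """Prim's algorithm on the complete Euclidean graph; returns edges (i, j)."""
    in_tree = {0}
    edges = []
    while len(in_tree) < len(nodes):
        # Pick the shortest edge that connects the tree to a node outside it.
        i, j = min(
            ((a, b) for a in in_tree for b in range(len(nodes)) if b not in in_tree),
            key=lambda e: math.dist(nodes[e[0]], nodes[e[1]]),
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


# Hypothetical toy data: noisy samples along a sine curve.
noise = random.Random(1)
data = [(t / 10, math.sin(t / 10) + noise.gauss(0, 0.05)) for t in range(60)]

principal_points = kmeans(data, k=6)
tree_edges = minimum_spanning_tree(principal_points)
print(tree_edges)
```

The tree over the six centroids gives an explicit, low-complexity skeleton of the data. A joint formulation such as the one the abstract describes would instead update the points and the graph together, so that the graph also influences where the points end up.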

    FLASC: A Flare-Sensitive Clustering Algorithm: Extending HDBSCAN* for Detecting Branches in Clusters

    We present FLASC, an algorithm for flare-sensitive clustering. Our algorithm builds upon HDBSCAN*, which provides high-quality density-based clustering performance, through a post-processing step that differentiates branches within the detected clusters' manifold, adding a type of pattern that can be discovered. Two variants of the algorithm are presented, which trade computational cost for noise robustness. Using synthetic data sets, we show that both variants scale similarly to HDBSCAN* in terms of computational cost and provide stable outputs, resulting in an efficient flare-sensitive clustering algorithm. In addition, we demonstrate the algorithm's benefit in data exploration over HDBSCAN* clustering on two real-world data sets. Comment: 20 pages, 11 figures, submitted to ACM TKD

    Spatiotemporal structure of cell fate decisions in murine neural crest

    Neural crest cells are embryonic progenitors that generate numerous cell types in vertebrates. With single-cell analysis, we show that mouse trunk neural crest cells become biased toward neuronal lineages when they delaminate from the neural tube, whereas cranial neural crest cells acquire ectomesenchyme potential dependent on activation of the transcription factor Twist1. The choices that neural crest cells make to become sensory, glial, autonomic, or mesenchymal cells can be formalized as a series of sequential binary decisions. Each branch of the decision tree involves initial coactivation of bipotential properties followed by gradual shifts toward commitment. Competing fate programs are coactivated before cells acquire fate-specific phenotypic traits. Determination of a specific fate is achieved by increased synchronization of relevant programs and concurrent repression of competing fate programs.

    Cyclical fate restriction: A new view of neural crest cell fate specification

    Neural crest cells are crucial in development, not least because of their remarkable multipotency. Early findings stimulated two hypotheses for how fate specification and commitment from fully multipotent neural crest cells might occur, progressive fate restriction (PFR) and direct fate restriction, differing in whether partially restricted intermediates were involved. Initially hotly debated, they remain unreconciled, although PFR has become favoured. However, testing of a PFR hypothesis of zebrafish pigment cell development refutes this view. We propose a novel ‘cyclical fate restriction’ hypothesis, based upon a more dynamic view of transcriptional states, reconciling the experimental evidence underpinning the traditional hypotheses.

    Analysis of single-cell RNA-Seq reveals dynamic changes during macrophage state transition in atherosclerosis mouse model

    Background: Atherosclerosis is an arterial inflammation that causes ischemic heart disease, which is the leading cause of death worldwide. Macrophages play major roles during disease development through their pro-inflammatory and anti-inflammatory functions. The lack of effective treatment is mainly due to an incomplete understanding of the molecular mechanisms underlying disease progression and regression. Materials and methods: The transcripts of macrophages from two aortic samples taken from atherosclerotic regions during disease progression and regression were analyzed using a previously published dataset (GEO Accession GSE123587). Pre-processing, clustering of cells and identification of unique markers for each cluster were done using the Seurat package in R. The Monocle package was used to order the cells in pseudotime and to detect the key molecules that changed dramatically between distinct macrophage states (pro-inflammatory and anti-inflammatory). Ingenuity Pathway Analysis (IPA) software was used to analyze pathway activity across macrophage states along the trajectory and to retrieve the transcriptional regulatory network between the genes determining the final states. Prediction of the miRNAs that might be involved in disease progression was performed using TargetScan and GSEA (Gene Set Enrichment Analysis). The Cytoscape application was used to visualize the regulatory network between the differentially regulated genes across macrophage states. Results: Clustering analysis of macrophages revealed 11 distinct states. In addition, two states were found to be dominant in the progression group macrophages, and one state was found to be dominant in the regression group macrophages. Moreover, trajectory analysis showed a bifurcation point near the end of the trajectory, where macrophage fates were destined to be either pro-inflammatory or anti-inflammatory.
Macrophages unique to the disease progression branch were found to activate the STAT cascade, induce an acute inflammatory response and upregulate inflammatory cytokines, denoting M1 polarization. In contrast, regression-branch specific macrophages were found to activate cholesterol efflux pathways and upregulate anti-inflammatory cytokines such as TSLP and CCL24. The transcription regulatory network between differentially regulated genes in both branches revealed changes in the transcriptional dynamics acquired during macrophage state transition. STAT1 (Signal transducer and activator of transcription 1) and IRF7 (Interferon Regulatory Factor 7) were found to be upregulated in the progression branch to maintain an inflammatory module resulting in production of distinct inflammatory cytokines. On the other hand, MAFB (MAF BZIP Transcription Factor B) and IGF1 (Insulin-like growth factor 1) were found to be upregulated in the regression branch to interrupt the inflammatory module at different levels. In addition, 10 miRNAs were predicted to be unregulated in progression-branch specific macrophages such as miR-344, miR-346 and miR-485. Conclusion: Inflammatory sites in atherosclerosis lesions contain both pro-inflammatory and anti-inflammatory macrophages. Each macrophage subset activates a unique transcriptional program. Certain transcription factors and growth factors have the potential to alter the whole transcriptional regulatory network, thereby shifting macrophages from an inflammatory to an anti-inflammatory state. Understanding how this state transition occurs will be a key step toward better understanding and treating atherosclerosis.

    A study on the heterogeneity of immune cells in the tumor microenvironment through single-cell transcriptome analysis

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Medicine, Department of Biomedical Sciences, February 2018. 묵인희. Introduction: The tumor environment is established by various components including malignant, stromal and immune cells. Single-cell transcriptome profiling of tumor samples allows the dissection of heterogeneous tumor cells and the neighboring stromal or immune cells. Precise characterization of the infiltrating immune cells may provide clues for novel immunotherapy strategies. Methods: A total of 515 individual cells from 11 breast cancer patients and 162 cells from four advanced gastric cancer (AGC) patients were analyzed by single-cell RNA sequencing (RNA-seq). Reference single-cell transcriptomes for M1-type or M2-type macrophages were generated from normal blood-derived monocytes after in vitro differentiation. Results: Copy number alteration patterns inferred from the single-cell RNA-seq data separated tumor cells and non-tumor cells. Most of the non-tumor cells were immune cells. In breast cancer, three distinct immune cell clusters of T lymphocytes, B lymphocytes, and macrophages were identified. T lymphocytes displayed immunosuppressive characteristics with a regulatory or exhausted phenotype. B lymphocytes were divided into two subgroups, the anti-apoptotic naïve/memory cell group and the highly proliferative B cell group. In AGC, all of the detected immune cells were tumor-associated macrophages (TAMs). When compared to the reference transcriptomes of the M1 or M2 macrophages, an M2-biased tendency was observed in the AGC TAMs with a heterogeneous level of polarization. In comparison, TAMs originating from breast cancer or colorectal cancer showed both M1-biased and M2-biased cells. Conclusions: This study demonstrates the power of single-cell RNA sequencing for the characterization of tumor-infiltrating immune cells to develop immunotherapeutic strategies.
Single-cell transcriptome analysis subdivides the microenvironmental immune cells by cell types and their properties with heterogeneous levels of pathway activation. High resolution profiles can provide a new view on targeting the exhausted tumor-infiltrating lymphocytes or transforming the anti-inflammatory M2-type TAM to the pro-inflammatory M1-type.
    Contents: Introduction; Material and Methods; Results; Part I. Separation and identification of immune cells from tumors by single-cell RNA sequencing in breast cancer (1-1. Pathologic profiles of patients for single-cell analysis; 1-2. Reliability of single-cell RNA sequencing; 1-3. Separation of tumor and tumor-associated normal cells; 1-4. Immune cell populations identified in tumor microenvironment; 1-5. Heterogeneity within tumor-infiltrating immune cells); Part II. Uncovering heterogeneous polarization levels of tumor-associated macrophages using single-cell RNA sequencing (2-1. Separation of macrophages in advanced gastric cancer environment by single-cell RNA-seq; 2-2. Construction of M1 and M2 single-cell transcriptome profiles; 2-3. M2 scoring using reference transcriptomes; 2-4. Heterogeneous polarization levels of TAMs in various tumor types); Discussion; References; Abstract in Korean.

    Geometric data understanding : deriving case specific features

    There exists a tradition of precise geometric modeling, in which uncertainties in data are treated as noise. Another tradition relies on the statistical nature of vast quantities of data, where geometric regularity is intrinsic to the data and statistical models usually grasp it only indirectly. This work focuses on point cloud data of natural resources and silhouette recognition from video input as two real-world examples of problems whose geometric content is intangible at the level of raw data. This content could be discovered and modeled to some degree by machine learning (ML) approaches such as deep learning, but either direct coverage of the geometry in the samples or the addition of a special geometry-invariant layer is necessary. Geometric content is central when there is a need for direct observations of spatial variables, or when one needs a mapping to a geometrically consistent data representation in which, e.g., outliers or noise can be easily discerned. In this thesis we consider the transformation of original input data to a geometric feature space in two example problems. The first example is the curvature of surfaces, which has met renewed interest since the introduction of ubiquitous point cloud data and the maturation of discrete differential geometry. Curvature spectra can characterize a spatial sample rather well, and provide useful features for ML purposes. The second example involves projective methods applied to stereo video signal analysis in swimming analytics. The aim is to find meaningful local geometric representations for feature generation, which also facilitate additional analysis based on a geometric understanding of the model. The features are associated directly with some geometric quantity, which makes it easier to express the geometric constraints in a natural way, as shown in the thesis. Also, visualization and further feature generation are much easier.
Third, the approach provides sound baseline methods for comparison with more traditional ML approaches, e.g. neural network methods. Fourth, most ML methods can utilize the geometric features presented in this work as additional features.
