3 research outputs found

    Cloud-based highly parallel execution of t-SNE and SPADE with metaclustering for analysis and visualization of large single-cell datasets

    The use of machine learning techniques, in particular unsupervised clustering and dimensionality reduction algorithms, is quickly becoming a standard workflow for identifying and visualizing biological populations within high-dimensional data. These methods allow researchers to approach data analysis without the bias and subjectivity that have traditionally been standard in the field. Algorithms have context-dependent strengths and weaknesses. Across algorithms, an inability to scale computation to large datasets is a common theme. Most algorithms are designed and distributed to run on individual computers, where memory and CPU are quickly exhausted by large datasets. Even when high-performance compute resources are available, algorithms often do not scale to large datasets as a fundamental property of their design. If they do, the result may be an untenable increase in runtime or diminished quality of results. t-SNE and SPADE are two well-published algorithms that suffer from the problems discussed above once datasets exceed a number of observations on the order of 1 million. This study introduces an alternative approach to the use of SPADE and t-SNE whereby a dataset is divided and distributed across numerous compute nodes in the cloud to be processed independently in parallel. The results of each computation are then combined in a metaclustering step for final visualization and analysis. The improvement in execution speed as a function of the degree of parallelization is established. The method is validated against a non-parallel analysis of the same dataset to establish concordance of identified populations. The workflow is executed on Cytobank for portability to other researchers.
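The divide, process-in-parallel, and metacluster pattern described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: plain k-means with a deterministic farthest-first initialization stands in for SPADE and t-SNE, local threads stand in for cloud compute nodes, and every function name and parameter here (`split_cluster_merge`, `k_local`, `k_meta`, and so on) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iters=10):
    """Lloyd's k-means on 2-D tuples (a stand-in for SPADE, not SPADE itself)."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def split_cluster_merge(points, n_chunks=4, k_local=6, k_meta=3):
    """Split the data, cluster each chunk independently, then metacluster
    the pooled local centroids into k_meta final populations."""
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    # Threads stand in for independent compute nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        local_centroids = list(pool.map(lambda c: kmeans(c, k_local), chunks))
    pooled = [c for centroids in local_centroids for c in centroids]
    return kmeans(pooled, k_meta)
```

Clustering each chunk into more groups than the expected number of populations (`k_local > k_meta`) leaves the metaclustering step room to merge local clusters that describe the same population; whether that choice matches the study's settings is not stated in the abstract.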

    viSNE fine-tuning enables better resolution of cell populations

    t-Distributed Stochastic Neighbor Embedding (t-SNE or viSNE) is a dimensionality reduction algorithm that allows visualization of complex high-dimensional cytometry data as a two-dimensional distribution or "map". These maps can be interrogated by human-guided or automated techniques to categorize single-cell data into relevant biological populations and otherwise visualize important differences between samples. The method has been extensively adopted and reported in the literature to be superior to traditional biaxial gating. The analyst must carefully choose the parameters of a t-SNE computation, as incorrectly chosen parameters might create artifacts that make the resulting map difficult or impossible to interpret. The correct choice of algorithm parameters is complicated by the lack of an agreed-upon quantitative framework for assessing the quality of algorithm results. Gauging result quality currently relies on subjective visual evaluation by an experienced t-SNE user. To overcome these limitations, we used the Cytobank viSNE engine for all t-SNE analyses and employed 18-parameter flow cytometry data as well as 32-parameter mass cytometry data of varying numbers of events to optimize t-SNE parameters such as the total number of iterations and perplexity. We also investigated the utility of Kullback-Leibler (KL) divergence as a metric for map quality, as well as SPADE clustering as an indirect measure of multidimensional data integrity when flattened into t-SNE coordinates. We have established the imperative requirement for the number of t-SNE analysis optimization steps ('iteration number') to be scaled with the total number of data points (events) in the set, suggesting that a number of existing software solutions produce unclear t-SNE maps of flow and mass cytometry data due to built-in user control restrictions.
    We also evaluated lower-level parameters within the t-SNE code that control the 'early exaggeration' stage initially introduced into the t-SNE algorithm for better map optimization. These parameters are not available as part of the standard algorithm interface, but we found that they can be tuned to produce high-quality results in shorter periods of time, avoiding unnecessary increases in both analysis duration and computation cost. Therefore, our approach allows fine-tuning of the t-SNE analysis to ensure both optimal resolution of t-SNE low-dimensional maps and better faithfulness of their representation of high-parameter cytometry data.
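For readers unfamiliar with the KL divergence mentioned in this abstract: t-SNE minimizes the KL divergence between the pairwise similarity distribution P in the original high-dimensional space and the distribution Q in the low-dimensional map, so a lower final value indicates that the map preserves more of the neighborhood structure. The sketch below only shows how the quantity itself is computed for two discrete distributions; it is not the Cytobank viSNE implementation, and the function name `kl_divergence` is hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.

    p and q are sequences of probabilities over the same outcomes.
    Terms with p_i == 0 contribute nothing; if q_i == 0 where p_i > 0,
    the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total
```

The value is zero only when the two distributions agree, and it is asymmetric: `kl_divergence(p, q)` generally differs from `kl_divergence(q, p)`.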

    Concussion policy and law: Primacy or subservience for the athletic trainer?

    Legal mandates consistent with up-to-date scientific understanding are necessary for successful enforcement of concussion protocols. The health and safety of participants in both training and competition are ensured by the diligence of athletic trainers as their primary responsibility. Coaches must acquiesce to the health care provider's return-to-play recommendation because the coach's primary responsibility is coordination of play and training. This article provides a review of law and policy necessary for the enforcement of concussion protocols. A brief review of the history of concussions in sport is followed by a discussion of necessary legal mandates to support the primacy of the athletic trainer serving as a health care provider in determining the return-to-play status of a student-athlete with concussion. Legal mandates focus on emerging statutory law and the employment relationship of the athletic trainer and the school. The ultimate goal is protection of student-athletes from the medical consequences associated with concussions.