72,253 research outputs found
Visualization Techniques for Tongue Analysis in Traditional Chinese Medicine
Visual inspection of the tongue has been an important diagnostic method of Traditional Chinese Medicine (TCM). Clinic data have shown significant connections between various viscera cancers and abnormalities in the tongue and the tongue coating. Visual inspection of the tongue is simple and inexpensive, but the current practice in TCM is mainly experience-based and the quality of the visual inspection varies between individuals. The computerized inspection method provides quantitative models to evaluate color, texture and surface features on the tongue. In this paper, we investigate visualization techniques and processes to allow interactive data analysis with the aim to merge computerized measurements with human expert's diagnostic variables based on five-scale diagnostic conditions: Healthy (H), History Cancers (HC), History of Polyps (HP), Polyps (P) and Colon Cancer (C)
Recommended from our members
ManiNetCluster: a novel manifold learning approach to reveal the functional links between gene networks.
BACKGROUND:The coordination of genomic functions is a critical and complex process across biological systems such as phenotypes or states (e.g., time, disease, organism, environmental perturbation). Understanding how the complexity of genomic function relates to these states remains a challenge. To address this, we have developed a novel computational method, ManiNetCluster, which simultaneously aligns and clusters gene networks (e.g., co-expression) to systematically reveal the links of genomic function between different conditions. Specifically, ManiNetCluster employs manifold learning to uncover and match local and non-linear structures among networks, and identifies cross-network functional links. RESULTS:We demonstrated that ManiNetCluster better aligns the orthologous genes from their developmental expression profiles across model organisms than state-of-the-art methods (p-value <2.2×10-16). This indicates the potential non-linear interactions of evolutionarily conserved genes across species in development. Furthermore, we applied ManiNetCluster to time series transcriptome data measured in the green alga Chlamydomonas reinhardtii to discover the genomic functions linking various metabolic processes between the light and dark periods of a diurnally cycling culture. We identified a number of genes putatively regulating processes across each lighting regime. CONCLUSIONS:ManiNetCluster provides a novel computational tool to uncover the genes linking various functions from different networks, providing new insight on how gene functions coordinate across different conditions. ManiNetCluster is publicly available as an R package at https://github.com/daifengwanglab/ManiNetCluster
Somoclu: An Efficient Parallel Library for Self-Organizing Maps
Somoclu is a massively parallel tool for training self-organizing maps on
large data sets written in C++. It builds on OpenMP for multicore execution,
and on MPI for distributing the workload across the nodes in a cluster. It is
also able to boost training by using CUDA if graphics processing units are
available. A sparse kernel is included, which is useful for high-dimensional
but sparse data, such as the vector spaces common in text mining workflows.
Python, R and MATLAB interfaces facilitate interactive use. Apart from fast
execution, memory use is highly optimized, enabling training large emergent
maps even on a single computer.Comment: 26 pages, 9 figures. The code is available at
https://peterwittek.github.io/somoclu
Unsupervised String Transformation Learning for Entity Consolidation
Data integration has been a long-standing challenge in data management with
many applications. A key step in data integration is entity consolidation. It
takes a collection of clusters of duplicate records as input and produces a
single "golden record" for each cluster, which contains the canonical value for
each attribute. Truth discovery and data fusion methods, as well as Master Data
Management (MDM) systems, can be used for entity consolidation. However, to
achieve better results, the variant values (i.e., values that are logically the
same with different formats) in the clusters need to be consolidated before
applying these methods.
For this purpose, we propose a data-driven method to standardize the variant
values based on two observations: (1) the variant values usually can be
transformed to the same representation (e.g., "Mary Lee" and "Lee, Mary") and
(2) the same transformation often appears repeatedly across different clusters
(e.g., transpose the first and last name). Our approach first uses an
unsupervised method to generate groups of value pairs that can be transformed
in the same way (i.e., they share a transformation). Then the groups are
presented to a human for verification and the approved ones are used to
standardize the data. In a real-world dataset with 17,497 records, our method
achieved 75% recall and 99.5% precision in standardizing variant values by
asking a human 100 yes/no questions, which completely outperformed a state of
the art data wrangling tool
- …