16,219 research outputs found
A survey on utilization of data mining approaches for dermatological (skin) diseases prediction
Due to recent technology advances, large volumes of medical data is obtained. These data contain valuable information. Therefore data mining techniques can be used to extract useful patterns. This paper is intended to introduce data mining and its various techniques and a survey of the available literature on medical data mining. We emphasize mainly on the application of data mining on skin diseases. A categorization has been provided based on the different data mining techniques. The utility of the various data mining methodologies is highlighted. Generally association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we have summarized the different uses of classification in dermatology. It is one of the most important methods for diagnosis of erythemato-squamous diseases. There are different methods like Neural Networks, Genetic Algorithms and fuzzy classifiaction in this topic. Clustering is a useful method in medical images mining. The purpose of clustering techniques is to find a structure for the given data by finding similarities between data according to data characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we have investigated some challenges which exist in mining skin data
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.Comment: 13 figures, 35 reference
Generalized Fragmentation Functions for Fractal Jet Observables
We introduce a broad class of fractal jet observables that recursively probe
the collective properties of hadrons produced in jet fragmentation. To describe
these collinear-unsafe observables, we generalize the formalism of
fragmentation functions, which are important objects in QCD for calculating
cross sections involving identified final-state hadrons. Fragmentation
functions are fundamentally nonperturbative, but have a calculable
renormalization group evolution. Unlike ordinary fragmentation functions,
generalized fragmentation functions exhibit nonlinear evolution, since fractal
observables involve correlated subsets of hadrons within a jet. Some special
cases of generalized fragmentation functions are reviewed, including jet charge
and track functions. We then consider fractal jet observables that are based on
hierarchical clustering trees, where the nonlinear evolution equations also
exhibit tree-like structure at leading order. We develop a numeric code for
performing this evolution and study its phenomenological implications. As an
application, we present examples of fractal jet observables that are useful in
discriminating quark jets from gluon jets.Comment: 37+18 pages, 24 figure
Non-parametric Bayesian modelling of digital gene expression data
Next-generation sequencing technologies provide a revolutionary tool for
generating gene expression data. Starting with a fixed RNA sample, they
construct a library of millions of differentially abundant short sequence tags
or "reads", which constitute a fundamentally discrete measure of the level of
gene expression. A common limitation in experiments using these technologies is
the low number or even absence of biological replicates, which complicates the
statistical analysis of digital gene expression data. Analysis of this type of
data has often been based on modified tests originally devised for analysing
microarrays; both these and even de novo methods for the analysis of RNA-seq
data are plagued by the common problem of low replication. We propose a novel,
non-parametric Bayesian approach for the analysis of digital gene expression
data. We begin with a hierarchical model for modelling over-dispersed count
data and a blocked Gibbs sampling algorithm for inferring the posterior
distribution of model parameters conditional on these counts. The algorithm
compensates for the problem of low numbers of biological replicates by
clustering together genes with tag counts that are likely sampled from a common
distribution and using this augmented sample for estimating the parameters of
this distribution. The number of clusters is not decided a priori, but it is
inferred along with the remaining model parameters. We demonstrate the ability
of this approach to model biological data with high fidelity by applying the
algorithm on a public dataset obtained from cancerous and non-cancerous neural
tissues
- …