FP-tree and COFI Based Approach for Mining of Multiple Level Association Rules in Large Databases
In recent years, discovery of association rules among itemsets in a large
database has been described as an important database-mining problem. The
problem of discovering association rules has received considerable research
attention and several algorithms for mining frequent itemsets have been
developed. Many algorithms have been proposed to discover rules at a single
concept level. However, mining association rules at multiple concept levels may
lead to the discovery of more specific and concrete knowledge from data. The
discovery of multiple-level association rules is useful in many
applications. In most studies of multiple-level association rule
mining, the database is scanned repeatedly, which reduces the efficiency of
the mining process. In this research paper, a new method for discovering multilevel
association rules is proposed. It is based on the FP-tree structure and uses a
co-occurrence frequent item (COFI) tree to find frequent items in a multilevel concept
hierarchy.
Comment: Pages IEEE format, International Journal of Computer Science and
Information Security, IJCSIS, Vol. 7 No. 2, February 2010, USA. ISSN 1947-5500,
http://sites.google.com/site/ijcsis
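As a rough illustration of the FP-tree structure the abstract above builds on, the following is a minimal sketch of standard two-scan FP-tree construction. It is not the paper's method: the COFI-tree step and the multilevel concept hierarchy are omitted, and the item names, `FPNode`, `build_fp_tree`, and `count_nodes` are hypothetical names chosen for this example.

```python
from collections import Counter


class FPNode:
    """One node of an FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree using two scans of the transaction list."""
    # Scan 1: count the number of transactions containing each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Scan 2: insert the frequent items of each transaction in one global
    # order (descending support, ties broken by name) so prefixes are shared.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# Hypothetical transactions at a single concept level.
transactions = [
    ["milk", "bread"],
    ["milk", "bread", "butter"],
    ["milk", "jam"],
    ["bread", "butter"],
]
root, frequent = build_fp_tree(transactions, min_support=2)
print(sorted(frequent))    # items that meet the support threshold
print(count_nodes(root))   # size of the compressed tree
```

After the two scans the database is no longer needed: later mining steps work on the tree alone, which is the property that lets FP-tree methods avoid repeated database scans.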
SICLE: A high-throughput tool for extracting evolutionary relationships from phylogenetic trees
We present the phylogeny analysis software SICLE (Sister Clade Extractor), an
easy-to-use, high-throughput tool to describe the nearest neighbors to a node
of interest in a phylogenetic tree as well as the support value for the
relationship. The application is a command line utility that can be embedded
into a phylogenetic analysis pipeline or can be used as a subroutine within
another C++ program. As a test case, we applied this new tool to the published
phylome of Salinibacter ruber, a species of halophilic Bacteroidetes,
identifying 13 unique sister relationships to S. ruber across the 4589 gene
phylogenies. S. ruber grouped with bacteria, most often other Bacteroidetes, in
the majority of phylogenies, but 91 phylogenies showed a branch-supported
sister association between S. ruber and Archaea, an evolutionarily intriguing
relationship indicative of horizontal gene transfer. This test case
demonstrates how SICLE makes it possible to summarize the phylogenetic
information produced by automated phylogenetic pipelines to rapidly identify
and quantify the possible evolutionary relationships that merit further
investigation. SICLE is available for free for noncommercial use at
http://eebweb.arizona.edu/sicle/.
Comment: 8 pages, 4 figures in journal submission forma
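The idea of a sister clade described in the abstract above can be sketched in a few lines. This is not SICLE's implementation or interface: it assumes a rooted tree written as nested Python tuples with leaf names as strings, it ignores branch support values, and the taxon labels and the functions `leaves` and `sister_leaves` are hypothetical.

```python
def leaves(tree):
    """Return all leaf names under `tree` (a string or a nested tuple)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def sister_leaves(tree, target):
    """Return the leaf names in the sister clade(s) of the leaf `target`.

    The sister clade is everything else that descends from the target's
    parent node. Returns None if the target is not found.
    """
    if isinstance(tree, str):
        return None
    # Case 1: the target is a direct child of this node.
    for i, child in enumerate(tree):
        if child == target:
            return [leaf
                    for j, sib in enumerate(tree) if j != i
                    for leaf in leaves(sib)]
    # Case 2: search deeper in each subtree.
    for child in tree:
        found = sister_leaves(child, target)
        if found is not None:
            return found
    return None


# Hypothetical gene tree: S_ruber is sister to a single archaeal sequence.
gene_tree = (
    (("S_ruber", "Archaeon_1"), "Bacteroidetes_1"),
    ("Bacteroidetes_2", "Proteobacterium_1"),
)
print(sister_leaves(gene_tree, "S_ruber"))
print(sister_leaves(gene_tree, "Bacteroidetes_1"))
```

Running such a function over every gene tree in a phylome and tallying the taxonomic labels of the returned leaves gives the kind of summary the abstract reports, under the assumptions stated above.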
Hierarchical Star-Formation in M33: Fundamental properties of the star-forming regions
Star-formation within galaxies appears on multiple scales, from spiral
structure, to OB associations, to individual star clusters, and often
sub-structure within these clusters. This multitude of scales calls for
objective methods to find and classify star-forming regions, regardless of
spatial size. To this end, we present an analysis of star-forming groups in the
Local Group spiral galaxy M33, based on a new implementation of the Minimum
Spanning Tree (MST) method. Unlike previous studies, which limited themselves to
a single spatial scale, we study star-forming structures from the effective
resolution limit (~20pc) to kpc scales. We find evidence for a continuum of
star-forming group sizes, from pc to kpc scales. We do not find a
characteristic scale for OB associations, unlike that found in previous
studies, and we suggest that the appearance of such a scale was caused by
spatial resolution and selection effects. The luminosity function of the groups
is found to be well represented by a power law with an index of -2, similar to
that found for clusters and GMCs. Additionally, the groups follow a
mass-radius relation similar to that of GMCs. The size distribution of the groups is best
described by a log-normal distribution and we show that within a hierarchical
distribution, if a scale is selected to find structure, the resulting size
distribution will be log-normal. We find an abrupt drop in the
number of groups outside a galactic radius of ~4kpc, suggesting a change in the
structure of the star-forming ISM, possibly reflected in the lack of GMCs
beyond this radius. (abridged)
Comment: 12 pages, 16 figures, accepted MNRA
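The general MST grouping technique named in the abstract above can be sketched as follows: build the minimum spanning tree over source positions, cut every edge longer than a chosen break length, and treat the remaining connected components as groups. This is a generic illustration, not the authors' implementation. The coordinates, the break lengths, and the functions `mst_edges` and `groups_at_scale` are assumptions made for this example, and positions are in arbitrary units.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the MST as a list of (length, i, j) edges over point indices.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection to the tree
    link = [-1] * n         # tree vertex providing that connection
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((best[u], link[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    link[v] = u
    return edges


def groups_at_scale(points, break_length):
    """Cut MST edges longer than `break_length`; return the components."""
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]   # path halving
            x = root[x]
        return x

    for length, i, j in mst_edges(points):
        if length <= break_length:
            root[find(i)] = find(j)

    comps = {}
    for i in range(len(points)):
        comps.setdefault(find(i), []).append(i)
    return sorted(comps.values())


# Hypothetical source positions: two tight clumps and one isolated source.
positions = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (50, 50)]
for scale in (2, 10, 100):
    print(scale, groups_at_scale(positions, scale))
```

Sweeping the break length from small to large values merges small groups into larger ones, which is one way to probe structure across a range of spatial scales rather than at a single fixed scale.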
Latent protein trees
Unbiased, label-free proteomics is becoming a powerful technique for
measuring protein expression in almost any biological sample. The output of
these measurements after preprocessing is a collection of features and their
associated intensities for each sample. Subsets of features within the data are
from the same peptide, subsets of peptides are from the same protein, and
subsets of proteins are in the same biological pathways; therefore, there is
the potential for very complex and informative correlational structure inherent
in these data. Recent attempts to use these data often focus on the
identification of single features that are associated with a particular
phenotype that is relevant to the experiment. However, to date, there have been
no published approaches that directly model what we know to be multiple
different levels of correlation structure. Here we present a hierarchical
Bayesian model which is specifically designed to model such correlation
structure in unbiased, label-free proteomics. This model utilizes partial
identification information from peptide sequencing and database lookup as well
as the observed correlation in the data to appropriately compress features into
latent proteins and to estimate their correlation structure. We demonstrate the
effectiveness of the model using artificial/benchmark data and in the context
of a series of proteomics measurements of blood plasma from a collection of
volunteers who were infected with two different strains of viral influenza.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS639 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org