FP-tree and COFI Based Approach for Mining of Multiple Level Association Rules in Large Databases

Author: Kumar Parveen
Pardasani K. R.
Shrivastava Virendra Kumar
Publication venue: 'Research Publishing Services'
Publication date: 01/01/2010

In recent years, the discovery of association rules among itemsets in a large database has been described as an important database-mining problem. The problem has received considerable research attention, and several algorithms for mining frequent itemsets have been developed. Many algorithms have been proposed to discover rules at a single concept level. However, mining association rules at multiple concept levels may lead to the discovery of more specific and concrete knowledge from data. The discovery of multiple-level association rules is very useful in many applications. In most studies of multiple-level association rule mining, the database is scanned repeatedly, which reduces the efficiency of the mining process. In this research paper, a new method for discovering multilevel association rules is proposed. It is based on the FP-tree structure and uses a co-occurrence frequent item tree to find frequent items in a multilevel concept hierarchy.

Comment: Pages IEEE format, International Journal of Computer Science and Information Security, IJCSIS, Vol. 7 No. 2, February 2010, USA. ISSN 1947 5500, http://sites.google.com/site/ijcsis
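As a rough illustration of the FP-tree idea mentioned in the abstract above, the following minimal sketch builds an FP-tree in two passes over the transactions. It is not the authors' method: the COFI-tree step is omitted, and the names `FPNode`, `generalize`, and `build_fp_tree`, as well as the encoding of items as tuples such as `("dairy", "milk")` (one element per concept level), are assumptions made for this example only.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def generalize(transactions, level):
    """Truncate every item to the first `level` concept levels.

    Assumption: an item is a tuple ordered from general to specific.
    """
    return [{item[:level] for item in t} for t in transactions]


def build_fp_tree(transactions, min_support):
    """Build an FP-tree; return (root, support counts of frequent items)."""
    # Pass 1: count in how many transactions each item occurs.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = FPNode(None)
    # Pass 2: insert each transaction, keeping only frequent items,
    # ordered by descending support (ties broken by item value).
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent
```

Under these assumptions, calling `build_fp_tree(generalize(transactions, 1), min_support)` would mine the most general concept level, and increasing `level` moves down the hierarchy; each call scans the transaction list only twice.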

arXiv.org e-Print Archive

Crossref

SICLE: A high-throughput tool for extracting evolutionary relationships from phylogenetic trees

Author: DeBlasio Dan
Wiscaver Jennifer
Publication venue: 'PeerJ'
Publication date: 16/06/2016

We present the phylogeny analysis software SICLE (Sister Clade Extractor), an easy-to-use, high-throughput tool to describe the nearest neighbors to a node of interest in a phylogenetic tree as well as the support value for the relationship. The application is a command line utility that can be embedded into a phylogenetic analysis pipeline or can be used as a subroutine within another C++ program. As a test case, we applied this new tool to the published phylome of Salinibacter ruber, a species of halophilic Bacteroidetes, identifying 13 unique sister relationships to S. ruber across the 4589 gene phylogenies. S. ruber grouped with bacteria, most often other Bacteroidetes, in the majority of phylogenies, but 91 phylogenies showed a branch-supported sister association between S. ruber and Archaea, an evolutionarily intriguing relationship indicative of horizontal gene transfer. This test case demonstrates how SICLE makes it possible to summarize the phylogenetic information produced by automated phylogenetic pipelines to rapidly identify and quantify the possible evolutionary relationships that merit further investigation. SICLE is available for free for noncommercial use at http://eebweb.arizona.edu/sicle/.

Comment: 8 pages, 4 figures in journal submission forma
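The notion of a sister clade described in the abstract above can be sketched in a few lines. This is not SICLE's code or interface: the representation of a rooted tree as nested tuples of leaf names and the functions `leaves` and `sister_leaves` are assumptions made for illustration, and branch support values are ignored.

```python
def leaves(tree):
    """Return all leaf names under `tree` (a leaf is a plain string)."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def sister_leaves(tree, target):
    """Return the leaves of the sibling subtrees of leaf `target`.

    Returns None when `target` does not occur in the tree.
    """
    if isinstance(tree, str):
        return None
    # Is the target a direct child of this internal node?
    for i, child in enumerate(tree):
        if child == target:
            return [
                name
                for j, sibling in enumerate(tree)
                if j != i
                for name in leaves(sibling)
            ]
    # Otherwise search each subtree in turn.
    for child in tree:
        found = sister_leaves(child, target)
        if found is not None:
            return found
    return None
```

For a hypothetical tree such as `((("S_ruber", "Arch1"), "Bact1"), "Bact2")`, the sister of `"S_ruber"` is the single leaf `"Arch1"`; tallying such results over many gene trees would give the kind of summary the abstract describes.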

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Hierarchical Star-Formation in M33: Fundamental properties of the star-forming regions

Author: Bastian N.
Efremov Yu.
Ercolano B.
Gieles M.
Gutermuth R.
Rosolowsky E.
Scheepmaker R. A.
Publication venue: 'Wiley'
Publication date: 01/01/2007

Star formation within galaxies appears on multiple scales, from spiral structure, to OB associations, to individual star clusters, and often sub-structure within these clusters. This multitude of scales calls for objective methods to find and classify star-forming regions, regardless of spatial size. To this end, we present an analysis of star-forming groups in the Local Group spiral galaxy M33, based on a new implementation of the Minimum Spanning Tree (MST) method. Unlike previous studies, which limited themselves to a single spatial scale, we study star-forming structures from the effective resolution limit (~20 pc) to kpc scales. We find evidence for a continuum of star-forming group sizes, from pc to kpc scales. We do not find a characteristic scale for OB associations, unlike previous studies, and we suggest that the appearance of such a scale was caused by spatial resolution and selection effects. The luminosity function of the groups is found to be well represented by a power law with an index of -2, similar to that found for clusters and GMCs. Additionally, the groups follow a mass-radius relation similar to that of GMCs. The size distribution of the groups is best described by a log-normal distribution, and we show that within a hierarchical distribution, if a scale is selected to find structure, the resulting size distribution will be log-normal. We find an abrupt drop in the number of groups outside a galactic radius of ~4 kpc, suggesting a change in the structure of the star-forming ISM, possibly reflected in the lack of GMCs beyond this radius. (abridged)

Comment: 12 pages, 16 figures, accepted MNRA
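The general MST grouping technique named in the abstract above can be sketched as follows: build the minimum spanning tree of the source positions, remove every edge longer than a chosen break length, and treat the remaining connected pieces as groups. This is a generic sketch, not the authors' implementation; the function names, the use of Prim's algorithm, and the 2-D Euclidean distance are assumptions.

```python
import math


def mst_edges(points):
    """Prim's algorithm; return MST edges as (i, j, length) tuples."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known link from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Update the shortest links of the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def groups_at_scale(points, break_length):
    """Cut MST edges longer than `break_length`; return groups of indices."""
    label = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while label[i] != i:
            label[i] = label[label[i]]
            i = label[i]
        return i

    for u, v, d in mst_edges(points):
        if d <= break_length:
            label[find(u)] = find(v)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Calling `groups_at_scale` with a range of break lengths, rather than a single value, is one way to examine structure on several spatial scales at once, which is the kind of multi-scale analysis the abstract describes.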

arXiv.org e-Print Archive

Crossref

Utrecht University Repository

Surrey Research Insight

Latent protein trees

Author: Carin Lawrence
Ginsburg Geoffrey S.
Henao Ricardo
Lucas Joseph E.
Moseley M. Arthur
Thompson J. Will
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/06/2013

Unbiased, label-free proteomics is becoming a powerful technique for measuring protein expression in almost any biological sample. The output of these measurements after preprocessing is a collection of features and their associated intensities for each sample. Subsets of features within the data are from the same peptide, subsets of peptides are from the same protein, and subsets of proteins are in the same biological pathways; therefore, there is the potential for very complex and informative correlational structure inherent in these data. Recent attempts to utilize these data often focus on the identification of single features that are associated with a particular phenotype that is relevant to the experiment. However, to date, there have been no published approaches that directly model what we know to be multiple different levels of correlation structure. Here we present a hierarchical Bayesian model which is specifically designed to model such correlation structure in unbiased, label-free proteomics. This model utilizes partial identification information from peptide sequencing and database lookup as well as the observed correlation in the data to appropriately compress features into latent proteins and to estimate their correlation structure. We demonstrate the effectiveness of the model using artificial/benchmark data and in the context of a series of proteomics measurements of blood plasma from a collection of volunteers who were infected with two different strains of viral influenza.

Comment: Published in at http://dx.doi.org/10.1214/13-AOAS639 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

DukeSpace