271 research outputs found

    Interactive Constrained Association Rule Mining

    We investigate ways to support interactive mining sessions in the setting of association rule mining. In such sessions, users specify conditions (queries) on the associations to be generated. Our approach combines the integration of querying conditions inside the mining phase with the incremental querying of already generated associations. We present several concrete algorithms and compare their performance. Comment: A preliminary report on this work was presented at the Second International Conference on Knowledge Discovery and Data Mining (DaWaK 2000).

    A Tight Upper Bound on the Number of Candidate Patterns

    In the context of mining for frequent patterns using the standard levelwise algorithm, the following question arises: given the current level and the current set of frequent patterns, what is the maximal number of candidate patterns that can be generated on the next level? We answer this question by providing a tight upper bound, derived from a combinatorial result from the sixties by Kruskal and Katona. Our result is useful to reduce the number of database scans.
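
    As a rough illustration of how a Kruskal-Katona style bound can be evaluated, the sketch below assumes the bound depends only on the number m of frequent patterns of size k. It writes m in its k-canonical (cascade) form, m = C(m_k, k) + C(m_{k-1}, k-1) + ..., and shifts every lower index up by one. The function names are hypothetical and not taken from the paper.

```python
from math import comb


def canonical_representation(m, k):
    """Greedy k-canonical (cascade) form of m.

    Returns pairs (n_i, i) with n_k > n_{k-1} > ... such that
    sum(comb(n_i, i)) == m.
    """
    terms = []
    i = k
    while m > 0 and i >= 1:
        n = i  # comb(i, i) == 1 <= m, so n = i is always feasible
        while comb(n + 1, i) <= m:
            n += 1
        terms.append((n, i))
        m -= comb(n, i)
        i -= 1
    return terms


def max_candidates(m, k):
    """Upper bound on the number of (k+1)-candidates, given m frequent k-patterns."""
    return sum(comb(n, i + 1) for n, i in canonical_representation(m, k))


if __name__ == "__main__":
    for m in range(1, 8):
        print(m, "frequent 2-patterns ->", max_candidates(m, 2), "candidate 3-patterns at most")
```

    For example, six frequent patterns of size 2 can be all pairs over four items, and those four items yield C(4, 3) = 4 candidates of size 3, which is what the sketch returns.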

    Depth-First Non-Derivable Itemset Mining

    Mining frequent itemsets is one of the main problems in data mining. Much effort went into developing efficient and scalable algorithms for this problem. When the support threshold is set too low, however, or the data is highly correlated, the number of frequent itemsets can become too large, independently of the algorithm used. Therefore, it is often more interesting to mine a reduced collection of interesting itemsets, i.e., a condensed representation. Recently, in this context, the non-derivable itemsets were proposed as an important class of itemsets. An itemset is called derivable when its support is completely determined by the support of its subsets. As such, derivable itemsets represent redundant information and can be pruned from the collection of frequent itemsets. It was shown both theoretically and experimentally that the collection of non-derivable frequent itemsets is in general much smaller than the complete set of frequent itemsets. A breadth-first, Apriori-based algorithm, called NDI, to find all non-derivable itemsets was proposed. In this paper we present a depth-first algorithm, dfNDI, that is based on Eclat for mining the non-derivable itemsets. dfNDI is evaluated on real-life datasets, and experiments show that dfNDI outperforms NDI by an order of magnitude.
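
    To make the notion of derivability concrete, the sketch below computes lower and upper bounds on the support of an itemset from the supports of its proper subsets, using inclusion-exclusion deduction rules; the itemset is derivable when the two bounds coincide. This is a brute-force illustration only, not the dfNDI or NDI algorithm, and the function names and the dictionary-based support table are assumptions made for the example.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def support_bounds(itemset, support):
    """Bounds on supp(itemset) implied by the supports of its proper subsets.

    `support` maps frozensets (including the empty set, i.e. the number of
    transactions) to their support counts.
    """
    itemset = frozenset(itemset)
    lower, upper = 0, float("inf")
    for x in subsets(itemset):
        # Sum over all J with x <= J < itemset (proper subsets only).
        delta = 0
        for extra in subsets(itemset - x):
            j = x | extra
            if j == itemset:
                continue
            sign = 1 if (len(itemset - j) + 1) % 2 == 0 else -1
            delta += sign * support[j]
        if len(itemset - x) % 2 == 1:
            upper = min(upper, delta)  # odd gap: rule gives an upper bound
        else:
            lower = max(lower, delta)  # even gap: rule gives a lower bound
    return lower, upper


def is_derivable(itemset, support):
    """True when the subset supports pin down supp(itemset) exactly."""
    lower, upper = support_bounds(itemset, support)
    return lower == upper
```

    With 10 transactions, supp(a) = 10 and supp(b) = 5, the bounds for {a, b} collapse to exactly 5, so the itemset is derivable and need not be counted in the database.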

    Pattern mining of mass spectrometry quality control data

    Mass spectrometry is widely used to identify proteins based on the mass distribution of their peptides. Unfortunately, because of its inherent complexity, the results of a mass spectrometry experiment can be subject to large variability. As a means of quality control, several qualitative metrics have recently been defined [1]. Initially, these quality control metrics were evaluated independently in order to separately assess particular stages of a mass spectrometry experiment. However, this method is insufficient because the different stages of an experiment do not function in isolation; instead, they influence each other. As a result, subsequent work employed a multivariate statistics approach to assess the correlation structure of the different quality control metrics [2]. However, by making use of more advanced data mining techniques, additional useful information can be extracted from these quality control metrics. Various pattern mining techniques can be employed to discover hidden patterns in this quality control data. Subspace clustering tries to detect clusters of items based on a restricted set of dimensions [3]. This can be leveraged, for example, to detect aberrant experiments where only a few of the quality control metrics are outliers but the experiment still behaved correctly in general. In addition, specialized frequent itemset mining and association rule learning techniques can be used to discover relationships between the various stages of a mass spectrometry experiment, as they are exhibited by the different quality control metrics. Finally, a major source of untapped information lies in the temporal aspect. Most often, problems in a mass spectrometry setup appear gradually but are only observed after a critical juncture. As previous analyses have not used this temporal information directly, there remains a large potential to detect these problems as soon as they start to manifest by taking this additional dimension of information into account. The previously discovered patterns can then be evaluated over time using sequential pattern mining techniques. Awareness has grown that suitable quality control information is mandatory to assess the validity of a mass spectrometry experiment. Current efforts aim to standardize this quality control information [4], which will facilitate the dissemination of the data. This results in a large amount of as yet untapped information, which specific data mining techniques can leverage.
    [1] Rudnick, P. A. et al. Performance metrics for liquid chromatography-tandem mass spectrometry systems in proteomics analyses. Molecular & Cellular Proteomics 9, 225–241 (2010).
    [2] Wang, X. et al. QC metrics from CPTAC raw LC-MS/MS data interpreted through multivariate statistics. Analytical Chemistry 86, 2497–2509 (2014).
    [3] Aksehirli, E., Goethals, B., Müller, E. & Vreeken, J. Cartification: A neighborhood preserving transformation for mining high dimensional data. in Thirteenth IEEE International Conference on Data Mining - ICDM ’13 937–942 (IEEE, 2013). doi:10.1109/ICDM.2013.146
    [4] Walzer, M. et al. qcML: An exchange format for quality control metrics from mass spectrometry experiments. Molecular & Cellular Proteomics (2014). doi:10.1074/mcp.M113.03590

    Effect of post-hatch transportation duration and parental age on broiler chicken quality, welfare, and productivity

    Broiler chicks are transported to production sites within 1 to 2 d post-hatch. Possible effects of this transportation are poorly understood and could vary among chicks from breeder flocks of different ages. The aim of the present study was to investigate the effects of transportation duration and parental flock age on chick welfare, productivity, and quality. After hatch in a commercial hatchery, 1,620 mixed-sex chicks from 29-wk-old (young) and 1,620 chicks from 60-wk-old (old) breeders were subjected to transportation of 1.5 h or 11 h duration. After transportation, 2,800 chicks were divided among 100 pens, with each pen containing 28 chicks from one transportation crate (2 or 3 pens per crate). From the remaining chicks, on average 6 chicks (min 4, max 8) per crate (n = 228) were randomly selected and assessed for chick quality, weighed, and culled for yolk sac weighing (at 1 d). Chicks that had not been assigned to pens and were not used for post-transportation measurements were removed from the experiment (n = 212). Mortality, ADG, BW, and feed conversion (FC) of the experimental chicks were recorded until 41 d. Meat quality was measured for breast fillets (n = 47). No interaction effect of parental age and transportation duration was found for any variable. BW and yolk sac weight at 1 d were lower for chicks transported 11 h than for those transported 1.5 h, and lower for chicks from young than from old breeders. The effect of parental flock age on BW persisted until slaughter. Additionally, parental age positively affected ADG until slaughter. Chick quality was lower in chicks from old versus young breeders. Chick quality and productivity were not affected by transportation duration. Mortality and meat quality were not affected by either parental age or transportation duration. To conclude, no long-term detrimental effects were found from long post-hatch transportation in chicks from young or old parent flocks. Based on these results, we suggest that 11 h post-hatch transportations under similar conditions do not impose long-term welfare or productivity risks.

    BioGraph: unsupervised biomedical knowledge discovery via automated hypothesis generation

    We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at http://www.biograph.be