273 research outputs found

Interactive Constrained Association Rule Mining

Author: Bussche Jan Van den
Goethals Bart
Publication venue
Publication date: 01/01/2003
Field of study

We investigate ways to support interactive mining sessions, in the setting of association rule mining. In such sessions, users specify conditions (queries) on the associations to be generated. Our approach is a combination of the integration of querying conditions inside the mining phase, and the incremental querying of already generated associations. We present several concrete algorithms and compare their performance. Comment: A preliminary report on this work was presented at the Second International Conference on Knowledge Discovery and Data Mining (DaWaK 2000).

arXiv.org e-Print Archive

CiteSeerX

A Tight Upper Bound on the Number of Candidate Patterns

Author: Bussche Jan Van den
Geerts Floris
Goethals Bart
Publication venue
Publication date: 01/01/2001
Field of study

In the context of mining for frequent patterns using the standard levelwise algorithm, the following question arises: given the current level and the current set of frequent patterns, what is the maximal number of candidate patterns that can be generated on the next level? We answer this question by providing a tight upper bound, derived from a combinatorial result from the sixties by Kruskal and Katona. Our result is useful to reduce the number of database scans.
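The question posed in this abstract can be illustrated with a minimal sketch. It assumes the usual levelwise rule that a (k+1)-itemset is a candidate only if all of its k-subsets are frequent, and it computes the classical Kruskal-Katona value from the k-cascade (binomial) representation of the number of frequent k-itemsets. The function names `next_level_candidates`, `cascade`, and `kk_bound` are hypothetical, and the code is not taken from the paper, whose exact formulation may differ.

```python
from itertools import combinations
from math import comb


def next_level_candidates(frequent, k):
    """Return all (k+1)-itemsets whose every k-subset is in `frequent`."""
    frequent = {frozenset(s) for s in frequent}
    items = sorted(set().union(*frequent))
    return {
        frozenset(c)
        for c in combinations(items, k + 1)
        if all(frozenset(s) in frequent for s in combinations(c, k))
    }


def cascade(m, k):
    """Greedy k-cascade of m: m = C(a_k, k) + C(a_{k-1}, k-1) + ..."""
    parts = []
    while m > 0 and k > 0:
        a = k
        # Largest a with C(a, k) <= m.
        while comb(a + 1, k) <= m:
            a += 1
        parts.append((a, k))
        m -= comb(a, k)
        k -= 1
    return parts


def kk_bound(m, k):
    """Kruskal-Katona bound on (k+1)-candidates given m frequent k-itemsets."""
    return sum(comb(a, i + 1) for a, i in cascade(m, k))


# All six pairs over {a, b, c, d} are frequent: the bound is reached exactly.
pairs = [set(p) for p in combinations("abcd", 2)]
print(len(next_level_candidates(pairs, 2)), kk_bound(len(pairs), 2))
```

The bound depends only on the count of frequent k-itemsets, so it can be evaluated before any candidates are generated.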

arXiv.org e-Print Archive

CiteSeerX

Knowledge, false beliefs and fact-driven perceptions of Muslims in Australia: a national survey

Author: Bart Goethals
Toon Calders
Publication venue: David Lovell Publishing
Publication date: 01/01/2005
Field of study

Mining frequent itemsets is one of the main problems in data mining. Much effort went into developing efficient and scalable algorithms for this problem. When the support threshold is set too low, however, or the data is highly correlated, the number of frequent itemsets can become too large, independently of the algorithm used. Therefore, it is often more interesting to mine a reduced collection of interesting itemsets, i.e., a condensed representation. Recently, in this context, the non-derivable itemsets were proposed as an important class of itemsets. An itemset is called derivable when its support is completely determined by the support of its subsets. As such, derivable itemsets represent redundant information and can be pruned from the collection of frequent itemsets. It was shown both theoretically and experimentally that the collection of non-derivable frequent itemsets is in general much smaller than the complete set of frequent itemsets. A breadth-first, Apriori-based algorithm, called NDI, was proposed to find all non-derivable itemsets. In this paper we present a depth-first algorithm, dfNDI, that is based on Eclat for mining the non-derivable itemsets. dfNDI is evaluated on real-life datasets, and experiments show that dfNDI outperforms NDI by an order of magnitude.
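The notion of a derivable itemset can be made concrete with a small sketch. It assumes the inclusion-exclusion deduction rules commonly associated with non-derivable itemsets: for each subset X of an itemset I, the alternating sum of the supports of all J with X ⊆ J ⊂ I is an upper bound on the support of I when |I \ X| is odd and a lower bound when it is even. The function name `support_bounds` and the input layout (a dict from frozenset to support, with the empty set mapped to the number of transactions) are hypothetical; this is neither NDI nor dfNDI, only the bound computation.

```python
from itertools import combinations


def support_bounds(itemset, supp):
    """Return (lower, upper) bounds on supp(itemset) from its proper subsets.

    `supp` maps every proper subset (as a frozenset) to its support;
    supp[frozenset()] is the total number of transactions.
    """
    I = frozenset(itemset)
    lower, upper = 0, float("inf")
    for r in range(len(I) + 1):
        for X in map(frozenset, combinations(sorted(I), r)):
            rest = I - X
            total = 0
            # Sum over all J with X <= J < I (J is a proper subset of I).
            for j in range(len(rest)):
                for extra in combinations(sorted(rest), j):
                    J = X | frozenset(extra)
                    total += (-1) ** (len(I - J) + 1) * supp[J]
            if len(rest) % 2 == 1:
                upper = min(upper, total)  # odd |I \ X|: upper bound
            else:
                lower = max(lower, total)  # even |I \ X|: lower bound
    return lower, upper


# Item a occurs in all 4 transactions, so supp(ab) must equal supp(b).
supp = {frozenset(): 4, frozenset("a"): 4, frozenset("b"): 2}
low, high = support_bounds("ab", supp)
print(low, high, "derivable" if low == high else "non-derivable")
```

When the two bounds coincide, the support is fully determined by the subsets, which is the condition the abstract calls derivable.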

CiteSeerX

Deakin Research Online

Crossref

Pure OAI Repository

Institutional Repository Universiteit Antwerpen

DI-fusion

Leveraging Sequential Episode Mining for Session-Based News Recommendation

Author: Goethals Bart
Karimi Mozhgan
Čule Boris
Publication venue
Publication date: 01/01/2023
Field of study

News recommender systems aim to help users find interesting and relevant news stories while mitigating information overload. Over the past few decades, various challenges have emerged in developing effective algorithms for real-world scenarios. These challenges include capturing evolving user preferences and addressing concept drift during reading sessions. Additionally, ensuring the freshness and timeliness of news content poses significant obstacles. To address these issues, we utilize an innovative sequential pattern mining approach known as Marbles to capture user behavior. Marbles leverages frequent episodes to generate a collection of association rules, where a frequent episode is a partially ordered pattern that occurs frequently in the input sequence. The recommendation process involves identifying relevant rules extracted from these patterns and weighting them. Subsequently, a heuristic procedure assesses candidate rules and generates a list of recommendations for users based on their most recent reading session. Notably, we conduct our evaluation in a streaming scenario, simulating real-world usage, where both our algorithm and baselines dynamically improve their models with each user click. Through our empirical evaluation in this streaming-based scenario, which closely models real-world usage, we demonstrate the applicability of the Marbles algorithm in session-based recommendation. Our proposed approach outperforms baseline algorithms on two real-world data sets, effectively addressing the challenges specific to the news domain.

Tilburg University Repository

Guest Editors’ introduction: special issue of selected papers from ECML PKDD 2008

Author: Bart Goethals
Katharina Morik
Walter Daelemans
Publication venue: Springer Science and Business Media LLC
Publication date
Field of study

Crossref

Pattern mining of mass spectrometry quality control data

Author: Bittremieux Wout
Goethals Bart
Laukens Kris
Mrzic Aida
Valkenborg Dirk
Willems Hanny
Publication venue
Publication date
Field of study

Mass spectrometry is widely used to identify proteins based on the mass distribution of their peptides. Unfortunately, because of its inherent complexity, the results of a mass spectrometry experiment can be subject to a large variability. As a means of quality control, recently several qualitative metrics have been defined. [1] Initially these quality control metrics were evaluated independently in order to separately assess particular stages of a mass spectrometry experiment. However, this method is insufficient because the different stages of an experiment do not function in isolation; instead, they influence each other. As a result, subsequent work employed a multivariate statistics approach to assess the correlation structure of the different quality control metrics. [2] However, by making use of some more advanced data mining techniques, additional useful information can be extracted from these quality control metrics. Various pattern mining techniques can be employed to discover hidden patterns in this quality control data. Subspace clustering tries to detect clusters of items based on a restricted set of dimensions. [3] This can be leveraged, for example, to detect aberrant experiments where only a few of the quality control metrics are outliers, but the experiment still behaved correctly in general. In addition, specialized frequent itemset mining and association rule learning techniques can be used to discover relationships between the various stages of a mass spectrometry experiment, as they are exhibited by the different quality control metrics. Finally, a major source of untapped information lies in the temporal aspect. Most often, problems in a mass spectrometry setup appear gradually, but are only observed after a critical juncture.
As previous analyses have not used this temporal information directly, there remains a large potential to detect these problems as soon as they start to manifest by taking this additional dimension of information into account. The previously discovered patterns can be evaluated over time by making use of sequential pattern mining techniques. Awareness has grown that suitable quality control information is mandatory to assess the validity of a mass spectrometry experiment. Current efforts aim to standardize this quality control information [4], which will facilitate the dissemination of the data. This results in a large amount of as yet untapped information, which can be leveraged by making use of specific data mining techniques in order to harness the full power of this new information.
[1] Rudnick, P. A. et al. Performance metrics for liquid chromatography-tandem mass spectrometry systems in proteomics analyses. Molecular & Cellular Proteomics 9, 225–241 (2010).
[2] Wang, X. et al. QC metrics from CPTAC raw LC-MS/MS data interpreted through multivariate statistics. Analytical Chemistry 86, 2497–2509 (2014).
[3] Aksehirli, E., Goethals, B., Müller, E. & Vreeken, J. Cartification: A neighborhood preserving transformation for mining high dimensional data. In Thirteenth IEEE International Conference on Data Mining - ICDM ’13, 937–942 (IEEE, 2013). doi:10.1109/ICDM.2013.146
[4] Walzer, M. et al. qcML: An exchange format for quality control metrics from mass spectrometry experiments. Molecular & Cellular Proteomics (2014). doi:10.1074/mcp.M113.03590

ZENODO

FigShare

Effect of post-hatch transportation duration and parental age on broiler chicken quality, welfare, and productivity

Author: Ampe Bart
Delezie Evelyne
Duchateau Luc
Gellynck Xavier
Goethals Klara
Jacobs Leonie
Lambrecht Evelien
Tuyttens Frank
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2016
Field of study

Broiler chicks are transported to production sites within one to 2 d post-hatch. Possible effects of this transportation are poorly understood and could vary among chicks from breeder flocks of different ages. The aim of the present study was to investigate the effects of transportation duration and parental flock age on chick welfare, productivity, and quality. After hatch in a commercial hatchery, 1,620 mixed-sex chicks from 29-wk old (young) and 1,620 chicks from 60-wk old (old) breeders were subjected to transportation of 1.5 h or 11 h duration. After transportation, 2,800 chicks were divided among 100 pens, with each pen containing 28 chicks from one transportation crate (2 or 3 pens per crate). From the remaining chicks, on average 6 chicks (min 4, max 8) per crate (n = 228) were randomly selected and assessed for chick quality, weighed, and culled for yolk sac weighing (one d). Chicks that had not been assigned to pens or were not used for post-transportation measurements, were removed from the experiment (n = 212). Mortality, ADG, BW, and feed conversion (FC) of the experimental chicks were recorded until 41 d. Meat quality was measured for breast fillets (n = 47). No interaction effect of parental age and transportation duration was found for any variables. BW and yolk sac weight at one d were lower for chicks transported 11 h than 1.5 h and for chicks from young versus old breeders. The effect of parental flock age on BW persisted until slaughter. Additionally, parental age positively affected ADG until slaughter. Chick quality was lower in chicks from old versus young breeders. Chick quality and productivity were not affected by transportation duration. Mortality and meat quality were not affected by either parental age or transportation duration. To conclude, no long-term detrimental effects were found from long post-hatch transportation in chicks from young or old parent flocks. 
Based on these results, we suggest that 11 h post-hatch transportations under similar conditions do not impose long-term welfare or productivity risks.

Ghent University Academic Bibliography

PubMed Central