13 research outputs found

Mining local staircase patterns in noisy data

Author: De Raedt Luc
Fierro Ana Carolina
Guns Tias
International workshop on Co-Clustering and Applications
Le Van Thanh
Marchal Kathleen
Nijssen Siegfried
van Leeuwen Matthijs
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

Most traditional biclustering algorithms identify biclusters with little or no overlap. In this paper, we introduce the problem of identifying staircases of biclusters. Such staircases may be indicative of causal relationships between columns and cannot easily be identified by existing biclustering algorithms. Our formalization relies on a scoring function based on the Minimum Description Length principle. Furthermore, we propose a first algorithm for identifying staircase biclusters, based on a combination of local search and constraint programming. Experiments show that the approach is promising.

Crossref

Ghent University Academic Bibliography

DIAL UCLouvain

Finding banded patterns in large data set using segmentation

Author: Abdullahi F.B.
Coenen F.
Publication venue: 'African Journals Online (AJOL)'
Publication date: 15/12/2021

No abstract

AJOL - African Journals Online

University of Liverpool Repository

Finding Banded Patterns in Data: The Banded Pattern Mining Algorithm

Author: Abdullahi Fatimah B
Coenen Frans
Martin Russell
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

University of Liverpool Repository

Crossref

What you will gain by rounding : theory and algorithms for rounding rank

Author: Gemulla Rainer
Miettinen Pauli
Neumann Stefan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

When factorizing binary matrices, we often have to make a choice between using expensive combinatorial methods that retain the discrete nature of the data and using continuous methods that can be more efficient but destroy the discrete structure. Alternatively, we can first compute a continuous factorization and subsequently apply a rounding procedure to obtain a discrete representation. But what will we gain by rounding? Will this yield lower reconstruction errors? Is it easy to find a low-rank matrix that rounds to a given binary matrix? Does it matter which threshold we use for rounding? Does it matter if we allow for only non-negative factorizations? In this paper, we approach these and further questions by presenting and studying the concept of rounding rank. We show that rounding rank is related to linear classification, dimensionality reduction, and nested matrices. We also report on an extensive experimental study that compares different algorithms for finding good factorizations under the rounding rank model.
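
As a rough illustration of the "factorize, then round" idea in this abstract, the sketch below rounds a rank-1 real matrix entrywise and counts disagreements with a binary target. The function names, the example vectors, and the convention that entries at or above the threshold become 1 are assumptions made for illustration; they are not taken from the paper.

```python
def round_low_rank(u, v, threshold=0.5):
    """Round the rank-1 matrix u v^T entrywise.

    Assumed convention: an entry becomes 1 when u[i] * v[j] >= threshold,
    and 0 otherwise.
    """
    return [[1 if ui * vj >= threshold else 0 for vj in v] for ui in u]


def reconstruction_error(target, approx):
    """Count the entries where the rounded matrix differs from the target."""
    return sum(
        t != a
        for target_row, approx_row in zip(target, approx)
        for t, a in zip(target_row, approx_row)
    )


# Hypothetical example data: a 3x3 binary matrix and two real vectors.
target = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
]
u = [1.0, 2.0, 4.0]
v = [0.1, 0.3, 0.6]

rounded = round_low_rank(u, v)
print(rounded)
print(reconstruction_error(target, rounded))
```

Changing `threshold` changes which binary matrix the same real factorization rounds to, which is one way to read the abstract's question about whether the choice of threshold matters.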

arXiv.org e-Print Archive

Crossref

MAnnheim DOCument Server

MPG.PuRe

Banded structure in binary matrices

Author: Garriga Gemma
Junttila Esa
Mannila Heikki
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/07/2011

A binary matrix has a banded structure if both rows and columns can be permuted so that the non-zero entries exhibit a staircase pattern of overlapping rows. The concept of banded matrices has its origins in numerical analysis, where entries can be viewed as descriptions between the problem variables; the bandedness corresponds to variables that are coupled over short distances. Banded data occurs also in other applications, for example in the physical mapping problem of the human genome, in paleontological data, in network data and in the discovery of overlapping communities without cycles. We study the banded structure of binary matrices, give a formal definition of the concept and discuss its theoretical properties. We consider the algorithmic problems of computing how far a matrix is from being banded, and of finding a good submatrix of the original data that exhibits approximate bandedness. Finally, we show by experiments on real data from ecology and other applications the usefulness of the concept. Our results reveal that bands exist in real datasets and that the final obtained orderings of rows and columns have natural interpretations.
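
The staircase pattern described in this abstract can be sketched as a simple check on a matrix in a fixed row and column order. This is a simplified reading, not the paper's formal definition: it assumes that each non-empty row must hold its ones in one consecutive run and that the start and end columns of those runs must not move left from one row to the next. It does not search over permutations, does not require consecutive rows to overlap, and skips all-zero rows; the function name `is_banded` is hypothetical.

```python
def is_banded(matrix):
    """Check a staircase pattern for the given row and column order.

    Assumptions of this sketch: ones in each row are consecutive, run starts
    and run ends are non-decreasing from top to bottom, and all-zero rows
    are skipped rather than rejected.
    """
    prev_start, prev_end = -1, -1
    for row in matrix:
        ones = [j for j, value in enumerate(row) if value]
        if not ones:
            continue  # assumption: an empty row does not break the band
        start, end = ones[0], ones[-1]
        if end - start + 1 != len(ones):
            return False  # gap inside the run of ones
        if start < prev_start or end < prev_end:
            return False  # the staircase steps back to the left
        prev_start, prev_end = start, end
    return True


staircase = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
gapped = [
    [1, 0, 1],
    [0, 1, 0],
]
print(is_banded(staircase))
print(is_banded(gapped))
```

Deciding whether some permutation of rows and columns produces such a pattern is the harder problem the abstract refers to; this check only validates one given ordering.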

HAL - Lille 3

INRIA a CCSD electronic archive server

Extending data mining techniques for frequent pattern discovery : trees, low-entropy sets, and crossmining

Author: Heikinheimo Hannes
Publication venue: Aalto-yliopiston teknillinen korkeakoulu
Publication date: 01/01/2010

The idea of frequent pattern discovery is to find frequently occurring events in large databases. Such data mining techniques can be useful in various domains. For instance, in recommendation and e-commerce systems frequently occurring product purchase combinations are essential in user preference modeling. In the ecological domain, patterns of frequently occurring groups of species can be used to reveal insight into species interaction dynamics. Over the past few years, most frequent pattern mining research has concentrated on efficiency (speed) of mining algorithms. However, it has been argued within the community that while efficiency of the mining task is no longer a bottleneck, there is still an urgent need for methods that derive compact, yet high quality results with good application properties. The aim of this thesis is to address this need. The first part of the thesis discusses a new type of tree pattern class for expressing hierarchies of general and more specific attributes in unstructured binary data. The new pattern class is shown to have advantageous properties, and to discover relationships in data that cannot be expressed alone with the more traditional frequent itemset or association rule patterns. The second and third parts of the thesis discuss the use of entropy as a score measure for frequent pattern mining. A new pattern class is defined, low-entropy sets, which allows more general types of occurrence structure to be expressed than with frequent itemsets. The concept can also be easily applied to tree types of pattern. Furthermore, by applying minimum description length in pattern selection for low-entropy sets it is shown experimentally that in most cases the collections of selected patterns are much smaller than by using frequent itemsets. The fourth part of the thesis examines the idea of crossmining itemsets, that is, relating itemsets to numerical variables in a database of mixed data types. The problem is formally defined and turns out to be NP-hard, although it is approximately solvable within a constant factor of the optimum solution. Experiments show that the algorithm finds itemsets that convey structure in both the binary and the numerical part of the data.

Aaltodoc Publication Archive

Complex systems in financial economics: Applications to interbank and stock markets

Author: in 't Veld D.L.
Publication venue
Publication date: 01/01/2014

Complex systems are characterised by strong interaction at the micro level that can induce large changes at the macro level. This thesis applies the theory of complex systems to the interbank market (Part I) and the stock market (Part II). Evidence found in data from the Netherlands and the US makes clear in what sense these markets are complex systems. The observed phenomena are explained by modelling the adaptive behaviour of financial agents, for example how they form their trading relationships or how they choose investment strategies. The applications help to understand the mechanisms behind the emergence of the financial-economic crisis in 2007 and 2008, and relate to the debate on policy measures aiming to prevent a future crisis of this kind.

International Migration, Integration and Social Cohesion online publications

UvA-DARE