Revisiting Numerical Pattern Mining with Formal Concept Analysis
In this paper, we investigate the problem of mining numerical data in the
framework of Formal Concept Analysis. The usual approach is to apply a scaling
procedure, transforming numerical attributes into binary ones, which leads to
a loss of either information or efficiency, in particular w.r.t. the volume of
extracted patterns. By contrast, we propose to work directly on numerical data
in a more precise and efficient way, and we prove it. For that, the notions of
closed patterns, generators and equivalence classes are revisited in the
numerical context. Moreover, two original algorithms are proposed and used in
an evaluation involving real-world data, showing the advantages of the
present approach.
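The core notions the abstract refers to can be illustrated with a minimal sketch (illustrative code, not the paper's algorithms): an interval pattern assigns one interval per numerical attribute, and the description of a set of objects is the componentwise convex hull of their values.

```python
# Minimal sketch of interval patterns (illustrative, not the paper's
# algorithms): a pattern is one interval per numerical attribute, and
# the meet of two patterns is the componentwise convex hull.

def meet(p, q):
    """Meet of two interval patterns: smallest interval covering both."""
    return [(min(a1, b1), max(a2, b2)) for (a1, a2), (b1, b2) in zip(p, q)]

def describe(rows):
    """Closed pattern of a set of numerical rows: per-attribute [min, max]."""
    pattern = [(v, v) for v in rows[0]]
    for row in rows[1:]:
        pattern = meet(pattern, [(v, v) for v in row])
    return pattern

rows = [(5, 7), (6, 8), (4, 8)]
print(describe(rows))  # [(4, 6), (7, 8)]
```

The closure operator induced by this meet is what makes closed interval patterns and their generators well defined in the numerical setting.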
Characterization of order-like dependencies with formal concept analysis
Functional Dependencies (FDs) play a key role in many fields
of the relational database model, one of the most widely used data models.
FDs have also been applied in data analysis, data quality, knowledge
discovery and the like, but within a very limited scope, because of their
fixed semantics. To overcome this limitation, many generalizations have
been defined to relax the crisp definition of FDs. FDs and a few of their
generalizations have been characterized with Formal Concept Analysis,
which reveals itself to be an interesting unified framework for
characterizing dependencies, that is, understanding and computing them in a
formal way. In this paper, we extend this work by taking into account
order-like dependencies. Such dependencies, well defined in the database
field, consider an ordering on the domain of each attribute, and not
simply an equality relation as with standard FDs.
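A simple instance of an order-like dependency can be sketched as follows (the attribute names and the exact check are illustrative, not taken from the paper): an attribute order-determines another when the ordering on the first is never contradicted by the ordering on the second.

```python
# Illustrative sketch of one simple order-like dependency (names and the
# exact variant are hypothetical, not from the paper): a ~> b holds when
# any row whose a-value is <= another's also has a b-value that is <=.

def order_dependency_holds(rows, a, b):
    """Check whether attribute a order-determines attribute b."""
    return all(r[b] <= s[b] for r in rows for s in rows if r[a] <= s[a])

table = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
print(order_dependency_holds(table, "a", "b"))  # True
table.append({"a": 4, "b": 5})
print(order_dependency_holds(table, "a", "b"))  # False
```

With an equality relation in place of <=, the same check degenerates to a standard FD, which is exactly the generalization the abstract describes.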
Identifying Avatar Aliases in Starcraft 2
In electronic sports, cyberathletes conceal their online training using
different avatars (virtual identities), which allow them to avoid being
recognized by the opponents they may face in future competitions. In this
article, we propose a method to tackle this avatar alias identification
problem. Our method trains a classifier on behavioural data and processes the
confusion matrix to output label pairs which concentrate confusion. We
experimented with Starcraft 2 and report our first results.
Comment: Machine Learning and Data Mining for Sports Analytics ECML/PKDD 2015
workshop, 11 September 2015, Porto, Portugal.
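The confusion-matrix step can be sketched roughly as follows (an illustrative reconstruction, not the authors' code): symmetrize the off-diagonal confusion counts and rank avatar label pairs by them.

```python
# Illustrative reconstruction of the confusion-based pairing idea (not
# the authors' code): pairs of labels that the classifier confuses in
# both directions are candidate aliases of the same player.

def confused_pairs(cm, labels, top=3):
    """Rank label pairs by symmetrized off-diagonal confusion counts."""
    n = len(labels)
    pairs = [(cm[i][j] + cm[j][i], labels[i], labels[j])
             for i in range(n) for j in range(i + 1, n)]
    return [(a, b) for _, a, b in sorted(pairs, reverse=True)[:top]]

cm = [[50,  1, 30],   # rows: true label, columns: predicted label
      [ 2, 60,  1],
      [25,  0, 40]]
print(confused_pairs(cm, ["A", "B", "C"], top=1))  # [('A', 'C')]
```

Here avatars A and C concentrate most of the confusion, so they would be proposed as a candidate alias pair.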
The Coron System
Coron is a domain- and platform-independent, multi-purpose data mining
toolkit, which incorporates not only a rich collection of data mining
algorithms but also supports a number of auxiliary operations. To the best of
our knowledge, a data mining toolkit designed specifically for itemset
extraction and association rule generation like Coron does not exist elsewhere.
Coron also provides support for preparing and filtering data, and for
interpreting the extracted units of knowledge.
Anytime Subgroup Discovery in Numerical Domains with Guarantees
Subgroup discovery is the task of discovering patterns that accurately discriminate a class label from the others. Existing approaches can uncover such patterns either through an exhaustive or an approximate exploration of the pattern search space. However, an exhaustive exploration is generally unfeasible, whereas approximate approaches provide no guarantee bounding the error on the best pattern quality, nor on the exploration progress ("how far are we from an exhaustive search?"). We design here an algorithm for mining numerical data with three key properties w.r.t. the state of the art: (i) it progressively yields interval patterns whose quality improves over time; (ii) it can be interrupted anytime and always gives a guarantee bounding the error on the top pattern quality; and (iii) it always bounds the distance to an exhaustive exploration. After reporting experiments showing the effectiveness of our method, we discuss its generalization to other kinds of patterns.
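For background, a standard quality measure such subgroups are scored with is weighted relative accuracy (WRAcc); the sketch below shows the objective only, not the paper's anytime algorithm or its bounds.

```python
# Weighted relative accuracy (WRAcc), a standard subgroup-discovery
# quality measure; a background sketch of the objective such anytime
# miners bound, not the paper's algorithm.

def wracc(n_subgroup, n_subgroup_pos, n_total, n_total_pos):
    """WRAcc = coverage * (subgroup precision - global positive rate)."""
    coverage = n_subgroup / n_total
    return coverage * (n_subgroup_pos / n_subgroup - n_total_pos / n_total)

# A subgroup covering 20 of 100 rows, 15 of them positive,
# against a global positive rate of 30/100:
print(round(wracc(20, 15, 100, 30), 3))  # 0.09
```

An anytime guarantee in this setting bounds how far the best WRAcc found so far can be from the optimum an exhaustive search would reach.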
Mining Biclusters of Similar Values with Triadic Concept Analysis
Biclustering numerical data became a popular data-mining task in the
early 2000s, especially for analysing gene expression data. A bicluster
reflects a strong association between a subset of objects and a subset of
attributes in a numerical object/attribute data table. So-called biclusters of
similar values can be thought of as maximal sub-tables with close values. Only
a few methods address a complete, correct and non-redundant enumeration of such
patterns, which is a well-known intractable problem, and no formal framework
exists. In this paper, we establish important links between biclustering and
Formal Concept Analysis. More specifically, we show for the first time that
Triadic Concept Analysis (TCA) provides a well-suited mathematical framework
for biclustering. Interestingly, existing TCA algorithms, which usually apply
to binary data, can be used (directly or with slight modifications) after a
preprocessing step to extract maximal biclusters of similar values.
Comment: Concept Lattices and their Applications (CLA), 2011.
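The notion of a bicluster of similar values can be sketched directly (an illustrative membership check, not the TCA-based extraction): a sub-table qualifies when all its values lie within a tolerance of each other.

```python
# Illustrative check of the "bicluster of similar values" notion: a
# sub-table (rows x cols) whose values all lie within a tolerance theta
# of one another. This is only the membership test, not the TCA-based
# enumeration of maximal such sub-tables.

def is_similar_bicluster(table, rows, cols, theta):
    values = [table[r][c] for r in rows for c in cols]
    return max(values) - min(values) <= theta

table = [[1, 9, 2],
         [2, 8, 1],
         [9, 9, 9]]
print(is_similar_bicluster(table, [0, 1], [0, 2], theta=1))     # True
print(is_similar_bicluster(table, [0, 1, 2], [0, 2], theta=1))  # False
```

The hard problem the abstract addresses is enumerating all maximal sub-tables passing this test completely and without redundancy.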
On-Premise AIOps Infrastructure for a Software Editor SME: An Experience Report
Information Technology has become a critical component in various industries,
leading to an increased focus on software maintenance and monitoring. With the
complexities of modern software systems, traditional maintenance approaches
have become insufficient. The concept of AIOps has emerged to enhance
predictive maintenance using Big Data and Machine Learning capabilities.
However, exploiting AIOps requires addressing several challenges related to the
complexity of data and incident management. Commercial solutions exist, but
they may not be suitable for certain companies due to high costs, data
governance issues, and limitations in covering private software. This paper
investigates the feasibility of implementing on-premise AIOps solutions by
leveraging open-source tools. We introduce a comprehensive AIOps infrastructure
that we have successfully deployed in our company, and we provide the rationale
behind different choices that we made to build its various components.
In particular, we provide insights into our approach and criteria for selecting
a data management system, and we explain its integration. Our experience can be
beneficial for companies seeking to internally manage their software
maintenance processes with a modern AIOps approach.
AIOps Solutions for Incident Management: Technical Guidelines and A Comprehensive Literature Review
The management of modern IT systems poses unique challenges, necessitating
scalability, reliability, and efficiency in handling extensive data streams.
Traditional methods, reliant on manual tasks and rule-based approaches, prove
inefficient for the substantial data volumes and alerts generated by IT
systems. Artificial Intelligence for IT Operations (AIOps) has emerged as a
solution, leveraging advanced analytics such as machine learning and big data to
enhance incident management. AIOps detects and predicts incidents, identifies
root causes, and automates healing actions, improving quality and reducing
operational costs. However, despite its potential, the AIOps domain is still in
its early stages, decentralized across multiple sectors, and lacking
standardized conventions. Research and industrial contributions are distributed
without consistent frameworks for data management, target problems,
implementation details, requirements, and capabilities. This study proposes an
AIOps terminology and taxonomy, establishing a structured incident management
procedure and providing guidelines for constructing an AIOps framework. The
research also categorizes contributions based on criteria such as incident
management tasks, application areas, data sources, and technical approaches.
The goal is to provide a comprehensive review of technical and research aspects
of AIOps for incident management, aiming to structure knowledge, identify gaps,
and establish a foundation for future developments in the field.
Computing Functional Dependencies with Pattern Structures
The treatment of many-valued data with FCA has been achieved by means of scaling. This method has some drawbacks, since the size of the resulting formal contexts usually depends on the number of different values present in a table, which can be very large.
Pattern structures have been shown to handle many-valued data, offering a viable and sound alternative to scaling for representing and analyzing sets of many-valued data with FCA.
Functional dependencies have already been handled with FCA using the binarization of a table, that is, creating a formal context out of a set of data. Unfortunately, although this method is standard and simple, it has an important drawback: the resulting context is
quadratic in the number of objects w.r.t. the original set of data.
In this paper, we examine how the functional dependencies that hold in a set of data can be extracted using pattern structures. This allows building an equivalent concept lattice while avoiding the binarization step, and thus yields better concept representation and computation.
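For background, whether a single FD holds can be checked by partitioning rows on the left-hand side (an illustrative check; the paper's contribution is computing all FDs via pattern structures, without binarization).

```python
# Background sketch: a functional dependency X -> Y holds when rows
# agreeing on X also agree on Y (illustrative check only; the paper
# computes all such FDs via pattern structures).

from collections import defaultdict

def fd_holds(rows, x, y):
    """Check X -> Y: each X-value group must map to a single Y value."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[a] for a in x)].add(tuple(row[a] for a in y))
    return all(len(ys) == 1 for ys in groups.values())

rows = [{"city": "Lyon", "zip": 69000}, {"city": "Lyon", "zip": 69000},
        {"city": "Nancy", "zip": 54000}]
print(fd_holds(rows, ["city"], ["zip"]))  # True
rows.append({"city": "Lyon", "zip": 69001})
print(fd_holds(rows, ["city"], ["zip"]))  # False
```

The naive binarization mentioned above builds a context over all pairs of rows, hence the quadratic blow-up the pattern-structure approach avoids.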
SeqScout: Using a Bandit Model to Discover Interesting Subgroups in Labeled Sequences
It is extremely useful to exploit labeled datasets not only to learn models but also to improve our understanding of a domain and its available target classes. The so-called subgroup discovery task has been considered for a long time. It concerns the discovery of patterns or descriptions whose sets of supporting objects have interesting properties, e.g., they characterize or discriminate a given target class. Though many subgroup discovery algorithms have been proposed for transactional data, discovering subgroups within labeled sequential data, and thus searching for descriptions as sequential patterns, has been much less studied. In that context, exhaustive exploration strategies cannot be used for real-life applications and we have to look for heuristic approaches. We propose the algorithm SeqScout to discover interesting subgroups (w.r.t. a chosen quality measure) from labeled sequences of itemsets. It is a new sampling algorithm that mines discriminant sequential patterns using a multi-armed bandit model. It is an anytime algorithm that, for a given budget, finds a collection of local optima in the search space of descriptions, and thus subgroups. It requires light configuration and is independent of the quality measure used for pattern scoring. Furthermore, it is fairly simple to implement. We provide qualitative and quantitative experiments on several datasets to illustrate its added value.
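The multi-armed bandit ingredient can be sketched with a plain UCB1 arm selection (illustrative only; SeqScout's actual sampling and generalization steps are not reproduced here).

```python
# Illustrative UCB1 arm selection, the classic multi-armed bandit rule;
# SeqScout's actual sampling/generalization steps are not reproduced.

import math

def ucb1(counts, means, t):
    """Pick the arm maximizing empirical mean plus exploration bonus."""
    return max(range(len(counts)),
               key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))

# Toy run: three candidate patterns with fixed (hidden) qualities.
quality = [0.2, 0.8, 0.5]
counts, means = [1, 1, 1], quality[:]  # one initial evaluation per arm
for t in range(2, 100):
    arm = ucb1(counts, means, t)
    reward = quality[arm]  # deterministic stand-in for a sampled quality
    means[arm] += (reward - means[arm]) / (counts[arm] + 1)
    counts[arm] += 1
best = max(range(3), key=lambda i: means[i])
print(best)  # 1: the highest-quality arm is identified
```

In SeqScout the arms are candidate sequential patterns and the reward is the chosen quality measure on a sampled generalization, which is what makes the search anytime and measure-agnostic.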