17 research outputs found
Knowledge discovery through creating formal contexts
Knowledge discovery is important for systems that have computational intelligence in helping them learn and adapt to changing environments. By representing, in a formal way, the context in which an intelligent system operates, it is possible to discover knowledge through an emerging data technology called formal concept analysis (FCA). This paper describes a tool called FcaBedrock that converts data into formal contexts for FCA. The paper describes how, through a process of guided automation, data preparation techniques such as attribute exclusion and value restriction allow data to be interpreted to meet the requirements of the analysis. Examples are given of how formal contexts can be created using FcaBedrock and then analysed for knowledge discovery, using real datasets. Creating formal contexts using FcaBedrock is shown to be straightforward and versatile. Large datasets are easily converted into a standard FCA format.
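The data preparation techniques named in the abstract (attribute exclusion and value restriction) can be illustrated with a small sketch. This is a minimal illustration of the general idea only; the function and names are hypothetical and do not reflect FcaBedrock's actual interface:

```python
# Sketch: turn tabular records into a formal context (objects mapped to
# boolean attributes), with attribute exclusion and value restriction.
# Names are hypothetical, for illustration only.
def make_context(records, exclude=(), restrict=None):
    """records: list of dicts; exclude: attribute names to drop;
    restrict: {attribute: allowed values} -- other rows are skipped."""
    restrict = restrict or {}
    context = {}
    for i, row in enumerate(records):
        # Value restriction: drop objects whose value falls outside the allowed set.
        if any(row.get(a) not in allowed for a, allowed in restrict.items()):
            continue
        # Attribute exclusion: excluded columns never become formal attributes.
        attrs = {f"{a}={v}" for a, v in row.items() if a not in exclude}
        context[f"obj{i}"] = attrs
    return context

records = [
    {"colour": "red", "size": "small"},
    {"colour": "blue", "size": "large"},
    {"colour": "red", "size": "large"},
]
ctx = make_context(records, exclude=("size",), restrict={"colour": {"red"}})
# ctx keeps only the two red objects, each with the single attribute "colour=red"
```

Excluding an attribute shrinks the intent of every object, while restricting values shrinks the set of objects; both reduce the eventual concept lattice to what the analysis actually needs.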
A conceptual approach to gene expression analysis enhanced by visual analytics
The analysis of gene expression data is a complex task for biologists wishing to understand the role of genes in the formation of diseases such as cancer. Biologists need greater support when trying to discover, and comprehend, new relationships within their data. In this paper, we describe an approach to the analysis of gene expression data where overlapping groupings are generated by Formal Concept Analysis and interactively analyzed in a tool called CUBIST. The CUBIST workflow involves querying a semantic database and converting the result into a formal context, which can be simplified to make it manageable, before it is visualized as a concept lattice and associated charts.
Exploring the applicability of formal concept analysis on Market Intelligence data
This paper examines and identifies issues associated with the applicability of FCA on sample data provided by a CUBIST use-case partner. The paper explains the various steps related to the transformation of these data to formal contexts, such as preprocessing, cleansing and simplification, as well as preprocessing and limitation issues, by using two FCA tools currently being developed in CUBIST, FcaBedrock and InClose. The paper demonstrates what is achievable to date, using the above-mentioned tools, and what issues need to be considered to achieve more meaningful and intuitive FCA analyses. The paper concludes by suggesting and explaining techniques and features that should be implemented in later iterations of these tools, to deal with the identified barriers. This work has been carried out as a part of the European
CUBIST FP7 Project: http://www.cubist-project.e
Appropriating Data from Structured Sources for Formal Concept Analysis
Formal Concept Analysis (FCA) is a principled way of deriving a concept hierarchy from a collection of objects and their associated attributes, building on the mathematical theory of lattices and ordered sets.
To conduct FCA, some appropriation steps need to be taken on a set of data as a prerequisite. Firstly, the data need to be acquired from a data source and decisions need to be made about how the data will be analyzed, such as
which objects will be included in the analysis, and how each attribute should be interpreted. They then need to be transformed into a formal context, which can then be visualized as a formal concept lattice.
Transforming a formal context into its constituent formal concepts is a process which is well defined and well understood in literature. The same holds true for converting formal concepts into a formal concept lattice.
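The well-defined step from formal context to formal concepts can be sketched directly from the derivation operators: a formal concept is a pair (extent, intent) where each determines the other. A naive, exponential-time sketch (the toy context is invented for illustration; real tools such as InClose use far more efficient algorithms):

```python
# Naive enumeration of all formal concepts of a small context by closing
# every attribute subset. Illustrative only -- exponential in |M|.
from itertools import combinations

context = {
    "duck":  {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}

def extent(attrs):
    """Objects having every attribute in attrs."""
    return {o for o, a in context.items() if attrs <= a}

def intent(objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else set.union(*context.values())

all_attrs = set.union(*context.values())
concepts = set()
for r in range(len(all_attrs) + 1):
    for subset in combinations(sorted(all_attrs), r):
        objs = extent(set(subset))
        concepts.add((frozenset(objs), frozenset(intent(objs))))
# This 3x3 context yields 8 formal concepts, including the top concept
# (all objects, no shared attributes) and the bottom (no objects, all attributes).
```

Ordering these concepts by extent inclusion yields the formal concept lattice, the equally well-understood second step mentioned above.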
On the other hand, the process of appropriating a dataset into a formal context tends to be an ad-hoc and bespoke one. ToscanaJ can produce formal contexts from a relational database, while ConExp can produce simple formal
contexts in a manual fashion. In the CUBIST project, Dau developed a semi-automated, scalingless approach to generate formal contexts out of a triple store by concatenating the object-attribute pairs returned from the resulting
table into their corresponding formal attributes, while Orphanides developed an approach that also provided scaling capabilities, albeit again relying on triple store data. Cubix, the final prototype of the CUBIST project, incorporated the approaches of Dau and Orphanides in an interactive web frontend.
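The scaling capability attributed to Orphanides's approach can be illustrated with a sketch of ordinal scaling, where a many-valued attribute is replaced by boolean formal attributes. The function, attribute name and thresholds here are made up for illustration and are not taken from either CUBIST tool:

```python
# Sketch of ordinal scaling: one boolean formal attribute per threshold.
def scale_numeric(name, value, thresholds):
    """Return the boolean attributes satisfied by a numeric value."""
    return {f"{name}<={t}" for t in thresholds if value <= t}

attrs = scale_numeric("age", 35, thresholds=(18, 40, 65))
# attrs == {"age<=40", "age<=65"}
```

A scalingless approach, by contrast, would simply concatenate the raw pair into a single formal attribute such as "age=35", which fragments numeric data across many incomparable attributes.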
FcaBedrock is open-source software (OSS) developed as part of this study, employing a series of steps to appropriate data for FCA in a semi-automated, user-driven environment. To underpin this work, we take a case
study approach, using two case studies in particular: the UCI Machine Learning (ML) Repository—a dataset repository for the empirical analysis of machine learning algorithms—and the e-Mouse Atlas of Gene Expression (EMAGE), a
database of anatomical terms for each Theiler Stage (TS) in mouse development. We compare our approach with existing approaches, using datasets from the two case studies. The appropriation of the datasets becomes an integral part of
our evaluation, providing the real-life context and use-cases of our proposed approach.
The UCI ML Repository and EMAGE case studies revealed how, prior to this study, a multitude of existing data sources, types and formats were either not accessible, or not easily accessible, to FCA; the data appropriation processes were in most cases tedious and time-consuming, often requiring the
manual creation of formal contexts out of datasets. In other cases, rigid, non-flexible approaches were developed, with hardcoded assumptions made about the underlying use-case they were developed for. This is unlike the software
and techniques developed in this study, where the same semi-automated steps can consistently facilitate the appropriation of data for FCA, from the most common data sources and their underlying data types.
The aim of this study was to discover how effective each FCA approach is in appropriating data for FCA, answering the research question: “How can data from structured sources, consisting of various data types, be acquired
and appropriated for FCA in a semi-automated, user-driven environment?”. FcaBedrock emerged as the best appropriation approach, by abstracting the issue of structured sources away using the ubiquitous CSV (Comma Separated Value) file format as an input source and by providing both automated and semi-automated means of creating meaningful formal contexts from a dataset. Dau’s CUBIST scalingless approach, while semi-automated, was restricted to RDF triple stores and did not provide any flexibility as to how each attribute
of the dataset should be interpreted. Orphanides’s CUBIST scaleful approach, while providing more flexibility with its scaling capabilities, was again restricted to RDF triple stores. The CUBIST interactive approach improved upon those
ideas, by allowing the user to drive the analysis via a user-friendly web frontend. ToscanaJ was restricted to only using SQL/NoSQL databases as an input source, while both ToscanaJ and ConExp provided no substantial appropriation
techniques other than creating a formal context by hand.
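Since the text singles out CSV as the input format through which FcaBedrock abstracts away structured sources, a minimal sketch of that first acquisition step may help. The column names and file layout below are hypothetical, not taken from any actual dataset:

```python
# Sketch: read CSV rows into an object -> attribute-set mapping, the raw
# material for a formal context. Layout and column names are invented.
import csv
import io

raw = "name,colour,size\npebble,grey,small\nboulder,grey,large\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Each remaining column becomes a candidate "attribute=value" formal attribute.
objects = {
    r["name"]: {f"{k}={v}" for k, v in r.items() if k != "name"}
    for r in rows
}
```

From this mapping, the appropriation decisions described above (which objects to include, how to interpret each attribute) can be applied before the table is written out as a formal context.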
Discovering Knowledge in Data Using Formal Concept Analysis
Formal Concept Analysis (FCA) has been successfully applied to data in a number of problem domains. However, its use has tended to be on an ad hoc, bespoke basis, relying on FCA experts working closely with domain experts and requiring the production of specialised FCA software for the data analysis. The availability of generalised tools and techniques that might allow FCA to be applied to data more widely is limited. Two important issues provide barriers: raw data is not normally in a form suitable for FCA and requires undergoing a process of transformation to make it suitable, and even when converted into a suitable form for FCA, real data sets tend to produce a large number of results that can be difficult to manage and interpret.
This article describes how some open-source tools and techniques have been developed and used to address these issues and make FCA more widely available and applicable. Three examples of real data sets, and real problems related to them, are used to illustrate the application of the tools and techniques and demonstrate how FCA can be used as a semantic technology to discover knowledge. Furthermore, it is shown how these tools and techniques enable FCA to deliver a visual and intuitive means of mining large data sets for association and implication rules that complements the semantic analysis. In fact, it transpires that FCA
reveals hidden meaning in data that can then be examined in more detail using an FCA approach to traditional data mining methods.
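The implication rules mentioned above have a simple reading in a formal context: A implies B when every object carrying all attributes in A also carries all attributes in B. A tiny sketch of checking such a rule (the toy context is invented for illustration):

```python
# Sketch: check whether an implication premise -> conclusion holds in a
# formal context, i.e. every object with the premise also has the conclusion.
context = {
    "duck":  {"flies", "swims", "feathers"},
    "eagle": {"flies", "hunts", "feathers"},
    "bat":   {"flies"},
}

def holds(premise, conclusion):
    return all(conclusion <= attrs
               for attrs in context.values()
               if premise <= attrs)

holds({"feathers"}, {"flies"})  # True: every feathered object flies
holds({"flies"}, {"feathers"})  # False: the bat flies without feathers
```

Association rules relax this to a confidence threshold below 100%, which is where the complementarity with traditional data mining appears.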
FCAWarehouse, a prototype online data repository for FCA
This paper presents FCAWarehouse, a prototype online data repository for FCA. The paper explains the motivation behind the development of FCAWarehouse and the features available, such as the ability to donate datasets and their respective formal contexts, the ability to generate artificial formal contexts on-the-fly, and how these features are also available through a set of web services. The paper concludes by suggesting future work to enhance its usability.
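One feature the abstract highlights is generating artificial formal contexts on the fly. A hypothetical sketch of such a generator, not FCAWarehouse's actual service, is a random boolean incidence table with a chosen fill density:

```python
# Sketch: generate a random formal context as a boolean incidence table.
# Parameters and shape are hypothetical, for illustration only.
import random

def random_context(n_objects, n_attributes, density, seed=None):
    """Each cell is True with probability `density` (seedable for repeatability)."""
    rng = random.Random(seed)
    return [[rng.random() < density for _ in range(n_attributes)]
            for _ in range(n_objects)]

table = random_context(4, 3, density=0.5, seed=42)
```

Artificial contexts like this are useful for benchmarking FCA algorithms when no donated real-world dataset of the required size or density is available.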