80 research outputs found

    A novel Boolean kernels family for categorical data

    Get PDF
    Kernel based classifiers, such as SVM, are considered state-of-the-art algorithms and are widely used on many classification tasks. However, this kind of methods are hardly interpretable and for this reason they are often considered as black-box models. In this paper, we propose a new family of Boolean kernels for categorical data where features correspond to propositional formulas applied to the input variables. The idea is to create human-readable features to ease the extraction of interpretation rules directly from the embedding space. Experiments on artificial and benchmark datasets show the effectiveness of the proposed family of kernels with respect to established ones, such as RBF, in terms of classification accuracy

    Definition and learning of logic-based kernels for categorical data, and application to collaborative filtering

    Get PDF
    The continuous pursuit of better prediction quality has gradually led to the development of increasingly complex machine learning models, e.g., deep neural networks. Despite the great success in many domains, the black-box nature of these models makes them not suitable for applications in which the model understanding is at least as important as the prediction accuracy, such as medical applications. On the other hand, more interpretable models, as decision trees, are in general much less accurate. In this thesis, we try to merge the positive aspects of these two realities, by injecting interpretable elements inside complex methods. We focus on kernel methods which have an elegant framework that decouples learning algorithms from data representations. In particular, the first main contribution of this thesis is the proposal of a new family of Boolean kernels, i.e., kernels defined on binary data, with the aim of creating interpretable feature spaces. Assuming binary input vectors, the core idea is to build embedding spaces in which the dimensions represent logical formulas (of a specific form) of the input variables. As a result the solution of a kernel machine can be represented as a weighted sum of logical propositions, and this allows to extract from it human-readable rules. Our framework provides a constructive and efficient way to calculate Boolean kernels of different forms (e.g., disjunctive, conjunctive, DNF, CNF). We show that on binary classification tasks over categorical datasets the proposed kernels achieve state-of-the-art performances. We also provide some theoretical properties about the expressiveness of such kernels. The second main contribution consists in the development of a new multiple kernel learning algorithm to automatically learn the best representation (avoiding the validation). We start from a theoretical result which states that, under mild conditions, any dot-product kernel can be seen as a linear non-negative combination of Boolean conjunctive kernels. Then, from this combination, our MKL algorithm learns non-parametrically the best combination of the conjunctive kernels. This algorithm is designed to optimize the radius-margin ratio of the combined kernel, which has been demonstrated of being an upper bound of the Leave-One-Out error. An extensive empirical evaluation, on several binary classification tasks, shows how our MKL technique is able to outperform state-of-the-art MKL approaches. A third contribution is the proposal of another kernel family for binary input data, which aims to overcome the limitations of the Boolean kernels. In this case the focus is not exclusively on the interpretability, but also on the expressivity. With this new framework, that we dubbed propositional kernel framework, is possible to build kernel functions able to create feature spaces containing almost any kind of logical propositions. Finally, the last contribution is the application of the Boolean kernels to Recommender Systems, specifically, on top-N recommendation tasks. First of all, we propose a novel kernel-based collaborative filtering method and we apply on top of it our Boolean kernels. Empirical results on several collaborative filtering datasets show how less expressive kernels can alleviate the sparsity issue, which is peculiar in this kind of applications

    FCAIR 2012 Formal Concept Analysis Meets Information Retrieval Workshop co-located with the 35th European Conference on Information Retrieval (ECIR 2013) March 24, 2013, Moscow, Russia

    Get PDF
    International audienceFormal Concept Analysis (FCA) is a mathematically well-founded theory aimed at data analysis and classifiation. The area came into being in the early 1980s and has since then spawned over 10000 scientific publications and a variety of practically deployed tools. FCA allows one to build from a data table with objects in rows and attributes in columns a taxonomic data structure called concept lattice, which can be used for many purposes, especially for Knowledge Discovery and Information Retrieval. The Formal Concept Analysis Meets Information Retrieval (FCAIR) workshop collocated with the 35th European Conference on Information Retrieval (ECIR 2013) was intended, on the one hand, to attract researchers from FCA community to a broad discussion of FCA-based research on information retrieval, and, on the other hand, to promote ideas, models, and methods of FCA in the community of Information Retrieval

    Latent Semantic Indexing (LSI) Based Distributed System and Search On Encrypted Data

    Get PDF
    Latent semantic indexing (LSI) was initially introduced to overcome the issues of synonymy and polysemy of the traditional vector space model (VSM). LSI, however, has challenges of its own, mainly scalability. Despite being introduced in 1990, there are few attempts that provide an efficient solution for LSI, most of the literature is focuses on LSI’s applications rather than improving the original algorithm. In this work we analyze the first framework to provide scalable implementation of LSI and report its performance on the distributed environment of RAAD. The possibility of adopting LSI in the field of searching over encrypted data is also investigated. The importance of that field is stemmed from the need for cloud computing as an effective computing paradigm that provides an affordable access to high computational power. Encryption is usually applied to prevent unauthorized access to the data (the host is assumed to be curious), however this limits accessibility to the data given that search over encryption is yet to catch with the latest techniques adopted by the Information Retrieval (IR) community. In this work we propose a system that uses LSI for indexing and free-query text for retrieving. The results show that the available LSI framework does scale on large datasets, however it had some limitations with respect to factors like dictionary size and memory limit. When replicating the exact settings of the baseline on RAAD, it performed relatively slower. This could be resulted by the fact that RAAD uses a distributed file system or because of network latency. The results also show that the proposed system for applying LSI on encrypted data retrieved documents in the same order as the baseline (unencrypted data)

    Paving the Way for Lifelong Learning:Facilitating competence development through a learning path specification

    Get PDF
    Janssen, J. (2010). Paving the Way for Lifelong Learning. Facilitating competence development through a learning path specification. September, 17, 2010, Heerlen, The Netherlands: Open University of the Netherlands, CELSTEC. SIKS Dissertation Series No. 2010-36. ISBN 978-90-79447-43-5Efficient and effective lifelong learning requires that learners can make well informed decisions regarding the selection of a learning path, i.e. a set of learning actions that help attain particular learning goals. In recent decades a strong emphasis on lifelong learning has led educational provision to expand and to become more varied and flexible. Besides, the role of informal learning has become increasingly acknowledged. In light of these developments this thesis addresses the question: How to support learners in finding their way through all available options and selecting a learning path that best fit their needs? The thesis describes two different approaches regarding the provision of way finding support, which can be considered complementary. The first, inductive approach proposes to provide recommendations based on indirect social interaction: analysing the paths followed by other learners and feeding this information back as advice to learners facing navigational decisions. The second, prescriptive approach proposes to use a learning path specification to describe both the contents and the structure of any learning path in a formal and uniform way. This facilitates comparison and selection of learning paths across institutions and systems, but also enables automated provision of way finding support for a chosen learning path. Moreover, it facilitates automated personalisation of a learning path, i.e. adapting the learning path to the needs of a particular learner. Following the first approach a recommender system was developed and tested in an experimental setting. Results showed use of the system significantly enhanced effectiveness of learning. In line with the second approach a learning path specification was developed and validated in three successive evaluations. Firstly, an investigation of lifelong learners’ information needs. Secondly, an evaluation of the specification through a reference (sample) implementation: a tool to describe learning paths according to the specification. Finally, an evaluation of the use and purpose of this tool involving prospective end-users: study advisors and learning designers. Following the various evaluations the Learning Path Specification underwent some changes over time. Results described in this thesis show that the proposed approach of the Learning Path Specification and the reference implementation were well received by end-users.The work on this publication has been sponsored by the TENCompetence Integrated Project that is funded by the European Commission's 6th Framework Programme, priority IST/Technology Enhanced Learning. Contract 027087 [http://www.tencompetence.org
    • …
    corecore