10 research outputs found

    Towards Role Based Hypothesis Evaluation for Health Data Mining

    Get PDF
    Data mining researchers have long been concerned with the application of tools to facilitate and improve data analysis on large, complex data sets. The current challenge is to make data mining and knowledge discovery systems applicable to a wider range of domains, among them health. Early work was performed over transactional, retail based data sets, but the attraction of finding previously unknown knowledge from the ever increasing amounts of data collected from the health domain is an emerging area of interest and specialisation. The problem is finding a solution that is suitably flexible to allow for generalised application whilst being specific enough to provide functionality that caters for the nuances of each role within the domain. The need for a more granular approach to problem solving in other areas of information technology has resulted in the use of role based solutions. This paper discusses the progress to date in developing a role oriented solution to the problem of providing for the diverse requirements of health domain data miners and defining the foundation for determining what constitutes an interesting discovery in an area as complex as health

    Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method

    Get PDF
    BACKGROUND: Data mining can be utilized to automate analysis of substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), hence leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting ones. METHODS: The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying the pattern as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance. RESULTS: We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database and a biomedical literature citation knowledgebase. The system estimated association scores for 50,000 patterns, composed of disease entities and lab results, by querying the database and the knowledgebase. It then computed the surprise scores by comparing the pairs of association scores. Finally, the system estimated statistical significance of the scores. CONCLUSION: The dual-mining method eliminates more than 90% of patterns with strong associations, thus identifying them as uninteresting. We found that the pruning of patterns using the surprise score matched the biomedical evidence in the 100 cases that were examined by hand. The method automates the acquisition of knowledge, thus reducing dependence on the knowledge elicited from human expert, which is usually a rate-limiting step

    FAIRY: A Framework for Understanding Relationships between Users' Actions and their Social Feeds

    Full text link
    Users increasingly rely on social media feeds for consuming daily information. The items in a feed, such as news, questions, songs, etc., usually result from the complex interplay of a user's social contacts, her interests and her actions on the platform. The relationship of the user's own behavior and the received feed is often puzzling, and many users would like to have a clear explanation on why certain items were shown to them. Transparency and explainability are key concerns in the modern world of cognitive overload, filter bubbles, user tracking, and privacy risks. This paper presents FAIRY, a framework that systematically discovers, ranks, and explains relationships between users' actions and items in their social media feeds. We model the user's local neighborhood on the platform as an interaction graph, a form of heterogeneous information network constructed solely from information that is easily accessible to the concerned user. We posit that paths in this interaction graph connecting the user and her feed items can act as pertinent explanations for the user. These paths are scored with a learning-to-rank model that captures relevance and surprisal. User studies on two social platforms demonstrate the practical viability and user benefits of the FAIRY method.Comment: WSDM 201

    FAIRY: A Framework for Understanding Relationships between Users' Actions and their Social Feeds

    Get PDF
    Users increasingly rely on social media feeds for consuming daily information. The items in a feed, such as news, questions, songs, etc., usually result from the complex interplay of a user's social contacts, her interests and her actions on the platform. The relationship of the user's own behavior and the received feed is often puzzling, and many users would like to have a clear explanation on why certain items were shown to them. Transparency and explainability are key concerns in the modern world of cognitive overload, filter bubbles, user tracking, and privacy risks. This paper presents FAIRY, a framework that systematically discovers, ranks, and explains relationships between users' actions and items in their social media feeds. We model the user's local neighborhood on the platform as an interaction graph, a form of heterogeneous information network constructed solely from information that is easily accessible to the concerned user. We posit that paths in this interaction graph connecting the user and her feed items can act as pertinent explanations for the user. These paths are scored with a learning-to-rank model that captures relevance and surprisal. User studies on two social platforms demonstrate the practical viability and user benefits of the FAIRY method

    Enhancing explainability and scrutability of recommender systems

    Get PDF
    Our increasing reliance on complex algorithms for recommendations calls for models and methods for explainable, scrutable, and trustworthy AI. While explainability is required for understanding the relationships between model inputs and outputs, a scrutable system allows us to modify its behavior as desired. These properties help bridge the gap between our expectations and the algorithm’s behavior and accordingly boost our trust in AI. Aiming to cope with information overload, recommender systems play a crucial role in filtering content (such as products, news, songs, and movies) and shaping a personalized experience for their users. Consequently, there has been a growing demand from the information consumers to receive proper explanations for their personalized recommendations. These explanations aim at helping users understand why certain items are recommended to them and how their previous inputs to the system relate to the generation of such recommendations. Besides, in the event of receiving undesirable content, explanations could possibly contain valuable information as to how the system’s behavior can be modified accordingly. In this thesis, we present our contributions towards explainability and scrutability of recommender systems: • We introduce a user-centric framework, FAIRY, for discovering and ranking post-hoc explanations for the social feeds generated by black-box platforms. These explanations reveal relationships between users’ profiles and their feed items and are extracted from the local interaction graphs of users. FAIRY employs a learning-to-rank (LTR) method to score candidate explanations based on their relevance and surprisal. • We propose a method, PRINCE, to facilitate provider-side explainability in graph-based recommender systems that use personalized PageRank at their core. PRINCE explanations are comprehensible for users, because they present subsets of the user’s prior actions responsible for the received recommendations. PRINCE operates in a counterfactual setup and builds on a polynomial-time algorithm for finding the smallest counterfactual explanations. • We propose a human-in-the-loop framework, ELIXIR, for enhancing scrutability and subsequently the recommendation models by leveraging user feedback on explanations. ELIXIR enables recommender systems to collect user feedback on pairs of recommendations and explanations. The feedback is incorporated into the model by imposing a soft constraint for learning user-specific item representations. We evaluate all proposed models and methods with real user studies and demonstrate their benefits at achieving explainability and scrutability in recommender systems.Unsere zunehmende Abhängigkeit von komplexen Algorithmen für maschinelle Empfehlungen erfordert Modelle und Methoden für erklärbare, nachvollziehbare und vertrauenswürdige KI. Zum Verstehen der Beziehungen zwischen Modellein- und ausgaben muss KI erklärbar sein. Möchten wir das Verhalten des Systems hingegen nach unseren Vorstellungen ändern, muss dessen Entscheidungsprozess nachvollziehbar sein. Erklärbarkeit und Nachvollziehbarkeit von KI helfen uns dabei, die Lücke zwischen dem von uns erwarteten und dem tatsächlichen Verhalten der Algorithmen zu schließen und unser Vertrauen in KI-Systeme entsprechend zu stärken. Um ein Übermaß an Informationen zu verhindern, spielen Empfehlungsdienste eine entscheidende Rolle um Inhalte (z.B. Produkten, Nachrichten, Musik und Filmen) zu filtern und deren Benutzern eine personalisierte Erfahrung zu bieten. Infolgedessen erheben immer mehr In- formationskonsumenten Anspruch auf angemessene Erklärungen für deren personalisierte Empfehlungen. Diese Erklärungen sollen den Benutzern helfen zu verstehen, warum ihnen bestimmte Dinge empfohlen wurden und wie sich ihre früheren Eingaben in das System auf die Generierung solcher Empfehlungen auswirken. Außerdem können Erklärungen für den Fall, dass unerwünschte Inhalte empfohlen werden, wertvolle Informationen darüber enthalten, wie das Verhalten des Systems entsprechend geändert werden kann. In dieser Dissertation stellen wir unsere Beiträge zu Erklärbarkeit und Nachvollziehbarkeit von Empfehlungsdiensten vor. • Mit FAIRY stellen wir ein benutzerzentriertes Framework vor, mit dem post-hoc Erklärungen für die von Black-Box-Plattformen generierten sozialen Feeds entdeckt und bewertet werden können. Diese Erklärungen zeigen Beziehungen zwischen Benutzerprofilen und deren Feeds auf und werden aus den lokalen Interaktionsgraphen der Benutzer extrahiert. FAIRY verwendet eine LTR-Methode (Learning-to-Rank), um die Erklärungen anhand ihrer Relevanz und ihres Grads unerwarteter Empfehlungen zu bewerten. • Mit der PRINCE-Methode erleichtern wir das anbieterseitige Generieren von Erklärungen für PageRank-basierte Empfehlungsdienste. PRINCE-Erklärungen sind für Benutzer verständlich, da sie Teilmengen früherer Nutzerinteraktionen darstellen, die für die erhaltenen Empfehlungen verantwortlich sind. PRINCE-Erklärungen sind somit kausaler Natur und werden von einem Algorithmus mit polynomieller Laufzeit erzeugt , um präzise Erklärungen zu finden. • Wir präsentieren ein Human-in-the-Loop-Framework, ELIXIR, um die Nachvollziehbarkeit der Empfehlungsmodelle und die Qualität der Empfehlungen zu verbessern. Mit ELIXIR können Empfehlungsdienste Benutzerfeedback zu Empfehlungen und Erklärungen sammeln. Das Feedback wird in das Modell einbezogen, indem benutzerspezifischer Einbettungen von Objekten gelernt werden. Wir evaluieren alle Modelle und Methoden in Benutzerstudien und demonstrieren ihren Nutzen hinsichtlich Erklärbarkeit und Nachvollziehbarkeit von Empfehlungsdiensten

    Heuristic Measures of Interestingness

    No full text
    The tuples in a generalized relation (i.e., a summary generated from a database) are unique, and therefore, can be considered to be a population with a structure that can be described by some probability distribution. In this paper, we present and empirically compare sixteen heuristic measures that evaluate the structure of a summary to assign a single real-valued index that represents its interestingness relative to other summaries generated from the same database. The heuristics are based upon well-known measures of diversity, dispersion, dominance, and inequality used in several areas of the physical, social, ecological, management, information, and computer sciences. Their use for ranking summaries generated from databases is a new application area. All sixteen heuristics rank less complex summaries (i.e., those with few tuples and/or few non-ANY attributes) as most interesting. We demonstrate that for sample data sets, the order in which some of the measures rank summaries is highly correlated

    Evaluation of Interestingness Measures for Ranking Discovered Knowledge

    No full text
    When mining a large database, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, various techniques have been suggested to reduce and/or order the patterns prior to presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be generalized in many different ways and to many levels of granularity according to taxonomic hierarchies. We theoretically and empirically evaluate thirteen diversity measures used as heuristic measures of interestingness for ranking summaries generated from databases. The thirteen diversity measures have previously been utilized in various disciplines, such as information theory, statistics, ecology, and economics. We describe five principles that any measure must satisfy to be considered useful for ranking summaries. Theoretical results show that only four of the thirteen diversity measures satisfy all of the principles. We then analyze the distribution of the index values generated by each of the thirteen diversity measures. Empirical results, obtained using synthetic data, show that the distribution of index values generated tend to be highly skewed about the mean, median, and middle index values. The objective of this work is to gain some insight into the behaviour that can be expected from each of the measures in practice.

    Assessing the Interestingness of Discovered Knowledge Using a Principled Objective Approach

    No full text
    When mining a large database, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, various techniques have been suggested to reduce and/or order the patterns prior to presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be generalized in many different ways and to many levels of granularity according to taxonomic hierarchies. We theoretically and empirically evaluate twelve diversity measures used as heuristic measures of interestingness for ranking summaries generated from databases. The twelve diversity measures have previously been utilized in various disciplines, such as information theory, statistics, ecology, and economics. We describe five principles that any measure must satisfy to be considered useful for ranking summaries. Theoretical results show that the proposed principles define a partial order on the ranked summaries in most cases, and in some cases, define a total order. Theoretical results also show that seven of the twelve diversity measures satisfy all of the five principles. We empirically analyze the rank order of the summaries as determined by each of the twelve measures. These empirical results show that the measures tend to rank the less complex summaries as most interesting. Finally, we demonstrate a technique, based upon our principles, for visualizing the relative interestingness of summaries
    corecore