19,771 research outputs found

    Partial orders and logical concept analysis to explore patterns extracted by data mining

    Get PDF
    International audienceData mining techniques are used in order to discover emerging knowledge (patterns) in databases. The problem of such techniques is that there are, in general, too many resulting patterns for a user to explore them all by hand. Some methods try to reduce the number of patterns without a priori pruning. The number of patterns remains, nevertheless, high. Other approaches, based on a total ranking, propose to show to the user the top-k patterns with respect to a measure. Those methods do not take into account the user's knowledge and the dependencies that exist between patterns. In this paper, we propose a new way for the user to explore extracted patterns. The method is based on navigation in a partial order over the set of all patterns in the Logical Concept Analysis framework. It accommodates several kinds of patterns and the dependencies between patterns are taken into account thanks to partial orders. It allows the user to use his/her background knowledge to navigate through the partial order, without a priori pruning. We illustrate how our method can be applied on two different tasks (software engineering and natural language processing) and two different kinds of patterns (association rules and sequential patterns)

    RCA-Seq: an Original Approach for Enhancing the Analysis of Sequential Data Based on Hierarchies of Multilevel Closed Partially-Ordered Patterns

    Get PDF
    International audienceMethods for analysing sequential data generally produce a huge number of sequential patterns that have then to be evaluated and interpreted by domain experts. To diminish this number and thus the difficulty of the interpretation task, methods that directly extract a more compact representation of sequential patterns, namely closed partially-ordered patterns (CPO-patterns), were introduced. In spite of the fewer number of obtained CPO-patterns, their analysis is still a challenging task for experts since they are unorgan-ised and besides, do not provide a global view of the discovered regularities. To address these problems, we present and formalise an original approach within the framework of Relational Concept Analysis (RCA), referred to as RCA-Seq, that focuses on facilitating the interpretation task of experts. The hierarchical RCA result allows to directly obtain and organize the relationships between the extracted CPO-patterns. Moreover, a generalisation order on items is also revealed, and multilevel CPO-patterns are obtained. Therefore, a hierarchy of such CPO-patterns guides the interpretation task, helps experts in better understanding the extracted patterns, and minimises the chance of overlooking interesting CPO-patterns. RCA-Seq is compared with another approach that relies on pattern structures. In addition, we highlight the adaptability of RCA-Seq by integrating a user-defined tax-* onomy over the items, and by considering user-specified constraints on the order relations on itemsets

    Video advertisement mining for predicting revenue using random forest

    Get PDF
    Shaken by the threat of financial crisis in 2008, industries began to work on the topic of predictive analytics to efficiently control inventory levels and minimize revenue risks. In this third-generation age of web-connected data, organizations emphasized the importance of data science and leveraged the data mining techniques for gaining a competitive edge. Consider the features of Web 3.0, where semantic-oriented interaction between humans and computers can offer a tailored service or product to meet consumers\u27 needs by means of learning their preferences. In this study, we concentrate on the area of marketing science to demonstrate the correlation between TV commercial advertisements and sales achievement. Through different data mining and machine-learning methods, this research will come up with one concrete and complete predictive framework to clarify the effects of word of mouth by using open data sources from YouTube. The uniqueness of this predictive model is that we adopt the sentiment analysis as one of our predictors. This research offers a preliminary study on unstructured marketing data for further business use

    Peregrine: A Pattern-Aware Graph Mining System

    Full text link
    Graph mining workloads aim to extract structural properties of a graph by exploring its subgraph structures. General purpose graph mining systems provide a generic runtime to explore subgraph structures of interest with the help of user-defined functions that guide the overall exploration process. However, the state-of-the-art graph mining systems remain largely oblivious to the shape (or pattern) of the subgraphs that they mine. This causes them to: (a) explore unnecessary subgraphs; (b) perform expensive computations on the explored subgraphs; and, (c) hold intermediate partial subgraphs in memory; all of which affect their overall performance. Furthermore, their programming models are often tied to their underlying exploration strategies, which makes it difficult for domain users to express complex mining tasks. In this paper, we develop Peregrine, a pattern-aware graph mining system that directly explores the subgraphs of interest while avoiding exploration of unnecessary subgraphs, and simultaneously bypassing expensive computations throughout the mining process. We design a pattern-based programming model that treats "graph patterns" as first class constructs and enables Peregrine to extract the semantics of patterns, which it uses to guide its exploration. Our evaluation shows that Peregrine outperforms state-of-the-art distributed and single machine graph mining systems, and scales to complex mining tasks on larger graphs, while retaining simplicity and expressivity with its "pattern-first" programming approach.Comment: This is the full version of the paper appearing in the European Conference on Computer Systems (EuroSys), 202

    A unified view of data-intensive flows in business intelligence systems : a survey

    Get PDF
    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that still are to be addressed, and how the current solutions can be applied for addressing these challenges.Peer ReviewedPostprint (author's final draft

    User-centered visual analysis using a hybrid reasoning architecture for intensive care units

    Get PDF
    One problem pertaining to Intensive Care Unit information systems is that, in some cases, a very dense display of data can result. To ensure the overview and readability of the increasing volumes of data, some special features are required (e.g., data prioritization, clustering, and selection mechanisms) with the application of analytical methods (e.g., temporal data abstraction, principal component analysis, and detection of events). This paper addresses the problem of improving the integration of the visual and analytical methods applied to medical monitoring systems. We present a knowledge- and machine learning-based approach to support the knowledge discovery process with appropriate analytical and visual methods. Its potential benefit to the development of user interfaces for intelligent monitors that can assist with the detection and explanation of new, potentially threatening medical events. The proposed hybrid reasoning architecture provides an interactive graphical user interface to adjust the parameters of the analytical methods based on the users' task at hand. The action sequences performed on the graphical user interface by the user are consolidated in a dynamic knowledge base with specific hybrid reasoning that integrates symbolic and connectionist approaches. These sequences of expert knowledge acquisition can be very efficient for making easier knowledge emergence during a similar experience and positively impact the monitoring of critical situations. The provided graphical user interface incorporating a user-centered visual analysis is exploited to facilitate the natural and effective representation of clinical information for patient care

    Discovering a taste for the unusual: exceptional models for preference mining

    Get PDF
    Exceptional preferences mining (EPM) is a crossover between two subfields of data mining: local pattern mining and preference learning. EPM can be seen as a local pattern mining task that finds subsets of observations where some preference relations between labels significantly deviate from the norm. It is a variant of subgroup discovery, with rankings of labels as the target concept. We employ several quality measures that highlight subgroups featuring exceptional preferences, where the focus of what constitutes exceptional' varies with the quality measure: two measures look for exceptional overall ranking behavior, one measure indicates whether a particular label stands out from the rest, and a fourth measure highlights subgroups with unusual pairwise label ranking behavior. We explore a few datasets and compare with existing techniques. The results confirm that the new task EPM can deliver interesting knowledge.This research has received funding from the ECSEL Joint Undertaking, the framework programme for research and innovation Horizon 2020 (2014-2020) under Grant Agreement Number 662189-MANTIS-2014-1
    corecore