    Large Graph Analysis in the GMine System

    Current applications have produced graphs on the order of hundreds of thousands of nodes and millions of edges. To take advantage of such graphs, one must be able to find patterns, outliers, and communities. These tasks are better performed in an interactive environment, where human expertise can guide the process. For large graphs, though, there are some challenges: the excessive processing requirements are prohibitive, and drawing hundreds of thousands of nodes results in cluttered images that are hard to comprehend. To cope with these problems, we propose an innovative framework suited to any kind of tree-like graph visual design. GMine integrates (a) a representation for graphs organized as hierarchies of partitions - the concepts of SuperGraph and Graph-Tree - and (b) a graph summarization methodology - CEPS. Our graph representation deals with the problem of tracing connections across a graph hierarchy with sublinear complexity, allowing one to grasp the neighborhood of a single node or of a group of nodes in a single click. As a proof of concept, the visual environment of GMine is instantiated as a system in which large graphs can be investigated globally and locally.
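
    The hierarchy-of-partitions idea can be made concrete with a small sketch. The Python snippet below is illustrative only and is not the GMine implementation; all class and function names are hypothetical. It stores a graph as a tree of partitions, keeps each edge at the lowest partition containing both endpoints, and answers a neighborhood query by walking the one root-to-leaf path that contains the node.

        # Illustrative sketch (not the GMine implementation) of a Graph-Tree-style
        # hierarchy of partitions. An edge between two original nodes is stored at
        # the lowest partition that contains them both, so a neighborhood query
        # only descends the path of partitions containing the node.
        class Partition:
            def __init__(self, name, children=()):
                self.name = name
                self.children = list(children)   # sub-partitions
                self.edges = {}                  # node -> neighbors stored here
                self.members = set()             # original nodes under this partition
                for child in self.children:
                    self.members |= child.members

            def add_edge(self, u, v):
                self.edges.setdefault(u, set()).add(v)
                self.edges.setdefault(v, set()).add(u)
                self.members |= {u, v}

        def neighborhood(partition, node):
            """Neighbors of `node`, collected along the path that contains it."""
            result = set(partition.edges.get(node, ()))
            for child in partition.children:
                if node in child.members:
                    result |= neighborhood(child, node)
            return result

        # toy usage: two leaf partitions plus a root holding a cross-partition edge
        left, right = Partition("left"), Partition("right")
        left.add_edge("a", "b")
        right.add_edge("c", "d")
        root = Partition("root", [left, right])
        root.add_edge("b", "c")              # edge crossing the two partitions
        print(neighborhood(root, "b"))       # -> {'a', 'c'}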

    Choosing Attribute Weights for Item Dissimilarity using Clickstream Data with an Application to a Product Catalog Map

    In content- and knowledge-based recommender systems, a measure of (dis)similarity between items is often used. Frequently, this measure is based on the attributes of the items. However, which attributes are important to the users of the system remains an open question. In this paper, we present an approach to determine attribute weights in a dissimilarity measure using clickstream data of an e-commerce website. We count how many times each product is sold and, based on these counts, estimate a Poisson regression model. The estimates of this model are then used to determine the attribute weights in the dissimilarity measure. We show an application of this approach to a product catalog of MP3 players provided by Compare Group, owner of the Dutch price comparison site http://www.vergelijk.nl, and show how the dissimilarity measure can be used to improve 2D product catalog visualizations.

    Keywords: dissimilarity measure; attribute weights; clickstream data; comparison
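
    The weighting step lends itself to a compact sketch. The code below is a hedged illustration of the general idea, not the paper's exact model: it fits a Poisson regression of sales counts on product attributes (here via scikit-learn's PoissonRegressor) and reuses the normalized coefficient magnitudes as attribute weights in a weighted dissimilarity. The toy data and names are invented.

        # Hedged sketch of the general idea (not the paper's exact model): fit a
        # Poisson regression of sales counts on product attributes, then reuse the
        # normalized coefficient magnitudes as attribute weights in a weighted
        # dissimilarity measure. The toy data below is invented.
        import numpy as np
        from sklearn.linear_model import PoissonRegressor

        # rows = products, columns = standardized attributes (e.g. price, weight)
        X = np.array([[ 1.0, -0.2,  0.4],
                      [-0.5,  0.8, -1.0],
                      [ 1.5,  0.1,  0.8]])
        sales = np.array([30, 12, 55])           # observed sales counts per product

        model = PoissonRegressor(alpha=1e-4).fit(X, sales)
        weights = np.abs(model.coef_)            # importance proxy per attribute
        weights = weights / weights.sum()        # normalize to sum to one

        def dissimilarity(a, b, w=weights):
            """Weighted Euclidean dissimilarity between attribute vectors."""
            return float(np.sqrt(np.sum(w * (a - b) ** 2)))

        print(dissimilarity(X[0], X[1]))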

    Blended intelligence of FCA with FLC for knowledge representation from clustered data in medical analysis

    Formal concept analysis (FCA) is a data analysis mechanism with growing appeal across fields such as data mining, robotics, medicine, and big data, and it is helpful for generating new learning-ontology-based techniques. In the medical field, growing children face the problem of representing knowledge from previously gathered data that is unordered and insufficiently clustered, which prevents them from making the right decision at the right time when answering uncertainty-based questionnaires. In decision theory, many mathematical models, such as probability allocation, crisp sets, and fuzzy set theory, have been designed to deal with knowledge representation difficulties and their characteristics. This paper proposes a new blended approach of FCA with fuzzy logic control (FLC), described through two major objectives. First, FCA analyzes the data based on the relationships between a set of objects with prior attributes and a set of attribute-based prior data, framing the data as formal statements of human thinking and converting them into significant, intelligible explanations; suitable rules are generated to explore the relationships among attributes, and formal concept analysis is applied to these rules to extract better knowledge and the most important factors affecting decision making. Second, FLC applies fuzzification, rule construction, and defuzzification to represent accurate knowledge for uncertainty-based questionnaires. The FCA conception is expanded with objective-based itemset notions, treated as targets with expanded cardinalities and weights, which are associated through fuzzy inference decision rules. This approach helps medical experts assess the extent of a patient's memory deficiency, and also helps people who face deficiencies in exploring knowledge.
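
    The core FCA step the abstract relies on, deriving formal concepts from object-attribute relationships, can be sketched compactly. The snippet below is a minimal, self-contained illustration of standard concept derivation, not the paper's blended FCA/FLC system; the tiny medical-style context is invented.

        # Minimal sketch of standard formal concept derivation (not the paper's
        # blended FCA/FLC system): enumerate formal concepts, i.e. closed pairs
        # (extent, intent), from a small invented binary context.
        from itertools import combinations

        objects = ["p1", "p2", "p3"]
        attributes = ["fever", "fatigue", "memory_loss"]
        # incidence relation: which object has which attribute
        context = {
            "p1": {"fever", "fatigue"},
            "p2": {"fatigue", "memory_loss"},
            "p3": {"fever", "fatigue", "memory_loss"},
        }

        def common_attrs(objs):
            """Attributes shared by every object in `objs` (derivation operator)."""
            sets = [context[o] for o in objs] or [set(attributes)]
            return set.intersection(*sets)

        def objects_with(attrs):
            """Objects having every attribute in `attrs` (derivation operator)."""
            return {o for o in objects if attrs <= context[o]}

        # for any object set A, (A'', A') is a formal concept
        concepts = set()
        for r in range(len(objects) + 1):
            for objs in combinations(objects, r):
                intent = frozenset(common_attrs(set(objs)))
                extent = frozenset(objects_with(intent))
                concepts.add((extent, intent))

        for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
            print(sorted(extent), "->", sorted(intent))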

    Modelling Grocery Retail Topic Distributions: Evaluation, Interpretability and Stability

    Understanding the shopping motivations behind market baskets has high commercial value in the grocery retail industry. Analyzing shopping transactions demands techniques that can cope with the volume and dimensionality of grocery transactional data while keeping outcomes interpretable. Latent Dirichlet Allocation (LDA) provides a suitable framework to process grocery transactions and discover a broad representation of customers' shopping motivations. However, summarizing the posterior distribution of an LDA model is challenging: individual LDA draws may not be coherent and cannot capture topic uncertainty. Moreover, the evaluation of LDA models is dominated by model-fit measures that may not adequately capture qualitative aspects such as the interpretability and stability of topics. In this paper, we introduce a clustering methodology that post-processes posterior LDA draws to summarize the entire posterior distribution and identify semantic modes represented as recurrent topics. Our approach is an alternative to standard label-switching techniques and provides a single posterior summary set of topics, as well as associated measures of uncertainty. Furthermore, we establish a more holistic definition of model evaluation, which assesses topic models based not only on their likelihood but also on their coherence, distinctiveness, and stability. By means of a survey, we set thresholds for the interpretation of topic coherence and topic similarity in the domain of grocery retail data. We demonstrate that selecting recurrent topics through our clustering methodology not only improves model likelihood but also improves qualitative aspects of LDA such as interpretability and stability. We illustrate our methods on an example from a large UK supermarket chain.

    Comment: 20 pages, 9 figures
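
    The post-processing step admits a small sketch. The following code illustrates the general idea under stated assumptions, not the paper's exact methodology: pool topic-word distributions from several posterior LDA draws, cluster them by Jensen-Shannon distance, and report clusters that recur across most draws as posterior summary topics. All data is synthetic and the thresholds are illustrative.

        # Hedged sketch of the general idea (not the paper's exact algorithm):
        # pool topic-word distributions across posterior LDA draws, cluster them,
        # and treat clusters recurring in most draws as posterior summary topics.
        import numpy as np
        from scipy.spatial.distance import jensenshannon, pdist
        from scipy.cluster.hierarchy import linkage, fcluster

        rng = np.random.default_rng(0)
        n_draws, n_topics, vocab = 5, 4, 50
        base = rng.dirichlet(np.ones(vocab), size=n_topics)
        # noisy copies of the base topics stand in for posterior draws
        draws = base + 0.05 * rng.dirichlet(np.ones(vocab), size=(n_draws, n_topics))
        draws /= draws.sum(axis=-1, keepdims=True)

        topics = draws.reshape(-1, vocab)                  # pool all topics
        draw_id = np.repeat(np.arange(n_draws), n_topics)  # source draw of each

        # cluster pooled topics by pairwise Jensen-Shannon distance
        dist = pdist(topics, metric=lambda p, q: jensenshannon(p, q))
        labels = fcluster(linkage(dist, method="average"), t=0.3,
                          criterion="distance")

        for c in np.unique(labels):
            members = labels == c
            support = len(np.unique(draw_id[members])) / n_draws  # recurrence
            if support >= 0.8:              # illustrative recurrence threshold
                summary = topics[members].mean(axis=0)  # cluster's mean topic
                print(f"recurrent topic {c}: top words "
                      f"{summary.argsort()[-3:][::-1]}")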