
    When is a Network a Network? Multi-Order Graphical Model Selection in Pathways and Temporal Networks

    We introduce a framework for the modeling of sequential data capturing pathways of varying lengths observed in a network. Such data are important, e.g., when studying click streams in information networks, travel patterns in transportation systems, information cascades in social networks, biological pathways or time-stamped social interactions. While it is common to apply graph analytics and network analysis to such data, recent works have shown that temporal correlations can invalidate the results of such methods. This raises a fundamental question: when is a network abstraction of sequential data justified? Addressing this open question, we propose a framework which combines Markov chains of multiple, higher orders into a multi-layer graphical model that captures temporal correlations in pathways at multiple length scales simultaneously. We develop a model selection technique to infer the optimal number of layers of such a model and show that it outperforms previously used Markov order detection techniques. An application to eight real-world data sets on pathways and temporal networks shows that it allows us to infer graphical models which capture both topological and temporal characteristics of such data. Our work highlights fallacies of network abstractions and provides a principled answer to the open question of when they are justified. Generalizing network representations to multi-order graphical models, it opens perspectives for new data mining and knowledge discovery algorithms. Comment: 10 pages, 4 figures, 1 table, companion python package pathpy available on GitHub
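
    The order-selection step can be pictured with a hedged sketch: a plain likelihood-ratio test between first- and second-order Markov models fitted to observed paths. This is a simplified stand-in for the paper's multi-order layer selection, and the toy paths, the degrees-of-freedom estimate, and all variable names are illustrative assumptions.

```python
# Hedged sketch: likelihood-ratio test between a first- and a second-order Markov
# model of paths, a simplified stand-in for multi-order model selection.
from collections import Counter
from math import log
from scipy.stats import chi2

paths = [("a", "c", "d"), ("a", "c", "d"), ("b", "c", "e"), ("b", "c", "e")]

def log_likelihood(paths, order):
    """MLE log-likelihood of transitions conditioned on the previous `order` states."""
    counts, totals = Counter(), Counter()
    for p in paths:
        for i in range(order, len(p)):
            ctx, nxt = p[i - order:i], p[i]
            counts[(ctx, nxt)] += 1
            totals[ctx] += 1
    ll = sum(n * log(n / totals[ctx]) for (ctx, nxt), n in counts.items())
    return ll, len(counts)

ll1, k1 = log_likelihood(paths, 1)
ll2, k2 = log_likelihood(paths, 2)
lr = 2 * (ll2 - ll1)                   # likelihood-ratio statistic
p = chi2.sf(lr, df=max(k2 - k1, 1))    # crude degrees-of-freedom estimate
print(f"LR statistic {lr:.2f}, p = {p:.3f}: a small p favours the second-order layer")
```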

    Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains

    Biomedical taxonomies, thesauri and ontologies, such as the International Classification of Diseases (ICD) as a taxonomy or the National Cancer Institute Thesaurus as an OWL-based ontology, play a critical role in acquiring, representing and processing information about human health. With increasing adoption and relevance, biomedical ontologies have also significantly increased in size. For example, the 11th revision of the ICD, which is currently under active development by the WHO, contains nearly 50,000 classes representing a vast variety of different diseases and causes of death. This evolution in terms of size was accompanied by an evolution in the way ontologies are engineered. Because no single individual has the expertise to develop such large-scale ontologies, ontology-engineering projects have evolved from small-scale efforts involving just a few domain experts to large-scale projects that require effective collaboration between dozens or even hundreds of experts, practitioners and other stakeholders. Understanding how these stakeholders collaborate will enable us to improve editing environments that support such collaborations. We uncover how large ontology-engineering projects, such as the ICD in its 11th revision, unfold by analyzing usage logs of five different biomedical ontology-engineering projects of varying sizes and scopes using Markov chains. We discover intriguing interaction patterns (e.g., which properties users subsequently change) that suggest that large collaborative ontology-engineering projects are governed by a few general principles that determine and drive development. From our analysis, we identify commonalities and differences between different projects that have implications for project managers, ontology editors, developers and contributors working on collaborative ontology-engineering projects and tools in the biomedical domain. Comment: Published in the Journal of Biomedical Informatics
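
    As a hedged illustration of this kind of log analysis (not the authors' exact pipeline), the sketch below estimates a first-order Markov chain over per-session edit actions and reads off which property users most often change next; the session data and property names are invented for the example.

```python
# Hedged sketch: which property do contributors tend to edit next?
# Estimated with a first-order Markov chain over per-session edit actions.
from collections import Counter, defaultdict

sessions = [
    ["title", "definition", "synonym", "definition"],
    ["title", "definition", "definition", "parent"],
    ["synonym", "definition", "parent"],
]

transitions = defaultdict(Counter)
for actions in sessions:
    for prev, nxt in zip(actions, actions[1:]):
        transitions[prev][nxt] += 1

for prev, nexts in transitions.items():
    total = sum(nexts.values())
    likely, count = nexts.most_common(1)[0]
    print(f"after editing '{prev}', users most often edit '{likely}' ({count / total:.0%})")
```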

    Retrospective Higher-Order Markov Processes for User Trails

    Users form information trails as they browse the web, check in with a geolocation, rate items, or consume media. A common problem is to predict what a user might do next for the purposes of guidance, recommendation, or prefetching. First-order and higher-order Markov chains have been widely used to study such sequences of data. First-order Markov chains are easy to estimate, but lack accuracy when history matters. Higher-order Markov chains, in contrast, have too many parameters and suffer from overfitting the training data. Fitting these parameters with regularization and smoothing only offers mild improvements. In this paper we propose the retrospective higher-order Markov process (RHOMP) as a low-parameter model for such sequences. This model is a special case of a higher-order Markov chain where the transitions depend retrospectively on a single history state instead of an arbitrary combination of history states. There are two immediate computational advantages: the number of parameters is linear in the order of the Markov chain and the model can be fit to large state spaces. Furthermore, by providing a specific structure to the higher-order chain, RHOMPs improve the model accuracy by efficiently utilizing history states without risks of overfitting the data. We demonstrate how to estimate a RHOMP from data and show the effectiveness of our method on various real application datasets spanning geolocation data, review sequences, and business locations. The RHOMP model uniformly outperforms higher-order Markov chains, Kneser-Ney regularization, and tensor factorizations in terms of prediction accuracy.
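
    A minimal sketch of the retrospective idea, under the assumption that the model can be read as a convex combination of first-order transitions from each history position (in the spirit of mixture transition distribution models); the states, weights, and matrices below are illustrative, not fitted values from the paper.

```python
# Hedged sketch: scoring the next state under a retrospective second-order model,
# read here as a weighted mix of first-order transitions from each history position.
import numpy as np

states = ["home", "search", "item", "cart"]
idx = {s: i for i, s in enumerate(states)}

P1 = np.full((4, 4), 0.25)   # transitions conditioned on the last state
P2 = np.full((4, 4), 0.25)   # transitions conditioned on the state before that
P1[idx["item"], idx["cart"]] = 0.55
P1[idx["item"]] /= P1[idx["item"]].sum()
P2[idx["search"], idx["item"]] = 0.55
P2[idx["search"]] /= P2[idx["search"]].sum()
alpha = np.array([0.7, 0.3])  # how much each history position is trusted

def next_state_distribution(last, before_last):
    """Convex combination of the two per-lag first-order predictions."""
    return alpha[0] * P1[idx[last]] + alpha[1] * P2[idx[before_last]]

dist = next_state_distribution("item", "search")
print(dict(zip(states, np.round(dist, 3))))
```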

    Intent-Aware Contextual Recommendation System

    Recommender systems take inputs from user history, use an internal ranking algorithm to generate results and possibly optimize this ranking based on feedback. However, often the recommender system is unaware of the actual intent of the user and simply provides recommendations dynamically without properly understanding the thought process of the user. An intelligent recommender system is not only useful for the user but also for businesses which want to learn the tendencies of their users. Finding out the tendencies or intents of a user is a difficult problem to solve. Keeping this in mind, we set out to create an intelligent system which will keep track of the user's activity on a web application as well as determine the intent of the user in each session. We devised a way to encode the user's activity through the sessions. Then, we represent the information seen by the user in a high-dimensional format, which is reduced to lower dimensions using tensor factorization techniques. The aspect of intent awareness (or scoring) is dealt with at this stage. Finally, combining the user activity data with the contextual information gives the recommendation score. The final recommendations are then ranked using filtering and collaborative recommendation techniques to show the top-k recommendations to the user. A provision for feedback is also envisioned in the current system, which informs the model to update the various weights in the recommender system. Our overall model aims to combine both frequency-based and context-based recommendation systems and quantify the intent of a user to provide better recommendations. We ran experiments on real-world timestamped user activity data, in the setting of recommending reports to the users of a business analytics tool, and the results are better than the baselines. We also tuned certain aspects of our model to arrive at optimized results. Comment: Presented at the 5th International Workshop on Data Science and Big Data Analytics (DSBDA), 17th IEEE International Conference on Data Mining (ICDM) 2017; 8 pages; 4 figures; Due to the limitation "The abstract field cannot be longer than 1,920 characters," the abstract appearing here is slightly shorter than the one in the PDF file.
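
    The final blending step can be sketched as follows, assuming a simple convex combination of a frequency-based score and a context-based score weighted by an intent estimate; the report names, scores, and formula are illustrative assumptions rather than the paper's actual model.

```python
# Hedged sketch: blend a frequency-based score with a context-similarity score,
# weighted by an estimated intent, and return the top-k reports.
import numpy as np

reports = ["revenue", "churn", "traffic"]
frequency_score = np.array([0.8, 0.1, 0.4])  # how often the user opened each report
context_score = np.array([0.2, 0.9, 0.5])    # similarity of each report to the session context
intent = 0.7                                 # 1.0 = strongly goal-driven session, 0.0 = browsing

combined = intent * context_score + (1 - intent) * frequency_score
top_k = [reports[i] for i in np.argsort(-combined)[:2]]
print(top_k)                                 # ['churn', 'traffic']
```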

    Web Page Prediction for Web Personalization: A Review

    This paper presents a survey of web page ranking for web personalization. Web page prefetching has been widely used to reduce the access latency problem of the Internet. However, if most prefetched web pages are not visited by the users in their subsequent accesses, the limited network bandwidth and server resources will not be used efficiently and may worsen the access delay problem. Therefore, it is critical that we have an accurate prediction method during prefetching. Techniques like Markov models have been widely used to represent and analyze users' navigational behavior (usage data) in the Web graph, using the transition probabilities between web pages as recorded in the web logs. The recorded users' navigation is used to extract popular web paths and predict current users' next steps.
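
    A hedged sketch of the Markov-model prediction described above: estimate first-order transition probabilities from logged sessions and prefetch the most likely next page only when its probability clears a threshold, so bandwidth is not wasted on unlikely pages. The sessions and threshold are illustrative assumptions.

```python
# Hedged sketch: first-order Markov next-page prediction with a prefetch threshold.
from collections import Counter, defaultdict

sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/product/42"],
    ["/home", "/blog"],
]

counts = defaultdict(Counter)
for pages in sessions:
    for prev, nxt in zip(pages, pages[1:]):
        counts[prev][nxt] += 1

def prefetch_candidate(current_page, threshold=0.6):
    nexts = counts[current_page]
    if not nexts:
        return None
    page, c = nexts.most_common(1)[0]
    prob = c / sum(nexts.values())
    return page if prob >= threshold else None  # only prefetch confident predictions

print(prefetch_candidate("/home"))      # '/products' (2 of 3 sessions)
print(prefetch_candidate("/products"))  # None: a 50/50 split is below the threshold
```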

    REVIEW PAPER ON WEB PAGE PREDICTION USING DATA MINING

    The continuous growth of the World Wide Web imposes the need for new methods of website design and for on-line information services, and web usage mining determines how users access web pages by preprocessing the data recorded for each page. The need to predict users' needs in order to improve the usability and user retention of a web site is more than evident nowadays. Without proper guidance, a visitor often wanders aimlessly without visiting important pages, loses interest, and leaves the site sooner than expected. The proposed system focuses on investigating efficient and effective sequential access pattern mining techniques for web usage data. The mined patterns are then used for matching and generating web links for online recommendations. A web-page-of-interest application will be developed to evaluate the quality and effectiveness of the discovered knowledge. Keywords: Webpage Prediction, Web Mining, MRF, ANN, KNN, GA
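
    One possible reading of the pattern-mining-plus-matching pipeline, sketched with illustrative data: mine frequent fixed-length access patterns (simple n-grams) and match a live session's suffix against them to suggest the next link. The sessions, support threshold, and pattern length are assumptions for the example, not the proposed system's actual parameters.

```python
# Hedged sketch: mine frequent 3-page access patterns and match a session suffix
# against them to recommend the next link.
from collections import Counter

sessions = [
    ["/home", "/courses", "/courses/ml", "/enroll"],
    ["/home", "/courses", "/courses/ml", "/syllabus"],
    ["/home", "/courses", "/courses/ml", "/enroll"],
]

min_support = 2
patterns = Counter(tuple(s[i:i + 3]) for s in sessions for i in range(len(s) - 2))
frequent = {p: c for p, c in patterns.items() if c >= min_support}

def recommend(current_session):
    suffix = tuple(current_session[-2:])
    hits = [(p[-1], c) for p, c in frequent.items() if p[:2] == suffix]
    return max(hits, key=lambda t: t[1])[0] if hits else None

print(recommend(["/home", "/courses", "/courses/ml"]))  # '/enroll'
```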

    Web Usage Mining: A Survey on Pattern Extraction from Web Logs

    As the size of the web increases along with the number of users, it is essential for website owners to better understand their customers so that they can provide better service and enhance the quality of the website. To achieve this they depend on web access log files. The web access log files can be mined to extract interesting patterns so that user behaviour can be understood. This paper presents an overview of web usage mining and also provides a survey of the pattern extraction algorithms used for web usage mining.

    HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks

    The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order. Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM Data Mining (SDM 2020).
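
    The anomaly score can be approximated with a hedged sketch in the spirit of HYPA's hypergeometric null model (a simplification, not the paper's full ensemble): compare how often a continuation is taken by walks arriving via a particular predecessor against a hypergeometric expectation. All counts below are illustrative assumptions.

```python
# Hedged sketch: is the path A -> B -> C traversed anomalously often or rarely,
# given how often B -> C is taken overall? Scored with a hypergeometric tail.
from scipy.stats import hypergeom

total_out = 200   # all observed continuations out of node B
to_c = 30         # continuations B -> C, regardless of how walks reached B
via_a = 40        # continuations out of B by walks that arrived via A
observed = 18     # walks that actually went A -> B -> C

rv = hypergeom(M=total_out, n=to_c, N=via_a)
p_over = rv.sf(observed - 1)   # P(count >= observed): small => over-represented path
p_under = rv.cdf(observed)     # P(count <= observed): small => under-represented path
print(f"P(>= {observed}) = {p_over:.4f}, P(<= {observed}) = {p_under:.4f}")
```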