2,852 research outputs found

    Entity recommendation and search in heterogeneous information networks

    With the rapid development of social media and information network-based web services, data mining studies on network analysis have gained increasing attention in recent years. Many early studies focus on homogeneous network mining, with the assumption that the network nodes and links are of the same type (e.g., social networks). However, real-world data in many domains and applications are often multi-typed and interconnected, forming heterogeneous information networks. The objective of my thesis is to study effective and scalable approaches to help users explore and discover useful information and knowledge in heterogeneous information networks. I also aim to advance the principles and methodologies of mining heterogeneous information networks through these studies. Specifically, I focus on entity recommendation and search problems in heterogeneous information networks. I investigate and propose data mining methodologies to facilitate the construction of entity recommender systems and search engines for heterogeneous networks. In this thesis, I first study the entity recommendation problem in heterogeneous information networks with implicit feedback. Second, I study a real-world, large-scale entity recommendation application with commercial search engine user logs and a web-scale entity graph. Third, I combine text information and heterogeneous relationships between entities to study the citation prediction and search problem in bibliographical networks. Fourth, I introduce a user-guided entity similarity search framework for information networks that integrates users' guidance into the entity search process, which helps alleviate the entity similarity ambiguity problem in heterogeneous networks. The methodologies proposed in this thesis are critically important for information exploration in heterogeneous information networks.
The principles and theoretical findings in these studies have potential impact in other information network related research fields and can be applied in a wide range of real-world applications.

    Social media intention mining for sustainable information systems: categories, taxonomy, datasets and challenges

    Intention mining is a promising research area of data mining that aims to determine end-users’ intentions from their past activities stored in logs, which record users’ interactions with the system. Search engines are a major source for inferring users’ past search activities and predicting their intentions, helping vendors and manufacturers present their products to users effectively. This area has been steadily gaining relevance with the growing trend of online purchasing. Notable research has been carried out in this area over the last two decades. To the best of our knowledge, no systematic literature review provides a comprehensive overview of the intention mining domain. This article presents a systematic literature review based on 109 high-quality research papers selected after rigorous screening. The analysis reveals that there are eight prominent categories of intention. Furthermore, a taxonomy of the approaches and techniques used for intention mining is discussed in this article. Similarly, six important types of datasets used for this purpose are also discussed. Lastly, future challenges and research gaps are presented for researchers working in this domain.

    Temporal models for mining, ranking and recommendation in the Web

    Due to their first-hand, diverse and evolution-aware reflection of nearly all areas of life, heterogeneous temporal datasets, i.e., the Web, collaborative knowledge bases and social networks, have emerged as gold mines for content analytics of many sorts. In those collections, time plays an essential role in many crucial information retrieval and data mining tasks, ranging from user intent understanding and document ranking to advanced recommendations. There are two semantically close and important constituents when modeling along the time dimension, i.e., entity and event. Time crucially serves as the context for changes driven by happenings and phenomena (events) that relate to people, organizations or places (so-called entities) in our social lives. Thus, determining what users expect, or in other words, resolving the uncertainty confounded by temporal changes, is a compelling task to support consistent user satisfaction. In this thesis, we address the aforementioned issues and propose temporal models that capture the temporal dynamics of such entities and events to serve the end tasks. Specifically, we make the following contributions in this thesis: (1) Query recommendation and document ranking in the Web - we address the issues of suggesting entity-centric queries and of ranking effectiveness around the time period of an associated event. In particular, we propose a multi-criteria optimization framework that facilitates the combination of multiple temporal models to smooth out the abrupt changes when transitioning between event phases for the former, and a probabilistic approach for search result diversification of temporally ambiguous queries for the latter. (2) Entity relatedness in Wikipedia - we study the long-term dynamics of Wikipedia as a global memory place for high-impact events, specifically the reviving memories of past events.
Additionally, we propose a neural network-based approach to measure the temporal relatedness of entities and events. The model engages different latent representations of an entity (i.e., from time, link-based graph and content) and uses the collective attention from user navigation as the supervision. (3) Graph-based ranking and temporal anchor-text mining in Web Archives - we tackle the problem of discovering important documents along the time-span of Web Archives, leveraging the link graph. Specifically, we combine the problems of relevance, temporal authority, diversity and time in a unified framework. The model accounts for the incomplete link structure and natural time lagging in Web Archives in mining the temporal authority. (4) Methods for enhancing predictive models at an early stage in the social media and clinical domains - we investigate several methods to control model instability and enrich contexts of predictive models at the “cold-start” period. We demonstrate their effectiveness for the rumor detection and blood glucose prediction cases respectively. Overall, the findings presented in this thesis demonstrate the importance of tracking the temporal dynamics surrounding salient events and entities for IR applications. We show that determining such changes in time-based patterns and trends in prevalent temporal collections can better satisfy user expectations, and boost ranking and recommendation effectiveness over time.

    Entity-Oriented Search

    This open access book covers all facets of entity-oriented search—where “search” can be interpreted in the broadest sense of information access—from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents)—a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. 
A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms.

    Studies on User Intent Analysis and Mining

    Predicting the goals of users can be extremely useful in e-commerce, online entertainment, information retrieval, and many other online services and applications. In this thesis, we study the task of user intent understanding, trying to bridge the gap between users' expressions to online services and the goals behind them. As far as we know, most existing user intent studies focus on the web search and social media domains, and other areas are understudied. For example, as people rely more and more on cellphones in daily life, the information needs expressed to mobile devices and related services are increasing dramatically, yet there are few studies of user intent mining on mobile devices. Moreover, the intentions behind using mobile devices differ from those behind using web search engines or social networks, so existing user intent work cannot be applied directly to this area. Besides, users' intents are not stable but change over time, and different interests influence each other. Modeling such dynamic user interests can help accurately understand and predict user intent, but there are few existing works in this area. Moreover, user intent can be expressed explicitly or implicitly. Implicit intent expression is closer to natural human language and is also of great value to recognize and mine. To study these challenges further, we first try to answer the question “What is the user intent?” By referring to a number of previous studies, we define user intent as follows: “User intent is a task-specific, predefined or latent concept, topic or knowledge-base that is under an expression from a user who is trying to express his goal of information or service need.” Then, we focus on the driving scenario, in which a user is using a cellphone, and study user intent in this domain. As far as we know, this is the first user intent analysis and categorization in this domain.
We also build a dataset of user inputs with their related intent categories and attributes through crowdsourcing and careful handcrafting. With the user intent taxonomy and dataset in hand, we perform user intent classification and user intent attribute recognition with supervised machine learning models. To classify the user intent of a query, we use a convolutional neural network model to build a multi-class classifier. We then use a sequence labeling method to recognize the intent attributes in the query. The experimental results show that our proposed method outperforms several baseline models in precision, recall, and F-score. In addition, we study an implicit user intent mining method based on web search log data. Using a Restricted Boltzmann Machine, we exploit the correlation between query and click information to learn the latent intent behind a user's web search. We propose a user intent prediction model for online discussion forums using a Multivariate Hawkes Process. It dynamically models how user intentions change and interact over time. The method models both the internal and external factors of users' motivations for responding in online forums, and also integrates the time decay of users' interests. We also present a data visualization method that uses an enriched domain ontology to highlight the domain-specific words and entity relations within an article. Ph.D., Information Studies -- Drexel University, 201