
    Entity recommendation and search in heterogeneous information networks

    With the rapid development of social media and information network-based web services, data mining studies on network analysis have gained increasing attention in recent years. Many early studies focus on homogeneous network mining, with the assumption that network nodes and links are of the same type (e.g., social networks). However, real-world data in many domains and applications are often multi-typed and interconnected, forming heterogeneous information networks. The objective of my thesis is to study effective and scalable approaches that help users explore and discover useful information and knowledge in heterogeneous information networks. Through these studies, I also aim to advance the principles and methodologies of mining heterogeneous information networks. Specifically, I focus on entity recommendation and search problems in heterogeneous information networks, and I propose data mining methodologies that facilitate the construction of entity recommender systems and search engines for such networks. First, I study the entity recommendation problem in heterogeneous information networks with implicit feedback. Second, I study a real-world, large-scale entity recommendation application that uses commercial search engine user logs and a web-scale entity graph. Third, I combine text information with heterogeneous relationships between entities to study citation prediction and search in bibliographic networks. Fourth, I introduce a user-guided entity similarity search framework for information networks that integrates users' guidance into the entity search process, which helps alleviate the entity similarity ambiguity problem in heterogeneous networks. The methodologies proposed in this thesis are critically important for information exploration in heterogeneous information networks.
The principles and theoretical findings in these studies have potential impact on other research fields related to information networks and can be applied in a wide range of real-world applications.
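Meta-path-based similarity is a standard way to compare entities in a heterogeneous information network. The sketch below implements the well-known PathSim measure for a symmetric meta-path (here Author-Paper-Author) to illustrate the kind of typed structure these methods exploit; it is not claimed to be the method proposed in the thesis, and the toy author-paper matrix is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product; sufficient for a toy network."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def pathsim(commuting: Matrix) -> Matrix:
    """PathSim for a symmetric meta-path with commuting matrix M:
    s(x, y) = 2 * M[x][y] / (M[x][x] + M[y][y])."""
    n = len(commuting)
    sim = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            denom = commuting[x][x] + commuting[y][y]
            sim[x][y] = 2.0 * commuting[x][y] / denom if denom > 0 else 0.0
    return sim


# Hypothetical toy network: rows are 3 authors, columns are 4 papers (1 = wrote it).
author_paper = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
# Commuting matrix for Author-Paper-Author: number of shared papers per author pair.
m_apa = matmul(author_paper, transpose(author_paper))
sim = pathsim(m_apa)
print([[round(v, 3) for v in row] for row in sim])
```

Author 0 comes out closer to author 1 (two shared papers) than author 1 is to author 2 (one shared paper), and PathSim's normalization by each entity's self-path count keeps highly visible entities from dominating every comparison.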

    A Trio Neural Model for Dynamic Entity Relatedness Ranking

    Measuring entity relatedness is a fundamental task for many natural language processing and information retrieval applications. Prior work often studies entity relatedness in static settings and in an unsupervised manner. However, real-world entities are often involved in many different relationships, so relations between entities are highly dynamic over time. In this work, we propose a neural network-based approach for dynamic entity relatedness that leverages collective attention as supervision. Our model is capable of learning rich and diverse entity representations in a joint framework. Through extensive experiments on large-scale datasets, we demonstrate that our method achieves better results than competitive baselines. Comment: In Proceedings of CoNLL 201

    Measuring Collective Attention in Online Content: Sampling, Engagement, and Network Effects

    The production and consumption of online content have been increasing rapidly, whereas human attention is a scarce resource. Understanding how content captures collective attention has become a challenge of growing importance. In this thesis, we tackle this challenge on three fronts: quantifying the sampling effects of social media data, measuring engagement behaviors towards online content, and estimating the network effects induced by recommender systems. Data sampling is a fundamental problem. One common way to obtain a list of items is to sample based on item prevalence in social media streams. However, social data is often noisy and incomplete, which may affect subsequent observations. For each item, user behavior can be conceptualized as two steps: the first relates to content appeal, measured by the number of clicks; the second relates to content quality, measured by post-click metrics such as dwell time, likes, or comments. We categorize online attention (behaviors) into two classes: popularity (clicking) and engagement (watching, liking, or commenting). Moreover, modern platforms use recommender systems to present users with tailored content displays that maximize satisfaction. A recommendation alters the appeal of an item by changing its ranking, and consequently affects its popularity. Our research is enabled by data from the largest video hosting site, YouTube. We use YouTube URLs shared on Twitter as a sampling protocol to obtain a collection of videos, and we track their prevalence from 2015 to 2019. This method yields a longitudinal dataset of more than 5 billion tweets. Although the volume is substantial, we find that Twitter still subsamples the data: our dataset covers about 80% of all tweets with YouTube URLs. We present a comprehensive measurement study of Twitter sampling effects across different timescales and subjects.
We find that the volume of missing tweets can be estimated from Twitter rate limit messages, that the true entity ranking can be inferred from sampled observations, and that sampling compromises the quality of network and diffusion models. Next, we present the first large-scale measurement study of how users collectively engage with YouTube videos. We study the watch time and the percentage of each video that is watched. We propose a duration-calibrated metric, called relative engagement, which correlates with recognized notions of content quality, is stable over time, and is predictable even before a video is uploaded. Lastly, we examine the network effects induced by the YouTube recommender system. We construct the recommendation network for 60,740 music videos from 4,435 professional artists, where an edge indicates that the target video is recommended on the webpage of the source video. We discover a popularity bias: videos are disproportionately recommended towards more popular videos. We use the bow-tie structure to characterize the network and find that the largest strongly connected component contains 23.1% of videos while accounting for 82.6% of attention. We also build models to estimate the latent influence between videos and artists. By taking the network structure into account, we predict video popularity 9.7% better than other baselines. Altogether, we explore the collective consumption patterns of human attention towards online content. Methods and findings from this thesis can be used by content producers, hosting sites, and online users alike to improve content production, advertising strategies, and recommender systems. We expect our new metrics, methods, and observations to generalize to other multimedia platforms such as the music streaming service Spotify.
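The idea behind a duration-calibrated engagement metric can be sketched as follows, assuming relative engagement is the percentile rank of a video's watch percentage (average watch time divided by duration) among videos of similar duration. The logarithmic duration bins, the bin width, and the toy corpus are hypothetical choices for illustration; the thesis's exact calibration procedure may differ.

```python
import bisect
import math
from typing import Dict, List, Tuple


def duration_bin(duration_sec: float, bins_per_decade: int = 10) -> int:
    """Hypothetical binning: equal-width bins on log10(duration)."""
    return math.floor(math.log10(duration_sec) * bins_per_decade)


def watch_percentage(duration_sec: float, avg_watch_sec: float) -> float:
    """Fraction of the video watched on average, capped at 1."""
    return min(avg_watch_sec / duration_sec, 1.0)


def build_engagement_map(videos: List[Tuple[float, float]]) -> Dict[int, List[float]]:
    """Group watch percentages by duration bin and sort each group."""
    groups: Dict[int, List[float]] = {}
    for duration, avg_watch in videos:
        groups.setdefault(duration_bin(duration), []).append(
            watch_percentage(duration, avg_watch))
    for values in groups.values():
        values.sort()
    return groups


def relative_engagement(groups: Dict[int, List[float]],
                        duration_sec: float, avg_watch_sec: float) -> float:
    """Share of same-bin videos whose watch percentage is <= this video's."""
    peers = groups.get(duration_bin(duration_sec), [])
    if not peers:
        return float("nan")  # no calibration data for this duration
    wp = watch_percentage(duration_sec, avg_watch_sec)
    return bisect.bisect_right(peers, wp) / len(peers)


# Hypothetical corpus of (duration in seconds, average watch time in seconds).
corpus = [(200, 100), (220, 44), (240, 180), (3600, 360), (3500, 1750)]
groups = build_engagement_map(corpus)
print(relative_engagement(groups, 210, 105))   # 50% watched, short-video peers
print(relative_engagement(groups, 3550, 355))  # 10% watched, hour-long peers
```

The calibration matters because raw watch percentage is not comparable across durations: a viewer is far more likely to finish a three-minute clip than an hour-long video, so each video is scored only against peers of similar length.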

    A Recommendation Engine Using Apache Spark

    The volume of structured and unstructured data has grown exponentially in recent years. As a result of this rapid data growth, we are constantly inundated with a plethora of choices for any product or service. It is natural to get lost among so many choices and to find it hard to make decisions. The project aims to address this problem using entity recommendation. It concentrates on two main aspects: presenting more accurate entity recommendations to the user, and dealing with vast amounts of data. The project aims to present recommendation results for a user's query efficiently and accurately. It uses the ListNet ranking algorithm to rank the recommendation results: query-independent and query-dependent features are combined into ranking scores, which determine the order in which the results are presented to the user. The project uses Apache Spark, a distributed big data processing framework. Compared with the traditional MapReduce paradigm, Spark handles iterative and interactive algorithms efficiently and with minimal processing time. We ran the recommendation engine experiments using DBpedia as the dataset and tested the results in the movie domain. We used both query-independent (PageRank) and query-dependent (click log) features for ranking. We observed that ListNet performs well on Apache Spark, since RDDs provide a faster way to execute iterative algorithms. We also observed that the recommendation results are accurate and the entities are well ranked.
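A minimal single-machine sketch of ListNet's top-one formulation follows: a linear scoring function over one query-independent feature (PageRank) and one query-dependent feature (a click-log rate), trained by gradient descent on the cross-entropy between the softmax of the relevance labels and the softmax of the predicted scores. The feature values, labels, learning rate, and epoch count are hypothetical, and the project's Spark implementation is not reproduced; in a Spark setting, the per-query gradients could be computed in parallel over an RDD and summed.

```python
import math
from typing import List, Sequence, Tuple

Query = Tuple[List[List[float]], List[float]]  # (feature vectors, relevance labels)


def softmax(values: Sequence[float]) -> List[float]:
    """Top-one probabilities; subtracting the max keeps exp() stable."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def listnet_loss_and_grad(w: Sequence[float], features: List[List[float]],
                          labels: List[float]) -> Tuple[float, List[float]]:
    """Cross-entropy between label and score top-one distributions, plus its gradient."""
    p_true = softmax(labels)
    p_pred = softmax([score(w, x) for x in features])
    loss = -sum(pt * math.log(pp) for pt, pp in zip(p_true, p_pred))
    grad = [0.0] * len(w)
    for x, pt, pp in zip(features, p_true, p_pred):
        for k, xk in enumerate(x):
            grad[k] += (pp - pt) * xk  # d loss / d score_j = p_pred_j - p_true_j
    return loss, grad


def train_listnet(queries: List[Query], dim: int,
                  lr: float = 0.1, epochs: int = 300) -> List[float]:
    w = [0.0] * dim
    for _ in range(epochs):
        for features, labels in queries:
            _, grad = listnet_loss_and_grad(w, features, labels)
            w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def rank(w: Sequence[float], features: List[List[float]]) -> List[int]:
    """Candidate indices ordered by descending score."""
    return sorted(range(len(features)), key=lambda j: -score(w, features[j]))


# Hypothetical query with three candidate entities: [pagerank, click_rate].
features = [[0.8, 0.6], [0.8, 0.1], [0.3, 0.1]]
labels = [2.0, 1.0, 0.0]  # graded relevance
w = train_listnet([(features, labels)], dim=2)
print(w, rank(w, features))
```

Because the softmax is invariant to adding a constant to every score, ListNet learns to reproduce the relative gaps between relevance grades rather than their absolute values.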

    Multiple Models for Recommending Temporal Aspects of Entities

    Entity aspect recommendation is an emerging task in semantic search that helps users discover serendipitous and prominent information with respect to an entity; in previous work, salience (e.g., popularity) has been the most important factor. However, entity aspects are temporally dynamic and often driven by events happening over time. In such cases, aspect suggestion based solely on salience features can give unsatisfactory results, for two reasons. First, salience is often accumulated over a long time period and does not account for recency. Second, many aspects related to an event entity are strongly time-dependent. In this paper, we study the task of temporal aspect recommendation for a given entity, which aims to recommend the most relevant aspects while taking time into account in order to improve the search experience. We propose a novel event-centric ensemble ranking method that learns from multiple time- and type-dependent models and dynamically trades off salience and recency characteristics. Through extensive experiments on real-world query logs, we demonstrate that our method is robust and achieves better effectiveness than competitive baselines. Comment: In Proceedings of the 15th Extended Semantic Web Conference (ESWC 2018)
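To make the salience-versus-recency trade-off concrete, here is a deliberately simple sketch: salience is an aspect's long-term query count, recency is an exponentially time-decayed count, and the two are mixed with a fixed weight. This linear interpolation, the half-life, the weights, and the toy query log are all assumptions for illustration; the paper instead learns an event-centric ensemble of time- and type-dependent models.

```python
import math
from typing import Dict, List


def normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Scale scores so the largest is 1."""
    top = max(values.values())
    return {k: (v / top if top > 0 else 0.0) for k, v in values.items()}


def rank_aspects(query_ages: Dict[str, List[float]], weight: float,
                 half_life_days: float = 7.0) -> List[str]:
    """query_ages maps aspect -> ages (in days) of the queries that mention it.
    weight = 1.0 ranks purely by salience; weight = 0.0 purely by recency."""
    salience = normalize({a: float(len(ages)) for a, ages in query_ages.items()})
    decay = math.log(2) / half_life_days  # hypothetical exponential decay rate
    recency = normalize({a: sum(math.exp(-decay * age) for age in ages)
                         for a, ages in query_ages.items()})
    scores = {a: weight * salience[a] + (1 - weight) * recency[a] for a in query_ages}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical log for one entity: an evergreen aspect vs. an event-driven one.
log = {
    "biography": [300.0] * 50,                # many queries, all long ago
    "award ceremony": [0.0, 1.0, 1.0, 2.0],   # few queries, all recent
}
print(rank_aspects(log, weight=0.8))  # salience-heavy
print(rank_aspects(log, weight=0.2))  # recency-heavy
```

A salience-heavy weight keeps the evergreen aspect on top, while a recency-heavy weight promotes the event-driven one; a learned model would choose this balance per entity and time rather than fixing it by hand.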

    Improving Search Effectiveness through Query Log and Entity Mining

    The Web is the largest repository of knowledge in the world. Every day, people make it bigger by generating new web data. Data never sleeps: every minute, someone writes a new blog post, uploads a video, or comments on an article. People usually rely on Web Search Engines to satisfy their information needs: they formulate their needs as text queries and expect a list of highly relevant documents in response. Managing this massive volume of data while ensuring high quality and performance is the challenging topic we tackle in this thesis. In this dissertation, we focus on the Web of Data: a recent approach, originating from the Semantic Web community, that consists of a collective effort to augment the existing Web with semi-structured data. We propose to manage the data explosion by shifting from a retrieval model based on documents to a model enriched with entities, where an entity, described through semi-structured information, can represent a person, a product, a location, or a company. In our work, we combine the Web of Data with an important source of knowledge: query logs, which record the interactions between Web Search Engines and their users. Query log mining aims at extracting valuable knowledge that can be exploited to enhance users’ search experience. In line with this vision, this dissertation aims to improve Web Search Engines through the combined use of query logs and entities. The contributions of this work are the following. First, we show how historical usage data can be exploited to improve performance during the snippet generation process. Second, we propose a query recommender system that, by combining entities with queries, significantly improves the quality of the suggestions. Furthermore, we develop a new technique for estimating the relatedness between two entities, i.e., their semantic similarity.
Finally, we show that entities can be useful for automatically building explanatory statements that help the user understand whether, and why, a suggested item may be of interest to her.
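For context on entity relatedness, a classic link-based baseline is the Milne-Witten measure, which compares the sets of Wikipedia articles that link to each of the two entities. The sketch below implements that measure; it is not the new technique proposed in the dissertation, and the in-link sets and article count in the example are hypothetical.

```python
import math
from typing import Set


def milne_witten(in_links_a: Set[int], in_links_b: Set[int], total_articles: int) -> float:
    """Link-based relatedness in [0, 1]:
    1 - (log max(|A|,|B|) - log |A & B|) / (log |W| - log min(|A|,|B|)),
    where A and B are the articles linking to each entity and W is all articles."""
    common = len(in_links_a & in_links_b)
    if common == 0:
        return 0.0  # no shared in-links: treated as unrelated
    larger = max(len(in_links_a), len(in_links_b))
    smaller = min(len(in_links_a), len(in_links_b))
    denom = math.log(total_articles) - math.log(smaller)
    if denom <= 0:
        return 1.0  # both entities are linked from every article
    distance = (math.log(larger) - math.log(common)) / denom
    return max(0.0, 1.0 - distance)


# Hypothetical in-link sets (article ids) in a collection of 1,000 articles.
a = {1, 2, 3, 4}
b = {3, 4, 5, 6, 7, 8}
print(round(milne_witten(a, b, 1000), 4))
```

Measures like this rely only on link overlap, which is one reason later work combines them with other signals, such as the query-log evidence discussed in this dissertation.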

    Product Knowledge Graph Embedding for E-commerce

    In this paper, we propose a new product knowledge graph (PKG) embedding approach for learning intrinsic product relations as product knowledge for e-commerce. We define the key entities and summarize the pivotal product relations that are critical for general e-commerce applications, including marketing, advertising, search ranking, and recommendation. We first provide a comprehensive comparison between PKGs and ordinary knowledge graphs (KGs), and then illustrate why KG embedding methods are not suitable for PKG learning. We construct a self-attention-enhanced distributed representation learning model that learns PKG embeddings from raw customer activity data in an end-to-end fashion. We design an effective multi-task learning scheme to fully leverage the multi-modal e-commerce data. Poincaré embedding is also employed to handle complex entity structures. We use a real-world dataset from grocery.walmart.com to evaluate performance on knowledge completion, search ranking, and recommendation. The proposed approach compares favourably to baselines in knowledge completion and downstream tasks.
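The Poincaré embedding mentioned above places entities inside the open unit ball, where distances grow rapidly toward the boundary; this property is commonly cited as what makes it suitable for hierarchical, tree-like structures. A minimal sketch of the standard Poincaré-ball distance follows. How the paper combines it with its self-attention model and multi-task objective is not shown, and the example points are hypothetical.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Distance in the Poincare ball model:
    arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((x - y) ** 2 for x, y in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Hypothetical 2-D embeddings: a general category near the origin,
# two specific products near the boundary.
category = [0.0, 0.0]
product_a = [0.9, 0.0]
product_b = [0.0, 0.9]
print(poincare_distance(category, product_a))
print(poincare_distance(product_a, product_b))
```

Here the two products are about 5.2 apart, while each is only about 2.94 from the category at the origin, much as sibling leaves in a tree are connected through their shared parent.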