90 research outputs found
Time-aware topic recommendation based on micro-blogs
Topic recommendation can help users deal with information overload in micro-blogging communities. This paper proposes to use the implicit information network formed by the multiple relationships among users, topics, and micro-blogs, together with the temporal information of micro-blogs, to find semantically and temporally relevant topics for each topic and to profile users' time-drifting topic interests. Content-based, Nearest-Neighborhood-based, and Matrix Factorization models are used to make personalized recommendations. The effectiveness of the proposed approaches is demonstrated in experiments conducted on a real-world dataset collected from Twitter.com
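The matrix-factorization component mentioned above can be pictured with a small stdlib-only sketch. Everything here (hyperparameters, toy data, function names) is illustrative, not the paper's actual model: a user-topic interaction matrix is factored into latent user and topic vectors, and unseen topics are scored by the dot product of those vectors.

```python
import random

def factorize(R, k=2, steps=4000, lr=0.005, reg=0.02, seed=0):
    """SGD matrix factorization of R (None = unobserved) into P (users) and Q (topics)."""
    rng = random.Random(seed)
    n_users, n_topics = len(R), len(R[0])
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_topics)]
    for _ in range(steps):
        for u in range(n_users):
            for t in range(n_topics):
                if R[u][t] is None:          # skip unobserved interactions
                    continue
                err = R[u][t] - sum(P[u][f] * Q[t][f] for f in range(k))
                for f in range(k):           # SGD step with L2 regularization
                    pu, qt = P[u][f], Q[t][f]
                    P[u][f] += lr * (err * qt - reg * pu)
                    Q[t][f] += lr * (err * pu - reg * qt)
    return P, Q

def predict(P, Q, u, t):
    """Predicted interest of user u in topic t."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[t]))

# Toy data: rows = users, columns = topics, None = no interaction observed yet.
R = [[5, 3, None, 1],
     [4, None, None, 1],
     [1, 1, None, 5],
     [None, 1, 5, 4]]
P, Q = factorize(R)
```

Once fitted, `predict` scores the `None` cells, and the highest-scoring unseen topics become the recommendations for that user.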
Noise-tolerant approximate blocking for dynamic real-time entity resolution
Entity resolution is the process of identifying records in one or multiple data sources that represent the same real-world entity. This process needs to deal with noisy data that contain, for example, wrong pronunciations or spelling errors. Many real-world applications require rapid responses to entity queries on dynamic datasets. This poses challenges for existing approaches, which are mainly aimed at the batch matching of records in static data. Locality sensitive hashing (LSH) is an approximate blocking approach that hashes objects within a certain distance into the same block with high probability. How to make approximate blocking approaches scalable to large datasets and effective for real-time entity resolution remains an open question. Targeting this problem, we propose a noise-tolerant approximate blocking approach that indexes records based on their distance ranges using LSH and sorting trees within large-sized hash blocks. Experiments conducted on both synthetic and real-world datasets show the effectiveness of the proposed approach
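A minimal MinHash-LSH sketch illustrates the blocking idea the abstract relies on (this is a generic, permissive LSH configuration, not the paper's distance-range/sorting-tree method): records with similar character-shingle sets collide in at least one block with high probability, so a typo like "Jon" for "John" still lands the pair in a shared block.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def shingles(text, n=2):
    """Character bigrams of the lowercased, space-stripped text."""
    s = text.lower().replace(" ", "")
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def minhash_signature(tokens, num_hashes=20):
    """One min-hash per seeded hash function (md5 used as a stand-in family)."""
    return [min(int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16)
                for t in tokens)
            for i in range(num_hashes)]

def blocks(records, num_hashes=20):
    # One band per hash value: very permissive (high-recall) banding, so any
    # single matching MinHash makes two records block-mates.
    index = defaultdict(set)
    for rid, text in records.items():
        for i, h in enumerate(minhash_signature(shingles(text), num_hashes)):
            index[(i, h)].add(rid)
    return index

records = {1: "John Smith", 2: "Jon Smith", 3: "Mary Jones"}
index = blocks(records)
candidate_pairs = {pair for ids in index.values()
                   for pair in combinations(sorted(ids), 2)}
```

Only `candidate_pairs` proceed to detailed comparison, which is what makes blocking cheaper than comparing every record against every other.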
Semantic-aware blocking for entity resolution
In this paper, we propose a semantic-aware blocking framework for entity resolution (ER). The proposed framework is built on locality-sensitive hashing (LSH) techniques and efficiently unifies both textual and semantic features in the ER blocking process. In order to understand how similarity metrics may affect the effectiveness of ER blocking, we study the robustness of similarity metrics and their properties in terms of LSH families. Then, we present how the semantic similarity of records can be captured, measured, and integrated with LSH techniques over multiple similarity spaces. In doing so, the proposed framework can support efficient similarity searches on records in both the textual and semantic similarity spaces, yielding ER blocking with improved quality. We have evaluated the proposed framework on two real-world datasets and compared it with state-of-the-art blocking techniques. Our experimental study shows that the combination of semantic similarity and textual similarity can considerably improve the quality of blocking. Furthermore, due to the probabilistic nature of LSH, this semantic-aware blocking framework enables us to build fast and reliable blocking for performing entity resolution tasks in a large-scale data environment
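For the semantic similarity space, one standard LSH family is random-hyperplane (cosine) LSH, sketched below with toy vectors standing in for real record embeddings (the vectors, dimensions, and bit counts are invented for illustration, not taken from the paper): vectors on the same side of each random hyperplane share a signature bit, so semantically close records get signatures with small Hamming distance.

```python
import random

def hyperplanes(dim, n_bits, seed=42):
    """n_bits random Gaussian hyperplane normals in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    # One bit per hyperplane: which side of the plane the vector lies on.
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

def hamming(s, t):
    """Number of differing signature bits; small distance ~ small angle."""
    return sum(x != y for x, y in zip(s, t))

planes = hyperplanes(dim=3, n_bits=16)
a = [1.0, 0.9, 0.1]    # toy embedding, semantically close to b
b = [0.9, 1.0, 0.2]
c = [-1.0, 0.1, -0.9]  # toy embedding, far from both
sig_a, sig_b, sig_c = (signature(v, planes) for v in (a, b, c))
```

Bucketing records by signature prefixes (or bands of bits) then yields semantic blocks, which a framework like the one described can combine with textual MinHash blocks.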
DRprofiling: deep reinforcement user profiling for recommendations in heterogeneous information networks
Recommender systems are popular for personalization in online communities. Users, items, and other affiliated information such as tags, item genres, and user friends of an online community form a heterogeneous information network. User profiling is the foundation of personalized recommender systems. It provides the basis to discover knowledge about an individual user's interests in items. Typically, users are profiled with their direct explicit or implicit ratings, which ignores the inter-connections among users, items, and other entity nodes of the information network. This paper proposes a deep reinforcement user profiling approach for recommender systems. The user profiling process is framed as a sequential decision making problem which can be solved with a Reinforcement Learning (RL) agent. The RL agent interacts with the external heterogeneous information network environment and learns a decision making policy network to decide whether there is an interest or preference path between a user and an unobserved item. To effectively train the RL agent, this paper proposes a multi-iteration training process to combine both expert and data-specific knowledge to profile users, generate meta-paths, and make recommendations. The effectiveness of the proposed approaches is demonstrated in experiments conducted on three datasets
Augmenting visual information in knowledge graphs for recommendations
Knowledge graphs (KGs) have been widely used in recommender systems to leverage high-order connections between users and items. Typically, KGs are constructed based on semantic information derived from metadata. However, item images are also highly useful, especially in domains where visual factors are influential, such as fashion items. In this paper, we propose an approach to augment KGs with visual information extracted by widely used image feature extraction methods. Specifically, we introduce visually-augmented KGs where the extracted information is integrated by using visual factor entities and visual relations. Moreover, to leverage the augmented KGs, a user representation learning approach is proposed to learn hybrid user profiles that combine both semantic and visual preferences. The proposed approaches have been applied in top- recommendation tasks on two real-world datasets. The results show that the augmented KGs and the representation learning approach can improve the recommendation performance. They also show that the augmented KGs are applicable to state-of-the-art KG-based recommender systems as well
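The augmentation idea can be sketched with plain triples. All entity and relation names below are hypothetical (not the paper's schema): a semantic KG of (head, relation, tail) triples is extended with discrete "visual factor" entities, obtained in practice by clustering image-feature vectors, and items are linked to them through a visual relation.

```python
# Semantic KG: triples derived from item metadata (toy fashion example).
semantic_kg = [
    ("dress_1", "has_genre", "evening_wear"),
    ("dress_2", "has_genre", "evening_wear"),
]

# Pretend an image feature extractor + clustering step assigned each item a
# visual factor id; here the assignment is hard-coded for illustration.
item_visual_factor = {"dress_1": "visual_factor_7", "dress_2": "visual_factor_7"}

# Visually-augmented KG: visual factors become entities, linked by a visual relation.
augmented_kg = semantic_kg + [
    (item, "has_visual_factor", factor)
    for item, factor in sorted(item_visual_factor.items())
]
```

Because both items now connect to `visual_factor_7`, a KG-based recommender can exploit the new path `dress_1 → visual_factor_7 → dress_2` in addition to the genre path already present.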
Dynamic sorted neighborhood indexing for real-time entity resolution
Real-time Entity Resolution (ER) is the process of matching query records in subsecond time with records in a database that represent the same real-world entity. Indexing techniques are generally used to efficiently extract a set of candidate records from the database that are similar to a query record and that are to be compared with the query record in more detail. The sorted neighborhood indexing method, which sorts a database and compares records within a sliding window, has been successfully used for ER of large static databases. However, because it is based on static sorted arrays and is designed for batch ER that resolves all records in a database rather than resolving those relating to a single query record, this technique is not suitable for real-time ER on dynamic databases that are constantly updated. We propose a tree-based technique that facilitates dynamic indexing based on the sorted neighborhood method, which can be used for real-time ER, and investigate both static and adaptive window approaches. We propose an approach to reduce query matching times by precalculating the similarities between attribute values stored in neighboring tree nodes. We also propose a multitree solution where different sorting keys are used to reduce the effects of errors and variations in attribute values on matching quality by building several distinct index trees. We experimentally evaluate our proposed techniques on large real datasets, as well as on synthetic data with different data quality characteristics. Our results show that as the index grows, no appreciable increase occurs in either record insertion or query times, and that using multiple trees gives noticeable improvements in matching quality with only a small increase in query time. Compared to earlier indexing techniques for real-time ER, our approach achieves significantly reduced indexing and query matching times while maintaining high matching accuracy
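The core sorted-neighborhood mechanic can be sketched with a sorted list standing in for the paper's index tree (a real implementation would use a balanced tree; the class, keys, and window size here are illustrative only): records stay sorted by a blocking key, inserts keep the order, and a query is compared only against records within a window around its sort position.

```python
import bisect

class SortedNeighborhoodIndex:
    """Toy dynamic sorted-neighborhood index over (sorting_key, record_id) pairs."""

    def __init__(self, window=2):
        self.window = window
        self.entries = []  # kept sorted by key at all times

    def insert(self, key, record_id):
        # O(n) list insert here; a tree makes this O(log n) in practice.
        bisect.insort(self.entries, (key, record_id))

    def query(self, key):
        # Locate where the query key would sort, then return record ids within
        # `window` positions on either side as match candidates.
        pos = bisect.bisect_left(self.entries, (key, ""))
        lo = max(0, pos - self.window)
        hi = min(len(self.entries), pos + self.window)
        return [rid for _, rid in self.entries[lo:hi]]

idx = SortedNeighborhoodIndex(window=2)
for key, rid in [("smith", "r1"), ("smyth", "r2"),
                 ("jones", "r3"), ("smithe", "r4")]:
    idx.insert(key, rid)
candidates = idx.query("smith")
```

Because only the window is examined, query cost stays roughly constant as the index grows, which is the property the abstract's experiments measure.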
UoR at SemEval-2021 task 4: using pre-trained BERT Token embeddings for question answering of abstract meaning
Most question answering tasks focus on predicting concrete answers, e.g., named entities. These tasks can normally be accomplished by understanding the context, with no additional information required. The Reading Comprehension of Abstract Meaning (ReCAM) task instead introduces abstract answers. To understand abstract meanings in context, additional knowledge is essential. In this paper, we propose an approach that leverages pre-trained BERT token embeddings as a prior knowledge resource. According to the results, our approach using pre-trained BERT outperformed the baselines. This shows that pre-trained BERT token embeddings can be used as additional knowledge for understanding abstract meanings in question answering
Health claims unpacked: a toolkit to enhance the communication of health claims for food
Health claims are statements on food product packaging about a product's nutritional content and the benefits of that nutrition. Consumers in different European contexts often have difficulty understanding health claims, leading to increased confusion about, and decreased trust in, the food they buy.
Focusing on this problem, we develop a toolkit for improving the communication of health claims to consumers. The toolkit provides (1) interactive activities that disseminate knowledge about health claims to the public, and (2) an NLP-based analysis and prediction engine that food manufacturers can use to estimate how consumers will respond to the health claims they have created.
By using the AI-powered toolkit, consumers, manufacturers, and food safety regulators are engaged in determining the different linguistic and cultural barriers to the effective communication of health claims and formulating solutions that can be implemented on multiple levels, including regulation, enforcement, marketing, and consumer education
UoR at SemEval-2021 task 7: utilizing pre-trained DistilBert model and multi-scale CNN for humor detection
Humor detection is an interesting but difficult task in NLP. Humor may not be obvious in text because it can be embedded in context, hide behind the literal meaning of a phrase, and require prior knowledge to understand. We explored different shallow and deep methods to create a humor detection classifier for task 7-1a. Models such as Logistic Regression, LSTM, MLP, and CNN were used, and pre-trained models such as DistilBert were introduced to generate accurate vector representations of the textual data. We focused on applying a multi-scale strategy in modelling and compared different models. Our best model, DistilBert+MultiScale CNN, uses CNN kernels of different sizes to obtain features at multiple scales. This method achieved a 93.7% F1-score and 92.1% accuracy on the test set
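The multi-scale idea can be illustrated without a deep-learning framework (this toy uses 1-D scalar sequences and hand-picked kernels, not the paper's learned DistilBert+CNN model): kernels of several widths slide over the sequence, each feature map is max-pooled, and the pooled values are concatenated into one feature vector.

```python
def conv1d(seq, kernel):
    """Valid 1-D convolution (really cross-correlation) of seq with kernel."""
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, seq[i:i + k]))
            for i in range(len(seq) - k + 1)]

def multi_scale_features(seq, kernels):
    # One max-pooled feature per kernel size; their concatenation is the
    # multi-scale feature vector fed to the classifier.
    return [max(conv1d(seq, kernel)) for kernel in kernels]

# 1-D stand-in for a sequence of token-embedding values.
sequence = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8]
kernels = [[1.0, 1.0],        # bigram-scale kernel
           [1.0, 1.0, 1.0]]   # trigram-scale kernel
features = multi_scale_features(sequence, kernels)
```

In the real model the kernels are learned and operate over embedding matrices, but the width-per-kernel structure, max-pooling, and concatenation are the same.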
UoR at SemEval-2021 task 12: on crowd annotations: learning with disagreements to optimise crowd truth
Crowdsourcing has been ubiquitously used for annotating enormous collections of data. However, the major obstacles to using crowd-sourced labels are the noise and errors from non-expert annotations. In this work, two approaches for dealing with the noise and errors in crowd-sourced labels are proposed. The first approach uses Sharpness-Aware Minimization (SAM), an optimization technique robust to noisy labels. The other approach leverages a neural network layer called the crowd layer, specifically designed to learn from crowd-sourced annotations. According to the results, the proposed approaches can improve the performance of the Wide Residual Network model and the Multi-layer Perceptron model applied on two crowd-sourced datasets in the image processing domain
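A toy illustration of the annotator-noise problem these approaches target (this is a simple reliability-weighted vote, not SAM or the crowd layer; the data and annotator names are invented): each annotator has an unknown reliability, and estimating it from agreement with a provisional majority vote lets unreliable annotators be down-weighted.

```python
from collections import Counter

votes = {  # item -> {annotator: label}; a3 is a deliberately unreliable annotator
    "img1": {"a1": "cat", "a2": "cat", "a3": "dog"},
    "img2": {"a1": "dog", "a2": "dog", "a3": "cat"},
    "img3": {"a1": "cat", "a2": "dog", "a3": "dog"},
}

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

# Step 1: provisional labels from a plain majority vote.
provisional = {item: majority(v.values()) for item, v in votes.items()}

# Step 2: annotator reliability = agreement rate with the provisional labels.
reliability = {}
for ann in {a for v in votes.values() for a in v}:
    pairs = [(v[ann], provisional[item]) for item, v in votes.items() if ann in v]
    reliability[ann] = sum(x == y for x, y in pairs) / len(pairs)

# Step 3: final labels from a reliability-weighted vote.
def weighted_label(item):
    scores = Counter()
    for ann, label in votes[item].items():
        scores[label] += reliability[ann]
    return scores.most_common(1)[0][0]
```

The crowd layer pushes the same intuition into the network itself, learning a per-annotator noise model jointly with the classifier instead of computing fixed weights up front.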
- …