90 research outputs found
Time-aware topic recommendation based on micro-blogs
Topic recommendation can help users deal with information overload in micro-blogging communities. This paper proposes to use the implicit information network formed by the multiple relationships among users, topics, and micro-blogs, together with the temporal information of micro-blogs, to find semantically and temporally relevant topics for each topic and to profile users' time-drifting topic interests. Content-based, Nearest-Neighborhood-based, and Matrix Factorization models are used to make personalized recommendations. The effectiveness of the proposed approaches is demonstrated in experiments conducted on a real-world dataset collected from Twitter.com
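The matrix-factorization component mentioned above can be pictured with a small stdlib-only sketch. Everything here (hyperparameters, toy data, function names) is illustrative, not the paper's actual model: a user-topic interaction matrix is factored into latent user and topic vectors, and unseen topics are scored by the dot product of those vectors.

```python
import random

def factorize(R, k=2, steps=4000, lr=0.005, reg=0.02, seed=0):
    """SGD matrix factorization of R (None = unobserved) into P (users) and Q (topics)."""
    rng = random.Random(seed)
    n_users, n_topics = len(R), len(R[0])
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_topics)]
    for _ in range(steps):
        for u in range(n_users):
            for t in range(n_topics):
                if R[u][t] is None:          # skip unobserved interactions
                    continue
                err = R[u][t] - sum(P[u][f] * Q[t][f] for f in range(k))
                for f in range(k):           # SGD step with L2 regularization
                    pu, qt = P[u][f], Q[t][f]
                    P[u][f] += lr * (err * qt - reg * pu)
                    Q[t][f] += lr * (err * pu - reg * qt)
    return P, Q

def predict(P, Q, u, t):
    """Predicted interest of user u in topic t."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[t]))

# Toy data: rows = users, columns = topics, None = no interaction observed yet.
R = [[5, 3, None, 1],
     [4, None, None, 1],
     [1, 1, None, 5],
     [None, 1, 5, 4]]
P, Q = factorize(R)
```

Once fitted, `predict` scores the `None` cells, and the highest-scoring unseen topics become the recommendations for that user.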
Noise-tolerant approximate blocking for dynamic real-time entity resolution
Entity resolution is the process of identifying records in one or multiple data sources that represent the same real-world entity. This process needs to deal with noisy data that contain, for example, wrong pronunciations or spelling errors. Many real-world applications require rapid responses to entity queries on dynamic datasets. This poses challenges for existing approaches, which are mainly aimed at the batch matching of records in static data. Locality sensitive hashing (LSH) is an approximate blocking approach that hashes objects within a certain distance into the same block with high probability. How to make approximate blocking approaches scalable to large datasets and effective for real-time entity resolution remains an open question. Targeting this problem, we propose a noise-tolerant approximate blocking approach that indexes records based on their distance ranges using LSH and sorting trees within large-sized hash blocks. Experiments conducted on both synthetic and real-world datasets show the effectiveness of the proposed approach
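A minimal MinHash-LSH sketch illustrates the blocking idea the abstract relies on (this is a generic, permissive LSH configuration, not the paper's distance-range/sorting-tree method): records with similar character-shingle sets collide in at least one block with high probability, so a typo like "Jon" for "John" still lands the pair in a shared block.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def shingles(text, n=2):
    """Character bigrams of the lowercased, space-stripped text."""
    s = text.lower().replace(" ", "")
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def minhash_signature(tokens, num_hashes=20):
    """One min-hash per seeded hash function (md5 used as a stand-in family)."""
    return [min(int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16)
                for t in tokens)
            for i in range(num_hashes)]

def blocks(records, num_hashes=20):
    # One band per hash value: very permissive (high-recall) banding, so any
    # single matching MinHash makes two records block-mates.
    index = defaultdict(set)
    for rid, text in records.items():
        for i, h in enumerate(minhash_signature(shingles(text), num_hashes)):
            index[(i, h)].add(rid)
    return index

records = {1: "John Smith", 2: "Jon Smith", 3: "Mary Jones"}
index = blocks(records)
candidate_pairs = {pair for ids in index.values()
                   for pair in combinations(sorted(ids), 2)}
```

Only `candidate_pairs` proceed to detailed comparison, which is what makes blocking cheaper than comparing every record against every other.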
Semantic-aware blocking for entity resolution
In this paper, we propose a semantic-aware blocking framework for entity resolution (ER). The proposed framework is built on locality-sensitive hashing (LSH) techniques and efficiently unifies both textual and semantic features in the ER blocking process. In order to understand how similarity metrics may affect the effectiveness of ER blocking, we study the robustness of similarity metrics and their properties in terms of LSH families. Then, we present how the semantic similarity of records can be captured, measured, and integrated with LSH techniques over multiple similarity spaces. In doing so, the proposed framework can support efficient similarity searches on records in both the textual and semantic similarity spaces, yielding ER blocking with improved quality. We have evaluated the proposed framework on two real-world datasets and compared it with state-of-the-art blocking techniques. Our experimental study shows that the combination of semantic similarity and textual similarity can considerably improve the quality of blocking. Furthermore, due to the probabilistic nature of LSH, this semantic-aware blocking framework enables us to build fast and reliable blocking for performing entity resolution tasks in a large-scale data environment
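For the semantic similarity space, one standard LSH family is random-hyperplane (cosine) LSH, sketched below with toy vectors standing in for real record embeddings (the vectors, dimensions, and bit counts are invented for illustration, not taken from the paper): vectors on the same side of each random hyperplane share a signature bit, so semantically close records get signatures with small Hamming distance.

```python
import random

def hyperplanes(dim, n_bits, seed=42):
    """n_bits random Gaussian hyperplane normals in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    # One bit per hyperplane: which side of the plane the vector lies on.
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

def hamming(s, t):
    """Number of differing signature bits; small distance ~ small angle."""
    return sum(x != y for x, y in zip(s, t))

planes = hyperplanes(dim=3, n_bits=16)
a = [1.0, 0.9, 0.1]    # toy embedding, semantically close to b
b = [0.9, 1.0, 0.2]
c = [-1.0, 0.1, -0.9]  # toy embedding, far from both
sig_a, sig_b, sig_c = (signature(v, planes) for v in (a, b, c))
```

Bucketing records by signature prefixes (or bands of bits) then yields semantic blocks, which a framework like the one described can combine with textual MinHash blocks.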
DRprofiling: deep reinforcement user profiling for recommendations in heterogeneous information networks
Recommender systems are popular for personalization in online communities. Users, items, and other affiliated information such as tags, item genres, and user friends of an online community form a heterogeneous information network. User profiling is the foundation of personalized recommender systems. It provides the basis to discover knowledge about an individual user's interests in items. Typically, users are profiled with their direct explicit or implicit ratings, which ignores the inter-connections among users, items, and other entity nodes of the information network. This paper proposes a deep reinforcement user profiling approach for recommender systems. The user profiling process is framed as a sequential decision making problem which can be solved with a Reinforcement Learning (RL) agent. The RL agent interacts with the external heterogeneous information network environment and learns a decision making policy network to decide whether there is an interest or preference path between a user and an unobserved item. To effectively train the RL agent, this paper proposes a multi-iteration training process to combine both expert and data-specific knowledge to profile users, generate meta-paths, and make recommendations. The effectiveness of the proposed approaches is demonstrated in experiments conducted on three datasets
Augmenting visual information in knowledge graphs for recommendations
Knowledge graphs (KGs) have been widely used in recommender systems to leverage high-order connections between users and items. Typically, KGs are constructed based on semantic information derived from metadata. However, item images are also highly useful, especially in domains where visual factors are influential, such as fashion items. In this paper, we propose an approach to augment KGs with visual information extracted by widely used image feature extraction methods. Specifically, we introduce visually-augmented KGs where the extracted information is integrated by using visual factor entities and visual relations. Moreover, to leverage the augmented KGs, a user representation learning approach is proposed to learn hybrid user profiles that combine both semantic and visual preferences. The proposed approaches have been applied in top- recommendation tasks on two real-world datasets. The results show that the augmented KGs and the representation learning approach can improve the recommendation performance. They also show that the augmented KGs are applicable to state-of-the-art KG-based recommender systems as well
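The augmentation idea can be sketched with plain triples. All entity and relation names below are hypothetical (not the paper's schema): a semantic KG of (head, relation, tail) triples is extended with discrete "visual factor" entities, obtained in practice by clustering image-feature vectors, and items are linked to them through a visual relation.

```python
# Semantic KG: triples derived from item metadata (toy fashion example).
semantic_kg = [
    ("dress_1", "has_genre", "evening_wear"),
    ("dress_2", "has_genre", "evening_wear"),
]

# Pretend an image feature extractor + clustering step assigned each item a
# visual factor id; here the assignment is hard-coded for illustration.
item_visual_factor = {"dress_1": "visual_factor_7", "dress_2": "visual_factor_7"}

# Visually-augmented KG: visual factors become entities, linked by a visual relation.
augmented_kg = semantic_kg + [
    (item, "has_visual_factor", factor)
    for item, factor in sorted(item_visual_factor.items())
]
```

Because both items now connect to `visual_factor_7`, a KG-based recommender can exploit the new path `dress_1 → visual_factor_7 → dress_2` in addition to the genre path already present.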
Dynamic sorted neighborhood indexing for real-time entity resolution
Real-time Entity Resolution (ER) is the process of matching query records in subsecond time with records in a database that represent the same real-world entity. Indexing techniques are generally used to efficiently extract a set of candidate records from the database that are similar to a query record and that are to be compared with the query record in more detail. The sorted neighborhood indexing method, which sorts a database and compares records within a sliding window, has been successfully used for ER of large static databases. However, because it is based on static sorted arrays and is designed for batch ER that resolves all records in a database rather than resolving those relating to a single query record, this technique is not suitable for real-time ER on dynamic databases that are constantly updated. We propose a tree-based technique that facilitates dynamic indexing based on the sorted neighborhood method, which can be used for real-time ER, and investigate both static and adaptive window approaches. We propose an approach to reduce query matching times by precalculating the similarities between attribute values stored in neighboring tree nodes. We also propose a multitree solution where different sorting keys are used to reduce the effects of errors and variations in attribute values on matching quality by building several distinct index trees. We experimentally evaluate our proposed techniques on large real datasets, as well as on synthetic data with different data quality characteristics. Our results show that as the index grows, no appreciable increase occurs in either record insertion or query times, and that using multiple trees gives noticeable improvements in matching quality with only a small increase in query time. Compared to earlier indexing techniques for real-time ER, our approach achieves significantly reduced indexing and query matching times while maintaining high matching accuracy
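The core sorted-neighborhood mechanic can be sketched with a sorted list standing in for the paper's index tree (a real implementation would use a balanced tree; the class, keys, and window size here are illustrative only): records stay sorted by a blocking key, inserts keep the order, and a query is compared only against records within a window around its sort position.

```python
import bisect

class SortedNeighborhoodIndex:
    """Toy dynamic sorted-neighborhood index over (sorting_key, record_id) pairs."""

    def __init__(self, window=2):
        self.window = window
        self.entries = []  # kept sorted by key at all times

    def insert(self, key, record_id):
        # O(n) list insert here; a tree makes this O(log n) in practice.
        bisect.insort(self.entries, (key, record_id))

    def query(self, key):
        # Locate where the query key would sort, then return record ids within
        # `window` positions on either side as match candidates.
        pos = bisect.bisect_left(self.entries, (key, ""))
        lo = max(0, pos - self.window)
        hi = min(len(self.entries), pos + self.window)
        return [rid for _, rid in self.entries[lo:hi]]

idx = SortedNeighborhoodIndex(window=2)
for key, rid in [("smith", "r1"), ("smyth", "r2"),
                 ("jones", "r3"), ("smithe", "r4")]:
    idx.insert(key, rid)
candidates = idx.query("smith")
```

Because only the window is examined, query cost stays roughly constant as the index grows, which is the property the abstract's experiments measure.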
UoR at SemEval-2021 task 4: using pre-trained BERT Token embeddings for question answering of abstract meaning
Most question answering tasks focus on predicting concrete answers, e.g., named entities. These tasks can normally be accomplished by understanding the context, with no additional information required. The Reading Comprehension of Abstract Meaning (ReCAM) task instead introduces abstract answers. To understand abstract meanings in context, additional knowledge is essential. In this paper, we propose an approach that leverages pre-trained BERT token embeddings as a prior knowledge resource. According to the results, our approach using pre-trained BERT outperformed the baselines. This shows that pre-trained BERT token embeddings can be used as additional knowledge for understanding abstract meanings in question answering
Health claims unpacked: a toolkit to enhance the communication of health claims for food
Health claims are statements on food product packaging about a product's nutritional content and the benefits of that nutrition. Consumers in different European contexts often have difficulty understanding health claims, leading to increased confusion about, and decreased trust in, the food they buy.
Focusing on this problem, we develop a toolkit for improving the communication of health claims to consumers. The toolkit provides (1) interactive activities that disseminate knowledge about health claims to the public, and (2) an NLP-based analysis and prediction engine that food manufacturers can use to estimate how consumers will respond to the health claims they have created.
By using the AI-powered toolkit, consumers, manufacturers, and food safety regulators are engaged in determining the different linguistic and cultural barriers to the effective communication of health claims and formulating solutions that can be implemented on multiple levels, including regulation, enforcement, marketing, and consumer education
UoR at SemEval-2021 task 7: utilizing pre-trained DistilBert model and multi-scale CNN for humor detection
Humor detection is an interesting but difficult task in NLP. Humor may not be obvious in text because it can be embedded in context, hide behind the literal meaning of a phrase, and require prior knowledge to understand. We explored different shallow and deep methods to create a humor detection classifier for task 7-1a. Models such as Logistic Regression, LSTM, MLP, and CNN were used, and pre-trained models such as DistilBert were introduced to generate accurate vector representations of the textual data. We focused on applying a multi-scale strategy in modelling and compared different models. Our best model, DistilBert+MultiScale CNN, uses CNN kernels of different sizes to obtain features at multiple scales. This method achieved a 93.7% F1-score and 92.1% accuracy on the test set
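The multi-scale idea can be illustrated without a deep-learning framework (this toy uses 1-D scalar sequences and hand-picked kernels, not the paper's learned DistilBert+CNN model): kernels of several widths slide over the sequence, each feature map is max-pooled, and the pooled values are concatenated into one feature vector.

```python
def conv1d(seq, kernel):
    """Valid 1-D convolution (really cross-correlation) of seq with kernel."""
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, seq[i:i + k]))
            for i in range(len(seq) - k + 1)]

def multi_scale_features(seq, kernels):
    # One max-pooled feature per kernel size; their concatenation is the
    # multi-scale feature vector fed to the classifier.
    return [max(conv1d(seq, kernel)) for kernel in kernels]

# 1-D stand-in for a sequence of token-embedding values.
sequence = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8]
kernels = [[1.0, 1.0],        # bigram-scale kernel
           [1.0, 1.0, 1.0]]   # trigram-scale kernel
features = multi_scale_features(sequence, kernels)
```

In the real model the kernels are learned and operate over embedding matrices, but the width-per-kernel structure, max-pooling, and concatenation are the same.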
UoR at SemEval-2021 task 12: on crowd annotations: learning with disagreements to optimise crowd truth
Crowdsourcing has been ubiquitously used for annotating enormous collections of data. However, the major obstacles to using crowd-sourced labels are the noise and errors from non-expert annotations. In this work, two approaches for dealing with the noise and errors in crowd-sourced labels are proposed. The first approach uses Sharpness-Aware Minimization (SAM), an optimization technique robust to noisy labels. The other approach leverages a neural network layer called the crowd layer, specifically designed to learn from crowd-sourced annotations. According to the results, the proposed approaches can improve the performance of the Wide Residual Network model and the Multi-layer Perceptron model applied on two crowd-sourced datasets in the image processing domain
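A toy illustration of the annotator-noise problem these approaches target (this is a simple reliability-weighted vote, not SAM or the crowd layer; the data and annotator names are invented): each annotator has an unknown reliability, and estimating it from agreement with a provisional majority vote lets unreliable annotators be down-weighted.

```python
from collections import Counter

votes = {  # item -> {annotator: label}; a3 is a deliberately unreliable annotator
    "img1": {"a1": "cat", "a2": "cat", "a3": "dog"},
    "img2": {"a1": "dog", "a2": "dog", "a3": "cat"},
    "img3": {"a1": "cat", "a2": "dog", "a3": "dog"},
}

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

# Step 1: provisional labels from a plain majority vote.
provisional = {item: majority(v.values()) for item, v in votes.items()}

# Step 2: annotator reliability = agreement rate with the provisional labels.
reliability = {}
for ann in {a for v in votes.values() for a in v}:
    pairs = [(v[ann], provisional[item]) for item, v in votes.items() if ann in v]
    reliability[ann] = sum(x == y for x, y in pairs) / len(pairs)

# Step 3: final labels from a reliability-weighted vote.
def weighted_label(item):
    scores = Counter()
    for ann, label in votes[item].items():
        scores[label] += reliability[ann]
    return scores.most_common(1)[0][0]
```

The crowd layer pushes the same intuition into the network itself, learning a per-annotator noise model jointly with the classifier instead of computing fixed weights up front.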
- …