140 research outputs found

Click-aware purchase prediction with push at the top

Author: Kim Donghyun
Lee Jung-Tae
Park Chanyoung
Yang Min-Chul
Yu Hwanjo
Publication venue: 'Elsevier BV'
Publication date: 28/05/2020

Eliciting user preferences from purchase records for purchase prediction is challenging because negative feedback is not explicitly observed, and because treating all non-purchased items equally as negative feedback is unrealistic. Therefore, in this study, we present a framework that leverages users' past click records to compensate for the user-item interactions missing from purchase records, i.e., non-purchased items. We begin by formulating various model assumptions, each assuming a different order of user preferences among purchased, clicked-but-not-purchased, and non-clicked items, to study the usefulness of click records. We implement the model assumptions using the Bayesian personalized ranking model, which maximizes the area under the curve for bipartite ranking. However, we argue that using click records for bipartite ranking requires a meticulously designed model because click records are less reliable than purchase records. Therefore, we ultimately propose a novel learning-to-rank method, called P3STop, for purchase prediction. The proposed model is made robust to relatively unreliable click records by focusing on the accuracy of top-ranked items. Experimental results on two real-world e-commerce datasets demonstrate that P3STop considerably outperforms state-of-the-art implicit-feedback-based recommendation methods, especially for top-ranked items.

Comment: For the final published journal version, see https://doi.org/10.1016/j.ins.2020.02.06
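The pairwise idea behind Bayesian personalized ranking, as referenced in the abstract above, can be illustrated with a minimal sketch. It assumes one hypothetical preference ordering (purchased > clicked-but-not-purchased > non-clicked) and toy scores; the function names and data are illustrative and are not taken from the paper, and this is not the P3STop method itself.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bpr_loss(score_pos: float, score_neg: float) -> float:
    """Negative log-likelihood that the preferred item outranks the other one."""
    return -math.log(sigmoid(score_pos - score_neg))


def preference_pairs(purchased, clicked_only, non_clicked):
    """Build (preferred, less preferred) pairs under one assumed ordering:
    purchased > clicked-but-not-purchased > non-clicked (an assumption)."""
    pairs = [(p, c) for p in purchased for c in clicked_only]
    pairs += [(p, n) for p in purchased for n in non_clicked]
    pairs += [(c, n) for c in clicked_only for n in non_clicked]
    return pairs


# Toy model scores for a single user (hypothetical values).
scores = {"A": 2.0, "B": 0.5, "C": -1.0}
pairs = preference_pairs(["A"], ["B"], ["C"])
total_loss = sum(bpr_loss(scores[i], scores[j]) for i, j in pairs)
```

Swapping the order of the three groups in `preference_pairs` gives a different model assumption of the kind the abstract describes; the loss itself stays the same.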

arXiv.org e-Print Archive

Pohang University of Science and Technology (POSTECH)

Obtaining Calibrated Probabilities with Personalized Ranking Models

Author: Kang SeongKu
Kweon Wonbin
Yu Hwanjo
Publication date: 22/02/2022

For personalized ranking models, a well-calibrated probability of an item being preferred by a user has great practical value. While existing work shows promising results in image classification, probability calibration has received little attention for personalized ranking. In this paper, we aim to estimate the calibrated probability that a user will prefer an item. We investigate various parametric distributions and propose two parametric calibration methods, namely Gaussian calibration and Gamma calibration. Each proposed method can be seen as a post-processing function that maps the ranking scores of pre-trained models to well-calibrated preference probabilities without affecting recommendation performance. We also design an unbiased empirical risk minimization framework that guides the calibration methods toward learning the true preference probability from a biased user-item interaction dataset. Extensive evaluations with various personalized ranking models on real-world datasets show that both the proposed calibration methods and the unbiased empirical risk minimization significantly improve calibration performance.

Comment: AAAI 2022 Ora
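To illustrate why a post-processing calibration function need not change the recommendation order, here is a minimal sketch that uses Platt scaling (a logistic map) as a stand-in. This is not the Gaussian or Gamma calibration proposed in the paper, and the parameters `a` and `b` are hypothetical values rather than fitted ones.

```python
import math


def platt_scale(score: float, a: float, b: float) -> float:
    """Map a raw ranking score to a probability with a logistic function."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def ranking(values):
    """Return item indices sorted from highest to lowest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


raw_scores = [3.2, -0.7, 1.5, 0.1]   # toy scores from a pre-trained ranker
a, b = 0.8, -0.5                     # hypothetical parameters; a > 0 keeps the map increasing
probs = [platt_scale(s, a, b) for s in raw_scores]

# A strictly increasing map leaves the item order untouched.
same_order = ranking(raw_scores) == ranking(probs)
```

Any strictly increasing mapping from scores to probabilities has this property, which is why calibration can be applied after training without altering the ranked list.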

arXiv.org e-Print Archive

Pohang University of Science and Technology (POSTECH)

Mining behavior graphs for "backtrace" of noncrashing bugs

Author: Chao Liu
Hwanjo Yu
Jiawei Han
Philip S. Yu
Xifeng Yan
Publication date: 01/01/2005

Analyzing the executions of a buggy software program is essentially a data mining process. Although many interesting methods have been developed to trace crashing bugs (such as memory violations and core dumps), it is still difficult to analyze noncrashing bugs (such as logical errors). In this paper, we develop a novel method to classify the structured traces of program executions using software behavior graphs. By analyzing the correct and incorrect executions, we have made good progress in isolating program regions that may lead to faulty executions. The classification framework is built on an integration of closed graph mining and SVM classification. More interestingly, suspicious regions are identified by capturing the change in classification accuracy, which is measured incrementally during program execution. Our performance study and case-based experiments show that our approach is both effective and efficient.

CiteSeerX

Crossref

Pohang University of Science and Technology (POSTECH)

Processing SPARQL queries with regular expressions in RDF databases

Author: Han WS
Hune cho
Jeong-Hoon Lee
Lee J
Lee J
Minh-Duc Pham
YU HWANJO
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Background: As the Resource Description Framework (RDF) data model is widely used for modeling and sharing many online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) or Bio2RDF (bio2rdf.org), SPARQL, the W3C-recommended query language for RDF databases, has become an important language for querying bioinformatics knowledge bases. Moreover, because users' information requests are diverse and users often do not know the exact value of each fact in the RDF databases, it is desirable to query RDF data using SPARQL queries with regular expression patterns. To the best of our knowledge, there is currently no work that efficiently supports regular expression processing in SPARQL over RDF databases. Most existing techniques for processing regular expressions are designed for querying a text corpus, or only support matching over the paths in an RDF graph. Results: In this paper, we propose a novel framework for supporting regular expression processing in SPARQL queries. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model in order to adapt the proposed framework to existing query optimizers. 3) We build a prototype of the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Conclusions: Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.
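For readers unfamiliar with regular expression patterns in SPARQL, the sketch below mimics what a query such as `SELECT ?s ?o WHERE { ?s rdfs:label ?o . FILTER regex(?o, "^hemoglobin", "i") }` returns, using a naive full scan over an in-memory list of triples. The triples, prefixes, and function name are hypothetical, and Python's `re` dialect differs in places from the one SPARQL uses; this is the unoptimized baseline behavior, not the framework described in the abstract.

```python
import re

# Toy RDF-like triples (subject, predicate, object); all values are made up.
triples = [
    ("ex:P1", "rdfs:label", "Hemoglobin subunit alpha"),
    ("ex:P2", "rdfs:label", "Hemoglobin subunit beta"),
    ("ex:P3", "rdfs:label", "Insulin"),
    ("ex:P1", "ex:organism", "Homo sapiens"),
]


def match_objects(triples, predicate, pattern, flags=0):
    """Scan every triple and keep those whose object matches the pattern."""
    rx = re.compile(pattern, flags)
    return [(s, o) for s, p, o in triples if p == predicate and rx.search(o)]


hits = match_objects(triples, "rdfs:label", r"^hemoglobin", re.IGNORECASE)
```

A scan like this touches every candidate literal, which is the cost that an index-aware or optimizer-integrated approach would aim to avoid.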

Pohang University of Science and Technology (POSTECH)

Learning Topology-Specific Experts for Molecular Property Prediction

Author: Kang SeongKu
Kim Su
Lee Dongha
Lee Seonghyeon
Yu Hwanjo
Publication date: 11/03/2023

Recently, graph neural networks (GNNs) have been successfully applied to predicting molecular properties, which is one of the most classical cheminformatics tasks with various applications. Despite their effectiveness, we empirically observe that training a single GNN model for diverse molecules with distinct structural patterns limits its prediction performance. In this paper, motivated by this observation, we propose TopExpert to leverage topology-specific prediction models (referred to as experts), each of which is responsible for one molecular group sharing similar topological semantics. That is, each expert learns topology-specific discriminative features while being trained with its corresponding topological group. To tackle the key challenge of grouping molecules by their topological patterns, we introduce a clustering-based gating module that assigns an input molecule to one of the clusters, and we further optimize the gating module with two different types of self-supervision: topological semantics induced by GNNs and molecular scaffolds. Extensive experiments demonstrate that TopExpert improves molecular property prediction performance and generalizes better than baselines to new molecules with unseen scaffolds. The code is available at https://github.com/kimsu55/ToxExpert.

Comment: 11 pages with 8 figure

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Smartphone dependence classification using tensor factorization

Author: Dai-Jin Kim
Hwanjo Yu
In Hye Yook
In Young Choi
Jingyun Choi
Mi Jung Rho
Yejin Kim
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/02/2019

Excessive smartphone use causes personal and social problems. To address this issue, we sought to derive usage patterns that were directly correlated with smartphone dependence based on usage data. This study attempted to classify smartphone dependence using a data-driven prediction algorithm. We developed a mobile application to collect smartphone usage data. A total of 41,683 logs of 48 smartphone users were collected from March 8, 2015, to January 8, 2016. The participants were classified into the control group (SUC) or the addiction group (SUD) using the Korean Smartphone Addiction Proneness Scale for Adults (S-Scale) and a face-to-face offline interview by a psychiatrist and a clinical psychologist (SUC = 23 and SUD = 25). We derived usage patterns using tensor factorization and found the following six optimal usage patterns: 1) social networking services (SNS) during daytime, 2) web surfing, 3) SNS at night, 4) mobile shopping, 5) entertainment, and 6) gaming at night. The membership vectors of the six patterns obtained a significantly better prediction performance than the raw data. For all patterns, the usage times of the SUD were much longer than those of the SUC. From our findings, we concluded that usage patterns and membership vectors were effective tools to assess and predict smartphone dependence and could provide an intervention guideline to predict and treat smartphone dependence based on usage data.

Pohang University of Science and Technology (POSTECH)

Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS

Author: B Suomela
C Burges
C Sneiderman
D States
F Radlinski
G Poulter
G Salton
H Oh
H Yu
H Yu
Hwanjo Yu
Ilhwan Ko
J Xu
Jinoh Oh
L Murphy
M Siadaty
Sungchul Kim
T Joachims
T Joachims
T Qin
Taehoon Kim
V Cherkassky
W Hersh
Wook-Shin Han
X Geng
Y Cao
Y Lin
Yoo Illhoi
Z Lu
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Finding relevant articles in PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to a learned relevance function. However, the process of learning and ranking is usually done offline without being integrated with the keyword queries, and users have to provide a large number of training documents to reach a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and multi-level relevance feedback in real time on PubMed. Results: RefMed supports multi-level relevance feedback by using RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates RankSVM into an RDBMS to support both keyword queries and multi-level relevance feedback in real time; the tight coupling of RankSVM and the DBMS substantially improves the processing time. An efficient parameter selection method for RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. Conclusions: RefMed is the first multi-level relevance feedback system for PubMed, and it achieves high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time.
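One way to see why multi-level feedback can carry more information than binary feedback in a RankSVM-style learner is to look at the preference pairs it produces. The sketch below shows only the standard pairwise transformation with toy feature vectors and hypothetical function names; it is not RefMed's implementation and does not include SVM training.

```python
def pairwise_differences(features, relevance):
    """For every pair where item i is rated above item j, emit x_i - x_j.
    A pairwise ranker then looks for weights w with w . (x_i - x_j) > 0."""
    diffs = []
    for i in range(len(features)):
        for j in range(len(features)):
            if relevance[i] > relevance[j]:
                diffs.append([a - b for a, b in zip(features[i], features[j])])
    return diffs


def rank_score(w, x):
    """Linear relevance function: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Toy article feature vectors (hypothetical values).
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

binary_pairs = pairwise_differences(features, [1, 0, 0])   # relevant vs. not relevant
graded_pairs = pairwise_differences(features, [2, 1, 0])   # three relevance levels

w = [1.0, -1.0]  # a weight vector that happens to satisfy every graded pair
all_satisfied = all(rank_score(w, d) > 0 for d in graded_pairs)
```

With the same three articles, the graded labels yield three ordering constraints while the binary labels yield two, which is consistent with the abstract's claim that multi-level feedback reaches higher accuracy with less feedback.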

Crossref

Springer - Publisher Connector

PubMed Central

Pohang University of Science and Technology (POSTECH)