Click-aware purchase prediction with push at the top
Eliciting user preferences from purchase records for performing purchase
prediction is challenging because negative feedback is not explicitly observed,
and because treating all non-purchased items equally as negative feedback is
unrealistic. Therefore, in this study, we present a framework that leverages
the past click records of users to compensate for the missing user-item
interactions of purchase records, i.e., non-purchased items. We begin by
formulating various model assumptions, each one assuming a different order of
user preferences among purchased, clicked-but-not-purchased, and non-clicked
items, to study the usefulness of leveraging click records. We implement the
model assumptions using the Bayesian personalized ranking model, which
maximizes the area under the curve for bipartite ranking. However, we argue
that using click records for bipartite ranking requires a meticulously designed
model because click records are relatively less reliable than purchase
records. Therefore, we ultimately propose a novel
learning-to-rank method, called P3STop, for performing purchase prediction. The
proposed model is customized to be robust to relatively unreliable click
records by particularly focusing on the accuracy of top-ranked items.
Experimental results on two real-world e-commerce datasets demonstrate that
P3STop considerably outperforms the state-of-the-art implicit-feedback-based
recommendation methods, especially for top-ranked items.
Comment: For the final published journal version, see
https://doi.org/10.1016/j.ins.2020.02.06
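One of the model assumptions described above (purchased > clicked-but-not-purchased > non-clicked) can be written as a BPR-style pairwise objective, where each ordered pair contributes -log sigmoid(x_ui - x_uj), a smooth surrogate for the AUC. The following is a minimal sketch under that assumption only; it is not P3STop, it omits the push-at-the-top weighting, and the function names, item IDs, and scores are hypothetical.

```python
import math
from itertools import product


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def tiered_bpr_loss(scores, purchased, clicked, non_clicked):
    """Average BPR-style loss for one user under the assumed ordering
    purchased > clicked-but-not-purchased > non-clicked.

    scores: dict mapping item id -> predicted preference score x_ui.
    Every (higher-tier item i, lower-tier item j) pair adds
    -log sigmoid(x_ui - x_uj) to the loss.
    """
    tiers = [purchased, clicked, non_clicked]
    loss, n_pairs = 0.0, 0
    # Compare each tier with every tier ranked below it.
    for upper in range(len(tiers)):
        for lower in range(upper + 1, len(tiers)):
            for i, j in product(tiers[upper], tiers[lower]):
                loss -= log_sigmoid(scores[i] - scores[j])
                n_pairs += 1
    return loss / n_pairs if n_pairs else 0.0
```

For example, `tiered_bpr_loss({"a": 2.0, "b": 1.0, "c": -1.0}, ["a"], ["b"], ["c"])` scores a correctly ordered user lower than the same items with reversed scores; a gradient-based trainer would minimize this loss over users.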
Obtaining Calibrated Probabilities with Personalized Ranking Models
For personalized ranking models, the well-calibrated probability of an item
being preferred by a user has great practical value. While existing work shows
promising results in image classification, probability calibration has not been
much explored for personalized ranking. In this paper, we aim to estimate the
calibrated probability of how likely a user will prefer an item. We investigate
various parametric distributions and propose two parametric calibration
methods, namely Gaussian calibration and Gamma calibration. Each proposed
method can be seen as a post-processing function that maps the ranking scores
of pre-trained models to well-calibrated preference probabilities, without
affecting the recommendation performance. We also design an unbiased empirical
risk minimization framework that guides the calibration methods toward learning the
true preference probability from the biased user-item interaction dataset.
Extensive evaluations with various personalized ranking models on real-world
datasets show that both the proposed calibration methods and the unbiased
empirical risk minimization significantly improve the calibration performance.
Comment: AAAI 2022 Oral
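A post-processing calibrator of this kind can be sketched as follows, assuming the ranking scores of preferred and non-preferred items each follow a Gaussian with a shared variance. Under that assumption, Bayes' rule gives log-odds that are linear in the score, so when the positive mean exceeds the negative mean the mapping is monotone and leaves the item ranking unchanged. This is an illustrative parameterization fitted by simple moment estimates on labeled held-out scores; it is not the paper's exact Gaussian or Gamma calibration, and it does not include the unbiased empirical risk minimization. The function names are hypothetical.

```python
import math
from statistics import fmean


def fit_gaussian_calibrator(pos_scores, neg_scores):
    """Fit a shared-variance Gaussian calibrator on held-out scores.

    Assumes s | y=1 ~ N(mu_pos, var) and s | y=0 ~ N(mu_neg, var).
    Returns a function that maps a ranking score s to P(y=1 | s).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    mu_pos, mu_neg = fmean(pos_scores), fmean(neg_scores)
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    # Pooled (shared) maximum-likelihood variance of the two classes.
    ss = sum((s - mu_pos) ** 2 for s in pos_scores)
    ss += sum((s - mu_neg) ** 2 for s in neg_scores)
    var = ss / (n_pos + n_neg)
    if var == 0:
        raise ValueError("scores have zero variance")
    prior = n_pos / (n_pos + n_neg)
    # With a shared variance, the log-odds is linear in s: a * s + b.
    a = (mu_pos - mu_neg) / var
    b = (mu_neg ** 2 - mu_pos ** 2) / (2 * var) + math.log(prior / (1 - prior))

    def calibrate(score):
        z = a * score + b
        # Numerically stable logistic sigmoid.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    return calibrate
```

Allowing separate variances per class would make the log-odds quadratic in the score, which is more flexible but no longer guaranteed to preserve the ranking.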
Mining behavior graphs for “backtrace” of noncrashing bugs
Analyzing the executions of a buggy software program is essentially a data mining process. Although many interesting methods have been developed to trace crashing bugs (such as memory violations and core dumps), noncrashing bugs (such as logical errors) remain difficult to analyze. In this paper, we develop a novel method to classify the structured traces of program executions using software behavior graphs. By analyzing correct and incorrect executions, we make progress toward isolating the program regions that may lead to faulty executions. The classification framework integrates closed graph mining with SVM classification. More interestingly, suspicious regions are identified by capturing changes in classification accuracy, which is measured incrementally during program execution. Our performance study and case-based experiments show that our approach is both effective and efficient.
Processing SPARQL queries with regular expressions in RDF databases
Background: As the Resource Description Framework (RDF) data model is widely used for modeling and sharing many online bioinformatics resources, such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) and Bio2RDF (bio2rdf.org), SPARQL, the W3C-recommended query language for RDF databases, has become an important language for querying bioinformatics knowledge bases. Moreover, because users' requests for extracting information from RDF data are diverse and users often do not know the exact value of each fact in an RDF database, it is desirable to use SPARQL queries with regular expression patterns to query RDF data. To the best of our knowledge, no existing work efficiently supports regular expression processing in SPARQL over RDF databases. Most existing techniques for processing regular expressions are designed either for querying a text corpus or only for matching paths in an RDF graph.
Results: In this paper, we propose a novel framework for supporting regular expression processing in SPARQL queries. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model for incorporating the proposed framework into existing query optimizers. 3) We build a prototype of the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique.
Conclusions: Experiments with a full-blown RDF engine show that our framework outperforms the existing ones by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.
Learning Topology-Specific Experts for Molecular Property Prediction
Recently, graph neural networks (GNNs) have been successfully applied to
predicting molecular properties, which is one of the most classical
cheminformatics tasks with various applications. Despite their effectiveness,
we empirically observe that training a single GNN model for diverse molecules
with distinct structural patterns limits its prediction performance. In this
paper, motivated by this observation, we propose TopExpert to leverage
topology-specific prediction models (referred to as experts), each of which is
responsible for one molecular group sharing similar topological semantics.
That is, each expert learns topology-specific discriminative features while
being trained with its corresponding topological group. To tackle the key
challenge of grouping molecules by their topological patterns, we introduce a
clustering-based gating module that assigns an input molecule to one of the
clusters and further optimizes the gating module with two different types of
self-supervision: topological semantics induced by GNNs and molecular
scaffolds, respectively. Extensive experiments demonstrate that TopExpert
improves the performance of molecular property prediction and also achieves
better generalization to new molecules with unseen scaffolds than baselines.
The code is available at https://github.com/kimsu55/ToxExpert.
Comment: 11 pages with 8 figures
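The gating idea can be illustrated with a minimal sketch, assuming each topological cluster is represented by a centroid in the GNN embedding space and the gate is a softmax over negative squared distances to those centroids. The functions `soft_assign`, `hard_assign`, and `gated_prediction`, the `temperature` parameter, and the toy experts are hypothetical; the sketch omits the GNN encoder and the two self-supervision signals, and the actual TopExpert gate may be parameterized differently.

```python
import math


def soft_assign(embedding, centroids, temperature=1.0):
    """Soft-assign an embedding to clusters via a softmax over negative
    squared Euclidean distances to the cluster centroids (assumed gate)."""
    logits = [
        -sum((e - c) ** 2 for e, c in zip(embedding, centroid)) / temperature
        for centroid in centroids
    ]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(logit - m) for logit in logits]
    total = sum(exps)
    return [x / total for x in exps]


def hard_assign(embedding, centroids):
    """Index of the single cluster (and hence expert) for a molecule."""
    weights = soft_assign(embedding, centroids)
    return max(range(len(weights)), key=weights.__getitem__)


def gated_prediction(embedding, centroids, experts, temperature=1.0):
    """Combine per-cluster expert predictions, weighted by the gate."""
    weights = soft_assign(embedding, centroids, temperature)
    return sum(w * expert(embedding) for w, expert in zip(weights, experts))
```

A soft gate keeps the assignment differentiable during training, while `hard_assign` routes each molecule to exactly one expert, as the abstract describes.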
Smartphone dependence classification using tensor factorization
Excessive smartphone use causes personal and social problems. To address this issue, we sought to derive usage patterns that were directly correlated with smartphone dependence based on usage data. This study attempted to classify smartphone dependence using a data-driven prediction algorithm. We developed a mobile application to collect smartphone usage data. A total of 41,683 logs of 48 smartphone users were collected from March 8, 2015, to January 8, 2016. The participants were classified into the control group (SUC) or the addiction group (SUD) using the Korean Smartphone Addiction Proneness Scale for Adults (S-Scale) and a face-to-face offline interview by a psychiatrist and a clinical psychologist (SUC = 23 and SUD = 25). We derived usage patterns using tensor factorization and found the following six optimal usage patterns: 1) social networking services (SNS) during daytime, 2) web surfing, 3) SNS at night, 4) mobile shopping, 5) entertainment, and 6) gaming at night. The membership vectors of the six patterns yielded significantly better prediction performance than the raw data. For all patterns, the usage times of the SUD were much longer than those of the SUC. From our findings, we concluded that usage patterns and membership vectors were effective tools to assess and predict smartphone dependence and could provide intervention guidelines for predicting and treating smartphone dependence based on usage data.
Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS
Background: Finding relevant articles in PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to a learned relevance function. However, learning and ranking are usually performed offline without integration with keyword queries, and users must provide a large number of training documents to achieve reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and multi-level relevance feedback in real time on PubMed.
Results: RefMed supports multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into the RDBMS to support both keyword queries and multi-level relevance feedback in real time; the tight coupling of the RankSVM and the DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. As a result, RefMed achieves high learning accuracy in real time without a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed.
Conclusions: RefMed is the first multi-level relevance feedback system for PubMed, which achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time.
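Multi-level feedback can be turned into RankSVM training data by comparing every pair of articles that received different relevance levels. Below is a minimal sketch of a linear RankSVM trained by plain subgradient descent, assuming numeric feature vectors for the articles are already available. The function names and the `c`, `lr`, and `epochs` values are hypothetical; RefMed's in-DBMS integration and its validation-free parameter selection are not reproduced.

```python
from itertools import combinations


def make_pairs(features, levels):
    """Build pairwise difference vectors from multi-level relevance labels.

    Every pair of articles with different levels yields x_higher - x_lower;
    articles with equal levels produce no pair.
    """
    pairs = []
    for (xi, li), (xj, lj) in combinations(zip(features, levels), 2):
        if li == lj:
            continue
        higher, lower = (xi, xj) if li > lj else (xj, xi)
        pairs.append([a - b for a, b in zip(higher, lower)])
    return pairs


def train_rank_svm(features, levels, c=1.0, lr=0.01, epochs=200):
    """Linear RankSVM via subgradient descent on
    0.5 * ||w||^2 + c * sum over pairs d of max(0, 1 - w . d)."""
    dim = len(features[0])
    w = [0.0] * dim
    pairs = make_pairs(features, levels)
    for _ in range(epochs):
        grad = w[:]  # gradient of the 0.5 * ||w||^2 regularizer
        for d in pairs:
            # Only pairs that violate the unit margin contribute a subgradient.
            if sum(wk * dk for wk, dk in zip(w, d)) < 1.0:
                for k in range(dim):
                    grad[k] -= c * d[k]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w
```

Articles are then ranked by the score `w . x`. Because the number of pairs grows quadratically with the number of judged articles, a real-time system would need a more efficient solver than this sketch.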
- …