92,867 research outputs found
Structural Logistic Regression for Link Analysis
We present Structural Logistic Regression, an extension of logistic regression to modeling relational data. It is an integrated approach to building regression models from data stored in relational databases in which potential predictors, both boolean and real-valued, are generated by structured search in the space of queries to the database, and then tested with statistical information criteria for inclusion in a logistic regression. Using statistics and relational representation allows modeling in noisy domains with complex structure. Link prediction is a task of high interest with exactly such characteristics. Be it in the domain of scientific citations, social networks or hypertext, the underlying data are extremely noisy and the features useful for prediction are not readily available in a flat file format. We propose the application of Structural Logistic Regression to building link prediction models, and present experimental results for the task of predicting citations made in scientific literature using relational data taken from the CiteSeer search engine. This data includes the citation graph, authorship and publication venues of papers, as well as their word content
Deep Character-Level Click-Through Rate Prediction for Sponsored Search
Predicting the click-through rate of an advertisement is a critical component
of online advertising platforms. In sponsored search, the click-through rate
estimates the probability that a displayed advertisement is clicked by a user
after she submits a query to the search engine. Commercial search engines
typically rely on machine learning models trained with a large number of
features to make such predictions. This is inevitably requires a lot of
engineering efforts to define, compute, and select the appropriate features. In
this paper, we propose two novel approaches (one working at character level and
the other working at word level) that use deep convolutional neural networks to
predict the click-through rate of a query-advertisement pair. Specially, the
proposed architectures only consider the textual content appearing in a
query-advertisement pair as input, and produce as output a click-through rate
prediction. By comparing the character-level model with the word-level model,
we show that language representation can be learnt from scratch at character
level when trained on enough data. Through extensive experiments using billions
of query-advertisement pairs of a popular commercial search engine, we
demonstrate that both approaches significantly outperform a baseline model
built on well-selected text features and a state-of-the-art word2vec-based
approach. Finally, by combining the predictions of the deep models introduced
in this study with the prediction of the model in production of the same
commercial search engine, we significantly improve the accuracy and the
calibration of the click-through rate prediction of the production system.Comment: SIGIR2017, 10 page
Recommended from our members
Using internet search data to predict new HIV diagnoses in China: a modelling study.
ObjectivesInternet data are important sources of abundant information regarding HIV epidemics and risk factors. A number of case studies found an association between internet searches and outbreaks of infectious diseases, including HIV. In this research, we examined the feasibility of using search query data to predict the number of new HIV diagnoses in China.DesignWe identified a set of search queries that are associated with new HIV diagnoses in China. We developed statistical models (negative binomial generalised linear model and its Bayesian variants) to estimate the number of new HIV diagnoses by using data of search queries (Baidu) and official statistics (for the entire country and for Guangdong province) for 7 years (2010 to 2016).ResultsSearch query data were positively associated with the number of new HIV diagnoses in China and in Guangdong province. Experiments demonstrated that incorporating search query data could improve the prediction performance in nowcasting and forecasting tasks.ConclusionsBaidu data can be used to predict the number of new HIV diagnoses in China up to the province level. This study demonstrates the feasibility of using search query data to predict new HIV diagnoses. Results could potentially facilitate timely evidence-based decision making and complement conventional programmes for HIV prevention
Incorporating Clicks, Attention and Satisfaction into a Search Engine Result Page Evaluation Model
Modern search engine result pages often provide immediate value to users and
organize information in such a way that it is easy to navigate. The core
ranking function contributes to this and so do result snippets, smart
organization of result blocks and extensive use of one-box answers or side
panels. While they are useful to the user and help search engines to stand out,
such features present two big challenges for evaluation. First, the presence of
such elements on a search engine result page (SERP) may lead to the absence of
clicks, which is, however, not related to dissatisfaction, so-called "good
abandonments." Second, the non-linear layout and visual difference of SERP
items may lead to non-trivial patterns of user attention, which is not captured
by existing evaluation metrics.
In this paper we propose a model of user behavior on a SERP that jointly
captures click behavior, user attention and satisfaction, the CAS model, and
demonstrate that it gives more accurate predictions of user actions and
self-reported satisfaction than existing models based on clicks alone. We use
the CAS model to build a novel evaluation metric that can be applied to
non-linear SERP layouts and that can account for the utility that users obtain
directly on a SERP. We demonstrate that this metric shows better agreement with
user-reported satisfaction than conventional evaluation metrics.Comment: CIKM2016, Proceedings of the 25th ACM International Conference on
Information and Knowledge Management. 201
- …