Strongly Hierarchical Factorization Machines and ANOVA Kernel Regression
High-order parametric models that include terms for feature interactions are
applied to various data mining tasks, where ground truth depends on
interactions of features. However, with sparse data, the high-dimensional
parameters for feature interactions often face three issues: expensive
computation, difficulty in parameter estimation, and lack of structure. Previous
work has proposed approaches which can partially resolve the three issues. In
particular, models with factorized parameters (e.g. Factorization Machines) and
sparse learning algorithms (e.g. FTRL-Proximal) can tackle the first two issues
but fail to address the third. To handle unstructured parameters, constraints
or complicated regularization terms have been applied so that hierarchical
structures can be imposed. However, these methods make the
optimization problem more challenging. In this work, we propose Strongly
Hierarchical Factorization Machines and ANOVA kernel regression, in which all
three issues can be addressed without making the optimization problem more
difficult. Experimental results show the proposed models significantly
outperform the state-of-the-art in two data mining tasks: cold-start user
response time prediction and stock volatility prediction.
Comment: 9 pages, to appear in SDM'1
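The factorized interaction parameters the abstract refers to can be sketched as a plain second-order Factorization Machine, where each pairwise coefficient is the inner product of two low-rank factors. This is a minimal illustration of the base model only, not the paper's strongly hierarchical variant; all names are hypothetical:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine prediction.

    x  : (n,) feature vector (typically sparse)
    w0 : global bias
    w  : (n,) linear weights
    V  : (n, k) factorized interaction weights; the pairwise
         coefficient for features i, j is <V[i], V[j]>.
    """
    linear = w0 + w @ x
    # O(n*k) identity for the pairwise term:
    #   sum_{i<j} <v_i, v_j> x_i x_j
    #     = 0.5 * sum_f ((sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2)
    # This avoids the naive O(n^2) double loop over feature pairs.
    xv = x @ V                    # (k,)
    x2v2 = (x ** 2) @ (V ** 2)    # (k,)
    pairwise = 0.5 * np.sum(xv ** 2 - x2v2)
    return linear + pairwise
```

The factorization is what makes the first two issues tractable on sparse data: only features with nonzero values contribute, and interaction strength is estimated through shared factors rather than one free parameter per pair.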
An Empirical Evaluation Of Social Influence Metrics
Predicting when an individual will adopt a new behavior is an important
problem in application domains such as marketing and public health. This paper
examines the performance of a wide variety of social network based
measurements proposed in the literature - which have not been previously
compared directly. We study the probability of an individual becoming
influenced based on measurements derived from neighborhood (i.e. number of
influencers, personal network exposure), structural diversity, locality,
temporal measures, cascade measures, and metadata. We also examine the
ability to predict influence based on the choice of classifier and how the
ratio of positive to negative samples in both training and testing affects
prediction results - further enabling practical use of these concepts for
social influence
applications.
Comment: 8 pages, 5 figures
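One of the neighborhood measurements the abstract names, personal network exposure, is commonly computed as the fraction of an individual's neighbors who have already adopted. A minimal sketch under that assumption (the function name and signature are illustrative, not from the paper):

```python
def personal_network_exposure(neighbors, influencers):
    """Fraction of an individual's neighbors who are already
    influenced (their 'personal network exposure').

    neighbors   : iterable of neighbor ids for one individual
    influencers : set-like of ids that have already adopted
    """
    neighbors = set(neighbors)
    if not neighbors:
        return 0.0
    return len(neighbors & set(influencers)) / len(neighbors)
```

The related "number of influencers" measure is simply the unnormalized count, `len(neighbors & set(influencers))`; the paper's point is that such simple neighborhood metrics can be compared head-to-head against structural, temporal, and cascade-based alternatives.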
Toward Order-of-Magnitude Cascade Prediction
When a piece of information (microblog, photograph, video, link, etc.) starts
to spread in a social network, an important question arises: will it spread to
"viral" proportions -- where "viral" is defined as an order-of-magnitude
increase. However, several previous studies have established that cascade size
and frequency are related through a power-law - which leads to a severe
imbalance in this classification problem. In this paper, we devise a suite of
measurements based on "structural diversity" -- the variety of social contexts
(communities) in which individuals partaking in a given cascade engage. We
demonstrate these measures are able to distinguish viral from non-viral
cascades, despite the severe imbalance of the data for this problem. Further,
we leverage these measurements as features in a classification approach,
successfully predicting microblogs that grow from 50 to 500 reposts with
precision of 0.69 and recall of 0.52 for the viral class - despite this class
comprising under 2% of samples. This significantly outperforms our baseline
approach as well as the current state-of-the-art. Our work also demonstrates
how we can trade off between precision and recall.
Comment: 4 pages, 15 figures, ASONAM 2015 poster paper
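Structural diversity is often operationalized as the number of connected components among a set of users in the induced subgraph of the social network, i.e. the number of distinct social contexts they span. A minimal union-find sketch of that notion (an assumption about the exact definition; the paper proposes a suite of related measures, not only this one):

```python
def structural_diversity(nodes, edges):
    """Number of connected components among `nodes` in the subgraph
    induced by restricting `edges` to those nodes -- one common way
    to count the distinct social contexts a cascade's adopters span.

    nodes : iterable of node ids (e.g. a cascade's adopters)
    edges : iterable of (u, v) pairs from the full social network
    """
    parent = {u: u for u in nodes}

    def find(u):
        # Path-halving find.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in edges:
        # Ignore edges touching nodes outside the induced subgraph.
        if a in parent and b in parent:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    return len({find(u) for u in parent})
```

Intuitively, a cascade confined to one tightly knit community yields a diversity of 1, while a cascade whose adopters fall into many mutually disconnected groups yields a high count, which is what makes the measure discriminative for virality.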
Fair Learning to Rank with Distribution-free Risk Control
Learning to Rank (LTR) methods are vital in online economies, affecting users
and item providers. Fairness in LTR models is crucial to allocate exposure
proportionally to item relevance. Deterministic ranking models can lead to
unfair exposure distribution when items with the same relevance receive
slightly different scores. Stochastic LTR models, incorporating the
Plackett-Luce (PL) model, address fairness issues but have limitations in
computational cost and performance guarantees. To overcome these limitations,
we propose FairLTR-RC, a novel post-hoc model-agnostic method. FairLTR-RC
leverages a pretrained scoring function to create a stochastic LTR model,
eliminating the need for expensive training. Furthermore, FairLTR-RC provides
finite-sample guarantees on a user-specified utility using a distribution-free
risk control framework. By additionally incorporating the Thresholded PL (TPL)
model, we are able to achieve an effective trade-off between utility and
fairness. Experimental results on several benchmark datasets demonstrate that
FairLTR-RC significantly improves fairness in widely-used deterministic LTR
models while guaranteeing a specified level of utility.
Comment: 13 pages, 4 figures
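The stochastic rankings that the Plackett-Luce model produces can be sampled with the standard Gumbel-max trick: perturb each item's score with independent Gumbel noise and sort, which draws a ranking from the PL distribution when scores are treated as log-utilities. This is a generic sketch of PL sampling only, not the paper's FairLTR-RC procedure or its Thresholded PL variant:

```python
import numpy as np

def sample_pl_ranking(scores, rng=None):
    """Draw one ranking from the Plackett-Luce model via the
    Gumbel-max trick, treating `scores` as log-utilities.

    Returns item indices in sampled rank order (best first).
    Items with (near-)equal scores swap positions across draws,
    spreading exposure instead of fixing one deterministic order.
    """
    rng = rng if rng is not None else np.random.default_rng()
    scores = np.asarray(scores, dtype=float)
    gumbel = rng.gumbel(size=scores.shape)  # i.i.d. Gumbel(0, 1) noise
    return np.argsort(-(scores + gumbel))   # descending perturbed scores
```

Because the trick only needs the pretrained scoring function's outputs, it matches the abstract's post-hoc, training-free setting: the randomness is injected at ranking time, on top of whatever scores the existing model produces.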