Salience and Market-aware Skill Extraction for Job Targeting
At LinkedIn, we want to create economic opportunity for everyone in the
global workforce. To make this happen, LinkedIn offers a reactive Job Search
system, and a proactive Jobs You May Be Interested In (JYMBII) system to match
the best candidates with their dream jobs. One of the most challenging tasks
for developing these systems is to properly extract important skill entities
from job postings and then target members with matched attributes. In this
work, we show that the commonly used text-based \emph{salience- and
market-agnostic} skill extraction approach is sub-optimal because it considers
only whether a skill is mentioned, ignoring both the salience level of the
skill and its market dynamics, i.e., how market supply and demand influence the
importance of skills. To address these drawbacks, we present \model, our
deployed \emph{salience- and market-aware} skill extraction system. The
proposed \model shows promising results in improving the online performance of
job recommendation (JYMBII) ( job apply) and skill suggestions for job
posters ( suggestion rejection rate). Lastly, we present case studies showing
interesting insights that contrast traditional skill recognition methods with
the proposed \model at the occupation, industry, country, and individual-skill
levels. Based on these promising results, we deployed \model online to extract
job-targeting skills for all M job postings served at LinkedIn.
Comment: 9 pages, to appear in KDD202
Precision and Recall Reject Curves for Classification
For some classification scenarios, it is desirable to use only those
classification instances that a trained model associates with a high certainty.
To obtain such high-certainty instances, previous work has proposed
accuracy-reject curves. Reject curves allow one to evaluate and compare the
performance of different certainty measures over a range of thresholds for
accepting or rejecting classifications. However, accuracy may not be the
best-suited evaluation metric for all applications; instead, precision or
recall may be preferable. This is the case, for example, for data with
imbalanced class distributions. We therefore propose reject curves that
evaluate precision and recall, the recall-reject curve and the precision-reject
curve. Using prototype-based classifiers from learning vector quantization, we
first validate the proposed curves on artificial benchmark data against the
accuracy reject curve as a baseline. We then show on imbalanced benchmarks and
medical, real-world data that, for these scenarios, the proposed precision-
and recall-reject curves yield more accurate insights into classifier
performance than accuracy-reject curves.
Comment: 11 pages, 3 figures. Updated figure label
Towards Safe Machine Learning for CPS: Infer Uncertainty from Training Data
Machine learning (ML) techniques are increasingly applied to decision-making
and control problems in Cyber-Physical Systems (CPS), many of which are
safety-critical, e.g., chemical plants, robotics, and autonomous vehicles.
Despite the significant benefits brought by ML techniques, they also raise
additional safety issues because 1) the most expressive and powerful ML models
are not transparent and behave as black boxes, and 2) the training data, which
plays a crucial role in ML safety, is usually incomplete. An important
technique for achieving safety with ML models is "Safe Fail", i.e., the model
selects a reject option and applies a backup solution, for example a
traditional controller or a human operator, when it has low confidence in a
prediction.
Data-driven models produced by ML algorithms learn from training data, and
hence they are only as good as the examples they have learned from. As pointed
out in [17], ML models work well in the "training space" (i.e., the region of
feature space with sufficient training data), but they cannot extrapolate
beyond it. As observed in many previous studies, a region of feature space
that lacks training data generally has a much higher error rate than one that
contains sufficient training samples [31]. It is therefore essential to
identify the training space and avoid extrapolating beyond it. In this paper,
we propose an efficient Feature Space Partitioning Tree (FSPT) to address this
problem. Through experiments, we also show that a strong relationship exists
between model performance and the FSPT score.
Comment: Publication rights licensed to AC
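The "identify the training space" idea can be illustrated with a deliberately simplified density check. This is not the paper's FSPT algorithm: instead of a learned partitioning tree, the toy sketch below (function names and grid parameters are our own) histograms the training data on a fixed grid and rejects any query whose cell holds too few training samples, i.e., a Safe-Fail reject for inputs outside the training space.

```python
import numpy as np

def fit_grid_density(X, bins=10):
    """Histogram the training set on a fixed grid over its bounding box.
    A toy stand-in for a learned feature-space partitioning, not FSPT itself."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    counts, edges = np.histogramdd(X, bins=bins, range=list(zip(lo, hi)))
    return counts, edges

def in_training_space(x, counts, edges, min_samples=5):
    """Safe-Fail check: accept a query only if its grid cell contains at
    least `min_samples` training points; otherwise signal the reject option."""
    idx = []
    for j, e in enumerate(edges):
        i = np.searchsorted(e, x[j], side="right") - 1
        i = min(max(i, 0), len(e) - 2)  # clamp to the valid bin range
        idx.append(i)
    return counts[tuple(idx)] >= min_samples
```

At inference time, a controller would query `in_training_space` before trusting the ML model's prediction and fall back to the backup solution on rejection. A tree-based partitioning like FSPT serves the same purpose while adapting the cell boundaries to the data instead of using a fixed grid.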