Leveraging Knowledge Graphs for Orphan Entity Allocation in Resume Processing
Processing and analyzing unstructured data, particularly resumes, poses
significant challenges in talent acquisition and recruitment. This research
presents a novel approach for orphan entity allocation in resume processing
using knowledge graphs. Our pipeline integrates association mining, concept
extraction, external knowledge linking, named entity recognition, and
knowledge graph construction. By leveraging these techniques, we aim to
automate and improve the efficiency of job screening by successfully bucketing
orphan entities within resumes, enabling more effective matching between
candidates and job positions, streamlining resume screening, and enhancing the
accuracy of candidate-job matching.
Extensive experimentation and evaluation highlight the approach's
effectiveness and resilience: if any component fails, alternative measures can
be relied upon for seamless processing and orphan entity allocation. Our
results demonstrate the capability of knowledge graphs to generate valuable
insights through intelligent information extraction and representation,
specifically in categorizing orphan entities.

Comment: In Proceedings of the 2023 IEEE International Conference on
Artificial Intelligence in Engineering and Technology (IICAIET)
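The paper does not publish its pipeline, but the core idea of bucketing an orphan entity against a knowledge graph can be sketched in a few lines. Everything below (the toy graph, the `allocate_orphan` helper, the category names) is illustrative, not the authors' implementation: an entity that NER extracts but cannot link is assigned the majority category of the known entities it co-occurs with.

```python
# Illustrative sketch: allocate an "orphan" entity (one that NER found but
# could not link to a known category) by voting over the categories of the
# entities it co-occurs with in the same resume.
from collections import Counter

# Toy knowledge graph: known entity -> category (assumed structure).
KNOWLEDGE_GRAPH = {
    "python": "skill",
    "tensorflow": "skill",
    "stanford university": "education",
    "google": "employer",
}

def allocate_orphan(orphan, cooccurring):
    """Bucket an orphan entity by majority vote over its neighbours' categories."""
    votes = Counter(
        KNOWLEDGE_GRAPH[e] for e in cooccurring if e in KNOWLEDGE_GRAPH
    )
    if not votes:
        return None  # defer to another pipeline component (e.g. external linking)
    category, _ = votes.most_common(1)[0]
    return category

print(allocate_orphan("pytorch", ["python", "tensorflow", "google"]))  # skill
```

Returning `None` when no neighbour is known mirrors the paper's point about resilience: a downstream fallback component can take over when one signal fails.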
Break Down Resumes into Sections to Extract Data and Perform Text Analysis using Python
The objective of AI-based resume screening is to automate the screening process, and extraction of text, keywords, and named entities is critical. This paper discusses segmenting resumes in order to extract data and perform text analysis. The raw CV file is imported, and the resume data is cleaned to remove extra spaces, punctuation, and stop words. Regular expressions are used to extract names from resumes. We also use the spaCy library, widely regarded as one of the most accurate natural language processing libraries; it includes pre-trained models for entity recognition, parsing, and tagging. The experimental method uses resume data sourced from Kaggle and an external source (MTIS).
Distilling Large Language Models using Skill-Occupation Graph Context for HR-Related Tasks
Numerous HR applications are centered around resumes and job descriptions.
While they can benefit from advancements in NLP, particularly large language
models, their real-world adoption faces challenges due to the absence of
comprehensive benchmarks for various HR tasks and the lack of smaller models
with competitive capabilities. In this paper, we aim to bridge this gap by
introducing the Resume-Job Description Benchmark (RJDB). We meticulously craft
this benchmark to cater to a wide array of HR tasks, including matching and
explaining resumes to job descriptions, extracting skills and experiences from
resumes, and editing resumes. To create this benchmark, we propose to distill
domain-specific knowledge from a large language model (LLM). We rely on a
curated skill-occupation graph to ensure diversity and provide context for LLMs
generation. Our benchmark includes over 50 thousand triples of job
descriptions, matched resumes and unmatched resumes. Using RJDB, we train
multiple smaller student models. Our experiments reveal that the student models
achieve performance close to or better than that of the teacher model (GPT-4),
affirming the
effectiveness of the benchmark. Additionally, we explore the utility of RJDB on
out-of-distribution data for skill extraction and resume-job description
matching, in zero-shot and weakly supervised settings. We release our datasets
and code to foster further research and industry applications.
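The benchmark's triple structure (a job description with one matched and one unmatched resume) can be sketched with a simple record type; the field names here are assumptions for illustration, not RJDB's actual schema.

```python
# Illustrative record type for one benchmark triple; field names are
# assumed, not the released dataset's schema.
from dataclasses import dataclass

@dataclass
class RJDBTriple:
    job_description: str
    matched_resume: str
    unmatched_resume: str

    def as_pairs(self):
        """Expand the triple into (jd, resume, label) pairs for a matcher."""
        return [
            (self.job_description, self.matched_resume, 1),
            (self.job_description, self.unmatched_resume, 0),
        ]

t = RJDBTriple("Backend engineer, Python", "5 yrs Python services", "Retail sales")
print(t.as_pairs())
```

Pairing each job description with both a positive and a negative resume is what makes the triples directly usable as supervision for training the smaller student matchers.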
Assessing Bias Removal from Word Embeddings
As machine learning becomes more influential in everyday life, we must begin addressing its potential shortcomings. A current problem area is word embeddings, frameworks that transform words into numbers and so allow the algorithmic analysis of language. Without a method for filtering implicit human bias from the documents used to create these embeddings, they contain and propagate stereotypes. Previous work has shown that one commonly used and distributed word embedding model trained on articles from Google News encoded biased associations between gender and occupation (Bolukbasi 2016). While unsurprising, the use of biased data in machine learning models only serves to amplify the problem. Although attempts have been made to remove or reduce these biases, a true solution has yet to be found. Hiring models, tools trained to identify well-fitting job candidates, show the impact of gender stereotypes on occupations. Companies like Amazon have abandoned these systems due to flawed decision-making, even after years of development.
I investigated whether the word embedding adjustment technique from Bolukbasi 2016 made a difference in the results of an emulated hiring model. After collecting and cleaning resumes and job postings, I created a model that predicted whether candidates were a good fit for a job based on a training set of resumes from those already hired. To assess differences, I built the same model with different word vectors, including the original and adjusted word2vec embeddings. Results were expected to show some form of bias in classification. I conclude with potential improvements and additional work being done.
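The Bolukbasi-style adjustment referenced here removes each word vector's component along an identified bias direction, leaving the vector orthogonal to that axis. A minimal sketch, using toy 3-d vectors in place of the real 300-d word2vec embeddings:

```python
# Hard-debiasing sketch (Bolukbasi et al. 2016): subtract the projection of
# a word vector onto the bias direction g, so the result is orthogonal to g.
# Toy 3-d vectors stand in for real 300-d word2vec embeddings.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def debias(vec, g):
    """Remove the component of `vec` along bias direction `g`."""
    scale = dot(vec, g) / dot(g, g)
    return [v - scale * gi for v, gi in zip(vec, g)]

gender_dir = [1.0, 0.0, 0.0]   # assumed bias direction (e.g. "he" - "she")
programmer = [0.4, 0.2, 0.9]   # toy occupation vector

adjusted = debias(programmer, gender_dir)
print(round(dot(adjusted, gender_dir), 10))  # 0.0 -- orthogonal to the bias axis
```

In the full method, gendered words such as names and pronouns are exempted from this projection so that legitimately gendered meaning is preserved.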
Automatic Job Skill Taxonomy Generation For Recruitment Systems
The goal of this thesis is to optimize job recommendation systems by automatically extracting skills from job descriptions. With rapid technological development, new skills are continuously required, which makes skill tagging of job descriptions a difficult problem: a simple keyword match against an already generated skill list is not sufficient, so a way of automatically populating the skill list is needed to improve job search engines. This thesis addresses the problem with natural language processing and neural networks. Automatic detection of skills in unstructured job description data is complex, as it requires robustness to the ambiguity of natural language and adaptation to words not seen in the historical data. The thesis solves this by using recurrent neural network models to capture the context of skill words; based on the captured context, the system predicts whether a given word in the text is a skill. Neural network models such as Long Short-Term Memory and Bi-directional Long Short-Term Memory capture the long-term dependencies in a sentence to identify skills present in job descriptions, and various natural language processing techniques were used to improve the quality of the input features. Using context both before and after candidate skill words gave the best results in identifying skills from textual data. The approach can capture skill data from job ads and can be extended to extract skill features from resume data to improve job recommendation results in the future.
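A sequence tagger like the one described is typically trained on tokens labelled in a BIO scheme, with the network predicting each label from the surrounding context. A sketch of that labelling step (the skill list and sentence are illustrative, not the thesis's data):

```python
# Illustrative BIO-labelling step for training a skill tagger: each token is
# marked B-SKILL (begin), I-SKILL (inside), or O (outside). A BiLSTM would
# then learn to predict these labels from context before and after each word.

def bio_labels(tokens, skills):
    """Label tokens using a set of known (possibly multi-word) skills."""
    labels = ["O"] * len(tokens)
    for skill in skills:
        parts = skill.split()
        for i in range(len(tokens) - len(parts) + 1):
            if [t.lower() for t in tokens[i:i + len(parts)]] == parts:
                labels[i] = "B-SKILL"
                for j in range(1, len(parts)):
                    labels[i + j] = "I-SKILL"
    return labels

sent = "Experience with machine learning and Python required".split()
print(bio_labels(sent, {"machine learning", "python"}))
# ['O', 'O', 'B-SKILL', 'I-SKILL', 'O', 'B-SKILL', 'O']
```

The point of then training a recurrent model on such labels, rather than keeping the keyword match itself, is generalization: the BiLSTM can tag skill words that never appeared in the seed list.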
Human Resources Recommender system based on discrete variables
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.

Natural Language Processing and Understanding has become one of the most exciting and challenging
fields in the area of Artificial Intelligence and Machine Learning. With the rapidly changing business
environment and surroundings, the importance of having the data transformed in such a way that
makes it easy to interpret is the greatest competitive advantage a company can have. Having said this,
the purpose of this thesis dissertation is to implement a recommender system for the Human
Resources department in a company that will aid the decision-making process of filling a specific job
position with the right candidate. The recommender system will be fed with applicants, each being
represented by their skills, and will produce a subset of most adequate candidates given a job position.
This work uses StarSpace, a novel neural embedding model whose aim is to
represent entities in a common vector space and perform similarity measures
amongst them.
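The StarSpace idea of placing candidates and a job position in one space, then ranking by similarity, can be sketched with hand-picked skill vectors; a real StarSpace model learns these embeddings from data, and the skills and names below are illustrative.

```python
# StarSpace-style sketch: each entity (candidate or job) is embedded as the
# sum of its skill vectors, then candidates are ranked by cosine similarity
# to the job. Skill vectors here are toy values; StarSpace learns them.
import math

SKILL_VECS = {
    "python": [1.0, 0.0],
    "sql": [0.8, 0.2],
    "design": [0.0, 1.0],
}

def embed(skills):
    """Sum the embeddings of an entity's discrete features (its skills)."""
    dims = len(next(iter(SKILL_VECS.values())))
    out = [0.0] * dims
    for s in skills:
        for i, x in enumerate(SKILL_VECS.get(s, [0.0] * dims)):
            out[i] += x
    return out

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

job = embed(["python", "sql"])
candidates = {"ana": ["python", "sql"], "bo": ["design"]}
ranked = sorted(candidates, key=lambda c: cosine(embed(candidates[c]), job),
                reverse=True)
print(ranked)  # ['ana', 'bo']
```

Representing applicants purely by discrete skill features, as the dissertation describes, is what lets the same learned space score any candidate against any job position.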