1,402 research outputs found
Data-driven Job Search Engine Using Skills and Company Attribute Filters
According to a report online, more than 200 million unique users search for
jobs online every month. This incredibly large and fast growing demand has
enticed software giants such as Google and Facebook to enter this space, which
was previously dominated by companies such as LinkedIn, Indeed and
CareerBuilder. Recently, Google released their "AI-powered Jobs Search Engine",
"Google For Jobs" while Facebook released "Facebook Jobs" within their
platform. These current job search engines and platforms allow users to search
for jobs based on general narrow filters such as job title, date posted,
experience level, company and salary. However, they have severely limited
filters relating to skill sets such as C++, Python, and Java and company
related attributes such as employee size, revenue, technographics and
micro-industries. These specialized filters can help applicants and companies
connect at a very personalized, relevant and deeper level. In this paper we
present a framework that provides an end-to-end "Data-driven Jobs Search
Engine". In addition, users can also receive potential contacts of recruiters
and senior positions for connection and networking opportunities. The high
level implementation of the framework is described as follows: 1) Collect job
postings data in the United States, 2) Extract meaningful tokens from the
postings data using ETL pipelines, 3) Normalize the data set to link company
names to their specific company websites, 4) Extract and ranking the skill
sets, 5) Link the company names and websites to their respective company level
attributes with the EVERSTRING Company API, 6) Run user-specific search queries
on the database to identify relevant job postings and 7) Rank the job search
results. This framework offers a highly customizable and highly targeted search
experience for end users.Comment: 8 pages, 10 figures, ICDM 201
Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities
One of the major hurdles preventing the full exploitation of information from
online communities is the widespread concern regarding the quality and
credibility of user-contributed content. Prior works in this domain operate on
a static snapshot of the community, making strong assumptions about the
structure of the data (e.g., relational tables), or consider only shallow
features for text classification.
To address the above limitations, we propose probabilistic graphical models
that can leverage the joint interplay between multiple factors in online
communities --- like user interactions, community dynamics, and textual content
--- to automatically assess the credibility of user-contributed online content,
and the expertise of users and their evolution with user-interpretable
explanation. To this end, we devise new models based on Conditional Random
Fields for different settings like incorporating partial expert knowledge for
semi-supervised learning, and handling discrete labels as well as numeric
ratings for fine-grained analysis. This enables applications such as extracting
reliable side-effects of drugs from user-contributed posts in healthforums, and
identifying credible content in news communities.
Online communities are dynamic, as users join and leave, adapt to evolving
trends, and mature over time. To capture this dynamics, we propose generative
models based on Hidden Markov Model, Latent Dirichlet Allocation, and Brownian
Motion to trace the continuous evolution of user expertise and their language
model over time. This allows us to identify expert users and credible content
jointly over time, improving state-of-the-art recommender systems by explicitly
considering the maturity of users. This also enables applications such as
identifying helpful product reviews, and detecting fake and anomalous reviews
with limited information.Comment: PhD thesis, Mar 201
Solving the Difficult Problem of Topic Extraction in Thai Tweets
We tackled in this study the difficult problem of topic extraction in Thai tweets on the country’s historic flood in 2011. After using Latent Dirichlet Allocation (LDA) to extract the topics, the first difficulty that faced us was the inaccuracy the word segmentation task that affected our interpretation of the LDA result. To solve this, we refined the stop word list from the LDA result by removing uninformative words caused by the word segmentation, which resulted to a more relevant and comprehensible outcome. With the improved results, we then constructed a rule-based categorization model and used it to categorize all the collected tweets on a per-week scale to observe changes in tweeting trend. Not only did the categories reveal the most relevant and compelling topics that people raised at that time, they also allowed us to understand how people perceived the situations as they unfold over tim
Opening up to big data: computer-assisted analysis of textual data in social sciences
"Two developments in computational text analysis may change the way qualitative data analysis in social sciences is performed: 1. the availability of digital text worth to investigate is growing rapidly, and 2. the improvement of algorithmic information extraction approaches, also called text mining, allows for further bridging the gap between qualitative and quantitative text analysis. The key factor hereby is the inclusion of context into computational linguistic models which extends conventional computational content analysis towards the extraction of meaning. To clarify methodological differences of various computer-assisted text analysis approaches the article suggests a typology from the perspective of a qualitative researcher. This typology shows compatibilities between manual qualitative data analysis methods and computational, rather quantitative approaches for large scale mixed method text analysis designs." (author's abstract
Sentiment Analysis or Opinion Mining: A Review
Opinion Mining (OM) or Sentiment Analysis (SA) can be defined as the task of detecting, extracting and classifying opinions on something. It is a type of the processing of the natural language (NLP) to track the public mood to a certain law, policy, or marketing, etc. It involves a way that development for the collection and examination of comments and opinions about legislation, laws, policies, etc., which are posted on the social media. The process of information extraction is very important because it is a very useful technique but also a challenging task. That mean, to extract sentiment from an object in the web-wide, need to automate opinion-mining systems to do it. The existing techniques for sentiment analysis include machine learning (supervised and unsupervised), and lexical-based approaches. Hence, the main aim of this paper presents a survey of sentiment analysis (SA) and opinion mining (OM) approaches, various techniques used that related in this field. As well, it discusses the application areas and challenges for sentiment analysis with insight into the past researcher's works
Online Crowds Opinion-Mining it to Analyze Current Trend: A Review
Online presence of the user has increased, there is a huge growth in the number of active users and thus the volume of data created on the online social networks is massive. Much are concentrating on the Internet Lingo. Notably most of the data on the social networking sites is made public which opens doors for companies, researchers and analyst to collect and analyze the data. We have huge volume of opinioned data available on the web we have to mine it so that we could get some interesting results out of it with could enhance the decision making process. In order to analyze the current scenario of what people are thinking focus is shifted towards opinion mining. This study presents a systematic literature review that contains a comprehensive overview of components of opinion mining, subjectivity of data, sources of opinion, the process and how does it let one analyze the current tendency of the online crowd in a particular context. Different perspectives from different authors regarding the above scenario have been presented. Research challenges and different applications that were developed with the motive opinion mining are also discussed
Topic modeling in marketing: recent advances and research opportunities
Using a probabilistic approach for exploring latent patterns in high-dimensional co-occurrence data, topic models offer researchers a flexible and open framework for soft-clustering large data sets. In recent years, there has been a growing interest among marketing scholars and practitioners to adopt topic models in various marketing application domains. However, to this date, there is no comprehensive overview of this rapidly evolving field. By analyzing a set of 61 published papers along with conceptual contributions, we systematically review this highly heterogeneous area of research. In doing so, we characterize extant contributions employing topic models in marketing along the dimensions data structures and retrieval of input data, implementation and extensions of basic topic models, and model performance evaluation. Our findings confirm that there is considerable progress done in various marketing sub-areas. However, there is still scope for promising future research, in particular with respect to integrating multiple, dynamic data sources, including time-varying covariates and the combination of exploratory topic models with powerful predictive marketing models
- …