REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums
How can we extract useful information from a security forum? We focus on
identifying threads of interest to a security professional: (a) alerts of
worrisome events, such as attacks, (b) offering of malicious services and
products, (c) hacking information to perform malicious acts, and (d) useful
security-related experiences. The analysis of security forums is in its infancy
despite several promising recent works. Novel approaches are needed to address
the challenges in this domain: (a) the difficulty in specifying the "topics" of
interest efficiently, and (b) the unstructured and informal nature of the text.
We propose REST, a systematic methodology to: (a) identify threads of interest
based on a, possibly incomplete, bag of words, and (b) classify them into one
of the four classes above. The key novelty of the work is a multi-step weighted
embedding approach: we project words, threads and classes in appropriate
embedding spaces and establish relevance and similarity there. We evaluate our
method with real data from three security forums with a total of 164K posts and
21K threads. First, REST is robust to the initial keyword selection: it can extend
the user-provided keyword set and thus recover from missing keywords.
Second, REST categorizes the threads into the classes of interest with superior
accuracy compared to five other methods: REST exhibits an accuracy between
63.3% and 76.9%. We see our approach as a first step for harnessing the wealth of
information in online forums in a user-friendly way, since the user can loosely
specify her keywords of interest.
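The two steps named in the abstract above (extending a seed keyword set and assigning a thread to a class by similarity in an embedding space) can be illustrated with a minimal sketch. This is not the authors' implementation: the three-dimensional word vectors, the class names, the seed words, the uniform averaging, and the 0.9 similarity threshold are all invented for illustration, and the paper's actual weighting scheme is not reproduced here.

```python
import math

# Toy 3-d word vectors, invented for illustration only.
# A real system would learn embeddings from forum text.
WORD_VECS = {
    "ddos":     [0.9, 0.1, 0.0],
    "attack":   [0.8, 0.2, 0.1],
    "botnet":   [0.7, 0.3, 0.0],
    "sell":     [0.1, 0.9, 0.1],
    "price":    [0.0, 0.8, 0.2],
    "tutorial": [0.1, 0.1, 0.9],
    "howto":    [0.0, 0.2, 0.8],
}

# Hypothetical class names and seed words (assumptions, not from the paper).
CLASS_SEEDS = {
    "alert":   ["ddos", "attack"],
    "service": ["sell", "price"],
    "hacks":   ["tutorial", "howto"],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_keywords(seeds, threshold=0.9):
    """Add every vocabulary word whose vector is close to some seed word."""
    known = [WORD_VECS[s] for s in seeds if s in WORD_VECS]
    expanded = set(seeds)
    for word, vec in WORD_VECS.items():
        if any(cosine(vec, s) >= threshold for s in known):
            expanded.add(word)
    return expanded


def embed(words):
    """Uniform average of the vectors of the known words (a simplification)."""
    vecs = [WORD_VECS[w] for w in words if w in WORD_VECS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def classify(thread_words):
    """Return the class whose seed embedding is most similar to the thread."""
    t = embed(thread_words)
    return max(CLASS_SEEDS, key=lambda c: cosine(t, embed(CLASS_SEEDS[c])))


expanded = expand_keywords({"ddos"})
label = classify(["sell", "botnet", "price"])
```

With these toy vectors, a user who supplies only "ddos" also picks up "attack" and "botnet", which mirrors the idea of recovering from missing keywords.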
Intelligent Personalized Searching
Search engines are very useful tools for almost everyone nowadays. People use them to search for information about personal finance, restaurants, electronic products, and travel, to name a few. As helpful as search engines are in providing information, they can also manipulate people's behavior, because most people trust online information without question. Furthermore, ordinary users usually pay attention only to the highest-ranking pages in the search results. Knowing this predictable user behavior, search engine providers such as Google and Yahoo take advantage of it to generate profit. Search engine providers are commercial companies whose goal is to generate profit, and an easy way for them to do so is to rank particular web pages higher in order to promote their own products or services or those of their paying customers. The results from a search engine can therefore be misleading. The goal of this project is to filter the bias out of search results and provide the best matches on behalf of users' interests.
Bibliometric cartography of information retrieval research by using co-word analysis
The aim of this study is to map the intellectual structure of the field of Information Retrieval (IR) during the period 1987-1997. Co-word analysis was employed to reveal patterns and trends in the IR field by measuring the association strengths of terms representative of relevant publications or other texts produced in the IR field. Data were collected from the Science Citation Index (SCI) and the Social Science Citation Index (SSCI) for the period 1987-1997. In addition to the keywords added by the SCI and SSCI databases, other important keywords were extracted manually from titles and abstracts. These keywords were further standardized using vocabulary control tools. In order to trace the dynamic changes of the IR field, the whole 11-year period was further separated into two consecutive periods: 1987-1991 and 1992-1997. The results show that the IR field has some established research themes and that it also changes rapidly to embrace new themes.
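As a rough illustration of what "measuring the association strengths of terms" can look like, the sketch below counts how often keyword pairs co-occur across publications and normalizes each count with the equivalence index, E_ij = c_ij^2 / (c_i * c_j), a measure commonly used in co-word analysis. The abstract does not say which measure the study used, so the choice of index is an assumption, and the keyword lists are made up.

```python
from collections import Counter
from itertools import combinations


def association_strengths(docs):
    """Map each co-occurring keyword pair to its equivalence index.

    docs: list of keyword lists, one list per publication.
    E_ij = c_ij**2 / (c_i * c_j), where c_i is the number of publications
    containing keyword i and c_ij the number containing both i and j.
    """
    term_count = Counter()
    pair_count = Counter()
    for keywords in docs:
        unique = sorted(set(keywords))  # count each keyword once per document
        term_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        (a, b): c * c / (term_count[a] * term_count[b])
        for (a, b), c in pair_count.items()
    }


# Made-up keyword lists standing in for indexed publications.
docs = [
    ["indexing", "query", "ranking"],
    ["query", "ranking"],
    ["indexing", "hypertext"],
]
strengths = association_strengths(docs)
```

Pairs are keyed in alphabetical order, so the strength for "query" and "ranking" is looked up as `strengths[("query", "ranking")]`. Running the same function separately on the 1987-1991 and 1992-1997 subsets would give the two maps that the study compares.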
Data-driven Job Search Engine Using Skills and Company Attribute Filters
According to a report online, more than 200 million unique users search for
jobs online every month. This incredibly large and fast-growing demand has
enticed software giants such as Google and Facebook to enter this space, which
was previously dominated by companies such as LinkedIn, Indeed and
CareerBuilder. Recently, Google released their "AI-powered Jobs Search Engine",
"Google For Jobs" while Facebook released "Facebook Jobs" within their
platform. These current job search engines and platforms allow users to search
for jobs based on general, narrow filters such as job title, date posted,
experience level, company and salary. However, they have severely limited
filters relating to skill sets such as C++, Python, and Java and company
related attributes such as employee size, revenue, technographics and
micro-industries. These specialized filters can help applicants and companies
connect at a very personalized, relevant and deeper level. In this paper we
present a framework that provides an end-to-end "Data-driven Jobs Search
Engine". In addition, users can also receive potential contacts of recruiters
and senior positions for connection and networking opportunities. The high
level implementation of the framework is described as follows: 1) Collect job
postings data in the United States, 2) Extract meaningful tokens from the
postings data using ETL pipelines, 3) Normalize the data set to link company
names to their specific company websites, 4) Extract and rank the skill
sets, 5) Link the company names and websites to their respective company level
attributes with the EVERSTRING Company API, 6) Run user-specific search queries
on the database to identify relevant job postings and 7) Rank the job search
results. This framework offers a highly customizable and highly targeted search
experience for end users. Comment: 8 pages, 10 figures, ICDM 201
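Steps 6 and 7 of the framework described above (running a user-specific query with skill and company filters, then ranking the hits) might look roughly like the following sketch. Everything here is hypothetical: the `Posting` fields, the sample postings and company names, and the ranking rule (number of matched skills, ties broken by title) are assumptions for illustration, since the abstract does not specify the schema or the ranking function.

```python
from dataclasses import dataclass


@dataclass
class Posting:
    title: str
    company: str
    skills: frozenset   # lowercase skill tokens extracted from the posting
    employees: int      # company-level attribute linked to the company


def search(postings, wanted_skills, min_employees=0, max_employees=None):
    """Filter postings by company size, then rank by number of matched skills."""
    wanted = {s.lower() for s in wanted_skills}
    hits = []
    for p in postings:
        if p.employees < min_employees:
            continue
        if max_employees is not None and p.employees > max_employees:
            continue
        overlap = len(wanted & p.skills)
        if overlap:
            hits.append((overlap, p))
    # Most matched skills first; ties broken alphabetically by title.
    hits.sort(key=lambda h: (-h[0], h[1].title))
    return [p for _, p in hits]


# Made-up postings for illustration.
POSTINGS = [
    Posting("Backend Engineer", "Acme", frozenset({"python", "java"}), 120),
    Posting("Systems Engineer", "Globex", frozenset({"c++"}), 5000),
    Posting("Data Engineer", "Initech", frozenset({"python"}), 40),
]

results = search(POSTINGS, ["Python", "Java"], max_employees=500)
titles = [p.title for p in results]
```

Here the company-size filter and the skill overlap are applied together, so a posting has to satisfy the company attribute constraint and match at least one requested skill to appear in the results.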