A Model for Personalized Keyword Extraction from Web Pages using Segmentation
The World Wide Web caters to the needs of billions of users in heterogeneous
groups. Each user accessing the World Wide Web may have his/her own specific
interests and would expect the web to respond to those specific requirements.
Making the web react in such a customized manner is achieved through
personalization. This paper proposes a novel model for extracting keywords from
a web page with personalization incorporated into it. The keyword extraction
problem is approached with the help of web page segmentation, which makes the
problem simpler and easier to solve effectively. The proposed model is
implemented as a prototype, and experiments conducted on it empirically
validate the model's efficiency.
Comment: 6 pages, 2 figures
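The abstract does not specify how segments are scored or how the user profile is represented. The sketch below is one minimal possibility. It assumes the page has already been split into text segments and that the user's interests are stored in a hypothetical dictionary of term weights. Candidate keywords are then ranked by term frequency, boosted by segment relevance and by personal interest. The names `tokenize` and `personalized_keywords`, the stop-word list, and the scoring formula are all illustrative assumptions, not the paper's model.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def personalized_keywords(segments, interests, top_k=5):
    """Rank terms from pre-segmented page text using a user-interest profile.

    segments  -- list of text blocks produced by some page segmenter (assumed)
    interests -- dict mapping interest terms to weights (hypothetical profile)
    """
    scores = Counter()
    for segment in segments:
        tokens = tokenize(segment)
        if not tokens:
            continue
        counts = Counter(tokens)
        # Segment relevance: fraction of the segment's tokens that match the profile.
        relevance = sum(c for t, c in counts.items() if t in interests) / len(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            # Boost terms in relevant segments and terms the user cares about.
            scores[term] += tf * (1 + relevance) * (1 + interests.get(term, 0.0))
    return [term for term, _ in scores.most_common(top_k)]


segments = [
    "Python tutorial for machine learning beginners",
    "Subscribe to our newsletter",
    "Learn machine learning with Python examples",
]
interests = {"python": 1.0, "learning": 0.5}
print(personalized_keywords(segments, interests, top_k=3))
```

In this toy input, the boilerplate "newsletter" segment matches nothing in the profile, so its terms get no boost and rank below the interest-related terms `python`, `learning`, and `machine`.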
Automatic domain ontology extraction for context-sensitive opinion mining
Automated analysis of the sentiments expressed in online consumer feedback can facilitate both organizations’ business strategy development and individual consumers’ comparison shopping. Nevertheless, existing opinion mining methods either adopt a context-free sentiment classification approach or rely on a large number of manually annotated training examples to perform context-sensitive sentiment classification. Guided by the design science research methodology, we illustrate the design, development, and evaluation of a novel fuzzy domain ontology based context-sensitive opinion mining system. Our novel ontology extraction mechanism, underpinned by a variant of Kullback-Leibler divergence, can automatically acquire contextual sentiment knowledge across various product domains to improve the sentiment analysis processes. Evaluated on a benchmark dataset and real consumer reviews collected from Amazon.com, our system shows remarkable performance improvement over the context-free baseline.
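The abstract does not give the paper's exact Kullback-Leibler variant. To illustrate only the underlying idea, the sketch below ranks terms by their pointwise contribution p(t) log(p(t)/q(t)) to the standard KL divergence between a domain corpus (p) and a background corpus (q). Add-one smoothing is an assumed choice, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def kl_term_scores(domain_tokens, background_tokens):
    """Score each domain term by its pointwise contribution to KL(P_domain || P_background).

    Add-one (Laplace) smoothing over the joint vocabulary keeps the
    background probability strictly positive, so the log is always defined.
    """
    vocab = set(domain_tokens) | set(background_tokens)
    d_counts, b_counts = Counter(domain_tokens), Counter(background_tokens)
    d_total = len(domain_tokens) + len(vocab)
    b_total = len(background_tokens) + len(vocab)
    scores = {}
    for term in set(domain_tokens):
        p = (d_counts[term] + 1) / d_total  # smoothed domain probability
        q = (b_counts[term] + 1) / b_total  # smoothed background probability
        scores[term] = p * math.log(p / q)
    # Highest-scoring terms are the most domain-specific.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


domain = "battery life battery screen great battery".split()
background = "great day great food screen time".split()
for term, score in kl_term_scores(domain, background):
    print(f"{term}: {score:.4f}")
```

Terms that are much more frequent in the domain text than in the background (here `battery`) score highest. Terms with equal smoothed probability in both corpora (here `screen`) score zero, and terms that are rarer in the domain (here `great`) score negative.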
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particularly standout success relates to learning from massive amounts of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition of the value of utilizing knowledge whenever it is
available or can be created purposefully. In this paper, we discuss the
indispensable role of knowledge for deeper understanding of content where
(i) large amounts of training data are unavailable, (ii) the objects to be
recognized are complex (e.g., implicit entities and highly subjective content),
and (iii) applications need to use complementary or related data in multiple
modalities/media. What brings us to the cusp of rapid progress is our ability
to (a) create relevant and reliable knowledge and (b) carefully exploit
knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to
foretell unprecedented progress in our ability for deeper understanding and
exploitation of multimodal data and continued incorporation of knowledge in
learning techniques.
Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770
NLP-Based Techniques for Cyber Threat Intelligence
In the digital era, threat actors employ sophisticated techniques for which
digital traces in the form of textual data are often available. Cyber Threat
Intelligence (CTI) encompasses all solutions for the collection, processing,
and analysis of data useful for understanding a threat actor's targets and
attack behavior. CTI is playing an increasingly crucial role in identifying and
mitigating threats and in enabling proactive defense strategies. In this
context, natural language processing (NLP), a branch of artificial
intelligence, has emerged as a powerful tool for enhancing threat intelligence
capabilities. This survey paper provides a comprehensive overview of NLP-based
techniques applied in the context of threat intelligence. It begins by
describing the foundational definitions and principles of CTI as a major tool
for safeguarding digital assets. It then undertakes a thorough examination of
NLP-based techniques for CTI data crawling from Web sources, CTI data analysis,
Relation Extraction from cybersecurity data, CTI sharing and collaboration, and
security threats of CTI. Finally, the challenges and limitations of NLP in
threat intelligence are exhaustively examined, including data quality issues
and ethical considerations. This survey presents a complete framework and
serves as a valuable resource for security professionals and researchers
seeking to understand state-of-the-art NLP-based threat intelligence
techniques and their potential impact on cybersecurity.
Effective Internet Search Strategies: Internet Search Engines, Meta-Indexes and Web Directories
Searching the World Wide Web can be a daunting task. The Web has expanded at such a rapid pace that nobody knows exactly how large it is, but it is safe to say that there are many billions of web pages residing on servers all over the world. Add to this the hundreds of different search tools available to choose among – including directories, search engines, meta-searchers, and specialized search engines – and the situation begins to feel overwhelming. Fortunately, learning a few essential concepts of Web searching, along with mastering a handful of the top-rated search tools, can make the picture much brighter. Simply knowing how to choose the right tool for your information need can make all the difference. This paper will first discuss basic concepts and terms you must know to be an effective searcher. Next, it will examine each of the major categories of search tools in turn and recommend the best search engines and directories currently available.
Sentiment Analysis using Improved Novel Convolutional Neural Network (SNCNN)
Sentiment Analysis (SA) is an important method in which many researchers are working on automated approaches for extracting and analyzing the huge volumes of user-generated data accessible on social networking websites. This approach helps in analyzing data that falls directly under the domain of SA. SA comprises the vast field of effectively classifying user-initiated text into defined polarities. The proposed work includes four major steps for solving these issues. The first step is preprocessing, which comprises tokenization, stop-word removal, stemming, cleaning up unwanted text information such as ads on web pages, and text normalization for converting binary formats. Second, feature extraction is based on Bag of Words, Word2Vec, and TF-IDF (Term Frequency-Inverse Document Frequency). Third, feature selection examines semantic gaps along with source features using teaching models and applies target-task characteristics to the Improved Novel Convolutional Neural Network (INCNN). This feature selection uses Information Gain (IG) and the Pearson Correlation Coefficient (PCC). Finally, in the classification step, INCNN classifies the sentiment of posts and responses with respect to user-based post aspects, which helps enhance system performance. Experimental results show that the proposed INCNN algorithm outperforms the existing approach, with the INCNN classifier achieving the highest accuracy.
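The abstract gives neither the exact TF-IDF variant nor the feature-selection thresholds. The sketch below covers two of the described steps: TF-IDF feature extraction and Pearson-correlation feature selection. It assumes whitespace tokenization, raw-count TF multiplied by log(N/df) IDF, and one binary polarity label per document. Information Gain and the INCNN classifier are omitted, and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, rows) where each weight is raw-count TF times log(N / df)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[t] * math.log(n / df[t]) for t in vocab])
    return vocab, rows


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(docs, labels, k):
    """Keep the k terms whose TF-IDF column correlates most strongly (by |r|) with the label."""
    vocab, rows = tfidf_matrix(docs)
    scored = []
    for j, term in enumerate(vocab):
        column = [row[j] for row in rows]
        scored.append((abs(pearson(column, labels)), term))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [term for _, term in scored[:k]]


docs = ["good phone good battery", "bad phone", "good camera", "bad screen"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative
print(select_features(docs, labels, 2))
```

Ranking by the absolute value |r| keeps strongly negative indicators such as `bad` alongside positive ones such as `good`. The neutral term `phone` appears in one positive and one negative document, so its correlation with the label is zero and it is dropped.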
CAPTCHaStar! A novel CAPTCHA based on interactive shape discovery
In recent years, most websites on which users can register (e.g., email
providers and social networks) have adopted CAPTCHAs (Completely Automated
Public Turing test to tell Computers and Humans Apart) as a countermeasure
against automated attacks. The battle of wits between designers and attackers
of CAPTCHAs has led to current ones being annoying and hard for users to solve,
while still being vulnerable to automated attacks.
In this paper, we propose CAPTCHaStar, a new image-based CAPTCHA that relies
on user interaction. This novel CAPTCHA leverages the innate human ability to
recognize shapes in a confused environment. We assess the effectiveness of our
proposal for the two key aspects of CAPTCHAs, i.e., usability and resiliency
to automated attacks. In particular, we evaluated usability by carrying out a
thorough user study, and we tested the resiliency of our proposal against
several types of automated attacks: traditional ones, ones designed ad hoc for
our proposal, and ones based on machine learning. Compared to the state of the
art, our proposal is more user friendly (e.g., only about 35% of users prefer
current solutions, such as text-based CAPTCHAs) and more resilient to
automated attacks.
Comment: 15 pages