Predicting Phishing Websites using Neural Network trained with Back-Propagation
Phishing is increasing dramatically with the development of modern technologies and global computer networks. This results in the loss of customers' confidence in e-commerce and online banking, financial damages, and identity theft. Phishing is a fraudulent effort that aims to acquire sensitive information from users, such as credit card credentials and social security numbers. In this article, we propose a model for predicting phishing attacks based on an Artificial Neural Network (ANN). A feed-forward neural network trained by the back-propagation algorithm is developed to classify websites as phishing or legitimate. The suggested model shows high tolerance for noisy data, fault tolerance, and high prediction accuracy with respect to false positive and false negative rates.
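As a concrete illustration of the approach this abstract describes, below is a minimal sketch of a feed-forward network trained with back-propagation for binary phishing classification. The network size, learning rate, and the toy URL features (has_ip_address, uses_https, has_at_symbol) are illustrative assumptions, not the paper's actual features or hyperparameters.

```python
# Minimal sketch: one-hidden-layer feed-forward network trained with
# back-propagation for phishing vs. legitimate classification.
# Features and hyperparameters are hypothetical stand-ins.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class PhishingMLP:
    def __init__(self, n_features, n_hidden=8, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.5, (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)    # hidden activations
        return sigmoid(self.h @ self.W2 + self.b2)  # P(phishing)

    def fit(self, X, y, epochs=500):
        y = y.reshape(-1, 1)
        for _ in range(epochs):
            p = self.forward(X)
            # Cross-entropy gradient w.r.t. the output pre-activation
            delta2 = (p - y) / len(X)
            # Back-propagate through the hidden sigmoid layer
            delta1 = (delta2 @ self.W2.T) * self.h * (1 - self.h)
            self.W2 -= self.lr * self.h.T @ delta2
            self.b2 -= self.lr * delta2.sum(axis=0)
            self.W1 -= self.lr * X.T @ delta1
            self.b1 -= self.lr * delta1.sum(axis=0)

# Toy usage: 3 binary URL features, e.g. has_ip_address, uses_https,
# has_at_symbol (illustrative only); 1 = phishing, 0 = legitimate.
X = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1]], dtype=float)
y = np.array([1, 0, 1, 0], dtype=float)
model = PhishingMLP(n_features=3)
model.fit(X, y)
print(model.forward(X).ravel())  # predicted phishing probabilities
```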
I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis
Revelations of large scale electronic surveillance and data mining by
governments and corporations have fueled increased adoption of HTTPS. We
present a traffic analysis attack against over 6000 webpages spanning the HTTPS
deployments of 10 widely used, industry-leading websites in areas such as
healthcare, finance, legal services and streaming video. Our attack identifies
individual pages in the same website with 89% accuracy, exposing personal
details including medical conditions, financial and legal affairs and sexual
orientation. We examine evaluation methodology and reveal accuracy variations
as large as 18% caused by assumptions affecting caching and cookies. We present
a novel defense reducing attack accuracy to 27% with a 9% traffic increase, and
demonstrate significantly increased effectiveness of prior defenses in our
evaluation context, inclusive of enabled caching, user-specific cookies and
pages within the same website.
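For readers unfamiliar with webpage fingerprinting, the sketch below shows the general shape of such an attack: summarize each encrypted connection's packet-size trace into coarse features and train a classifier to label which page produced it. This is a generic stand-in under assumed features and toy data, not the paper's actual attack pipeline.

```python
# Generic webpage-fingerprinting sketch over encrypted traffic.
# Traces, features, and labels are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def trace_features(packet_sizes):
    """Packet sizes: signed ints, + = outgoing, - = incoming."""
    sizes = np.asarray(packet_sizes, dtype=float)
    incoming = sizes[sizes < 0]
    outgoing = sizes[sizes > 0]
    return [
        len(sizes),                                  # total packets
        -incoming.sum() if len(incoming) else 0.0,   # bytes received
        outgoing.sum() if len(outgoing) else 0.0,    # bytes sent
        len(incoming),
        len(outgoing),
    ]

# Hypothetical labeled traces: page_id -> observed packet-size sequences
traces = {
    0: [[120, -1400, -1400, 80], [118, -1400, -900, 80]],
    1: [[300, -600, 90, -1400, -1400, -1400], [290, -580, 85, -1400]],
}
X = [trace_features(t) for pid, ts in traces.items() for t in ts]
y = [pid for pid, ts in traces.items() for _ in ts]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([trace_features([119, -1400, -1300, 82])]))  # -> page 0
```

Defenses of the kind the paper evaluates typically work by perturbing exactly these observable size and count statistics, which is why a small traffic overhead can buy a large drop in attack accuracy.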
Support Vector Machine Histogram: New Analysis and Architecture Design Method of Deep Convolutional Neural Network
A deep convolutional neural network (DCNN) is a hierarchical neural network model that has attracted attention in recent years owing to its high classification performance. Through learning, a DCNN acquires feature representations, parameters that characterize the input. However, the internal workings of DCNNs and the design of their network architectures remain insufficiently understood. We propose a novel DCNN analysis method, the "support vector machine (SVM) histogram," to address these problems. The method examines the spatial distribution of DCNN-extracted feature representations using the decision boundary of a linear SVM. We show that this method allows interpretation of the DCNN's hierarchical processing, and that the SVM histogram results can guide DCNN architecture design. Applying the method to a large-scale natural image dataset, we designed an architecture that achieves higher accuracy than the original DCNN.
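One plausible reading of the SVM histogram idea is sketched below: train a linear SVM on features extracted at a given DCNN layer, then histogram the signed distances of samples to the decision boundary. The binning and normalization choices here are assumptions, not the authors' exact procedure.

```python
# Sketch of an "SVM histogram": distribution of signed distances from
# per-layer DCNN features to a linear SVM's separating hyperplane.
import numpy as np
from sklearn.svm import LinearSVC

def svm_histogram(features, labels, bins=20):
    """features: (n_samples, n_dims) array extracted at one DCNN layer."""
    svm = LinearSVC(C=1.0, max_iter=10000).fit(features, labels)
    # Signed distance of each sample to the separating hyperplane
    dist = svm.decision_function(features) / np.linalg.norm(svm.coef_)
    hist, edges = np.histogram(dist, bins=bins)
    return hist, edges

# Toy usage with random stand-in "layer features" for two classes
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(-1, 1, (50, 64)), rng.normal(1, 1, (50, 64))])
labels = np.array([0] * 50 + [1] * 50)
hist, edges = svm_histogram(feats, labels)
print(hist)  # a well-separated layer shows two modes away from zero
```

Intuitively, comparing such histograms across layers shows where in the hierarchy the classes become linearly separable, which is the kind of signal that can inform architecture design.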
The Best Answers? Think Twice: Online Detection of Commercial Campaigns in the CQA Forums
In an emerging trend, more and more Internet users search for information
from Community Question and Answer (CQA) websites, as interactive communication
in such websites provides users with a rare feeling of trust. More often than
not, end users look for instant help when they browse the CQA websites for the
best answers. Hence, it is imperative that they should be warned of any
potential commercial campaigns hidden behind the answers. However, existing
research focuses more on the quality of answers and does not meet the above
need. In this paper, we develop a system that automatically analyzes the hidden
patterns of commercial spam and raises alarms instantaneously to end users
whenever a potential commercial campaign is detected. Our detection method
integrates semantic analysis and posters' track records and utilizes the
special features of CQA websites largely different from those in other types of
forums such as microblogs or news reports. Our system is adaptive and
accommodates new evidence uncovered by the detection algorithms over time.
Validated with real-world trace data from a popular Chinese CQA website over a
period of three months, our system shows great potential towards adaptive
online detection of CQA spam.
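A hedged sketch of the general idea of combining answer-text semantics with a poster's track record follows; the data, features, weighting rule, and threshold are illustrative assumptions rather than the paper's system.

```python
# Hedged sketch: combine answer-text spamminess with the poster's
# history to score likely commercial campaigns. All values are toy.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

answers = ["Try our amazing pills at bestdeal example shop now",
           "You should see a doctor and describe your symptoms",
           "Visit bestdeal example shop for a huge discount today",
           "Drink water and rest, it usually passes in a few days"]
labels = [1, 0, 1, 0]  # 1 = commercial spam, 0 = genuine answer

vec = TfidfVectorizer()
text_clf = LogisticRegression().fit(vec.fit_transform(answers), labels)

def campaign_score(answer, poster_spam_ratio, w_text=0.7, w_history=0.3):
    """Weighted evidence: text spamminess plus the poster's track record
    (fraction of the poster's past answers previously flagged)."""
    p_text = text_clf.predict_proba(vec.transform([answer]))[0, 1]
    return w_text * p_text + w_history * poster_spam_ratio

# Raise an alarm when combined evidence crosses a threshold
print(campaign_score("Huge discount at bestdeal example shop", 0.8) > 0.5)
```

Folding newly flagged answers back into the poster histories is one simple way such a system could stay adaptive over time, as the abstract describes.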
Expanding the Usage of Web Archives by Recommending Archived Webpages Using Only the URI
Web archives are a window to view past versions of webpages. When a user requests a webpage on the live Web, such as http://tripadvisor.com/where_to_travel/, the webpage may not be found, which results in a HyperText Transfer Protocol (HTTP) 404 response. The user then may search for the webpage in a Web archive, such as the Internet Archive. Unfortunately, if this page has never been archived, the user will not be able to view the page, nor will the user gain any information on other webpages that have similar content in the archive, such as the archived webpage http://classy-travel.net. Similarly, if the user requests the webpage http://hokiesports.com/football/ from the Internet Archive, the user will only find the requested webpage and will not gain any information on other webpages that have similar content in the archive, such as the archived webpage http://techsideline.com. In this research, we build a model for selecting and ranking possible recommended webpages at a Web archive. This enhances both HTTP 404 responses and HTTP 200 responses by surfacing webpages in the archive that the user may not know existed. First, we detect semantics in the requested Uniform Resource Identifier (URI). Next, we classify the URI using an ontology, such as DMOZ or any website directory. Finally, we filter and rank candidates based on several features, such as archival quality, webpage popularity, temporal similarity, and content similarity. We measure the performance of each step using different techniques, including calculating the F1 score to evaluate different tokenization methods and the classification. We tested the model using human evaluation to determine if we could classify and find recommendations for a sample of requests from the Internet Archive's Wayback Machine access log. Overall, when selecting the full categorization, reviewers agreed with 80.3% of the recommendations, which is much higher than "do not agree" and "I do not know". This indicates that reviewers are more likely to agree with the recommendations when the full categorization is used. When selecting the first level only, reviewers agreed with just 25.5% of the recommendations. This indicates that deep-level categorization improves the performance of finding relevant recommendations.
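The three pipeline steps above (URI tokenization, ontology-based classification, feature-weighted ranking) can be sketched as follows. The category keywords, feature weights, and candidate scores are hypothetical stand-ins, with the small category table standing in for a directory such as DMOZ.

```python
# Sketch of the URI-only recommendation pipeline: tokenize the URI,
# match tokens against category keywords, rank archived candidates.
import re
from urllib.parse import urlparse

def tokenize_uri(uri):
    """Split host and path on non-alphanumerics; drop scheme/TLD noise."""
    parts = urlparse(uri)
    tokens = re.split(r"[^a-z0-9]+", (parts.netloc + " " + parts.path).lower())
    return [t for t in tokens if t and t not in {"www", "com", "net", "org", "html"}]

CATEGORIES = {  # stand-in for an ontology such as DMOZ
    "Recreation/Travel": {"travel", "trip", "hotel", "flight"},
    "Sports/Football": {"football", "sports", "team", "league"},
}

def classify(uri):
    tokens = set(tokenize_uri(uri))
    return max(CATEGORIES, key=lambda c: len(tokens & CATEGORIES[c]))

def rank(candidates, weights=(0.3, 0.3, 0.2, 0.2)):
    """candidates: (uri, archival_quality, popularity, temporal, content),
    with the four feature values assumed to be pre-computed in [0, 1]."""
    return sorted(candidates,
                  key=lambda c: sum(w * f for w, f in zip(weights, c[1:])),
                  reverse=True)

print(classify("http://tripadvisor.com/where_to_travel/"))  # Recreation/Travel
print(rank([("http://classy-travel.net", 0.9, 0.6, 0.7, 0.8),
            ("http://example.com", 0.4, 0.5, 0.2, 0.3)])[0][0])
```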
BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology
This report is written as a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and approaches to its detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForever software, the discussion has been extended to include observations related to the historical, social, and practical value of spam, and proposals for other ways of dealing with spam within the repository without necessarily removing it. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, possible methods that have been suggested for preventing the introduction of spam into a blog, and research related to spam, focusing on spam that appears in the weblog context. It concludes with a proposal for a spam detection workflow that might form the basis of the spam detection component of the BlogForever software.
Know Your Phish: Novel Techniques for Detecting Phishing Sites and Their Targets
Phishing is a major problem on the Web. Despite the significant attention it has received over the years, there has been no definitive solution. While state-of-the-art solutions have reasonably good performance, they require a large amount of training data and are not adept at detecting phishing attacks against new targets. In this paper, we begin with two core observations: (a) although phishers try to make a phishing webpage look similar to its target, they do not have unlimited freedom in structuring the phishing webpage, and (b) a webpage can be characterized by a small set of key terms, and how these key terms are used in different parts of a webpage differs between legitimate and phishing webpages. Based on these observations, we develop a phishing detection system with several notable properties: it requires very little training data, scales well to much larger test data, is language-independent, fast, resilient to adaptive attacks, and implemented entirely on the client side. In addition, we developed a target identification component that can identify the target website that a phishing webpage is attempting to mimic. The target identification component is faster than previously reported systems and can help minimize false positives in our phishing detection system.
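The key-terms observation can be illustrated with a deliberately simplified sketch: extract prominent terms from different parts of a page and flag pages whose prominent brand terms point to a domain other than the one serving the page. The term extraction, choice of page parts, and decision rule below are assumptions; the paper's actual algorithm differs.

```python
# Simplified key-terms sketch: a page whose title/body terms match a
# brand found in its outbound link domains, but not in the domain
# serving the page, is treated as a likely phish. All inputs are toy.
import re
from collections import Counter

def key_terms(text, k=5):
    """Most frequent lowercase words of length >= 3 (a crude stand-in)."""
    words = re.findall(r"[a-z]{3,}", text.lower())
    return [w for w, _ in Counter(words).most_common(k)]

def looks_phishy(serving_domain, title, body, link_domains):
    terms = set(key_terms(title, 3)) | set(key_terms(body, 5))
    # Candidate targets: prominent terms appearing in link domains
    # but absent from the domain actually serving the page
    targets = [t for t in terms
               if any(t in d for d in link_domains) and t not in serving_domain]
    return (len(targets) > 0), targets  # targets approximate the mimicked site

verdict, targets = looks_phishy(
    serving_domain="secure-login-example.biz",
    title="PayPal account verification",
    body="Sign in to your paypal account to verify recent paypal activity",
    link_domains=["paypal.com", "secure-login-example.biz"],
)
print(verdict, targets)  # True ['paypal'] -> likely phish targeting PayPal
```

Because this check needs only the page itself, it hints at why a key-terms approach can run entirely on the client side and identify the mimicked target at the same time.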