162 research outputs found

Fine Grained Approach for Domain Specific Seed URL Extraction

Author: Sanagavarapu Lalit Mohan
Sarangi Sourav
Varma Vasudeva
Y Raghu Reddy
Publication venue: AIS Electronic Library (AISeL)
Publication date: 03/01/2018

Domain Specific Search Engines are expected to provide relevant search results. The availability of an enormous number of URLs across subdomains improves the relevance of domain specific search engines. The current methods for seed URLs can be systematic, ensuring representation of subdomains. We propose a fine grained approach for automatic extraction of seed URLs at subdomain level using Wikipedia and Twitter as repositories. A SeedRel metric and a Diversity Index for seed URL relevance are proposed to measure subdomain coverage. We implemented our approach for the 'Security - Information and Cyber' domain and identified 34,007 Seed URLs and 400,726 URLs across subdomains. The measured Diversity Index value of 2.10 confirms that all subdomains are represented; hence, a relevant 'Security Search Engine' can be built. Our approach also extracted more URLs (seed and child) as compared to existing approaches for URL extraction.

ScholarSpace at University of Hawai'i at Manoa

AIS Electronic Library (AISeL)
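
The abstract above reports a Diversity Index of 2.10 as evidence of subdomain coverage but does not define the index. As a rough illustration only, the sketch below assumes a Shannon-style index computed over URL counts per subdomain; this is an assumption, not the paper's stated formula. The function name `shannon_diversity` and the example counts are hypothetical.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over non-empty categories.

    `counts` maps a subdomain name to the number of URLs found for it.
    ASSUMPTION: the paper's Diversity Index may be defined differently.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


# Hypothetical URL counts per subdomain (not taken from the paper).
example_counts = {
    "cryptography": 120,
    "malware": 80,
    "network security": 100,
    "privacy": 100,
}
print(round(shannon_diversity(example_counts), 3))
```

Under this assumed definition, the index is 0 when every URL falls into one subdomain and reaches its maximum, ln(k), when URLs are spread evenly over k subdomains, so a higher value indicates broader coverage.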

The dominant of Bloggers in Malaysian politics through social networks

Author: Abd. Rozan Mohd. Zaidi
M. Nasir Ahmad Nadzri
Selamat Ali
Selamat Hafiz
Publication date: 01/02/2009

Every country in the world has its own political issues. In Malaysia, for example, political issues play an important role and can influence other factors such as society and the economy. Political factors can have both positive and negative effects on the situation in Malaysia. The frequent use of computers by Malaysians today helps spread information and news about the political situation in Malaysia through cyberspace. In this paper, we use a web mining system with an Artificial Immune System (AIS) to retrieve a small group of relevant websites and webpages on political issues in Malaysia. To analyze the relationship between websites and webpages, the concept of social networks is used. Results from the web mining system with AIS are used to understand the impact of social networks on the political situation in Malaysia.

Universiti Teknologi Malaysia Institutional Repository

An Evasion Attack against ML-based Phishing URL Detectors

Author: Babar M. Ali
Gaire Raj
Sabir Bushra
Publication date: 18/05/2020

Background: Over the years, Machine Learning Phishing URL classification (MLPU) systems have gained tremendous popularity to detect phishing URLs proactively. Despite this vogue, the security vulnerabilities of MLPUs remain mostly unknown. Aim: To address this concern, we conduct a study to understand the test time security vulnerabilities of the state-of-the-art MLPU systems, aiming at providing guidelines for the future development of these systems. Method: In this paper, we propose an evasion attack framework against MLPU systems. To achieve this, we first develop an algorithm to generate adversarial phishing URLs. We then reproduce 41 MLPU systems and record their baseline performance. Finally, we simulate an evasion attack to evaluate these MLPU systems against our generated adversarial URLs. Results: In comparison to previous works, our attack is: (i) effective as it evades all the models with an average success rate of 66% and 85% for famous (such as Netflix, Google) and less popular phishing targets (e.g., Wish, JBHIFI, Officeworks) respectively; (ii) realistic as it requires only 23ms to produce a new adversarial URL variant that is available for registration with a median cost of only $11.99/year. We also found that popular online services such as Google SafeBrowsing and VirusTotal are unable to detect these URLs. (iii) We find that adversarial training (a successful defence against evasion attacks) does not significantly improve the robustness of these systems as it decreases the success rate of our attack by only 6% on average for all the models. (iv) Further, we identify the security vulnerabilities of the considered MLPU systems. Our findings lead to promising directions for future research. Conclusion: Our study not only illustrates vulnerabilities in MLPU systems but also highlights implications for future study towards assessing and improving these systems.

Comment: Draft for ACM TOP

arXiv.org e-Print Archive

A Brief History of Web Crawlers

Author: Bochmann Gregor V.
Dinçktürk Mustafa Emre
Hooshmand Salman
Jourdan Guy-Vincent
Mirtaheri Seyed M.
Onut Iosif Viorel
Publication date: 04/05/2014

Web crawlers visit internet applications, collect data, and learn about new web pages from visited pages. Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics about the web and indexing the applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on the application. The quick expansion of the web and the complexity added to web applications have made the process of crawling a very challenging one. Throughout the history of web crawling, many researchers and industrial groups have addressed different issues and challenges that web crawlers face. Different solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl is a challenging question. Additionally, capturing the model of a modern web application and extracting data from it automatically is another open question. What follows is a brief history of the different techniques and algorithms used from the early days of crawling up to recent days. We introduce criteria to evaluate the relative performance of web crawlers. Based on these criteria we plot the evolution of web crawlers and compare their performance.

arXiv.org e-Print Archive

CiteSeerX
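
The abstract above describes crawlers as programs that visit pages and learn about new pages from the ones already visited. A minimal sketch of that loop, as a breadth-first traversal, might look like the following. It runs over a toy in-memory link graph rather than the real web; `crawl`, `get_links`, `max_pages`, and `TOY_WEB` are hypothetical names introduced for illustration and do not come from the paper.

```python
from collections import deque


def crawl(seeds, get_links, max_pages=100):
    """Breadth-first crawl: visit each URL once, in discovery order.

    `get_links(url)` returns the outgoing links of a page. A real crawler
    would fetch and parse the page here; this sketch leaves that to the caller.
    """
    seen = set()
    frontier = deque()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            frontier.append(url)

    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            # Queue only pages not discovered before, so cycles terminate.
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Toy link graph standing in for the web (illustrative data only).
TOY_WEB = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
order = crawl(["a"], lambda url: TOY_WEB.get(url, []))
print(order)
```

The `max_pages` cap reflects the point made in the abstract that exhaustive crawls are costly: a practical crawler has to bound its work, and the order in which the frontier is consumed is one of the design choices that distinguishes crawling strategies.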

Research Directions, Challenges and Issues in Opinion Mining

Author: Hariharan Shanmugasundaram
Lu Joan
Sudhakaran Periakaruppan
Publication venue: Science and Engineering Research Support Society
Publication date: 01/01/2013

The rapid growth of the Internet and the availability of user reviews on the web for any product have created a need for an effective system to analyze web reviews. Such reviews are useful to some extent for both customers and product manufacturers. For any popular product, the number of reviews can be in the hundreds or even thousands. This makes it difficult for a customer to analyze them and make important decisions on whether or not to purchase the product. Mining such product reviews or opinions is termed opinion mining, which is broadly classified into two main categories, namely facts and opinions. Though there are several approaches for opinion mining, there remains a challenge to decide on the recommendation provided by the system. In this paper, we analyze the basics of opinion mining, its challenges, and the pros and cons of past opinion mining systems, and provide some directions for future research work, focusing on the challenges and issues.

Crossref

University of Huddersfield Repository