132 research outputs found

    Web Tracking: Mechanisms, Implications, and Defenses

    This article surveys the existing literature on the methods currently used by web services to track users online, as well as their purposes, implications, and possible user defenses. A significant majority of the reviewed articles and web resources are from the years 2012-2014. Privacy seems to be the Achilles' heel of today's web. Web services make continuous efforts to obtain as much information as they can about the things we search for, the sites we visit, the people we contact, and the products we buy. Tracking is usually performed for commercial purposes. We present five main groups of methods used for user tracking, based on sessions, client storage, client cache, fingerprinting, or yet other approaches. A special focus is placed on mechanisms that use web caches, operational caches, and fingerprinting, as they tend to employ especially varied and creative methodologies. We also show how users can be identified on the web and associated with their real names, e-mail addresses, phone numbers, or even street addresses. We show why tracking is being used and its possible implications for users (price discrimination, assessing financial credibility, determining insurance coverage, government surveillance, and identity theft). For each of the tracking methods, we present possible defenses. Apart from describing the methods and tools used to keep personal data from being tracked, we also present several tools used for research purposes: their main goal is to discover how, and by which entities, users are being tracked on their desktop computers or smartphones, to provide this information to the users, and to visualize it in an accessible and easy-to-follow way. Finally, we present currently proposed future approaches to tracking users and show that they can potentially pose significant threats to users' privacy. Comment: 29 pages, 212 references.
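
    The fingerprinting mechanisms highlighted in this survey combine passively observable browser attributes into a quasi-unique identifier. As a rough, hypothetical sketch (not taken from the survey; the chosen headers and hashing scheme are assumptions), a server-side tracker could derive such an identifier from HTTP request headers as follows:

        import hashlib

        def header_fingerprint(headers: dict) -> str:
            """Derive a coarse fingerprint by hashing a fixed set of request headers.

            Real fingerprinters combine many more signals (canvas rendering, fonts,
            plugins, screen size); this sketch only shows the basic idea of hashing
            a set of relatively stable attributes into one identifier.
            """
            signals = [
                headers.get("User-Agent", ""),
                headers.get("Accept-Language", ""),
                headers.get("Accept-Encoding", ""),
                headers.get("Accept", ""),
            ]
            return hashlib.sha256("|".join(signals).encode("utf-8")).hexdigest()

        # Two requests with identical headers map to the same identifier.
        fp = header_fingerprint({
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0",
            "Accept-Language": "en-US,en;q=0.5",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept": "text/html,application/xhtml+xml",
        })
        print(fp[:16])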

    A survey on web tracking: mechanisms, implications, and defenses

    Privacy seems to be the Achilles' heel of today's web. Most web services make continuous efforts to track their users and to obtain as much personal information as they can from the things they search for, the sites they visit, the people they contact, and the products they buy. This information is mostly used for commercial purposes, which go far beyond targeted advertising. Although many users are already aware of the privacy risks involved in the use of internet services, the particular methods and technologies used to track them are much less known. In this survey, we review the existing literature on the methods used by web services to track users online, as well as their purposes, implications, and possible user defenses. We present five main groups of methods used for user tracking, based on sessions, client storage, client cache, fingerprinting, and other approaches. A special focus is placed on mechanisms that use web caches, operational caches, and fingerprinting, as they tend to employ especially varied and creative methodologies. We also show how users can be identified on the web and associated with their real names, e-mail addresses, phone numbers, or even street addresses. We show why tracking is being used and its possible implications for users. For each of the tracking methods, we present possible defenses. Some of them are specific to a particular tracking approach, while others are more universal (blocking more than one threat). Finally, we present future trends in user tracking and show that they can potentially pose significant threats to users' privacy. Peer reviewed. Postprint (author's final draft).
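
    Many of the "universal" defenses this survey covers work by refusing requests to hosts on a tracker blocklist. A minimal hypothetical sketch of such a check (the host list and matching rule are illustrative assumptions, not the survey's recommended tool):

        from urllib.parse import urlparse

        # Hypothetical blocklist; real defenses rely on curated lists maintained
        # by the community rather than a hard-coded set like this one.
        TRACKER_HOSTS = {"tracker.example", "analytics.example", "ads.example"}

        def is_tracking_request(url: str) -> bool:
            """Return True if the request host or any parent domain is blocklisted."""
            host = urlparse(url).hostname or ""
            parts = host.split(".")
            # Check "a.b.c", "b.c", "c" so subdomains of listed hosts are also blocked.
            return any(".".join(parts[i:]) in TRACKER_HOSTS for i in range(len(parts)))

        print(is_tracking_request("https://cdn.tracker.example/pixel.gif"))  # True
        print(is_tracking_request("https://news.example.org/article"))       # False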

    Inner-Eye: Appearance-based Detection of Computer Scams

    As more and more inexperienced users gain Internet access, fraudsters are attempting to take advantage of them in new ways. Instead of sophisticated exploitation techniques, simple confidence tricks can be used to create malware that is both very effective and likely to evade detection by traditional security software. Heuristics that detect complex malicious behavior are powerless against some common frauds. This work explores the use of imaging and text-matching techniques to detect typical computer scams such as pharmacy and rogue antivirus frauds. The Inner-Eye system implements the chosen approach in a scalable and efficient manner through the use of virtualization.
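
    As a rough illustration of the appearance-based idea (a sketch only, not the Inner-Eye implementation; the keywords, the distance threshold, and the use of the Pillow imaging library are assumptions), a page screenshot can be reduced to a perceptual hash and compared against hashes of known fraud templates, complemented by simple text matching:

        from PIL import Image  # Pillow

        # Assumed example phrases; a real system would use a much larger corpus.
        SCAM_KEYWORDS = {"your computer is infected", "licensed online pharmacy"}

        def average_hash(path: str, size: int = 8) -> int:
            """64-bit average hash: 1 where a pixel is brighter than the image mean."""
            img = Image.open(path).convert("L").resize((size, size))
            pixels = list(img.getdata())
            mean = sum(pixels) / len(pixels)
            bits = 0
            for p in pixels:
                bits = (bits << 1) | (1 if p > mean else 0)
            return bits

        def hamming(a: int, b: int) -> int:
            return bin(a ^ b).count("1")

        def looks_like_scam(screenshot: str, page_text: str, known_hashes: list[int]) -> bool:
            # Visual match against known scam templates, plus simple text matching.
            visual = any(hamming(average_hash(screenshot), h) <= 10 for h in known_hashes)
            textual = any(kw in page_text.lower() for kw in SCAM_KEYWORDS)
            return visual or textual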

    Detecting Abnormal Behavior in Web Applications

    The rapid advance of web technologies has made the Web an essential part of our daily lives. However, network attacks have exploited vulnerabilities of web applications and caused substantial damage to Internet users. Detecting network attacks is an important first step in network security, and a major branch of this area is anomaly detection. This dissertation concentrates on detecting abnormal behaviors in web applications using the following methodology: for a web application, we conduct a set of measurements to reveal the existence of abnormal behaviors in it, we observe the differences between normal and abnormal behaviors, and, by applying a variety of information-extraction methods such as heuristic algorithms, machine learning, and information theory, we extract features useful for building a classification system to detect abnormal behaviors. In particular, we have studied four detection problems in web security. The first is detecting the unauthorized hotlinking behavior that plagues hosting servers on the Internet. We analyze a group of common hotlinking attacks and the web resources targeted by them, and then present an anti-hotlinking framework for protecting materials on hosting servers. The second problem is detecting aggressive automation behavior on Twitter. Our work determines whether a Twitter user is a human, bot, or cyborg based on its degree of automation. We observe the differences among the three categories in terms of tweeting behavior, tweet content, and account properties, and we propose a classification system that uses a combination of features extracted from an unknown user to determine the likelihood of its being a human, bot, or cyborg. Furthermore, we shift the detection perspective from automation to spam and introduce the third problem, namely detecting social spam campaigns on Twitter. Evolved from individual spammers, spam campaigns manipulate and coordinate multiple accounts to spread spam on Twitter and display some collective characteristics. We design an automatic classification system based on machine learning and apply multiple features to classifying spam campaigns. Complementary to conventional spam detection methods, our work brings efficiency and robustness. Finally, we extend our detection research into the blogosphere to capture blog bots. In this problem, detecting human presence is an effective defense against the automatic posting ability of blog bots. We introduce behavioral biometrics, mainly mouse and keyboard dynamics, to distinguish between human and bot. By passively monitoring user browsing activities, this detection method does not require any direct user participation and improves the user experience.
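
    One information-theoretic signal relevant to the human/bot/cyborg problem described above is the regularity of posting times: highly automated accounts tend to post at low-entropy intervals. A simplified sketch of such a feature (the binning scheme and sample data are assumptions, not the dissertation's exact method):

        import math
        from collections import Counter

        def interval_entropy(timestamps: list[float], bin_seconds: float = 60.0) -> float:
            """Shannon entropy (bits) of binned gaps between consecutive posts.

            Near-zero entropy suggests clockwork-like automation; human posting
            patterns are typically far more irregular.
            """
            gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
            bins = Counter(int(g // bin_seconds) for g in gaps)
            total = sum(bins.values())
            return -sum((c / total) * math.log2(c / total) for c in bins.values())

        bot_like   = [i * 300.0 for i in range(50)]              # one post every 5 minutes
        human_like = [0, 40, 400, 940, 5000, 5030, 9000, 20000]  # irregular gaps
        print(interval_entropy(bot_like), interval_entropy(human_like))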

    On the Generation of Cyber Threat Intelligence: Malware and Network Traffic Analyses

    In recent years, malware authors have drastically changed their course on the subject of threat design and implementation. Malware authors, namely hackers or cyber-terrorists, perpetrate new forms of cyber-crime using more innovative hacking techniques. Motivated by financial or political reasons, attackers target computer systems ranging from personal computers to organizations’ networks to collect and steal sensitive data, as well as to blackmail and scam people or scupper IT infrastructures. Accordingly, IT security experts face new challenges, as they need to counter cyber-threats proactively. The challenge takes the form of a continuous fight, in which cyber-criminals are obsessed with the idea of outsmarting security defenses. As such, security experts have to elaborate an effective strategy to counter cyber-criminals. The generation of cyber-threat intelligence is of paramount importance, as stated in the following quote: “the field is owned by who owns the intelligence”. In this thesis, we address the problem of generating timely and relevant cyber-threat intelligence for the purpose of detection, prevention, and mitigation of cyber-attacks. To do so, we undertake a research effort that falls into four parts. First, we analyze prominent cyber-crime toolkits to grasp the inner secrets and workings of advanced threats. We dissect prominent malware such as the Zeus and Mariposa botnets to uncover the underlying techniques used to build a networked army of infected machines. Second, we investigate cyber-crime infrastructures, where we elaborate on the generation of cyber-threat intelligence for situational awareness. We adapt a graph-theoretic approach to study infrastructures used by malware to perpetrate malicious activities. We build a scoring mechanism based on a page-ranking algorithm to measure the badness of infrastructure elements, i.e., domains, IPs, domain owners, etc. In addition, we use the min-hashing technique to evaluate the level of sharing among cyber-threat infrastructures over a period of one year. Third, we use machine learning techniques to fingerprint malicious IP traffic. By fingerprinting, we mean detecting malicious network flows and attributing them to malware families. This research effort relies on a ground truth collected from the dynamic analysis of malware samples. Finally, we investigate the generation of cyber-threat intelligence from passive DNS streams. To this end, we design and implement a system that generates anomalies from passive DNS traffic. Due to the tremendous volume of DNS data, we build the system on top of a cluster computing framework, namely Apache Spark [70]. The integrated analytic system has the ability to detect anomalies observed in DNS records that are potentially generated by widespread cyber-threats.
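
    As an illustration of the min-hashing step used to estimate sharing between cyber-threat infrastructures, the Jaccard similarity of two sets of infrastructure elements can be approximated as below (a generic MinHash sketch under an assumed hash construction, not the thesis code):

        import hashlib

        def minhash_signature(items: set[str], num_hashes: int = 128) -> list[int]:
            """MinHash signature: for each seeded hash function keep the minimum value."""
            sig = []
            for seed in range(num_hashes):
                sig.append(min(
                    int.from_bytes(hashlib.sha1(f"{seed}:{x}".encode()).digest()[:8], "big")
                    for x in items
                ))
            return sig

        def estimated_jaccard(a: set[str], b: set[str], num_hashes: int = 128) -> float:
            """Fraction of matching signature positions approximates Jaccard similarity."""
            sa, sb = minhash_signature(a, num_hashes), minhash_signature(b, num_hashes)
            return sum(x == y for x, y in zip(sa, sb)) / num_hashes

        # Hypothetical infrastructure element sets observed in two periods.
        infra_2013 = {"evil-domain.example", "198.51.100.7", "203.0.113.12"}
        infra_2014 = {"evil-domain.example", "198.51.100.7", "192.0.2.99"}
        print(estimated_jaccard(infra_2013, infra_2014))  # roughly 0.5 (true Jaccard = 2/4)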

    Contextual Passive DNS Resolution

    Passive DNS is one of the most common tools for analyzing security incidents from telemetry in which IP addresses occur. Without actively querying the DNS resolver, it gives information about the most likely domain name that could have been used when accessing the IP address. This work proposes using the additional information contained in NetFlow telemetry to extract additional features, and applying machine learning methods to improve the accuracy of predicting the most probable domain name. The proposed solution is compared with the most common pDNS approach, which uses only the statistically most probable values.
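
    The baseline pDNS behaviour the thesis compares against, returning the statistically most probable domain for an IP, can be sketched as a simple counting model (names and structure are assumptions; the proposed system adds NetFlow-derived features and a learned model on top of such a baseline):

        from collections import Counter, defaultdict

        class PassiveDNSBaseline:
            """Most-frequent-domain-per-IP baseline, built from observed DNS answers."""

            def __init__(self):
                self.counts = defaultdict(Counter)

            def observe(self, answer_ip: str, domain: str) -> None:
                # Record one observed resolution of `domain` to `answer_ip`.
                self.counts[answer_ip][domain] += 1

            def resolve(self, ip: str) -> str | None:
                """Return the statistically most likely domain for this IP, if any."""
                if ip not in self.counts:
                    return None
                return self.counts[ip].most_common(1)[0][0]

        pdns = PassiveDNSBaseline()
        pdns.observe("192.0.2.10", "cdn.example.net")
        pdns.observe("192.0.2.10", "cdn.example.net")
        pdns.observe("192.0.2.10", "static.example.org")
        print(pdns.resolve("192.0.2.10"))  # cdn.example.net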

    Explainable Artificial Intelligence Applications in Cyber Security: State-of-the-Art in Research

    This survey presents a comprehensive review of current literature on Explainable Artificial Intelligence (XAI) methods for cyber security applications. Due to the rapid development of Internet-connected systems and Artificial Intelligence in recent years, Artificial Intelligence, including Machine Learning and Deep Learning, has been widely utilized in cyber security fields such as intrusion detection, malware detection, and spam filtering. However, although Artificial Intelligence-based approaches for the detection and defense of cyber attacks and threats are more advanced and efficient than conventional signature-based and rule-based cyber security strategies, most Machine Learning-based and Deep Learning-based techniques are deployed in a “black-box” manner, meaning that security experts and customers are unable to explain how such procedures reach particular conclusions. The lack of transparency and interpretability in existing Artificial Intelligence techniques decreases human users’ confidence in the models used for defense against cyber attacks, especially now that cyber attacks are becoming increasingly diverse and complicated. Therefore, it is essential to apply XAI when building cyber security models, to create more explainable models while maintaining high accuracy and allowing human users to comprehend, trust, and manage the next generation of cyber defense mechanisms. Although there are papers reviewing Artificial Intelligence applications in cyber security, and a vast literature on applying XAI in fields such as healthcare, financial services, and criminal justice, there are, surprisingly, currently no survey articles that concentrate on XAI applications in cyber security. Therefore, the motivation behind this survey is to bridge this research gap by presenting a detailed and up-to-date survey of XAI approaches applicable to issues in the cyber security field. Our work is the first to propose a clear roadmap for navigating the XAI literature in the context of cyber security applications.
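
    A minimal sketch of the kind of post-hoc explanation this survey covers (illustrative only; the synthetic flow features and the use of scikit-learn's permutation importance are assumptions, not a method from any particular surveyed paper), showing which inputs an intrusion-detection classifier relies on:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.inspection import permutation_importance

        rng = np.random.default_rng(0)
        # Synthetic "flow" features: duration, bytes, packets, distinct ports contacted.
        feature_names = ["duration", "bytes", "packets", "distinct_ports"]
        X = rng.normal(size=(1000, 4))
        y = (X[:, 3] > 0.5).astype(int)  # label driven mostly by distinct_ports

        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
        result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)

        # A global, human-readable explanation: which inputs the "black box" relies on.
        for name, score in sorted(zip(feature_names, result.importances_mean),
                                  key=lambda t: -t[1]):
            print(f"{name:>15}: {score:.3f}")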

    Using Botnet Technologies to Counteract Network Traffic Analysis

    Botnets have been problematic for over a decade. They are used to launch malicious activities including DDoS (Distributed Denial of Service) attacks, spamming, identity theft, unauthorized bitcoin mining, and malware distribution. A nationwide DDoS attack caused by the Mirai botnet on October 21, 2016, involving tens of millions of IP addresses, took down Twitter, Spotify, Reddit, The New York Times, Pinterest, PayPal, and other major websites. In response to take-down campaigns by security personnel, botmasters have developed technologies to evade detection. The most widely used evasion technique is DNS fast-flux, where the botmaster frequently changes the mapping between domain names and IP addresses of the C&C server so that it will be too late or too costly to trace the C&C server locations. Domain names generated with Domain Generation Algorithms (DGAs) are used as the 'rendezvous' points between botmasters and bots. This work focuses on how to apply botnet technologies (fast-flux and DGAs) to counteract network traffic analysis, thereby protecting user privacy. A better understanding of botnet technologies also helps us be proactive in defending against botnets. First, we proposed two new DGAs, using hidden Markov models (HMMs) and probabilistic context-free grammars (PCFGs), which can evade current detection methods and systems. We also developed two HMM-based DGA detection methods that can detect botnet DGA-generated domain names with or without training sets. This helps security personnel understand the botnet phenomenon and develop proactive tools to detect botnets. Second, we developed a distributed proxy system using fast-flux to evade national censorship and surveillance. The goal is to help journalists, human rights advocates, and NGOs in West Africa have secure and free Internet access. We then developed a covert data transport protocol to transform arbitrary messages into real DNS traffic. We encode the message into benign-looking domain names generated by an HMM, which represents the statistical features of legitimate domain names. This can be used to evade Deep Packet Inspection (DPI) and protect user privacy in two-way communication. Both applications serve as examples of applying botnet technologies to legitimate use. Finally, we proposed a new protocol obfuscation technique that transforms an arbitrary network protocol into another (using the Network Time Protocol and the Minecraft video game protocol as examples) in terms of packet syntax and side-channel features (inter-packet delay and packet size). This research uses botnet technologies to help ordinary users have secure and private communications over the Internet. From our botnet research, we conclude that network traffic is a malleable and artificial construct. Although existing patterns are easy to detect and characterize, they are also subject to modification and mimicry. This means that we can construct transducers to make any communication pattern look like any other communication pattern. This is neither good nor bad for security; it is a fact that we need to accept and use as best we can.
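
    As a toy illustration of the Markov-style domain generation discussed above (a character-bigram chain rather than a full HMM, with an assumed seed list; not the dissertation's generator), benign-looking domain names can be produced as follows:

        import random
        from collections import defaultdict

        # Assumed seed words standing in for a corpus of legitimate domain labels.
        SEED_DOMAINS = ["weather", "update", "network", "cloudmail", "secure", "photos"]

        def train_bigram_model(words):
            """Character-bigram transition table learned from legitimate-looking words."""
            model = defaultdict(list)
            for w in words:
                w = "^" + w + "$"              # start/end markers
                for a, b in zip(w, w[1:]):
                    model[a].append(b)
            return model

        def generate_domain(model, max_len=12, seed=None):
            """Walk the transition table to emit a pronounceable pseudo-random label."""
            rng = random.Random(seed)
            name, ch = "", "^"
            while len(name) < max_len:
                ch = rng.choice(model[ch])
                if ch == "$":
                    break
                name += ch
            return name + ".com"

        model = train_bigram_model(SEED_DOMAINS)
        print([generate_domain(model, seed=i) for i in range(5)])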