429 research outputs found
PrivacyGuard: A VPN-Based Approach to Detect Privacy Leakages on Android Devices
The Internet is now the most important and efficient way to obtain information, and mobile devices are the easiest way to access it. Furthermore, wearable devices, which can be considered the next generation of mobile devices, are becoming popular. The more people rely on mobile devices, the more private information about them can be gathered from these devices. If a device is lost or compromised, much private information is revealed. Although today's smartphone operating systems try to provide a secure environment, they still fail to give users adequate control over, and visibility into, how third-party applications use their private data. The privacy leakage problem on mobile devices remains severe. For example, according to a recent field study by CMU [1], Android applications track users' location every three minutes on average.
Since the exposure of PRISM, a surveillance program run by the NSA, people have become increasingly aware of mobile privacy leakage. However, few tools for preserving privacy are available to average users, and most tools developed in recent work have shortcomings (details can be found in Chapter 2). To address these problems, we present PrivacyGuard, an efficient way to simultaneously detect leakage of multiple types of sensitive data, such as a phone's IMEI number or location data. PrivacyGuard provides real-time protection: it can modify the leaked information and replace it with crafted data. PrivacyGuard is configurable, extensible and useful for other research.
We implement PrivacyGuard on the Android platform by taking advantage of the VPNService class provided by the Android SDK. PrivacyGuard does not require root permissions to run on a device, nor does it require any knowledge of VPN technology from users. The VPN server runs locally on the device; no external servers are required. According to our experiments, PrivacyGuard can effectively detect privacy leakages of most applications and advertisement libraries with almost no overhead on power consumption and reasonable overhead on network speed.
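PrivacyGuard itself is implemented against the Android VPNService API; purely as an illustration of the detect-and-replace idea described above, the matching logic can be sketched in Python (all values, names and the payload format below are made up, not PrivacyGuard's actual code):

```python
# Hypothetical sketch of PrivacyGuard-style leak detection on intercepted
# plaintext traffic: scan an outgoing payload for known sensitive values
# (IMEI, location) and replace each match with crafted fake data.

SENSITIVE = {
    # leak type -> (real device value, crafted replacement); values are made up
    "imei": ("356938035643809", "000000000000000"),
    "latitude": ("43.4723", "0.0000"),
    "longitude": ("-80.5449", "0.0000"),
}

def sanitize_payload(payload):
    """Return the payload with sensitive values replaced, plus the leak types found."""
    leaks = []
    for kind, (real, fake) in SENSITIVE.items():
        if real in payload:
            leaks.append(kind)
            payload = payload.replace(real, fake)
    return payload, leaks

if __name__ == "__main__":
    request = "GET /track?imei=356938035643809&lat=43.4723&lon=-80.5449 HTTP/1.1"
    cleaned, found = sanitize_payload(request)
    print(found)    # leak types detected in this request
    print(cleaned)  # request with crafted data substituted
```

In the real system this inspection happens inside the local VPN server, so every application's traffic passes through it without root permissions.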
adPerf: Characterizing the Performance of Third-party Ads
Monetizing websites and web apps through online advertising is widespread in
the web ecosystem. The online advertising ecosystem nowadays forces publishers
to integrate ads from third-party domains. On the one hand, this raises
several privacy and security concerns that are actively studied in recent
years. On the other hand, given the ability of today's browsers to load dynamic
web pages with complex animations and JavaScript, online advertising has also
transformed and can have a significant impact on webpage performance. The
performance cost of online ads is critical since it eventually impacts user
satisfaction as well as their Internet bill and device energy consumption.
In this paper, we present an in-depth, first-of-its-kind performance
evaluation of web ads. Unlike prior efforts that rely primarily on adblockers,
we perform a fine-grained analysis on the web browser's page loading process to
demystify the performance cost of web ads. We aim to characterize the cost by
every component of an ad, so the publisher, ad syndicate, and advertiser can
improve the ad's performance with detailed guidance. For this purpose, we
develop an infrastructure, adPerf, for the Chrome browser that classifies page
loading workloads into ad-related and main-content at the granularity of
browser activities (such as JavaScript and Layout). Our evaluations show that
online advertising entails more than 15% of browser page loading workload and
approximately 88% of that is spent on JavaScript. We also track the sources and
delivery chain of web ads and analyze performance considering the origin of the
ad contents. We observe that two well-known third-party ad domains
contribute 35% of the ads' performance cost and, surprisingly, top news
websites implicitly include unknown third-party ads which in some cases account
for more than 37% of the ads' performance cost.
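The attribution idea can be sketched as follows; the trace format, the ad-domain list, and the function names are illustrative assumptions, not adPerf's actual interface:

```python
# Hypothetical sketch of adPerf-style attribution: given per-activity browser
# trace events, split page-loading work into ad-related vs. main-content by
# whether the initiating URL belongs to a known ad domain (e.g. from a filter list).

from urllib.parse import urlparse

AD_DOMAINS = {"ads.example.com", "syndication.example.net"}  # placeholder filter list

def attribute(events):
    """events: list of (url, activity, duration_ms).
    Returns (ad_ms, main_ms, ad cost broken down by browser activity)."""
    ad_ms, main_ms, by_activity = 0.0, 0.0, {}
    for url, activity, dur in events:
        if urlparse(url).hostname in AD_DOMAINS:
            ad_ms += dur
            by_activity[activity] = by_activity.get(activity, 0.0) + dur
        else:
            main_ms += dur
    return ad_ms, main_ms, by_activity

if __name__ == "__main__":
    trace = [
        ("https://news.example.org/index.html", "Parse", 120.0),
        ("https://ads.example.com/ad.js", "JavaScript", 80.0),
        ("https://news.example.org/app.js", "JavaScript", 300.0),
        ("https://ads.example.com/ad.js", "Layout", 20.0),
    ]
    print(attribute(trace))
```

The real tool performs this split at the granularity of Chrome's internal activities rather than on a flat event list, which is what lets it attribute JavaScript, Layout, and other workloads separately.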
Dynamic OSINT System Sourcing from Social Networks
Nowadays, the World Wide Web (WWW) is simultaneously an accumulator and a provider of huge
amounts of information, which is delivered to users through news, blogs, social networks, etc.
The exponential growth of information is a major challenge for the community in general, since
the frequent search for and correlation of news becomes a repetitive task, potentially tedious and
prone to errors. Although most people still perform information scrutiny manually and on a regular
basis, Open-Source Intelligence (OSINT) systems for monitoring, selecting and extracting textual
information from social networks and the Web have emerged in recent years and promise to change
this routine. These systems are now very popular and useful tools for professionals from different
areas, such as the cyber-security community, where staying up to date with the latest news and
trends can have a direct impact on threat response.
This work aims to address the previously motivated problem through the implementation of a
dynamic OSINT system. For this system, two algorithms were developed: one to dynamically
add, remove and rate user accounts with relevant tweets in the computer security area; and
another one to classify the publications of those users. The relevance of a user depends not
only on how frequently they publish, but also on their importance (status) in the social network
and on the relevance of the information they publish. Text mining functions are
proposed herein to measure the relevance of text segments.
The proposed approach is innovative, involving dynamic management of the relevance of users
and their publications, thus providing a more reliable and relevant framework of information sources.
Apart from the algorithms and the functions on which they are built (which were also proposed
in the scope of this work), this dissertation describes several experiments and tests used
in their evaluation. The qualitative results are very interesting and demonstrate the practical
usefulness of the approach. In terms of human-machine interface, a mural of information,
generated dynamically and automatically from the social network Twitter, is provided to the
end-user. In the current version of the system, the mural is presented in the form of a web
page, highlighting the news by its relevancy (red for high relevance, yellow for moderate relevance,
and green for low relevance).
The main contributions of this work are the two proposed algorithms and their evaluation. A
fully working prototype of a system with their implementation, along with a mural for showing
selected news, is another important output of this work.
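The dual scoring idea can be sketched as follows; the weights, saturation points, keyword matching, and color thresholds below are invented for illustration and are not the dissertation's actual parameters:

```python
# Hypothetical sketch of the OSINT relevance scoring: a user's relevance combines
# publication frequency, network status, and the relevance of the published text,
# and each score is mapped to the mural's traffic-light colors.

def text_relevance(text, keywords):
    """Fraction of security keywords that appear in the text segment."""
    words = text.lower().split()
    return sum(1 for k in keywords if k in words) / len(keywords)

def user_relevance(tweets_per_day, followers, texts, keywords,
                   w_freq=0.3, w_status=0.3, w_text=0.4):
    """Weighted combination of frequency, status, and average text relevance."""
    freq = min(tweets_per_day / 10.0, 1.0)       # saturate at 10 tweets/day
    status = min(followers / 10_000.0, 1.0)      # saturate at 10k followers
    content = sum(text_relevance(t, keywords) for t in texts) / max(len(texts), 1)
    return w_freq * freq + w_status * status + w_text * content

def mural_color(score):
    """Map a relevance score to the mural's red/yellow/green highlighting."""
    return "red" if score >= 0.6 else "yellow" if score >= 0.3 else "green"
```

A dynamic system would re-run this scoring periodically, adding accounts whose score rises and dropping those whose score decays, which is the account-management half of the two proposed algorithms.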
Web Archive Services Framework for Tighter Integration Between the Past and Present Web
Web archives have contained the cultural history of the web for many years, but they still offer limited access capabilities. Most web archiving research has focused on crawling and preservation activities, with little focus on delivery methods. The current access methods are tightly coupled with web archive infrastructure, hard to replicate or integrate with other web archives, and do not cover all users' needs. In this dissertation, we focus on access methods for archived web data that enable users, third-party developers, researchers, and others to gain knowledge from web archives. We build ArcSys, a new service framework that extracts, preserves, and exposes APIs for the web archive corpus. The dissertation introduces a novel categorization technique to divide the archived corpus into four levels. For each level, we propose suitable services and APIs that enable both users and third-party developers to build new interfaces. The first level is the content level, which extracts the content from the archived web data. We develop ArcContent to expose the web archive content processed through various filters. The second level is the metadata level; we extract the metadata from the archived web data and make it available to users. We implement two services: ArcLink for the temporal web graph and ArcThumb for optimizing thumbnail creation in web archives. The third level is the URI level, which uses the URI HTTP redirection status to enhance the user query. Finally, the highest level in the web archiving service framework pyramid is the archive level. At this level, we define a web archive by the characteristics of its corpus and build Web Archive Profiles. The profiles are used by the Memento Aggregator for query optimization.
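The archive-level idea of profile-driven query routing can be sketched as follows; the profile format, the simplified URI key, and the function names are hypothetical illustrations of what a Memento aggregator might consult, not the actual profile specification:

```python
# Hypothetical sketch: a Web Archive Profile summarizes which URI keys an
# archive's corpus covers, so an aggregator can skip archives that cannot
# possibly answer a query instead of broadcasting it to every archive.

from urllib.parse import urlparse

def uri_key(uri):
    """Reduce a URI to a registered-domain-style key (simplified form)."""
    host = urlparse(uri).hostname or ""
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host

def archives_to_query(profiles, uri):
    """Return only the archives whose profile claims coverage of the URI's key.

    profiles: dict mapping archive name -> set of covered URI keys."""
    key = uri_key(uri)
    return [name for name, keys in profiles.items() if key in keys]

if __name__ == "__main__":
    profiles = {
        "archive-a": {"example.com", "example.org"},
        "archive-b": {"example.net"},
    }
    print(archives_to_query(profiles, "https://www.example.org/page"))
```

Real profiles summarize a corpus with richer statistics than a flat key set, but the routing decision has the same shape: consult the profile before dispatching the query.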
Advanced Data Mining Techniques for Compound Objects
Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large data collections. The most important step within the process of KDD is data mining, which is concerned with the extraction of the valid patterns. KDD is necessary to analyze the steadily growing amount of data caused by the enhanced performance of modern computer systems. However, with the growing amount of data, the complexity of data objects increases as well. Modern methods of KDD should therefore examine more complex objects than simple feature vectors to solve real-world KDD applications adequately. Multi-instance and multi-represented objects are two important types of object representations for complex objects. Multi-instance objects consist of a set of object representations that all belong to the same feature space. Multi-represented objects are constructed as a tuple of feature representations, where each feature representation belongs to a different feature space.
The contribution of this thesis is the development of new KDD methods for the classification and clustering of complex objects. To this end, the thesis introduces solutions for real-world applications that are based on multi-instance and
multi-represented object representations. On the basis of these solutions, it is shown that a more general object representation often provides better results for many relevant KDD applications.
The first part of the thesis is concerned with two KDD problems for which employing multi-instance objects provides efficient and effective solutions. The first is data mining in CAD parts, e.g. the use of hierarchical clustering for the automatic construction of product hierarchies. The introduced solution decomposes a single part into a set of feature vectors and compares such sets using a metric on multi-instance objects. Furthermore, multi-step query processing using a novel filter step is employed, enabling the user to efficiently process similarity queries. On the basis of this similarity search system, it is possible to run several distance-based data mining algorithms, such as the hierarchical clustering algorithm OPTICS, to derive product hierarchies.
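One simple instance of a metric on multi-instance objects can be sketched as follows; this symmetric sum-of-minimum-distances measure is a common choice for comparing sets of feature vectors, though the thesis's actual metric and its filter step may differ:

```python
# Hypothetical sketch of a multi-instance distance: each object (e.g. a CAD part)
# is a set of feature vectors, and two objects are compared by averaging, for
# each vector, the distance to its nearest counterpart in the other set.

import math

def euclid(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def set_distance(xs, ys):
    """Symmetric sum-of-minimum-distances between two sets of vectors."""
    d_xy = sum(min(euclid(x, y) for y in ys) for x in xs) / len(xs)
    d_yx = sum(min(euclid(y, x) for x in xs) for y in ys) / len(ys)
    return (d_xy + d_yx) / 2.0

if __name__ == "__main__":
    part_a = [(0.0, 0.0), (1.0, 0.0)]
    part_b = [(0.0, 0.1), (1.0, 0.0), (5.0, 5.0)]
    print(set_distance(part_a, part_a))  # 0.0: identical sets
    print(set_distance(part_a, part_b))
```

Any distance-based algorithm such as OPTICS can then be run on top of such a set distance, which is what makes hierarchical clustering of decomposed CAD parts possible.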
The second important application is the classification of and search for complete websites in the World Wide Web (WWW). A website is a set of HTML documents that is published by the same person, group or organization and usually serves a common purpose. To perform data mining on websites, the thesis presents several methods to classify them. After introducing naive methods that model websites as single webpages, two more sophisticated approaches to website classification are introduced. The first approach uses a preprocessing step that maps the single HTML documents within each website to so-called page classes. The second approach directly compares websites as sets of word vectors and uses nearest-neighbor classification. To search the WWW for new, relevant websites, a focused crawler is introduced that efficiently retrieves relevant websites. This crawler minimizes the number of HTML documents retrieved and increases the accuracy of website retrieval.
The second part of the thesis is concerned with data mining in multi-represented objects. An important example application for this kind of complex object is proteins, which can be represented as a tuple of a protein sequence and a text annotation. To analyze multi-represented objects, a clustering method for multi-represented objects is introduced that is based on the density-based clustering algorithm DBSCAN. This method uses all available representations to find a global clustering of the given data objects. However, in many applications a sophisticated class ontology already exists for the given data objects, e.g. proteins. To map new objects into such an ontology, a new method for the hierarchical classification of multi-represented objects is described. The system employs the hierarchical structure of the ontology to efficiently classify new proteins using support vector machines.
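The idea of combining representations during density-based clustering can be sketched as follows; the toy distance functions, thresholds, and the "union" neighborhood semantics shown here are illustrative assumptions, not the thesis's exact formulation:

```python
# Hypothetical sketch of a multi-represented eps-neighborhood: an object is a
# tuple of representations (e.g. protein sequence, text annotation), each with
# its own distance function and eps. Under a "union" semantics, two objects are
# neighbors if they are close in at least one representation; an "intersection"
# semantics would require closeness in all of them.

def seq_dist(a, b):
    """Toy sequence distance: fraction of mismatching positions."""
    n = max(len(a), len(b))
    return sum(1 for x, y in zip(a, b) if x != y) / n + abs(len(a) - len(b)) / n

def text_dist(a, b):
    """Toy annotation distance: 1 - Jaccard similarity of word sets."""
    wa, wb = set(a.split()), set(b.split())
    return 1.0 - len(wa & wb) / len(wa | wb)

def union_neighbors(obj, others, eps=(0.3, 0.5)):
    """Objects that are eps-close to obj in at least one representation."""
    dists = (seq_dist, text_dist)
    return [o for o in others
            if any(d(x, y) <= e for d, x, y, e in zip(dists, obj, o, eps))]
```

A DBSCAN-style algorithm built on such a neighborhood predicate then clusters the objects globally using all representations at once, rather than clustering each representation separately and merging afterwards.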
Enhancing Web Browsing Security
Web browsing has become an integral part of our lives, and we use browsers to perform many important activities almost every day and everywhere. However, due to the vulnerabilities in Web browsers and Web applications, and due to Web users' lack of security knowledge, browser-based attacks are rampant over the Internet and have caused substantial damage to both Web users and service providers. Enhancing Web browsing security is therefore of great need and importance.
This dissertation concentrates on enhancing Web browsing security through exploring and experimenting with new approaches and software systems. Specifically, we have systematically studied four challenging Web browsing security problems: HTTP cookie management, phishing, insecure JavaScript practices, and browsing on untrusted public computers. We have proposed new approaches to address these problems and built unique systems to validate our approaches.
To manage HTTP cookies, we have proposed an approach to automatically validate the usefulness of HTTP cookies at the client side on behalf of users. By automatically removing useless cookies, our approach helps a user strike an appropriate balance between maximizing usability and minimizing security risks. To protect against phishing attacks, we have proposed an approach to transparently feed a relatively large number of bogus credentials into a suspected phishing site. Using those bogus credentials, our approach conceals victims' real credentials and enables a legitimate website to identify stolen credentials in a timely manner. To identify insecure JavaScript practices, we have proposed an execution-based measurement approach and performed a large-scale measurement study. Our work sheds light on insecure JavaScript practices and especially reveals the severity and nature of insecure JavaScript inclusion and dynamic generation practices on the Web.
To achieve secure and convenient Web browsing on untrusted public computers, we have proposed a simple approach that enables an extended browser on a mobile device and a regular browser on a public computer to collaboratively support a Web session. A user can securely perform sensitive interactions on the mobile device and conveniently perform other browsing interactions on the public computer.
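The cookie-usefulness idea described above can be sketched as follows; the similarity measure and threshold are illustrative assumptions, not the dissertation's actual validation method:

```python
# Hypothetical sketch of client-side cookie usefulness validation: fetch a page
# with and without a candidate cookie and compare the two responses. If they are
# nearly identical, the cookie did not affect the served content and can be
# treated as useless and removed.

from difflib import SequenceMatcher

def cookie_is_useless(page_with, page_without, threshold=0.95):
    """True when the responses with and without the cookie are nearly identical."""
    similarity = SequenceMatcher(None, page_with, page_without).ratio()
    return similarity >= threshold

if __name__ == "__main__":
    personalized = "<html><body>Welcome back, Alice!</body></html>"
    generic = "<html><body>Welcome, guest. Please log in.</body></html>"
    print(cookie_is_useless(personalized, personalized))  # identical -> useless
    print(cookie_is_useless(personalized, generic))
```

A real implementation must also control for dynamic page content (timestamps, rotating ads) before comparing, otherwise every cookie would look useful.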