429 research outputs found

    PrivacyGuard: A VPN-Based Approach to Detect Privacy Leakages on Android Devices

    The Internet is now the most important and efficient way to gain information, and mobile devices are the easiest way to access the Internet. Furthermore, wearable devices, which can be considered the next generation of mobile devices, are becoming popular. The more people rely on mobile devices, the more private information about them can be gathered from their devices. If a device is lost or compromised, much private information is revealed. Although today’s smartphone operating systems try to provide a secure environment, they still fail to give users adequate control over, and visibility into, how third-party applications use their private data. The privacy leakage problem on mobile devices remains severe. For example, according to a recent field study [1] done at CMU, Android applications track users’ location every three minutes on average. After the PRISM program, an NSA surveillance program, was exposed, people have become increasingly aware of mobile privacy leakage. However, there are few tools available to average users for preserving privacy, and most tools developed by recent work have shortcomings (details can be found in chapter 2). To address these problems, we present PrivacyGuard, an efficient way to simultaneously detect leakage of multiple types of sensitive data, such as a phone’s IMEI number or location data. PrivacyGuard provides real-time protection: it can modify the leaked information and replace it with crafted data. PrivacyGuard is configurable, extensible, and useful for other research. We implement PrivacyGuard on the Android platform by taking advantage of the VpnService class provided by the Android SDK. PrivacyGuard does not require root permissions to run on a device, nor does it require any knowledge of VPN technology from users. The VPN server runs locally on the device; no external servers are required. According to our experiments, PrivacyGuard can effectively detect privacy leakages of most applications and advertisement libraries with almost no overhead on power consumption and reasonable overhead on network speed.
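    The core detection idea can be sketched independently of the Android plumbing: once the local VPN has reassembled an outgoing plaintext payload, known sensitive values are searched for and, optionally, replaced with crafted data before the traffic is forwarded. The minimal Python sketch below illustrates only that matching-and-substitution step; the sensitive values, the crafted replacements, and the function names are illustrative assumptions, not the actual PrivacyGuard code.

```python
# Sketch: detect and optionally rewrite sensitive values in an outgoing
# plaintext payload. All values are hypothetical; the real system
# intercepts packets via Android's VpnService and reassembles TCP
# streams before any matching can happen.

SENSITIVE = {
    "IMEI": b"356938035643809",          # device IMEI (example value)
    "LOCATION": b"43.4723,-80.5449",     # recent GPS fix (example value)
}

CRAFTED = {
    "IMEI": b"000000000000000",          # crafted replacement values
    "LOCATION": b"0.0000,0.0000",
}

def inspect_payload(payload: bytes, rewrite: bool = True):
    """Return (list of leaked data types, possibly rewritten payload)."""
    leaked = []
    for kind, value in SENSITIVE.items():
        if value in payload:
            leaked.append(kind)
            if rewrite:
                payload = payload.replace(value, CRAFTED[kind])
    return leaked, payload

if __name__ == "__main__":
    body = b"GET /track?imei=356938035643809&loc=43.4723,-80.5449 HTTP/1.1"
    leaks, safe_body = inspect_payload(body)
    print("leaked:", leaks)          # ['IMEI', 'LOCATION']
    print("forwarded:", safe_body)   # crafted values substituted in
```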

    adPerf: Characterizing the Performance of Third-party Ads

    Monetizing websites and web apps through online advertising is widespread in the web ecosystem. The online advertising ecosystem today forces publishers to integrate ads from third-party domains. On the one hand, this raises several privacy and security concerns that have been actively studied in recent years. On the other hand, given the ability of today's browsers to load dynamic web pages with complex animations and JavaScript, online advertising has also transformed and can have a significant impact on webpage performance. The performance cost of online ads is critical since it ultimately affects user satisfaction as well as users' Internet bills and device energy consumption. In this paper, we conduct an in-depth, first-of-its-kind performance evaluation of web ads. Unlike prior efforts that rely primarily on adblockers, we perform a fine-grained analysis of the web browser's page loading process to demystify the performance cost of web ads. We aim to characterize the cost of every component of an ad, so that the publisher, ad syndicate, and advertiser can improve the ad's performance with detailed guidance. For this purpose, we develop an infrastructure, adPerf, for the Chrome browser that classifies page loading workloads into ad-related and main-content categories at the granularity of browser activities (such as JavaScript and Layout). Our evaluations show that online advertising entails more than 15% of the browser's page loading workload, and approximately 88% of that is spent on JavaScript. We also track the sources and delivery chains of web ads and analyze performance with respect to the origin of ad contents. We observe that two well-known third-party ad domains contribute 35% of the ads' performance cost and, surprisingly, top news websites implicitly include unknown third-party ads which in some cases account for more than 37% of the ads' performance cost.
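    A rough sketch of the attribution idea: each browser activity is charged to "ad" or "main content" depending on whether the URL responsible for it matches an ad filter list, and durations are summed per activity type. The event tuples and the two-entry filter list below are simplified stand-ins for a real Chrome trace and a real filter list (e.g. EasyList); they are assumptions for illustration, not adPerf's implementation.

```python
# Sketch: attribute page-loading work to ads vs. main content by the URL
# responsible for each browser activity.
from urllib.parse import urlparse
from collections import defaultdict

AD_DOMAINS = {"doubleclick.net", "adnxs.com"}   # hypothetical filter entries

def is_ad(url: str) -> bool:
    """True if the URL's host matches (or is a subdomain of) an ad domain."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in AD_DOMAINS)

def attribute(events):
    """events: iterable of (activity, url, duration_ms) tuples."""
    totals = defaultdict(lambda: defaultdict(float))
    for activity, url, ms in events:
        bucket = "ad" if is_ad(url) else "main"
        totals[bucket][activity] += ms
    return totals

if __name__ == "__main__":
    trace = [
        ("Scripting", "https://securepubads.doubleclick.net/gpt.js", 120.0),
        ("Layout",    "https://news.example.com/index.html",          45.0),
        ("Scripting", "https://news.example.com/app.js",              80.0),
    ]
    for bucket, activities in attribute(trace).items():
        print(bucket, dict(activities))
```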

    Dynamic OSINT System Sourcing from Social Networks

    Nowadays, the World Wide Web (WWW) is simultaneously an accumulator and a provider of huge amounts of information, which is delivered to users through news, blogs, social networks, etc. The exponential growth of information is a major challenge for the community in general, since the frequent retrieval and correlation of news becomes a repetitive task, potentially tedious and prone to errors. Although information scrutiny is still performed manually and on a regular basis by most people, the emergence in recent years of Open-Source Intelligence (OSINT) systems for monitoring, selecting, and extracting textual information from social networks and the Web promises to change this. These systems are now very popular and useful tools for professionals from different areas, such as the cyber-security community, where staying up to date with the latest news and trends can have a direct impact on threat response. This work aims to address this problem through the implementation of a dynamic OSINT system. For this system, two algorithms were developed: one to dynamically add, remove, and rate user accounts with relevant tweets in the computer security area, and another to classify the publications of those users. The relevance of a user depends not only on how frequently they publish, but also on their importance (status) in the social network, as well as on the relevance of the information they publish. Text mining functions are proposed herein to measure the relevance of text segments. The proposed approach is innovative, involving dynamic management of the relevance of users and their publications, thus ensuring a more reliable and important framework of information sources. Apart from the algorithms and the functions on which they are built (also proposed in the scope of this work), this dissertation describes several experiments and tests used in their evaluation. The qualitative results are very interesting and demonstrate the practical usefulness of the approach. In terms of human-machine interface, a mural of information, generated dynamically and automatically from the social network Twitter, is provided to the end user. In the current version of the system, the mural is presented in the form of a web page, highlighting news according to relevance (red for high relevance, yellow for moderate relevance, and green for low relevance). The main contributions of this work are the two proposed algorithms and their evaluation. A fully working prototype of a system implementing them, along with a mural for showing selected news, is another important output of this work.
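    A minimal sketch of the scoring idea described above: a tweet is rated by weighted security keywords, a user is rated by combining posting frequency, network status, and the average relevance of recent posts, and the resulting score maps to the red/yellow/green highlighting. The keyword weights, the weighting coefficients, and the thresholds are illustrative assumptions, not the dissertation's actual functions.

```python
# Sketch of the scoring idea: rate a tweet by security-related keywords,
# rate a user by posting frequency, status, and recent-post relevance,
# then map the score to the mural's traffic-light colors.
import math

KEYWORDS = {"vulnerability": 3.0, "exploit": 3.0, "ransomware": 2.0,
            "patch": 1.5, "cve": 2.5}          # illustrative weights

def tweet_relevance(text: str) -> float:
    """Sum keyword weights found in a tweet's text."""
    return sum(KEYWORDS.get(w.strip(".,:#"), 0.0) for w in text.lower().split())

def user_score(tweets_per_day: float, followers: int, recent_texts: list) -> float:
    """Combine frequency, network status, and average text relevance."""
    status = math.log10(followers + 1)
    relevance = sum(map(tweet_relevance, recent_texts)) / max(len(recent_texts), 1)
    return 0.2 * tweets_per_day + 0.3 * status + 0.5 * relevance

def traffic_light(score: float) -> str:
    """Map a score to the mural's highlighting colors."""
    return "red" if score >= 3 else "yellow" if score >= 1.5 else "green"

if __name__ == "__main__":
    texts = ["New exploit published, patch now", "good morning"]
    s = user_score(tweets_per_day=5, followers=12_000, recent_texts=texts)
    print(round(s, 2), traffic_light(s))
```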

    QueueLinker: A Parallel and Distributed Processing Framework for Data Streams

    Waseda University degree number: Shin 6373. Waseda University.

    Web Archive Services Framework for Tighter Integration Between the Past and Present Web

    Web archives have preserved the cultural history of the web for many years, but they still offer limited means of access. Most web archiving research has focused on crawling and preservation activities, with little focus on delivery methods. The current access methods are tightly coupled with web archive infrastructure, hard to replicate or integrate with other web archives, and do not cover all users' needs. In this dissertation, we focus on access methods for archived web data that enable users, third-party developers, researchers, and others to gain knowledge from web archives. We build ArcSys, a new service framework that extracts, preserves, and exposes APIs for the web archive corpus. The dissertation introduces a novel categorization technique that divides the archived corpus into four levels. For each level, we propose suitable services and APIs that enable both users and third-party developers to build new interfaces. The first level is the content level, which extracts content from the archived web data; we develop ArcContent to expose the web archive content processed through various filters. The second level is the metadata level, where we extract metadata from the archived web data and make it available to users; we implement two services, ArcLink for the temporal web graph and ArcThumb for optimizing thumbnail creation in web archives. The third level is the URI level, which uses the URI's HTTP redirection status to enhance the user query. Finally, the highest level in the web archiving service framework pyramid is the archive level, in which we define a web archive by the characteristics of its corpus and build Web Archive Profiles. The profiles are used by the Memento Aggregator for query optimization.
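    As a small illustration of the kind of archive-level summary a profile can capture, the sketch below counts mementos per year from a Memento TimeMap in link format (RFC 7089). The TimeMap excerpt and the per-year summary are illustrative assumptions; this is not ArcSys code and not the actual profile format used by the Memento Aggregator.

```python
# Sketch: summarize archive coverage for a URI from a Memento TimeMap in
# link format. The TimeMap text is a tiny illustrative excerpt; a real one
# would be fetched from a web archive or an aggregator.
import re
from collections import Counter

TIMEMAP = '''
<http://archive.example.org/web/20090515000000/http://example.com/>; rel="memento"; datetime="Fri, 15 May 2009 00:00:00 GMT",
<http://archive.example.org/web/20110102000000/http://example.com/>; rel="memento"; datetime="Sun, 02 Jan 2011 00:00:00 GMT",
<http://archive.example.org/web/20110830000000/http://example.com/>; rel="memento"; datetime="Tue, 30 Aug 2011 00:00:00 GMT"
'''

def mementos_per_year(timemap: str) -> Counter:
    """Count mementos by the year in their datetime attribute."""
    pattern = re.compile(r'rel="memento";\s*datetime="[^"]*?(\d{4})\s')
    return Counter(pattern.findall(timemap))

if __name__ == "__main__":
    print(mementos_per_year(TIMEMAP))   # e.g. Counter({'2011': 2, '2009': 1})
```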

    Advanced Data Mining Techniques for Compound Objects

    Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large data collections. The most important step within the process of KDD is data mining, which is concerned with the extraction of the valid patterns. KDD is necessary to analyze the steadily growing amount of data caused by the enhanced performance of modern computer systems. However, with the growing amount of data, the complexity of data objects increases as well. Modern methods of KDD should therefore examine more complex objects than simple feature vectors to solve real-world KDD applications adequately. Multi-instance and multi-represented objects are two important types of object representations for complex objects. Multi-instance objects consist of a set of object representations that all belong to the same feature space. Multi-represented objects are constructed as a tuple of feature representations where each feature representation belongs to a different feature space. The contribution of this thesis is the development of new KDD methods for the classification and clustering of complex objects. The thesis introduces solutions for real-world applications that are based on multi-instance and multi-represented object representations. On the basis of these solutions, it is shown that a more general object representation often provides better results for many relevant KDD applications. The first part of the thesis is concerned with two KDD problems for which employing multi-instance objects provides efficient and effective solutions. The first is data mining in CAD parts, e.g. the use of hierarchical clustering for the automatic construction of product hierarchies. The introduced solution decomposes a single part into a set of feature vectors and compares parts using a metric on multi-instance objects. Furthermore, multi-step query processing using a novel filter step is employed, enabling the user to process similarity queries efficiently. On the basis of this similarity search system, it is possible to run several distance-based data mining algorithms, such as the hierarchical clustering algorithm OPTICS, to derive product hierarchies. The second important application is the classification of and search for complete websites in the World Wide Web (WWW). A website is a set of HTML documents published by the same person, group, or organization that usually serves a common purpose. To perform data mining on websites, the thesis presents several methods to classify websites. After introducing naive methods that model websites as webpages, two more sophisticated approaches to website classification are introduced. The first approach uses a preprocessing step that maps single HTML documents within each website to so-called page classes. The second approach directly compares websites as sets of word vectors and uses nearest-neighbor classification. To search the WWW for new, relevant websites, a focused crawler is introduced that efficiently retrieves relevant websites, minimizing the number of HTML documents that must be downloaded while increasing the accuracy of website retrieval. The second part of the thesis is concerned with data mining in multi-represented objects. An important example of this kind of complex object is proteins, which can be represented as a tuple of a protein sequence and a text annotation. To analyze multi-represented objects, a clustering method for multi-represented objects is introduced that is based on the density-based clustering algorithm DBSCAN. This method uses all available representations to find a global clustering of the given data objects. However, in many applications a sophisticated class ontology already exists for the given data objects, e.g. proteins. To map new objects into such an ontology, a new method for the hierarchical classification of multi-represented objects is described. The system employs the hierarchical structure of the ontology to efficiently classify new proteins using support vector machines.
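    A small sketch of the multi-instance idea: each part becomes a set of feature vectors, a distance is defined between such sets, and a standard hierarchical clustering is run on the resulting distance matrix. The symmetric "average minimum" set distance and the SciPy-based clustering below are illustrative assumptions, not the thesis's actual metric or its OPTICS-based implementation.

```python
# Sketch: a simple distance between multi-instance objects (sets of feature
# vectors), plugged into off-the-shelf hierarchical clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist, squareform

def set_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Average of each instance's distance to its nearest instance in the other set."""
    d = cdist(a, b)                       # pairwise Euclidean distances
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def cluster(objects, n_clusters=2):
    """Hierarchically cluster multi-instance objects via a full distance matrix."""
    n = len(objects)
    dm = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dm[i, j] = dm[j, i] = set_distance(objects[i], objects[j])
    Z = linkage(squareform(dm), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

if __name__ == "__main__":
    # Three toy "parts", each a set of 3-dimensional feature vectors.
    parts = [np.random.rand(5, 3), np.random.rand(4, 3) + 5, np.random.rand(6, 3)]
    print(cluster(parts, n_clusters=2))   # e.g. [1 2 1]
```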

    Enhancing Web Browsing Security

    Web browsing has become an integral part of our lives, and we use browsers to perform many important activities almost every day and everywhere. However, due to vulnerabilities in Web browsers and Web applications, and also due to Web users' lack of security knowledge, browser-based attacks are rampant over the Internet and have caused substantial damage to both Web users and service providers. Enhancing Web browsing security is therefore of great need and importance. This dissertation concentrates on enhancing Web browsing security by exploring and experimenting with new approaches and software systems. Specifically, we have systematically studied four challenging Web browsing security problems: HTTP cookie management, phishing, insecure JavaScript practices, and browsing on untrusted public computers. We have proposed new approaches to address these problems and built unique systems to validate our approaches. To manage HTTP cookies, we have proposed an approach that automatically validates the usefulness of HTTP cookies at the client side on behalf of users. By automatically removing useless cookies, our approach helps a user strike an appropriate balance between maximizing usability and minimizing security risks. To protect against phishing attacks, we have proposed an approach that transparently feeds a relatively large number of bogus credentials into a suspected phishing site. Using those bogus credentials, our approach conceals victims' real credentials and enables a legitimate website to identify stolen credentials in a timely manner. To identify insecure JavaScript practices, we have proposed an execution-based measurement approach and performed a large-scale measurement study. Our work sheds light on insecure JavaScript practices and in particular reveals the severity and nature of insecure JavaScript inclusion and dynamic generation practices on the Web. To achieve secure and convenient Web browsing on untrusted public computers, we have proposed a simple approach that enables an extended browser on a mobile device and a regular browser on a public computer to collaboratively support a Web session: a user can securely perform sensitive interactions on the mobile device and conveniently perform other browsing interactions on the public computer.
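    One way to realize the bogus-credential idea can be sketched as follows: decoy passwords embed an HMAC tag derived from a key known only to the legitimate site, so a later login attempt that presents a decoy can be recognized as evidence that credentials were phished. The tagging scheme, the key handling, and the function names below are illustrative assumptions, not the dissertation's actual mechanism.

```python
# Sketch: generate decoy credentials whose passwords carry an HMAC tag,
# so the legitimate site can later recognize a decoy being replayed.
import hmac, hashlib, secrets

SITE_KEY = secrets.token_bytes(32)   # known only to the legitimate site

def make_bogus_credential(site: str):
    """Return a (username, password) decoy with a verifiable tag in the password."""
    user = "user" + secrets.token_hex(3)
    nonce = secrets.token_hex(4)
    tag = hmac.new(SITE_KEY, f"{site}:{user}:{nonce}".encode(),
                   hashlib.sha256).hexdigest()[:8]
    return user, f"{nonce}-{tag}"

def is_bogus(site: str, user: str, password: str) -> bool:
    """True if a presented credential is one of our decoys (i.e. it was phished)."""
    try:
        nonce, tag = password.split("-", 1)
    except ValueError:
        return False
    expect = hmac.new(SITE_KEY, f"{site}:{user}:{nonce}".encode(),
                      hashlib.sha256).hexdigest()[:8]
    return hmac.compare_digest(tag, expect)

if __name__ == "__main__":
    u, p = make_bogus_credential("bank.example.com")
    print(u, p, is_bogus("bank.example.com", u, p))   # ... True
```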