25 research outputs found

    How many eyes are spying on your shared folders?

    Full text link
    Today, peer-to-peer (P2P) file sharing networks help tens of millions of users share content on the Internet. However, users' private files may inadvertently become accessible to everybody through their shared folders. In this paper, we investigate this kind of user privacy exposure in Kad, one of the biggest P2P file sharing networks, and try to answer two questions: Q1. Does this problem exist in current systems, and to what extent? Q2. Are attackers aware of this privacy vulnerability, and are they abusing the private information they obtain? We build a monitoring system called Dragonfly, based on the eclipse mechanism, to passively monitor sharing and downloading events in Kad. We also use the Honeyfile approach, sharing forged private information to observe attackers' behavior. Based on Dragonfly and Honeyfiles, we give affirmative answers to both questions. Within two weeks, more than five thousand private files related to ten sensitive keywords were shared by Kad users, and over half of them came from Italy and Spain. Within one month, each honey file was downloaded about 40 times on average, and the password information inside it was exploited 25 times. These results show that this privacy problem has become a serious threat to P2P users. Finally, we design and implement Numen, a plug-in for eMule that effectively protects users' private files from being shared without notice. Copyright 2012 ACM.
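The eclipse-based monitoring the abstract describes can be sketched roughly as follows: monitor nodes pick IDs that share a long prefix with a target keyword's hash, so that (by Kademlia's XOR routing metric) publish and search traffic for that keyword passes through them. This is a minimal illustration, not Dragonfly's implementation; the hash function (real Kad derives 128-bit IDs from MD4; MD5 stands in here) and the prefix length are assumptions.

```python
import hashlib
import os

ID_BITS = 128

def kad_id(keyword: str) -> int:
    # Keyword hashed to a 128-bit overlay ID (MD5 as a stand-in for Kad's MD4).
    return int.from_bytes(hashlib.md5(keyword.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    # Kademlia routes by XOR distance between IDs.
    return a ^ b

def eclipse_id(target: int, shared_prefix_bits: int = 96) -> int:
    # Craft a monitor ID sharing the top bits of the target, so the monitor
    # lands among the target's closest nodes and observes its traffic.
    suffix_bits = ID_BITS - shared_prefix_bits
    prefix = (target >> suffix_bits) << suffix_bits
    suffix = int.from_bytes(os.urandom(suffix_bits // 8), "big")
    return prefix | suffix

target = kad_id("passwords")
monitor = eclipse_id(target)
print(hex(xor_distance(target, monitor)))  # small: the monitor sits right next to the keyword ID
```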

    Confidential Data-Outsourcing and Self-Optimizing P2P-Networks: Coping with the Challenges of Multi-Party Systems

    Get PDF
    This work addresses the inherent lack of control and trust in Multi-Party Systems, using the examples of the Database-as-a-Service (DaaS) scenario and public Distributed Hash Tables (DHTs). In the DaaS field, it shows how confidential information in a database can be protected while still allowing the external storage provider to process incoming queries. For public DHTs, it shows how these highly dynamic systems can be managed by facilitating monitoring, simulation, and self-adaptation.
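One common building block for querying protected data, sketched here only as an assumed illustration rather than this thesis's actual scheme, is a deterministic keyed tag: equal plaintexts map to equal tags, so the storage provider can answer equality queries over values it cannot read.

```python
import hmac
import hashlib

KEY = b"client-secret-key"  # held only by the data owner, never by the provider

def det_tag(value: str) -> str:
    # Deterministic keyed tag: equal plaintexts give equal tags, so the
    # provider can match equality predicates without inverting the tag.
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

# Outsourced table stores tags instead of plaintext in the searchable column.
outsourced = [{"id": 1, "dept": det_tag("hr")},
              {"id": 2, "dept": det_tag("it")},
              {"id": 3, "dept": det_tag("hr")}]

# The client translates WHERE dept = 'hr' into a tag the provider can match.
query = det_tag("hr")
hits = [row["id"] for row in outsourced if row["dept"] == query]
print(hits)  # rows matching the encrypted predicate
```

The trade-off is that deterministic tags leak equality patterns, which is one reason such schemes need careful analysis in multi-party settings.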

    Security for Decentralised Service Location - Exemplified with Real-Time Communication Session Establishment

    Get PDF
    Decentralised Service Location, i.e. finding an application communication endpoint based on a Distributed Hash Table (DHT), is a fairly new concept. The precise security implications of this approach have not been studied in detail. More importantly, a detailed analysis of the applicability of existing security solutions to this concept has not been conducted. In many cases, existing client-server approaches to security may not be feasible. In addition, to understand the necessity for such an analysis, it is key to acknowledge that Decentralised Service Location has some unique security requirements compared to other P2P applications such as file sharing or live streaming. This thesis concerns the security challenges of Decentralised Service Location. The goals of our work are, on the one hand, to precisely understand the security requirements and research challenges for Decentralised Service Location, and on the other hand, to develop and evaluate corresponding security mechanisms. The thesis is organised as follows. First, fundamentals are explained and the scope of the thesis is defined. Decentralised Service Location is defined, and P2PSIP is explained technically as a prototypical example. Then, a security analysis for P2PSIP is presented. Based on this analysis, security requirements for Decentralised Service Location and the corresponding research challenges, i.e. security concerns not suitably mitigated by existing solutions, are derived. Second, several decentralised solutions are presented and evaluated to tackle these security challenges. We present decentralised algorithms that keep the DHT's lookup service available in the presence of adversarial nodes. These algorithms are evaluated via simulation and compared to analytical bounds. Further, a cryptographic approach based on self-certifying identities is illustrated and discussed. This approach enables decentralised integrity protection of location-bindings. Finally, a decentralised approach to assessing unknown identities is introduced. This approach is based on a Web-of-Trust model and is evaluated via a prototypical implementation. The thesis closes with a summary of the main contributions and a discussion of open issues.
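The self-certifying-identity idea can be sketched as follows: a node's overlay ID is the hash of its public key, so any peer can check a signed location-binding against the ID itself, without a certificate authority. This is a generic illustration under assumed details, using a toy RSA keypair (far too small for real use) rather than the thesis's actual construction.

```python
import hashlib

# Toy RSA keypair (illustration only; real deployments use proper key sizes).
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
pub = (e, n)

def node_id(public_key) -> str:
    # Self-certifying ID: hash of the public key, so ownership of the ID
    # is verifiable from the key itself, with no certificate authority.
    return hashlib.sha1(repr(public_key).encode()).hexdigest()

def sign(msg: bytes) -> int:
    h = int.from_bytes(hashlib.sha1(msg).digest(), "big") % n
    return pow(h, d, n)

def verify(msg: bytes, sig: int, public_key) -> bool:
    e_, n_ = public_key
    h = int.from_bytes(hashlib.sha1(msg).digest(), "big") % n_
    return pow(sig, e_, n_) == h

# A location-binding ties the self-certified ID to a network endpoint.
binding = b"node:" + node_id(pub).encode() + b" -> 192.0.2.7:5060"
sig = sign(binding)
print(verify(binding, sig, pub))  # any peer can check the binding's integrity
```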

    Proceedings of the Conference on Natural Language Processing 2010

    Get PDF
    This book contains state-of-the-art contributions to the 10th conference on Natural Language Processing, KONVENS 2010 (Konferenz zur Verarbeitung natürlicher Sprache), with a focus on semantic processing. The KONVENS in general aims at offering a broad perspective on current research and developments within the interdisciplinary field of natural language processing. The central theme draws specific attention towards addressing linguistic aspects of meaning, covering deep as well as shallow approaches to semantic processing. The contributions address both knowledge-based and data-driven methods for modelling and acquiring semantic information, and discuss the role of semantic information in applications of language technology. The articles demonstrate the importance of semantic processing, and present novel and creative approaches to natural language processing in general. Some contributions focus on developing and improving NLP systems for tasks like Named Entity Recognition or Word Sense Disambiguation, on semantic knowledge acquisition and exploitation with respect to collaboratively built resources, or on harvesting semantic information in virtual games. Others are set within the context of real-world applications, such as Authoring Aids, Text Summarisation and Information Retrieval. The collection highlights the importance of semantic processing for different areas and applications in Natural Language Processing, and provides the reader with an overview of current research in this field.

    Model-based Decision Support for Sepsis Endotypes

    Get PDF
    Sepsis is a high-mortality syndrome characterized by organ dysfunction due to a severe and dysregulated acute inflammatory response to infection. Research into therapies for this syndrome has historically ended in failure, which has largely been attributed to elevated levels of subject heterogeneity. What may previously have been attributed to variability in sepsis may be due to mechanistic differences between patients. Endotypes are distinct subtypes of disease, where underlying causes such as mechanistic or pathway-related differences manifest into phenotypes of disease. The lack of mechanistic understanding of immune mediator dynamics and the responses they trigger necessitates a mathematical modeling approach to analyze its complexities. A transfer function model is proposed to describe and cluster the dynamics of key inflammatory mediators. Five sepsis endotypes were discovered, revealing motifs of overwhelming inflammation, various levels of immunosuppression, sustained inflammation, and immunodeficiency. An accurate clinical tool was proposed to classify subjects into endotypes using six-hour trajectories of clinical data. A physiological ordinary differential equation model of sepsis is proposed that characterizes the interactions of inflammatory signaling molecules, neutrophils, and macrophages across the bone, blood, and tissue compartments of the body. This model was used to generate individual subject fits against human sepsis data. Population-level parameter analysis implicated macrophage cell death and cytokine half-life dynamics in endotype-level differences. Several proof-of-concept statistical models were introduced to demonstrate that it is possible to estimate the pre-hospital time of sepsis subjects and to quantify their sepsis-induced systemic tissue damage. 
A nearest-neighbor-based method was verified against animal and human data and revealed that the infection time-zero of sepsis patients can be estimated quickly and with high accuracy using commonly measured clinical features. A logistic regression ensemble model revealed that early organ dysfunction was a significant contributor to systemic damage and mortality. Knowledge of time-zero and systemic damage levels, in combination with an endotype classifier, provides clinicians with a clear depiction of where a subject is located on their sepsis trajectory. Such a decision support system enables therapy timing, early organ support, and targeted therapies to guide personalized treatment and shift patients towards better outcomes in sepsis.
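The nearest-neighbor idea for time-zero estimation can be sketched minimally: reference trajectories are labeled with hours since infection onset, and a new observation inherits the average label of its closest matches. The feature values and the choice of features (temperature, white-cell count, an oxygenation measure) here are illustrative assumptions, not the thesis's data or feature set.

```python
import math

# Reference bank: (feature vector, hours elapsed since infection time-zero).
# Values are illustrative, not real clinical data.
reference = [
    ((36.8, 7.0, 95.0), 0.0),
    ((38.2, 11.0, 88.0), 6.0),
    ((39.1, 14.0, 82.0), 12.0),
    ((39.4, 16.5, 76.0), 18.0),
]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def estimate_elapsed(features, k=2):
    # Average the elapsed-time labels of the k nearest reference points;
    # subtracting the estimate from the admission time yields time-zero.
    nearest = sorted(reference, key=lambda r: euclidean(features, r[0]))[:k]
    return sum(t for _, t in nearest) / k

patient = (39.0, 14.5, 81.0)
print(estimate_elapsed(patient))  # estimated hours since infection onset
```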

    Online learning on the programmable dataplane

    Get PDF
    This thesis makes the case for managing computer networks with data-driven methods, i.e. automated statistical inference and control based on measurement data and runtime observations, and argues for their tight integration with programmable dataplane hardware to make management decisions faster and from more precise data. Optimisation, defence, and measurement of networked infrastructure are each challenging tasks in their own right, which are currently dominated by the use of hand-crafted heuristic methods. These become harder to reason about and deploy as networks scale in rates and number of forwarding elements, and their design requires expert knowledge and care around unexpected protocol interactions. This makes tailored, per-deployment or -workload solutions infeasible to develop. Recent advances in machine learning offer capable function approximation and closed-loop control which suit many of these tasks. New, programmable dataplane hardware enables more agility in the network: runtime reprogrammability, precise traffic measurement, and low-latency on-path processing. The synthesis of these two developments allows complex decisions to be made on previously unusable state, and made quicker by offloading inference to the network. To justify this argument, I advance the state of the art in data-driven defence of networks, novel dataplane-friendly online reinforcement learning algorithms, and in-network data reduction to allow classification of switch-scale data. Each requires co-design aware of the network, and of the failure modes of systems and carried traffic. To make online learning possible in the dataplane, I use fixed-point arithmetic and modify classical (non-neural) approaches to take advantage of the SmartNIC compute model and make use of rich device-local state. 
I show that data-driven solutions still require great care to correctly design, but with the right domain expertise they can improve on pathological cases in DDoS defence, such as protecting legitimate UDP traffic. In-network aggregation to histograms is shown to enable accurate classification from fine temporal effects, and allows hosts to scale such classification to far larger flow counts and traffic volume. Moving reinforcement learning to the dataplane is shown to offer substantial benefits to state-action latency and online learning throughput versus host machines, allowing policies to react faster to fine-grained network events. The dataplane environment is key in making reactive online learning feasible; to port further algorithms and learnt functions, I collate and analyse the strengths of current and future hardware designs, as well as individual algorithms.
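The fixed-point adaptation of classical reinforcement learning mentioned above can be sketched as an integer-only Q-learning update, since dataplane pipelines generally lack floating point. The Q16.16 format and the specific constants are assumptions for illustration; shift-friendly (power-of-two) learning rates are a natural fit for such hardware.

```python
FRAC = 16                      # Q16.16 fixed point: 16 fractional bits
ONE = 1 << FRAC

def to_fix(x: float) -> int:
    return int(round(x * ONE))

def fmul(a: int, b: int) -> int:
    # Fixed-point multiply: integer product shifted back into Q16.16.
    return (a * b) >> FRAC

ALPHA = to_fix(0.125)          # learning rate: a power of two, i.e. a cheap shift
GAMMA = to_fix(0.9375)         # discount factor

def q_update(q, state, action, reward_fix, next_max_fix):
    # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
    # entirely in integer arithmetic suitable for dataplane pipelines.
    td = reward_fix + fmul(GAMMA, next_max_fix) - q[state][action]
    q[state][action] += fmul(ALPHA, td)

q = [[0, 0], [0, 0]]
next_max = max(q[1])
q_update(q, 0, 1, to_fix(1.0), next_max)
print(q[0][1] / ONE)           # one unit-reward update moves Q(0,1) by alpha
```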

    Revisiting Why Kad Lookup Fails

    No full text
    Kad is one of the most popular peer-to-peer (P2P) networks deployed on today's Internet. Its reliability is important not only to the usability of the file-sharing service, but also to Kad's capability to support other Internet services. However, Kad can only attain around a 91% lookup success ratio today. We build a measurement system called Anthill to analyze Kad's performance quantitatively, and find that Kad's failures can be classified into four types: packet loss, selective Denial of Service (sDoS) nodes, search sequence miss, and publish/search space miss. The first two are due to environment changes, the third is caused by the detachment of routing and content operations in Kad, and the last one shows the limitations of the Kademlia DHT algorithm under Kad's current configuration. Based on this analysis, we propose corresponding approaches for Kad, which achieve a success ratio of 99.8% with only moderate communication overhead.
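The lookup failures analyzed here arise in Kademlia's iterative search, which can be sketched as follows: a querier repeatedly asks the closest peers it knows about for contacts even closer (in XOR distance) to the target. This toy version uses small random routing tables and 16-bit IDs as assumptions; real Kad uses 128-bit IDs, parallel requests, and tolerance zones.

```python
import random

random.seed(1)
ID_BITS = 16
peers = {random.getrandbits(ID_BITS): None for _ in range(200)}
# Each peer knows a random subset of the others (a stand-in routing table).
tables = {p: random.sample(sorted(peers), 20) for p in peers}

def lookup(target, start, alpha=3):
    # Iterative Kademlia-style lookup: query the alpha closest known,
    # not-yet-queried peers until no closer contacts turn up.
    known, queried = {start}, set()
    while True:
        candidates = sorted(known - queried, key=lambda p: p ^ target)[:alpha]
        if not candidates:
            return min(known, key=lambda p: p ^ target)
        for peer in candidates:
            queried.add(peer)
            known.update(tables[peer])

start = next(iter(peers))
target = random.getrandbits(ID_BITS)
closest = lookup(target, start)
print(f"closest peer found for {target:#06x}: {closest:#06x}")
```

Because each hop depends on routing tables being fresh and honest, packet loss, stale contacts, and sDoS nodes at any step can derail the search, which is the class of failures Anthill quantifies.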