856 research outputs found

    Community Detection in Hypergraphen

    Get PDF
    Viele DatensĂ€tze können als Graphen aufgefasst werden, d.h. als Elemente (Knoten) und binĂ€re Verbindungen zwischen ihnen (Kanten). Unter dem Begriff der "Complex Network Analysis" sammeln sich eine ganze Reihe von Verfahren, die die Untersuchung von DatensĂ€tzen allein aufgrund solcher struktureller Eigenschaften erlauben. "Community Detection" als Untergebiet beschĂ€ftigt sich mit der Identifikation besonders stark vernetzter Teilgraphen. Über den Nutzen hinaus, den eine Gruppierung verwandter Element direkt mit sich bringt, können derartige Gruppen zu einzelnen Knoten zusammengefasst werden, was einen neuen Graphen von reduzierter KomplexitĂ€t hervorbringt, der die Makrostruktur des ursprĂŒnglichen Graphen unter UmstĂ€nden besser hervortreten lĂ€sst. Fortschritte im Bereich der "Community Detection" verbessern daher auch das VerstĂ€ndnis komplexer Netzwerke im allgemeinen. Nicht jeder Datensatz lĂ€sst sich jedoch angemessen mit binĂ€ren Relationen darstellen - Relationen höherer Ordnung fĂŒhren zu sog. Hypergraphen. Gegenstand dieser Arbeit ist die Verallgemeinerung von AnsĂ€tzen zur "Community Detection" auf derartige Hypergraphen. Im Zentrum der Aufmerksamkeit stehen dabei "Social Bookmarking"-DatensĂ€tze, wie sie von Benutzern von "Bookmarking"-Diensten erzeugt werden. Dabei ordnen Benutzer Dokumenten frei gewĂ€hlte Stichworte, sog. "Tags" zu. Dieses "Tagging" erzeugt, fĂŒr jede Tag-Zuordnung, eine ternĂ€re Verbindung zwischen Benutzer, Dokument und Tag, was zu Strukturen fĂŒhrt, die 3-partite, 3-uniforme (im folgenden 3,3-, oder allgemeiner k,k-) Hypergraphen genannt werden. Die Frage, der diese Arbeit nachgeht, ist wie diese Strukturen formal angemessen in "Communities" unterteilt werden können, und wie dies das VerstĂ€ndnis dieser DatensĂ€tze erleichtert, die potenziell sehr reich an latenten Informationen sind. ZunĂ€chst wird eine Verallgemeinerung der verbundenen Komponenten fĂŒr k,k-Hypergraphen eingefĂŒhrt. Die normale Definition verbundener Komponenten weist auf den untersuchten DatensĂ€tzen, recht uninformativ, alle Elemente einer einzelnen Riesenkomponente zu. Die verallgemeinerten, so genannten hyper-inzidenten verbundenen Komponenten hingegen zeigen auf den "Social Bookmarking"-DatensĂ€tzen eine charakteristische GrĂ¶ĂŸenverteilung, die jedoch bspw. von Spam-Verhalten zerstört wird - was eine Verbindung zwischen Verhaltensmustern und strukturellen Eigenschaften zeigt, der im folgenden weiter nachgegangen wird. Als nĂ€chstes wird das allgemeine Thema der "Community Detection" auf k,k-Hypergraphen eingefĂŒhrt. Drei Herausforderungen werden definiert, die mit der naiven Anwendung bestehender Verfahren nicht gemeistert werden können. Außerdem werden drei Familien synthetischer Hypergraphen mit "Community"-Strukturen von steigender KomplexitĂ€t eingefĂŒhrt, die prototypisch fĂŒr Situationen stehen, die ein erfolgreicher Detektionsansatz rekonstruieren können sollte. Der zentrale methodische Beitrag dieser Arbeit besteht aus der im folgenden dargestellten Entwicklung eines multipartiten (d.h. fĂŒr k,k-Hypergraphen geeigneten) Verfahrens zur Erkennung von "Communities". Es basiert auf der Optimierung von ModularitĂ€t, einem etablierten Verfahrung zur Erkennung von "Communities" auf nicht-partiten, d.h. "normalen" Graphen. Ausgehend vom einfachst möglichen Ansatz wird das Verfahren iterativ verfeinert, um den zuvor definierten sowie neuen, in der Praxis aufgetretenen Herausforderungen zu begegnen. Am Ende steht die Definition der "ausgeglichenen multi-partiten ModularitĂ€t". 
Schließlich wird ein interaktives Werkzeug zur Untersuchung der so gewonnenen "Community"-Zuordnungen vorgestellt. Mithilfe dieses Werkzeugs können die Vorteile der zuvor eingefĂŒhrten ModularitĂ€t demonstriert werden: So können komplexe ZusammenhĂ€nge beobachtet werden, die den einfacheren Verfahren entgehen. Diese Ergebnisse werden von einer stĂ€rker quantitativ angelegten Untersuchung bestĂ€tigt: UnĂŒberwachte QualitĂ€tsmaße, die bspw. den Kompressionsgrad berĂŒcksichtigen, können ĂŒber eine grĂ¶ĂŸere Menge von Beispielen die Vorteile der ausgeglichenen multi-partiten ModularitĂ€t gegenĂŒber den anderen Verfahren belegen. Zusammenfassend lassen sich die Ergebnisse dieser Arbeit in zwei Bereiche einteilen: Auf der praktischen Seite werden Werkzeuge zur Erforschung von "Social Bookmarking"-Daten bereitgestellt. DemgegenĂŒber stehen theoretische BeitrĂ€ge, die fĂŒr Graphen etablierte Konzepte - verbundene Komponenten und "Community Detection" - auf k,k-Hypergraphen ĂŒbertragen.Many datasets can be interpreted as graphs, i.e. as elements (nodes) and binary relations between them (edges). Under the label of complex network analysis, a vast array of graph-based methods allows the exploration of datasets purely based on such structural properties. Community detection, as a subfield of network analysis, aims to identify well-connected subparts of graphs. While the grouping of related elements is useful in itself, these groups can furthermore be collapsed into single nodes, creating a new graph of reduced complexity which may better reveal the original graph's macrostructure. Therefore, advances in community detection improve the understanding of complex networks in general. However, not every dataset can be modelled properly with binary relations - higher-order relations give rise to so-called hypergraphs. This thesis explores the generalization of community detection approaches to hypergraphs. In the focus of attention are social bookmarking datasets, created by users of online bookmarking services who assign freely chosen keywords, so-called "tags", to documents. This "tagging" creates, for each tag assignment, a ternary connection between the user, the document, and the tag, inducing particular structures called 3-partite, 3-uniform hypergraphs (henceforth called 3,3- or more generally k,k-hypergraphs). The question pursued here is how to decompose these structures in a formally adequate manner, and how this improves the understanding of these rich datasets. First, a generalization of connected components to k,k-hypergraphs is proposed. The standard definition of connected components here rather uninformatively assigns almost all elements to a single giant component. The generalized so-called hyperincident connected components, however, show a characteristic size distribution on the social bookmarking datasets that is disrupted by, e.g., spamming activity - demonstrating a link between behavioural patterns and structural features that is further explored in the following. Next, the general topic of community detection in k,k-hypergraphs is introduced. Three challenges are posited that are not met by the naive application of standard techniques, and three families of synthetic hypergraphs are introduced containing increasingly complex community setups that a successful detection approach must be able to identify. The main methodical contribution of this thesis consists of the following development of a multi-partite (i.e. suitable for k,k-hypergraphs) community detection algorithm. 
It is based on modularity optimization, a well-established algorithm to detect communities in non-partite, i.e. "normal" graphs. Starting from the simplest approach possible, the method is successively refined to meet the previously defined as well as empirically encountered challenges, culminating in the definition of the "balanced multi-partite modularity". Finally, an interactive tool for exploring the obtained community assignments is introduced. Using this tool, the benefits of balanced multi-partite modularity can be shown: Intricate patters can be observed that are missed by the simpler approaches. These findings are confirmed by a more quantitative examination: Unsupervised quality measures considering, e.g., compression document the advantages of this approach on a larger number of samples. To conclude, the contributions of this thesis are twofold. It provides practical tools for the analysis of social bookmarking data, complemented with theoretical contributions, the generalization of connected components and modularity from graphs to k,k-hypergraphs
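    To make the generalized connected components concrete, the sketch below treats each tag assignment as a (user, document, tag) hyperedge and groups hyperedges into components. It is only an illustration: the abstract does not spell out the hyperincidence criterion, so the rule used here (two hyperedges are adjacent when they share at least two of their three nodes) is an assumption, as are the toy data and all names.

```python
from collections import defaultdict
from itertools import combinations

def hyperedge_components(edges, min_shared=2):
    """Group hyperedges into components; edges sharing >= min_shared nodes
    are treated as adjacent (an assumed stand-in for hyperincidence).
    Node labels are assumed to be globally unique across partitions."""
    index = defaultdict(list)                 # node -> ids of hyperedges containing it
    for i, edge in enumerate(edges):
        for node in edge:
            index[node].append(i)

    # Count how many nodes each pair of co-occurring hyperedges shares.
    shared = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(ids, 2):
            shared[(a, b)] += 1

    # Union-find over hyperedge ids.
    parent = list(range(len(edges)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for (a, b), count in shared.items():
        if count >= min_shared:
            parent[find(a)] = find(b)

    components = defaultdict(list)
    for i, edge in enumerate(edges):
        components[find(i)].append(edge)
    return list(components.values())

if __name__ == "__main__":
    # Toy tag assignments: (user, document, tag) triples.
    tas = [("u1", "d1", "python"), ("u1", "d1", "code"),
           ("u2", "d1", "python"), ("u3", "d2", "music")]
    for comp in hyperedge_components(tas):
        print(comp)
```

    On the toy data, the three assignments around document d1 end up in one component, while the unrelated assignment forms its own, instead of everything collapsing into a single giant component.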

    Friends for Free: Self-Organizing Artificial Social Networks for Trust and Cooperation

    Get PDF
    By harvesting friendship networks from e-mail contacts or instant-message "buddy lists", Peer-to-Peer (P2P) applications can improve performance in low-trust environments such as the Internet. However, natural social networks are not always suitable, reliable, or available. We propose an algorithm (SLACER) that allows peer nodes to create and manage their own friendship networks. We evaluate performance using a canonical test application requiring cooperation between peers for socially optimal outcomes. The Artificial Social Networks (ASN) produced are connected, cooperative, and robust, possessing many of the desirable properties of human friendship networks, such as trust between friends (directly linked peers) and short paths linking everyone via a chain of friends. In addition to opening up new application possibilities, SLACER could supply ASNs to P2P applications that currently depend on human social networks, thus transforming them into fully autonomous, self-managing systems.
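    The abstract does not give SLACER's update rules, so the toy sketch below only illustrates the general idea of nodes self-organizing a network by imitating and rewiring towards more successful peers. The payoff function, parameters, and imitation rule are assumptions for illustration, not the published algorithm.

```python
import random

# Toy illustration (not the published SLACER rules): each node plays a simple
# cooperation game with its neighbours, occasionally copies the strategy and
# links of a more successful random node, and mutates now and then.
N, ROUNDS, MUTATE = 100, 200, 0.05

strategy = [random.random() < 0.5 for _ in range(N)]   # True = cooperate
links = [set(random.sample([j for j in range(N) if j != i], 3)) for i in range(N)]

def payoff(i):
    # Everyone gains from cooperating neighbours; cooperators pay a per-link cost.
    gain = sum(1.0 for j in links[i] if strategy[j])
    cost = 0.5 * len(links[i]) if strategy[i] else 0.0
    return gain - cost

for _ in range(ROUNDS):
    i, j = random.sample(range(N), 2)
    if payoff(j) > payoff(i):
        strategy[i] = strategy[j]            # imitate the more successful peer
        links[i] = set(links[j]) | {j}       # rewire towards its neighbourhood
        links[i].discard(i)
    if random.random() < MUTATE:             # occasional mutation keeps exploring
        strategy[i] = not strategy[i]

print("cooperators:", sum(strategy), "of", N)
```

    Links here are kept one-sided for simplicity; a faithful implementation would maintain symmetric friendship links and the specific retention and mutation probabilities of the published algorithm.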

    The Political Nature of TCP/IP

    Get PDF
    Despite the importance of the Internet in the modern world, many users and even policy makers lack the necessary historical or technical grasp of the technology behind it. In the spirit of addressing this issue, this thesis attempts to shed light on the historical, political, and technical context of TCP/IP. TCP/IP is the Internet Protocol Suite, a primary piece of Internet architecture with a well-documented history. After a technical overview detailing the main functions of TCP/IP, I examine aspects of the social and developmental record of this technology using STS theoretical approaches such as Hughesian systems theory, the Social Construction of Technology (SCOT), and Langdon Winner's brand of technological determinism. Key points in TCP/IP's evolution, when viewed from an STS perspective, illuminate the varied reasons behind decisions in the development of the technology. For example, as detailed in this paper, both technical and political motivations were behind the architectural politics built into TCP/IP in the 1970s, and similar motivations spurred the rejection of the OSI protocols by Internet developers two decades later. Armed with the resulting contextual understanding of past TCP/IP developments, a few possible directions (both political and technical) in contemporary and future Internet development are then explored, such as the slow migration to IPv6 and the meaning of network neutrality.

    Public Policy and Technology: Advancing Civilization at the Expense of Individual Privacy

    Get PDF
    Technological advances have created a new existence, providing an unforeseen level of interaction and transaction between parties that have never physically met. Preliminary thinking was that these advances would create a previously unimaginable level of privacy and anonymity. While a surface examination suggests an abundance of privacy in modern society, a more thorough examination reveals different results. Advances in technology and changes in public policy have produced a world in which a startling amount of information is available regarding a given individual. Rather than experiencing an increase in individual privacy, modern societies suffer from rapidly decreasing individual privacy

    Categorizing Blog Spam

    Get PDF
    The internet has matured into the focal point of our era. Its ecosystem is vast, complex, and in many regards unaccounted for. One of the most prevalent aspects of the internet is spam. Like the rest of the internet, spam has evolved from simply meaning 'unwanted emails' to a blanket term that encompasses any unsolicited or illegitimate content appearing in the wide range of media that exists on the internet. Many forms of spam permeate the internet, and spam architects continue to develop tools and methods to avoid detection. On the other side, cyber security engineers continue to develop more sophisticated detection tools to curb the harmful effects that come with spam. This virtual arms race has no end in sight. Most efforts thus far have gone toward accurately separating spam from ham, and rightfully so, since initial detection is essential. However, research is lacking in understanding the current ecosystem of spam, spam campaigns, and the behavior of the botnets that drive the majority of spam traffic. This thesis focuses on characterizing spam, particularly the spam that appears in forums, where it is delivered by bots posing as legitimate users. Forum spam is used primarily to push advertisements or to boost other websites' perceived popularity by including HTTP links in the content of the post. We conduct an experiment to collect a sample of the blog posts and network activity of the spambots active on the internet. We then present a corpus on which analysis can be conducted and proceed with our own analysis. We cluster associated groups of users and IP addresses into entities, which we accept as a model of the underlying botnets that interact with our honeypots. We use Natural Language Processing (NLP) and Machine Learning (ML) to determine that creating semantic-based models of botnets is sufficient for distinguishing them from one another. We also find that the syntactic structure of posts has little variation from botnet to botnet. Finally, we confirm that botnet behavior and content largely hold across different domains.
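    As an illustration of grouping spam posts by their textual content, the sketch below clusters a few made-up posts using TF-IDF features and k-means. The thesis's actual features, clustering method, and entity definition (which also draws on IP addresses and network activity) are not specified in the abstract; the posts, URLs, and cluster count here are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hedged sketch: cluster forum-spam posts by textual similarity, in the spirit
# of the semantic botnet models described above. Toy data, not real spam.
posts = [
    "Buy cheap watches at http://example-shop.test",
    "Cheap watches, best prices http://example-shop.test",
    "Earn money from home, click http://example-work.test",
    "Work from home and earn money fast http://example-work.test",
]

vectors = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(posts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for post, label in zip(posts, labels):
    print(label, post)
```

    In a fuller pipeline, these text-based clusters would be merged with the user/IP co-occurrence information to form the botnet "entities" the thesis describes.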

    The Benefits and Costs of Online Privacy Legislation

    Get PDF
    Many people are concerned that information about their private life is more readily available and more easily captured on the Internet as compared to offline technologies. Specific concerns include unwanted email, credit card fraud, identity theft, and harassment. This paper analyzes key issues surrounding the protection of online privacy. It makes three important contributions: First, it provides the most comprehensive assessment to date of the estimated benefits and costs of regulating online privacy. Second, it provides the most comprehensive evaluation of legislation and legislative proposals in the U.S. aimed at protecting online privacy. Finally, it offers some policy prescriptions for the regulation of online privacy and suggests areas for future research. After analyzing the current debate on online privacy and assessing the potential costs and benefits of proposed regulations, our specific recommendations concerning the government's involvement in protecting online privacy include the following: The government should fund research that evaluates the effectiveness of existing privacy legislation before considering new regulations. The government should not generally regulate matters of privacy differently based on whether an issue arises online or offline. The government should not require a Web site to provide notification of its privacy policy because the vast majority of commercial U.S.-based Web sites already do so. The government should distinguish between how it regulates the use and dissemination of highly sensitive information, such as certain health records or Social Security numbers, versus more general information, such as consumer name and purchasing habits. The government should not require companies to provide consumers broad access to the personal information that is collected online for marketing purposes because the benefits do not appear to be significant and the costs could be quite high. The government should make it easier for the public to obtain information on online privacy and the tools available for consumers to protect their own privacy. The message of this paper is not that online privacy should be unregulated, but rather that policy makers should think through their options carefully, weighing the likely costs and benefits of each proposal.

    From Social Data Mining to Forecasting Socio-Economic Crisis

    Full text link
    Socio-economic data mining has a great potential in terms of gaining a better understanding of problems that our economy and society are facing, such as financial instability, shortages of resources, or conflicts. Without large-scale data mining, progress in these areas seems hard or impossible. Therefore, a suitable, distributed data mining infrastructure and research centers should be built in Europe. It also appears appropriate to build a network of Crisis Observatories. They can be imagined as laboratories devoted to the gathering and processing of enormous volumes of data on both natural systems such as the Earth and its ecosystem, as well as on human techno-socio-economic systems, so as to gain early warnings of impending events. Reality mining provides the chance to adapt more quickly and more accurately to changing situations. Further opportunities arise from individually customized services, which however should be provided in a privacy-respecting way. This requires the development of novel ICT (such as a self-organizing Web), but most likely new legal regulations and suitable institutions as well. As long as such regulations are lacking on a world-wide scale, it is in the public interest that scientists explore what can be done with the huge data available. Big data do have the potential to change or even threaten democratic societies. The same applies to sudden and large-scale failures of ICT systems. Therefore, dealing with data must be done with a large degree of responsibility and care. Self-interests of individuals, companies or institutions have limits where the public interest is affected, and public interest is not a sufficient justification to violate human rights of individuals. Privacy is a high good, as confidentiality is, and damaging it would have serious side effects for society.
    Comment: 65 pages, 1 figure, Visioneer White Paper, see http://www.visioneer.ethz.c

    Network security

    Get PDF
    In a variety of settings, some payoff-relevant item spreads along a network of connected individuals. In some cases, the item will benefit those who receive it (for example, a music download, a stock tip, news about a new research funding source, etc.) while in other cases the impact may be negative (for example, viruses, both biological and electronic, financial contagion, and so on). Often, good and bad items may propagate along the same networks, so individuals must weigh the costs and benefits of being more or less connected to the network. The situation becomes more complicated (and more interesting) if individuals can also put effort into security, where security can be thought of as a screening technology that allows an individual to keep getting the benefits of network connectivity while blocking out the bad items. Drawing on the network literatures in economics, epidemiology, and applied math, we formulate a model of network security that can be used to study individual incentives to expand and secure networks and characterize properties of a symmetric equilibrium.
    Keywords: social networks; network security; network robustness; contagion; random graphs
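    A small Monte Carlo sketch of the trade-off described above, not the authors' model: items spread over a random graph, good items benefit every node they reach, bad items damage unsecured nodes, and secured nodes screen bad items out at a fixed cost. All parameters and the payoff form are assumptions chosen purely for illustration.

```python
import random
import networkx as nx

# Toy parameters (assumed): graph size, edge probability, security cost,
# benefit per good item received, damage per bad item received, trials.
N, P_EDGE, COST, BENEFIT, DAMAGE, TRIALS = 200, 0.02, 0.3, 1.0, 2.0, 500

def reached(graph, source, blocked):
    """Nodes reachable from source without passing through blocked nodes."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in seen and v not in blocked:
                seen.add(v)
                stack.append(v)
    return seen

def average_payoff(security_rate):
    total = 0.0
    for _ in range(TRIALS):
        g = nx.erdos_renyi_graph(N, P_EDGE)
        secured = {v for v in g if random.random() < security_rate}
        source = random.randrange(N)
        good = reached(g, source, blocked=set())     # good items pass everyone
        bad = reached(g, source, blocked=secured)    # bad items stop at secured nodes
        total += (BENEFIT * len(good)
                  - DAMAGE * len(bad - secured)
                  - COST * len(secured))
    return total / TRIALS

for rate in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"security rate {rate:.2f}: avg. total payoff {average_payoff(rate):8.1f}")
```

    Sweeping the security rate makes the basic tension visible: too little security lets bad items spread, while universal security wastes effort, which is the kind of trade-off the equilibrium analysis formalizes.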

    Data Mining in Social Networks

    Get PDF
    The objective of the study is to examine the idea of Big Data and its applications in data mining. The amount of data in the world grows every year and turns into big data, and such large datasets can be analysed by means of various data mining tasks. In short, Big Data can be regarded as an "asset", and data mining is a technique employed to extract useful results from it. This paper implements an HACE-based algorithm that analyses the structure of big data and presents an efficient data mining technique. The framework model incorporates a mixture of information sources, mining techniques, customer interest, security, and data protection mechanisms. The study also analyzes and presents the challenges and issues faced in the big data model.

    Analysis of Distributed Denial of Service Attacks and Countermeasures

    Get PDF
    Network technology has experienced explosive growth in the past two decades. The vast connectivity of networks all over the world poses monumental risks. The generally accepted philosophy in the security world is that no system or network is completely secure [1], which makes network security a critical concern. The work done in this thesis focuses on Distributed Denial of Service (DDoS) attacks, in which legitimate users are prevented from accessing network services. Although a lot of research has been done in this field, these attacks remain one of the most common threats affecting network performance. One defense against DDoS attacks is to make attacks infeasible for an attacker, by increasing either the amount of attack traffic needed to disable a link or the number of attackers needed to disable the network. Previous theoretical work has focused on quantifying the attack traffic required to disable a set of mincut arcs in a network. In this thesis, we experimentally verify the validity of that analysis by running simulations in the SSFNet network simulator. A Distributed Denial of Service attack is simulated by flooding the mincut arcs in the network. From the results, we analyze the minimum number of zombie processors (attack sources) required to disable a set of arcs and the minimum attack traffic volume required to disable those arcs.
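    The min-cut view of the attack can be illustrated with a short sketch: on a toy topology (an assumption, not taken from the thesis or its SSFNet experiments), the arcs of a minimum cut are the links an attacker would have to flood, and the cut's capacity is the minimum attack traffic needed to saturate them.

```python
import networkx as nx

# Toy directed network with link capacities (assumed values, for illustration).
G = nx.DiGraph()
G.add_edge("src", "a", capacity=10)
G.add_edge("src", "b", capacity=10)
G.add_edge("a", "c", capacity=4)
G.add_edge("b", "c", capacity=5)
G.add_edge("a", "b", capacity=3)
G.add_edge("c", "dst", capacity=20)

# Minimum cut between source and destination: its capacity is the least
# traffic an attacker must inject to saturate the cut, and the crossing arcs
# are the links to flood.
cut_value, (reachable, non_reachable) = nx.minimum_cut(G, "src", "dst")
cut_arcs = [(u, v) for u in reachable for v in G[u] if v in non_reachable]

print("min-cut capacity (traffic needed to saturate it):", cut_value)
print("arcs an attacker would have to flood:", cut_arcs)
```

    Dividing the cut capacity by the traffic each compromised host can generate then gives a rough lower bound on the number of zombies required, which is the kind of quantity the simulations examine.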