72 research outputs found

    A Brief History of Web Crawlers

    Full text link
    Web crawlers visit internet applications, collect data, and learn about new web pages from visited pages. Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics about the web and indexing the applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on the application. Quick expansion of the web, and the complexity added to web applications have made the process of crawling a very challenging one. Throughout the history of web crawling many researchers and industrial groups addressed different issues and challenges that web crawlers face. Different solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl is a challenging question. Additionally capturing the model of a modern web application and extracting data from it automatically is another open question. What follows is a brief history of different technique and algorithms used from the early days of crawling up to the recent days. We introduce criteria to evaluate the relative performance of web crawlers. Based on these criteria we plot the evolution of web crawlers and compare their performanc

    Over Cite : a cooperative digital research library

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.Includes bibliographical references (leaves 47-50).CiteSeer is a well-known online resource for the computer science research community, allowing users to search and browse a large archive of research papers. Unfortunately, its current centralized incarnation is costly to run. Although members of the community would presumably be willing to donate hardware and bandwidth at their own sites to assist CiteSeer, the current architecture does not facilitate such distribution of resources. OverCite is a design for a new architecture for a distributed and cooperative research library based on a distributed hash table (DHT). The new architecture harnesses donated resources at many sites to provide document search and retrieval service to researchers worldwide. A preliminary evaluation of an initial OverCite prototype shows that it can service more queries per second than a centralized system, and that it increases total storage capacity by a factor of n/4 in a system of n nodes. OverCite can exploit these additional resources by supporting new features such as document alerts, and by scaling to larger data sets.by Jeremy Stribling.S.M

    Empirical and Analytical Perspectives on the Robustness of Blockchain-related Peer-to-Peer Networks

    Get PDF
    Die Erfindung von Bitcoin hat ein großes Interesse an dezentralen Systemen geweckt. Eine häufige Zuschreibung an dezentrale Systeme ist dabei, dass eine Dezentralisierung automatisch zu einer höheren Sicherheit und Widerstandsfähigkeit gegenüber Angriffen führt. Diese Dissertation widmet sich dieser Zuschreibung, indem untersucht wird, ob dezentralisierte Anwendungen tatsächlich so robust sind. Dafür werden exemplarisch drei Systeme untersucht, die häufig als Komponenten in komplexen Blockchain-Anwendungen benutzt werden: Ethereum als Infrastruktur, IPFS zur verteilten Datenspeicherung und schließlich "Stablecoins" als Tokens mit Wertstabilität. Die Sicherheit und Robustheit dieser einzelnen Komponenten bestimmt maßgeblich die Sicherheit des Gesamtsystems in dem sie verwendet werden; darüber hinaus erlaubt der Fokus auf Komponenten Schlussfolgerungen über individuelle Anwendungen hinaus. Für die entsprechende Analyse bedient sich diese Arbeit einer empirisch motivierten, meist Netzwerklayer-basierten Perspektive -- angereichert mit einer ökonomischen im Kontext von Wertstabilen Tokens. Dieses empirische Verständnis ermöglicht es Aussagen über die inhärenten Eigenschaften der studierten Systeme zu treffen. Ein zentrales Ergebnis dieser Arbeit ist die Entdeckung und Demonstration einer "Eclipse-Attack" auf das Ethereum Overlay. Mittels eines solchen Angriffs kann ein Angreifer die Verbreitung von Transaktionen und Blöcken behindern und Netzwerkteilnehmer aus dem Overlay ausschließen. Des weiteren wird das IPFS-Netzwerk umfassend analysiert und kartografiert mithilfe (1) systematischer Crawls der DHT sowie (2) des Mitschneidens von Anfragenachrichten für Daten. Erkenntlich wird hierbei, dass die hybride Overlay-Struktur von IPFS Segen und Fluch zugleich ist, da das Gesamtsystem zwar robust gegen Angriffe ist, gleichzeitig aber eine umfassende Überwachung der Netzwerkteilnehmer ermöglicht wird. Im Rahmen der wertstabilen Kryptowährungen wird ein Klassifikations-Framework vorgestellt und auf aktuelle Entwicklungen im Gebiet der "Stablecoins" angewandt. Mit diesem Framework wird somit (1) der aktuelle Zustand der Stablecoin-Landschaft sortiert und (2) ein Mittel zur Verfügung gestellt, um auch zukünftige Designs einzuordnen und zu verstehen.The inception of Bitcoin has sparked a large interest in decentralized systems. In particular, popular narratives imply that decentralization automatically leads to a high security and resilience against attacks, even against powerful adversaries. In this thesis, we investigate whether these ascriptions are appropriate and if decentralized applications are as robust as they are made out to be. To this end, we exemplarily analyze three widely-used systems that function as building blocks for blockchain applications: Ethereum as basic infrastructure, IPFS for distributed storage and lastly "stablecoins" as tokens with a stable value. As reoccurring building blocks for decentralized applications these examples significantly determine the security and resilience of the overall application. Furthermore, focusing on these building blocks allows us to look past individual applications and focus on inherent systemic properties. The analysis is driven by a strong empirical, mostly network-layer based perspective; enriched with an economic point of view in the context of monetary stabilization. The resulting practical understanding allows us to delve into the systems' inherent properties. The fundamental results of this thesis include the demonstration of a network-layer Eclipse attack on the Ethereum overlay which can be leveraged to impede the delivery of transaction and blocks with dire consequences for applications built on top of Ethereum. Furthermore, we extensively map the IPFS network through (1) systematic crawling of its DHT, as well as (2) monitoring content requests. We show that while IPFS' hybrid overlay structure renders it quite robust against attacks, this virtue of the overlay is simultaneously a curse, as it allows for extensive monitoring of participating peers and the data they request. Lastly, we exchange the network-layer perspective for a mostly economic one in the context of monetary stabilization. We present a classification framework to (1) map out the stablecoin landscape and (2) provide means to pigeon-hole future system designs. With our work we not only scrutinize ascriptions attributed to decentral technologies; we also reached out to IPFS and Ethereum developers to discuss results and remedy potential attack vectors

    Ethereum’s Peer-to-Peer Network Monitoring and Sybil Attack Prevention

    Get PDF
    International audiencePublic blockchains, like Ethereum, rely on an underlying peer-to-peer (P2P) network to disseminate transactions and blocks between nodes. With the rise of blockchain applications and cryptocurrencies values, they have become critical infrastructures which still lack comprehensive studies. In this paper, we propose to investigate the reliability of the Ethereum P2P network. We developed our own dependable crawler to collect information about the peers composing the network. Our data analysis regarding the geographical distribution of peers and the churn rate shows good network properties while the network can exhibit a sudden and major increase in size and peers are highly concentrated on a few ASes. In a second time, we investigate suspicious patterns that can denote a Sybil attack. We find that many nodes hold numerous identities in the network and could become a threat. To mitigate future Sybil attacks, we propose an architecture to detect suspicious nodes and revoke them. It is based on a monitoring system, a smart contract to propagate the information and an external revocation tool to help clients remove their connections to suspicious peers. Our experiment on Ethereum's Test network proved that our solution is effective

    A Survey on Content Retrieval on the Decentralised Web

    Get PDF
    The control, governance, and management of the web have become increasingly centralised, resulting in security, privacy, and censorship concerns. Decentralised initiatives have emerged to address these issues, beginning with decentralised file systems. These systems have gained popularity, with major platforms serving millions of content requests daily. Complementing the file systems are decentralised search engines and name registry infrastructures, together forming the basis of a decentralised web . This survey paper analyses research trends and emerging technologies for content retrieval on the decentralised web, encompassing both academic literature and industrial projects. Several challenges hinder the realisation of a fully decentralised web. Achieving comparable performance to centralised systems without compromising decentralisation is a key challenge. Hybrid infrastructures, blending centralised components with verifiability mechanisms, show promise to improve decentralised initiatives. While decentralised file systems have seen more mature deployments, they still face challenges such as usability, performance, privacy, and content moderation. Integrating these systems with decentralised name-registries offers a potential for improved usability with human-readable and persistent names for content. Further research is needed to address security concerns in decentralised name-registries and enhance governance and crypto-economic incentive mechanisms

    Advances in modern botnet understanding and the accurate enumeration of infected hosts

    Get PDF
    Botnets remain a potent threat due to evolving modern architectures, inadequate remediation methods, and inaccurate measurement techniques. In response, this re- search exposes the architectures and operations of two advanced botnets, techniques to enumerate infected hosts, and pursues the scientific refinement of infected-host enu- meration data by recognizing network structures which distort measurement. This effort is motivated by the desire to reveal botnet behavior and trends for future mit- igation, methods to discover infected hosts for remediation in real time and threat assessment, and the need to reveal the inaccuracy in population size estimation when only counting IP addresses. Following an explanation of theoretical enumeration techniques, the architectures, deployment methodologies, and malicious output for the Storm and Waledac botnets are presented. Several tools developed to enumerate these botnets are then assessed in terms of performance and yield. Finally, this study documents methods that were developed to discover the boundaries and impact of NAT and DHCP blocks in network populations along with a footprint measurement based on relative entropy which better describes how uniformly infections communi- cate through their IP addresses. Population data from the Waledac botnet was used to evaluate these techniqu
    corecore