
    Automated Discovery of Internet Censorship by Web Crawling

    Censorship of the Internet is widespread around the world. As access to the web becomes increasingly ubiquitous, filtering of this resource becomes more pervasive. Transparency about the specific content that citizens are denied access to is atypical. To counter this, numerous techniques for maintaining URL filter lists have been proposed by individuals and organisations that aim to provide empirical data on censorship for the benefit of the public and the wider censorship research community. We present a new approach for discovering filtered domains in different countries. This method is fully automated and requires no human interaction. The system uses web crawling techniques to traverse between filtered sites and implements a robust method for determining whether a domain is filtered. We demonstrate the effectiveness of the approach by running experiments to search for filtered content in four different censorship regimes. Our results show that we perform better than the current state of the art and have built domain filter lists an order of magnitude larger than the most widely available public lists as of Jan 2018. Further, we build a dataset mapping the interlinking nature of blocked content between domains and exhibit the tightly networked nature of censored web resources.
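    The crawl-and-test loop the abstract describes can be sketched roughly as below. This is not the authors' implementation: is_filtered is a hypothetical probe standing in for the paper's "robust method for determining if a domain is filtered" (a real probe might compare in-country responses against uncensored control fetches), and the traversal simply follows outgoing links on the premise that blocked sites tend to interlink.

```python
# Minimal sketch of automated filtered-domain discovery by web crawling.
# is_filtered() is a hypothetical stand-in for a real filtering probe.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def is_filtered(domain: str) -> bool:
    """Hypothetical filtering test; a real probe would compare in-country
    and control responses (DNS answers, TCP resets, block pages)."""
    raise NotImplementedError

def crawl_filtered_domains(seed_urls, max_pages=1000):
    seen_domains, queue, filtered = set(), deque(seed_urls), set()
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        domain = urlparse(url).netloc
        if domain in seen_domains:
            continue
        seen_domains.add(domain)
        if is_filtered(domain):
            filtered.add(domain)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue
        pages += 1
        # Traverse outgoing links: blocked resources interlink tightly,
        # so links found on filtered pages are promising candidates.
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            queue.append(urljoin(url, a["href"]))
    return filtered
```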

    FilteredWeb: A Framework for the Automated Search-Based Discovery of Blocked URLs

    Various methods have been proposed for creating and maintaining lists of potentially filtered URLs to allow for measurement of ongoing internet censorship around the world. Whilst testing a known resource for evidence of filtering can be relatively simple, given appropriate vantage points, discovering previously unknown filtered web resources remains an open challenge. We present a new framework for automating the process of discovering filtered resources through the use of adaptive queries to well-known search engines. Our system applies information retrieval algorithms to isolate characteristic linguistic patterns in known filtered web pages; these are then used as the basis for web search queries. The results of these queries are then checked for evidence of filtering, and newly discovered filtered resources are fed back into the system to detect further filtered content. Our implementation of this framework, applied to China as a case study, shows that this approach is demonstrably effective at detecting significant numbers of previously unknown filtered web pages, making a significant contribution to the ongoing detection of internet filtering as it develops. Our tool is currently deployed and has been used to discover 1355 domains that are poisoned within China as of Feb 2017, 30 times more than are contained in the most widely-used public filter list. Of these, 759 are outside of the Alexa Top 1000 domains list, demonstrating the capability of this framework to find more obscure filtered content. Further, our initial analysis of filtered URLs, and the search terms that were used to discover them, gives further insight into the nature of the content currently being blocked in China.
    Comment: To appear in "Network Traffic Measurement and Analysis Conference 2017" (TMA2017).
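    A minimal sketch of the adaptive feedback loop follows. It is not the authors' code: search_engine and is_filtered are stand-ins for a real search API and a real filtering probe, and plain TF-IDF is assumed as one possible instantiation of the "information retrieval algorithms" used to extract characteristic terms.

```python
# Sketch of search-based discovery of filtered URLs with a feedback loop.
# search_engine(query) -> iterable of (url, page_text); is_filtered(url)
# -> bool. Both are assumed interfaces, not part of the paper.
from sklearn.feature_extraction.text import TfidfVectorizer

def characteristic_terms(filtered_pages, top_k=5):
    """Isolate terms that characterise the known filtered pages."""
    vec = TfidfVectorizer(stop_words="english", max_features=5000)
    tfidf = vec.fit_transform(filtered_pages)
    scores = tfidf.sum(axis=0).A1  # aggregate term weight over the corpus
    terms = vec.get_feature_names_out()
    return [terms[i] for i in scores.argsort()[::-1][:top_k]]

def discover(seed_pages, search_engine, is_filtered, rounds=10):
    known, corpus = set(), list(seed_pages)
    for _ in range(rounds):
        query = " ".join(characteristic_terms(corpus))
        for url, page_text in search_engine(query):
            if url not in known and is_filtered(url):
                known.add(url)
                corpus.append(page_text)  # feed back into term extraction
    return known
```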

    Automated detection of block falls in the north polar region of Mars

    We developed a change detection method for the identification of ice block falls using NASA's HiRISE images of the north polar scarps on Mars. Our method is based on a Support Vector Machine (SVM), trained using Histograms of Oriented Gradients (HOG), and on blob detection. The SVM detects potential new blocks between a set of images; the blob detection then confirms the identification of a block inside the area indicated by the SVM and derives the shape of the block. The results from the automatic analysis were compared with block statistics from visual inspection. We tested our method in six areas of 1000x1000 pixels each, where several hundred blocks were identified. The results for the given test areas produced a true positive rate of ~75% for blocks with sizes larger than 0.7 m (i.e., approx. 3 times the available ground pixel size) and a false discovery rate of ~8.5%. Using blob detection, we also recover the size of each block to within 3 pixels of its actual size.
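    The two-stage structure of the method can be illustrated as below. This is only a sketch under assumptions: the window size, stride, and HOG parameters are invented, the SVM training data must be supplied, and for simplicity it scans a single (e.g., difference) image, whereas the paper detects changes between a set of images.

```python
# Illustrative two-stage detector: HOG + linear SVM flags candidate
# windows; Laplacian-of-Gaussian blob detection confirms each hit and
# recovers block position and size. Parameters are assumed, not the paper's.
import numpy as np
from skimage.feature import hog, blob_log
from sklearn.svm import LinearSVC

def train_detector(patches, labels):
    """patches: equally sized grayscale windows; labels: 1 = new block."""
    feats = [hog(p, orientations=9, pixels_per_cell=(8, 8)) for p in patches]
    return LinearSVC().fit(feats, labels)

def detect_blocks(svm, image, win=64, stride=32):
    hits = []
    for y in range(0, image.shape[0] - win, stride):
        for x in range(0, image.shape[1] - win, stride):
            patch = image[y:y + win, x:x + win]
            feat = hog(patch, orientations=9, pixels_per_cell=(8, 8))
            if svm.predict([feat])[0] == 1:
                # Confirm the SVM hit and derive the block's size from
                # the blob scale (radius ~ sigma * sqrt(2) for LoG blobs).
                for by, bx, sigma in blob_log(patch, max_sigma=10):
                    hits.append((y + by, x + bx, sigma * np.sqrt(2)))
    return hits
```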

    Web Service Discovery in a Semantically Extended UDDI Registry: the Case of FUSION

    Service-oriented computing is being adopted at an unprecedented rate, making the effectiveness of automated service discovery an increasingly important challenge. UDDI has emerged as a de facto industry standard and fundamental building block within SOA infrastructures. Nevertheless, conventional UDDI registries lack the means to provide unambiguous, semantically rich representations of Web service capabilities, and the logic inference power required for facilitating automated service discovery. To overcome this important limitation, a number of approaches have been proposed towards augmenting Web service discovery with semantics. This paper discusses the benefits of semantically extending Web service descriptions and UDDI registries, and presents an overview of the approach put forward in project FUSION, towards semantically-enhanced publication and discovery of services based on SAWSDL.
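    The core benefit the abstract points to, inference over service capabilities rather than keyword lookup, can be illustrated with a toy example. The ontology and service annotations below are invented for illustration and are not FUSION's actual model (which builds on SAWSDL annotations of service descriptions).

```python
# Toy concept-based matchmaking: a request matches a service whose
# annotating concept is subsumed by the requested concept. A keyword
# match on "Payment" would miss "CreditCardPayment"; subsumption finds it.
SUBCLASS_OF = {  # invented mini-ontology (child -> parent)
    "CreditCardPayment": "Payment",
    "Payment": "FinancialService",
}

def subsumes(general: str, specific: str) -> bool:
    """True if `specific` equals or is a (transitive) subclass of `general`."""
    while specific is not None:
        if specific == general:
            return True
        specific = SUBCLASS_OF.get(specific)
    return False

services = {  # service name -> concept annotating its capability
    "PayPalGateway": "CreditCardPayment",
    "LedgerExport": "ReportingService",
}

def discover(requested_concept: str):
    return [name for name, concept in services.items()
            if subsumes(requested_concept, concept)]

print(discover("Payment"))  # -> ['PayPalGateway']
```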

    Nanoscale integration of single cell biologics discovery processes using optofluidic manipulation and monitoring.

    The new and rapid advancement in the complexity of biologics drug discovery has been driven by a deeper understanding of biological systems combined with innovative new therapeutic modalities, paving the way to breakthrough therapies for previously intractable diseases. These exciting times in biomedical innovation require the development of novel technologies to facilitate the sophisticated, multifaceted, high-paced workflows necessary to support modern large molecule drug discovery. A high-level aspiration is a true integration of "lab-on-a-chip" methods that vastly miniaturize cellular experiments, which could transform the speed, cost, and success of multiple workstreams in biologics development. Several microscale bioprocess technologies have been established that incrementally address these needs, yet each is inflexibly designed for a very specific process, thus limiting an integrated holistic application. A more fully integrated nanoscale approach that incorporates manipulation, culture, analytics, and traceable digital record keeping of thousands of single cells in a relevant nanoenvironment would be a transformative technology capable of keeping pace with today's rapid and complex drug discovery demands. The recent advent of optical manipulation of cells using light-induced electrokinetics with micro- and nanoscale cell culture is poised to revolutionize both fundamental and applied biological research. In this review, we summarize the current state of the art for optical manipulation techniques and discuss emerging biological applications of this technology. In particular, we focus on promising prospects for drug discovery workflows, including antibody discovery, bioassay development, antibody engineering, and cell line development, which are enabled by the automation and industrialization of an integrated optoelectronic single-cell manipulation and culture platform. Continued development of such platforms will be well positioned to overcome many of the challenges currently associated with fragmented, low-throughput bioprocess workflows in biopharma and life science research.

    Log Skeletons: A Classification Approach to Process Discovery

    To test the effectiveness of process discovery algorithms, a Process Discovery Contest (PDC) has been set up. This PDC uses a classification approach to measure effectiveness: the better the discovered model can classify whether or not a new trace conforms to the event log, the better the discovery algorithm is supposed to be. Unfortunately, even the state-of-the-art fully-automated discovery algorithms score poorly on this classification. Even the best of these algorithms, the Inductive Miner, correctly classified only 147 of 200 traces on the PDC of 2017. This paper introduces the rule-based log skeleton model, which is closely related to the Declare constraint model, together with a way to classify traces using this model. This classification using log skeletons is shown to score better on the PDC of 2017 than state-of-the-art discovery algorithms: 194 out of 200. As a result, one can argue that the fully-automated algorithm to construct (or: discover) a log skeleton from an event log outperforms existing state-of-the-art fully-automated discovery algorithms.
    Comment: 16 pages with 9 figures, followed by an appendix of 14 pages with 17 figures.
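    The classification idea can be sketched as follows. This is a deliberately reduced illustration, not the paper's model: it mines only two constraint families (allowed per-trace activity counts and observed directly-follows pairs), whereas real log skeletons use a richer set of Declare-like relations (equivalence, always-before, always-after, never-together, and counters).

```python
# Toy skeleton miner and trace classifier: a trace conforms if it
# violates none of the constraints observed in the training log.
from collections import Counter

def mine_skeleton(log):
    activities = {a for trace in log for a in trace}
    # For each activity, the set of per-trace occurrence counts seen.
    counts = {a: {Counter(t)[a] for t in log} for a in activities}
    # All directly-follows pairs the log ever exhibits.
    follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    return counts, follows

def conforms(trace, skeleton):
    counts, follows = skeleton
    c = Counter(trace)
    # Each activity must occur a number of times observed in the log.
    if any(c[a] not in allowed for a, allowed in counts.items()):
        return False
    # No directly-follows pair that the log never exhibited.
    return all((trace[i], trace[i + 1]) in follows
               for i in range(len(trace) - 1))

log = [("a", "b", "c"), ("a", "c", "b")]
skel = mine_skeleton(log)
print(conforms(("a", "b", "c"), skel))  # True
print(conforms(("b", "a", "c"), skel))  # False: 'b' before 'a' never seen
```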