Automated Discovery of Internet Censorship by Web Crawling
Censorship of the Internet is widespread around the world. As access to the
web becomes increasingly ubiquitous, filtering of this resource becomes more
pervasive. Transparency about specific content that citizens are denied access
to is atypical. To counter this, numerous techniques for maintaining URL filter
lists have been proposed by various individuals and organisations that aim to
provide empirical data on censorship for the benefit of the public and the
wider censorship research community.
We present a new approach for discovering filtered domains in different
countries. This method is fully automated and requires no human interaction.
The system uses web crawling techniques to traverse between filtered sites and
implements a robust method for determining if a domain is filtered. We
demonstrate the effectiveness of the approach by running experiments to search
for filtered content in four different censorship regimes. Our results show
that we perform better than the current state of the art and have built domain
filter lists an order of magnitude larger than the most widely available public
lists as of Jan 2018. Further, we build a dataset mapping the interlinking
nature of blocked content between domains and exhibit the tightly networked
nature of censored web resources.
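The crawl-and-test loop the abstract describes can be sketched in Python. Here `is_filtered` and `extract_links` are placeholder assumptions standing in for the paper's actual filter probe and page fetcher, not its implementation; only domains that test as filtered are expanded further.

```python
from collections import deque
from urllib.parse import urlparse

def is_filtered(domain):
    """Placeholder filter test (assumed): a real probe would compare
    responses observed from censored and uncensored vantage points."""
    return domain.endswith(".blocked.example")

def extract_links(domain):
    """Placeholder link extraction (assumed): a real crawler would fetch
    the page from an uncensored vantage point and parse its hrefs."""
    return []

def discover_filtered_domains(seeds, max_visits=10000):
    """Breadth-first traversal between filtered sites: a domain's links
    are followed only if the domain itself tests as filtered."""
    queue = deque(seeds)
    visited, filtered = set(), set()
    while queue and len(visited) < max_visits:
        domain = queue.popleft()
        if domain in visited:
            continue
        visited.add(domain)
        if not is_filtered(domain):
            continue  # unfiltered domains are dead ends for this search
        filtered.add(domain)
        for link in extract_links(domain):
            queue.append(urlparse(link).netloc or link)
    return filtered
```

The key design point the abstract implies is that censored sites interlink, so restricting expansion to confirmed-filtered domains keeps the crawl inside the censored subgraph.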
FilteredWeb: A Framework for the Automated Search-Based Discovery of Blocked URLs
Various methods have been proposed for creating and maintaining lists of
potentially filtered URLs to allow for measurement of ongoing internet
censorship around the world. Whilst testing a known resource for evidence of
filtering can be relatively simple, given appropriate vantage points,
discovering previously unknown filtered web resources remains an open
challenge.
We present a new framework for automating the process of discovering filtered
resources through the use of adaptive queries to well-known search engines. Our
system applies information retrieval algorithms to isolate characteristic
linguistic patterns in known filtered web pages; these are then used as the
basis for web search queries. The results of these queries are then checked for
evidence of filtering, and newly discovered filtered resources are fed back
into the system to detect further filtered content.
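The adaptive-query feedback loop described above might look like the following sketch. The TF-IDF term ranking, the `search` and `is_filtered` stubs, and the loop structure are illustrative assumptions, not the FilteredWeb implementation.

```python
import math
from collections import Counter

def characteristic_terms(doc_tokens, corpus, k=5):
    """Rank terms of a known-filtered page by TF-IDF against a
    background corpus and return the k most characteristic ones."""
    n = len(corpus)
    tf = Counter(doc_tokens)
    def idf(term):
        df = sum(1 for doc in corpus if term in doc)
        return math.log((n + 1) / (df + 1)) + 1
    scored = {t: tf[t] * idf(t) for t in tf}
    return sorted(scored, key=scored.get, reverse=True)[:k]

def search(terms):
    """Stub (assumed) for a call to a well-known search-engine API."""
    return []

def is_filtered(url):
    """Stub (assumed): a real check needs an in-country vantage point."""
    return False

def filteredweb_loop(seed_pages, corpus, rounds=3):
    """Feedback loop: characteristic terms of known-filtered pages drive
    search queries; newly confirmed filtered URLs are fed back in."""
    known = list(seed_pages)
    discovered = set()
    for _ in range(rounds):
        for page in known:
            for url in search(characteristic_terms(page, corpus)):
                if url not in discovered and is_filtered(url):
                    discovered.add(url)
        # in the real system, fetched text of new URLs would extend `known`
    return discovered
```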
Our implementation of this framework, applied to China as a case study, shows
that this approach is demonstrably effective at detecting significant numbers
of previously unknown filtered web pages, making a significant contribution to
the ongoing detection of internet filtering as it develops.
Our tool is currently deployed and has been used to discover 1355 domains
that are poisoned within China as of Feb 2017 - 30 times more than are
contained in the most widely-used public filter list. Of these, 759 are outside
of the Alexa Top 1000 domains list, demonstrating the capability of this
framework to find more obscure filtered content. Further, our initial analysis
of filtered URLs, and the search terms that were used to discover them, gives
further insight into the nature of the content currently being blocked in
China.
Comment: To appear in "Network Traffic Measurement and Analysis Conference 2017" (TMA2017).
Automated detection of block falls in the north polar region of Mars
We developed a change detection method for the identification of ice block
falls using NASA's HiRISE images of the north polar scarps on Mars. Our method
is based on a Support Vector Machine (SVM), trained using Histograms of
Oriented Gradients (HOG), and on blob detection. The SVM detects potential new
blocks between a set of images; the blob detection, then, confirms the
identification of a block inside the area indicated by the SVM and derives the
shape of the block. The results from the automatic analysis were compared with
block statistics from visual inspection. We tested our method in 6 areas
of 1000x1000 pixels each, in which several hundred blocks were
identified. The results for the given test areas produced a true positive rate
of ~75% for blocks with sizes larger than 0.7 m (i.e., approx. 3 times the
available ground pixel size) and a false discovery rate of ~8.5%. Using blob
detection we also recover the size of each block within 3 pixels of their
actual size.
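The blob-detection confirmation step, which derives a block's extent after the SVM flags a candidate window, might be sketched as follows in NumPy. The thresholding and 4-connectivity choices are assumptions, and the HOG+SVM stage is omitted.

```python
import numpy as np
from collections import deque

def blob_size(window, thresh):
    """Confirm an SVM candidate by blob detection: threshold the image
    window and measure the largest 4-connected bright region, which
    approximates the block's size in pixels."""
    mask = window > thresh
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    best = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                # flood-fill one connected component, counting its pixels
                size, queue = 0, deque([(i, j)])
                seen[i, j] = True
                while queue:
                    y, x = queue.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                best = max(best, size)
    return best
```

A zero return would reject the SVM candidate as a false positive; a positive return both confirms the detection and yields the block-size estimate the abstract reports to within a few pixels.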
Web Service Discovery in a Semantically Extended UDDI Registry: the Case of FUSION
Service-oriented computing is being adopted at an unprecedented rate, making the effectiveness of automated service discovery an increasingly important challenge. UDDI has emerged as a de facto industry standard and fundamental building block within SOA infrastructures. Nevertheless, conventional UDDI registries lack the means to provide unambiguous, semantically rich representations of Web service capabilities, and the logic inference power required for facilitating automated service discovery. To overcome this important limitation, a number of approaches have been proposed towards augmenting Web service discovery with semantics. This paper discusses the benefits of semantically extending Web service descriptions and UDDI registries, and presents an overview of the approach put forward in project FUSION, towards semantically-enhanced publication and discovery of services based on SAWSDL.
Nanoscale integration of single cell biologics discovery processes using optofluidic manipulation and monitoring.
The new and rapid advancement in the complexity of biologics drug discovery has been driven by a deeper understanding of biological systems combined with innovative new therapeutic modalities, paving the way to breakthrough therapies for previously intractable diseases. These exciting times in biomedical innovation require the development of novel technologies to facilitate the sophisticated, multifaceted, high-paced workflows necessary to support modern large molecule drug discovery. A high-level aspiration is a true integration of "lab-on-a-chip" methods that vastly miniaturize cellular and biochemical experiments, which could transform the speed, cost, and success of multiple workstreams in biologics development. Several microscale bioprocess technologies have been established that incrementally address these needs, yet each is inflexibly designed for a very specific process, thus limiting an integrated holistic application. A more fully integrated nanoscale approach that incorporates manipulation, culture, analytics, and traceable digital record keeping of thousands of single cells in a relevant nanoenvironment would be a transformative technology capable of keeping pace with today's rapid and complex drug discovery demands. The recent advent of optical manipulation of cells using light-induced electrokinetics with micro- and nanoscale cell culture is poised to revolutionize both fundamental and applied biological research. In this review, we summarize the current state of the art for optical manipulation techniques and discuss emerging biological applications of this technology. In particular, we focus on promising prospects for drug discovery workflows, including antibody discovery, bioassay development, antibody engineering, and cell line development, which are enabled by the automation and industrialization of an integrated optoelectronic single-cell manipulation and culture platform.
Continued development of such platforms will be well positioned to overcome many of the challenges currently associated with fragmented, low-throughput bioprocess workflows in biopharma and life science research.
Log Skeletons: A Classification Approach to Process Discovery
To test the effectiveness of process discovery algorithms, a Process
Discovery Contest (PDC) has been set up. This PDC uses a classification
approach to measure this effectiveness: The better the discovered model can
classify whether or not a new trace conforms to the event log, the better the
discovery algorithm is supposed to be. Unfortunately, even the state-of-the-art
fully-automated discovery algorithms score poorly on this classification. Even
the best of these algorithms, the Inductive Miner, correctly classified only
147 of the 200 traces on the PDC of 2017. This paper introduces
the rule-based log skeleton model, which is closely related to the Declare
constraint model, together with a way to classify traces using this model. This
classification using log skeletons is shown to score better on the PDC of 2017
than state-of-the-art discovery algorithms: 194 out of 200. As a result, one
can argue that the fully-automated algorithm to construct (or: discover) a log
skeleton from an event log outperforms existing state-of-the-art
fully-automated discovery algorithms.
Comment: 16 pages with 9 figures, followed by an appendix of 14 pages with 17 figures.
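A minimal sketch of the classification idea, assuming just two simplified rule families (cardinality bounds and an always-before relation) rather than the paper's full set of log-skeleton relations:

```python
def learn_skeleton(log):
    """Learn two simple rule families from an event log (a list of
    traces, each a list of activity names), in the spirit of log
    skeletons; the choice of relations here is a simplification:
    - cardinality bounds: min/max occurrences of each activity per trace
    - always-before: pairs (a, b) where, in every trace, each
      occurrence of b is preceded by at least one a."""
    activities = {act for trace in log for act in trace}
    bounds = {
        a: (min(t.count(a) for t in log), max(t.count(a) for t in log))
        for a in activities
    }
    before = set()
    for a in activities:
        for b in activities:
            if a != b and all(
                all(a in t[:i] for i, x in enumerate(t) if x == b)
                for t in log
            ):
                before.add((a, b))
    return bounds, before

def conforms(trace, skeleton):
    """Classify a trace: it conforms iff it violates no learned rule."""
    bounds, before = skeleton
    for a, (lo, hi) in bounds.items():
        if not lo <= trace.count(a) <= hi:
            return False
    for a, b in before:
        if any(a not in trace[:i] for i, x in enumerate(trace) if x == b):
            return False
    return True
```

As in the abstract, discovery here is fully automated: the rules are read directly off the event log, and classification reduces to checking each learned constraint against the new trace.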