Discovering Links for Metadata Enrichment on Computer Science Papers
At the very beginning of compiling a bibliography, usually only basic
information about an item, such as its title, authors and publication date, is known.
In order to gather additional information about a specific item, one typically
has to search the library catalog or use a web search engine. This look-up
procedure requires manual effort for every single item in a bibliography. In
this technical report we present a proof of concept which utilizes Linked Data
technology for the simple enrichment of sparse metadata sets. This is done by
discovering owl:sameAs links between an initial set of computer science
papers and resources from external data sources like DBLP, ACM and the Semantic
Web Conference Corpus. In this report, we demonstrate how the link discovery
tool Silk is used to detect additional information and to enrich an initial set
of records in the computer science domain. The pros and cons of Silk as a link
discovery tool are summarized at the end. Comment: 22 pages, 4 figures, 7 listings, presented at SWIB1
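To make the link discovery step more concrete, here is a minimal Python sketch of title-based matching that emits owl:sameAs links as N-Triples. It is not Silk's API: Silk expresses comparisons declaratively in its own link specifications. The normalized Levenshtein similarity, the 0.9 threshold and the example.org URIs are assumptions chosen for illustration.

```python
import unicodedata
from typing import Dict, List

OWL_SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"


def normalize(title: str) -> str:
    """Lower-case, drop accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = "".join(c if c.isalnum() else " " for c in text.lower())
    return " ".join(text.split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical normalized titles, falling towards 0.0 as they differ."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def discover_links(local: Dict[str, str], external: Dict[str, str],
                   threshold: float = 0.9) -> List[str]:
    """Compare every local title with every external title (URI -> title maps)
    and return an owl:sameAs triple for each pair above the threshold."""
    links = []
    for src_uri, src_title in local.items():
        for tgt_uri, tgt_title in external.items():
            if similarity(src_title, tgt_title) >= threshold:
                links.append(f"<{src_uri}> {OWL_SAME_AS} <{tgt_uri}> .")
    return links
```

Comparing every local record with every external record is quadratic, so a practical link discovery tool would need some way of pruning candidate pairs. A realistic link specification would probably also compare authors or publication years, not titles alone.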
Automated Discovery of Internet Censorship by Web Crawling
Censorship of the Internet is widespread around the world. As access to the
web becomes increasingly ubiquitous, filtering of this resource becomes more
pervasive. Transparency about specific content that citizens are denied access
to is atypical. To counter this, numerous techniques for maintaining URL filter
lists have been proposed by various individuals and organisations that aim to
provide empirical data on censorship for the benefit of the public and the wider
censorship research community.
We present a new approach for discovering filtered domains in different
countries. This method is fully automated and requires no human interaction.
The system uses web crawling techniques to traverse between filtered sites and
implements a robust method for determining if a domain is filtered. We
demonstrate the effectiveness of the approach by running experiments to search
for filtered content in four different censorship regimes. Our results show
that we perform better than the current state of the art and have built domain
filter lists an order of magnitude larger than the most widely available public
lists as of Jan 2018. Further, we build a dataset mapping how blocked content
is interlinked across domains and show that censored web resources are tightly
networked.
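The crawling strategy can be pictured as a breadth-first search that only expands domains found to be filtered. The sketch below assumes an in-memory link graph and a hypothetical `is_filtered` oracle; the paper's own method for deciding whether a domain is filtered is not reproduced here, and the `max_checks` budget is an illustrative addition.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple


def crawl_filtered(seeds: List[str],
                   links: Dict[str, List[str]],
                   is_filtered: Callable[[str], bool],
                   max_checks: int = 1000) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Breadth-first crawl that only follows links out of filtered domains.

    `links` maps a domain to the domains it links to (assumed precomputed);
    `is_filtered` is a hypothetical stand-in for a real filtering test.
    Returns the filtered domains found and the links between them.
    """
    filtered: Set[str] = set()
    seen: Set[str] = set(seeds)
    queue = deque(seeds)
    checks = 0
    while queue and checks < max_checks:
        domain = queue.popleft()
        checks += 1
        if not is_filtered(domain):
            continue  # reachable but not blocked: do not expand it
        filtered.add(domain)
        for neighbour in links.get(domain, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    # Edges whose endpoints are both filtered, i.e. how blocked content interlinks.
    edges = [(src, dst)
             for src in sorted(filtered)
             for dst in links.get(src, [])
             if dst in filtered]
    return filtered, edges
```

Returning the edges alongside the domains mirrors the idea of a dataset of interlinked blocked content. In a real crawler the link graph would be discovered by fetching pages rather than supplied up front.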
Workflows and service discovery: a mobile device approach
Bioinformatics has moved from standalone command-line
programs to web-service based environments. This trend has resulted
in an enormous number of online resources that can be hard to find
and identify, let alone execute and exploit. Furthermore, these resources
are, in general, aimed at solving specific tasks, and these tasks usually need to
be combined in order to achieve the desired results. Along these lines, finding
the appropriate set of tools to build a workflow that solves a problem
with the services available in a repository is itself a complex exercise. Issues
such as service discovery, composition and representation arise.
On the technological side, mobile devices have experienced remarkable
growth in both the number of users and technical capabilities. Starting from
this reality, in the present paper we propose a solution for service discovery
and workflow generation, and we review and discuss distinct approaches to representing
workflows in a mobile environment. As a
proof of concept, a specific use case has been developed: we have embedded
an expanded version of our Magallanes search engine into mORCA,
our mobile client for bioinformatics. This combination delivers a powerful
and ubiquitous solution that provides the user with a handy tool not only
for generating and representing workflows but also for discovering services, data types,
operations and service types.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
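One common way to generate workflows from a service repository is to chain services whose output data type matches the next service's input type. The following sketch does this with a breadth-first search. It assumes each service has exactly one input and one output type, and the registry and service names are hypothetical, so it should not be read as how Magallanes or mORCA actually compose workflows.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical registry entry: service name -> (input data type, output data type).
Service = Tuple[str, str]


def compose_workflow(services: Dict[str, Service],
                     source_type: str,
                     target_type: str) -> Optional[List[str]]:
    """Return the shortest chain of services that turns source_type into
    target_type, or None if no chain exists."""
    queue = deque([(source_type, [])])
    visited = {source_type}
    while queue:
        current_type, chain = queue.popleft()
        if current_type == target_type:
            return chain
        # Sorted iteration keeps the result deterministic when several chains tie.
        for name, (in_type, out_type) in sorted(services.items()):
            if in_type == current_type and out_type not in visited:
                visited.add(out_type)
                queue.append((out_type, chain + [name]))
    return None
```

Breadth-first search returns a shortest chain; a real repository would also have to deal with services that take several inputs, data type hierarchies and user choice among alternative chains.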
Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology,
industry and society. It is developing alongside the evolving scientific paradigm,
the fourth industrial revolution, and the transformational innovation of
technologies. However, its nature and fundamental challenge have not yet been
recognized, and a methodology of its own has not yet been formed. This paper explores
and answers the following questions: What is big data? What are the basic
methods for representing, managing and analyzing big data? What is the
relationship between big data and knowledge? Can we find a mapping from big
data into knowledge space? What kind of infrastructure is required to support
not only big data management and analysis but also knowledge discovery, sharing
and management? What is the relationship between big data and the scientific paradigm?
What are the nature and the fundamental challenge of big data computing? A
multi-dimensional perspective is presented toward a methodology of big data
computing. Comment: 59 pages
Implementation and Deployment of a Distributed Network Topology Discovery Algorithm
In the past few years, the network measurement community has been interested
in the problem of internet topology discovery using a large number (hundreds or
thousands) of measurement monitors. The standard way to obtain information
about the internet topology is to use the traceroute tool from a small number
of monitors. Recent papers have made the case that increasing the number of
monitors will give a more accurate view of the topology. However, scaling up
the number of monitors is not a trivial process. Duplication of effort close to
the monitors wastes time by re-exploring well-known parts of the network, and
duplication close to destinations might appear to be a distributed
denial-of-service (DDoS) attack as the probes converge from a set of sources
towards a given destination. In prior work, the authors of this report proposed
Doubletree, an algorithm for cooperative topology discovery that reduces the load on the
network, i.e., router IP interfaces and end-hosts, while discovering almost as
many nodes and links as standard approaches based on traceroute. This report
presents our open-source and freely downloadable implementation of Doubletree
in a tool we call traceroute@home. We describe the deployment and validation of
traceroute@home on the PlanetLab testbed and we report on the lessons learned
from this experience. We discuss how traceroute@home can be developed further
and outline ideas for future improvements.
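The load reduction in Doubletree comes from two stop sets. As we understand the algorithm, each monitor starts probing at an intermediate hop h; forward probing stops at the destination or at an (interface, destination) pair already in a global stop set shared between monitors, and backward probing stops at hop 1 or at an interface already in the monitor's local stop set. The simulation below replays precomputed paths instead of sending real probes. It is a simplified sketch, not the traceroute@home code, and the function name and the choice of h are illustrative.

```python
from typing import List, Set, Tuple


def doubletree_probe(path: List[str], destination: str, h: int,
                     local_stop: Set[str],
                     global_stop: Set[Tuple[str, str]]) -> List[str]:
    """Simulate one monitor tracing `path` towards `destination`.

    path[i] is the interface that answers at TTL i + 1; the last entry is the
    destination itself. Both stop sets are updated in place. Returns the
    interfaces in the order they were probed.
    """
    h = max(1, min(h, len(path)))  # keep the starting TTL within the path
    probed: List[str] = []
    # Forward phase: TTL h, h + 1, ... until the destination is reached or an
    # (interface, destination) pair is already in the global stop set.
    for ttl in range(h, len(path) + 1):
        interface = path[ttl - 1]
        probed.append(interface)  # the probe is sent before the answer is known
        if (interface, destination) in global_stop:
            break
        global_stop.add((interface, destination))
        local_stop.add(interface)
    # Backward phase: TTL h - 1, h - 2, ... 1 until an interface this monitor
    # has already seen (local stop set) is encountered.
    for ttl in range(h - 1, 0, -1):
        interface = path[ttl - 1]
        probed.append(interface)
        if interface in local_stop:
            break
        local_stop.add(interface)
    return probed
```

For example, if one monitor traces r1, r2, r3, r4, dst starting at h = 3, a second monitor whose path to dst is s1, s2, r3, r4, dst and which starts at h = 2 stops its forward phase at r3 and sends three probes instead of five.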