3,610 research outputs found
The Hierarchic treatment of marine ecological information from spatial networks of benthic platforms
Measuring biodiversity simultaneously in different locations, at different temporal scales, and over wide spatial scales is of strategic importance for the improvement of our understanding of the functioning of marine ecosystems and for the conservation of their biodiversity. Monitoring networks of cabled observatories, along with other docked autonomous systems (e.g., Remotely Operated Vehicles [ROVs], Autonomous Underwater Vehicles [AUVs], and crawlers), are being conceived and established at a spatial scale capable of tracking energy fluxes across benthic and pelagic compartments, as well as across geographic ecotones. At the same time, optoacoustic imaging is sustaining an unprecedented expansion in marine ecological monitoring, enabling the acquisition of new biological and environmental data at an appropriate spatiotemporal scale. At this stage, one of the main problems for an effective application of these technologies is the processing, storage, and treatment of the acquired complex ecological information. Here, we provide a conceptual overview on the technological developments in the multiparametric generation, storage, and automated hierarchic treatment of biological and environmental information required to capture the spatiotemporal complexity of a marine ecosystem. In doing so, we present a pipeline of ecological data acquisition and processing in different steps and prone to automation. We also give an example of population biomass, community richness and biodiversity data computation (as indicators for ecosystem functionality) with an Internet Operated Vehicle (a mobile crawler). Finally, we discuss the software requirements for that automated data processing at the level of cyber-infrastructures with sensor calibration and control, data banking, and ingestion into large data portals.Peer ReviewedPostprint (published version
iCrawl: Improving the Freshness of Web Collections by Integrating Social Web and Focused Web Crawling
Researchers in the Digital Humanities and journalists need to monitor,
collect and analyze fresh online content regarding current events such as the
Ebola outbreak or the Ukraine crisis on demand. However, existing focused
crawling approaches only consider topical aspects while ignoring temporal
aspects and therefore cannot achieve thematically coherent and fresh Web
collections. Especially Social Media provide a rich source of fresh content,
which is not used by state-of-the-art focused crawlers. In this paper we
address the issues of enabling the collection of fresh and relevant Web and
Social Web content for a topic of interest through seamless integration of Web
and Social Media in a novel integrated focused crawler. The crawler collects
Web and Social Media content in a single system and exploits the stream of
fresh Social Media content for guiding the crawler.Comment: Published in the Proceedings of the 15th ACM/IEEE-CS Joint Conference
on Digital Libraries 201
Global-Scale Resource Survey and Performance Monitoring of Public OGC Web Map Services
One of the most widely-implemented service standards provided by the Open
Geospatial Consortium (OGC) to the user community is the Web Map Service (WMS).
WMS is widely employed globally, but there is limited knowledge of the global
distribution, adoption status or the service quality of these online WMS
resources. To fill this void, we investigated global WMSs resources and
performed distributed performance monitoring of these services. This paper
explicates a distributed monitoring framework that was used to monitor 46,296
WMSs continuously for over one year and a crawling method to discover these
WMSs. We analyzed server locations, provider types, themes, the spatiotemporal
coverage of map layers and the service versions for 41,703 valid WMSs.
Furthermore, we appraised the stability and performance of basic operations for
1210 selected WMSs (i.e., GetCapabilities and GetMap). We discuss the major
reasons for request errors and performance issues, as well as the relationship
between service response times and the spatiotemporal distribution of client
monitoring sites. This paper will help service providers, end users and
developers of standards to grasp the status of global WMS resources, as well as
to understand the adoption status of OGC standards. The conclusions drawn in
this paper can benefit geospatial resource discovery, service performance
evaluation and guide service performance improvements.Comment: 24 pages; 15 figure
Web Video in Numbers - An Analysis of Web-Video Metadata
Web video is often used as a source of data in various fields of study. While
specialized subsets of web video, mainly earmarked for dedicated purposes, are
often analyzed in detail, there is little information available about the
properties of web video as a whole. In this paper we present insights gained
from the analysis of the metadata associated with more than 120 million videos
harvested from two popular web video platforms, vimeo and YouTube, in 2016 and
compare their properties with the ones found in commonly used video
collections. This comparison has revealed that existing collections do not (or
no longer) properly reflect the properties of web video "in the wild".Comment: Dataset available from http://download-dbis.dmi.unibas.ch/WWIN
Archiving the Relaxed Consistency Web
The historical, cultural, and intellectual importance of archiving the web
has been widely recognized. Today, all countries with high Internet penetration
rate have established high-profile archiving initiatives to crawl and archive
the fast-disappearing web content for long-term use. As web technologies
evolve, established web archiving techniques face challenges. This paper
focuses on the potential impact of the relaxed consistency web design on
crawler driven web archiving. Relaxed consistent websites may disseminate,
albeit ephemerally, inaccurate and even contradictory information. If captured
and preserved in the web archives as historical records, such information will
degrade the overall archival quality. To assess the extent of such quality
degradation, we build a simplified feed-following application and simulate its
operation with synthetic workloads. The results indicate that a non-trivial
portion of a relaxed consistency web archive may contain observable
inconsistency, and the inconsistency window may extend significantly longer
than that observed at the data store. We discuss the nature of such quality
degradation and propose a few possible remedies.Comment: 10 pages, 6 figures, CIKM 201
DeepSQLi: Deep Semantic Learning for Testing SQL Injection
Security is unarguably the most serious concern for Web applications, to
which SQL injection (SQLi) attack is one of the most devastating attacks.
Automatically testing SQLi vulnerabilities is of ultimate importance, yet is
unfortunately far from trivial to implement. This is because the existence of a
huge, or potentially infinite, number of variants and semantic possibilities of
SQL leading to SQLi attacks on various Web applications. In this paper, we
propose a deep natural language processing based tool, dubbed DeepSQLi, to
generate test cases for detecting SQLi vulnerabilities. Through adopting deep
learning based neural language model and sequence of words prediction, DeepSQLi
is equipped with the ability to learn the semantic knowledge embedded in SQLi
attacks, allowing it to translate user inputs (or a test case) into a new test
case, which is semantically related and potentially more sophisticated.
Experiments are conducted to compare DeepSQLi with SQLmap, a state-of-the-art
SQLi testing automation tool, on six real-world Web applications that are of
different scales, characteristics and domains. Empirical results demonstrate
the effectiveness and the remarkable superiority of DeepSQLi over SQLmap, such
that more SQLi vulnerabilities can be identified by using a less number of test
cases, whilst running much faster
- …