An Optimal Trade-off between Content Freshness and Refresh Cost
Caching is an effective mechanism for reducing bandwidth usage and
alleviating server load. However, the use of caching entails a compromise
between content freshness and refresh cost. Excessive refreshing allows a high
degree of content freshness but at a greater cost in system resources. Conversely,
insufficient refreshing inhibits content freshness but saves resource usage.
To address this freshness-cost problem, we formulate the refresh
scheduling problem with a generic cost model and use this cost model to
determine an optimal refresh frequency that gives the best tradeoff between
refresh cost and content freshness. We prove the existence and uniqueness of an
optimal refresh frequency under the assumptions that the arrival of content
updates is Poisson and that the age-related cost monotonically increases with
decreasing freshness. In addition, we provide an analytic comparison of system
performance under fixed refresh scheduling and random refresh scheduling,
showing that with the same average refresh frequency the two refresh schedules
are mathematically equivalent in terms of the long-run average cost.
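To make the trade-off concrete, here is a minimal numeric sketch, not the paper's cost model: it assumes a cost charged per refresh, an age-related penalty accrued while the cached copy is stale, Poisson content updates, and a grid search for the cost-minimizing fixed refresh interval. The constants (REFRESH_COST, STALENESS_PENALTY, UPDATE_RATE) are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's exact cost model): trade a per-refresh
# cost against an age-related penalty under Poisson content updates, and locate
# the fixed refresh interval that minimizes the long-run average cost.
import numpy as np

REFRESH_COST = 1.0        # assumed cost charged per refresh
STALENESS_PENALTY = 5.0   # assumed cost per unit time the cached copy is stale
UPDATE_RATE = 2.0         # assumed Poisson rate of content updates

def expected_stale_fraction(interval, lam):
    """Expected fraction of a refresh interval during which the cached copy is
    stale, when updates arrive as a Poisson process with rate lam."""
    return 1.0 - (1.0 - np.exp(-lam * interval)) / (lam * interval)

def long_run_average_cost(interval):
    refresh_cost_rate = REFRESH_COST / interval
    staleness_cost = STALENESS_PENALTY * expected_stale_fraction(interval, UPDATE_RATE)
    return refresh_cost_rate + staleness_cost

intervals = np.linspace(0.01, 10.0, 10_000)
costs = long_run_average_cost(intervals)
best = intervals[np.argmin(costs)]
print(f"cost-minimizing interval ~ {best:.3f} (frequency ~ {1.0 / best:.3f}), "
      f"average cost ~ {costs.min():.3f}")
```

Under these assumptions, refreshing more often drives the per-refresh cost up while the staleness penalty falls, so the average cost has a single minimizer, mirroring the existence-and-uniqueness result stated above.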
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provide a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques make it
possible to gather large amounts of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users, and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We also discuss the potential for cross-fertilization, i.e., the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain in other domains.
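As a hedged illustration of what a Web Data Extraction "wrapper" does (not taken from any system surveyed here), the sketch below hard-codes extraction rules against one assumed page layout and turns markup into structured records; the HTML snippet and the field names ("name", "price") are invented for the example.

```python
# Toy "wrapper" in the Web Data Extraction sense: extraction rules hand-written
# against one assumed page layout, turning markup into structured records.
from html.parser import HTMLParser

class ProductWrapper(HTMLParser):
    """Collects (name, price) records from <span class="name">/<span class="price">."""

    def __init__(self):
        super().__init__()
        self.records = []
        self.current = {}
        self.current_field = None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self.current_field = cls

    def handle_data(self, data):
        if self.current_field:
            self.current[self.current_field] = data.strip()
            self.current_field = None
            if {"name", "price"} <= self.current.keys():
                self.records.append(self.current)
                self.current = {}

page = """
<div class="item"><span class="name">Widget</span><span class="price">9.99</span></div>
<div class="item"><span class="name">Gadget</span><span class="price">24.50</span></div>
"""
wrapper = ProductWrapper()
wrapper.feed(page)
print(wrapper.records)  # [{'name': 'Widget', 'price': '9.99'}, {'name': 'Gadget', 'price': '24.50'}]
```

Such hand-crafted wrappers break whenever the page layout changes, which is exactly why much of the surveyed literature studies automatic wrapper induction and maintenance.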
CLEAR: a credible method to evaluate website archivability
Web archiving is crucial to ensure that cultural, scientific
and social heritage on the web remains accessible and usable
over time. A key aspect of the web archiving process is optimal data extraction
from target websites. This procedure is difficult for reasons such as website
complexity, the plethora of underlying technologies and, ultimately, the
open-ended nature of the web. The purpose of this work is to establish
the notion of Website Archivability (WA) and to introduce
the Credible Live Evaluation of Archive Readiness (CLEAR)
method to measure WA for any website. Website Archivability captures the core
aspects of a website that are crucial in diagnosing whether it has the potential
to be archived with completeness and accuracy. An appreciation of the
archivability of a website should provide archivists with a valuable tool
when assessing the possibilities of archiving material, and influence web design
professionals to consider the implications of their design decisions on the
likelihood that a website can be archived. A prototype application,
archiveready.com, has been established to demonstrate the viability of the
proposed method for assessing Website Archivability.
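To make the idea of measuring archivability concrete, here is a small sketch that probes a few crawler-friendliness signals for a site and aggregates them into a naive score. The chosen signals (homepage reachability, robots.txt, sitemap.xml) and the scoring are assumptions for the sketch, not the facets or scoring defined by the CLEAR method itself.

```python
# Illustrative only: probe a few crawler-friendliness signals for a site and
# aggregate them into a naive score. Not the CLEAR facets or scoring.
import urllib.error
import urllib.request

def reachable(url: str) -> bool:
    """True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status < 400
    except (urllib.error.URLError, ValueError):
        return False

def archivability_signals(site: str) -> dict:
    site = site.rstrip("/")
    return {
        "homepage_reachable": reachable(site + "/"),
        "robots_txt_present": reachable(site + "/robots.txt"),
        "sitemap_present": reachable(site + "/sitemap.xml"),
    }

if __name__ == "__main__":
    signals = archivability_signals("https://example.org")  # placeholder site
    score = sum(signals.values()) / len(signals)
    print(signals, f"naive archivability score: {score:.2f}")
```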
A Brief History of Web Crawlers
Web crawlers visit internet applications, collect data, and learn about new
web pages from visited pages. Web crawlers have a long and interesting history.
Early web crawlers collected statistics about the web. In addition to
collecting statistics about the web and indexing the applications for search
engines, modern crawlers can be used to perform accessibility and vulnerability
checks on the application. The rapid expansion of the web and the complexity added
to web applications have made the process of crawling a very challenging one.
Throughout the history of web crawling many researchers and industrial groups
addressed different issues and challenges that web crawlers face. Different
solutions have been proposed to reduce the time and cost of crawling.
Performing an exhaustive crawl is a challenging task. Additionally, capturing
the model of a modern web application and extracting data from it
automatically is another open question. What follows is a brief history of the
different techniques and algorithms used from the early days of crawling up to
the present. We introduce criteria to evaluate the relative performance of
web crawlers. Based on these criteria we plot the evolution of web crawlers and
compare their performance.
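The basic loop shared by the crawlers discussed above (fetch a page, extract its links, queue unseen URLs) can be sketched minimally as follows; real crawlers add politeness delays, robots.txt handling, revisit policies and URL canonicalization, and the seed URL here is a placeholder.

```python
# Minimal breadth-first crawler sketch of the basic crawling loop.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(seed: str, limit: int = 20) -> set:
    seen, frontier = {seed}, deque([seed])
    while frontier and len(seen) <= limit:
        url = frontier.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # unreachable or non-decodable page: skip it
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            absolute = urljoin(url, link)
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return seen

# Example: crawl("https://example.org")
```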
Cloud WorkBench - Infrastructure-as-Code Based Cloud Benchmarking
To optimally deploy their applications, users of Infrastructure-as-a-Service
clouds are required to evaluate the costs and performance of different
combinations of cloud configurations to find out which combination provides the
best service level for their specific application. Unfortunately, benchmarking
cloud services is cumbersome and error-prone. In this paper, we propose an
architecture and concrete implementation of a cloud benchmarking Web service,
which fosters the definition of reusable and representative benchmarks. In
contrast to existing work, our system is based on the notion of
Infrastructure-as-Code, which is a state of the art concept to define IT
infrastructure in a reproducible, well-defined, and testable way. We
demonstrate our system based on an illustrative case study, in which we measure
and compare the disk IO speeds of different instance and storage types in
Amazon EC2.
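For flavor, here is a sketch of the measured step only: a sequential-write disk micro-benchmark of the kind such a system might schedule on each provisioned cloud configuration. It is not Cloud WorkBench's implementation (which defines benchmarks through Infrastructure-as-Code tooling), and the file and block sizes are arbitrary assumptions.

```python
# Sequential-write micro-benchmark sketch; sizes below are arbitrary.
import os
import tempfile
import time

def sequential_write_mb_per_s(total_mb: int = 256, block_kb: int = 1024) -> float:
    block = os.urandom(block_kb * 1024)
    blocks = total_mb * 1024 // block_kb
    with tempfile.NamedTemporaryFile() as f:
        start = time.perf_counter()
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force the data to the device before stopping the clock
        elapsed = time.perf_counter() - start
    return total_mb / elapsed

if __name__ == "__main__":
    print(f"sequential write: {sequential_write_mb_per_s():.1f} MB/s")
```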
Design and Analysis of a Dynamically Configured Log-based Distributed Security Event Detection Methodology
Military and defense organizations rely upon the security of data stored in, and communicated through, their cyber infrastructure to fulfill their mission objectives. It is essential to identify threats to the cyber infrastructure in a timely manner, so that mission risks can be recognized and mitigated. Centralized event logging and correlation is a proven method for identifying threats to cyber resources. However, centralized event logging is inflexible and does not scale well, because it consumes excessive network bandwidth and imposes significant storage and processing requirements on the central event log server. In this paper, we present a flexible, distributed event correlation system designed to overcome these limitations by distributing the event correlation workload across the network of event-producing systems. To demonstrate the utility of the methodology, we model and simulate centralized, decentralized, and hybrid log analysis environments over three accountability levels and compare their performance in terms of detection capability, network bandwidth utilization, database query efficiency, and configurability. The results show that when compared to centralized event correlation, dynamically configured distributed event correlation provides increased flexibility, a significant reduction in network traffic in low- and medium-accountability environments, and a decrease in database query execution time in the high-accountability case.
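The division of labor between event-producing hosts and a central correlator can be pictured with the toy sketch below; it is an illustration of the general idea, not the paper's methodology, and the event format, field names, and thresholds are invented.

```python
# Toy illustration of pushing correlation work to the event-producing hosts:
# each host reduces its own log to per-source failure counts, and only those
# summaries reach the central step, which flags sources failing on several hosts.
from collections import Counter

def local_summary(log_lines):
    """Runs on each host: count failed-login events per source address."""
    counts = Counter()
    for line in log_lines:
        if "FAILED_LOGIN" in line:
            source = line.rsplit("src=", 1)[-1].strip()
            counts[source] += 1
    return counts

def correlate(summaries, min_hosts=2, min_attempts=5):
    """Central step: flag sources that fail on many hosts with many attempts."""
    hosts_seen, attempts = Counter(), Counter()
    for host_counts in summaries:
        for source, n in host_counts.items():
            hosts_seen[source] += 1
            attempts[source] += n
    return [s for s in attempts
            if hosts_seen[s] >= min_hosts and attempts[s] >= min_attempts]

host_a = local_summary(["FAILED_LOGIN user=root src=10.0.0.9"] * 3)
host_b = local_summary(["FAILED_LOGIN user=admin src=10.0.0.9"] * 4)
print(correlate([host_a, host_b]))  # ['10.0.0.9']
```

Because only compact summaries cross the network, this arrangement trades a small loss of raw-event visibility for the bandwidth and storage savings reported above.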
An integrating text retrieval framework for Digital Ecosystems Paradigm
The purpose of this research is to provide effective information retrieval services for digital 'organisms' in a digital ecosystem by leveraging the power of Web searching technology. A novel integrating digital ecosystem search framework (a new digital organism) is proposed which employs Web search technology and traditional database searching techniques to provide economic organisms with comprehensive, dynamic, and organization-oriented information retrieval ranging from the Internet to the personal (semantic) desktop.
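One hedged way to picture the integration is merging ranked hits from a web search backend with hits from an organization's own database search. Both backends in the sketch below are stand-in stubs and the weighting scheme is an assumption, since the abstract does not specify the framework's actual components.

```python
# Stand-in sketch: merge ranked hits from a web search backend and a local
# database full-text search into one result list.
def web_search(query):
    # placeholder for a call to a web search service
    return [("https://example.org/ecosystems", 0.9), ("https://example.org/b2b", 0.7)]

def db_search(query):
    # placeholder for a full-text query against an organization's own database
    return [("doc://crm/report-2023", 0.8), ("doc://wiki/partners", 0.4)]

def integrated_search(query, web_weight=0.5):
    scored = {}
    for url, score in web_search(query):
        scored[url] = scored.get(url, 0.0) + web_weight * score
    for url, score in db_search(query):
        scored[url] = scored.get(url, 0.0) + (1.0 - web_weight) * score
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

print(integrated_search("digital ecosystem partners"))
```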
Carbon information disclosure of enterprises and their value creation through market liquidity and cost of equity capital
Purpose: Drawing on asymmetric information and stakeholder theories, this paper investigates
two mechanisms, namely market liquidity and cost of equity capital, by which the carbon
information disclosure of enterprises can benefit their value creation.
Design/methodology/approach: In this research, web crawler technology is employed to
study the link between carbon information disclosure and enterprise value creation, and the
carbon information data are provided by all companies listed on the Chinese A-share market.
Findings: The results show that carbon information disclosure has a significant positive
influence on enterprise value creation, which is embodied in the relationship between carbon
information disclosure quantity and depth and enterprise value creation; market liquidity and
cost of equity capital play a partially mediating role in this relationship, while the influence
of carbon information disclosure quality and concentration on enterprise value creation is not
statistically significant.
Research limitations/implications: This paper explains in depth the influence path and
mechanism between carbon information disclosure and enterprise value creation, and answers the
question of whether carbon information disclosure affects enterprise value creation in China.
Practical implications: The finding that carbon information disclosure contributes
positively to enterprise value creation suggests that managers can reap more financial benefits
by disclosing more carbon information and investing in carbon emissions management. Managers in
enterprises should therefore strengthen the management of carbon information
disclosure behavior.
Originality/value: The paper gives a different perspective on the influence of carbon
information disclosure on enterprise value creation, and suggests a new direction to understand
carbon information disclosure behavior.
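The "partially mediating role" reported above can be pictured with a standard Baron-Kenny style comparison of regressions with and without the mediator. The sketch below uses synthetic data and invented variable names and does not reproduce the paper's actual model specification.

```python
# Toy mediation check: compare the disclosure coefficient with and without the
# mediator. Data are synthetic; this is not the paper's model.
import numpy as np

rng = np.random.default_rng(0)
n = 500
disclosure = rng.normal(size=n)                     # carbon disclosure measure
liquidity = 0.6 * disclosure + rng.normal(size=n)   # candidate mediator
value = 0.3 * disclosure + 0.5 * liquidity + rng.normal(size=n)

def ols(y, *xs):
    """Ordinary least squares; returns slope coefficients (intercept dropped)."""
    X = np.column_stack([np.ones_like(y), *xs])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs[1:]

(total_effect,) = ols(value, disclosure)
direct_effect, _mediator_effect = ols(value, disclosure, liquidity)
print(f"total effect : {total_effect:.2f}")
print(f"direct effect: {direct_effect:.2f} (smaller than total -> partial mediation)")
print(f"via mediator : {total_effect - direct_effect:.2f}")
```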