Search CORE

2,510 research outputs found

Service broker based on cloud service description language

Author: Barrie Peter
Razaq Abdul
Tianfield Huaglory
Yue Hong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/04/2017
Field of study

Cloud WorkBench - Infrastructure-as-Code Based Cloud Benchmarking

Author: Cito Jurgen
Gall Harald
Leitner Philipp
Scheuner Joel
Publication venue
Publication date: 20/08/2014
Field of study

To optimally deploy their applications, users of Infrastructure-as-a-Service clouds are required to evaluate the costs and performance of different combinations of cloud configurations to find out which combination provides the best service level for their specific application. Unfortunately, benchmarking cloud services is cumbersome and error-prone. In this paper, we propose an architecture and concrete implementation of a cloud benchmarking Web service, which fosters the definition of reusable and representative benchmarks. In distinction to existing work, our system is based on the notion of Infrastructure-as-Code, which is a state of the art concept to define IT infrastructure in a reproducible, well-defined, and testable way. We demonstrate our system based on an illustrative case study, in which we measure and compare the disk IO speeds of different instance and storage types in Amazon EC2

arXiv.org e-Print Archive

Crossref

ZORA

Automatic supervised information extraction of structured web data

Author: Pérez Ruiz David
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/06/2019
Field of study

The overall purpose of this project is, in short words, to create a system able to extract vital information from product web pages just like a human would. Information like the name of the product, its description, price tag, company that produces it, and so on. At a first glimpse, this may not seem extraordinary or technically difficult, since web scraping techniques exist from long ago (like the python library Beautiful Soup for instance, an HTML parser1 released in 2004). But let us think for a second on what it actually means being able to extract desired information from any given web source: the way information is displayed can be extremely varied, not only visually, but also semantically. For instance, some hotel booking web pages display at once all prices for the different room types, while medium-sized consumer products in websites like Amazon offer the main product in detail and then more small-sized product recommendations further down the page, being the latter the preferred way of displaying assets by most retail companies. And each with its own styling and search engines. With the above said, the task of mining valuable data from the web now does not sound as easy as it first seemed. Hence the purpose of this project is to shine some light on the Automatic Supervised Information Extraction of Structured Web Data problem. It is important to think if developing such a solution is really valuable at all. Such an endeavour both in time and computing resources should lead to a useful end result, at least on paper, to justify it. The opinion of this author is that it does lead to a potentially valuable result. The targeted extraction of information of publicly available consumer-oriented content at large scale in an accurate, reliable and future proof manner could provide an incredibly useful and large amount of data. This data, if kept updated, could create endless opportunities for Business Intelligence, although exactly which ones is beyond the scope of this work. A simple metaphor explains the potential value of this work: if an oil company were to be told where are all the oil reserves in the planet, it still should need to invest in machinery, workers and time to successfully exploit them, but half of the job would have already been done2. As the reader will see in this work, the way the issue is tackled is by building a somehow complex architecture that ends in an Artificial Neural Network3. A quick overview of such architecture is as follows: first find the URLs that lead to the product pages that contain the desired data that is going to be extracted inside a given site (like URLs that lead to ”action figure” products inside the site ebay.com); second, per each URL passed, extract its HTML and make a screenshot of the page, and store this data in a suitable and scalable fashion; third, label the data that will be fed to the NN4; fourth, prepare the aforementioned data to be input in an NN; fifth, train the NN; and sixth, deploy the NN to make [hopefully accurate] predictions

UPCommons. Portal del coneixement obert de la UPC

Archiving the Relaxed Consistency Web

Author: Jordan Ramiro
Liu Jinyang
Van de Sompel Herbert
van Reenen Johann
Xie Zhiwu
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013
Field of study

The historical, cultural, and intellectual importance of archiving the web has been widely recognized. Today, all countries with high Internet penetration rate have established high-profile archiving initiatives to crawl and archive the fast-disappearing web content for long-term use. As web technologies evolve, established web archiving techniques face challenges. This paper focuses on the potential impact of the relaxed consistency web design on crawler driven web archiving. Relaxed consistent websites may disseminate, albeit ephemerally, inaccurate and even contradictory information. If captured and preserved in the web archives as historical records, such information will degrade the overall archival quality. To assess the extent of such quality degradation, we build a simplified feed-following application and simulate its operation with synthetic workloads. The results indicate that a non-trivial portion of a relaxed consistency web archive may contain observable inconsistency, and the inconsistency window may extend significantly longer than that observed at the data store. We discuss the nature of such quality degradation and propose a few possible remedies.Comment: 10 pages, 6 figures, CIKM 201

arXiv.org e-Print Archive

Crossref