REDUCING DISTRIBUTED URLS CRAWLING TIME: A COMPARISON OF GUIDS AND IDS
A web crawler visits websites for the purpose of indexing. The dynamic nature of today's web makes the crawling process harder than before, as web contents are continuously updated. In addition, crawling speed is important given the tsunami of big data that needs to be indexed by competing search engines. This research project aims to provide a survey of current problems in distributed web crawlers. It then investigates which of two techniques crawls faster: dynamic globally unique identifiers (GUIDs) or traditional static identifiers (IDs). Experiments are done by implementing Arachnode.net web crawlers to index up to 20,000 locally generated URLs using both techniques. The results show that URL crawling time can be reduced by up to 7% by using the GUID technique instead of IDs.
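As a rough illustration of the two bookkeeping schemes being compared, the Python sketch below keys URL records either by sequential integer IDs or by GUIDs and times both. It is not the paper's Arachnode.net setup; the function names and URL set are invented.

```python
# Illustrative sketch (not the paper's Arachnode.net code): time URL
# bookkeeping with sequential static IDs versus locally generated GUIDs.
import time
import uuid

urls = [f"http://localhost/page/{i}" for i in range(20000)]

def index_with_ids(urls):
    """Key each crawled URL by a traditional sequential integer ID."""
    return {i: url for i, url in enumerate(urls)}

def index_with_guids(urls):
    """Key each crawled URL by a globally unique identifier (GUID)."""
    return {uuid.uuid4(): url for url in urls}

for fn in (index_with_ids, index_with_guids):
    start = time.perf_counter()
    fn(urls)
    print(f"{fn.__name__}: {time.perf_counter() - start:.4f} s")
```

On a single machine such a micro-benchmark mostly measures key generation; the reported 7% saving presumably comes from the distributed setting, where each crawler can mint GUIDs locally without a round trip to a central ID allocator.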
REALIZATION OF A SYSTEM OF EFFICIENT QUERYING OF HIERARCHICAL DATA TRANSFORMED INTO A QUASI-RELATIONAL MODEL
Extensible Markup Language (XML) was mainly designed to easily represent documents; however, it has evolved and is now widely used for the representation of arbitrary data structures. There are many Application Programming Interfaces (APIs) to aid software developers with processing XML data, as well as many languages for querying and transforming XML, such as XPath or XQuery, which are widely used in this field. However, because of the great flexibility of XML documents, there are no unified standards, tools, or systems for storing and processing such data.

On the other hand, the relational model is still the most common and widely used standard for storing and querying data. Many Database Management Systems include components for loading and transforming hierarchical data; DB2 pureXML and Oracle SQLX are among the most recognized examples. Unfortunately, all of them require knowledge of additional tools, standards, and languages dedicated to accessing hierarchical data (for example, XPath or XQuery). Transforming XML documents into a (quasi-)relational model and then querying the transformed documents with SQL or SQL-like queries would significantly simplify the development of data-oriented systems and applications.

In this paper, an implementation of the SQLxD query system is proposed. XML documents are converted into a quasi-relational model (preserving their hierarchical structure), and an SQL-like language based on SQL-92 allows for efficient data querying.
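To make the idea concrete, here is a minimal sketch, with Python's sqlite3 standing in for the quasi-relational store, of flattening an XML document into tables that preserve the hierarchy through parent references and then querying it with plain SQL instead of XPath. The schema and names are invented for illustration; this is not the SQLxD implementation.

```python
# Minimal sketch of the general idea (not the SQLxD implementation):
# flatten XML into relational tables that keep the hierarchy via parent
# references, then query with ordinary SQL instead of XPath/XQuery.
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

doc = ET.fromstring(
    "<library>"
    "<book year='2003'><title>POOL</title></book>"
    "<book year='2010'><title>AMGA</title></book>"
    "</library>"
)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE node (id INTEGER, parent INTEGER, tag TEXT, text TEXT)")
con.execute("CREATE TABLE attr (node INTEGER, name TEXT, value TEXT)")
ids = count(1)

def load(elem, parent=None):
    """Walk the XML tree, storing each element as a row with a parent link."""
    nid = next(ids)
    con.execute("INSERT INTO node VALUES (?, ?, ?, ?)",
                (nid, parent, elem.tag, (elem.text or "").strip()))
    for name, value in elem.attrib.items():
        con.execute("INSERT INTO attr VALUES (?, ?, ?)", (nid, name, value))
    for child in elem:
        load(child, nid)

load(doc)
# Plain SQL in place of an XPath such as //book[@year > 2005]/title:
rows = con.execute("""
    SELECT t.text FROM node t
    JOIN node b ON t.parent = b.id AND t.tag = 'title'
    JOIN attr a ON a.node = b.id AND a.name = 'year'
    WHERE CAST(a.value AS INTEGER) > 2005
""").fetchall()
print(rows)  # [('AMGA',)]
```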
INFERENCE-BASED FORENSICS FOR EXTRACTING INFORMATION FROM DIVERSE SOURCES
Digital forensics is tasked with the examination and extraction of evidence from a diverse set of devices and information sources. While digital forensics has long been synonymous with file recovery, this label no longer adequately describes the science’s role in modern investigations. Spurred by evolving technologies and online crime, law enforcement is shifting the focus of digital forensics from its traditional role in the final stages of an investigation to assisting investigators in the earliest phases, often before a suspect has been identified and a warrant served. Investigators need new forensic techniques to investigate online crimes, such as child pornography trafficking on peer-to-peer (P2P) networks, and to extract evidence from new information sources, such as mobile phones. The traditional approach of developing tools tailored specifically to each source is no longer tenable given the diversity, volume of storage, and introduction rate of new devices and network applications. Instead, we propose the adoption of flexible, inference-based techniques to extract evidence from any format. Such techniques can be readily applied to a wide variety of different evidence sources without requiring significant manual work on the investigator’s part. The primary contribution of my dissertation is a set of novel forensic techniques for extracting information from diverse data sources. We frame the evaluation using two different, but increasingly important, forensic scenarios: mobile phone triage and network-based investigations.
Via probabilistic descriptions of typical data structures, and using a classic dynamic programming algorithm, our phone triage techniques are able to identify user information in phones across varied models and manufacturers. We also show how to incorporate feedback from the investigator to improve the usability of extracted information.
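A toy sketch of what such an inference step might look like, with invented field models rather than real phone formats: a Viterbi-style dynamic program picks the most probable segmentation of raw bytes into typed fields, given a log-likelihood per field type.

```python
# Toy sketch of the idea (invented models, not real phone formats): a
# Viterbi-style dynamic program picks the most probable segmentation of
# raw bytes into typed fields, given a log-likelihood per field type.
import math
import re

FIELD_TYPES = {
    "phone":     lambda s: -0.5 if re.fullmatch(rb"\+\d{7,14}", s) else -math.inf,
    "timestamp": lambda s: -1.0 if re.fullmatch(rb"\d{10}", s) else -math.inf,
    "text":      lambda s: -0.4 - 0.1 * len(s) if s.isalpha() else -math.inf,
    "junk":      lambda s: -1.0 - 1.0 * len(s),  # fallback: anything, but costly
}

def segment(data: bytes, max_field: int = 16):
    """best[i] = (score, j, type): best labeling of data[:i] whose last field is data[j:i]."""
    best = {0: (0.0, None, None)}
    for i in range(1, len(data) + 1):
        candidates = [(best[j][0] + ll(data[j:i]), j, name)
                      for j in range(max(0, i - max_field), i) if j in best
                      for name, ll in FIELD_TYPES.items()]
        best[i] = max(c for c in candidates if c[0] > -math.inf)
    fields, i = [], len(data)  # backtrack through the best split points
    while i > 0:
        _, j, name = best[i]
        fields.append((name, data[j:i]))
        i = j
    return list(reversed(fields))

print(segment(b"hello1302000000+15551234567"))
# [('text', b'hello'), ('timestamp', b'1302000000'), ('phone', b'+15551234567')]
```

Ambiguous byte runs simply resolve to whichever type scores highest, which is where investigator feedback on the extracted fields could plausibly be folded back into the models.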
For network-based investigations, we quantify and characterize the extent of contraband trafficking on peer-to-peer networks. We suggest various techniques for prioritizing law enforcement’s limited resources. We finally investigate techniques that use system logs to generate and then analyze a finite state model of a protocol’s implementation. The objective is to infer behavior that an investigator can leverage to further law enforcement objectives.
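For the log-based state models, a minimal sketch of the flavor of inference involved, using invented event names and a deliberately simple model (the set of transitions observed between consecutive logged events), might look like this:

```python
# Illustrative sketch (assumed, not from the dissertation): build a finite
# state model of a protocol implementation from logged event sequences,
# then flag traces that use transitions never seen in normal operation.
from collections import defaultdict

# Hypothetical session logs: each is a sequence of observed protocol events.
sessions = [
    ["CONNECT", "HANDSHAKE", "SEARCH", "DOWNLOAD", "DISCONNECT"],
    ["CONNECT", "HANDSHAKE", "SEARCH", "SEARCH", "DISCONNECT"],
    ["CONNECT", "HANDSHAKE", "BROWSE", "DOWNLOAD", "DISCONNECT"],
]

def build_model(sessions):
    """Infer states (events) and the transitions observed between them."""
    transitions = defaultdict(set)
    for events in sessions:
        for src, dst in zip(events, events[1:]):
            transitions[src].add(dst)
    return transitions

def is_anomalous(trace, model):
    """A trace is anomalous if it takes a transition the model never saw."""
    return any(dst not in model.get(src, set())
               for src, dst in zip(trace, trace[1:]))

model = build_model(sessions)
# A client that downloads without searching or browsing first stands out.
print(is_anomalous(["CONNECT", "HANDSHAKE", "DOWNLOAD", "DISCONNECT"], model))  # True
```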
We evaluate all of our techniques under the real-world legal constraints and restrictions that investigators face.
The LCG POOL Project, General Overview and Project Structure
The POOL project has been created to implement a common persistency framework for the LHC Computing Grid (LCG) application area. POOL is tasked to store experiment data and metadata in the multi-petabyte range in a distributed and grid-enabled way. First production use of the new framework is expected for summer 2003. The project follows a hybrid approach, combining C++ object streaming technology such as ROOT I/O for the bulk data with a transactionally safe relational database (RDBMS) store such as MySQL. POOL is based on a strict component approach, as laid down in the LCG persistency and blueprint RTAG documents, providing navigational access to distributed data without exposing details of the particular storage technology. This contribution describes the project breakdown into work packages and the high-level interaction between the main POOL components, and summarizes the current status and plans.
Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics conference (CHEP03), La Jolla, CA, USA, March 2003, 5 pages. PSN MOKT00.
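A rough sketch of the hybrid pattern described above, with sqlite3 and pickle standing in for the RDBMS catalog and the ROOT I/O streaming layer; none of this is POOL's actual API, and all names are invented.

```python
# Hypothetical sketch of the hybrid persistency idea (not POOL's API):
# bulk objects are streamed to files, while a transactionally safe
# relational catalog records location tokens for navigational access.
import pickle
import sqlite3
import uuid

catalog = sqlite3.connect(":memory:")
catalog.execute("CREATE TABLE objects (token TEXT PRIMARY KEY, container TEXT, kind TEXT)")

def store(obj, kind):
    """Stream an object to its own container file and register a token."""
    token = str(uuid.uuid4())
    container = f"{token}.pkl"
    with open(container, "wb") as f:
        pickle.dump(obj, f)          # stand-in for ROOT I/O object streaming
    with catalog:                    # transactional catalog update
        catalog.execute("INSERT INTO objects VALUES (?, ?, ?)",
                        (token, container, kind))
    return token

def fetch(token):
    """Navigate to an object via its token, hiding the storage technology."""
    (container,) = catalog.execute(
        "SELECT container FROM objects WHERE token = ?", (token,)).fetchone()
    with open(container, "rb") as f:
        return pickle.load(f)

token = store({"run": 42, "hits": [1.5, 2.5]}, kind="event")
print(fetch(token))  # {'run': 42, 'hits': [1.5, 2.5]}
```

Because clients see only tokens, the storage behind a container could be swapped without changing client code, which is roughly the component separation the abstract describes.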
User's and Administrator's Manual of AMGA Metadata Catalog v 2.4.0 (EMI-3)
Automatic Crash Recovery: Internet Explorer's black box
A good portion of today's investigations include, at least in part, an examination of the user's web history. Although it has lost ground over the past several years, Microsoft's Internet Explorer still accounts for a large portion of the web browser market share. Most users are now aware that Internet Explorer will save browsing history, user names, passwords, and form history. Consequently, some users seek to eliminate these artifacts, leaving behind less evidence for examiners to discover during investigations. However, most users, and probably a good portion of examiners, are unaware that Automatic Crash Recovery can leave a gold mine of recent browsing history in spite of the user's attempts to delete historical artifacts. As investigators, we must continually look for new sources of evidence; Automatic Crash Recovery is one such source.
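A hedged triage sketch in Python: it carves URL strings out of the recovery stores rather than parsing their full compound-file structure. The path below is my assumption of the usual IE8-era location of Automatic Crash Recovery files and should be verified against the target system; the .dat layout itself is not interpreted here.

```python
# Hedged triage sketch: carve URL strings from IE Automatic Crash Recovery
# stores. The directory below is an assumed IE8-era default; verify it on
# the system under examination before relying on it.
import glob
import os
import re

RECOVERY_DIR = os.path.expandvars(
    r"%LOCALAPPDATA%\Microsoft\Internet Explorer\Recovery\Active")

URL_RE = re.compile(r"https?://[\x20-\x7e]+")

for path in glob.glob(os.path.join(RECOVERY_DIR, "*.dat")):
    with open(path, "rb") as f:
        raw = f.read()
    # URL strings in these stores are UTF-16LE; a lossy decode suffices
    # for carving even though the surrounding structure is binary.
    text = raw.decode("utf-16-le", errors="ignore")
    for url in sorted(set(URL_RE.findall(text))):
        print(f"{os.path.basename(path)}: {url}")
```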