
    REDUCING DISTRIBUTED URLS CRAWLING TIME: A COMPARISON OF GUIDS AND IDS

    Web crawlers visit websites for the purpose of indexing. The dynamic nature of today's web makes the crawling process harder than before, as web contents are continuously updated. In addition, crawling speed is important given the tsunami of big data that needs to be indexed by competing search engines. This research project aims to provide a survey of current problems in distributed web crawlers. It then investigates which of two techniques yields the best crawling speed: dynamic globally unique identifiers (GUIDs) or traditional static identifiers (IDs). Experiments are done by implementing Arachnot.net web crawlers to index up to 20,000 locally generated URLs using both techniques. The results show that URL crawling time can be reduced by up to 7% by using the GUID technique instead of IDs.
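
    As a rough illustration of the comparison, here is a minimal Python sketch (hypothetical, not the paper's Arachnot.net experiment) that times indexing 20,000 locally generated URLs keyed by sequential static IDs versus freshly generated GUIDs; all names in it are invented for the example.

    # Hypothetical micro-benchmark: GUID keys vs. sequential integer IDs
    # for a crawler's URL index. Illustrative only; this is not the
    # Arachnot.net experiment described in the abstract.
    import time
    import uuid

    N = 20_000  # the abstract indexes up to 20,000 locally generated URLs
    urls = [f"http://localhost/page/{i}" for i in range(N)]

    def index_with_ids(urls):
        store = {}
        for i, url in enumerate(urls):  # traditional static identifiers
            store[i] = url
        return store

    def index_with_guids(urls):
        store = {}
        for url in urls:  # dynamic globally unique identifiers
            store[uuid.uuid4()] = url
        return store

    for name, fn in [("IDs", index_with_ids), ("GUIDs", index_with_guids)]:
        start = time.perf_counter()
        fn(urls)
        print(f"{name}: {time.perf_counter() - start:.4f}s")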

    REALIZATION OF A SYSTEM OF EFFICIENT QUERYING OF HIERARCHICAL DATA TRANSFORMED INTO A QUASI-RELATIONAL MODEL

    Extensible Markup Language (XML) was mainly designed to easily represent documents; however, it has evolved and is now widely used for the representation of arbitrary data structures. There are many Application Programming Interfaces (APIs) to aid software developers with processing XML data. There are also many languages for querying and transforming XML, such as XPath or XQuery, which are widely used in this field. However, because of the great flexibility of XML documents, there are no unified data storing and processing standards, tools, or systems.

    On the other hand, the relational model is still the most commonly and widely used standard for storing and querying data. Many Database Management Systems include components for loading and transforming hierarchical data; DB2 pureXML and Oracle SQLX are some of the most-recognized examples. Unfortunately, all of them require knowledge of additional tools, standards, and languages dedicated to accessing hierarchical data (for example, XPath or XQuery). Transforming XML documents into a (quasi)relational model and then querying the transformed documents with SQL or SQL-like queries would significantly simplify the development of data-oriented systems and applications.

    In this paper, an implementation of the SQLxD query system is proposed. The XML documents are converted into a quasi-relational model (preserving their hierarchical structure), and an SQL-like language based on SQL-92 allows for efficient data querying.
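
    The general idea of flattening XML into a relational form and then using plain SQL instead of XPath/XQuery can be illustrated with a small Python sketch using only the standard library. This is a simplified illustration, not the SQLxD implementation; the sample document and table are invented.

    # Illustrative sketch: transform an XML document into a flat
    # (quasi)relational table, then query it with ordinary SQL.
    # SQLxD itself is more elaborate and preserves the full hierarchy.
    import sqlite3
    import xml.etree.ElementTree as ET

    xml_doc = """
    <library>
      <book id="1"><title>XML Basics</title><year>2001</year></book>
      <book id="2"><title>Relational Models</title><year>2005</year></book>
    </library>
    """

    root = ET.fromstring(xml_doc)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (id INTEGER, title TEXT, year INTEGER)")
    for book in root.findall("book"):
        conn.execute(
            "INSERT INTO book VALUES (?, ?, ?)",
            (int(book.get("id")), book.findtext("title"), int(book.findtext("year"))),
        )

    # After the transformation, plain SQL replaces an XPath/XQuery lookup.
    for (title,) in conn.execute("SELECT title FROM book WHERE year > 2002"):
        print(title)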

    The LCG POOL Project, General Overview and Project Structure

    The POOL project has been created to implement a common persistency framework for the LHC Computing Grid (LCG) application area. POOL is tasked to store experiment data and metadata in the multi-petabyte range in a distributed and grid-enabled way. First production use of the new framework is expected for summer 2003. The project follows a hybrid approach, combining C++ object streaming technology such as ROOT I/O for the bulk data with a transactionally safe relational database (RDBMS) store such as MySQL. POOL is based on a strict component approach - as laid down in the LCG persistency and blueprint RTAG documents - providing navigational access to distributed data without exposing details of the particular storage technology. This contribution describes the project breakdown into work packages and the high-level interaction between the main POOL components, and summarizes current status and plans.

    Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, CA, USA, March 2003, 5 pages. PSN MOKT00
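
    The hybrid approach described above can be sketched in miniature: bulk objects are streamed to files while a relational catalog keeps transactionally safe metadata for navigational access. In this hypothetical Python illustration, pickle stands in for ROOT I/O and sqlite3 for MySQL; the function and table names are invented.

    # Toy sketch of a hybrid object-streaming + RDBMS-catalog store,
    # loosely mirroring the POOL pattern. pickle stands in for ROOT I/O,
    # sqlite3 for MySQL; all names here are hypothetical.
    import pickle
    import sqlite3
    import uuid

    catalog = sqlite3.connect(":memory:")
    catalog.execute("CREATE TABLE files (guid TEXT PRIMARY KEY, path TEXT)")

    def store_object(obj, path):
        """Stream an object to a file and register it in the catalog."""
        guid = str(uuid.uuid4())
        with open(path, "wb") as f:
            pickle.dump(obj, f)
        with catalog:  # transactionally safe metadata update
            catalog.execute("INSERT INTO files VALUES (?, ?)", (guid, path))
        return guid

    def load_object(guid):
        """Navigational access: resolve the GUID, then read the stream."""
        (path,) = catalog.execute(
            "SELECT path FROM files WHERE guid = ?", (guid,)
        ).fetchone()
        with open(path, "rb") as f:
            return pickle.load(f)

    guid = store_object({"event": 42, "hits": [1.0, 2.5]}, "bulk_data.pkl")
    print(load_object(guid))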

    Diff-based model synchronization in an industrial MDD process


    User's and Administrator's Manual of AMGA Metadata Catalog v 2.4.0 (EMI-3)


    Automatic Crash Recovery: Internet Explorer's black box

    A good portion of today's investigations include, at least in part, an examination of the user's web history. Although it has lost ground over the past several years, Microsoft's Internet Explorer still accounts for a large portion of the web browser market share. Most users are now aware that Internet Explorer will save browsing history, user names, passwords, and form history. Consequently, some users seek to eliminate these artifacts, leaving behind less evidence for examiners to discover during investigations. However, most users, and probably a good portion of examiners, are unaware that Automatic Crash Recovery can leave a gold mine of recent browsing history in spite of the user's attempts to delete historical artifacts. As investigators, we must continually be looking for new sources of evidence; Automatic Crash Recovery is one such source.
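
    As a hypothetical illustration of the kind of triage this enables, the Python sketch below carves UTF-16 URL strings out of the binary files in Internet Explorer's Recovery folder. The folder path and the naive string-carving approach are assumptions made for the example; this is not a parser for the recovery stores' actual binary format.

    # Hypothetical triage sketch: carve UTF-16LE URL strings from IE's
    # Automatic Crash Recovery files. The path below and the carving
    # approach are illustrative assumptions, not a format specification.
    import os
    import re

    recovery_dir = os.path.expandvars(
        r"%LOCALAPPDATA%\Microsoft\Internet Explorer\Recovery\Active"
    )

    # Match http:// or https:// followed by printable ASCII, as UTF-16LE.
    url_pattern = re.compile(
        rb"h\x00t\x00t\x00p\x00(?:s\x00)?:\x00/\x00/\x00(?:[\x21-\x7e]\x00)+"
    )

    for name in os.listdir(recovery_dir):
        path = os.path.join(recovery_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            data = f.read()
        for match in url_pattern.finditer(data):
            print(name, match.group().decode("utf-16-le"))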