REDUCING DISTRIBUTED URLS CRAWLING TIME: A COMPARISON OF GUIDS AND IDS
A web crawler visits websites for the purpose of indexing. The dynamic nature of today's web makes the crawling process harder than before, as web contents are continuously updated. In addition, crawling speed is important given the tsunami of big data that needs to be indexed by competing search engines. This research project aims to provide a survey of current problems in distributed web crawlers. It then investigates which of two techniques crawls faster: dynamic globally unique identifiers (GUIDs) or traditional static identifiers (IDs). Experiments are done by implementing Arachnode.net web crawlers to index up to 20,000 locally generated URLs using both techniques. The results show that URL crawling time can be reduced by up to 7% by using the GUID technique instead of IDs.
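As a rough illustration of the two bookkeeping schemes being compared, the Python sketch below keys URL records either by sequential integer IDs or by GUIDs and times both. It is not the paper's Arachnode.net setup; the function names and URL set are invented.

```python
# Illustrative sketch (not the paper's Arachnode.net code): time URL
# bookkeeping with sequential static IDs versus locally generated GUIDs.
import time
import uuid

urls = [f"http://localhost/page/{i}" for i in range(20000)]

def index_with_ids(urls):
    """Key each crawled URL by a traditional sequential integer ID."""
    return {i: url for i, url in enumerate(urls)}

def index_with_guids(urls):
    """Key each crawled URL by a globally unique identifier (GUID)."""
    return {uuid.uuid4(): url for url in urls}

for fn in (index_with_ids, index_with_guids):
    start = time.perf_counter()
    fn(urls)
    print(f"{fn.__name__}: {time.perf_counter() - start:.4f} s")
```

On a single machine such a micro-benchmark mostly measures key generation; the reported 7% saving presumably comes from the distributed setting, where each crawler can mint GUIDs locally without a round trip to a central ID allocator.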
REALIZATION OF A SYSTEM OF EFFICIENT QUERYING OF HIERARCHICAL DATA TRANSFORMED INTO A QUASI-RELATIONAL MODEL
Extensible Markup Language (XML) was mainly designed to easily represent documents; however, it has evolved and is now widely used for the representation of arbitrary data structures. There are many Application Programming Interfaces (APIs) to aid software developers with processing XML data, as well as many languages for querying and transforming XML, such as XPath or XQuery, which are widely used in this field. However, because of the great flexibility of XML documents, there are no unified standards, tools, or systems for storing and processing such data.

On the other hand, the relational model is still the most common and widely used standard for storing and querying data. Many Database Management Systems include components for loading and transforming hierarchical data; DB2 pureXML and Oracle SQLX are among the most recognized examples. Unfortunately, all of them require knowledge of additional tools, standards, and languages dedicated to accessing hierarchical data (for example, XPath or XQuery). Transforming XML documents into a (quasi-)relational model and then querying the transformed documents with SQL or SQL-like queries would significantly simplify the development of data-oriented systems and applications.

In this paper, an implementation of the SQLxD query system is proposed. XML documents are converted into a quasi-relational model (preserving their hierarchical structure), and an SQL-like language based on SQL-92 allows for efficient data querying.
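To make the idea concrete, here is a minimal sketch, with Python's sqlite3 standing in for the quasi-relational store, of flattening an XML document into tables that preserve the hierarchy through parent references and then querying it with plain SQL instead of XPath. The schema and names are invented for illustration; this is not the SQLxD implementation.

```python
# Minimal sketch of the general idea (not the SQLxD implementation):
# flatten XML into relational tables that keep the hierarchy via parent
# references, then query with ordinary SQL instead of XPath/XQuery.
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

doc = ET.fromstring(
    "<library>"
    "<book year='2003'><title>POOL</title></book>"
    "<book year='2010'><title>AMGA</title></book>"
    "</library>"
)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE node (id INTEGER, parent INTEGER, tag TEXT, text TEXT)")
con.execute("CREATE TABLE attr (node INTEGER, name TEXT, value TEXT)")
ids = count(1)

def load(elem, parent=None):
    """Walk the XML tree, storing each element as a row with a parent link."""
    nid = next(ids)
    con.execute("INSERT INTO node VALUES (?, ?, ?, ?)",
                (nid, parent, elem.tag, (elem.text or "").strip()))
    for name, value in elem.attrib.items():
        con.execute("INSERT INTO attr VALUES (?, ?, ?)", (nid, name, value))
    for child in elem:
        load(child, nid)

load(doc)
# Plain SQL in place of an XPath such as //book[@year > 2005]/title:
rows = con.execute("""
    SELECT t.text FROM node t
    JOIN node b ON t.parent = b.id AND t.tag = 'title'
    JOIN attr a ON a.node = b.id AND a.name = 'year'
    WHERE CAST(a.value AS INTEGER) > 2005
""").fetchall()
print(rows)  # [('AMGA',)]
```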
INFERENCE-BASED FORENSICS FOR EXTRACTING INFORMATION FROM DIVERSE SOURCES
Digital forensics is tasked with the examination and extraction of evidence from a diverse set of devices and information sources. While digital forensics has long been synonymous with file recovery, this label no longer adequately describes the science’s role in modern investigations. Spurred by evolving technologies and online crime, law enforcement is shifting the focus of digital forensics from its traditional role in the final stages of an investigation to assisting investigators in the earliest phases, often before a suspect has been identified and a warrant served. Investigators need new forensic techniques to investigate online crimes, such as child pornography trafficking on peer-to-peer (P2P) networks, and to extract evidence from new information sources, such as mobile phones. The traditional approach of developing tools tailored specifically to each source is no longer tenable given the diversity, volume of storage, and introduction rate of new devices and network applications. Instead, we propose the adoption of flexible, inference-based techniques to extract evidence from any format. Such techniques can be readily applied to a wide variety of different evidence sources without requiring significant manual work on the investigator’s part. The primary contribution of my dissertation is a set of novel forensic techniques for extracting information from diverse data sources. We frame the evaluation using two different, but increasingly important, forensic scenarios: mobile phone triage and network-based investigations.
Via probabilistic descriptions of typical data structures, and using a classic dynamic programming algorithm, our phone triage techniques are able to identify user information in phones across varied models and manufacturers. We also show how to incorporate feedback from the investigator to improve the usability of extracted information.
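A toy sketch of what such an inference step might look like, with invented field models rather than real phone formats: a Viterbi-style dynamic program picks the most probable segmentation of raw bytes into typed fields, given a log-likelihood per field type.

```python
# Toy sketch of the idea (invented models, not real phone formats): a
# Viterbi-style dynamic program picks the most probable segmentation of
# raw bytes into typed fields, given a log-likelihood per field type.
import math
import re

FIELD_TYPES = {
    "phone":     lambda s: -0.5 if re.fullmatch(rb"\+\d{7,14}", s) else -math.inf,
    "timestamp": lambda s: -1.0 if re.fullmatch(rb"\d{10}", s) else -math.inf,
    "text":      lambda s: -0.4 - 0.1 * len(s) if s.isalpha() else -math.inf,
    "junk":      lambda s: -1.0 - 1.0 * len(s),  # fallback: anything, but costly
}

def segment(data: bytes, max_field: int = 16):
    """best[i] = (score, j, type): best labeling of data[:i] whose last field is data[j:i]."""
    best = {0: (0.0, None, None)}
    for i in range(1, len(data) + 1):
        candidates = [(best[j][0] + ll(data[j:i]), j, name)
                      for j in range(max(0, i - max_field), i) if j in best
                      for name, ll in FIELD_TYPES.items()]
        best[i] = max(c for c in candidates if c[0] > -math.inf)
    fields, i = [], len(data)  # backtrack through the best split points
    while i > 0:
        _, j, name = best[i]
        fields.append((name, data[j:i]))
        i = j
    return list(reversed(fields))

print(segment(b"hello1302000000+15551234567"))
# [('text', b'hello'), ('timestamp', b'1302000000'), ('phone', b'+15551234567')]
```

Ambiguous byte runs simply resolve to whichever type scores highest, which is where investigator feedback on the extracted fields could plausibly be folded back into the models.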
For network-based investigations, we quantify and characterize the extent of contraband trafficking on peer-to-peer networks. We suggest various techniques for prioritizing law enforcement’s limited resources. We finally investigate techniques that use system logs to generate and then analyze a finite state model of a protocol’s implementation. The objective is to infer behavior that an investigator can leverage to further law enforcement objectives.
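For the log-based state models, a minimal sketch of the flavor of inference involved, using invented event names and a deliberately simple model (the set of transitions observed between consecutive logged events), might look like this:

```python
# Illustrative sketch (assumed, not from the dissertation): build a finite
# state model of a protocol implementation from logged event sequences,
# then flag traces that use transitions never seen in normal operation.
from collections import defaultdict

# Hypothetical session logs: each is a sequence of observed protocol events.
sessions = [
    ["CONNECT", "HANDSHAKE", "SEARCH", "DOWNLOAD", "DISCONNECT"],
    ["CONNECT", "HANDSHAKE", "SEARCH", "SEARCH", "DISCONNECT"],
    ["CONNECT", "HANDSHAKE", "BROWSE", "DOWNLOAD", "DISCONNECT"],
]

def build_model(sessions):
    """Infer states (events) and the transitions observed between them."""
    transitions = defaultdict(set)
    for events in sessions:
        for src, dst in zip(events, events[1:]):
            transitions[src].add(dst)
    return transitions

def is_anomalous(trace, model):
    """A trace is anomalous if it takes a transition the model never saw."""
    return any(dst not in model.get(src, set())
               for src, dst in zip(trace, trace[1:]))

model = build_model(sessions)
# A client that downloads without searching or browsing first stands out.
print(is_anomalous(["CONNECT", "HANDSHAKE", "DOWNLOAD", "DISCONNECT"], model))  # True
```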
We evaluate all of our techniques under the real-world legal constraints and restrictions that investigators face.
The LCG POOL Project, General Overview and Project Structure
The POOL project has been created to implement a common persistency framework for the LHC Computing Grid (LCG) application area. POOL is tasked to store experiment data and metadata in the multi-petabyte range in a distributed and grid-enabled way. First production use of the new framework is expected for summer 2003. The project follows a hybrid approach, combining C++ object streaming technology such as ROOT I/O for the bulk data with a transactionally safe relational database (RDBMS) store such as MySQL. POOL is based on a strict component approach, as laid down in the LCG persistency and blueprint RTAG documents, providing navigational access to distributed data without exposing details of the particular storage technology. This contribution describes the project breakdown into work packages and the high-level interaction between the main POOL components, and summarizes the current status and plans.
Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics conference (CHEP03), La Jolla, CA, USA, March 2003, 5 pages. PSN MOKT00.
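A rough sketch of the hybrid pattern described above, with sqlite3 and pickle standing in for the RDBMS catalog and the ROOT I/O streaming layer; none of this is POOL's actual API, and all names are invented.

```python
# Hypothetical sketch of the hybrid persistency idea (not POOL's API):
# bulk objects are streamed to files, while a transactionally safe
# relational catalog records location tokens for navigational access.
import pickle
import sqlite3
import uuid

catalog = sqlite3.connect(":memory:")
catalog.execute("CREATE TABLE objects (token TEXT PRIMARY KEY, container TEXT, kind TEXT)")

def store(obj, kind):
    """Stream an object to its own container file and register a token."""
    token = str(uuid.uuid4())
    container = f"{token}.pkl"
    with open(container, "wb") as f:
        pickle.dump(obj, f)          # stand-in for ROOT I/O object streaming
    with catalog:                    # transactional catalog update
        catalog.execute("INSERT INTO objects VALUES (?, ?, ?)",
                        (token, container, kind))
    return token

def fetch(token):
    """Navigate to an object via its token, hiding the storage technology."""
    (container,) = catalog.execute(
        "SELECT container FROM objects WHERE token = ?", (token,)).fetchone()
    with open(container, "rb") as f:
        return pickle.load(f)

token = store({"run": 42, "hits": [1.5, 2.5]}, kind="event")
print(fetch(token))  # {'run': 42, 'hits': [1.5, 2.5]}
```

Because clients see only tokens, the storage behind a container could be swapped without changing client code, which is roughly the component separation the abstract describes.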
User's and Administrator's Manual of AMGA Metadata Catalog v 2.4.0 (EMI-3)
Automatic Crash Recovery: Internet Explorer's black box
A good portion of today's investigations include, at least in part, an examination of the user's web history. Although it has lost ground over the past several years, Microsoft's Internet Explorer still accounts for a large portion of the web browser market share. Most users are now aware that Internet Explorer will save browsing history, user names, passwords, and form history. Consequently, some users seek to eliminate these artifacts, leaving behind less evidence for examiners to discover during investigations. However, most users, and probably a good portion of examiners, are unaware that Automatic Crash Recovery can leave a gold mine of recent browsing history in spite of the user's attempts to delete historical artifacts. As investigators, we must continually look for new sources of evidence; Automatic Crash Recovery is one such source.
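A hedged triage sketch in Python: it carves URL strings out of the recovery stores rather than parsing their full compound-file structure. The path below is my assumption of the usual IE8-era location of Automatic Crash Recovery files and should be verified against the target system; the .dat layout itself is not interpreted here.

```python
# Hedged triage sketch: carve URL strings from IE Automatic Crash Recovery
# stores. The directory below is an assumed IE8-era default; verify it on
# the system under examination before relying on it.
import glob
import os
import re

RECOVERY_DIR = os.path.expandvars(
    r"%LOCALAPPDATA%\Microsoft\Internet Explorer\Recovery\Active")

URL_RE = re.compile(r"https?://[\x20-\x7e]+")

for path in glob.glob(os.path.join(RECOVERY_DIR, "*.dat")):
    with open(path, "rb") as f:
        raw = f.read()
    # URL strings in these stores are UTF-16LE; a lossy decode suffices
    # for carving even though the surrounding structure is binary.
    text = raw.decode("utf-16-le", errors="ignore")
    for url in sorted(set(URL_RE.findall(text))):
        print(f"{os.path.basename(path)}: {url}")
```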