
    Indexing methods for web archives

    There have been numerous recent efforts to digitize previously published content and to preserve born-digital content, leading to the widespread growth of large text repositories. Web archives are one such continuously growing text collection, containing versions of documents that span long time periods. Web archives present many opportunities for historical, cultural and political analyses. Consequently, there is a growing need for tools that can efficiently access and search them. In this work, we are interested in indexing methods that support text-search workloads over web archives, such as time-travel queries and phrase queries. To this end we make the following contributions:
    • Time-travel queries are keyword queries with a temporal predicate, e.g., “mpii saarland” @ [06/2009], which return versions of documents in the past. We introduce a novel index organization strategy, called index sharding, for efficiently supporting time-travel queries without incurring additional index-size blowup. We also propose index-maintenance approaches which scale to such continuously growing collections.
    • We develop a query-optimization technique for time-travel queries, called partition selection, which maximizes recall at any given query-execution stage.
    • We propose indexing methods to support phrase queries, e.g., “to be or not to be that is the question”. We index multi-word sequences and devise novel query-optimization methods over the indexed sequences to efficiently answer phrase queries.
    We demonstrate the superior performance of our approaches over existing methods by extensive experimentation on real-world web archives.
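As a rough illustration of what a time-travel query computes, the sketch below keeps a validity interval on every posting and intersects only the postings that were live at the query time. It is a minimal sketch, not the index sharding or partition selection described in the abstract: the class `TimeTravelIndex`, the `Posting` fields, the YYYYMM integer encoding of months, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    doc_id: str
    begin: int  # first month the version was valid, encoded as YYYYMM (assumed encoding)
    end: int    # first month the version was no longer valid (exclusive)


class TimeTravelIndex:
    """Hypothetical inverted index whose postings carry validity intervals."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add_version(self, doc_id, text, begin, end):
        # One posting per distinct term of this document version.
        for term in set(text.lower().split()):
            self._postings[term].append(Posting(doc_id, begin, end))

    def query(self, keywords, at):
        # Conjunctive keyword query restricted to versions valid at time `at`.
        result = None
        for term in keywords.lower().split():
            live = {
                (p.doc_id, p.begin)
                for p in self._postings.get(term, [])
                if p.begin <= at < p.end
            }
            result = live if result is None else result & live
        return sorted(result or set())


# Made-up sample data for illustration only.
index = TimeTravelIndex()
index.add_version("mpi-home", "mpii saarland informatics", 200801, 201001)
index.add_version("mpi-home", "mpii saarbruecken campus", 201001, 201301)
index.add_version("uni-home", "saarland university", 200601, 201301)

# Analogue of the query “mpii saarland” @ [06/2009].
hits = index.query("mpii saarland", at=200906)
print(hits)
```

This sketch scans each term's entire posting list and filters by time afterwards, so its cost grows with the full version history of a term regardless of the query time.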

    Location-based indexing for mobile context-aware access to a digital library

    Mobile information systems need to collaborate with each other to provide seamless information access to the user. Information about the user and their context provides the points of contact between the systems. Location is the most basic user context. TIP is a mobile tourist information system that provides location-based access to documents in the digital library Greenstone. This paper identifies the challenges of providing efficient access to location-based information using the various access modes a tourist requires on their travels. We discuss our extended 2DR-tree approach to meeting these challenges.

    Semantic-driven matchmaking of web services using case-based reasoning

    With the rapid proliferation of Web services as the medium of choice to securely publish application services beyond the firewall, accurate yet flexible matchmaking of similar services gains importance both for the human user and for dynamic composition engines. In this paper, we present a novel approach that utilizes the case-based reasoning (CBR) methodology for modelling dynamic Web service discovery and matchmaking. Our framework considers Web service execution experiences in the decision-making process and is highly adaptable to the service requester's constraints. The framework also utilises OWL semantic descriptions extensively for implementing both the components of the CBR engine and the matchmaking profile of the Web services.

    Development of an intelligent hypertext manual for the space shuttle hazardous gas detection system

    A computer-based Integrated Knowledge System (IKS), the Intelligent Hypertext Manual (IHM), is being developed for the Space Shuttle Hazardous Gas Detection System (HGDS) at the Huntsville Operations Support Center (HOSC). The IHM stores all HGDS-related knowledge and presents it in an interactive and intuitive manner. The IHM's purpose is to provide HGDS personnel with the capabilities of: enhancing the interpretation of real-time data; recognizing and identifying possible faults in the Space Shuttle sub-system related to hazardous gas detections; locating applicable documentation related to procedures, constraints, and previous fault histories; and assisting in the training of personnel.

    Issues Related to the Emergence of the Information Superhighway and California Societal Changes, IISTPS Report 96-4

    The Norman Y. Mineta International Institute for Surface Transportation Policy Studies (IISTPS) at San José State University (SJSU) conducted this project to review the continuing development of the Internet and the Information Superhighway. Emphasis was placed on an examination of the impact on commuting and working patterns in California, and an analysis of how public transportation agencies, including Caltrans, might take advantage of the new communications technologies. The document reviews the technology underlying the current Internet “structure” and examines anticipated developments. It is important to note that much of the research for this limited-scope project was conducted during 1995, and the topic is so rapidly evolving that some information is almost automatically “dated.” The report also examines how transportation agencies are basically similar in structure and function to other business entities, and how they can continue to utilize the emerging technologies to improve internal and external communications. As part of a detailed discussion of specific transportation agency functions, it is noted that the concept of a “Roundtable Forum,” growing out of developments in Concurrent Engineering, can provide an opportunity for representatives from multiple jurisdictions to utilize the Internet for more coordinated decision-making. The report also includes an extensive analysis of demographic trends in California in recent years, such as commute and recreational activities, and identifies how the emerging technologies may impact future changes.

    Special Libraries, Spring 1995

    Volume 86, Issue 2 https://scholarworks.sjsu.edu/sla_sl_1995/1001/thumbnail.jp

    Special Libraries, April 1956

    Volume 47, Issue 4 https://scholarworks.sjsu.edu/sla_sl_1956/1003/thumbnail.jp