24 research outputs found

    A parallel framework for in-memory construction of term-partitioned inverted indexes

    With the advances in cloud computing and the huge RAMs provided by 64-bit architectures, it is possible to tackle large problems using memory-based solutions. Construction of term-based, partitioned, parallel inverted indexes is a communication-intensive task and is suitable for memory-based modeling. In this paper, we provide an efficient parallel framework for in-memory construction of term-based, partitioned inverted indexes. We show that, by utilizing an efficient bucketing scheme, we can eliminate the need to generate a global vocabulary. We propose and investigate assignment schemes that can reduce communication overheads while minimizing storage and final query-processing imbalance. We also study how communication among processors should be carried out with limited communication memory in order to reduce the total inversion time. We present several communication-memory organizations and discuss their advantages and shortcomings. The conducted experiments indicate promising results. © 2012 The Author. Published by Oxford University Press on behalf of The British Computer Society.
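    The bucketing and assignment ideas in this abstract can be illustrated with a minimal Python sketch. It assumes, for illustration only, that terms are mapped to buckets by a deterministic hash (so no processor needs a global vocabulary), that buckets are assigned to processors with a greedy largest-first heuristic on posting counts, and that each processor then routes its postings to the owner of each term's bucket. The names `bucket_of`, `local_bucket_loads`, `assign_buckets`, and `route_postings` and the value of `NUM_BUCKETS` are hypothetical; the paper's actual bucketing, assignment, and communication-memory schemes may differ.

```python
import hashlib
import heapq
from collections import defaultdict

NUM_BUCKETS = 8  # hypothetical; in practice many more buckets than processors


def bucket_of(term: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Deterministic hash, so every processor maps a term to the same bucket
    without building a global vocabulary."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def local_bucket_loads(docs: dict[int, list[str]]) -> dict[int, int]:
    """Count (term, document) postings per bucket on one processor's documents."""
    loads = defaultdict(int)
    for terms in docs.values():
        for term in set(terms):
            loads[bucket_of(term)] += 1
    return dict(loads)


def assign_buckets(total_loads: dict[int, int], num_procs: int) -> dict[int, int]:
    """Greedy largest-first: give each bucket to the currently least-loaded processor."""
    heap = [(0, proc) for proc in range(num_procs)]
    heapq.heapify(heap)
    owner = {}
    for bucket, load in sorted(total_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        proc_load, proc = heapq.heappop(heap)
        owner[bucket] = proc
        heapq.heappush(heap, (proc_load + load, proc))
    return owner


def route_postings(docs: dict[int, list[str]], owner: dict[int, int]):
    """Group one processor's postings by destination processor (the bucket owner)."""
    outbox = defaultdict(lambda: defaultdict(list))
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id])):
            outbox[owner[bucket_of(term)]][term].append(doc_id)
    return outbox
```

    In this sketch, each processor would compute `local_bucket_loads` on its own documents, the per-bucket counts would be summed across processors (an exchange over a fixed number of buckets rather than over all terms), and every processor would then call `route_postings` with the same `owner` map.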

    Memory resident parallel inverted index construction

    Advances in cloud computing, 64-bit architectures, and huge RAMs enable many search-related tasks to be performed in memory. We argue that term-based partitioned parallel inverted index construction is among such tasks, and we provide an efficient parallel framework that achieves this task. We show that by utilizing an efficient bucketing scheme we can eliminate the need to generate a global index and reduce the communication overhead without violating the balancing constraint. We also propose and investigate assignment schemes that can further reduce communication overheads without violating balancing constraints. The conducted experiments indicate promising results. © 2012 Springer-Verlag London Limited

    A machine learning approach for result caching in web search engines

    A commonly used technique for improving search engine performance is result caching. In result caching, precomputed results (e.g., URLs and snippets of best-matching pages) of certain queries are stored in fast-access storage. Future occurrences of a query whose results are already cached can be served directly from the result cache, eliminating the need to process the query with costly computing resources. Although other performance metrics are possible, the main metric for evaluating the success of a result cache is its hit rate. In this work, we present a machine learning approach that improves the hit rate of a result cache by utilizing a large number of features extracted from search engine query logs. We then apply the proposed approach to static, dynamic, and static-dynamic caching. Compared to previous methods in the literature, the proposed approach improves the hit rate of the result cache by up to 0.66%, which corresponds to 9.60% of the potential room for improvement. © 2017 Elsevier Ltd.
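    The static-dynamic caching setting and the hit-rate metric in this abstract can be sketched as a small simulation. It assumes, for illustration, that the static part holds the most frequent queries of a training log, that the dynamic part is an LRU cache, and that an `admit` predicate decides which misses enter the dynamic part. The function name and the `admit` hook are hypothetical; the hook only marks where a learned model could plug in, and the paper's features and model are not reproduced here.

```python
from collections import Counter, OrderedDict
from typing import Callable, Iterable


def static_dynamic_hit_rate(
    train_log: list[str],
    test_log: Iterable[str],
    capacity: int,
    static_fraction: float = 0.5,
    admit: Callable[[str], bool] = lambda query: True,
) -> float:
    """Simulate a static-dynamic result cache and return its hit rate on test_log."""
    static_size = int(capacity * static_fraction)
    dynamic_size = capacity - static_size
    # Static part: the most frequent training queries (filled once, never evicted).
    static = {q for q, _ in Counter(train_log).most_common(static_size)}
    # Dynamic part: an LRU cache, least recently used entry first.
    dynamic = OrderedDict()
    hits = total = 0
    for query in test_log:
        total += 1
        if query in static:
            hits += 1
        elif query in dynamic:
            hits += 1
            dynamic.move_to_end(query)  # mark as most recently used
        elif dynamic_size > 0 and admit(query):
            dynamic[query] = None  # precomputed results would be stored here
            if len(dynamic) > dynamic_size:
                dynamic.popitem(last=False)  # evict the least recently used entry
    return hits / total if total else 0.0
```

    Setting `static_fraction` to 0.0 or 1.0 reduces the sketch to a purely dynamic or a purely static cache, respectively, which is one way to compare the three caching modes on the same log.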

    Criminal Information Mining

    In the previous chapters, different aspects of the authorship analysis problem were discussed. This chapter proposes a framework for extracting criminal information from the textual content of suspicious online messages. Archives of online messages, including chat logs, e-mails, web forums, and blogs, often contain an enormous amount of forensically relevant information about potential suspects and their illegitimate activities. Such information is usually found in either the header or the body of an online document. The IP addresses, hostnames, and sender and recipient addresses contained in the e-mail header, the user IDs used in chats, and the screen names used in web-based communication help reveal information at the user or application level. For instance, information extracted from a suspicious e-mail corpus helps us learn who the senders and recipients are, how often they communicate, and how many types of communities/cliques there are in a dataset. Such information also gives us insight into inter- and intra-community patterns of communication. A clique or community is a group of users who have online communication links among them. Header content, or user-level information, is easy to extract and straightforward to use for the purposes of investigation.
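    Extracting user-level information from e-mail headers and grouping users into cliques can be sketched with Python's standard `email` package plus a basic Bron-Kerbosch maximal-clique enumeration. This is an illustrative assumption rather than the chapter's framework: only the From, To, and Cc headers are used, links are treated as undirected, and `header_links` and `maximal_cliques` are hypothetical names.

```python
from collections import Counter, defaultdict
from email import message_from_string
from email.utils import getaddresses


def header_links(raw_messages: list[str]) -> Counter:
    """Count undirected sender-recipient links found in From/To/Cc headers."""
    links = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [a.lower() for _, a in getaddresses(msg.get_all("From", [])) if a]
        recipients = [
            a.lower()
            for _, a in getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
            if a
        ]
        for s in senders:
            for r in recipients:
                if s != r:
                    links[frozenset((s, r))] += 1
    return links


def maximal_cliques(links: Counter) -> list[set[str]]:
    """Enumerate maximal cliques of the communication graph (Bron-Kerbosch, no pivoting)."""
    adj = defaultdict(set)
    for pair in links:
        a, b = tuple(pair)
        adj[a].add(b)
        adj[b].add(a)
    cliques = []

    def expand(r: set[str], p: set[str], x: set[str]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is a maximal clique
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques
```

    The `Counter` values answer "how often do they communicate" for each pair, while the cliques give a first, coarse view of the communities in the dataset.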

    SE4SEE: A grid-enabled search engine for South-East Europe

    Search Engine for South-East Europe (SE4SEE) is an application project that aims to develop a grid-enabled search engine specifically targeting the countries of South-East Europe. It is one of the two regional applications selected for, and currently being implemented in, the SEE-GRID FP6 project. This paper describes the design details of SE4SEE and provides an architectural overview of the application.

    Authorship Analysis Approaches

    This chapter presents an overview of authorship analysis from multiple standpoints. It includes a historical perspective, a description of stylometric features, and authorship analysis techniques along with their limitations.

    Chat mining: Predicting user and message attributes in computer-mediated communication

    The focus of this paper is to investigate the possibility of predicting several user and message attributes in text-based, real-time, online messaging services. For this purpose, a large collection of chat messages is examined. The applicability of various supervised classification techniques for extracting information from the chat messages is evaluated. Two competing models are used to define the chat mining problem. A term-based approach is used to investigate the user and message attributes in the context of vocabulary use, while a style-based approach is used to examine the chat messages according to variations in the authors' writing styles. Among 100 authors, the identity of an author is correctly predicted with 99.7% accuracy. Moreover, the reverse problem is explored, and the effect of author attributes on computer-mediated communication is discussed. © 2008 Elsevier Ltd. All rights reserved.
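    The contrast between the two models can be illustrated with a minimal sketch: a term-based view built from word frequencies and a style-based view built from a few surface statistics. The specific style features, the emoticon pattern, and the nearest-profile (cosine similarity) author predictor are assumptions chosen for brevity; the paper evaluates various supervised classifiers, which are not reproduced here, and all function names are hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def term_features(msg: str) -> Counter:
    """Term-based view: lower-cased word frequencies (vocabulary use)."""
    return Counter(re.findall(r"[a-z']+", msg.lower()))


def style_features(msg: str) -> list[float]:
    """Style-based view: a few illustrative writing-style statistics."""
    words = msg.split()
    n_chars = len(msg) or 1
    return [
        float(len(words)),                                          # length in words
        sum(len(w) for w in words) / len(words) if words else 0.0,  # mean word length
        sum(ch in "!?.," for ch in msg) / n_chars,                  # punctuation rate
        sum(ch.isupper() for ch in msg) / n_chars,                  # upper-case rate
        float(len(re.findall(r"[:;]-?[()dp]", msg.lower()))),       # simple emoticon count
    ]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_author(msg: str, profiles: dict[str, Counter]) -> str:
    """Return the author whose term profile is most similar to the message."""
    query = term_features(msg)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))
```

    A style-based predictor would feed vectors from `style_features` into a classifier instead; keeping the two extractors separate makes it easy to compare the models on the same messages.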

    Effect of different modes of erbium:yttrium aluminum garnet laser on shear bond strength to dentin

    Objectives: The aim of this study was to evaluate the effect of different surface treatments on the shear bond strength (SBS) of resin composites to dentin using total-etch dentin bonding adhesives. Materials and Methods: Sixty extracted human molars were flattened to obtain dentin surfaces. The samples were divided into three groups (n = 20): Group I: 37% phosphoric acid + OptiBond FL + resin composite; Group II: erbium:yttrium aluminum garnet (Er:YAG) laser (medium short pulse [MSP] mode, 120 mJ/10 Hz) + OptiBond FL + resin composite; Group III: Er:YAG laser (quantum square pulse [QSP] mode, 120 mJ/10 Hz) + OptiBond FL + resin composite. After the specimens were prepared, the SBS test was performed at a crosshead speed of 0.5 mm/min. The fractured specimens were examined under a stereomicroscope to evaluate the fracture pattern. Statistical analyses were performed with one-way ANOVA and Tukey's honestly significant difference tests. One sample of treated dentin surface from each group was sputter-coated with gold, and scanning electron microscope (SEM) images were captured. Results: Acid etching showed significantly higher SBS than the other groups (P < 0.05). However, the difference between the Er:YAG MSP and QSP mode groups was not statistically significant (P > 0.05). SEM images of the acid-etched dentin surface showed open dentinal tubules with a regular surface, whereas the Er:YAG MSP mode-treated surface was irregular. The surface treated with Er:YAG QSP mode showed wide dentinal tubules with a clean, flat surface. Conclusion: Using different modes (MSP and QSP) of Er:YAG laser for dentin surface treatment before the application of total-etch adhesives is still not a sufficient alternative to acid etching. Keywords: Acid etching, dentin conditioning, erbium:yttrium aluminum garnet laser, quantum square pulse mode

    Architecture of a grid-enabled Web search engine

    Search Engine for South-East Europe (SE4SEE) is a socio-cultural search engine running on the grid infrastructure. It offers a personalized, on-demand, country-specific, category-based Web search facility. The main goal of SE4SEE is to tackle the page freshness problem by performing the search on the original pages residing on the Web, rather than on previously fetched copies as traditional search engines do. SE4SEE also aims to obtain high download rates in Web crawling by making use of the geographically distributed nature of the grid. In this work, we present the architectural design issues and implementation details of this search engine. We conduct various experiments to illustrate the performance obtained on a grid infrastructure and to justify the search strategy employed in SE4SEE. © 2006 Elsevier Ltd. All rights reserved.