    Efficient similarity computations on parallel machines using data shaping

    Similarity computation is a fundamental operation on all forms of data. Big Data is typically characterized by attributes such as volume, velocity, variety, and veracity. In general, Big Data variety appears in structured, semi-structured, or unstructured forms. The volume of Big Data in general, and of semi-structured data in particular, is increasing at a phenomenal rate. The Big Data phenomenon poses a new set of challenges to similarity computation problems occurring in semi-structured data. Technology and processor architecture trends strongly suggest that future processors will have tens of thousands of cores (hardware threads). Another crucial trend is that the ratio of on-chip and off-chip memory to core count is decreasing. State-of-the-art parallel computing platforms such as General Purpose Graphics Processors (GPUs) and MICs are promising for high-performance as well as high-throughput computing. However, processing the semi-structured component of Big Data efficiently on parallel computing systems (e.g. GPUs) is challenging. The reason is that most of the emerging platforms (e.g. GPUs) are organized as Single Instruction Multiple Thread/Data machines, which are highly structured, with several cores (streaming processors) operating in lock-step, or they require a high degree of task-level parallelism. We argue that effective and efficient solutions to key similarity computation problems need to operate in a synergistic manner with the underlying computing hardware. Moreover, semi-structured input data needs to be shaped or reorganized in order to exploit the enormous computing power of state-of-the-art highly threaded architectures such as GPUs. For example, shaping input data (via encoding) with minimal data dependence can facilitate flexible and concurrent computations on high-throughput accelerators/co-processors such as GPUs and MICs.
We consider various instances of traditional and futuristic problems occurring at the intersection of semi-structured data and data analytics. Preprocessing is an operation common at the initial stages of data processing pipelines. Typically, preprocessing involves operations such as data extraction and data selection. In the context of semi-structured data, twig filtering is used to identify (and extract) data of interest. Duplicate detection and record linkage operations are useful in preprocessing tasks such as data cleaning and data fusion, and also in data mining, in order to find similar tree objects. Likewise, tree edit distance is a fundamental metric in the context of tree problems, and similarity computation between trees is another key problem in the context of Big Data. This dissertation makes a case for platform-centric data shaping as a potent mechanism to tackle the data- and architecture-borne issues in semi-structured data processing on GPUs and GPU-like parallel architectures. In this dissertation, we propose several data shaping techniques for tree matching problems occurring in semi-structured data. We experiment with real-world datasets. The experimental results reveal that the proposed platform-centric data shaping approach is effective for computing similarities between tree objects using GPGPUs. The proposed techniques result in performance gains of up to three orders of magnitude, depending on the problem and platform.

    Factors shaping the evolution of electronic documentation systems

    The main goal is to prepare the space station technical and managerial structure for likely changes in the creation, capture, transfer, and utilization of knowledge. By anticipating advances, the design of Space Station Project (SSP) information systems can be tailored to facilitate a progression of increasingly sophisticated strategies as the space station evolves. Future generations of advanced information systems will use increases in power to deliver environmentally meaningful, contextually targeted, interconnected data (knowledge). The concept of a Knowledge Base Management System emerges when the problem is focused on how information systems can perform such a conversion of raw data. Such a system would include traditional management functions for large space databases. Added artificial intelligence features might encompass co-existing knowledge representation schemes; effective control structures for deductive, plausible, and inductive reasoning; means for knowledge acquisition, refinement, and validation; explanation facilities; and dynamic human intervention. The major areas covered include: alternative knowledge representation approaches; advanced user interface capabilities; computer-supported cooperative work; the evolution of information system hardware; standardization, compatibility, and connectivity; and organizational impacts of information-intensive environments.

    COSPO/CENDI Industry Day Conference

    The conference's objective was to provide a forum where government information managers and industry information technology experts could have an open exchange and discuss their respective needs and compare them to the available, or soon to be available, solutions. Technical summaries and points of contact are provided for the following sessions: secure products, protocols, and encryption; information providers; electronic document management and publishing; information indexing, discovery, and retrieval (IIDR); automated language translators; IIDR - natural language capabilities; IIDR - advanced technologies; IIDR - distributed heterogeneous and large database support; and communications - speed, bandwidth, and wireless

    Massive Spread of OXA-48 Carbapenemase-Producing Enterobacteriaceae in the Environment of a Swiss Companion Animal Clinic

    Background: Companion animal clinics contribute to the spread of antimicrobial-resistant microorganisms (ARM), and outbreaks with ARM of public health concern have been described. Methods: As part of a project to assess infection prevention and control (IPC) standards in companion animal clinics in Switzerland, a total of 200 swabs from surfaces and 20 hand swabs from employees were collected during four days in a medium-sized clinic and analyzed for extended-spectrum beta-lactamase-producing Enterobacteriaceae (ESBL-E), carbapenemase-producing Enterobacteriaceae (CPE), vancomycin-resistant enterococci (VRE), and methicillin-resistant staphylococci (MRS). Results: A total of 22 (11.0%) environmental specimens yielded CPE, 14 (7.0%) ESBL-E, and 7 (3.5%) MRS; methicillin-resistant Staphylococcus aureus was isolated from two (10.0%) hand swabs. The CPE isolates comprised Escherichia coli, Klebsiella pneumoniae, Enterobacter hormaechei, Citrobacter braakii, and Serratia marcescens. Whole genome sequencing revealed that all CPE carried closely related blaOXA-48 plasmids, suggesting a plasmidic spread within the clinic. The clinic exhibited major deficits in surface disinfection, hand hygiene infrastructure, and hand hygiene compliance. CPE were present in various areas, including those without patient contact. The study documented plasmidic dissemination of blaOXA-48 in a companion animal clinic with low IPC standards. This poses a worrisome threat to public health and highlights the need to foster IPC standards in veterinary clinics to prevent the spread of ARM into the community.

    Object model of a selective dissemination of information system

    The author examines the object model of a selective dissemination of information (SDI) system, which formed the basis for developing a prototype user-alerting system at the Library for Natural Sciences of the Russian Academy of Sciences (RAS). SDI system users are offered two types of service: alerts on a list of scientific journals defined by the user, and alerts on thematic queries built from keywords that the user supplies. The system differs in an important way from traditional library systems of this kind: neither the list of journals supplied by the user nor the list of primary sources processed for thematic queries is limited to the resources to which the library has full-text access by subscription. Users receive personal alerts by e-mail; these include the usual bibliographic description fields and a number of fields that extend the navigation functions of the alert. The object model and the SDI system prototype based on it have been in operation at the Library for Natural Sciences of the RAS since 2016. Their successful operation makes it possible to recommend this object model as a standard model for SDI systems.

    Advanced space system concepts and their orbital support needs (1980 - 2000). Volume 1: Executive summary

    The likely system concepts which might be representative of NASA and DoD space programs in the 1980-2000 time period were studied, along with the programs' likely needs for major space transportation vehicles, orbital support vehicles, and technology developments which could be shared by the military and civilian space establishments in that time period. Such needs could then be used by NASA as an input in determining the nature of its long-range development plan. The approach used was to develop a list of possible space system concepts (initiatives) in parallel with a list of needs based on consideration of the likely environments and goals of the future. The two lists thus obtained represented, respectively, what could be done regardless of need and what should be done regardless of capability. A set of development program plans for space application concepts was then assembled, matching needs against capabilities, and the requirements of the space concepts for support vehicles, transportation, and technology were extracted. The process was pursued in parallel for likely military and civilian programs, and the common support needs were thus identified.

    Recursive n-gram hashing is pairwise independent, at best

    Many applications use sequences of n consecutive symbols (n-grams). Hashing these n-grams can be a performance bottleneck. For more speed, recursive hash families compute hash values by updating previous values. We prove that recursive hash families cannot be more than pairwise independent. While hashing by irreducible polynomials is pairwise independent, our implementations either run in time O(n) or use an exponential amount of memory. As a more scalable alternative, we make hashing by cyclic polynomials pairwise independent by ignoring n-1 bits. Experimentally, we show that hashing by cyclic polynomials is twice as fast as hashing by irreducible polynomials. We also show that randomized Karp-Rabin hash families are not pairwise independent. Comment: See software at https://github.com/lemire/rollinghashcp
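The recursive update described in the abstract above can be illustrated with a minimal Python sketch of hashing by cyclic polynomials. This is not the authors' implementation (their software is at the URL given in the abstract). The names `rotl`, `make_table`, `direct_hash`, `rolling_hashes`, and `drop_high_bits` are hypothetical, and so are the 32-bit word size, the byte alphabet, and the choice to discard the n-1 high-order bits.

```python
import random

WORD_BITS = 32                      # assumed hash width
MASK = (1 << WORD_BITS) - 1


def rotl(x, r):
    """Rotate a WORD_BITS-wide integer left by r positions."""
    r %= WORD_BITS
    return ((x << r) | (x >> (WORD_BITS - r))) & MASK


def make_table(seed=0, alphabet_size=256):
    """Map each symbol to a random WORD_BITS-bit value (assumed byte alphabet)."""
    rng = random.Random(seed)
    return [rng.getrandbits(WORD_BITS) for _ in range(alphabet_size)]


def direct_hash(gram, table):
    """Hash one n-gram from scratch in O(n): rotate, then XOR in each symbol."""
    h = 0
    for symbol in gram:
        h = rotl(h, 1) ^ table[symbol]
    return h


def rolling_hashes(data, n, table):
    """Yield the hash of every n-gram of data, updating in O(1) per step."""
    if len(data) < n:
        return
    h = direct_hash(data[:n], table)
    yield h
    for i in range(n, len(data)):
        outgoing = table[data[i - n]]
        incoming = table[data[i]]
        # Rotate the old hash, cancel the outgoing symbol (now rotated n
        # times in total), and XOR in the incoming symbol.
        h = rotl(h, 1) ^ rotl(outgoing, n) ^ incoming
        yield h


def drop_high_bits(h, n):
    """Ignore n-1 bits of the hash; dropping the high-order ones is an assumption."""
    return h & ((1 << (WORD_BITS - n + 1)) - 1)
```

For example, `list(rolling_hashes(b"the quick brown fox", 4, make_table()))` produces one hash per 4-gram, and each value matches what `direct_hash` computes for the same window from scratch.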