8,674 research outputs found

    Main Memory Adaptive Indexing for Multi-core Systems

    Full text link
    Adaptive indexing is a concept that treats index creation in databases as a by-product of query processing, as opposed to traditional full index creation, where the indexing effort is performed up front, before any queries are answered. Adaptive indexing has received considerable attention, and several algorithms have been proposed over the past few years, including a recent experimental study comparing a large number of existing methods. Until now, however, most adaptive indexing algorithms have been designed as single-threaded; with multi-core systems well established, designing parallel algorithms for adaptive indexing is a natural next step. So far, only one parallel algorithm for adaptive indexing has appeared in the literature: the parallel version of standard cracking. In this paper we describe three alternative parallel algorithms for adaptive indexing, including a second variant of a parallel standard cracking algorithm. Additionally, we describe a hybrid parallel sorting algorithm and a NUMA-aware method based on sorting. We then thoroughly compare all these algorithms experimentally, along with a variant of a recently published parallel version of radix sort; parallel sorting algorithms serve as a realistic baseline for multi-threaded adaptive indexing techniques. In total we experimentally compare seven parallel algorithms, and we extensively profile all of them. The initial set of experiments considered in this paper indicates that our parallel algorithms significantly improve over previously known ones. Our results suggest that, although adaptive indexing algorithms are a good design choice in single-threaded environments, the rules change considerably in the parallel case: in future highly parallel environments, sorting algorithms could be serious alternatives to adaptive indexing. Comment: 26 pages, 7 figures
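
    As a rough illustration of the standard cracking idea that this line of work builds on, here is a minimal single-threaded sketch in Python: each range query partitions the column in place around the query bounds and records the resulting pivots, so the index emerges as a by-product of querying. The class and function names are our own, not the paper's.

```python
import bisect

def crack_in_two(col, lo_idx, hi_idx, pivot):
    """Partition col[lo_idx:hi_idx] in place so values < pivot
    precede values >= pivot; return the split position."""
    i, j = lo_idx, hi_idx - 1
    while i <= j:
        if col[i] < pivot:
            i += 1
        else:
            col[i], col[j] = col[j], col[i]
            j -= 1
    return i

class CrackerColumn:
    """Toy single-threaded standard cracking: each range query
    refines the physical order of the column as a side effect."""
    def __init__(self, values):
        self.col = list(values)
        self.pivots = []      # sorted pivot values seen so far
        self.positions = []   # split position for each pivot

    def _bounds(self, value):
        # Narrow the work to the one piece that may contain `value`.
        k = bisect.bisect_left(self.pivots, value)
        lo = self.positions[k - 1] if k > 0 else 0
        hi = self.positions[k] if k < len(self.positions) else len(self.col)
        return lo, hi, k

    def _crack(self, value):
        lo, hi, k = self._bounds(value)
        pos = crack_in_two(self.col, lo, hi, value)
        self.pivots.insert(k, value)
        self.positions.insert(k, pos)
        return pos

    def range_query(self, low, high):
        a = self._crack(low)   # first index with value >= low
        b = self._crack(high)  # first index with value >= high
        return self.col[a:b]

col = CrackerColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
print(sorted(col.range_query(4, 13)))  # -> [4, 7, 9, 12]
```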

    An adaptive hierarchical domain decomposition method for parallel contact dynamics simulations of granular materials

    Full text link
    A fully parallel version of the contact dynamics (CD) method is presented in this paper. For large enough systems, 100% efficiency has been demonstrated for up to 256 processors using a hierarchical domain decomposition with dynamic load balancing. The iterative scheme that calculates the contact forces is kept domain-wise sequential, with data exchange after each iteration step, which ensures its stability. The number of additional iterations that the partially parallel updates at the domain boundaries require for convergence becomes negligible as the number of particles increases, which allows for an effective parallelization. Compared to the sequential implementation, we found no influence of the parallelization on the simulation results. Comment: 19 pages, 15 figures, published in Journal of Computational Physics (2011)
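
    The control flow of such a scheme, domain-wise sequential sweeps with boundary data exchanged once per global iteration, can be emulated serially. The toy below relaxes values on a 1-D chain in place of the actual contact-force iteration, purely to illustrate the update/exchange structure; the relaxation rule and all names are our own assumptions, not the paper's method.

```python
def parallel_style_smooth(values, domains, tol=1e-10, max_iter=10000):
    """values: list of floats; domains: list of (start, end) index ranges.
    Within each domain the sweep is sequential (Gauss-Seidel-like);
    across domain boundaries only the snapshot from the last 'exchange'
    is visible, mimicking per-iteration data exchange."""
    for it in range(max_iter):
        snapshot = values[:]              # boundary data as of last exchange
        delta = 0.0
        for start, end in domains:        # one process per domain in parallel
            for i in range(max(start, 1), min(end, len(values) - 1)):
                left = values[i - 1] if i - 1 >= start else snapshot[i - 1]
                right = values[i + 1] if i + 1 < end else snapshot[i + 1]
                new = 0.5 * (left + right)
                delta = max(delta, abs(new - values[i]))
                values[i] = new
            # in the parallel code, boundary data would be exchanged here
        if delta < tol:
            return it
    return max_iter

vals = [0.0] + [1.0] * 8 + [0.0]          # fixed ends, relaxed interior
iters = parallel_style_smooth(vals, domains=[(0, 5), (5, 10)])
print(iters, [round(v, 3) for v in vals])
```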

    Complexity vs. performance in granular embedding spaces for graph classification

    Get PDF
    The most distinctive trait of structural pattern recognition in the graph domain is the ability to deal with the organization of, and relations between, the constituent entities of a pattern. Even though this can be convenient and/or necessary in many contexts, most state-of-the-art classification techniques cannot be deployed directly in the graph domain without first embedding graph patterns into a metric space. Granular Computing is a powerful information processing paradigm that can be employed to drive the synthesis of automatic embedding spaces from structured domains. In this paper we investigate several classification techniques built on Granular Computing-based embedding procedures and provide a thorough overview in terms of model complexity, embedding-space complexity, and performance on several open-access datasets for graph classification. We observe that certain classification techniques, such as non-linear SVMs, perform poorly in terms of both complexity and learning performance, suggesting that the high dimensionality of the synthesized embedding space can negatively affect the effectiveness of these approaches. On the other hand, linear support vector machines, neuro-fuzzy networks, and nearest-neighbour classifiers achieve comparable accuracy, with the second being the most competitive in terms of structural complexity and the last in terms of embedding-space dimensionality.
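
    To make the embedding step concrete, here is a minimal sketch of a symbolic-histogram-style embedding: each graph becomes a vector of occurrence counts over a shared alphabet of granules, after which any standard classifier applies. Using labelled edges as the granules is our simplifying assumption; the granules in this line of work are generally richer subgraphs.

```python
from collections import Counter

def edge_granules(graph):
    """graph: iterable of (label_u, label_v) labelled edges."""
    return Counter(tuple(sorted(e)) for e in graph)

def embed(graphs):
    """Build a shared granule alphabet, then one count vector per graph."""
    alphabet = sorted({g for G in graphs for g in edge_granules(G)})
    index = {g: i for i, g in enumerate(alphabet)}
    vectors = []
    for G in graphs:
        v = [0] * len(alphabet)
        for g, c in edge_granules(G).items():
            v[index[g]] = c
        vectors.append(v)
    return vectors, alphabet

# Toy usage: two tiny labelled graphs embedded into a common space.
G1 = [("C", "H"), ("C", "H"), ("C", "O")]
G2 = [("C", "O"), ("O", "H")]
X, alphabet = embed([G1, G2])
print(alphabet)  # granule alphabet shared by all graphs
print(X)         # symbolic-histogram vectors, ready for a linear SVM
```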

    A granular approach to web search result presentation

    Get PDF
    In this paper we propose and evaluate interfaces for presenting the results of web searches. Sentences taken from the top retrieved documents are used as fine-grained representations of document content and, when combined in a ranked list, provide a query-specific overview of the set of retrieved documents. Current search engine interfaces assume users examine such results document by document; in contrast, our approach groups, ranks, and presents the contents of the top-ranked document set. We evaluate our hypotheses that such an approach can lead to more effective web searching and to increased user satisfaction. Our evaluation, with real users and different types of information-seeking scenarios, showed with statistical significance that these hypotheses hold.
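
    A hedged sketch of the sentence-based overview idea: sentences from the top-ranked documents are scored against the query and merged into one ranked list. The term-overlap scoring below is an illustrative assumption, not the paper's actual ranking function.

```python
import re

def sentences(text):
    """Naive sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def score(sentence, query_terms):
    """Fraction of a sentence's words that match the query."""
    words = set(re.findall(r"\w+", sentence.lower()))
    return len(words & query_terms) / (len(words) or 1)

def overview(query, top_documents, k=5):
    """Rank sentences from all top documents into one query-specific list."""
    terms = set(query.lower().split())
    scored = [(score(s, terms), s)
              for doc in top_documents
              for s in sentences(doc)]
    return [s for _, s in sorted(scored, reverse=True)[:k]]

docs = ["Granular computing is a paradigm. It builds granules.",
        "Search engines rank documents."]
print(overview("granular computing", docs, k=2))
```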

    Mistakes in medical ontologies: Where do they come from and how can they be detected?

    Get PDF
    We present the details of a methodology for quality assurance in large medical terminologies and describe three algorithms that can help terminology developers and users identify potential mistakes. The methodology is based partly on linguistic criteria and partly on logical and ontological principles governing sound classifications. We conclude by outlining the results of applying the methodology, in the form of a taxonomy of the different types of errors and potential errors detected in SNOMED-CT.
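
    As an illustration of the flavour of such checks (our own guesses, not a reproduction of the paper's algorithms), the sketch below flags term names containing conjunctions, which in SNOMED-CT-style terminologies often signal a single term encoding several concepts, and detects cycles in an is-a hierarchy, which a sound classification must not contain.

```python
def flag_conjunctive_terms(terms):
    """Linguistic check: term names containing 'and'/'or' are suspicious."""
    suspicious = (" and ", " or ", " and/or ")
    return [t for t in terms
            if any(tok in f" {t.lower()} " for tok in suspicious)]

def find_isa_cycles(parents):
    """Structural check. parents: dict term -> set of is-a parents;
    flags any term reachable from itself along the hierarchy."""
    def reachable(t, seen):
        for p in parents.get(t, ()):
            if p in seen or reachable(p, seen | {p}):
                return True
        return False
    return [t for t in parents if reachable(t, {t})]

print(flag_conjunctive_terms(["Disorder of lung and pleura", "Asthma"]))
print(find_isa_cycles({"A": {"B"}, "B": {"A"}}))  # -> ['A', 'B']
```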

    MorphoSys: efficient colocation of QoS-constrained workloads in the cloud

    Full text link
    In hosting environments such as IaaS clouds, desirable application performance is usually guaranteed through the use of Service Level Agreements (SLAs), which specify minimal fractions of resource capacities that must be allocated for unencumbered use for proper operation. Arbitrary colocation of applications with different SLAs on a single host may result in inefficient utilization of the host's resources. In this paper, we propose that periodic resource allocation and consumption models -- often used to characterize real-time workloads -- be used for a more granular expression of SLAs. Our proposed SLA model has the salient feature that it exposes flexibilities that enable the infrastructure provider to safely transform SLAs from one form to another for the purpose of achieving more efficient colocation. Towards that goal, we present MORPHOSYS: a framework for a service that allows the manipulation of SLAs to enable efficient colocation of arbitrary workloads in a dynamic setting. We present results from extensive trace-driven simulations of colocated Video-on-Demand servers in a cloud setting. These results show that potentially significant reductions in wasted resources (by as much as 60%) are possible using MORPHOSYS. National Science Foundation (0720604, 0735974, 0820138, 0952145, 1012798)
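
    To make the periodic model concrete: an SLA (C, T) reserves C capacity units in every window of T time units, i.e. a utilization of C/T. The colocation test below uses the simple sum-of-utilizations bound purely for illustration; MORPHOSYS's actual safety conditions and SLA transformations are richer than this.

```python
from fractions import Fraction

def utilization(sla):
    """sla: (C, T) -> fraction of the host this workload reserves."""
    c, t = sla
    return Fraction(c, t)

def fits_on_host(slas, capacity=1):
    """True if the workloads' combined utilization fits the host."""
    return sum(map(utilization, slas)) <= capacity

# Toy example: (1, 4) means 1 unit of capacity in every 4-unit window.
workloads = [(1, 4), (2, 5), (1, 3)]
print(fits_on_host(workloads))  # 1/4 + 2/5 + 1/3 = 59/60 <= 1 -> True
```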

    Security and Privacy Issues of Big Data

    Get PDF
    This chapter reviews the most important aspects of how computing infrastructures should be configured and intelligently managed to fulfill the most notable security requirements of Big Data applications. One of these is privacy: a pertinent concern, because users share ever more personal data and content from their devices and computers to social networks and public clouds, which makes security frameworks for social networks a very active research topic. This topic is addressed, with case studies, in one of the two main sections of the chapter. In addition, traditional security mechanisms such as firewalls and demilitarized zones are not suitable for computing systems that support Big Data. SDN is an emergent management solution that could become a convenient mechanism for implementing security in Big Data systems, as we show through a second case study at the end of the chapter. The chapter also discusses relevant current work and identifies open issues. Comment: In book Handbook of Research on Trends and Future Directions in Big Data and Web Intelligence, IGI Global, 201

    On Information Granulation via Data Filtering for Granular Computing-Based Pattern Recognition: A Graph Embedding Case Study

    Get PDF
    Granular Computing is a powerful information processing paradigm, particularly useful for the synthesis of pattern recognition systems in structured domains (e.g., graphs or sequences). According to this paradigm, granules of information play the pivotal role of describing the underlying (possibly complex) process, starting from the available data. From a pattern recognition viewpoint, granules of information can be exploited for the synthesis of semantically sound embedding spaces, in which common supervised or unsupervised problems can be solved via standard machine learning algorithms. In this companion paper, we follow our previous work (Martino et al. in Algorithms 15(5):148, 2022) in comparing different strategies for the automatic synthesis of information granules in the context of graph classification. These strategies differ mainly in the specific topology adopted for the subgraphs considered as candidate information granules and in whether the ground-truth class labels are used or neglected during the granulation process. In contrast to our previous work, we employ a filtering-based approach for the synthesis of information granules instead of a clustering-based one. Computational results on 6 open-access data sets corroborate the robustness of our filtering-based approach with respect to data stratification, compared to a clustering-based granulation stage.
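
    As a concrete, hedged sketch of a filtering-based granulation stage (as opposed to a clustering-based one): candidate granules can be kept or discarded by a simple frequency filter over the training graphs. The thresholds and the notion of "candidate" below are illustrative assumptions, not the paper's actual procedure.

```python
from collections import Counter

def filter_granules(candidate_lists, min_support=0.1, max_support=0.9):
    """candidate_lists: one list of candidate granules per training graph.
    Keeps granules that are neither too rare nor trivially ubiquitous."""
    n = len(candidate_lists)
    df = Counter()
    for cands in candidate_lists:
        df.update(set(cands))  # document frequency: one count per graph
    return {g for g, c in df.items()
            if min_support <= c / n <= max_support}

# Toy usage: 'ring' appears in 3/4 graphs and is filtered out as ubiquitous.
train = [["ring", "chain"], ["ring"], ["ring", "star"], ["chain"]]
print(filter_granules(train, 0.25, 0.6))  # -> {'chain', 'star'}
```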

    Analyzing Methods and Opportunities in Software-Defined (SDN) Networks for Data Traffic Optimizations

    Get PDF
    Computer networks are dynamic and require constant updating and monitoring of operations to cope with the growing volume of data traffic. This raises a number of cost issues, as well as performance-management and tuning challenges: delivering granular quality of service (QoS), balancing the data load, and controlling the occurrence of bottlenecks. As an alternative, a new programmable network paradigm has emerged under the name of Software-Defined Networking (SDN). SDN decouples the data plane from the control of the network: a programmable controller is responsible for managing the rules that route data to the various devices, while the hardware that remains in the network data path simply forwards packets quickly according to these rules. In this context, this article studies the different methods and approaches used in the literature to optimize network data traffic through SDN. In particular, this study differs from other SDN reviews in that it focuses on issues such as QoS, load balancing, and congestion control. Finally, in addition to reviewing the state of the art of SDN in these areas, it presents a survey of future challenges and research opportunities.
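
    The control/data-plane split can be illustrated with a toy match-action table: a "controller" installs prioritized rules, and the "switch" merely looks packets up against them. The field names mimic OpenFlow-style matches, but the classes below are illustrative, not a real SDN API.

```python
class Switch:
    """Data plane: a dumb, fast match-action lookup table."""
    def __init__(self):
        self.flow_table = []  # (priority, match_dict, action) rules

    def install_rule(self, priority, match, action):
        # In a real deployment the controller pushes these over the network.
        self.flow_table.append((priority, match, action))
        self.flow_table.sort(key=lambda r: -r[0])  # highest priority first

    def forward(self, packet):
        for _, match, action in self.flow_table:
            if all(packet.get(k) == v for k, v in match.items()):
                return action
        return "send_to_controller"  # table miss: ask the control plane

# The controller pushes a QoS-style rule: video traffic takes a fast path.
sw = Switch()
sw.install_rule(10, {"dst_port": 554}, "output:fast_path")
sw.install_rule(1, {}, "output:default")
print(sw.forward({"dst_port": 554}))  # -> output:fast_path
print(sw.forward({"dst_port": 80}))   # -> output:default
```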