24,302 research outputs found

    Towards information profiling: data lake content metadata management

    Get PDF
    There is currently a burst of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These BD require new techniques of data integration and schema alignment in order to make the data usable by its consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe their content. However, there is currently a lack of a systematic approach for such kind of metadata discovery and management. Thus, we propose a framework for the profiling of informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to effectively handle this.We demonstrate the alternative techniques and performance of our process using a prototype implementation handling a real-life case-study from the OpenML DL, which showcases the value and feasibility of our approach.Peer ReviewedPostprint (author's final draft

    Tuning the Level of Concurrency in Software Transactional Memory: An Overview of Recent Analytical, Machine Learning and Mixed Approaches

    Get PDF
    Synchronization transparency offered by Software Transactional Memory (STM) must not come at the expense of run-time efficiency, thus demanding from the STM-designer the inclusion of mechanisms properly oriented to performance and other quality indexes. Particularly, one core issue to cope with in STM is related to exploiting parallelism while also avoiding thrashing phenomena due to excessive transaction rollbacks, caused by excessively high levels of contention on logical resources, namely concurrently accessed data portions. A means to address run-time efficiency consists in dynamically determining the best-suited level of concurrency (number of threads) to be employed for running the application (or specific application phases) on top of the STM layer. For too low levels of concurrency, parallelism can be hampered. Conversely, over-dimensioning the concurrency level may give rise to the aforementioned thrashing phenomena caused by excessive data contention—an aspect which has reflections also on the side of reduced energy-efficiency. In this chapter we overview a set of recent techniques aimed at building “application-specific” performance models that can be exploited to dynamically tune the level of concurrency to the best-suited value. Although they share some base concepts while modeling the system performance vs the degree of concurrency, these techniques rely on disparate methods, such as machine learning or analytic methods (or combinations of the two), and achieve different tradeoffs in terms of the relation between the precision of the performance model and the latency for model instantiation. Implications of the different tradeoffs in real-life scenarios are also discussed

    TMbarrier: speculative barriers using hardware transactional memory

    Get PDF
    Barrier is a very common synchronization method used in parallel programming. Barriers are used typically to enforce a partial thread execution order, since there may be dependences between code sections before and after the barrier. This work proposes TMbarrier, a new design of a barrier intended to be used in transactional applications. TMbarrier allows threads to continue executing speculatively after the barrier assuming that there are not dependences with safe threads that have not yet reached the barrier. Our design leverages transactional memory (TM) (specifically, the implementation offered by the IBM POWER8 processor) to hold the speculative updates and to detect possible conflicts between speculative and safe threads. Despite the limitations of the best-effort hardware TM implementation present in current processors, experiments show a reduction in wasted time due to synchronization compared to standard barriers.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Self-* overload control for distributed web systems

    Full text link
    Unexpected increases in demand and most of all flash crowds are considered the bane of every web application as they may cause intolerable delays or even service unavailability. Proper quality of service policies must guarantee rapid reactivity and responsiveness even in such critical situations. Previous solutions fail to meet common performance requirements when the system has to face sudden and unpredictable surges of traffic. Indeed they often rely on a proper setting of key parameters which requires laborious manual tuning, preventing a fast adaptation of the control policies. We contribute an original Self-* Overload Control (SOC) policy. This allows the system to self-configure a dynamic constraint on the rate of admitted sessions in order to respect service level agreements and maximize the resource utilization at the same time. Our policy does not require any prior information on the incoming traffic or manual configuration of key parameters. We ran extensive simulations under a wide range of operating conditions, showing that SOC rapidly adapts to time varying traffic and self-optimizes the resource utilization. It admits as many new sessions as possible in observance of the agreements, even under intense workload variations. We compared our algorithm to previously proposed approaches highlighting a more stable behavior and a better performance.Comment: The full version of this paper, titled "Self-* through self-learning: overload control for distributed web systems", has been published on Computer Networks, Elsevier. The simulator used for the evaluation of the proposed algorithm is available for download at the address: http://www.dsi.uniroma1.it/~novella/qos_web
    • …
    corecore