
    Adaptive query execution for data management in the cloud

    A major component of many cloud services is query processing on data stored in the underlying cloud cluster. The traditional techniques for query processing on a cluster are those offered by parallel DBMSs. These techniques, however, cannot guarantee high performance in the cloud: parallel DBMSs lack adequate fault-tolerance mechanisms to deal with non-negligible software and hardware failures. MapReduce, on the other hand, allows fault-tolerant query processing solutions but imposes substantial overheads. In this paper, we propose an adaptive software architecture that can effortlessly switch between MapReduce and parallel DBMS in order to efficiently process queries regardless of their response times. Switching between the two architectures is performed transparently, based on an intuitive cost model that computes the expected execution time in the presence of failures. The experimental results show that the adaptive architecture achieves the lowest possible query execution time in various scenarios.
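    To illustrate the kind of decision such a cost model makes, here is a minimal sketch, assuming a deliberately simple failure model that is not the one from the paper: failures arrive as a cluster-wide Poisson process, a parallel DBMS restarts the whole query on any failure, and MapReduce pays a fixed relative overhead but re-runs only the affected task. The names `QueryEstimate`, `mr_overhead`, `task_time`, and `choose_engine` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class QueryEstimate:
    base_time: float    # failure-free runtime on the parallel DBMS, seconds
    mr_overhead: float  # hypothetical relative MapReduce slowdown (0.5 = 50% slower)
    task_time: float    # hypothetical duration of one MapReduce task, seconds


def expected_time_dbms(q: QueryEstimate, failure_rate: float) -> float:
    """Expected runtime when any failure restarts the whole query.

    For Poisson failures of rate `failure_rate` (per second, cluster-wide) and
    instantaneous restarts, a job of length T completes in expectation after
    (exp(rate * T) - 1) / rate seconds.
    """
    if failure_rate == 0:
        return q.base_time
    return math.expm1(failure_rate * q.base_time) / failure_rate


def expected_time_mapreduce(q: QueryEstimate, failure_rate: float) -> float:
    """First-order estimate: each failure re-runs about half a task on average."""
    t = q.base_time * (1 + q.mr_overhead)
    expected_failures = failure_rate * t
    return t + expected_failures * q.task_time / 2


def choose_engine(q: QueryEstimate, failure_rate: float) -> str:
    """Pick the engine with the lower expected execution time."""
    dbms = expected_time_dbms(q, failure_rate)
    mr = expected_time_mapreduce(q, failure_rate)
    return "parallel_dbms" if dbms <= mr else "mapreduce"
```

    With `QueryEstimate(100.0, 0.5, 10.0)`, `choose_engine` returns `"parallel_dbms"` for a failure rate of 0 (100 s vs. 150 s) and `"mapreduce"` for a rate of 0.05 per second, where the restart-based estimate grows exponentially with query length while the task-level estimate grows only linearly.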

    Towards Predicting the Runtime of Iterative Analytics with PREDIcT

    Machine learning algorithms are widely used today for analytical tasks such as data cleaning, data categorization, or data filtering. At the same time, the rise of social media motivates the recent uptake of large-scale graph processing. Both categories of algorithms are dominated by iterative subtasks, i.e., processing steps that are executed repeatedly until a convergence condition is met. Optimizing cluster resource allocation among multiple workloads of iterative algorithms requires estimating their resource requirements and runtime, which in turn requires (i) predicting the number of iterations and (ii) predicting the processing time of each iteration. As both parameters depend on the characteristics of the dataset and on the convergence function, estimating their values before execution is difficult. This paper proposes PREDIcT, an experimental methodology for predicting the runtime of iterative algorithms. PREDIcT uses sample runs to capture the algorithm's convergence trend and per-iteration key input features that are well correlated with the actual processing requirements of the complete input dataset. Using this combination of characteristics, we predict the runtime of iterative algorithms, including algorithms with very different runtime patterns across subsequent iterations. Our experimental evaluation of multiple algorithms on scale-free graphs shows a relative prediction error of 10%-30% for runtime, including algorithms with up to 100x runtime variability between consecutive iterations.
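    The two-part prediction (iteration count times per-iteration cost) can be sketched as follows. This is a simplified illustration, not PREDIcT's actual model: it assumes the full run takes the same number of iterations as the sample run, that per-iteration time is linear in one key input feature (for example, edges processed), and that full-dataset features are the sample features multiplied by a hypothetical `scale` factor. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct feature values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict_runtime(sample_features: Sequence[float],
                    sample_times: Sequence[float],
                    scale: float) -> float:
    """Predict total runtime on the full dataset from one sample run.

    sample_features: key input feature per iteration of the sample run
    sample_times:    measured processing time per iteration of the sample run
    scale:           hypothetical full-to-sample ratio used to extrapolate features
    Assumes the iteration count on the full dataset equals the sample's.
    """
    a, b = fit_linear(sample_features, sample_times)
    # Sum the modeled cost of each iteration with extrapolated feature values.
    return sum(a + b * f * scale for f in sample_features)
```

    For instance, a sample run with features `[10, 20, 40]` and times `[2, 3, 5]` fits `a = 1`, `b = 0.1`; with `scale = 10` the predicted total is 11 + 21 + 41 = 73 time units. A real predictor would also need to model how the convergence trend, and therefore the iteration count, changes between the sample and the full input.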

    A Case for Specialized Processors for Scale-Out Workloads

    Emerging scale-out workloads need extensive computational resources. However, datacenters using modern server hardware face physical constraints in space and power, which limit further expansion and require improvements in both the computational density per server and the per-operation energy. Continuing to improve the computational resources of the cloud while staying within these physical constraints mandates optimizing server efficiency. In this work, we demonstrate that modern server processors are highly inefficient at running cloud workloads. To address this problem, we investigate the microarchitectural behavior of scale-out workloads and present opportunities to enable specialized processor designs that closely match the needs of the cloud.

    Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware

    Emerging scale-out cloud applications need extensive computational resources. However, data centers using modern server hardware face physical constraints in space and power, which limit further expansion and call for improvements in both the computational density per server and the per-operation energy use. Therefore, continuing to improve the computational resources of the cloud while staying within physical constraints mandates optimizing server efficiency to ensure that server hardware closely matches the needs of scale-out cloud applications. We use performance counters on modern servers to study a wide range of cloud applications, finding that today's predominant processor architecture is inefficient for running these workloads. We find that the inefficiency comes from a mismatch between application needs and modern processors, particularly in the organization of the instruction and data memory systems and in the processor core architecture. Moreover, while today's predominant architectures are inefficient when executing scale-out cloud applications, we find that current hardware trends further exacerbate the mismatch. In this work, we identify the key microarchitectural needs of cloud applications, calling for a change in the trajectory of server processors that would lead to improved computational density and power efficiency in data centers.

    ImageCLEF 2022: Multimedia Retrieval in Medical, Nature, Fusion, and Internet Applications

    ImageCLEF has been part of the Conference and Labs of the Evaluation Forum (CLEF) since 2003. CLEF 2022 will take place in Bologna, Italy. ImageCLEF is an ongoing evaluation initiative that promotes the evaluation of technologies for annotation, indexing, and retrieval of visual data, with the aim of providing information access to large collections of images in various usage scenarios and domains. In its 20th edition, ImageCLEF will have four main tasks: (i) a Medical task addressing concept annotation, caption prediction, and tuberculosis detection; (ii) a Coral task addressing the annotation and localisation of substrates in coral reef images; (iii) an Aware task addressing the prediction of real-life consequences of online photo sharing; and (iv) a new Fusion task addressing late fusion techniques based on the expertise of the pool of classifiers. In 2021, over 100 research groups registered at ImageCLEF, with 42 groups submitting more than 250 runs. These numbers show that, despite the COVID-19 pandemic, there is strong interest in the evaluation campaign.

    Overview of the ImageCLEF 2021: Multimedia Retrieval in Medical, Nature, Internet and Social Media Applications

    This paper presents an overview of the ImageCLEF 2021 lab, which was organized as part of the Conference and Labs of the Evaluation Forum (CLEF Labs 2021). ImageCLEF is an ongoing evaluation initiative (first run in 2003) that promotes the evaluation of technologies for annotation, indexing, and retrieval of visual data, with the aim of providing information access to large collections of images in various usage scenarios and domains. In 2021, the 19th edition of ImageCLEF runs four main tasks: (i) a medical task that groups three previous tasks, i.e., caption analysis, tuberculosis prediction, and medical visual question answering and question generation; (ii) a nature coral task about segmenting and labeling collections of coral reef images; (iii) an Internet task addressing the problem of identifying hand-drawn and digital user interface components; and (iv) a new social media aware task on estimating the potential real-life effects of online image sharing. Despite the current pandemic situation, the benchmark campaign received strong participation, with over 38 groups submitting more than 250 runs.

    The 2021 ImageCLEF Benchmark: Multimedia Retrieval in Medical, Nature, Internet and Social Media Applications

    This paper presents the ideas for the 2021 ImageCLEF lab, which will be organized as part of the Conference and Labs of the Evaluation Forum (CLEF Labs 2021) in Bucharest, Romania. ImageCLEF is an ongoing evaluation initiative (active since 2003) that promotes the evaluation of technologies for annotation, indexing, and retrieval of visual data, with the aim of providing information access to large collections of images in various usage scenarios and domains. In 2021, the 19th edition of ImageCLEF will organize four main tasks: (i) a Medical task addressing visual question answering, concept annotation, and tuberculosis classification; (ii) a Coral task addressing the annotation and localisation of substrates in coral reef images; (iii) a DrawnUI task addressing the creation of websites from either a drawing or a screenshot by detecting the different elements present in the design; and (iv) a new Aware task addressing the prediction of real-life consequences of online photo sharing. The strong participation in 2020, despite the COVID pandemic, with over 115 research groups registering and 40 submitting over 295 runs, shows considerable interest in this benchmarking campaign. We expect the new tasks to attract at least as many researchers in 2021.