
    Fine-grained Benchmark Subsetting for System Selection

    System selection aims to find the best architecture for a set of programs and workloads. It traditionally requires long-running benchmarks. We propose a method to reduce the cost of system selection. We break down benchmarks into elementary fragments of source code, called codelets. Then, we identify two causes of redundancy: first, similar codelets; second, codelets called repeatedly. The key idea is to minimize redundancy inside the benchmark suite to speed it up. For each group of similar codelets, only one representative is kept. For codelets that are called repeatedly and whose performance does not vary across calls, the number of invocations is reduced. Given an initial benchmark suite, our method produces a set of reduced benchmarks that can be used in place of the original one for system selection. We evaluate our method on the NAS SER benchmarks, producing a reduced benchmark suite that is 30 times faster on average than the original suite, with a maximum of 44 times. The reduced suite predicts the execution time on three target architectures with a median error between 3.9% and 8%.
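The two redundancy reductions described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the abstract does not say how similarity is measured or how stability across calls is tested, so the feature vectors, the Euclidean distance, the greedy grouping, and the names `group_codelets` and `reduce_invocations` are all assumptions made for the example.

```python
from math import dist


def group_codelets(features, threshold):
    """Greedily group codelets by feature similarity (assumed metric: Euclidean).

    A codelet joins the first group whose representative lies within
    `threshold`; otherwise it becomes the representative of a new group.
    Returns a dict mapping each representative to its group members.
    """
    groups = {}
    for name, vec in features.items():
        for rep in groups:
            if dist(features[rep], vec) <= threshold:
                groups[rep].append(name)
                break
        else:
            groups[name] = [name]
    return groups


def reduce_invocations(call_times, keep, tolerance):
    """Keep only the first `keep` calls when per-call time is stable.

    Stability is judged (hypothetically) by the relative spread
    (max - min) / mean; unstable codelets keep all their invocations.
    """
    mean = sum(call_times) / len(call_times)
    spread = (max(call_times) - min(call_times)) / mean
    return call_times[:keep] if spread <= tolerance else call_times


# Hypothetical feature vectors for three codelets.
features = {"a": (1.0, 0.0), "b": (1.1, 0.0), "c": (5.0, 5.0)}
groups = group_codelets(features, threshold=0.5)
stable = reduce_invocations([1.0, 1.01, 0.99, 1.0], keep=2, tolerance=0.05)
unstable = reduce_invocations([1.0, 2.0, 1.0], keep=2, tolerance=0.05)
```

Only the representatives (`a` and `c` here) would then be run on a target system, with each one standing in for the other members of its group.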

    Making a case for an ARM Cortex-A9 CPU interlay replacing the NEON SIMD unit


    Subset reasoning for event-based systems

    In highly dynamic domains such as the Internet of Things (IoT), smart industries, smart manufacturing, pervasive health or social media, data is being continuously generated. By combining this generated data with background knowledge and performing expressive reasoning upon this combination, meaningful decisions can be made. Furthermore, this continuously generated data typically originates from multiple heterogeneous sources. Ontologies are ideal for modeling the domain and facilitate the integration of heterogeneously produced data with background knowledge. Furthermore, expressive ontology reasoning allows implicit facts to be inferred and enables intelligent decision making. The data produced in these domains is often volatile. Time-critical systems, such as IoT Nurse Call systems, require timely processing of the produced IoT data. However, there is still a mismatch between volatile data and expressive ontology reasoning, since new data often arrives faster than it can be reasoned over. For this reason, we present an approximation technique that extracts a subset of the data to speed up the reasoning process. We demonstrate this technique in two settings: a Nurse Call proof of concept, where the locations of the nurses are tracked and the most suited nurse is selected when a patient launches a call, and an extension of an existing benchmark. We managed to speed up the reasoning process by up to 10 times for small datasets and by more than 1000 times for large datasets.

    A Multi-level Approach for Identifying Process Change in Cancer Pathways

    An understudied challenge within process mining is the area of process change over time. This is a particular concern in healthcare, where patterns of care emerge and evolve in response to individual patient needs and through complex interactions between people, process, technology and changing organisational structure. We propose a structured approach to analyse process change over time suitable for the complex domain of healthcare. Our approach applies a qualitative process comparison at three levels of abstraction: a holistic perspective summarizing patient pathways (process model level), a middle level perspective based on activity sequences for individuals (trace level), and a fine-grained detail focus on activities (activity level). Our aim is to identify points in time where a process changed (detection), to localise and characterise the change (localisation and characterisation), and to understand process evolution (unravelling). We illustrate the approach using a case study of cancer pathways in Leeds Cancer Centre where we found evidence of agreement in process change identified at the process model and activity levels, but not at the trace level. In the experiment we show that this qualitative approach provides a useful understanding of process change over time. Examining change at the three levels provides confirmatory evidence of process change where perspectives agree, while contradictory evidence can lead to focused discussions with domain experts. The approach should be of interest to others dealing with processes that undergo complex change over time.

    Incremental elasticity for array databases

    Relational databases benefit significantly from elasticity, whereby they execute on a set of changing hardware resources provisioned to match their storage and processing requirements. Such flexibility is especially attractive for scientific databases because their users often have a no-overwrite storage model, in which they delete data only when their available space is exhausted. This results in a database that is regularly growing and expanding its hardware proportionally. Also, scientific databases frequently store their data as multidimensional arrays optimized for spatial querying. This brings about several novel challenges in clustered, skew-aware data placement on an elastic shared-nothing database. In this work, we design and implement elasticity for an array database. We address this challenge on two fronts: determining when to expand a database cluster and how to partition the data within it. In both steps we propose incremental approaches, affecting a minimum set of data and nodes, while maintaining high performance. We introduce an algorithm for gradually augmenting an array database's hardware using a closed-loop control system. After the cluster adds nodes, we optimize data placement for n-dimensional arrays. Many of our elastic partitioners incrementally reorganize an array, redistributing data only to new nodes. By combining these two tools, the scientific database efficiently and seamlessly manages its monotonically increasing hardware resources. Intel Corporation (Science and Technology Center for Big Data
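The idea of redistributing data only to new nodes can be sketched as below. This is an illustrative toy, not one of the paper's partitioners: the chunk-count balancing rule, the `placement` dictionary layout, and the function name `add_nodes_incrementally` are assumptions, and the sketch ignores array dimensionality, skew, and spatial clustering.

```python
def add_nodes_incrementally(placement, new_nodes):
    """Rebalance after a scale-out by moving chunks only onto new nodes.

    `placement` maps an existing node name to its list of chunk ids.
    Chunks are taken from the currently fullest existing node until each
    new node holds the floor of the per-node average. No chunk ever moves
    between two pre-existing nodes.
    """
    result = {node: list(chunks) for node, chunks in placement.items()}
    for node in new_nodes:
        result[node] = []

    total = sum(len(chunks) for chunks in result.values())
    target = total // len(result)

    for new in new_nodes:
        while len(result[new]) < target:
            # Pick the fullest of the original nodes as the donor.
            donor = max(placement, key=lambda n: len(result[n]))
            if len(result[donor]) <= target:
                break  # no existing node has surplus chunks left
            result[new].append(result[donor].pop())
    return result


# Hypothetical two-node cluster growing by one node.
before = {"n1": [0, 1, 2, 3], "n2": [4, 5, 6, 7]}
after = add_nodes_incrementally(before, ["n3"])
```

In this toy run, two of the eight chunks move, and both land on `n3`. A full repartitioning could instead reshuffle chunks between `n1` and `n2` as well.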

    Application Specific Customization and Scalability of Soft Multiprocessors


    A Survey on Self-Supervised Learning for Non-Sequential Tabular Data

    Self-supervised learning (SSL) has been incorporated into many state-of-the-art models in various domains, where SSL defines pretext tasks based on unlabeled datasets to learn contextualized and robust representations. Recently, SSL has become a new trend in exploring representation learning for tabular data, which is more challenging because such data lacks explicit relations from which to learn descriptive representations. This survey aims to systematically review and summarize the recent progress and challenges of SSL for non-sequential tabular data (SSL4NS-TD). We first present a formal definition of NS-TD and clarify its relation to related studies. Then, these approaches are categorized into three groups (predictive learning, contrastive learning, and hybrid learning), along with the motivations and strengths of representative methods within each direction. On top of this, application issues of SSL4NS-TD are presented, including automatic data engineering, cross-table transferability, and domain knowledge integration. In addition, we elaborate on existing benchmarks and datasets for NS-TD applications to discuss the performance of existing tabular models. Finally, we discuss the challenges of SSL4NS-TD and provide potential directions for future research. We expect our work to be useful in terms of encouraging more research on lowering the barrier to entry for SSL in the tabular domain and improving the foundations for implicit tabular data. Comment: The paper list can be found at https://github.com/wwweiwei/awesome-self-supervised-learning-for-tabular-dat
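As a rough illustration of what a pretext task on unlabeled tabular data might look like, the sketch below builds a mask-and-reconstruct training example from a single row. The abstract does not describe any specific pretext task, so the masking scheme, the `mask_token` placeholder, and the function name `make_masked_example` are assumptions chosen for the example rather than a method taken from the survey.

```python
import random


def make_masked_example(row, mask_rate, rng, mask_token=None):
    """Build one self-supervised example from an unlabeled row.

    Each cell is hidden independently with probability `mask_rate`.
    A model would receive `corrupted` (and possibly `mask`) as input and
    be trained to predict `targets`, the original values of hidden cells.
    """
    mask = [rng.random() < mask_rate for _ in row]
    corrupted = [mask_token if hidden else value
                 for value, hidden in zip(row, mask)]
    targets = [value for value, hidden in zip(row, mask) if hidden]
    return corrupted, mask, targets


# Hypothetical mixed-type row; a seeded generator keeps runs repeatable.
rng = random.Random(0)
row = [5.1, "red", 3, 0.7]
corrupted, mask, targets = make_masked_example(row, mask_rate=0.5, rng=rng)
```

No label column is involved: the supervision signal comes entirely from the row itself, which is what makes the task self-supervised.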