
    Racing to hardware-validated simulation

    Processor simulators rely on detailed timing models of the processor pipeline to evaluate performance. The diversity in real-world processor designs mandates building flexible simulators that expose parts of the underlying model to the user in the form of configurable parameters. Consequently, the accuracy of modeling a real processor depends both on the accuracy of the pipeline model itself and on the accuracy of adjusting the configuration parameters to the modeled processor. Unfortunately, processor vendors publicly disclose only a subset of their design decisions, raising the probability of introducing specification inaccuracies when modeling these processors. Inaccurate tuning of model parameters causes the simulated processor to deviate from the actual one; in the worst case, improper parameters may lead to imbalanced pipeline models that compromise the simulation output. Therefore, simulation models should be hardware-validated before they are used for performance evaluation. As processors increase in complexity and diversity, validating a simulator model against real hardware becomes increasingly challenging and time-consuming. In this work, we propose a methodology for validating simulation models against real hardware. We create a framework that relies on micro-benchmarks to collect performance statistics on real hardware, and on machine learning-based algorithms to fine-tune the unknown parameters based on the accumulated statistics. We overhaul the Sniper simulator to support the ARM AArch64 instruction-set architecture (ISA) and introduce two new timing models for ARM-based in-order and out-of-order cores. Using our proposed simulator validation framework, we tune the in-order and out-of-order models to match the performance of real-world implementations of the Cortex-A53 and Cortex-A72 cores with an average error of 7% and 15%, respectively, across a set of SPEC CPU2017 benchmarks.
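
    As a rough illustration of the tuning step described above, the sketch below searches a small space of simulator parameters to minimize the error against hardware-measured statistics. The parameter names, value ranges, and the run_simulation stub are hypothetical placeholders, and plain random search stands in for the machine-learning-based tuning the paper actually uses with Sniper and micro-benchmarks.

```python
import random

# Hypothetical cycles-per-instruction (CPI) measured on the real core for a
# few micro-benchmarks (values are purely illustrative).
HW_CPI = {"dep_chain": 1.9, "load_latency": 4.1, "branch_misp": 7.3}

# Undisclosed simulator parameters and plausible search ranges (illustrative).
PARAM_RANGES = {
    "rob_size": (32, 192),
    "l1d_latency": (1, 6),
    "branch_penalty": (4, 20),
}

def run_simulation(params, benchmark):
    """Stand-in for a real simulator run; returns a fake CPI so the sketch runs."""
    return (params["branch_penalty"] * 0.3
            + params["l1d_latency"] * 0.5
            + 100.0 / params["rob_size"])

def mean_error(params):
    """Mean absolute percentage error between simulated and measured CPI."""
    errs = [abs(run_simulation(params, b) - cpi) / cpi for b, cpi in HW_CPI.items()]
    return sum(errs) / len(errs)

def tune(iterations=2000, seed=0):
    """Plain random search over the parameter space (the paper uses ML-based search)."""
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(iterations):
        cand = {k: rng.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}
        err = mean_error(cand)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err

if __name__ == "__main__":
    params, err = tune()
    print(f"average error {err:.1%} with parameters {params}")
```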

    Predicting Rainfall in the Context of Rainfall Derivatives Using Genetic Programming

    Rainfall is one of the most challenging variables to predict, as it exhibits unique characteristics that do not exist in other time series data. Moreover, rainfall is a major component of, and essential to, applications around water resource planning. In particular, this paper is interested in the prediction of rainfall for rainfall derivatives. In the rainfall derivatives literature, the process of predicting rainfall is currently dominated by statistical models, namely a Markov chain extended with rainfall prediction (MCRP). In this paper we outline a new methodology for predicting rainfall with Genetic Programming (GP). This is the first time in the literature that GP is used within the context of rainfall derivatives. We have created a new GP tailored to this problem domain, and we compare the performance of GP and MCRP on 21 different data sets of cities across Europe and report the results. The goal is to see whether GP can outperform MCRP, which acts as a benchmark. Results indicate that, in general, GP significantly outperforms MCRP, which is the dominant approach in the literature.
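
    For readers unfamiliar with how GP is applied to a series like rainfall, the following is a minimal symbolic-regression sketch, assuming the DEAP library and a synthetic daily rainfall series. The function set, fitness measure, and data are illustrative assumptions and do not reproduce the tailored GP described in the paper.

```python
import math
import operator
import random

from deap import algorithms, base, creator, gp, tools

# Synthetic daily rainfall with a seasonal pattern (a real study would use
# observed amounts for a given city).
DAYS = list(range(365))
RAIN = [max(0.0, 2.0 + 1.5 * math.sin(2 * math.pi * d / 365) + random.gauss(0, 0.3))
        for d in DAYS]

def protected_div(a, b):
    # Avoid division-by-zero crashes inside evolved expressions.
    return a / b if abs(b) > 1e-6 else 1.0

# Function and terminal sets for the expression trees.
pset = gp.PrimitiveSet("MAIN", 1)      # one input: the (normalized) day index
pset.renameArguments(ARG0="day")
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addPrimitive(protected_div, 2)
pset.addPrimitive(math.sin, 1)
pset.addTerminal(1.0)

creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMin)

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=3)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)

def eval_rainfall(individual):
    """Fitness: mean squared error between predicted and observed rainfall."""
    func = toolbox.compile(expr=individual)
    try:
        sq = [(func(d / 365.0) - r) ** 2 for d, r in zip(DAYS, RAIN)]
        return (sum(sq) / len(sq),)
    except (OverflowError, ValueError, ZeroDivisionError):
        return (float("inf"),)   # penalize numerically unstable expressions

toolbox.register("evaluate", eval_rainfall)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)

if __name__ == "__main__":
    pop = toolbox.population(n=200)
    hof = tools.HallOfFame(1)
    algorithms.eaSimple(pop, toolbox, cxpb=0.8, mutpb=0.1, ngen=30,
                        halloffame=hof, verbose=False)
    print("best expression:", hof[0], "MSE:", eval_rainfall(hof[0])[0])
```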

    Measuring Accuracy of Triples in Knowledge Graphs

    An increasing number of large-scale knowledge graphs have been constructed in recent years. These graphs are often created by text-based extraction, which can be very noisy. So far, cleaning knowledge graphs has mostly been carried out by human experts and is thus very inefficient. It is necessary to explore automatic methods for identifying and eliminating erroneous information. Previous approaches to this problem primarily rely on internal information, i.e. the knowledge graph itself. In this paper, we introduce an automatic approach, Triples Accuracy Assessment (TAA), for validating RDF triples (source triples) in a knowledge graph by finding consensus among matched triples (target triples) from other knowledge graphs. TAA uses knowledge graph interlinks to find identical resources and applies different matching methods between the predicates of source triples and target triples. Based on the matched triples, TAA then calculates a confidence score that indicates the correctness of a source triple. In addition, we present an evaluation of our approach using the FactBench dataset for fact validation. Our findings show promising results for distinguishing between correct and wrong triples.
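
    The toy sketch below illustrates the general idea of a consensus-based confidence score: matched triples from other graphs vote on the source triple's object value, weighted by predicate similarity. The MatchedTriple structure, the weighting scheme, and the example values are assumptions for illustration only; the actual TAA scoring is defined in the paper.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MatchedTriple:
    """A triple from another knowledge graph matched to the source triple."""
    value: str            # object value found in the target graph
    predicate_sim: float  # similarity between source and target predicates (0..1)

def confidence_score(source_value: str, matches: List[MatchedTriple]) -> float:
    """Illustrative consensus score: similarity-weighted agreement of the
    matched target triples with the source triple's object value."""
    if not matches:
        return 0.0
    agree = sum(m.predicate_sim for m in matches if m.value == source_value)
    total = sum(m.predicate_sim for m in matches)
    return agree / total if total > 0 else 0.0

# Toy usage: validating the triple (dbr:Berlin, dbo:country, dbr:Germany)
matches = [
    MatchedTriple("Germany", 0.95),   # e.g. from one interlinked graph
    MatchedTriple("Germany", 0.80),   # e.g. from another graph
    MatchedTriple("Prussia", 0.40),   # a noisy or outdated source
]
print(confidence_score("Germany", matches))   # -> 0.81...
```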

    Predicting Good Configurations for GitHub and Stack Overflow Topic Models

    Software repositories contain large amounts of textual data, ranging from source code comments and issue descriptions to questions, answers, and comments on Stack Overflow. To make sense of this textual data, topic modelling is frequently used as a text-mining tool for the discovery of hidden semantic structures in text bodies. Latent Dirichlet allocation (LDA) is a commonly used topic model that aims to explain the structure of a corpus by grouping texts. LDA requires multiple parameters to work well, and there are only rough and sometimes conflicting guidelines available on how these parameters should be set. In this paper, we contribute (i) a broad study of parameters to arrive at good local optima for GitHub and Stack Overflow text corpora, (ii) an a-posteriori characterisation of text corpora related to eight programming languages, and (iii) an analysis of corpus feature importance via per-corpus LDA configuration. We find that (1) popular rules of thumb for topic modelling parameter configuration are not applicable to the corpora used in our experiments, (2) corpora sampled from GitHub and Stack Overflow have different characteristics and require different configurations to achieve good model fit, and (3) we can predict good configurations for unseen corpora reliably. These findings support researchers and practitioners in efficiently determining suitable configurations for topic modelling when analysing textual data contained in software repositories. (To appear as a full paper at MSR 2019, the 16th International Conference on Mining Software Repositories.)
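
    To make the notion of an LDA configuration concrete, the snippet below fits a topic model with gensim while exposing the kind of hyperparameters such a study tunes per corpus (number of topics, the alpha and eta priors, passes). The toy documents and the specific values are illustrative assumptions, not configurations recommended by the paper.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy corpus standing in for preprocessed GitHub READMEs / Stack Overflow posts.
docs = [
    ["python", "pandas", "dataframe", "merge", "column"],
    ["java", "spring", "bean", "dependency", "injection"],
    ["python", "list", "comprehension", "loop", "performance"],
    ["java", "nullpointerexception", "stack", "trace", "debug"],
]

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# The parameters below (number of topics, document-topic prior alpha,
# topic-word prior eta, passes) are exactly the kind of knobs that need
# per-corpus tuning; these particular values are illustrative only.
lda = LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=2,
    alpha="auto",      # learn an asymmetric document-topic prior
    eta="auto",        # learn the topic-word prior
    passes=10,
    random_state=42,
)

for topic_id, words in lda.show_topics(num_topics=2, num_words=5, formatted=False):
    print(topic_id, [w for w, _ in words])
```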

    Large neighborhood search for the most strings with few bad columns problem

    In this work, we consider the following NP-hard combinatorial optimization problem from computational biology: given a set of input strings of equal length, the goal is to identify a maximum-cardinality subset of strings that differ in at most a pre-defined number of positions. First, we introduce an integer linear programming model for this problem. Second, two variants of a rather simple greedy strategy are proposed. Finally, a large neighborhood search algorithm is presented. A comprehensive experimental comparison among the proposed techniques shows, first, that large neighborhood search generally outperforms both greedy strategies. Second, while large neighborhood search is competitive with the stand-alone application of CPLEX for small and medium-sized problem instances, it outperforms CPLEX in the context of larger instances.
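
    A minimal sketch of the large neighborhood search idea for this problem follows: destroy part of the incumbent subset of strings, then repair it while keeping the number of "bad" columns within the limit. The greedy repair used here is a simplified stand-in (not the repair mechanism of the algorithm evaluated in the paper), and the instance data is invented for illustration.

```python
import random

def bad_columns(strings):
    """Number of positions where the selected strings do not all agree."""
    if not strings:
        return 0
    return sum(1 for pos in range(len(strings[0]))
               if len({s[pos] for s in strings}) > 1)

def greedy_repair(solution, pool, k, rng):
    """Add strings from the pool in random order while the solution stays
    feasible (at most k bad columns)."""
    for s in rng.sample(pool, len(pool)):
        if s not in solution and bad_columns(solution + [s]) <= k:
            solution.append(s)
    return solution

def lns(strings, k, iterations=200, destroy_rate=0.3, seed=0):
    rng = random.Random(seed)
    best = greedy_repair([], strings, k, rng)
    for _ in range(iterations):
        # Destroy: drop a random fraction of the incumbent solution.
        partial = [s for s in best if rng.random() > destroy_rate]
        # Repair: rebuild greedily from the full set of input strings.
        candidate = greedy_repair(partial, strings, k, rng)
        if len(candidate) > len(best):
            best = candidate
    return best

if __name__ == "__main__":
    data = ["ACGTAC", "ACGTTC", "ACGAAC", "TTGTAC", "ACGTAG"]
    sol = lns(data, k=2)
    print(len(sol), sol, "bad columns:", bad_columns(sol))
```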

    Single-center experience in the treatment of visceral artery aneurysms

    Background: Visceral artery aneurysms (VAAs), although rare, represent a life-threatening disease with high mortality rates. With the more frequent use of diagnostic tests, these mostly asymptomatic lesions are increasingly detected incidentally. It follows that surgeons are increasingly called on to decide on the most appropriate management of VAAs, between an open surgical and an endovascular approach and among the different endovascular options currently available. The aim of this retrospective study was to evaluate the results of open surgical and interventional endovascular strategies for visceral artery aneurysms with respect to technical success, therapy-associated complications, and postinterventional follow-up in the elective and emergency situation. Methods: From January 1992 to January 2017, 125 open surgical or endovascular interventions for VAA were performed at our institution. Once the VAA was diagnosed and the indication for treatment was assessed, the preoperative diagnostic work-up consisted of contrast computed tomography (CT) or magnetic resonance imaging (MRI) and, in some patients, digital subtraction angiography. Follow-up included clinical examination, duplex ultrasound scan (DUS), and contrast-enhanced ultrasound to assess the treated vessel patency and organ perfusion after 1, 6, and 12 months, and yearly thereafter. CT or MRI controls were also performed at 1 year of follow-up and, thereafter, only when DUS was not diagnostic or showed a complication. After the first 5 years of follow-up, the status of the patient was obtained by a structured telephone survey. Results: The treatment option was endovascular in 56 of 125 cases (44.8%). Technical success was 98.3%. In one case, the procedure was interrupted because of extensive dissection of the afferent vessel. Twenty-six patients were treated by coil embolization and 29 with covered stenting. The endovascular approach was used in emergency in two cases (3.6%). In the endovascular group, mortality was nil. Complications occurred in 5 cases (8.9%): 1 subacute intestinal ischemia caused by superior mesenteric artery dissection, 2 aneurysm reperfusions, 1 stent thrombosis, and 1 massive splenic hematoma. In 69 (55.2%) cases, surgical treatment was preferred, with 24 VAA resections and 45 arterial reconstructions. In 20 cases (29%), open surgery was performed in emergency conditions. In the surgical group, 8 emergency patients (40%) died intraoperatively. The mortality after elective surgical interventions was nil. Complications after surgery comprised 4 late graft thromboses (5.8%): asymptomatic in three cases and requiring splenectomy in one. Conclusions: There is no overall consensus regarding the indications for treatment of VAA. Currently, in the emergency setting, the endovascular approach should be considered the first choice because of its reduced invasiveness and faster access and bleeding control; this accounts for the lower mortality of interventional therapy compared with open surgery. The endovascular approach is also effective for elective repair of VAAs, but procedure-related complications may occur in a non-negligible number of patients. Given comparable mortality rates and a low procedure-related complication rate, the surgical approach still has a place in the elective management of VAAs, especially for aneurysms unsuitable or challenging for the endovascular option in patients with low surgical risk. The size, location, and morphology of VAAs, systemic or local comorbidities, and specific anatomical situations such as previous abdominal surgery should dictate treatment choice.

    Efficient Benchmarking of Algorithm Configuration Procedures via Model-Based Surrogates

    The optimization of algorithm (hyper-)parameters is crucial for achieving peak performance across a wide range of domains, ranging from deep neural networks to solvers for hard combinatorial problems. The resulting algorithm configuration (AC) problem has attracted much attention from the machine learning community. However, the proper evaluation of new AC procedures is hindered by two key hurdles. First, AC benchmarks are hard to set up. Second, and even more significantly, they are computationally expensive: a single run of an AC procedure involves many costly runs of the target algorithm whose performance is to be optimized in a given AC benchmark scenario. One common workaround is to optimize cheap-to-evaluate artificial benchmark functions (e.g., Branin) instead of actual algorithms; however, these have different properties than realistic AC problems. Here, we propose an alternative benchmarking approach that is similarly cheap to evaluate but much closer to the original AC problem: replacing expensive benchmarks with surrogate benchmarks constructed from AC benchmarks. These surrogate benchmarks approximate the response surface corresponding to true target algorithm performance using a regression model, and the original and surrogate benchmarks share the same (hyper-)parameter space. In our experiments, we construct and evaluate surrogate benchmarks for hyperparameter optimization as well as for AC problems that involve performance optimization of solvers for hard combinatorial problems, drawing training data from the runs of existing AC procedures. We show that our surrogate benchmarks capture the important characteristics of the AC scenarios from which they were derived, such as high- and low-performing regions, while being much easier to use and orders of magnitude cheaper to evaluate.
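
    The core mechanism, a regression model standing in for expensive target-algorithm runs, can be sketched in a few lines with scikit-learn. The two-dimensional parameter space, the synthetic runtime function, and the random-forest choice below are assumptions for illustration, not the surrogate models used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Pretend training data gathered from previous algorithm-configuration runs:
# each row is a parameter configuration, y is the measured runtime (seconds).
# The quadratic ground truth below is purely synthetic.
X = rng.uniform(low=[1, 0.0], high=[64, 1.0], size=(500, 2))   # (cache_mb, restart_prob)
y = 0.05 * (X[:, 0] - 40) ** 2 + 30 * (X[:, 1] - 0.2) ** 2 + rng.normal(0, 1, 500)

# The surrogate benchmark: a regression model over the same parameter space.
surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

def surrogate_benchmark(config):
    """Cheap stand-in for an expensive target-algorithm run."""
    return float(surrogate.predict(np.asarray(config).reshape(1, -1))[0])

# An AC procedure can now query the surrogate instead of running the solver:
print("predicted runtime at (32 MB, 0.25):", surrogate_benchmark([32, 0.25]))

# e.g. a cheap random search evaluated entirely on the surrogate benchmark
candidates = rng.uniform(low=[1, 0.0], high=[64, 1.0], size=(1000, 2))
best = candidates[np.argmin(surrogate.predict(candidates))]
print("predicted best configuration:", best)
```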

    Construct, Merge, Solve and Adapt: Application to the repetition-free longest common subsequence problem

    In this paper we present the application of a recently proposed general algorithm for combinatorial optimization, labelled Construct, Merge, Solve & Adapt, to the repetition-free longest common subsequence problem. The algorithm generates sub-instances by merging the solution components found in randomly constructed solutions; these sub-instances are then solved by means of an exact solver. Moreover, the considered sub-instances change dynamically: new solution components are added at each iteration, and existing components are removed on the basis of indicators of their usefulness. The results of applying this algorithm to the repetition-free longest common subsequence problem show that it generally outperforms competing approaches from the literature. Moreover, the algorithm is competitive with CPLEX for small and medium-sized problem instances, whereas it outperforms CPLEX for larger instances.
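
    The generic loop behind Construct, Merge, Solve & Adapt can be sketched as follows. The toy objective (selecting as many items as possible within a weight budget) and the greedy stand-in for the exact sub-instance solver are assumptions made so that the example runs self-contained; the paper instead solves sub-instances of the repetition-free longest common subsequence problem with CPLEX.

```python
import random

def cmsa(construct, solve_exact, evaluate, n_constructions=5,
         max_age=3, iterations=50, seed=0):
    """Generic Construct-Merge-Solve-Adapt loop.

    construct(rng)      -> set of solution components (one random solution)
    solve_exact(comps)  -> best solution restricted to the given components
    evaluate(solution)  -> objective value (higher is better)
    """
    rng = random.Random(seed)
    age = {}                      # component -> iterations since last use
    best, best_val = None, float("-inf")
    for _ in range(iterations):
        # Construct & merge: pool the components of several random solutions.
        for _ in range(n_constructions):
            for comp in construct(rng):
                age.setdefault(comp, 0)
        # Solve: apply an exact method to the merged sub-instance.
        sub_solution = solve_exact(set(age))
        if evaluate(sub_solution) > best_val:
            best, best_val = sub_solution, evaluate(sub_solution)
        # Adapt: reset the age of used components, age and drop the rest.
        for comp in list(age):
            age[comp] = 0 if comp in sub_solution else age[comp] + 1
            if age[comp] > max_age:
                del age[comp]
    return best, best_val

if __name__ == "__main__":
    weights = list(range(1, 40))      # toy components: item weights
    budget = 100

    def construct(rng):
        items, total = set(), 0
        for w in rng.sample(weights, len(weights)):
            if total + w <= budget:
                items.add(w)
                total += w
        return items

    def solve_exact(components):
        # For this toy objective (max #items with sum <= budget) a greedy over
        # ascending weights is exact; the paper uses an ILP solver instead.
        chosen, total = set(), 0
        for w in sorted(components):
            if total + w <= budget:
                chosen.add(w)
                total += w
        return chosen

    best, value = cmsa(construct, solve_exact, evaluate=len)
    print("best subset size:", value, "items:", sorted(best))
```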

    Studying Solutions of the p-Median Problem for the Location of Public Bike Stations

    The use of bicycles as a means of transport is becoming more and more popular today, especially in urban areas, as a way to avoid the disadvantages of individual car traffic. City managers react to this trend and actively promote the use of bicycles by providing a network of public bicycles and stations where they can be stored. Establishing such a network involves finding the best locations for the stations, which is not a trivial task. In this work, we examine models to determine the best locations for bike stations so that citizens travel the shortest possible distance to one of them. Based on real data from the city of Malaga, we formulate our problem as a p-median problem and solve it with a variable neighborhood search algorithm that was automatically configured with irace. We compare the locations proposed by the algorithm with the ones currently used by the city council, and we also study where new locations should be placed if the network grows. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This research was partially funded by the University of Málaga, Andalucía Tech, and the Spanish MINECO and FEDER projects TIN2014-57341-R, TIN2016-81766-REDT, and TIN2017-88213-R. C. Cintrano is supported by an FPI grant (BES-2015-074805) from the Spanish MINECO.
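
    As a rough sketch of the optimization setup, the code below evaluates the p-median objective (total distance from demand points to their nearest open station) and improves a solution with a basic variable neighborhood search. The synthetic coordinates, neighborhood sizes, and parameter values are illustrative assumptions; the study uses real Malaga data and a VNS configured automatically with irace.

```python
import math
import random

def p_median_cost(clients, stations):
    """Sum over clients of the distance to the closest open station."""
    return sum(min(math.dist(c, s) for s in stations) for c in clients)

def best_swap(clients, candidates, stations):
    """Local search: repeatedly apply the best improving single open/close swap."""
    current = list(stations)
    cost = p_median_cost(clients, current)
    improved = True
    while improved:
        improved = False
        best_trial, best_trial_cost = None, cost
        for i in range(len(current)):
            for cand in candidates:
                if cand in current:
                    continue
                trial = current[:i] + [cand] + current[i + 1:]
                trial_cost = p_median_cost(clients, trial)
                if trial_cost < best_trial_cost:
                    best_trial, best_trial_cost = trial, trial_cost
        if best_trial is not None:
            current, cost, improved = best_trial, best_trial_cost, True
    return current, cost

def vns(clients, candidates, p, k_max=2, iterations=15, seed=1):
    rng = random.Random(seed)
    current, cost = best_swap(clients, candidates, rng.sample(candidates, p))
    for _ in range(iterations):
        k = 1
        while k <= k_max:
            # Shaking: replace k randomly chosen stations with other candidates.
            shaken = list(current)
            for idx in rng.sample(range(p), k):
                shaken[idx] = rng.choice([c for c in candidates if c not in shaken])
            trial, trial_cost = best_swap(clients, candidates, shaken)
            if trial_cost < cost:
                current, cost, k = trial, trial_cost, 1
            else:
                k += 1
    return current, cost

if __name__ == "__main__":
    rng = random.Random(0)
    clients = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(60)]
    candidates = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(20)]
    stations, cost = vns(clients, candidates, p=5)
    print(f"total client-to-station distance: {cost:.1f}")
```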