
    OFSET_mine: an integrated framework for cardiovascular diseases risk prediction based on retinal vascular function

    Get PDF
    As cardiovascular disease (CVD) represents a spectrum of disorders that often manifest for the first time through an acute life-threatening event, early identification of seemingly healthy subjects with various degrees of risk is a priority. More recently, traditional scores used for early identification of CVD risk have slowly been replaced by more sensitive biomarkers that assess individual, rather than population, risks for CVD. Among these, retinal vascular function, as assessed by the retinal vessel analysis (RVA) method, has been proven an accurate reflection of subclinical CVD in groups of participants without overt disease but with certain inherited or acquired risk factors. Furthermore, in order to correctly detect individual risk at an early stage, specialized machine learning methods and feature selection techniques that can cope with the characteristics of the data need to be devised. The main contribution of this thesis is an integrated framework, OFSET_mine, that combines novel machine learning methods to produce a bespoke solution for cardiovascular risk prediction based on RVA data that is also applicable to other medical datasets with similar characteristics. The three identified essential characteristics are 1) an imbalanced dataset, 2) high dimensionality and 3) overlapping feature ranges with the possibility of acquiring new samples. The thesis proposes FiltADASYN as an oversampling method that deals with imbalance, DD_Rank as a feature selection method that handles high dimensionality, and GCO_mine as a method for individual-based classification, all three integrated within the OFSET_mine framework. The new oversampling method FiltADASYN extends Adaptive Synthetic Oversampling (ADASYN) with an additional step that filters the generated samples and improves the reliability of the resultant sample set. The feature selection method DD_Rank is based on the Restricted Boltzmann Machine (RBM) and ranks features according to their stability and discrimination power. GCO_mine is a lazy learning method based on Graph Cut Optimization (GCO), which considers both the local arrangements and the global structure of the data. OFSET_mine compares favourably to well-established composite techniques. It exhibits high classification performance when applied to a wide range of benchmark medical datasets with variable sample sizes, dimensionality and imbalance ratios. When applying OFSET_mine to our RVA data, an accuracy of 99.52% is achieved. In addition, using OFSET, the hybrid solution of FiltADASYN and DD_Rank, with Random Forest on our RVA data produces risk group classifications with an accuracy of 99.68%. This not only reflects the success of the framework but also establishes RVA as a valuable cardiovascular risk predictor.
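
    The abstract's description of FiltADASYN (oversample with ADASYN, then filter the generated samples) can be read as the sketch below. The exact filtering criterion is not given in the abstract, so the neighbour-agreement filter used here is an assumption, not the thesis's method.

        # A minimal sketch of the oversample-then-filter idea, assuming numeric
        # numpy arrays. The filter (keep a synthetic sample only if a k-NN vote
        # over the *original* data agrees with its label) is an assumption.
        import numpy as np
        from imblearn.over_sampling import ADASYN
        from sklearn.neighbors import KNeighborsClassifier

        def filt_adasyn_sketch(X, y, n_neighbors=5):
            X_res, y_res = ADASYN().fit_resample(X, y)
            X_syn, y_syn = X_res[len(X):], y_res[len(y):]  # synthetic samples come last
            knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(X, y)
            keep = knn.predict(X_syn) == y_syn             # drop unreliable samples
            return np.vstack([X, X_syn[keep]]), np.concatenate([y, y_syn[keep]])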

    Enabling Ubiquitous OLAP Analyses

    Get PDF
    An OLAP analysis session is carried out as a sequence of OLAP operations applied to multidimensional cubes. At each step of a session, an operation is applied to the result of the previous step in an incremental fashion. Due to its simplicity and flexibility, OLAP is the most widely adopted paradigm for exploring the data stored in data warehouses. With the goal of broadening access to OLAP analyses, in this thesis we touch on several critical topics. We first present our contributions to data extraction from service-oriented sources, which are nowadays used to provide access to many databases and analytic platforms. By addressing data extraction from these sources we make a step towards the integration of external databases into the data warehouse, thus providing richer data that can be analyzed through OLAP sessions. The second topic that we study is the visualization of multidimensional data, which we exploit to enable OLAP on devices with limited screen and bandwidth capabilities (i.e., mobile devices). Finally, we propose solutions to obtain multidimensional schemata from unconventional sources (e.g., sensor networks), which are crucial to performing multidimensional analyses.
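
    To make the incremental nature of an OLAP session concrete, the toy sketch below models a session as successive operations on a small sales cube using pandas; the cube, column names and data are hypothetical.

        # Illustrative only: an OLAP session as an incremental sequence of
        # operations, each applied to the result of the previous step.
        import pandas as pd

        cube = pd.DataFrame({
            "year":    [2022, 2022, 2023, 2023],
            "country": ["IT", "FR", "IT", "FR"],
            "sales":   [100, 80, 120, 90],
        })

        step1 = cube.groupby(["year", "country"])["sales"].sum()  # base aggregation
        step2 = step1.groupby(level="year").sum()                 # roll-up: drop 'country'
        step3 = step1.loc[2023]                                   # slice: fix year = 2023
        print(step3)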

    Lazy Contracts: Alleviating High Gas Costs by Secure and Trustless Off-chain Execution of Smart Contracts

    Full text link
    Smart contracts are programs that are executed on the blockchain and can hold, manage and transfer assets in the form of cryptocurrencies. The contract's execution is performed on-chain and is subject to consensus, i.e. every node on the blockchain network has to run the function calls and keep track of their side-effects. In most programmable blockchains, such as Ethereum, the notion of gas is introduced to prevent DoS attacks by malicious parties who might try to slow down the network by performing heavy computations. A fixed cost is assigned to each atomic operation, and the initiator of a function call pays the total gas cost as a transaction fee. This helps prevent DoS attacks, but the resulting fees are extremely high. For example, in 2022, on Ethereum alone, total gas usage amounted to 1.77 million ETH, roughly 4.3 billion USD. This thesis proposes "lazy contracts" as a solution to alleviate these costs. Our solution moves most of the computation off-chain, ensuring that each function call incurs only a tiny amount of gas usage, while preserving enough data on-chain to guarantee an implicit consensus about the state of the contract variables and the ownership of funds. A complete on-chain execution of the functions is triggered only in case two parties to the contract disagree about the current state, which in turn can only happen if at least one party is dishonest. In such cases, our protocol can identify the dishonest party and penalize them by having them pay for the entire gas usage. Hence, no rational party has an incentive to act dishonestly. Finally, we perform extensive experiments over 160,735 real-world Solidity contracts that were involved in 9,055,492 transactions in January 2022--January 2023 on Ethereum and show that our approach reduces the overall gas usage by 55.4%, which amounts to an astounding saving of 109.9 million USD in gas fees.
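
    The protocol outline in the abstract (off-chain execution, small on-chain commitments, dispute-triggered re-execution) can be sketched as below. This is a simplified reading, not the thesis's protocol: signatures are omitted and the on-chain side is a plain Python object rather than a Solidity contract.

        # A minimal sketch of the optimistic off-chain pattern: both parties run
        # calls locally and only a small state commitment lives "on-chain"; a
        # full check happens only on dispute. Real deployments would use ECDSA
        # signatures and a verifier contract, omitted here.
        import hashlib, json

        def commitment(state: dict, nonce: int) -> str:
            blob = json.dumps(state, sort_keys=True) + str(nonce)
            return hashlib.sha256(blob.encode()).hexdigest()

        class LazyContractSketch:
            def __init__(self, state):
                self.state, self.nonce = state, 0
                self.onchain = commitment(state, 0)   # the only on-chain data

            def call_offchain(self, fn, *args):
                fn(self.state, *args)                 # executed locally, gas-free
                self.nonce += 1
                self.onchain = commitment(self.state, self.nonce)

            def dispute(self, claimed_state, claimed_nonce):
                # On-chain fallback: a claim that fails this check is dishonest,
                # and its owner pays for the full on-chain re-execution.
                return commitment(claimed_state, claimed_nonce) == self.onchain

        def pay(state, frm, to, amount):
            state[frm] -= amount; state[to] += amount

        c = LazyContractSketch({"alice": 10, "bob": 0})
        c.call_offchain(pay, "alice", "bob", 3)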

    Hybrid eager and lazy evaluation for efficient compilation of Haskell

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (p. 208-220). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. The advantage of a non-strict, purely functional language such as Haskell lies in its clean equational semantics. However, lazy implementations of Haskell fall short: they cannot express tail recursion gracefully without annotation. We describe resource-bounded hybrid evaluation, a mixture of strict and lazy evaluation, and its realization in Eager Haskell. From the programmer's perspective, Eager Haskell is simply another implementation of Haskell with the same clean equational semantics. Iteration can be expressed using tail recursion, without the need to resort to program annotations. Under hybrid evaluation, computations are ordinarily executed in program order just as in a strict functional language. When particular stack, heap, or time bounds are exceeded, suspensions are generated for all outstanding computations. These suspensions are re-started in a demand-driven fashion from the root. The Eager Haskell compiler translates Ac, the compiler's intermediate representation, to efficient C code. We use an equational semantics for Ac to develop simple correctness proofs for program transformations, and connect actions in the run-time system to steps in the hybrid evaluation strategy. The focus of compilation is efficiency in the common case of straight-line execution; the handling of non-strictness and suspension is left to the run-time system. Several additional contributions have resulted from the implementation of hybrid evaluation. Eager Haskell is the first eager compiler to use a call stack. Our generational garbage collector uses this stack as an additional predictor of object lifetime. Objects above a stack watermark are assumed to be likely to die; we avoid promoting them. Those below are likely to remain untouched and are therefore good candidates for promotion. To avoid eagerly evaluating error checks, they are compiled into special bottom thunks, which are treated specially by the run-time system. The compiler identifies error-handling code using a mixture of strictness and type information. This information is also used to avoid inlining error handlers, and to enable aggressive program transformation in the presence of error handling. by Jan-Willem Maessen. Ph.D.
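
    The evaluation strategy described above (run eagerly until a resource bound is exceeded, then suspend outstanding work and restart it on demand) can be illustrated by the following toy sketch; it mimics only the strategy, not Eager Haskell's actual run-time system.

        # Hybrid evaluation in miniature: evaluate eagerly while a fuel budget
        # lasts, otherwise produce a suspension that is forced only on demand.
        class Thunk:
            def __init__(self, fn):
                self.fn, self.value, self.forced = fn, None, False
            def force(self):
                if not self.forced:
                    self.value, self.forced = self.fn(), True
                return self.value

        FUEL = [1000]  # stand-in for the stack/heap/time bounds

        def hybrid(fn):
            if FUEL[0] > 0:        # within bounds: evaluate in program order
                FUEL[0] -= 1
                return fn()
            return Thunk(fn)       # bounds exceeded: suspend the computation

        def demand(x):             # demand-driven restart from the consumer
            return x.force() if isinstance(x, Thunk) else x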

    Efficient Precise Dynamic Data Race Detection For Cpu And Gpu

    Get PDF
    Data races are notorious bugs. They introduce non-determinism into a program's behavior and complicate its semantics, making it challenging to debug parallel programs. To make parallel programming easier, efficient data race detection has been a research topic over the last decades. However, existing data race detectors either sacrifice precision or incur high overhead, limiting their application to real-world applications and scenarios. This dissertation proposes approaches to improve the performance of dynamic data race detection without undermining precision, by identifying and removing metadata redundancy dynamically. This dissertation also explores ways to make dynamic data race detection practical for GPU programs, whose programming and execution model is disparate from that of CPU workloads. Further, this dissertation shows how the structured synchronization model in GPU programs can simplify the algorithm design of data race detection for GPUs, and how the unique patterns in GPU workloads enable an efficient implementation of the algorithm, yielding a high-performance dynamic data race detector for GPU programs.
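
    For context, the core check performed by precise dynamic detectors is a happens-before test over per-location vector-clock metadata, as in the minimal sketch below; the dissertation's contribution, dynamically removing redundant metadata updates, is not reproduced here.

        # Per-location shadow metadata with a vector-clock happens-before test.
        class VC(dict):  # maps thread id -> logical clock
            def happens_before(self, other):
                return all(c <= other.get(t, 0) for t, c in self.items())

        class ShadowLocation:
            def __init__(self):
                self.last_write = VC()
                self.reads = {}          # thread id -> clock at last read

            def on_write(self, tid, clock):
                if not self.last_write.happens_before(clock):
                    raise RuntimeError("write-write race")
                for read_clock in self.reads.values():
                    if not read_clock.happens_before(clock):
                        raise RuntimeError("read-write race")
                self.last_write = VC(clock)
                self.reads.clear()

            def on_read(self, tid, clock):
                if not self.last_write.happens_before(clock):
                    raise RuntimeError("write-read race")
                self.reads[tid] = VC(clock)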

    The 2011 International Planning Competition

    Get PDF
    After a three-year gap, the 2011 edition of the IPC involved a total of 55 planners, some of them versions of the same planner, distributed among four tracks: the sequential satisficing track (27 planners submitted out of 38 registered), the sequential multicore track (8 planners submitted out of 12 registered), the sequential optimal track (12 planners submitted out of 24 registered) and the temporal satisficing track (8 planners submitted out of 14 registered). Three more tracks were open to participation: temporal optimal, preferences satisficing and preferences optimal. Unfortunately, the number of submitted planners did not allow these tracks to be included in the final competition. A total of 55 people participated, grouped in 31 teams. Participants came from Australia, Canada, China, France, Germany, India, Israel, Italy, Spain, the UK and the USA. For the sequential tracks 14 domains, with 20 problems each, were selected, while the temporal one had 12 domains, also with 20 problems each. Both new and past domains were included. As in previous competitions, domains and problems were unknown to participants and all the experimentation was carried out by the organizers. To run the competition, a cluster of eleven 64-bit computers (Intel XEON 2.93 GHz quad-core processors) running Linux was set up. Up to 1800 seconds, 6 GB of RAM and 750 GB of hard disk were available for each planner to solve a problem. This resulted in 7540 computing hours (about 315 days), plus a high number of hours devoted to preliminary experimentation with new domains, reruns and bug fixing. The detailed results of the competition, the software used for automating most tasks, the source code of all the participating planners and the descriptions of domains and problems can be found at the competition's web page: http://www.plg.inf.uc3m.es/ipc2011-deterministic. This booklet summarizes the participants in the Deterministic Track of the International Planning Competition (IPC) 2011. Papers describing all the participating planners are included.

    Cost-Based Optimization of Integration Flows

    Get PDF
    Integration flows are increasingly used to specify and execute data-intensive integration tasks between heterogeneous systems and applications. There are many different application areas, such as real-time ETL and data synchronization between operational systems. Owing to increasing data volumes, highly distributed IT infrastructures, and high requirements for data consistency and up-to-date query results, many instances of integration flows are executed over time. Due to this high load and to blocking synchronous source systems, the performance of the central integration platform is crucial for an IT infrastructure. To meet these high performance requirements, we introduce the concept of cost-based optimization of imperative integration flows, which relies on incremental statistics maintenance and inter-instance plan re-optimization. As a foundation, we introduce the concept of periodical re-optimization, including novel cost-based optimization techniques that are tailor-made for integration flows. Furthermore, we refine periodical re-optimization into on-demand re-optimization in order to overcome the problems of many unnecessary re-optimization steps and of adaptation delays, during which we miss optimization opportunities. This approach ensures low optimization overhead and fast workload adaptation.
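
    The on-demand re-optimization idea can be sketched as follows: statistics are maintained incrementally across flow instances, and the plan is re-optimized only when the observed statistics drift away from those the current plan assumed. The drift threshold, the exponential-average update and the optimizer hook below are assumptions for illustration.

        # On-demand re-optimization in miniature.
        class OnDemandReoptimizer:
            def __init__(self, optimize, threshold=0.2):
                self.optimize = optimize   # cost-based plan enumerator (given)
                self.threshold = threshold
                self.stats = {}            # operator id -> running mean cardinality
                self.plan_stats = {}       # statistics the current plan assumed
                self.plan = None

            def observe(self, op, cardinality, alpha=0.1):
                old = self.stats.get(op, cardinality)
                self.stats[op] = (1 - alpha) * old + alpha * cardinality

            def drifted(self):
                return any(abs(v - self.plan_stats.get(k, v)) > self.threshold * max(v, 1)
                           for k, v in self.stats.items())

            def current_plan(self, flow):
                if self.plan is None or self.drifted():  # re-optimize only on demand
                    self.plan = self.optimize(flow, self.stats)
                    self.plan_stats = dict(self.stats)
                return self.plan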

    Sequence-to-sequence learning for machine translation and automatic differentiation for machine learning software tools

    Full text link
    This thesis consists of a series of articles that contribute to the field of machine learning. In particular, it covers two distinct and loosely related fields. The first three articles consider the use of neural network models for problems in natural language processing (NLP). The first article introduces the use of an encoder-decoder structure involving recurrent neural networks (RNNs) to translate from and to variable-length phrases and sentences. The second article contains a quantitative and qualitative analysis of the performance of these 'neural machine translation' models, laying bare the difficulties posed by long sentences and rare words. The third article deals with handling rare and out-of-vocabulary words in neural network models by using dictionary coder compression algorithms and multi-scale RNN models. The second half of this thesis does not deal with specific neural network models, but with the software tools and frameworks that can be used to define and train them. Modern deep learning frameworks need to be able to efficiently execute programs involving linear algebra and array programming, while also being able to employ automatic differentiation (AD) in order to calculate a variety of derivatives. The first article provides an overview of the difficulties posed in reconciling these two objectives, and introduces a graph-based intermediate representation that aims to tackle these difficulties. The second article considers a different approach to the same problem, implementing a tape-based source-code transformation approach to AD on a dynamically typed array programming language (Python and NumPy).
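
    To illustrate what a tape records in reverse-mode AD, the sketch below builds the tape via operator overloading; the article itself uses source-code transformation, so this is a simplified stand-in rather than the article's implementation.

        # Minimal tape-based reverse-mode AD: record one backward step per
        # operation, then replay the tape in reverse to accumulate gradients.
        class Var:
            def __init__(self, value, tape):
                self.value, self.grad, self.tape = value, 0.0, tape

            def accum(self, g):
                self.grad += g

            def __add__(self, other):
                out = Var(self.value + other.value, self.tape)
                self.tape.append(lambda: (self.accum(out.grad),
                                          other.accum(out.grad)))
                return out

            def __mul__(self, other):
                out = Var(self.value * other.value, self.tape)
                self.tape.append(lambda: (self.accum(out.grad * other.value),
                                          other.accum(out.grad * self.value)))
                return out

        def grad(f, x):
            tape = []
            v = Var(x, tape)
            out = f(v)
            out.grad = 1.0
            for step in reversed(tape):   # reverse pass over the tape
                step()
            return v.grad

        print(grad(lambda v: v * v + v, 3.0))  # d/dx (x^2 + x) at 3.0 -> 7.0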

    Teaching a Robot to Drive - A Skill Learning Inspired Approach

    Get PDF
    Robots can make our lives easier by taking over unpleasant or even dangerous tasks for us. To be deployed efficiently, they should be autonomous, adaptive, and easy to instruct. Traditional 'white-box' approaches in robotics rely on the engineer's understanding of the physical structure underlying the given problem. Based on this understanding, the engineer can devise a solution and implement it in the system. This approach is powerful but nevertheless limited. Its most important drawback is that systems built this way depend on predefined knowledge, so every new behavior requires the same expensive development cycle. In contrast, humans and some other animals are not restricted to their innate behaviors but can acquire numerous additional skills during their lifetime. Moreover, they do not appear to need detailed knowledge of the (physical) workings of a given task to do so. These properties are also desirable for artificial systems. In this dissertation we therefore investigate the hypothesis that principles of human skill learning can lead to alternative methods for adaptive system control. We examine this hypothesis on the task of autonomous driving, which is a classic control problem and offers the potential for manifold applications. The concrete task is to learn a basic, anticipatory driving behavior from a human teacher. After outlining relevant aspects of human skill learning and introducing the notions of 'internal models' and 'chunking', we describe their application to the given task. We realize chunking by means of a database in which examples of human driving behavior are stored and linked to descriptions of the visually perceived road trajectory. This is first implemented in a laboratory environment using a robot and later, in the course of the European DRIVSCO project, transferred to a real car. We also investigate the learning of visual 'forward models', which belong to the internal models, and their effect on the robot's control performance. The main result of this interdisciplinary and application-oriented work is a system that is able to generate appropriate action plans in response to the visually perceived road trajectory without requiring metric information. The predicted actions in the laboratory environment are steering and speed; for the real car they are steering and acceleration, although the system's predictive capacity for the latter is limited. That is, the robot learns autonomous driving from a human teacher, and the car learns to predict human driving behavior. The latter was successfully demonstrated to an international team of experts during the project review. The results of this work are relevant to applications in robot control, particularly in the area of intelligent driver assistance systems.
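
    One minimal reading of the chunking database described above is a nearest-neighbour lookup from road-trajectory descriptors to recorded human actions, as sketched below; the feature layout and the data are hypothetical, not those of the thesis.

        # Hypothetical chunking database: store (trajectory descriptor, action)
        # pairs and retrieve an action plan for a new percept by 1-NN lookup.
        import numpy as np
        from sklearn.neighbors import KNeighborsRegressor

        # descriptor: lane curvature at three look-ahead points (made up)
        descriptors = np.array([[0.0, 0.0, 0.0],      # straight road
                                [0.1, 0.2, 0.3],      # entering a left curve
                                [-0.1, -0.2, -0.3]])  # entering a right curve
        actions = np.array([[0.0, 1.0],               # (steering, speed)
                            [0.4, 0.6],
                            [-0.4, 0.6]])

        chunk_db = KNeighborsRegressor(n_neighbors=1).fit(descriptors, actions)
        steer, speed = chunk_db.predict([[0.05, 0.15, 0.25]])[0]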