OFSET_mine: an integrated framework for cardiovascular diseases risk prediction based on retinal vascular function
As cardiovascular disease (CVD) represents a spectrum of disorders that often manifest for the first time through an acute life-threatening event, early identification of seemingly healthy subjects with various degrees of risk is a priority. More recently, traditional scores used for early identification of CVD risk are slowly being replaced by more sensitive biomarkers that assess individual, rather than population, risk for CVD. Among these, retinal vascular function, as assessed by the retinal vessel analysis (RVA) method, has been proven an accurate reflection of subclinical CVD in groups of participants without overt disease but with certain inherited or acquired risk factors. Furthermore, in order to correctly detect individual risk at an early stage, specialized machine learning methods and feature selection techniques that can cope with the characteristics of the data need to be devised. The main contribution of this thesis is an integrated framework, OFSET_mine, that combines novel machine learning methods to produce a bespoke solution for cardiovascular risk prediction based on RVA data that is also applicable to other medical datasets with similar characteristics. The three identified essential characteristics are 1) an imbalanced dataset, 2) high dimensionality and 3) overlapping feature ranges with the possibility of acquiring new samples. The thesis proposes FiltADASYN as an oversampling method that deals with imbalance, DD_Rank as a feature selection method that handles high dimensionality, and GCO_mine as a method for individual-based classification, all three integrated within the OFSET_mine framework. The new oversampling method FiltADASYN extends Adaptive Synthetic Oversampling (ADASYN) with an additional step that filters the generated samples and improves the reliability of the resultant sample set. The feature selection method DD_Rank is based on the Restricted Boltzmann Machine (RBM) and ranks features according to their stability and discrimination power.
GCO_mine is a lazy learning method based on Graph Cut Optimization (GCO), which considers both the local arrangements and the global structure of the data. OFSET_mine compares favourably to well-established composite techniques. It exhibits high classification performance when applied to a wide range of benchmark medical datasets with variable sample sizes, dimensionality and imbalance ratios. When applying OFSET_mine to our RVA data, an accuracy of 99.52% is achieved. In addition, using OFSET, the hybrid solution of FiltADASYN and DD_Rank, with Random Forest on our RVA data produces risk group classifications with an accuracy of 99.68%. This not only reflects the success of the framework but also establishes RVA as a valuable cardiovascular risk predictor.
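The abstract does not spell out FiltADASYN's filtering criterion, so the following is only a rough sketch of the idea: ADASYN-style synthetic sample generation followed by an assumed filter that keeps a synthetic sample only when its nearest original neighbour is a minority point. The function name and the filtering rule are illustrative, not the thesis's method.

```python
import numpy as np

def adasyn_with_filter(X_min, X_maj, k=5, n_new=20, seed=0):
    # ADASYN core: generate more synthetic samples near minority points
    # that lie close to the majority class; then (FiltADASYN-style, with
    # an assumed criterion) filter out unreliable synthetic samples.
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_min, X_maj])
    y_all = np.array([1] * len(X_min) + [0] * len(X_maj))
    # difficulty ratio r_i: fraction of majority points among k nearest neighbours
    r = []
    for x in X_min:
        d = np.linalg.norm(X_all - x, axis=1)
        nn = np.argsort(d)[1:k + 1]          # skip the point itself
        r.append(np.mean(y_all[nn] == 0))
    r = np.array(r)
    if r.sum() == 0:                          # all minority points "easy"
        r = np.ones_like(r)                   # fall back to uniform weights
    g = np.round(r / r.sum() * n_new).astype(int)  # samples per minority point
    synth = []
    for i, gi in enumerate(g):
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]
        for _ in range(gi):
            j = rng.choice(nn)                # interpolate towards a minority neighbour
            lam = rng.random()
            synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    synth = np.array(synth)
    # filtering step (assumed criterion): keep a synthetic sample only if
    # its nearest original neighbour belongs to the minority class
    keep = []
    for s in synth:
        d = np.linalg.norm(X_all - s, axis=1)
        keep.append(y_all[np.argmin(d)] == 1)
    return synth[np.array(keep)]
```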
Enabling Ubiquitous OLAP Analyses
An OLAP analysis session is carried out as a sequence of OLAP operations applied to multidimensional cubes. At each step of a session, an operation is applied to the result of the previous step in an incremental fashion. Due to its simplicity and flexibility, OLAP is the most widely adopted paradigm for exploring the data stored in data warehouses. With the goal of broadening the reach of OLAP analyses, in this thesis we touch on several critical topics. We first present our contributions to data extraction from service-oriented sources, which are nowadays used to provide access to many databases and analytic platforms. By addressing data extraction from these sources we take a step towards the integration of external databases into the data warehouse, thus providing richer data that can be analyzed through OLAP sessions. The second topic that we study is the visualization of multidimensional data, which we exploit to enable OLAP on devices with limited screen and bandwidth capabilities (i.e., mobile devices). Finally, we propose solutions to obtain multidimensional schemata from unconventional sources (e.g., sensor networks), which are crucial to performing multidimensional analyses.
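The incremental nature of an OLAP session, where each operation consumes the previous step's result, can be illustrated with a toy cube; dimension and measure names here are invented for illustration.

```python
from collections import defaultdict

# Toy cube: facts with dimensions (year, country) and a measure (sales).
facts = [
    {"year": 2023, "country": "IT", "sales": 10},
    {"year": 2023, "country": "FR", "sales": 7},
    {"year": 2024, "country": "IT", "sales": 12},
]

def rollup(cube, dims):
    # Aggregate the measure over the given dimensions (OLAP roll-up).
    acc = defaultdict(int)
    for f in cube:
        acc[tuple(f[d] for d in dims)] += f["sales"]
    return [dict(zip(dims, k), sales=v) for k, v in acc.items()]

def slice_(cube, dim, value):
    # Fix one dimension to a single value (OLAP slice).
    return [f for f in cube if f[dim] == value]

# An OLAP session: each step is applied to the previous step's result.
step1 = slice_(facts, "country", "IT")
step2 = rollup(step1, ["year"])
```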
Lazy Contracts: Alleviating High Gas Costs by Secure and Trustless Off-chain Execution of Smart Contracts
Smart contracts are programs that are executed on the blockchain and can
hold, manage and transfer assets in the form of cryptocurrencies. The
contract's execution is then performed on-chain and is subject to consensus,
i.e. every node on the blockchain network has to run the function calls and
keep track of their side-effects. In most programmable blockchains, such as
Ethereum, the notion of gas is introduced to prevent DoS attacks by malicious
parties who might try to slow down the network by performing heavy
computations. A fixed gas cost is assigned to each atomic operation, and the
initiator of a function call pays the total gas cost as a transaction fee.
This helps prevent
DoS attacks, but the resulting fees are extremely high. For example, in 2022,
on Ethereum alone, total gas usage amounted to 1.77 million ETH, roughly 4.3
billion USD. This thesis proposes "lazy contracts" as a solution to alleviate
these costs. Our solution moves most of the computation off-chain, ensuring
that each function call incurs only a tiny amount of gas usage, while
preserving enough data on-chain to guarantee an implicit consensus about the
state of the contract variables and ownership of funds. A complete on-chain
execution of the functions will only be triggered in case two parties to the
contract are in disagreement about the current state, which in turn can only
happen if at least one party is dishonest. In such cases, our protocol can
identify the dishonest party and penalize them by having them pay for the
entire gas usage. Hence, no rational party has an incentive to act dishonestly.
Finally, we perform extensive experiments over 160,735 real-world Solidity
contracts that were involved in 9,055,492 transactions in January 2022--January
2023 on Ethereum and show that our approach reduces the overall gas usage by
55.4%, which amounts to an astounding saving of 109.9 million USD in gas fees.
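The optimistic pattern described above, tiny on-chain commitments for honest runs and full re-execution only on dispute, can be sketched as follows. The gas figures and state encoding are toy assumptions for illustration, not the thesis's actual protocol.

```python
import hashlib

def commit(state):
    # Only this hash of the contract state is kept on-chain.
    return hashlib.sha256(repr(sorted(state.items())).encode()).hexdigest()

class LazyContract:
    def __init__(self, state):
        self.onchain = commit(state)   # on-chain commitment
        self.gas = 0

    def offchain_step(self, state, call):
        # Both parties run the call locally; only the new commitment goes on-chain.
        new_state = call(state)
        self.gas += 1                  # tiny constant cost per off-chain step
        self.onchain = commit(new_state)
        return new_state

    def dispute(self, initial_state, calls, claims):
        # Full on-chain re-execution, triggered only on disagreement.
        state = initial_state
        for call in calls:
            self.gas += 100            # heavy on-chain execution cost
            state = call(state)
        self.onchain = commit(state)
        # Whoever claimed a different final state is dishonest and pays the bill.
        return [party for party, c in claims.items() if c != self.onchain]

# Honest run: two transfers cost 2 units of gas instead of 200.
state = {"alice": 5, "bob": 5}
pay = lambda s: {"alice": s["alice"] - 1, "bob": s["bob"] + 1}
c = LazyContract(state)
s1 = c.offchain_step(state, pay)
s2 = c.offchain_step(s1, pay)
```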
Hybrid eager and lazy evaluation for efficient compilation of Haskell
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (p. 208-220). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
The advantage of a non-strict, purely functional language such as Haskell lies in its clean equational semantics. However, lazy implementations of Haskell fall short: they cannot express tail recursion gracefully without annotation. We describe resource-bounded hybrid evaluation, a mixture of strict and lazy evaluation, and its realization in Eager Haskell. From the programmer's perspective, Eager Haskell is simply another implementation of Haskell with the same clean equational semantics. Iteration can be expressed using tail recursion, without the need to resort to program annotations. Under hybrid evaluation, computations are ordinarily executed in program order, just as in a strict functional language. When particular stack, heap, or time bounds are exceeded, suspensions are generated for all outstanding computations. These suspensions are re-started in a demand-driven fashion from the root. The Eager Haskell compiler translates λC, the compiler's intermediate representation, to efficient C code. We use an equational semantics for λC to develop simple correctness proofs for program transformations, and connect actions in the run-time system to steps in the hybrid evaluation strategy. The focus of compilation is efficiency in the common case of straight-line execution; the handling of non-strictness and suspension is left to the run-time system. Several additional contributions have resulted from the implementation of hybrid evaluation. Eager Haskell is the first eager compiler to use a call stack. Our generational garbage collector uses this stack as an additional predictor of object lifetime. Objects above a stack watermark are assumed to be likely to die; we avoid promoting them. Those below are likely to remain untouched and are therefore good candidates for promotion. To avoid eagerly evaluating error checks, they are compiled into special bottom thunks, which are treated specially by the run-time system. The compiler identifies error-handling code using a mixture of strictness and type information. This information is also used to avoid inlining error handlers, and to enable aggressive program transformation in the presence of error handling. by Jan-Willem Maessen. Ph.D.
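The resource-bounded strategy, eager execution until a bound is exceeded and suspension thereafter, can be caricatured in a few lines. This is a deliberately simplified model with an invented fuel counter, not Eager Haskell's actual mechanism.

```python
class Thunk:
    # A suspended computation, forced on demand and memoized.
    def __init__(self, fn):
        self.fn, self.value, self.done = fn, None, False

    def force(self):
        if not self.done:
            self.value, self.done = self.fn(), True
        return self.value

class Evaluator:
    def __init__(self, fuel):
        self.fuel = fuel               # stand-in for stack/heap/time bounds

    def eval(self, fn):
        # Eager in the common case; suspend once the budget is exceeded.
        if self.fuel > 0:
            self.fuel -= 1
            return fn()                # evaluated now, in program order
        return Thunk(fn)               # outstanding work becomes a suspension

ev = Evaluator(fuel=2)
a = ev.eval(lambda: 1 + 1)             # eager
b = ev.eval(lambda: 2 * 3)             # eager
c = ev.eval(lambda: 10 ** 2)           # budget spent: suspended
```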
Efficient Precise Dynamic Data Race Detection for CPU and GPU
Data races are notorious bugs. They introduce non-determinism into program behavior and complicate program semantics, making it challenging to debug parallel programs. To make parallel programming easier, efficient data race detection has been an active research topic over the last decades. However, existing data race detectors either sacrifice precision or incur high overhead, limiting their application to real-world programs and scenarios. This dissertation proposes approaches to improve the performance of dynamic data race detection without undermining precision, by identifying and removing metadata redundancy dynamically. It also explores ways to make dynamic data race detection practical for GPU programs, which have a programming and execution model very different from that of CPU workloads. Further, this dissertation shows how the structured synchronization model of GPU programs can simplify the design of a data race detection algorithm, and how the unique patterns in GPU workloads enable an efficient implementation of that algorithm, yielding a high-performance dynamic data race detector for GPU programs.
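As background, a minimal happens-before detector, not the dissertation's algorithm, shows the kind of per-access metadata such tools maintain: each thread keeps a vector clock, each variable records the clock of its last write, and a race is flagged when an access is not ordered after that write.

```python
from collections import defaultdict

class Detector:
    def __init__(self):
        self.tclock = defaultdict(lambda: defaultdict(int))  # thread -> vector clock
        self.wclock = {}                                     # var -> (writer tid, its VC)
        self.races = []

    def write(self, tid, var):
        self.tclock[tid][tid] += 1      # local step
        self._check(tid, var)
        self.wclock[var] = (tid, dict(self.tclock[tid]))

    def _check(self, tid, var):
        if var in self.wclock:
            wtid, wvc = self.wclock[var]
            # Race: a different thread accesses without having "seen" the write.
            if wtid != tid and self.tclock[tid].get(wtid, 0) < wvc.get(wtid, 0):
                self.races.append((var, wtid, tid))

    def sync(self, src, dst):
        # dst synchronizes with src (e.g. acquires a lock src released):
        # join the vector clocks so src's writes happen-before dst's accesses.
        for t, c in self.tclock[src].items():
            self.tclock[dst][t] = max(self.tclock[dst][t], c)

det = Detector()
det.write(0, "x")       # thread 0 writes x
det.write(1, "x")       # unsynchronized write by thread 1: a race on x
```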
The 2011 International Planning Competition
After a three-year gap, the 2011 edition of the IPC involved a total of 55 planners,
some of them versions of the same planner, distributed among four tracks: the sequential
satisficing track (27 planners submitted out of 38 registered), the sequential multicore
track (8 planners submitted out of 12 registered), the sequential optimal track (12
planners submitted out of 24 registered) and the temporal satisficing track (8 planners
submitted out of 14 registered). Three more tracks were open to participation: temporal
optimal, preferences satisficing and preferences optimal. Unfortunately, the number of submitted planners did not allow these tracks to be included in the final competition.
A total of 55 people participated, grouped into 31 teams. Participants came
from Australia, Canada, China, France, Germany, India, Israel, Italy, Spain, UK and
USA.
For the sequential tracks, 14 domains, with 20 problems each, were selected, while
the temporal one had 12 domains, also with 20 problems each. Both new and past
domains were included. As in previous competitions, domains and problems were
unknown to the participants, and all the experimentation was carried out by the organizers.
To run the competition, a cluster of eleven 64-bit computers (Intel Xeon 2.93 GHz
quad-core processors) running Linux was set up. Up to 1800 seconds, 6 GB of RAM and 750 GB of disk were available for each planner to solve a problem. This resulted in 7540 computing hours (about 315 days), plus a high number of hours devoted to preliminary experimentation with new domains, reruns and bug fixing.
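The reported 7540 computing hours can be reproduced from the track figures above, assuming every submitted planner ran for the full 1800-second limit on every problem of its tracks:

```python
# Sanity check of the reported computing time.
sequential_planners = 27 + 8 + 12          # satisficing + multicore + optimal
temporal_planners = 8
seq_runs = sequential_planners * 14 * 20   # 14 domains x 20 problems each
tmp_runs = temporal_planners * 12 * 20     # 12 domains x 20 problems each
hours = (seq_runs + tmp_runs) * 1800 / 3600
print(hours, hours / 24)                   # 7540 hours, a little over 314 days
```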
The detailed results of the competition, the software used for automating most
tasks, the source code of all the participating planners and the description of domains and problems can be found at the competition’s web page:
http://www.plg.inf.uc3m.es/ipc2011-deterministic
This booklet summarizes the participants in the Deterministic Track of the International
Planning Competition (IPC) 2011. Papers describing all the participating planners
are included.
Cost-Based Optimization of Integration Flows
Integration flows are increasingly used to specify and execute data-intensive integration tasks between heterogeneous systems and applications. There are many different application areas, such as real-time ETL and data synchronization between operational systems. Owing to increasing data volumes, highly distributed IT infrastructures, and strong requirements for data consistency and up-to-date query results, many instances of integration flows are executed over time. Due to this high load and to blocking synchronous source systems, the performance of the central integration platform is crucial for an IT infrastructure. To meet these high performance requirements, we introduce the concept of cost-based optimization of imperative integration flows, which relies on incremental statistics maintenance and inter-instance plan re-optimization. As a foundation, we introduce the concept of periodical re-optimization, including novel cost-based optimization techniques that are tailor-made for integration flows. Furthermore, we refine periodical re-optimization into on-demand re-optimization in order to overcome the problems of many unnecessary re-optimization steps and of adaptation delays, during which we miss optimization opportunities. This approach ensures low optimization overhead and fast workload adaptation.
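The shift from periodical to on-demand re-optimization can be illustrated with a toy sketch in which re-optimization is triggered only when incrementally maintained statistics drift beyond a threshold; the trigger condition and class names are assumptions, not the thesis's actual cost model.

```python
class FlowOptimizer:
    def __init__(self, threshold=0.2):
        self.threshold = threshold
        self.plan_stats = None        # statistics the current plan was built for
        self.reoptimizations = 0

    def observe(self, stats):
        # Incremental statistics maintenance after each flow instance.
        if self.plan_stats is None:
            self._reoptimize(stats)
        elif abs(stats - self.plan_stats) / self.plan_stats > self.threshold:
            self._reoptimize(stats)   # on-demand: only when the workload shifts

    def _reoptimize(self, stats):
        # Stand-in for cost-based plan re-optimization.
        self.plan_stats = stats
        self.reoptimizations += 1

opt = FlowOptimizer()
for size in [100, 105, 98, 300, 310]:  # workload shift at 300
    opt.observe(size)
```

Unlike a periodical scheme, which would re-optimize on every interval regardless of need, this run re-optimizes only twice: once initially and once at the workload shift.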
Sequence-to-sequence learning for machine translation and automatic differentiation for machine learning software tools
This thesis consists of a series of articles that contribute to the field of machine learning. In particular, it covers two distinct and loosely related fields.
The first three articles consider the use of neural network models for problems in natural language processing (NLP). The first article introduces the use of an encoder-decoder structure involving recurrent neural networks (RNNs) to translate from and to variable-length phrases and sentences. The second article contains a quantitative and qualitative analysis of the performance of these 'neural machine translation' models, laying bare the difficulties posed by long sentences and rare words. The third article deals with handling rare and out-of-vocabulary words in neural network models by using dictionary coder compression algorithms and multi-scale RNN models.
The second half of this thesis does not deal with specific neural network models, but with the software tools and frameworks that can be used to define and train them. Modern deep learning frameworks need to be able to efficiently execute programs involving linear algebra and array programming, while also being able to employ automatic differentiation (AD) in order to calculate a variety of derivatives. The first article provides an overview of the difficulties posed in reconciling these two objectives, and introduces a graph-based intermediate representation that aims to tackle these difficulties. The second article considers a different approach to the same problem, implementing a tape-based source-code transformation approach to AD on a dynamically typed array programming language (Python and NumPy).
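A minimal tape-based reverse-mode AD, unrelated to the thesis's actual implementation, illustrates the approach the last article builds on: operations record their backward rules on a tape, which is replayed in reverse to accumulate gradients.

```python
class Var:
    def __init__(self, value, tape=None):
        self.value, self.grad = value, 0.0
        self.tape = tape if tape is not None else []

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        # Record how to propagate the output gradient back to the inputs.
        self.tape.append(lambda: (self._acc(out.grad * other.value),
                                  other._acc(out.grad * self.value)))
        return out

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        self.tape.append(lambda: (self._acc(out.grad), other._acc(out.grad)))
        return out

    def _acc(self, g):
        self.grad += g

    def backward(self):
        self.grad = 1.0
        for op in reversed(self.tape):   # replay the tape in reverse
            op()

tape = []
x, y = Var(3.0, tape), Var(2.0, tape)
z = x * y + x                            # z = x*y + x
z.backward()                             # dz/dx = y + 1, dz/dy = x
```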
Teaching a Robot to Drive - A Skill Learning Inspired Approach
Robots can make our lives easier by taking over unpleasant, or even dangerous, tasks for us. To be deployed efficiently, they should be autonomous, adaptive, and easy to instruct. Traditional 'white-box' approaches in robotics rely on the engineer's understanding of the underlying physical structure of the given problem. Based on this understanding, the engineer can devise a possible solution and implement it in the system. This approach is very powerful, but nonetheless limited. Its most important drawback is that systems built this way depend on predefined knowledge, so every new behavior requires the same expensive development cycle. By contrast, humans and some other animals are not restricted to their innate behaviors but can acquire numerous additional skills over their lifetime. Moreover, they do not appear to need detailed knowledge of the (physical) workings of a given task. These properties are also desirable for artificial systems. In this dissertation we therefore investigate the hypothesis that principles of human skill learning can lead to alternative methods for adaptive system control. We examine this hypothesis on the task of autonomous driving, a classic system-control problem that offers opportunities for a wide range of applications. The concrete task is to learn a basic, anticipatory driving behavior from a human teacher. After presenting the relevant aspects of human skill learning and introducing the concepts of 'internal models' and 'chunking', we describe their application to the given task. We realize chunking by means of a database in which examples of human driving behavior are stored and linked to descriptions of the visually perceived road trajectory. This is first implemented in a laboratory setting with a robot and later, in the course of the European DRIVSCO project, transferred to a real car. We also investigate the learning of visual 'forward models', which belong to the internal models, and their effect on the robot's control performance. The main result of this interdisciplinary and application-oriented work is a system that can generate appropriate action plans in response to the visually perceived road trajectory without requiring metric information. The predicted actions in the laboratory setting are steering and speed; for the real car, steering and acceleration, although the system's predictive capacity for the latter is limited. That is, the robot learns autonomous driving from a human teacher, and the car learns to predict human driving behavior. The latter was successfully demonstrated to an international team of experts during the project review. The results of this work are relevant for applications in robot control, particularly in the area of intelligent driver assistance systems.
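The chunking database, linking visually perceived road descriptors to stored human actions, can be sketched as a nearest-neighbour lookup; the descriptors, action fields, and class name here are invented for illustration.

```python
import math

class DrivingMemory:
    def __init__(self):
        self.chunks = []   # (road descriptor, action) pairs from the teacher

    def record(self, road, action):
        # Store one example of human driving linked to its road description.
        self.chunks.append((road, action))

    def plan(self, road):
        # Lazy, instance-based recall: the closest stored situation wins.
        _, action = min(self.chunks, key=lambda c: math.dist(c[0], road))
        return action

mem = DrivingMemory()
mem.record((0.0, 0.1), {"steer": 0.0, "speed": 1.0})   # straight road
mem.record((0.8, 0.5), {"steer": 0.6, "speed": 0.4})   # sharp right bend
plan = mem.plan((0.7, 0.4))                            # close to the bend
```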