
    On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems

    We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of the training data. A two-step process is applied: 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform; since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite this transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as the quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves. Appeared in the 2018 IEEE International Congress on Big Data (BigData Congress).
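    The two-step construction can be made concrete with a small sketch. The code below is a minimal single-machine illustration, not the paper's distributed implementation; the class name QuantilePartition, its methods, and the parameter choices are assumptions made here for illustration. It approximates the cumulative distribution function with the q-quantiles of the training values, maps a raw value into [0, 1] via a piecewise-linear empirical CDF, and evaluates equally spaced triangular membership functions that form a Ruspini strong partition in the transformed space.

        import java.util.Arrays;

        // Illustrative sketch of the two-step partitioning idea (assumes q >= 1,
        // numFuzzySets >= 2, and a non-empty training sample).
        public class QuantilePartition {
            private final double[] quantiles; // q-quantiles approximating the CDF
            private final int numFuzzySets;   // fixed number of triangular fuzzy sets

            public QuantilePartition(double[] trainingValues, int q, int numFuzzySets) {
                double[] sorted = trainingValues.clone();
                Arrays.sort(sorted);
                this.quantiles = new double[q + 1];
                for (int i = 0; i <= q; i++) {
                    int idx = (int) Math.round((double) i / q * (sorted.length - 1));
                    quantiles[i] = sorted[idx];
                }
                this.numFuzzySets = numFuzzySets;
            }

            // Step 1: probability integral transform, approximated by the empirical CDF
            // (piecewise-linear interpolation between the stored quantiles), mapping x to [0, 1].
            public double toUniform(double x) {
                if (x <= quantiles[0]) return 0.0;
                if (x >= quantiles[quantiles.length - 1]) return 1.0;
                int i = 1;
                while (quantiles[i] < x) i++;
                double lo = quantiles[i - 1], hi = quantiles[i];
                double frac = hi > lo ? (x - lo) / (hi - lo) : 0.0;
                return (i - 1 + frac) / (quantiles.length - 1);
            }

            // Step 2: Ruspini strong partition with equally spaced triangular membership
            // functions on [0, 1]; memberships of adjacent sets sum to 1.
            public double membership(int fuzzySet, double x) {
                double u = toUniform(x);
                double width = 1.0 / (numFuzzySets - 1);
                double center = fuzzySet * width;
                double d = Math.abs(u - center);
                return d >= width ? 0.0 : 1.0 - d / width;
            }
        }

    Recovering a fuzzy set in the original attribute space then amounts to pushing the triangle's breakpoints through the inverse of toUniform (the quantile function), as described above.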

    Exploring Dynamic Compilation and Cross-Layer Object Management Policies for Managed Language Applications

    Recent years have witnessed the widespread adoption of managed programming languages that are designed to execute on virtual machines. Virtual machine architectures provide several powerful software engineering advantages over statically compiled binaries, such as portable program representations, additional safety guarantees, automatic memory and thread management, and dynamic program composition, which have largely driven their success. To support and facilitate the use of these features, virtual machines implement a number of services that adaptively manage and optimize application behavior during execution. Such runtime services often require tradeoffs between efficiency and effectiveness, and different policies can have major implications for the system's performance and energy requirements. In this work, we extensively explore policies for the two runtime services that are most important for achieving performance and energy efficiency: dynamic (or Just-In-Time (JIT)) compilation and memory management. First, we examine the properties of single-tier and multi-tier JIT compilation policies in order to find strategies that realize the best program performance for existing and future machines. Our analysis performs hundreds of experiments with different compiler aggressiveness and optimization levels to evaluate the performance impact of varying if and when methods are compiled. We later investigate the issue of how to optimize program regions to maximize performance in JIT compilation environments. For this study, we conduct a thorough analysis of the behavior of optimization phases in our dynamic compiler, and construct a custom experimental framework to determine the performance limits of phase selection during dynamic compilation. Next, we explore innovative memory management strategies to improve energy efficiency in the memory subsystem. We propose and develop a novel cross-layer approach to memory management that integrates information and analysis in the VM with fine-grained management of memory resources in the operating system. Using custom as well as standard benchmark workloads, we perform a detailed evaluation that demonstrates the energy-saving potential of our approach. We implement and evaluate all of our studies using the industry-standard Oracle HotSpot Java Virtual Machine to ensure that our conclusions are supported by widely used, state-of-the-art runtime technology.
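    As a rough illustration of what a multi-tier compilation policy decides, the sketch below promotes a method to a higher optimization tier once its invocation count crosses a per-tier threshold. The class name, the number of tiers, and the thresholds are invented here for illustration; they are not HotSpot's actual policy or values.

        import java.util.HashMap;
        import java.util.Map;

        // Toy counter-driven multi-tier JIT trigger: decides *if* and *when* a method
        // is handed to a more aggressive compiler tier.
        public class TieredCompilationPolicy {
            // Invocation-count thresholds that trigger compilation at tiers 1..3 (illustrative).
            private static final int[] TIER_THRESHOLDS = {200, 2_000, 15_000};
            private final Map<String, Integer> invocationCounts = new HashMap<>();
            private final Map<String, Integer> compiledTier = new HashMap<>();

            // Called on every method invocation; returns the tier the method should now be
            // compiled at, or -1 if this call triggers no (re)compilation.
            public int onInvocation(String methodId) {
                int count = invocationCounts.merge(methodId, 1, Integer::sum);
                int current = compiledTier.getOrDefault(methodId, 0);
                for (int tier = TIER_THRESHOLDS.length; tier > current; tier--) {
                    if (count >= TIER_THRESHOLDS[tier - 1]) {
                        compiledTier.put(methodId, tier);
                        return tier; // hand the method to the tier-'tier' compiler
                    }
                }
                return -1; // keep interpreting or keep the current compiled version
            }
        }

    Varying the thresholds (compiler aggressiveness) and the work done at each tier (optimization level) is, in essence, the policy space the first study explores.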

    A Statistical Approach for Detecting Memory Leaks in Java Applications

    Modern managed runtime environments and programming languages greatly simplify the creation and maintenance of applications. One of the best-known examples of such a managed runtime environment and language is the Java Virtual Machine together with the Java programming language. Despite the built-in garbage collector, the memory leak problem is still relevant in Java: memory is wasted because unused objects are prevented from being reclaimed. Memory leaks are especially critical for applications that are expected to run uninterrupted around the clock, since running out of memory is one of the few programming errors that can terminate the whole Java application. The best indicator of whether an object is still in use is the time of its last access; the main disadvantage of this metric, however, is the performance overhead it incurs. This thesis investigates the memory leak problem and proposes a novel approach for memory leak detection and diagnosis. It proposes an alternative way to estimate the 'unusedness' of objects. The main hypothesis is that leaked objects can be distinguished from non-leaked ones by applying statistical methods to object lifetimes, observing the ages of the population of objects grouped by their allocation points. The proposed solution is much more efficient performance-wise, since information about an object needs to be recorded only at the moment of its creation. The research conducted for the thesis has been applied in the memory leak detection tool Plumbr, which is successfully used in a number of production environments. After the introduction and an overview of the state of the art, the thesis reviews existing solutions and proposes a classification of memory leak detection approaches. Next, the statistical approach for memory leak detection is described, together with the main metric used to distinguish leaking objects from non-leaking ones, followed by an analysis of the shortcomings of this single metric. Based on this analysis, additional metrics are designed and machine learning algorithms are applied to statistical data acquired by Plumbr from real applications and production environments. Finally, case studies of real applications and a comparison with one existing memory leak detection solution are performed in order to evaluate the performance overhead of the tool.
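    A minimal sketch of the grouping-by-allocation-point idea is given below: only the creation time of each tracked object is recorded, live objects are grouped by allocation site, and a site is flagged when its live population is large and markedly older than the overall average. The class, the thresholds, and the suspiciousness heuristic are simplifications assumed here for illustration; they are not Plumbr's actual metrics.

        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        // Per-allocation-site age statistics; only creation timestamps are stored per object.
        public class AllocationSiteStats {
            private final Map<String, List<Long>> liveCreationTimes = new HashMap<>();

            // Call when an object is allocated at the given site.
            public void recordAllocation(String allocationSite, long nowMillis) {
                liveCreationTimes.computeIfAbsent(allocationSite, k -> new ArrayList<>())
                                 .add(nowMillis);
            }

            // Call when the garbage collector reclaims an object from this site.
            public void recordCollection(String allocationSite, long creationTimeMillis) {
                List<Long> times = liveCreationTimes.get(allocationSite);
                if (times != null) times.remove(Long.valueOf(creationTimeMillis));
            }

            // Heuristic: a site is suspicious if it holds many live objects whose mean age is
            // far above the mean age across all sites (thresholds are illustrative).
            public List<String> suspiciousSites(long nowMillis, int minLiveObjects, double ageRatio) {
                double globalMeanAge = meanAge(nowMillis, null);
                List<String> suspects = new ArrayList<>();
                for (String site : liveCreationTimes.keySet()) {
                    if (liveCreationTimes.get(site).size() >= minLiveObjects
                            && meanAge(nowMillis, site) > ageRatio * globalMeanAge) {
                        suspects.add(site);
                    }
                }
                return suspects;
            }

            private double meanAge(long nowMillis, String site) {
                long total = 0, count = 0;
                for (Map.Entry<String, List<Long>> e : liveCreationTimes.entrySet()) {
                    if (site != null && !e.getKey().equals(site)) continue;
                    for (long t : e.getValue()) { total += nowMillis - t; count++; }
                }
                return count == 0 ? 0.0 : (double) total / count;
            }
        }

    The cost profile matches the hypothesis above: the per-object work happens once at allocation time, and the statistics are aggregated per site rather than per object access.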

    Efficient branch and node testing

    Software testing evaluates the correctness of a program’s implementation through a test suite. The quality of a test case or suite is assessed with a coverage metric indicating what percentage of a program’s structure was exercised (covered) during execution. Covering every execution path is impossible due to infeasible paths and to loops that yield an exponential or infinite number of paths. Instead, metrics such as the number of statements (nodes) or control-flow branches covered are used. Node and branch coverage require instrumentation probes to be present during program runtime. Traditionally, probes were statically inserted during compilation. These static probes remain even after coverage is recorded, incurring unnecessary overhead, reducing the number of tests that can be run, or requiring large amounts of memory. In this dissertation, I present three novel techniques for improving branch and node coverage performance for the Java runtime. First, Demand-driven Structural Testing (DDST) uses dynamic insertion and removal of probes so they can be removed after recording coverage, avoiding the unnecessary overhead of static instrumentation. DDST is built on Jazz, a new framework for developing and researching coverage techniques. DDST for node coverage is on average 19.7% faster than statically inserted instrumentation on an industry-standard benchmark suite, SPECjvm98. Because DDST’s probes are more expensive than static ones, no single branch coverage technique performs best on all programs or methods. To address this, I developed Hybrid Structural Testing (HST). HST combines different test techniques, including static instrumentation and DDST, into one run. HST uses a cost model for its analysis, reducing the cost of branch coverage testing by an average of 48% versus static instrumentation and 56% versus DDST on SPECjvm98. However, HST never chooses certain techniques because their analysis is too expensive. I therefore developed a third technique, Test Plan Caching (TPC), which exploits the inherent repetition in testing over a suite. TPC saves analysis results to avoid recomputation. Combined with HST, TPC produces a mix of techniques that records coverage quickly and efficiently. Together, my three techniques reduce the average cost of branch coverage by 51.6–90.8% over previous approaches on SPECjvm98, allowing twice as many test cases to run in a given time budget.
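    The core idea behind demand-driven probes can be illustrated in a few lines: each probe records its node once and then deactivates itself, so subsequent executions of the same node pay almost nothing. Real DDST removes the probe from the generated code entirely rather than testing a flag; the class and method names below are illustrative, not from Jazz.

        import java.util.Arrays;
        import java.util.BitSet;

        // Flag-based sketch of "record once, then remove" node-coverage probes.
        public class NodeCoverage {
            private final BitSet covered;
            private final boolean[] probeActive;

            public NodeCoverage(int numNodes) {
                covered = new BitSet(numNodes);
                probeActive = new boolean[numNodes];
                Arrays.fill(probeActive, true);
            }

            // Instrumentation calls this at the start of each basic block (node).
            public void probe(int nodeId) {
                if (!probeActive[nodeId]) return;   // probe already "removed"
                covered.set(nodeId);
                probeActive[nodeId] = false;        // record once, then deactivate
            }

            public double coverageRatio() {
                return (double) covered.cardinality() / probeActive.length;
            }
        }

    A hybrid scheme in the spirit of HST would pick, per method, whether such dynamic probes or cheaper static ones are expected to cost less over the whole suite.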

    Reducing energy usage in resource-intensive Java-based scientific applications via micro-benchmark based code refactorings

    In-silico research has grown considerably. Today's scientific code involves long-running computer simulations, and hence powerful computing infrastructures are needed. Traditionally, research in high-performance computing has focused on executing code as fast as possible, while energy has only recently been recognized as another goal to consider. Yet energy-driven research has mostly focused on the hardware and middleware layers; few efforts target the application level, where many energy-aware optimizations are possible. We revisit a catalog of Java primitives commonly used in OO scientific programming, or micro-benchmarks, to identify energy-friendly versions of the same primitive. We then apply the micro-benchmarks to classical scientific application kernels and machine learning algorithms, for both single-thread and multi-thread implementations, on a server. Energy usage reductions at the micro-benchmark level are substantial, while for the applications the obtained reductions range from 3.90% to 99.18%.
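    The paper's catalog of primitives is not reproduced here, but the flavor of a micro-benchmark-level refactoring can be shown with one commonly cited example: replacing an autoboxing-heavy accumulation over boxed values with a loop over primitive arrays, which reduces allocation and memory traffic and, with it, energy. The class and method names below are assumptions made for illustration.

        import java.util.List;

        // One representative "primitive swap": boxed accumulation vs. primitive arrays.
        public class DotProduct {
            // Baseline: boxed Doubles; each access unboxes an object on the heap.
            static double boxedDot(List<Double> a, List<Double> b) {
                double sum = 0.0;
                for (int i = 0; i < a.size(); i++) {
                    sum += a.get(i) * b.get(i);
                }
                return sum;
            }

            // Refactored: primitive arrays, no boxing, better locality.
            static double primitiveDot(double[] a, double[] b) {
                double sum = 0.0;
                for (int i = 0; i < a.length; i++) {
                    sum += a[i] * b[i];
                }
                return sum;
            }
        }

    Applying a handful of such swaps throughout a kernel is the kind of application-level refactoring whose aggregate energy effect the study measures.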
    • 
