
    Energy Complexity for Sorting Algorithms in Java

    This study extends the concept of time complexity to energy, i.e., energy complexity, by showing a strong correlation between time complexity and energy consumption for sorting algorithms: Bubble Sort, Counting Sort, Merge Sort and Quick Sort, written in Java and run on single cores. We investigate the correlation between wall time and time complexity, as well as the correlation between energy consumption and wall time. The primary finding is that time complexity can be used as a guideline to estimate the energy consumption of O(n^2), O(n log n) and O(n + k) sorting algorithms. The secondary finding is that the inputs producing the theoretical worst cases for Merge Sort and Bubble Sort produced neither the worst-case wall time nor the worst-case energy consumption.
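
    As a rough sketch of the wall-time side of this methodology (in Python rather than the Java used in the study; function names and input sizes are illustrative assumptions), doubling n should roughly quadruple the wall time of an O(n^2) sort while only slightly more than doubling that of an O(n log n) sort:

```python
import random
import time

def bubble_sort(a):
    """O(n^2) comparison sort used as the quadratic reference."""
    a = list(a)
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

def wall_time(sort_fn, data):
    """Return the wall time of one sorting run in seconds."""
    start = time.perf_counter()
    sort_fn(data)
    return time.perf_counter() - start

# Doubling n should roughly quadruple bubble sort's wall time (O(n^2))
# but only slightly more than double the built-in Timsort's (O(n log n)).
for n in (2_000, 4_000, 8_000):
    data = [random.randint(0, 1_000_000) for _ in range(n)]
    print(n, wall_time(bubble_sort, data), wall_time(sorted, data))
```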

    Sorting Algorithms and Their Execution Times: An Empirical Evaluation

    One of the main topics in computer science is how to sort data without requiring excessive resources and time. The sorting algorithms Quicksort, Mergesort, Timsort, Heapsort, Bubblesort, Insertion Sort, Selection Sort, Tree Sort, Shell Sort, Radix Sort and Counting Sort are among the most recognized and widely used. The existence of so many sorting algorithms led us to ask: which algorithm gives the best execution times? In this context, it was necessary to understand the various sorting algorithms in the C and Python programming languages in order to evaluate them and determine which one has the shortest execution time. We implement routines that create four types of integer arrays (random, almost ordered, inverted, and few unique). We implement eleven sorting algorithms and record each execution time, using different numbers of elements and iterations to verify accuracy. We carry out the research using the integrated development environments Dev-C++ 5.11 and Sublime Text 3. The results allow us to identify the situations in which each algorithm shows better execution times.
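
    A minimal Python sketch of the input-generation step described above (parameter choices such as the 1% swap rate and the five-value pool are our own assumptions, not taken from the paper):

```python
import random

def make_input(n, kind):
    """Generate one of the four integer-array categories used in the study:
    'random', 'almost_ordered', 'inverted', or 'few_unique'."""
    if kind == "random":
        return [random.randint(0, n) for _ in range(n)]
    if kind == "almost_ordered":
        a = list(range(n))
        for _ in range(max(1, n // 100)):          # perturb roughly 1% of positions
            i, j = random.randrange(n), random.randrange(n)
            a[i], a[j] = a[j], a[i]
        return a
    if kind == "inverted":
        return list(range(n, 0, -1))
    if kind == "few_unique":
        return [random.choice((1, 2, 3, 4, 5)) for _ in range(n)]
    raise ValueError(kind)

# Each array type can then be passed to every sorting implementation and timed.
arrays = {k: make_input(10_000, k) for k in
          ("random", "almost_ordered", "inverted", "few_unique")}
```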

    Understanding Random Forests: From Theory to Practice

    Data analysis and machine learning have become an integral part of the modern scientific methodology, offering automated procedures for the prediction of a phenomenon based on past observations, unraveling underlying patterns in data and providing insights about the problem. Yet, caution should be taken not to use machine learning as a black-box tool, but rather to consider it as a methodology, with a rational thought process that is entirely dependent on the problem under study. In particular, the use of algorithms should ideally require a reasonable understanding of their mechanisms, properties and limitations, in order to better apprehend and interpret their results. Accordingly, the goal of this thesis is to provide an in-depth analysis of random forests, consistently calling into question each and every part of the algorithm, in order to shed new light on its learning capabilities, inner workings and interpretability. The first part of this work studies the induction of decision trees and the construction of ensembles of randomized trees, motivating their design and purpose whenever possible. Our contributions follow with an original complexity analysis of random forests, showing their good computational performance and scalability, along with an in-depth discussion of their implementation details, as contributed within Scikit-Learn. In the second part of this work, we analyse and discuss the interpretability of random forests in the eyes of variable importance measures. The core of our contributions rests in the theoretical characterization of the Mean Decrease of Impurity variable importance measure, from which we prove and derive some of its properties in the case of multiway totally randomized trees and in asymptotic conditions. In consequence of this work, our analysis demonstrates that variable importances [...] Comment: PhD thesis. Source code available at https://github.com/glouppe/phd-thesi
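
    Since the implementation work was contributed to Scikit-Learn, here is a short hedged sketch of how the Mean Decrease of Impurity importances discussed above surface in that library (the dataset and hyperparameters below are arbitrary placeholders, not choices from the thesis):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Fit an ensemble of randomized trees and read off the Mean Decrease of
# Impurity (MDI) importances, exposed by scikit-learn as feature_importances_.
X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

ranked = sorted(zip(forest.feature_importances_,
                    load_breast_cancer().feature_names), reverse=True)
for importance, name in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```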

    A Mobile Wireless Channel State Recognition Algorithm: Introduction, Definition, and Verification - Sensing for Cognitive Environmental Awareness

    This research concerns mobile wireless systems limited by time- and frequency-dispersive channels. A blind mobile wireless channel (MWC) state recognition (CSR) algorithm that detects hidden coherent nonselective and noncoherent selective processes is verified. Because the algorithm is blind, it releases capacity that is traditionally fixed and reserved for channel gain estimation and distortion mitigation, based on the current channel state. The CSR algorithm enables cognitive communication system control, including signal processing, resource allocation/deallocation, or distortion mitigation selections based on channel coherence states. MWC coherent and noncoherent states, ergodicity, stationarity, uncorrelated scattering, and Markov processes are assumed for each time block. Furthermore, a hidden Markov model (HMM) is used to represent the statistical relationships between hidden dispersive processes and observed receive waveform processes. First-order and second-order extracted statistical features support hard state decisions, which are combined to increase the accuracy of channel state estimates. This research effort has architected, designed, and verified a blind statistical feature recognition algorithm capable of detecting coherent nonselective, single time-selective, single frequency-selective, or dual-selective noncoherent states. A MWC coherence state model (CSM) was designed to represent these hidden dispersive processes. Extracted statistical features are input into a parallel set of trained HMMs that compute state sequence conditional likelihoods. Hard state decisions are combined to produce a single most likely channel state estimate for each time block. To verify CSR algorithm performance, combinations of hidden state sequences are applied to the CSR algorithm and the outputs are verified against the input hidden state sequences. State sequence recognition sensitivity was found to be above 99% and specificity above 98%, averaged across all features, states, and sequences. While these results establish the feasibility of a blind MWC CSR algorithm, optimal configuration requires future research to further improve performance, including: 1) characterizing the range of input signal configurations, 2) waveform feature block size reduction, 3) HMM parameter tracking, 4) HMM computational complexity and latency reduction, 5) feature soft decision combining, 6) recursive implementation, 7) interfacing with state-based mobile wireless communication control processes, and 8) extension to wired or wireless waveform recognition.
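
    As a hedged illustration of the final classification stage (a parallel bank of trained HMMs scored per time block, with a maximum-likelihood hard decision), the Python sketch below uses the third-party hmmlearn package as a stand-in for the dissertation's own implementation; the state names, feature dimensions and synthetic data are placeholders.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM   # third-party: pip install hmmlearn

STATES = ["nonselective", "time_selective", "freq_selective", "dual_selective"]

def train_models(training_features):
    """training_features: dict mapping state name -> (n_samples, n_features)
    array of extracted waveform statistics recorded under that known state."""
    models = {}
    for state, X in training_features.items():
        m = GaussianHMM(n_components=2, covariance_type="diag", n_iter=50)
        m.fit(X)
        models[state] = m
    return models

def classify_block(models, X_block):
    """Hard decision: the state whose HMM gives the highest log-likelihood."""
    scores = {state: m.score(X_block) for state, m in models.items()}
    return max(scores, key=scores.get)

# Toy usage with random features, one model per coherence state.
rng = np.random.default_rng(0)
train = {s: rng.normal(loc=i, size=(200, 3)) for i, s in enumerate(STATES)}
models = train_models(train)
print(classify_block(models, rng.normal(loc=2, size=(50, 3))))
```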

    Applying Secure Multi-Party Computation

    Data is useful only when used. This is especially true if one is able to combine several data sets. For example, by combining income and educational data, a government can get a return-on-investment overview of its educational investments. The same holds in the private sector: by combining databases of their customers' financial obligations, banks could issue loans with lower credit risk. However, this kind of data sharing is often forbidden, as citizens and customers have privacy expectations. Rightly so, because such a combined database becomes an attractive target for hackers as well as for nosy officials and administrators who might abuse their access. Secure multi-party computation is a technology that allows several parties to collaboratively analyse data without seeing any individual values. This technology suits the scenarios above, protecting user privacy from both insider and outsider attacks. Since its first practical application in 2008, the technology has matured to the point where it is used in distributed deployments over the internet and even offered as part of other services. In this work, we present solutions for the technical difficulties of deploying secure multi-party computation in real-world applications. We first give a brief overview of the current state of the art, point out several remaining shortcomings and address them. The main contribution of this work is an end-to-end description of deploying secure multi-party computation for the first large-scale registry-based statistical study on linked databases. Involving large stakeholders such as government institutions also introduces non-technical requirements, like signing contracts and negotiating with the Data Protection Agency. Looking into the future, we propose deploying secure multi-party computation as a service on a federated data exchange infrastructure; concretely, we propose extending X-tee, the data exchange layer of the Estonian state, with the Sharemind secure multi-party computation service. Such an architecture would allow many existing databases to be linked for studies efficiently and securely, without violating the privacy of individuals, and would let privacy-preserving analyses be carried out faster and more conveniently, promoting a more informed government.
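
    To make the underlying primitive concrete, the following Python sketch shows three-party additive secret sharing, the kind of scheme Sharemind-style systems build on (a toy illustration with a hypothetical income aggregation, not the actual Sharemind protocol suite):

```python
import secrets

P = 2**61 - 1  # arithmetic is done modulo a public prime

def share(value):
    """Split a private value into three additive shares, one per party."""
    s1, s2 = secrets.randbelow(P), secrets.randbelow(P)
    return s1, s2, (value - s1 - s2) % P

def reconstruct(shares):
    return sum(shares) % P

# Each data owner secret-shares its private record; every computing party
# holds only one share per record and learns nothing about individual values.
incomes = [2100, 3400, 1800]
shared = [share(v) for v in incomes]

# Addition is local: each party sums the shares it holds, and only the
# aggregate (here, the total income) is reconstructed at the end.
party_sums = [sum(rec[p] for rec in shared) % P for p in range(3)]
print(reconstruct(party_sums))   # 7300, without revealing any single income
```

    Any linear combination can be computed share-wise in the same way; multiplications and comparisons require dedicated interactive protocols.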

    Privacy-Preserving Detection Method for Transmission Line Based on Edge Collaboration

    Unmanned aerial vehicles (UAVs) are commonly used for edge collaborative computing in current transmission line object detection, where computationally intensive tasks generated by user nodes are offloaded to more powerful edge servers for processing. However, performing edge collaborative processing on transmission line image data may result in serious privacy breaches. To address this issue, we propose a secure single-stage detection model called SecYOLOv7 that preserves privacy during object detection. Based on secure multi-party computation (MPC), a series of secure computing protocols is designed for the collaborative execution of Secure Feature Contraction, Secure Bounding-Box Prediction and Secure Object Classification by two non-edge servers. Performance evaluation shows that this framework significantly outperforms existing work in terms of both computational and communication overhead as well as calculation error.
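
    The paper's protocols are not reproduced here, but the basic reason linear stages of a detector can be split across two servers is that additive secret shares commute with linear operations such as convolution with public weights. The Python sketch below is our own simplification under that assumption; nonlinear steps such as activations, comparisons and non-maximum suppression require interactive protocols that are not shown.

```python
import numpy as np

P = 2**31 - 1   # toy modulus; real MPC frameworks use proper ring/field arithmetic

def share_image(img):
    """Split an integer image tensor into two additive shares."""
    r = np.random.randint(0, P, size=img.shape, dtype=np.int64)
    return r, (img - r) % P

def conv_valid(x, k):
    """Plain 'valid' 2-D cross-correlation with a public kernel, modulo P."""
    h, w = k.shape
    out = np.zeros((x.shape[0] - h + 1, x.shape[1] - w + 1), dtype=np.int64)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+h, j:j+w] * k) % P
    return out

img = np.random.randint(0, 256, size=(8, 8), dtype=np.int64)
kernel = np.array([[1, 0], [0, -1]], dtype=np.int64)   # public weights

s1, s2 = share_image(img)
# Each server convolves only its own share; because convolution is linear,
# the shares of the result recombine to the plaintext result modulo P.
y = (conv_valid(s1, kernel) + conv_valid(s2, kernel)) % P
assert np.array_equal(y, conv_valid(img, kernel) % P)
```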

    Efficient estimation algorithms for large and complex data sets

    The recent world-wide surge in available data allows the investigation of many new and sophisticated questions that were inconceivable just a few years ago. However, two types of data sets often complicate the subsequent analysis: Data that is simple in structure but large in size, and data that is small in size but complex in structure. These two kinds of problems also apply to biological data. For example, data sets acquired from family studies, where the data can be visualized as pedigrees, are small in size but, because of the dependencies within families, they are complex in structure. By comparison, next-generation sequencing data, such as data from chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq), is simple in structure but large in size. Even though the available computational power is increasing steadily, it often cannot keep up with the massive amounts of new data that are being acquired. In these situations, ordinary methods are no longer applicable or scale badly with increasing sample size. The challenge in today’s environment is then to adapt common algorithms for modern data sets. This dissertation considers the challenge of performing inference on modern data sets, and approaches the problem in two parts: first using a problem in the field of genetics, and then using one from molecular biology. In the first part, we focus on data of a complex nature. Specifically, we analyze data from a family study on colorectal cancer (CRC). To model familial clusters of increased cancer risk, we assume inheritable but latent variables for a risk factor that increases the hazard rate for the occurrence of CRC. During parameter estimation, the inheritability of this latent variable necessitates a marginalization of the likelihood that is costly in time for large families. We first approached this problem by implementing computational accelerations that reduced the time for an optimization by the Nelder-Mead method to about 10% of a naive implementation. In a next step, we developed an expectation-maximization (EM) algorithm that works on data obtained from pedigrees. To achieve this, we used factor graphs to factorize the likelihood into a product of “local” functions, which enabled us to apply the sum-product algorithm in the E-step, reducing the computational complexity from exponential to linear. Our algorithm thus enables parameter estimation for family studies in a feasible amount of time. In the second part, we turn to ChIP-Seq data. Previously, practitioners were required to assemble a set of tools based on different statistical assumptions and dedicated to specific applications such as calling protein occupancy peaks or testing for differential occupancies between experimental conditions. In order to remove these restrictions and create a unified framework for ChIP-Seq analysis, we developed GenoGAM (Genome-wide Generalized Additive Model), which extends generalized additive models to efficiently work on data spread over a long x axis by reducing the scaling from cubic to linear and by employing a data parallelism strategy. Our software makes the well-established and flexible GAM framework available for a number of genomic applications. Furthermore, the statistical framework allows for significance testing for differential occupancy. In conclusion, I show how developing algorithms of lower complexity can open the door for analyses that were previously intractable. 
On this basis, it is recommended to focus subsequent research efforts on lowering the complexity of existing algorithms and on designing new, lower-complexity algorithms.
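
    The exponential-to-linear reduction obtained with the sum-product algorithm can be illustrated on a toy chain-structured factor graph (a stand-in for the pedigree factor graphs described above; all factor values below are random placeholders):

```python
import numpy as np
from itertools import product

# Chain of n binary latent variables with pairwise factors f(x_{i-1}, x_i)
# and unary factors g(x_i); the likelihood marginalizes over all 2^n states.
n = 12
rng = np.random.default_rng(1)
pair = rng.random((n - 1, 2, 2))    # pairwise "inheritance" factors
unary = rng.random((n, 2))          # per-variable evidence factors

def brute_force():
    """Exponential-time marginalization: sum over all 2^n configurations."""
    total = 0.0
    for states in product((0, 1), repeat=n):
        p = np.prod([unary[i, s] for i, s in enumerate(states)])
        p *= np.prod([pair[i, states[i], states[i + 1]] for i in range(n - 1)])
        total += p
    return total

def sum_product():
    """Linear-time marginalization by passing messages along the chain."""
    msg = unary[0].copy()
    for i in range(1, n):
        msg = unary[i] * (msg @ pair[i - 1])
    return msg.sum()

print(brute_force(), sum_product())   # identical up to floating-point error
```

    The same message-passing idea, applied to tree-structured pedigree factor graphs, is what keeps the E-step linear rather than exponential in family size.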

    Third CLIPS Conference Proceedings, volume 1

    Expert systems are computer programs which emulate human expertise in well-defined problem domains. The potential payoff from expert systems is high: valuable expertise can be captured and preserved, repetitive and/or mundane tasks requiring human expertise can be automated, and uniformity can be applied in decision-making processes. The C Language Integrated Production System (CLIPS) is an expert system building tool, developed at the Johnson Space Center, which provides a complete environment for the development and delivery of rule- and/or object-based expert systems. CLIPS was specifically designed to provide a low-cost option for developing and deploying expert system applications across a wide range of hardware platforms. The development of CLIPS has helped to improve the ability to deliver expert systems technology throughout the public and private sectors for a wide range of applications and diverse computing environments.