    Online Sorting via Searching and Selection

    In this paper, we present a framework based on a simple data structure and parameterized algorithms for the problems of finding items in an unsorted list of linearly ordered items based on their rank (selection) or value (search). As a side effect of answering these online selection and search queries, we progressively sort the list. Our algorithms are based on Hoare's Quickselect, and are parameterized by the pivot selection method. For example, if we choose the pivot as the last item in a subinterval, our framework yields algorithms that will answer q ≤ n unique selection and/or search queries in a total of O(n log q) average time. After q = Ω(n) queries the list is sorted. Each repeated selection query takes constant time, and each repeated search query takes O(log n) time. The two query types can be interleaved freely. By plugging different pivot selection methods into our framework, these results can, for example, become randomized expected time or deterministic worst-case time. Our methods are easy to implement, and we show they perform well in practice.
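    To make the framework concrete, here is a minimal Python sketch of the idea, assuming 0-based ranks and the last-item (Lomuto) pivot rule mentioned above; the class and method names are illustrative, not the authors' implementation. Every pivot that a selection query fixes is remembered, so a later query partitions only the still-unsorted gap it falls into.

    import bisect

    class OnlineSelector:
        """Answers rank queries on an unsorted list, progressively sorting it.
        Indices already fixed by past pivots are kept sorted in self.fixed,
        so each query partitions only the gap of self.a that contains it."""

        def __init__(self, items):
            self.a = list(items)
            self.fixed = [-1, len(self.a)]   # sentinel "pivot" positions

        def select(self, k):
            """Return the item of rank k, assuming 0 <= k < len(self.a)."""
            i = bisect.bisect_left(self.fixed, k)
            if self.fixed[i] == k:           # rank k already in final place
                return self.a[k]
            lo, hi = self.fixed[i - 1] + 1, self.fixed[i] - 1
            while True:
                p = self._partition(lo, hi)  # pivot lands in final position p
                bisect.insort(self.fixed, p)
                if p == k:
                    return self.a[k]
                lo, hi = (p + 1, hi) if p < k else (lo, p - 1)

        def _partition(self, lo, hi):
            """Lomuto partition using the last item of a[lo..hi] as pivot."""
            pivot, i = self.a[hi], lo
            for j in range(lo, hi):
                if self.a[j] <= pivot:
                    self.a[i], self.a[j] = self.a[j], self.a[i]
                    i += 1
            self.a[i], self.a[hi] = self.a[hi], self.a[i]
            return i

    A search query by value can reuse the same machinery: locate the gap whose fixed boundaries bracket the value, then partition that gap until the value's rank is pinned down.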

    Data Quality in Smart Manufacturing

    Data quality is an important aspect of business in the 21st century. Companies increasingly need high-quality data to produce high-quality products and services. The purpose of this research is to find and define tools and a process for improving data quality in the case company. These tools and the monitoring process were established in collaboration with the delivery management organization of the case company. This thesis studies the improvement of the quality of material data used in production in the case company. A few tools for data monitoring are presented, and one was chosen for building a prototype for monitoring data quality. These tools were third-party software from SAP and IBM, the case company's own solution, and a Microsoft Power BI report. The prototype was built with Microsoft Power BI and configured for the needs of delivery management according to the scope presented in this thesis. This scope consisted of a few key parameters of material data that have an impact on production. Based on this study and a literature review, a process for improving data quality was found. This process consists of six simple steps that, when followed correctly, can yield great improvements in data quality. These steps are: identifying metrics to collect, identifying where to monitor, implementing the monitoring process, running a baseline assessment, posting monitoring reports, and reviewing monitoring trends. An improvement in data quality was also observed in this thesis: issues in material master data, such as missing master data parameters, decreased significantly when comparing the period before data monitoring with the period after the monitoring process was implemented.
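    As an illustration of the first and fourth steps, a monitoring metric can be as simple as per-parameter completeness. The sketch below is hypothetical Python/pandas, not the thesis's Power BI prototype, and the column names are placeholders for the undisclosed key parameters.

    import pandas as pd

    # Hypothetical key parameters of material master data; the thesis does
    # not disclose its actual scope, so these names are placeholders.
    REQUIRED = ["material_id", "gross_weight", "lead_time_days", "purchasing_group"]

    def completeness(df: pd.DataFrame) -> pd.Series:
        """Share of non-missing values per monitored parameter (step 1)."""
        return df[REQUIRED].notna().mean()

    def trend_vs_baseline(current: pd.Series, baseline: pd.Series) -> pd.Series:
        """Change since the baseline assessment (steps 4 and 6); positive
        values mean fewer missing parameters than before monitoring."""
        return (current - baseline).round(3)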

    Optimal Prefix Free Codes with Partial Sorting

    We describe an algorithm computing an optimal prefix-free code for n unsorted positive weights in less time than required to sort them on many large classes of instances, identified by a new measure of difficulty for this problem, the alternation α. This asymptotic complexity is within a constant factor of optimal in the algebraic decision tree computational model, in the worst case over all instances of fixed size n and alternation α. Such results refine the state-of-the-art complexity in the worst case over instances of size n in the same computational model, a landmark in compression and coding since 1952, by merely combining van Leeuwen's algorithm to compute optimal prefix-free codes from sorted weights (known since 1976) with deferred data structures to partially sort multisets (known since 1988).
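    For context, the 1976 ingredient is easy to state: with the weights already sorted, an optimal prefix-free code can be built in linear time with two queues. Below is a minimal sketch returning the total weighted code length rather than the code tree; the function name and return convention are ours, not the paper's.

    from collections import deque

    def optimal_code_cost(sorted_weights):
        """van Leeuwen's two-queue method: total weighted code length of an
        optimal prefix-free code for weights sorted ascending, in O(n).
        Merged weights are produced in nondecreasing order, so a second
        queue replaces the heap used in ordinary Huffman coding."""
        leaves = deque(sorted_weights)   # original weights, ascending
        merged = deque()                 # internal-node weights, in order

        def pop_min():
            if not merged or (leaves and leaves[0] <= merged[0]):
                return leaves.popleft()
            return merged.popleft()

        total = 0
        while len(leaves) + len(merged) > 1:
            w = pop_min() + pop_min()    # merge the two smallest weights
            total += w                   # each merge deepens its subtree by 1
            merged.append(w)
        return total

    Roughly, the paper's contribution is to feed this routine input that is only partially sorted by a deferred data structure, paying for comparisons only where the alternation structure of the instance forces them.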

    Lazy Search Trees


    28th Annual Symposium on Combinatorial Pattern Matching: CPM 2017, July 4-6, 2017, Warsaw, Poland

    Peer reviewed

    PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development

    This paper describes PlinyCompute, a system for the development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database-style optimization to figure out how to stage distributed computations. However, in the small, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and an associated memory management system that has been designed from the ground up for high-performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM) and hence must at least partially cede performance-critical concerns such as memory management (including layout and de/allocation) and virtual method/function dispatch to the JVM. This hybrid approach, declarative in the large while trusting the programmer's ability to use the PC object model efficiently in the small, results in a system that is ideal for the development of reusable, data-intensive tools and libraries. Through extensive benchmarking, we show that implementing complex object manipulation and non-trivial, library-style computations on top of PlinyCompute can result in a speedup of 2x to more than 50x compared to equivalent implementations on Spark. Comment: 48 pages, including references and appendix.

    Selectable Heaps and Their Application to Lazy Search Trees

    We show that the O(log n)-time extract-minimum operation of efficient priority queues can be generalized to the extraction of the k smallest elements in O(k log(n/k)) time. We first show that the heap-ordered tree selection of Kaplan et al. can be applied to the heap-ordered trees of the classic Fibonacci heap to support the extraction in O(k log(n/k)) amortized time. We then show selection is possible in a priority queue with optimal worst-case guarantees by applying heap-ordered tree selection to Brodal queues, supporting the operation in O(k log(n/k)) worst-case time. Via a reduction from the multiple selection problem, Ω(k log(n/k)) time is necessary. We then apply the result to the lazy search trees of Sandlund & Wild, creating a new interval data structure based on selectable heaps. This gives optimal O(B + n) lazy search tree performance, lowering the insertion complexity into a gap Δi to O(log(n/|Δi|)) time. An O(1)-time merge operation is also made possible under certain conditions. If Brodal queues are used, all runtimes of the lazy search tree can be made worst-case. The presented data structure uses the soft heaps of Chazelle, biased search trees, and efficient priority queues in a non-trivial way, approaching the theoretically best data structure for ordered data.
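    For intuition, a classical and simpler relative of this result reads off the k smallest items of an ordinary array-based binary min-heap in O(k log k) time using a small side heap; the paper's selectable heaps sharpen the bound to O(k log(n/k)) and support actual extraction. The sketch below is that classical read-only version, not the paper's construction.

    import heapq

    def k_smallest(heap, k):
        """k smallest items of an array-based binary min-heap, in O(k log k).
        Only children of already-reported nodes are considered, so the side
        heap never holds more than k + 1 candidates."""
        out, cand = [], [(heap[0], 0)] if heap else []
        while cand and len(out) < k:
            val, i = heapq.heappop(cand)      # next smallest unreported node
            out.append(val)
            for c in (2 * i + 1, 2 * i + 2):  # its children become candidates
                if c < len(heap):
                    heapq.heappush(cand, (heap[c], c))
        return out

    data = [7, 2, 9, 4, 1, 8]
    heapq.heapify(data)
    assert k_smallest(data, 3) == [1, 2, 4]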

    NetGlance NMS - An integrated network monitoring system

    Double-degree master's thesis with Kuban State Agrarian University. This work concerns IT infrastructure and, in particular, the computer networks of KubSAU and IPB. It also describes a network monitoring system, "NetGlance NMS", developed for the KubSAU System Administration Department. The objective of the work is to optimize the information structure of KubSAU and IPB. During the work, the following tasks were completed: researching the existing IPB information structure, comparing the information structures of KubSAU and IPB, modeling the IPB computer network (topology, services), investigating bottlenecks and potential pitfalls in the data center and in the IPB computer network, studying the information security mechanisms of the IPB computer network, and organizing a monitoring process for the KubSAU computer network. The most important impact of the work is increased network productivity and improved user experience resulting from the creation and deployment of the monitoring software.