16 research outputs found
Online Sorting via Searching and Selection
In this paper, we present a framework based on a simple data structure and
parameterized algorithms for the problems of finding items in an unsorted list
of linearly ordered items based on their rank (selection) or value (search). As
a side-effect of answering these online selection and search queries, we
progressively sort the list. Our algorithms are based on Hoare's Quickselect,
and are parameterized based on the pivot selection method.
For example, if we choose the pivot as the last item in a subinterval, our
framework yields algorithms that will answer q ≤ n unique selection and/or
search queries in a total of O(n log q) average time. After q = Ω(n) queries
the list is sorted. Each repeated selection query takes constant time, and each
repeated search query takes O(log n) time. The two query types can be
interleaved freely. By plugging different pivot selection methods into our
framework, these results can, for example, become randomized expected time or
deterministic worst-case time. Our methods are easy to implement, and we show
they perform well in practice.
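The core idea, a deferred data structure that remembers every Quickselect partition boundary as a "fence" so later queries only work inside ever-smaller unsorted intervals, can be sketched as follows. This is a hypothetical illustration using a random pivot rule (one instantiation of the pivot-selection parameter), not the authors' implementation:

```python
import bisect
import random

class OnlineSelector:
    """Answers rank (selection) queries on an unsorted list, progressively
    sorting it as a side effect: each partition boundary is kept as a
    'fence', so later queries only touch ever-smaller unsorted intervals."""

    def __init__(self, items):
        self.a = list(items)
        self.fences = [0, len(self.a)]  # indices splitting a into intervals

    def _add_fence(self, f):
        # Insert f into the sorted fence list, skipping duplicates.
        i = bisect.bisect_left(self.fences, f)
        if i == len(self.fences) or self.fences[i] != f:
            self.fences.insert(i, f)

    def _partition(self, lo, hi, p):
        """Lomuto partition of a[lo:hi] around a[p]; returns pivot's index."""
        self.a[p], self.a[hi - 1] = self.a[hi - 1], self.a[p]
        pivot, store = self.a[hi - 1], lo
        for i in range(lo, hi - 1):
            if self.a[i] < pivot:
                self.a[i], self.a[store] = self.a[store], self.a[i]
                store += 1
        self.a[store], self.a[hi - 1] = self.a[hi - 1], self.a[store]
        return store

    def select(self, k):
        """Return the item of rank k (0-based). A repeated query hits a
        singleton interval and costs only the fence lookup."""
        j = bisect.bisect_right(self.fences, k)
        lo, hi = self.fences[j - 1], self.fences[j]
        while hi - lo > 1:
            p = self._partition(lo, hi, random.randrange(lo, hi))
            self._add_fence(p)
            self._add_fence(p + 1)
            if k < p:
                hi = p
            elif k > p:
                lo = p + 1
            else:
                break
        return self.a[k]
```

After every rank has been queried once, every element sits at its final index, so the list ends up sorted, matching the paper's observation that answering the queries progressively sorts the input.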
Data Quality in Smart Manufacturing
Data quality is an important aspect of business in the 21st century. Companies increasingly need high-quality data to produce high-quality products and services. The purpose of this research is to find and define tools and a process for improving data quality in the case company. These tools consist of software and a monitoring process established in collaboration with the delivery management of the case company.
This thesis studies the improvement of the quality of material data used in production at the case company. A few tools for data monitoring are presented, and one was chosen for building a prototype for monitoring data quality. The candidate tools were third-party software from SAP and IBM, the case company's own solution, and a Microsoft Power BI report. The prototype was built with Microsoft Power BI and configured for the needs of delivery management according to the scope presented in this thesis. This scope consisted of a few key parameters of material data that have an impact on production.
Based on this study and a literature review, a process for improving data quality was identified. The process consists of six simple steps that, when followed correctly, can yield great improvements in data quality: identifying which metrics to collect, identifying where to monitor, implementing the monitoring process, running a baseline assessment, posting monitoring reports, and reviewing monitoring trends. An improvement in data quality was also observed in this thesis: issues in material master data quality, for example missing master data parameters, decreased significantly when comparing the period before data monitoring to the period after the monitoring process was implemented.
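As an illustration only, the first steps of the process above (choosing metrics to collect, measuring them, and recording a baseline) might look like the following sketch; the field names and records are hypothetical, not the case company's actual material master data:

```python
# Hypothetical required material-master fields (step 1: metrics to collect).
REQUIRED_FIELDS = ["material_id", "weight", "lead_time", "supplier"]

def completeness_report(records):
    """Fraction of records with each required field populated -- a simple
    data-quality metric usable for a baseline assessment (step 4)."""
    total = len(records)
    report = {}
    for field in REQUIRED_FIELDS:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = filled / total if total else 1.0
    return report

# Toy material master records with deliberately missing parameters.
materials = [
    {"material_id": "M-001", "weight": 2.5, "lead_time": 14, "supplier": "ACME"},
    {"material_id": "M-002", "weight": None, "lead_time": 7, "supplier": ""},
]
baseline = completeness_report(materials)
# baseline["weight"] == 0.5: half of the records are missing a weight
```

Posting such a report on a schedule and comparing it against the recorded baseline corresponds to the last two steps, posting monitoring reports and reviewing monitoring trends.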
Optimal Prefix Free Codes with Partial Sorting
We describe an algorithm computing an optimal prefix free code for n unsorted positive weights in less time than required to sort them on many large classes of instances, identified by a new measure of difficulty for this problem, the alternation α. This asymptotic complexity is within a constant factor of the optimal in the algebraic decision tree computational model, in the worst case over all instances of fixed size n and alternation α. Such results refine the state-of-the-art complexity in the worst case over instances of size n in the same computational model, a landmark in compression and coding since 1952, by the mere combination of van Leeuwen's algorithm to compute optimal prefix free codes from sorted weights (known since 1976) with deferred data structures to partially sort multisets (known since 1988).
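The sorted-weights building block, van Leeuwen's two-queue algorithm, is short enough to sketch. The version below is a minimal illustration (returning only the cost of an optimal code rather than the code tree itself); it relies on the fact that merged weights are produced in non-decreasing order, so two FIFO queues suffice and no heap is needed:

```python
from collections import deque

def huffman_cost(sorted_weights):
    """van Leeuwen's two-queue algorithm: total weighted code length of an
    optimal prefix free code, for weights given in non-decreasing order.
    Runs in O(n) time because merged weights emerge in sorted order."""
    leaves = deque(sorted_weights)   # original weights, already sorted
    internal = deque()               # merged weights, sorted by construction

    def pop_min():
        # The overall minimum is at the front of one of the two queues.
        if not internal or (leaves and leaves[0] <= internal[0]):
            return leaves.popleft()
        return internal.popleft()

    cost = 0
    while len(leaves) + len(internal) > 1:
        a, b = pop_min(), pop_min()
        cost += a + b            # each merge deepens its subtree one level
        internal.append(a + b)
    return cost
```

Keeping the merged weights in a plain queue instead of a priority queue is exactly what removes the log factor once the input is sorted; the paper's contribution is obtaining the sorted prefixes lazily via deferred data structures.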
28th Annual Symposium on Combinatorial Pattern Matching : CPM 2017, July 4-6, 2017, Warsaw, Poland
Peer reviewed
PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development
This paper describes PlinyCompute, a system for development of
high-performance, data-intensive, distributed computing tools and libraries. In
the large, PlinyCompute presents the programmer with a very high-level,
declarative interface, relying on automatic, relational-database style
optimization to figure out how to stage distributed computations. However, in
the small, PlinyCompute presents the capable systems programmer with a
persistent object data model and API (the "PC object model") and associated
memory management system that has been designed from the ground-up for high
performance, distributed, data-intensive computing. This contrasts with most
other Big Data systems, which are constructed on top of the Java Virtual
Machine (JVM), and hence must at least partially cede performance-critical
concerns such as memory management (including layout and de/allocation) and
virtual method/function dispatch to the JVM. This hybrid approach---declarative
in the large, trusting the programmer's ability to utilize the PC object model
efficiently in the small---results in a system that is ideal for the
development of reusable, data-intensive tools and libraries. Through extensive
benchmarking, we show that implementing complex object manipulation and
non-trivial, library-style computations on top of PlinyCompute can result in a
speedup of 2x to more than 50x compared to equivalent implementations on
Spark.

Comment: 48 pages, including references and Appendix
Selectable Heaps and Their Application to Lazy Search Trees
We show that the O(log n)-time extract-minimum operation of efficient priority queues can be generalized to the extraction of the k smallest elements in O(k log(n/k)) time. We first show that the heap-ordered tree selection of Kaplan et al. can be applied to the heap-ordered trees of the classic Fibonacci heap to support the extraction in O(k log(n/k)) amortized time. We then show selection is possible in a priority queue with optimal worst-case guarantees by applying heap-ordered tree selection on Brodal queues, supporting the operation in O(k log(n/k)) worst-case time.
Via a reduction from the multiple selection problem, Ω(k log(n/k)) time is necessary.
We then apply the result to the lazy search trees of Sandlund & Wild, creating a new interval data structure based on selectable heaps. This gives optimal O(B+n) lazy search tree performance, lowering insertion complexity into a gap Δi to O(log(n/|Δi|)) time. An O(1)-time merge operation is also made possible under certain conditions. If Brodal queues are used, all runtimes of the lazy search tree can be made worst-case. The presented data structure uses soft heaps of Chazelle, biased search trees, and efficient priority queues in a non-trivial way, approaching the theoretically best data structure for ordered data.
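The flavor of heap-ordered tree selection can be conveyed by a simplified sketch on an ordinary binary heap: a secondary "frontier" heap over heap nodes yields the k smallest elements without disturbing the original heap. Note this naive variant runs in O(k log k), not the O(k log(n/k)) bounds of the structures above:

```python
import heapq

def k_smallest_from_heap(heap, k):
    """Select the k smallest elements of a binary min-heap (array layout)
    without modifying it. Popping a node from the frontier exposes at most
    its two children, so only O(k) nodes are ever examined."""
    if not heap or k <= 0:
        return []
    out = []
    frontier = [(heap[0], 0)]          # (value, index into heap array)
    while frontier and len(out) < k:
        val, i = heapq.heappop(frontier)
        out.append(val)                # emitted in non-decreasing order
        for child in (2 * i + 1, 2 * i + 2):
            if child < len(heap):
                heapq.heappush(frontier, (heap[child], child))
    return out
```

The heap-order invariant guarantees that every undiscovered element is at least as large as some frontier ancestor, so each pop really is the global minimum of the remainder; the cited selection results sharpen this idea to shave the bound down to O(k log(n/k)).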
NetGlance NMS - An integrated network monitoring system
Double-degree master's programme with Kuban State Agrarian University.

This work is about IT infrastructure and, in particular, the computer networks of KubSAU
and IPB. It also describes a network monitoring system, “NetGlance NMS”, developed for
the KubSAU System Administration Department.

The objective of the work is to optimize the information structure of KubSAU and IPB.

During the work, the following tasks were completed: surveying the existing IPB
information structure, comparing the information structures of KubSAU and IPB, modeling
the IPB computer network (topology, services), investigating bottlenecks and potential
pitfalls in the data center and in the computer network of IPB, studying the information
security mechanisms in the computer network of IPB, and organizing the monitoring
process for the computer network at KubSAU.

The most important impact of the work is increased network productivity and improved
user experience as a result of creating and deploying the monitoring software.