67 research outputs found

    Statistical structures for internet-scale data management

    Get PDF
    Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability

    Algorithmic Solutions for Combinatorial Problems in Resource Management of Manufacturing Environments

    Get PDF
    This thesis studies the use of heuristic algorithms in a number of combinatorial problems that occur in various resource constrained environments. Such problems occur, for example, in manufacturing, where a restricted number of resources (tools, machines, feeder slots) are needed to perform some operations. Many of these problems turn out to be computationally intractable, and heuristic algorithms are used to provide efficient, yet sub-optimal solutions. The main goal of the present study is to build upon existing methods to create new heuristics that provide improved solutions for some of these problems. All of these problems occur in practice, and one of the motivations of our study was the request for improvements from industrial sources. We approach three different resource constrained problems. The first is the tool switching and loading problem, and occurs especially in the assembly of printed circuit boards. This problem has to be solved when an efficient, yet small primary storage is used to access resources (tools) from a less efficient (but unlimited) secondary storage area. We study various forms of the problem and provide improved heuristics for its solution. Second, the nozzle assignment problem is concerned with selecting a suitable set of vacuum nozzles for the arms of a robotic assembly machine. It turns out that this is a specialized formulation of the MINMAX resource allocation formulation of the apportionment problem and it can be solved efficiently and optimally. We construct an exact algorithm specialized for the nozzle selection and provide a proof of its optimality. Third, the problem of feeder assignment and component tape construction occurs when electronic components are inserted and certain component types cause tape movement delays that can significantly impact the efficiency of printed circuit board assembly. Here, careful selection of component slots in the feeder improves the tape movement speed. We provide a formal proof that this problem is of the same complexity as the turnpike problem (a well studied geometric optimization problem), and provide a heuristic algorithm for this problem.Siirretty Doriast

    Efficient pac-learning for episodic tasks with acyclic state spaces and the optimal node visitation problem in acyclic stochastic digaphs.

    Get PDF
    The first part of this research program concerns the development of customized and easily implementable Probably Approximately Correct (PAC)-learning algorithms for episodic tasks over acyclic state spaces. The defining characteristic of our algorithms is that they take explicitly into consideration the acyclic structure of the underlying state space and the episodic nature of the considered learning task. The first of the above two attributes enables a very straightforward and efficient resolution of the ``exploration vs exploitation' dilemma, while the second provides a natural regenerating mechanism that is instrumental in the dynamics of our algorithms. Some additional characteristics that distinguish our algorithms from those developed in the past literature are (i) their direct nature, that eliminates the need of a complete specification of the underlying MDP model and reduces their execution to a very simple computation, and (ii) the unique emphasis that they place in the efficient implementation of the sampling process that is defined by their PAC property. More specifically, the aforementioned PAC-learning algorithms complete their learning task by implementing a systematic episodic sampling schedule on the underlying acyclic state space. This sampling schedule combined with the stochastic nature of the transitions taking place, define the need for efficient routing policies that will help the algorithms complete their exploration program while minimizing, in expectation, the number of executed episodes. The design of an optimal policy that will satisfy a specified pattern of arc visitation requirements in an acyclic stochastic graph, while minimizing the expected number of required episodes, is a challenging problem, even under the assumption that all the branching probabilities involved are known a priori. Hence, the sampling process that takes place in the proposed PAC-learning algorithms gives rise to a novel, very interesting stochastic control/scheduling problem, that is characterized as the problem of the Optimal Node Visitation (ONV) in acyclic stochastic digraphs. The second part of the work presented herein seeks the systematic modelling and analysis of the ONV problem. The last part of this research program explores the computational merits obtained by heuristical implementations that result from the integration of the ONV problem developments into the PAC-algorithms developed in the first part of this work. We study, through numerical experimentation, the relative performance of these resulting heuristical implementations in comparison to (i) the initial version of the PAC-learning algorithms, presented in the first part of the research program, and (ii) standard Q-learning algorithm variations provided in the RL literature. The work presented in this last part reinforces and confirms the driving assumption of this research, i.e., that one can design customized RL algorithms of enhanced performance if the underlying problem structure is taken into account.Ph.D.Committee Chair: Reveliotis, Spyros; Committee Member: Ayhan, Hayriye; Committee Member: Goldsman, Dave; Committee Member: Shamma, Jeff; Committee Member: Zwart, Ber

    A Markovian Perspective on the Classical Occupancy Problem with a Generalization to Pure Birth Processes

    Full text link
    We study the classical occupancy problem from the viewpoint of its embedding Markov chain. We derive new expressions for the probability mass function and (complementary) distribution function in generalized form. Furthermore, we derive a completely novel sparse bidiagonal system of recursion relations for the (complementary) distribution function and provide its efficient matrix implementation. Importantly, we generalize these results to the entire class of discrete-time pure birth processes.Comment: Master Thesis, Econometrics, 2020, Erasmus University Rotterda

    Applications

    Get PDF
    Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine for risk modelling, diagnosis, and treatment selection for diseases in electronics, steel production and milling for quality control during manufacturing processes in traffic, logistics for smart cities and for mobile communications

    Global Constraint Catalog, 2nd Edition

    Get PDF
    This report presents a catalogue of global constraints where each constraint is explicitly described in terms of graph properties and/or automata and/or first order logical formulae with arithmetic. When available, it also presents some typical usage as well as some pointers to existing filtering algorithms

    Applications

    Get PDF
    Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine for risk modelling, diagnosis, and treatment selection for diseases in electronics, steel production and milling for quality control during manufacturing processes in traffic, logistics for smart cities and for mobile communications

    Global Constraint Catalog, 2nd Edition (revision a)

    Get PDF
    This report presents a catalogue of global constraints where each constraint is explicitly described in terms of graph properties and/or automata and/or first order logical formulae with arithmetic. When available, it also presents some typical usage as well as some pointers to existing filtering algorithms
    corecore