67 research outputs found
Statistical structures for internet-scale data management
Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability
Algorithmic Solutions for Combinatorial Problems in Resource Management of Manufacturing Environments
This thesis studies the use of heuristic algorithms in a number of combinatorial problems that occur in various resource constrained environments. Such problems occur, for example, in manufacturing, where a restricted number of resources (tools, machines, feeder slots) are needed to perform some operations. Many of these problems turn out to be computationally intractable, and heuristic algorithms are used to provide efficient, yet sub-optimal solutions. The main goal of the present study is to build upon existing methods to create new heuristics that provide improved solutions for some of these problems. All of these problems occur in practice, and one of the motivations of our study was the request for improvements from industrial sources. We approach three different resource constrained problems. The first is the tool switching and loading problem, and occurs especially in the assembly of printed circuit boards. This problem has to be solved when an efficient, yet small primary storage is used to access resources (tools) from a less efficient (but unlimited) secondary storage area. We study various forms of the problem and provide improved heuristics for its solution. Second, the nozzle assignment problem is concerned with selecting a suitable set of vacuum nozzles for the arms of a robotic assembly machine. It turns out that this is a specialized formulation of the MINMAX resource allocation formulation of the apportionment problem and it can be solved efficiently and optimally. We construct an exact algorithm specialized for the nozzle selection and provide a proof of its optimality. Third, the problem of feeder assignment and component tape construction occurs when electronic components are inserted and certain component types cause tape movement delays that can significantly impact the efficiency of printed circuit board assembly. Here, careful selection of component slots in the feeder improves the tape movement speed. We provide a formal proof that this problem is of the same complexity as the turnpike problem (a well studied geometric optimization problem), and provide a heuristic algorithm for this problem.Siirretty Doriast
Efficient pac-learning for episodic tasks with acyclic state spaces and the optimal node visitation problem in acyclic stochastic digaphs.
The first part of this research program concerns the development of customized and easily implementable Probably Approximately Correct (PAC)-learning algorithms for episodic tasks over acyclic state spaces. The defining characteristic of our algorithms is that they take explicitly into consideration the acyclic structure of the underlying state space and the episodic nature of the considered learning task. The first of the above two attributes enables a very straightforward and efficient resolution of the ``exploration vs exploitation' dilemma, while the second provides a natural regenerating mechanism that is instrumental in the dynamics of our algorithms. Some additional characteristics that distinguish our algorithms from those developed in the past literature are (i) their direct nature, that eliminates the need of a complete specification of the underlying MDP model and reduces their execution to a very simple computation, and (ii) the unique emphasis that they place in the efficient implementation of the sampling process that is defined by their PAC property.
More specifically, the aforementioned PAC-learning algorithms complete their learning task by implementing a systematic episodic sampling schedule on the underlying acyclic state space.
This sampling schedule combined with the stochastic nature of
the transitions taking place, define
the need for efficient routing policies that will help the
algorithms complete their exploration program while minimizing, in expectation, the
number of executed episodes.
The design of an optimal policy that
will satisfy a specified pattern of arc visitation requirements in
an acyclic stochastic graph, while minimizing the expected number
of required episodes, is a challenging problem, even under the
assumption that all the branching probabilities involved are known
a priori. Hence, the sampling process that takes place in
the proposed PAC-learning algorithms gives rise to a novel, very
interesting stochastic control/scheduling problem, that is
characterized as the problem of the Optimal Node Visitation
(ONV) in acyclic stochastic digraphs. The second part of the work presented herein seeks the systematic modelling and analysis of the ONV problem.
The last part of this research program explores the computational merits
obtained by heuristical implementations that result from the integration of
the ONV problem developments into the PAC-algorithms developed in the first part of this work. We study, through numerical experimentation, the relative performance of these resulting heuristical implementations in comparison to (i) the initial version of the PAC-learning algorithms, presented in the first part of the research program, and (ii) standard Q-learning algorithm variations provided in the RL literature. The work presented in this last part reinforces and confirms the driving assumption of this research, i.e., that one can design customized RL algorithms of enhanced performance if the underlying problem structure is taken into account.Ph.D.Committee Chair: Reveliotis, Spyros; Committee Member: Ayhan, Hayriye; Committee Member: Goldsman, Dave; Committee Member: Shamma, Jeff; Committee Member: Zwart, Ber
A Markovian Perspective on the Classical Occupancy Problem with a Generalization to Pure Birth Processes
We study the classical occupancy problem from the viewpoint of its embedding
Markov chain. We derive new expressions for the probability mass function and
(complementary) distribution function in generalized form. Furthermore, we
derive a completely novel sparse bidiagonal system of recursion relations for
the (complementary) distribution function and provide its efficient matrix
implementation. Importantly, we generalize these results to the entire class of
discrete-time pure birth processes.Comment: Master Thesis, Econometrics, 2020, Erasmus University Rotterda
Applications
Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine for risk modelling, diagnosis, and treatment selection for diseases in electronics, steel production and milling for quality control during manufacturing processes in traffic, logistics for smart cities and for mobile communications
Global Constraint Catalog, 2nd Edition
This report presents a catalogue of global constraints where
each constraint is explicitly described in terms of graph properties and/or automata and/or first order logical formulae with arithmetic. When available, it also presents some typical usage as well as some pointers to existing
filtering algorithms
Applications
Volume 3 describes how resource-aware machine learning methods and techniques are used to successfully solve real-world problems. The book provides numerous specific application examples: in health and medicine for risk modelling, diagnosis, and treatment selection for diseases in electronics, steel production and milling for quality control during manufacturing processes in traffic, logistics for smart cities and for mobile communications
Global Constraint Catalog, 2nd Edition (revision a)
This report presents a catalogue of global constraints where
each constraint is explicitly described in terms of graph properties and/or automata and/or first order logical formulae with arithmetic. When available, it also presents some typical usage as well as some pointers to existing
filtering algorithms
- …