A Survey of Parallel Data Mining
With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackling the problem of scalability in data mining. Recently there has been considerable research on parallel data mining. However, most projects focus on parallelizing a single kind of data mining algorithm/paradigm. This paper surveys parallel data mining from a broader perspective. More precisely, we discuss the parallelization of data mining algorithms from four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms.
Performance analysis of parallel branch and bound search with the hypercube architecture
With the availability of commercial parallel computers, researchers are examining new classes of problems that might benefit from parallel computing. This paper presents the results of an investigation of the class of search-intensive problems. The specific problem discussed is the least-cost branch-and-bound search method applied to deadline job scheduling. The object-oriented design methodology was used to map the problem onto a parallel solution. While the initial design was good for a prototype, the best performance resulted from fine-tuning the algorithm for a specific computer. The experiments analyze the computation time, the speedup over a VAX 11/785, and the load balance of the problem when using a loosely coupled multiprocessor system based on the hypercube architecture.
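The least-cost strategy described above can be sketched sequentially (the paper's parallel, hypercube-specific version is not reproduced here). The job model, tuples of processing time, deadline and lateness penalty with total penalty as the cost, is an assumption for illustration:

```python
import heapq
from itertools import count

def least_cost_bb(jobs):
    """Least-cost branch-and-bound over job orderings.

    jobs: list of (processing_time, deadline, penalty) tuples.
    Cost of a schedule = sum of penalties of jobs finishing past their
    deadline.  The cheapest partial schedule is always expanded first.
    """
    n = len(jobs)
    best_cost, best_seq = float("inf"), None
    tie = count()  # tie-breaker so heapq never compares sequences
    # heap entries: (lower_bound_cost, tie, elapsed_time, scheduled_indices)
    heap = [(0, next(tie), 0, ())]
    while heap:
        cost, _, t, seq = heapq.heappop(heap)
        if cost >= best_cost:
            continue  # bound: this branch cannot beat the incumbent
        if len(seq) == n:
            best_cost, best_seq = cost, seq
            continue
        for j in range(n):
            if j in seq:
                continue
            p, d, w = jobs[j]
            finish = t + p
            # cost only grows along a path, so it is a valid lower bound
            child_cost = cost + (w if finish > d else 0)
            if child_cost < best_cost:
                heapq.heappush(heap, (child_cost, next(tie), finish, seq + (j,)))
    return best_cost, best_seq
```

Because the accumulated penalty never decreases along a branch, it doubles as the lower bound used for pruning.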
Encapsulated search and constraint programming in Oz
Oz is an attempt to create a high-level concurrent programming language providing the problem-solving capabilities of logic programming (i.e., constraints and search). Its computation model can be seen as a rather radical extension of the concurrent constraint model, providing for higher-order programming, deep guards, state, and encapsulated search. This paper focuses on the most recent extension, a higher-order combinator providing for encapsulated search. The search combinator spawns a local computation space and resolves remaining choices by returning the alternatives as first-class citizens. The search combinator makes it possible to program different search strategies, including depth-first, indeterministic one-solution, demand-driven multiple-solution, all-solutions, and best-solution (branch and bound) search. The paper also discusses the semantics of integer and finite domain constraints in a deep guard computation model.
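Outside Oz, the flavour of demand-driven encapsulated search can be imitated with a lazy generator: solutions come back as first-class values instead of being committed globally. The ('solved' | 'choice') encoding below is an assumption for the sketch, not Oz's actual computation-space API:

```python
def search(space):
    """Demand-driven depth-first search over a tree of choice points.

    `space` is a callable returning either ('solved', value) or
    ('choice', [alternative spaces...]).  One-solution search is
    next(search(s)); all-solutions search is list(search(s)).
    """
    stack = [space]
    while stack:
        status, payload = stack.pop()()
        if status == 'solved':
            yield payload
        else:  # 'choice': push alternatives, leftmost explored first
            stack.extend(reversed(payload))

# Example problem: enumerate all bit strings of length 2.
def bits(prefix=(), n=2):
    def space():
        if len(prefix) == n:
            return ('solved', prefix)
        return ('choice', [bits(prefix + (b,), n) for b in (0, 1)])
    return space

solutions = search(bits())
first = next(solutions)   # one-solution search: (0, 0)
rest = list(solutions)    # remaining solutions, produced on demand
```

The generator suspends between solutions, which is the analogue of the combinator returning remaining alternatives to the caller.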
An Investigation in Efficient Spatial Patterns Mining
Technical progress in computerized spatial data acquisition and storage has resulted in the growth of vast spatial databases. Faced with large and growing amounts of spatial data, an end user has difficulty understanding it without helpful knowledge extracted from spatial databases. Thus, spatial data mining has been brought under the umbrella of data mining and is attracting increasing attention.
Spatial data mining presents its own challenges. Unlike usual data, spatial data includes not only positional data and attribute data, but also spatial relationships among spatial events. Further, the instances of spatial events are embedded in a continuous space and share a variety of spatial relationships, so the mining of spatial patterns demands new techniques.
In this thesis, several contributions were made. Some new techniques were proposed, i.e., fuzzy co-location mining, the CPI-tree (Co-location Pattern Instance Tree), maximal co-location pattern mining, AOI-ags (Attribute-Oriented Induction based on Attributes' Generalization Sequences), and fuzzy association prediction. Three algorithms for co-location pattern mining were put forward: the fuzzy co-location mining algorithm, the CPI-tree based co-location mining algorithm (CPI-tree algorithm) and the order-clique-based maximal prevalence co-location mining algorithm (order-clique-based algorithm). An attribute-oriented induction algorithm based on attributes' generalization sequences (AOI-ags algorithm) is further given, which unifies the attribute thresholds and the tuple thresholds. For two real-world databases with time-series data, a fuzzy association prediction algorithm was designed. A cell-based spatial object fusion algorithm was also proposed. Two fuzzy clustering methods using domain knowledge were proposed: the Natural Method and the Graph-Based Method, both of which are controlled by a threshold determined by polynomial regression. Finally, a prototype system for spatial co-location pattern mining was developed, which shows the relative efficiency of the proposed co-location techniques.
The techniques presented in the thesis focus on improving the feasibility, usefulness, effectiveness, and scalability of the related algorithms. In the design of the fuzzy co-location mining algorithm, a new data structure, the binary partition tree, was proposed to improve the process of fuzzy equivalence partitioning. A prefix-based approach to partition the prevalent event set search space into subsets, where each sub-problem can be solved in main memory, was also presented. The scalability of the CPI-tree algorithm is guaranteed since it does not require expensive spatial joins or instance joins to identify co-location table instances. In the order-clique-based algorithm, the co-location table instances do not need to be stored after computing the Pi value of the corresponding co-location, which dramatically reduces the execution time and space of mining maximal co-locations.
Some techniques, for example partitions, equivalence partition trees, pruning optimization strategies and interestingness measures, were used to improve the efficiency of the AOI-ags algorithm. To implement the fuzzy association prediction algorithm, a "growing window" and proximity computation pruning were introduced to reduce both the I/O and CPU costs of computing the fuzzy semantic proximity between time series.
For the new techniques and algorithms, theoretical analysis and experimental results on synthetic and real-world data sets are presented and discussed in the thesis.
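As a rough illustration of the prevalence measure the thesis builds on (not of its algorithms), here is a brute-force participation index (Pi) computation for one candidate co-location; the instance and neighbour encodings are assumptions:

```python
from itertools import product

def participation_index(instances, neighbors, features):
    """Participation index (Pi) of a candidate co-location.

    instances: dict feature -> list of instance ids
    neighbors: set of frozensets {id_a, id_b} of spatial neighbours
    features:  the candidate co-location, e.g. ('A', 'B')

    A table instance picks one instance per feature, all pairwise
    neighbours.  Pi = min over features of the fraction of that
    feature's instances participating in some table instance.
    This brute-force version enumerates the full cross product; the
    thesis's CPI-tree and order-clique algorithms avoid exactly
    this join cost.
    """
    participating = {f: set() for f in features}
    for combo in product(*(instances[f] for f in features)):
        if all(frozenset((a, b)) in neighbors
               for i, a in enumerate(combo) for b in combo[i + 1:]):
            for f, inst in zip(features, combo):
                participating[f].add(inst)
    return min(len(participating[f]) / len(instances[f]) for f in features)
```

For example, with instances {'A': ['a1', 'a2'], 'B': ['b1']} and a single neighbour pair (a1, b1), only a1 of A's two instances participates, so Pi = 0.5.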
Routing optimization algorithms in integrated fronthaul/backhaul networks supporting multitenancy
Mención Internacional en el título de doctor.
This thesis aims to help in the definition and design of the 5th generation of
telecommunications networks (5G) by modelling the different features that characterize
them through several mathematical models. Overall, the aim of these models is to perform
a wide optimization of the network elements, leveraging their newly-acquired capabilities
in order to improve the efficiency of the future deployments both for the users and the
operators. The timeline of this thesis corresponds to the timeline of the research and
definition of 5G networks, and thus in parallel and in the context of several European
H2020 programs. Hence, the different parts of the work presented in this document
match and provide a solution to different challenges that have been appearing during
the definition of 5G and within the scope of those projects, considering the feedback and
problems from the point of view of all the end users, operators and providers.
Thus, the first challenge to be considered focuses on the core network, in particular
on how to integrate fronthaul and backhaul traffic over the same transport stratum.
The solution proposed is an optimization framework for routing and resource placement
that has been developed taking into account delay, capacity and path constraints,
maximizing the degree of Distributed Unit (DU) deployment while minimizing the
supporting Central Unit (CU) pools. The framework and the developed heuristics (to
reduce the computational complexity) are validated and applied to both small- and large-scale (production-level) networks. This makes them useful to network operators both for network planning and for dynamically adjusting network operation over their (virtualized) infrastructure.
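The routing side of such a framework hinges on finding cheap paths that still respect a delay bound. A minimal label-setting sketch of that sub-step is below; the graph encoding is an assumption and the thesis's full model (capacity, DU/CU placement) is not reproduced:

```python
import heapq

def min_cost_path_within_delay(adj, src, dst, max_delay):
    """Least-cost path subject to an end-to-end delay bound.

    adj: dict node -> list of (neighbor, cost, delay) edges.
    Labels are expanded cheapest-cost-first; a label is pruned when an
    earlier (hence no more expensive) label at the same node also had
    no more delay, i.e. dominated it on both criteria.
    """
    heap = [(0, 0, src, (src,))]
    best_delay = {}  # node -> smallest delay among settled labels
    while heap:
        cost, delay, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if best_delay.get(node, float("inf")) <= delay:
            continue  # dominated label
        best_delay[node] = delay
        for nxt, c, d in adj.get(node, []):
            if delay + d <= max_delay:
                heapq.heappush(heap, (cost + c, delay + d, nxt, path + (nxt,)))
    return None, None
```

With the bound active, a cheap but slow path can lose to a costlier, faster one, which is exactly the trade-off a fronthaul-capable transport network must resolve.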
Moving closer to the user side, the second challenge considered focuses on the allocation of services in cloud/edge environments. In particular, the problem tackled consists of selecting the best location for each Virtual Network Function (VNF) composing a service in cloud robotics environments, which imply strict delay bounds and reliability constraints. Robots, vehicles and other end devices provide significant capabilities such as actuators, sensors and local computation, which are essential for some services. On the negative side, these devices are continuously on the move and might lose network connection or run out of battery, which further challenges service delivery in this dynamic environment. Thus, the performed analysis and proposed solution tackle the mobility and battery restrictions. We further need to account for the temporal aspects and
conflicting goals of reliable, low latency service deployment over a volatile network, where
mobile compute nodes act as an extension of the cloud and edge computing infrastructure.
The problem is formulated as a cost-minimizing VNF placement optimization and an
efficient heuristic is proposed. The algorithms are extensively evaluated from various
aspects by simulation on detailed real-world scenarios.
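The shape of that optimization can be sketched with a brute-force placement of a tiny VNF chain; all names and the cost/delay model are assumptions, and the thesis's formulation additionally covers mobility, battery and reliability:

```python
from itertools import product

def place_chain(vnfs, nodes, node_cost, link_delay, max_delay):
    """Cost-minimising placement of a small VNF chain.

    vnfs:       VNF names forming the service chain, in order
    nodes:      candidate hosting nodes
    node_cost:  dict (node, vnf) -> hosting cost
    link_delay: dict (node_a, node_b) -> inter-node delay
    max_delay:  end-to-end delay bound along the chain

    Returns (best_cost, placement), or (None, None) if infeasible.
    """
    best = (None, None)
    for placement in product(nodes, repeat=len(vnfs)):
        # delay accumulates only when consecutive VNFs sit on
        # different nodes
        delay = sum(link_delay.get((a, b), 0)
                    for a, b in zip(placement, placement[1:]) if a != b)
        if delay > max_delay:
            continue
        cost = sum(node_cost[(n, v)] for n, v in zip(placement, vnfs))
        if best[0] is None or cost < best[0]:
            best = (cost, placement)
    return best
```

Enumerating all |nodes|^|vnfs| placements is only viable for toy sizes, which is why the thesis pairs the exact formulation with an efficient heuristic.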
Finally, the last challenge analyzed focuses on supporting edge-based services, in
particular, Machine Learning (ML) in distributed Internet of Things (IoT) scenarios. The
traditional approach to distributed ML is to adapt learning algorithms to the network, e.g.,
reducing updates to curb overhead. Networks based on intelligent edge, instead, make
it possible to follow the opposite approach, i.e., to define the logical network topology
around the learning task to perform, so as to meet the desired learning performance.
The proposed solution includes a system model that captures such aspects in the context
of supervised ML, accounting for both learning nodes (that perform computations) and
information nodes (that provide data). The problem is formulated to select (i) which
learning and information nodes should cooperate to complete the learning task, and (ii)
the number of iterations to perform, in order to minimize the learning cost while meeting
the target prediction error and execution time. The solution also includes a heuristic
algorithm that is evaluated leveraging a real-world network topology and considering
both classification and regression tasks, and closely matches the optimum, outperforming
state-of-the-art alternatives.
This work has been supported by IMDEA Networks Institute. Doctoral Programme in Telematics Engineering, Universidad Carlos III de Madrid. Committee: Pablo Serrano Yáñez-Mingot (president), Andrés García Saavedra (secretary), Luca Valcarengh (member).
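The selection problem in this last challenge can be caricatured by exhaustive search over how many learning nodes, information nodes and iterations to use; the cost and error models below are assumptions, not the thesis's:

```python
def select_learning_setup(learners, infos, cost, error_after,
                          max_error, max_iters):
    """Pick node counts and an iteration budget at minimum total cost,
    subject to a target prediction error.

    learners, infos: candidate node ids
    cost:        dict node -> per-iteration (learner) or one-off (info) cost
    error_after: callable (n_learners, n_infos, iters) -> predicted error
    Returns (total_cost, n_learners, n_infos, iterations), or None.
    """
    best = None
    for nl in range(1, len(learners) + 1):
        for ni in range(1, len(infos) + 1):
            for t in range(1, max_iters + 1):
                if error_after(nl, ni, t) > max_error:
                    continue  # misses the accuracy target
                # use the cheapest nl learners and ni information nodes
                c = (t * sum(sorted(cost[l] for l in learners)[:nl])
                     + sum(sorted(cost[i] for i in infos)[:ni]))
                if best is None or c < best[0]:
                    best = (c, nl, ni, t)
    return best
```

The interesting tension, visible even in this toy, is that adding nodes and adding iterations are substitutable ways to hit the error target, with very different costs.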
Strategic directions in constraint programming
An abstract is not available