Novel Techniques for Automated Dental Identification
Automated dental identification is one of the best candidates for postmortem identification. With the large numbers of victims encountered in mass disasters, automating the process of postmortem identification is receiving increased attention. This dissertation introduces new approaches for the different stages of an Automated Dental Identification System (ADIS): segmentation, classification, labeling, and matching.

We modified the seam carving technique to adapt it to the problem of segmenting dental image records into individual teeth, and propose a two-stage teeth segmentation approach. In the first stage, the teeth images are preprocessed by a two-step thresholding technique, which starts with an iterative thresholding followed by an adaptive thresholding to binarize the teeth images. In the second stage, we apply seam carving to the binary images, using both horizontal and vertical seams, to separate each individual tooth. We obtained an optimality rate of 54.02% for bitewing images, superior to all existing fully automated dental segmentation algorithms in the literature, with a failure rate of 1.05%. For periapical images, we obtained a high optimality rate of 58.13% and a low failure rate of 0.74%, which also surpasses the performance of existing techniques.

An important problem in automated dental identification is the automatic classification of teeth into four classes (molars, premolars, canines, and incisors). A dental chart is key to avoiding illogical comparisons that inefficiently consume the limited computational resources and may mislead decision-making. We tackle this composite problem using a two-stage approach. The first stage uses low-computational-cost, appearance-based features, applying Orthogonal Locality Preserving Projections (OLPP) to assign an initial class.
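In spirit, this first-stage classifier is a linear projection followed by nearest-centroid assignment. A minimal sketch, where the projection matrix merely stands in for one learned with OLPP (which is itself obtained by solving a generalized eigenproblem) and all names and numbers are illustrative:

```python
import numpy as np

def classify_tooth(feature, projection, centroids):
    """Assign an initial tooth class from appearance features alone:
    project the raw feature vector into a low-dimensional subspace
    (the projection matrix stands in for an OLPP-learned map) and
    pick the class whose centroid lies nearest in that subspace."""
    z = projection @ feature
    classes = list(centroids)
    dists = [np.linalg.norm(z - centroids[c]) for c in classes]
    return classes[int(np.argmin(dists))]
```

The second stage would then validate these initial labels against teeth neighborhood rules rather than trusting the appearance-based class alone.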
The second stage applies a string matching technique, based on teeth neighborhood rules, to validate the initial teeth classes and hence assign each tooth a number corresponding to its location in the dental chart, even in the presence of a missing tooth. Experimental results on a large dataset of bitewing and periapical films show that the proposed approach achieves an overall classification accuracy of 77%, and teeth class validation enhances the overall accuracy to 87%; this is slightly better than previous methods based on EigenTeeth, which achieve 75% and 86%, respectively.

We present a new technique that searches the dental database to find a candidate list. We use dental records from the FBI's Criminal Justice Service (CJIC) ADIS database, which contains 104 records (about 500 bitewing and periapical films) involving more than 2,000 teeth: 47 antemortem (AM) records and 57 postmortem (PM) records, with 20 matched records. The proposed approach consists of two main stages. The first stage preprocesses the dental records (segmentation, teeth labeling, and classification) to obtain reliable, appearance-based, low-computational-cost features. In the second stage, we developed a technique based on LaplacianTeeth, using the OLPP algorithm, to produce a candidate list. The proposed technique correctly retrieves 65% of the dental records in the top 5 ranks, while the method based on EigenTeeth reaches only 60%. The proposed approach takes about 0.17 seconds per record-to-record comparison, while the EigenTeeth-based method takes about 0.09 seconds.

Finally, we address the teeth matching problem by presenting a new technique for dental record retrieval. The technique matches Scale Invariant Feature Transform (SIFT) descriptors, guided by the teeth contour, between the subject and reference dental records.
Our fundamental objective is to obtain a relatively short match list with a high probability of containing the correct reference. The proposed technique correctly retrieves the dental records at rates of 35% and 75% in the top 1 and top 5 ranks, respectively, and takes only 4.18 minutes on average to retrieve a match list. This compares favorably with the existing shape-based (edge direction histogram) method, which achieves rates of 29% and 46% in the top 1 and top 5 ranks, respectively.

In summary, the proposed ADIS system retrieves the correct dental record at an overall rate of 80% in the top 5 ranks when a candidate list of 20 is used (from the potential match search), whereas a candidate list of 10 yields an overall rate of 84% in the top 5 ranks; it takes only a few minutes to search the database, which compares favorably against most existing methods in the literature when both accuracy and computational complexity are considered
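The descriptor-matching core of the SIFT-based retrieval stage can be sketched with a standard Lowe-style ratio test; real SIFT extraction and the contour guidance are omitted here, and the arrays simply stand in for descriptor sets:

```python
import numpy as np

def ratio_match(desc_a, desc_b, ratio=0.75):
    """Lowe-style ratio test: keep a match only when the nearest
    neighbour in desc_b is clearly closer than the second nearest,
    filtering out ambiguous correspondences."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

def match_score(desc_subject, desc_reference):
    """Similarity between two records: fraction of subject descriptors
    with an unambiguous match in the reference record."""
    if len(desc_subject) == 0:
        return 0.0
    return len(ratio_match(desc_subject, desc_reference)) / len(desc_subject)
```

Ranking reference records by such a score, highest first, yields the short match list the retrieval stage aims for.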
Top-K link recommendation for development of P2P social networks
Ankara: The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2014. Thesis (Master's), Bilkent University, 2014. Includes bibliographical references (leaves 45-48).

The common approach for implementing social networks has been using centralized
infrastructures, which inherently include problems of privacy, censorship,
scalability, and fault-tolerance. Although decentralized systems offer a natural
solution, significant research is needed to build an end-to-end peer-to-peer social
network where data is stored among trusted users. The centralized algorithms
need to be revisited for a P2P setting, where the nodes have connectivity to only
neighbors, have no information of the global topology, and may go offline and churn,
resulting in changes of the graph structure. The social graph algorithms should
be designed to be robust to node failures and network changes. We model P2P social
networks as uncertain graphs where each node can go offline, and we introduce
link recommendation algorithms that support the development of decentralized
social networks. We propose methods to recommend top-k links to improve the
underlying topology and efficiency of the overlay network, while preserving the
locality of the social structure. Our approach aims to optimize the probabilistic
reachability, improve the robustness of the local network and avoid loss from failures
of the peers. We model the problem through discrete optimization and assign
a score to each node to capture both the topological connectivity and the social
centrality of the corresponding node. We evaluate the proposed methods with
respect to performance and quality measures developed for P2P social networks.

Aytaş, Yusuf. M.S.
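The link-scoring idea, favoring stable and well-connected peers when recommending new links, can be sketched as follows; the score below is an illustrative proxy for the thesis' probabilistic-reachability objective, not its actual formulation:

```python
import heapq

def recommend_links(adj, p_online, u, k):
    """Recommend k new links for peer u in an uncertain graph:
    score every non-neighbour v by the probability that v is online
    times its degree, so recommendations favour stable, well-connected
    peers, then return the k highest-scoring candidates."""
    scores = {v: p_online[v] * len(adj[v])
              for v in adj if v != u and v not in adj[u]}
    return heapq.nlargest(k, scores, key=scores.get)
```

In a real P2P setting each peer would compute such scores from local neighborhood information only, since no node sees the global topology.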
Smooth Scan: Statistics-Oblivious Access Paths
Query optimizers depend heavily on statistics representing column distributions to create efficient query plans. In many cases, though, statistics are outdated or non-existent, and the process of refreshing statistics is very expensive, especially for ad-hoc workloads on ever bigger data. This results in suboptimal plans that severely hurt performance. The main problem is that any decision, once made by the optimizer, is fixed throughout the execution of a query. In particular, each logical operator translates into a fixed choice of a physical operator at run-time. In this paper we advocate for continuous adaptation and morphing of physical operators throughout their lifetime, by adjusting their behavior in accordance with the statistical properties of the data. We demonstrate the benefits of the new paradigm by designing and implementing an adaptive access path operator called Smooth Scan, which morphs continuously within the space of traditional index access and full table scan. Smooth Scan behaves similarly to an index scan for low selectivity; if selectivity increases, however, Smooth Scan progressively morphs its behavior toward a sequential scan. As a result, a system with Smooth Scan requires no optimization decisions up front nor does it need accurate statistics to provide good performance. We implement Smooth Scan in PostgreSQL and, using both synthetic benchmarks as well as TPC-H, we show that it achieves robust performance while at the same time being statistics-oblivious
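The morphing behavior can be sketched with a toy executor that starts probing pages through the index and flattens into a sequential scan once the observed selectivity crosses a threshold; the data layout and threshold here are illustrative, not PostgreSQL's:

```python
def smooth_scan(pages, index, predicate, switch_at=0.2):
    """Toy model of Smooth Scan's morphing. `pages` is the heap
    (a list of pages, each a list of rows); `index` yields page
    numbers of qualifying tuples in index order. Start with index
    probes that read the whole touched page; once the observed
    selectivity (qualifying rows seen / total rows) exceeds
    `switch_at`, morph into a sequential scan of untouched pages."""
    total_rows = sum(len(p) for p in pages)
    results, seen = [], set()
    probes = seq_reads = 0
    morphed = False
    for page_no in index:
        if page_no in seen:
            continue
        seen.add(page_no)
        probes += 1
        results.extend(r for r in pages[page_no] if predicate(r))
        if len(results) / total_rows > switch_at:
            morphed = True  # index access no longer pays off
            break
    if morphed:
        for page_no, page in enumerate(pages):
            if page_no not in seen:
                seq_reads += 1
                results.extend(r for r in page if predicate(r))
    return results, probes, seq_reads
```

With low selectivity the scan finishes in pure index mode; with high selectivity it pays a few probes and then behaves like a sequential scan, never reading a page twice.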
Benchmark methodologies for the optimized physical synthesis of RISC-V microprocessors
As technology continues to advance and chip sizes shrink, the complexity and design time required for integrated circuits have significantly increased. To address these challenges, Electronic Design Automation (EDA) tools have been introduced to streamline the design flow. These tools offer various methodologies and options to optimize power, performance, and chip area. However, selecting the most suitable methods from these options can be challenging, as they may lead to trade-offs among power, performance, and area. While architectural and Register Transfer Level (RTL) optimizations have been extensively studied in the existing literature, the impact on performance of the optimization methods available in EDA tools has not been thoroughly researched. This thesis aims to optimize a semiconductor processor through EDA tools within the physical synthesis domain to achieve increased performance while maintaining a balance between power efficiency and area utilization. By leveraging floorplanning tools and carefully selecting technology libraries and optimization options, the CV32E40P open-source processor is subjected to various floorplans to analyze their impact on chip performance. The employed techniques, including the multibit components prefer option, the multiplexer tree prefer option, the identification and exclusion of problematic cells, and placement blockages, lead to significant improvements in cell density, congestion mitigation, and timing. The optimized synthesis results demonstrate a 71% enhancement in chip design performance without a substantial increase in area, showcasing the effectiveness of these techniques in improving the performance, efficiency, and manufacturability of large-scale integrated circuits. By exploring and implementing the available options in EDA tools, this study demonstrates how the processor's performance can be significantly improved while maintaining a balanced and efficient chip design.
The findings contribute valuable insights to the field of electronic design automation, offering guidance to designers in selecting suitable methodologies for optimizing processors and other integrated circuits
Automatic synthesis and optimization of chip multiprocessors
The microprocessor technology has experienced enormous growth during the last decades. Rapid downscaling of CMOS technology has led to higher operating frequencies and performance densities, facing the fundamental issue of power dissipation. Chip Multiprocessors (CMPs) have become the latest paradigm to improve the power-performance efficiency of computing systems by exploiting the parallelism inherent in applications. Industrial and prototype implementations have already demonstrated the benefits achieved by CMPs with hundreds of cores.

CMP architects are challenged to take many complex design decisions. Only a few of them are:
- What should be the ratio between the core and cache areas on a chip?
- Which core architectures to select?
- How many cache levels should the memory subsystem have?
- Which interconnect topologies provide efficient on-chip communication?

These and many other aspects create a complex multidimensional space for architectural exploration. Design automation tools become essential to make the architectural exploration feasible under hard time-to-market constraints. The exploration methods have to be efficient and scalable to handle future-generation on-chip architectures with hundreds or thousands of cores.

Furthermore, once a CMP has been fabricated, the need for efficient deployment of the many-core processor arises. Intelligent techniques for task mapping and scheduling onto CMPs are necessary to guarantee full usage of the benefits brought by the many-core technology. These techniques have to consider the peculiarities of modern architectures, such as the availability of enhanced power-saving techniques and the presence of complex memory hierarchies.

This thesis has several objectives. The first objective is to elaborate methods for efficient analytical modeling and architectural design space exploration of CMPs.
The efficiency is achieved by using analytical models instead of simulation, and by replacing exhaustive exploration with an intelligent search strategy. Additionally, these methods incorporate high-level models for physical planning. The related contributions are described in Chapters 3, 4 and 5 of the document.

The second objective of this work is to propose a scalable algorithm for task mapping onto general-purpose CMPs with power management techniques, for efficient deployment of many-core systems. This contribution is explained in Chapter 6 of this document.

Finally, the third objective of this thesis is to address the issues of on-chip interconnect design and exploration by developing a model for simultaneous topology customization and deadlock-free routing in Networks-on-Chip. The developed methodology can be applied to various classes of on-chip systems, ranging from general-purpose chip multiprocessors to application-specific solutions. Chapter 7 describes the proposed model.

The presented methods have been thoroughly tested experimentally and the results are described in this dissertation. At the end of the document, several possible directions for future research are proposed
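The contrast between an exhaustive sweep and an intelligent search over a cheap analytical model can be illustrated with a toy hill climber over a (cores, cache) space; the objective function and neighborhood below are invented for illustration, not the thesis' models:

```python
def explore(perf_model, cores_range, cache_range):
    """Greedy hill climbing over a (cores, cache) design space:
    evaluate the cheap analytical model only at the current point's
    neighbours and move while the objective improves, instead of
    exhaustively evaluating every configuration."""
    point = (cores_range[0], cache_range[0])
    evaluated = {point: perf_model(*point)}
    while True:
        c, m = point
        neighbours = [(c + dc, m + dm)
                      for dc, dm in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if c + dc in cores_range and m + dm in cache_range]
        for n in neighbours:
            evaluated.setdefault(n, perf_model(*n))
        best = max(neighbours, key=evaluated.get)
        if evaluated[best] <= evaluated[point]:
            # local optimum: no neighbour improves the analytical estimate
            return point, evaluated[point], len(evaluated)
        point = best
```

Real design space exploration would use smarter search (and multi-objective trade-offs), but the payoff is the same: far fewer model evaluations than the full sweep.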
Routing optimization algorithms in integrated fronthaul/backhaul networks supporting multitenancy
International Doctoral Mention.

This thesis aims to help in the definition and design of the 5th generation of
telecommunications networks (5G) by modelling the different features that characterize
them through several mathematical models. Overall, the aim of these models is to perform
a wide optimization of the network elements, leveraging their newly-acquired capabilities
in order to improve the efficiency of the future deployments both for the users and the
operators. The timeline of this thesis corresponds to the timeline of the research and
definition of 5G networks, and thus in parallel and in the context of several European
H2020 programs. Hence, the different parts of the work presented in this document
match and provide a solution to different challenges that have been appearing during
the definition of 5G and within the scope of those projects, considering the feedback and
problems from the point of view of all the end users, operators and providers.
Thus, the first challenge to be considered focuses on the core network, in particular
on how to integrate fronthaul and backhaul traffic over the same transport stratum.
The solution proposed is an optimization framework for routing and resource placement
that has been developed taking into account delay, capacity and path constraints,
maximizing the degree of Distributed Unit (DU) deployment while minimizing the
supporting Central Unit (CU) pools. The framework and the developed heuristics (to
reduce the computational complexity) are validated and applied to both small-scale and large-scale (production-level) networks. This makes them useful to network operators both for network planning and for dynamically adjusting network operation on their (virtualized) infrastructure.
Moving closer to the user side, the second challenge considered focuses on the
allocation of services in cloud/edge environments. In particular, the problem tackled
consists of selecting the best location for each Virtual Network Function (VNF)
composing a service in cloud robotics environments, which imply strict delay bounds
and reliability constraints. Robots, vehicles and other end-devices provide significant
capabilities such as actuators, sensors and local computation which are essential for some
services. On the negative side, these devices are continuously on the move and might
lose network connection or run out of battery, which further challenges service delivery in
this dynamic environment. Thus, the performed analysis and proposed solution tackle the mobility and battery restrictions. We further need to account for the temporal aspects and
conflicting goals of reliable, low latency service deployment over a volatile network, where
mobile compute nodes act as an extension of the cloud and edge computing infrastructure.
The problem is formulated as a cost-minimizing VNF placement optimization and an
efficient heuristic is proposed. The algorithms are extensively evaluated from various
aspects by simulation on detailed real-world scenarios.
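A stripped-down sketch of the cost-minimizing VNF placement follows; it is greedy, so only a heuristic, and the cost, delay, and capacity figures are invented, while the real formulation also models reliability, battery, and mobility:

```python
def place_vnfs(chain, nodes, delay_budget):
    """Place each VNF of a service chain on the cheapest node whose
    added processing delay still fits the end-to-end delay budget.
    `chain` maps VNF name -> CPU demand; `nodes` maps node name ->
    (cost_per_cpu, delay, capacity). Returns (placement, total cost),
    or None when this greedy order finds no feasible node."""
    placement, total_cost, used_delay = {}, 0.0, 0.0
    capacity = {n: nodes[n][2] for n in nodes}
    for vnf, cpu in chain.items():
        candidates = [n for n in nodes
                      if capacity[n] >= cpu
                      and used_delay + nodes[n][1] <= delay_budget]
        if not candidates:
            return None
        best = min(candidates, key=lambda n: nodes[n][0] * cpu)
        placement[vnf] = best
        total_cost += nodes[best][0] * cpu
        used_delay += nodes[best][1]
        capacity[best] -= cpu
    return placement, total_cost
```

The tension the thesis describes is visible even here: the cheap cloud node eats most of the delay budget, forcing later VNFs onto the expensive but low-latency edge.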
Finally, the last challenge analyzed focuses on supporting edge-based services, in
particular, Machine Learning (ML) in distributed Internet of Things (IoT) scenarios. The
traditional approach to distributed ML is to adapt learning algorithms to the network, e.g.,
reducing updates to curb overhead. Networks based on intelligent edge, instead, make
it possible to follow the opposite approach, i.e., to define the logical network topology
around the learning task to perform, so as to meet the desired learning performance.
The proposed solution includes a system model that captures such aspects in the context
of supervised ML, accounting for both learning nodes (that perform computations) and
information nodes (that provide data). The problem is formulated to select (i) which
learning and information nodes should cooperate to complete the learning task, and (ii)
the number of iterations to perform, in order to minimize the learning cost while meeting
the target prediction error and execution time. The solution also includes a heuristic
algorithm that is evaluated leveraging a real-world network topology and considering
both classification and regression tasks, and closely matches the optimum, outperforming
state-of-the-art alternatives.

This work has been supported by IMDEA Networks Institute. Doctoral Programme in Telematic Engineering, Universidad Carlos III de Madrid. Thesis committee. President: Pablo Serrano Yáñez-Mingot; Secretary: Andrés García Saavedra; Member: Luca Valcarengh
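The selection of cooperating nodes in (i) and the stopping point in (ii) can be caricatured with a greedy sketch; the geometric error model below is a made-up stand-in for the thesis' learned error and time models:

```python
def select_nodes(learning, information, err_target, gain=0.05):
    """Greedy sketch of node selection for a distributed learning task.
    `learning` and `information` are lists of (name, cost) pairs; each
    additional cooperating node is assumed to cut the prediction error
    by a fixed factor `gain` (an invented stand-in for a real error
    model). Pick cheapest nodes first until the estimated error meets
    the target; if the target is unreachable, all nodes are used."""
    candidates = sorted(learning + information, key=lambda n: n[1])
    err, cost, chosen = 1.0, 0.0, []
    for name, c in candidates:
        if err <= err_target:
            break
        chosen.append(name)
        cost += c
        err *= 1.0 - gain
    return chosen, cost, err
```

The actual formulation optimizes cost jointly over node choice and iteration count under both error and execution-time constraints, which this greedy pass does not capture.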
Integrated Software Synthesis for Signal Processing Applications
Signal processing applications usually encounter multi-dimensional real-time performance requirements and restrictions on resources, which makes software implementation complex. Although major advances have been made in embedded processor technology for this application domain -- in particular, in technology for programmable digital signal processors -- traditional compiler techniques applied to such platforms do not generate machine code of desired quality. As a result, low-level, human-driven fine tuning of software implementations is needed, and we are therefore in need of more effective strategies for software implementation for signal processing applications.
In this thesis, a number of important memory and performance optimization problems are addressed for translating high-level representations of signal processing applications into embedded software implementations. This investigation centers around signal processing-oriented dataflow models of computation. This form of dataflow provides a coarse grained modeling approach that is well-suited to the signal processing domain and is increasingly supported by commercial and research-oriented tools for design and implementation of signal processing systems.
Well-developed dataflow models of signal processing systems expose high-level application structure that can be used by designers and design tools to guide optimization of hardware and software implementations. This thesis advances the suite of techniques available for optimization of software implementations that are derived from the application structure exposed from dataflow representations. In addition, the specialized architecture of programmable digital signal processors is considered jointly with dataflow-based analysis to streamline the optimization process for this important family of embedded processors. The specialized features of programmable digital signal processors that are addressed in this thesis include parallel memory banks to facilitate data parallelism, and signal-processing-oriented addressing modes and address register management capabilities.
The problems addressed in this thesis involve several inter-related features, and therefore an integrated approach is required to solve them effectively. This thesis proposes such an integrated approach, and develops the approach through formal problem formulations, in-depth theoretical analysis, and extensive experimentation
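Since the work centers on signal-processing-oriented dataflow, a small worked example helps: for synchronous dataflow (one standard such model, used here for illustration), the repetitions vector that underlies periodic scheduling follows from the balance equations, sketched below:

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm

def repetitions_vector(actors, edges):
    """Solve the synchronous-dataflow balance equations
    q[a] * prod == q[b] * cons for every edge (a, prod, b, cons):
    fix the first actor's rate to 1, propagate along edges, then
    scale to the smallest positive integers. These counts give the
    minimal periodic schedule used by dataflow schedulers.
    Assumes a connected, consistent graph."""
    rate = {actors[0]: Fraction(1)}
    pending = list(edges)
    while pending:
        remaining = []
        for a, prod, b, cons in pending:
            if a in rate and b not in rate:
                rate[b] = rate[a] * prod / cons
            elif b in rate and a not in rate:
                rate[a] = rate[b] * cons / prod
            elif a not in rate:  # neither endpoint resolved yet
                remaining.append((a, prod, b, cons))
        pending = remaining
    scale = lcm(*(r.denominator for r in rate.values()))
    counts = {a: int(r * scale) for a, r in rate.items()}
    g = reduce(gcd, counts.values())
    return {a: v // g for a, v in counts.items()}
```

For a chain where A produces 2 tokens consumed 3 at a time by B, and B produces 3 consumed 2 at a time by C, the minimal schedule fires A and C three times for every two firings of B.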
Resource and thermal management in 3D-stacked multi-/many-core systems
Continuous semiconductor technology scaling and the rapid increase in computational needs have stimulated the emergence of multi-/many-core processors. While up to hundreds of cores can be placed on a single chip, the performance capacity of the cores cannot be fully exploited due to high latencies of interconnects and memory, high power consumption, and low manufacturing yield in traditional (2D) chips. 3D stacking is an emerging technology that aims to overcome these limitations of 2D designs by stacking processor dies over each other and using through-silicon-vias (TSVs) for on-chip communication, and thus, provides a large amount of on-chip resources and shortens communication latency. These benefits, however, are limited by challenges in high power densities and temperatures.
3D stacking also enables integrating heterogeneous technologies into a single chip. One example of heterogeneous integration is building many-core systems with silicon-photonic network-on-chip (PNoC), which reduces on-chip communication latency significantly and provides higher bandwidth compared to electrical links. However, silicon-photonic links are vulnerable to on-chip thermal and process variations. These variations can be countered by actively tuning the temperatures of optical devices through micro-heaters, but at the cost of substantial power overhead.
This thesis claims that unearthing the energy efficiency potential of 3D-stacked systems requires intelligent and application-aware resource management. Specifically, the thesis improves energy efficiency of 3D-stacked systems via three major components of computing systems: cache, memory, and on-chip communication. We analyze characteristics of workloads in computation, memory usage, and communication, and present techniques that leverage these characteristics for energy-efficient computing.
This thesis introduces 3D cache resource pooling, a cache design that allows for flexible heterogeneity in cache configuration across a 3D-stacked system and improves cache utilization and system energy efficiency. We also demonstrate the impact of resource pooling on a real prototype 3D system with scratchpad memory.
At the main memory level, we claim that utilizing heterogeneous memory modules and memory object level management significantly helps with energy efficiency. This thesis proposes a memory management scheme at a finer granularity: memory object level, and a page allocation policy to leverage the heterogeneity of available memory modules and cater to the diverse memory requirements of workloads.
On the on-chip communication side, we introduce an approach to limit the power overhead of PNoC in (3D) many-core systems through cross-layer thermal management. Our proposed thermally-aware workload allocation policies coupled with an adaptive thermal tuning policy minimize the required thermal tuning power for PNoC, and in this way, help broader integration of PNoC. The thesis also introduces techniques in placement and floorplanning of optical devices to reduce optical loss and, thus, laser source power consumption.
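The thermally-aware allocation idea (keep core temperatures even so the photonic devices need less active tuning) can be sketched as a greedy mapping; the core temperatures, task powers, and the power-to-temperature factor are invented for illustration:

```python
def allocate(tasks, core_temps):
    """Greedy thermally-aware mapping: sort tasks by power (hottest
    first) and place each on the currently coolest core, assuming a
    core's temperature rises proportionally to the power it hosts.
    Evening out temperatures reduces the thermal tuning the on-chip
    photonic devices would otherwise require."""
    temps = dict(core_temps)
    mapping = {}
    for task, power in sorted(tasks.items(), key=lambda t: -t[1]):
        coolest = min(temps, key=temps.get)
        mapping[task] = coolest
        temps[coolest] += 0.5 * power  # toy power-to-temperature factor
    return mapping, temps
```

A cross-layer policy would additionally feed the resulting thermal map back into the PNoC tuning controller, which this sketch omits.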
Smooth Scan: Robust Query Execution with a Statistics-oblivious Access Operator
Query optimizers depend heavily on statistics representing column distributions to create efficient query plans. In many cases, though, statistics are outdated or non-existent, and the process of refreshing statistics is very expensive, especially for ad-hoc workloads on ever bigger data. This results in suboptimal plans that severely hurt performance. The main problem is that any decision, once made by the optimizer, is fixed throughout the execution of a query. In particular, each logical operator translates into a fixed choice of a physical operator at run-time. In this paper we advocate for continuous adaptation and morphing of physical operators throughout their lifetime, by adjusting their behavior in accordance with the statistical properties of the data. We demonstrate the benefits of the new paradigm by designing and implementing an adaptive access path operator called Smooth Scan, which morphs continuously within the space of traditional index access and full table scan. Smooth Scan behaves similarly to an index scan for low selectivity; if selectivity increases, however, Smooth Scan progressively morphs its behavior toward a sequential scan. As a result, a system with Smooth Scan requires no access path decisions up front nor does it need accurate statistics to provide good performance. We implement Smooth Scan in PostgreSQL and, using both synthetic benchmarks as well as TPC-H, we show that it achieves robust performance while at the same time being statistics-oblivious