225 research outputs found

    Spectral-spatial classification of n-dimensional images in real-time based on segmentation and mathematical morphology on GPUs

    Get PDF
    The objective of this thesis is to develop efficient schemes for spectral-spatial n-dimensional image classification. By efficient schemes, we mean schemes that produce good classification results in terms of accuracy, as well as schemes that can be executed in real-time on low-cost computing infrastructures, such as the Graphics Processing Units (GPUs) shipped in personal computers. The n-dimensional images include images with two and three dimensions, such as images coming from the medical domain, and also images ranging from ten to hundreds of dimensions, such as the multiand hyperspectral images acquired in remote sensing. In image analysis, classification is a regularly used method for information retrieval in areas such as medical diagnosis, surveillance, manufacturing and remote sensing, among others. In addition, as the hyperspectral images have been widely available in recent years owing to the reduction in the size and cost of the sensors, the number of applications at lab scale, such as food quality control, art forgery detection, disease diagnosis and forensics has also increased. Although there are many spectral-spatial classification schemes, most are computationally inefficient in terms of execution time. In addition, the need for efficient computation on low-cost computing infrastructures is increasing in line with the incorporation of technology into everyday applications. In this thesis we have proposed two spectral-spatial classification schemes: one based on segmentation and other based on wavelets and mathematical morphology. These schemes were designed with the aim of producing good classification results and they perform better than other schemes found in the literature based on segmentation and mathematical morphology in terms of accuracy. Additionally, it was necessary to develop techniques and strategies for efficient GPU computing, for example, a block–asynchronous strategy, resulting in an efficient implementation on GPU of the aforementioned spectral-spatial classification schemes. The optimal GPU parameters were analyzed and different data partitioning and thread block arrangements were studied to exploit the GPU resources. The results show that the GPU is an adequate computing platform for on-board processing of hyperspectral information

    Towards Intelligent Runtime Framework for Distributed Heterogeneous Systems

    Get PDF
    Scientific applications strive for increased memory and computing performance, requiring massive amounts of data and time to produce results. Applications utilize large-scale, parallel computing platforms with advanced architectures to accommodate their needs. However, developing performance-portable applications for modern, heterogeneous platforms requires lots of effort and expertise in both the application and systems domains. This is more relevant for unstructured applications whose workflow is not statically predictable due to their heavily data-dependent nature. One possible solution for this problem is the introduction of an intelligent Domain-Specific Language (iDSL) that transparently helps to maintain correctness, hides the idiosyncrasies of lowlevel hardware, and scales applications. An iDSL includes domain-specific language constructs, a compilation toolchain, and a runtime providing task scheduling, data placement, and workload balancing across and within heterogeneous nodes. In this work, we focus on the runtime framework. We introduce a novel design and extension of a runtime framework, the Parallel Runtime Environment for Multicore Applications. In response to the ever-increasing intra/inter-node concurrency, the runtime system supports efficient task scheduling and workload balancing at both levels while allowing the development of custom policies. Moreover, the new framework provides abstractions supporting the utilization of heterogeneous distributed nodes consisting of CPUs and GPUs and is extensible to other devices. We demonstrate that by utilizing this work, an application (or the iDSL) can scale its performance on heterogeneous exascale-era supercomputers with minimal effort. A future goal for this framework (out of the scope of this thesis) is to be integrated with machine learning to improve its decision-making and performance further. As a bridge to this goal, since the framework is under development, we experiment with data from Nuclear Physics Particle Accelerators and demonstrate the significant improvements achieved by utilizing machine learning in the hit-based track reconstruction process

    Methods to enhance content distribution for very large scale online communities

    Get PDF
    The Internet has experienced an exponential growth in the last years, and its number of users far from decay keeps on growing. Popular Web 2.0 services such as Facebook, YouTube or Twitter among others sum millions of users and employ vast infrastructures deployed worldwide. The size of these infrastructures is getting huge in order to support such a massive number of users. This increment of the infrastructure size has brought new problems regarding scalability, power consumption, cooling, hardware lifetime, underutilization, investment recovery, etc. Owning this kind of infrastructures is not always affordable nor convenient. This could be a major handicap for starting projects with a humble budget whose success is based on reaching a large audience. However, current technologies might permit to deploy vast infrastructures reducing their cost. We refer to peer-to-peer networks and cloud computing. Peer-to-peer systems permit users to yield their own resources to distributed infrastructures. These systems have demonstrated to be a valuable choice capable of distributing vast amounts of data to large audiences with a minimal starting infrastructure. Nevertheless, aspects such as content availability cannot be controlled in these systems, whereas classic server infrastructures can improve this aspect. In the recent time, the cloud has been revealed as a promising paradigm for hosting horizontally scalable Web systems. The cloud offers elastic capabilities that permit to save costs by adapting the number of resources to the incoming demand. Additionally, the cloud makes accessible a vast amount of resources that may be employed on peak workloads. However, how to determine the amount of resources to use remains a challenge. In this thesis, we describe a hierarchical architecture that combines both: peer-to-peer and elastic server infrastructures in order to enhance content distribution. The peer-topeer infrastructure brings a scalable solution that reduces the workload in the servers, while the server infrastructure assures availability and reduces costs varying its size when necessary. We propose a distributed collaborative caching infrastructure that employs a clusterbased locality-aware self-organizing P2P system. This system, leverages collaborative data classification in order to improve content locality. Our evaluation demonstrates that incrementing data locality permits to improve data search while reducing traffic. We explore the utilization of elastic server infrastructures addressing three issues: system sizing, data grouping and content distribution. We propose novel multi-model techniques for hierarchical workload prediction. These predictions are employed to determine the system size and request distribution policies. Additionally, we propose novel techniques for adaptive control that permit to identify inaccurate models and redefine them. Our evaluation using traces extracted from real systems indicate that the utilization of a hierarchy of multiple models increases prediction accuracy. This hierarchy in conjunction with our adaptive control techniques increments the accuracy during unexpected workload variations. Finally, we demonstrate that locality-aware request distribution policies can take advantage of prediction models to adequate content distribution independently of the system size

    Timely processing of big data in collaborative large-scale distributed systems

    Get PDF
    Today’s Big Data phenomenon, characterized by huge volumes of data produced at very high rates by heterogeneous and geographically dispersed sources, is fostering the employment of large-scale distributed systems in order to leverage parallelism, fault tolerance and locality awareness with the aim of delivering suitable performances. Among the several areas where Big Data is gaining increasing significance, the protection of Critical Infrastructure is one of the most strategic since it impacts on the stability and safety of entire countries. Intrusion detection mechanisms can benefit a lot from novel Big Data technologies because these allow to exploit much more information in order to sharpen the accuracy of threats discovery. A key aspect for increasing even more the amount of data at disposal for detection purposes is the collaboration (meant as information sharing) among distinct actors that share the common goal of maximizing the chances to recognize malicious activities earlier. Indeed, if an agreement can be found to share their data, they all have the possibility to definitely improve their cyber defenses. The abstraction of Semantic Room (SR) allows interested parties to form trusted and contractually regulated federations, the Semantic Rooms, for the sake of secure information sharing and processing. Another crucial point for the effectiveness of cyber protection mechanisms is the timeliness of the detection, because the sooner a threat is identified, the faster proper countermeasures can be put in place so as to confine any damage. Within this context, the contributions reported in this thesis are threefold * As a case study to show how collaboration can enhance the efficacy of security tools, we developed a novel algorithm for the detection of stealthy port scans, named R-SYN (Ranked SYN port scan detection). We implemented it in three distinct technologies, all of them integrated within an SR-compliant architecture that allows for collaboration through information sharing: (i) in a centralized Complex Event Processing (CEP) engine (Esper), (ii) in a framework for distributed event processing (Storm) and (iii) in Agilis, a novel platform for batch-oriented processing which leverages the Hadoop framework and a RAM-based storage for fast data access. Regardless of the employed technology, all the evaluations have shown that increasing the number of participants (that is, increasing the amount of input data at disposal), allows to improve the detection accuracy. The experiments made clear that a distributed approach allows for lower detection latency and for keeping up with higher input throughput, compared with a centralized one. * Distributing the computation over a set of physical nodes introduces the issue of improving the way available resources are assigned to the elaboration tasks to execute, with the aim of minimizing the time the computation takes to complete. We investigated this aspect in Storm by developing two distinct scheduling algorithms, both aimed at decreasing the average elaboration time of the single input event by decreasing the inter-node traffic. Experimental evaluations showed that these two algorithms can improve the performance up to 30%. * Computations in online processing platforms (like Esper and Storm) are run continuously, and the need of refining running computations or adding new computations, together with the need to cope with the variability of the input, requires the possibility to adapt the resource allocation at runtime, which entails a set of additional problems. Among them, the most relevant concern how to cope with incoming data and processing state while the topology is being reconfigured, and the issue of temporary reduced performance. At this aim, we also explored the alternative approach of running the computation periodically on batches of input data: although it involves a performance penalty on the elaboration latency, it allows to eliminate the great complexity of dynamic reconfigurations. We chose Hadoop as batch-oriented processing framework and we developed some strategies specific for dealing with computations based on time windows, which are very likely to be used for pattern recognition purposes, like in the case of intrusion detection. Our evaluations provided a comparison of these strategies and made evident the kind of performance that this approach can provide

    Facial feature point fitting with combined color and depth information for interactive displays

    Get PDF
    Interactive displays are driven by natural interaction with the user, necessitating a computer system that recognizes body gestures and facial expressions. User inputs are not easily or reliably recognized for a satisfying user experience, as the complexities of human communication are difficult to interpret in real-time. Recognizing facial expressions in particular is a problem that requires high-accuracy and efficiency for stable interaction environments. The recent availability of the Kinect, a low cost, low resolution sensor that supplies simultaneous color and depth images, provides a breakthrough opportunity to enhance the interactive capabilities of displays and overall user experience. This new RGBD (RGB + depth) sensor generates an additional channel of depth information that can be used to improve the performance of existing state of the art technology and develop new techniques. The Active Shape Model (ASM) is a well-known deformable model that has been extensively studied for facial feature point placement. Previous shape model techniques have applied 3D reconstruction techniques using multiple cameras or other statistical methods for producing 3D information from 2D color images. These methods showed improved results compared to using only color data, but required an additional deformable model or expensive imaging equipment. In this thesis, an ASM model is trained using the RGBD image produced by the Kinect. The real-time information from the depth sensor is registered to the color image to create a pixel-for-pixel match. To improve the quality of the depth image, a temporal median filter is applied to reduce random noise produced by the sensor. The resulting combined model is designed to produce more robust fitting of facial feature points compared to a purely color based active shape model

    Scalable and Declarative Information Extraction in a Parallel Data Analytics System

    Get PDF
    Informationsextraktions (IE) auf sehr großen Datenmengen erfordert hochkomplexe, skalierbare und anpassungsfähige Systeme. Obwohl zahlreiche IE-Algorithmen existieren, ist die nahtlose und erweiterbare Kombination dieser Werkzeuge in einem skalierbaren System immer noch eine große Herausforderung. In dieser Arbeit wird ein anfragebasiertes IE-System für eine parallelen Datenanalyseplattform vorgestellt, das für konkrete Anwendungsdomänen konfigurierbar ist und für Textsammlungen im Terabyte-Bereich skaliert. Zunächst werden konfigurierbare Operatoren für grundlegende IE- und Web-Analytics-Aufgaben definiert, mit denen komplexe IE-Aufgaben in Form von deklarativen Anfragen ausgedrückt werden können. Alle Operatoren werden hinsichtlich ihrer Eigenschaften charakterisiert um das Potenzial und die Bedeutung der Optimierung nicht-relationaler, benutzerdefinierter Operatoren (UDFs) für Data Flows hervorzuheben. Anschließend wird der Stand der Technik in der Optimierung nicht-relationaler Data Flows untersucht und herausgearbeitet, dass eine umfassende Optimierung von UDFs immer noch eine Herausforderung ist. Darauf aufbauend wird ein erweiterbarer, logischer Optimierer (SOFA) vorgestellt, der die Semantik von UDFs mit in die Optimierung mit einbezieht. SOFA analysiert eine kompakte Menge von Operator-Eigenschaften und kombiniert eine automatisierte Analyse mit manuellen UDF-Annotationen, um die umfassende Optimierung von Data Flows zu ermöglichen. SOFA ist in der Lage, beliebige Data Flows aus unterschiedlichen Anwendungsbereichen logisch zu optimieren, was zu erheblichen Laufzeitverbesserungen im Vergleich mit anderen Techniken führt. Als Viertes wird die Anwendbarkeit des vorgestellten Systems auf Korpora im Terabyte-Bereich untersucht und systematisch die Skalierbarkeit und Robustheit der eingesetzten Methoden und Werkzeuge beurteilt um schließlich die kritischsten Herausforderungen beim Aufbau eines IE-Systems für sehr große Datenmenge zu charakterisieren.Information extraction (IE) on very large data sets requires highly complex, scalable, and adaptive systems. Although numerous IE algorithms exist, their seamless and extensible combination in a scalable system still is a major challenge. This work presents a query-based IE system for a parallel data analysis platform, which is configurable for specific application domains and scales for terabyte-sized text collections. First, configurable operators are defined for basic IE and Web Analytics tasks, which can be used to express complex IE tasks in the form of declarative queries. All operators are characterized in terms of their properties to highlight the potential and importance of optimizing non-relational, user-defined operators (UDFs) for dataflows. Subsequently, we survey the state of the art in optimizing non-relational dataflows and highlight that a comprehensive optimization of UDFs is still a challenge. Based on this observation, an extensible, logical optimizer (SOFA) is introduced, which incorporates the semantics of UDFs into the optimization process. SOFA analyzes a compact set of operator properties and combines automated analysis with manual UDF annotations to enable a comprehensive optimization of data flows. SOFA is able to logically optimize arbitrary data flows from different application areas, resulting in significant runtime improvements compared to other techniques. Finally, the applicability of the presented system to terabyte-sized corpora is investigated. Hereby, we systematically evaluate scalability and robustness of the employed methods and tools in order to pinpoint the most critical challenges in building an IE system for very large data sets

    Efficient Implementation of Stochastic Inference on Heterogeneous Clusters and Spiking Neural Networks

    Get PDF
    Neuromorphic computing refers to brain inspired algorithms and architectures. This paradigm of computing can solve complex problems which were not possible with traditional computing methods. This is because such implementations learn to identify the required features and classify them based on its training, akin to how brains function. This task involves performing computation on large quantities of data. With this inspiration, a comprehensive multi-pronged approach is employed to study and efficiently implement neuromorphic inference model using heterogeneous clusters to address the problem using traditional Von Neumann architectures and by developing spiking neural networks (SNN) for native and ultra-low power implementation. In this regard, an extendable high-performance computing (HPC) framework and optimizations are proposed for heterogeneous clusters to modularize complex neuromorphic applications in a distributed manner. To achieve best possible throughput and load balancing for such modularized architectures a set of algorithms are proposed to suggest the optimal mapping of different modules as an asynchronous pipeline to the available cluster resources while considering the complex data dependencies between stages. On the other hand, SNNs are more biologically plausible and can achieve ultra-low power implementation due to its sparse spike based communication, which is possible with emerging non-Von Neumann computing platforms. As a significant progress in this direction, spiking neuron models capable of distributed online learning are proposed. A high performance SNN simulator (SpNSim) is developed for simulation of large scale mixed neuron model networks. An accompanying digital hardware neuron RTL is also proposed for efficient real time implementation of SNNs capable of online learning. Finally, a methodology for mapping probabilistic graphical model to off-the-shelf neurosynaptic processor (IBM TrueNorth) as a stochastic SNN is presented with ultra-low power consumption
    corecore