
    Support Efficient, Scalable, and Online Social Spam Detection in System

    The broad success of online social networks (OSNs) has created fertile soil for the emergence and fast spread of social spam. Fake news, malicious URLs, fraudulent advertisements, fake reviews, and biased propaganda carry serious consequences for both virtual social networks and human life in the real world. Effectively detecting social spam is a hot topic in both academia and industry. However, traditional social spam detection techniques are limited to centralized processing over a single data source and ignore the spam correlations across distributed data sources. Moreover, while a few research efforts have integrated stream systems (e.g., Storm, Spark) with large-scale social spam detection, they typically ignore the specifics of managing and recovering interim state during social stream processing. We observed that social spammers who aim to advertise their products or post malicious links spread their posts in bursts over very short periods of time, and that they are quick to adapt so as to evade old models trained on historical records. This raises a question: how can we uncover and defend against these spam activities in an online and scalable manner? In this dissertation, we present three systems that support scalable and online social spam detection from streaming social data: (1) the first part introduces Oases, a scalable system that supports large-scale online social spam detection; (2) the second part introduces SpamHunter, a novel system that supports efficient, scalable online spam detection in social networks and offers novel insights into guaranteeing the efficiency of modern stream applications by leveraging spam correlations at scale; and (3) the third part addresses state recovery during social spam detection, introducing a customizable state recovery framework that provides fast and scalable recovery mechanisms for protecting large distributed states in social spam detection applications.
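
    As a concrete illustration of the online-detection theme (a minimal sketch under assumptions, not the Oases or SpamHunter implementation), the snippet below keeps a text classifier continuously updated from the stream using scikit-learn's incremental-learning API, so the model is never frozen on historical records; all names here are illustrative:

```python
# Minimal sketch of online spam-model updating: an incrementally trained
# linear classifier whose weights keep adapting as labeled posts stream in,
# so spammers cannot simply evade a model trained once on old records.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**18)  # stateless, stream-friendly
model = SGDClassifier(loss="log_loss")            # logistic regression via SGD

def process_batch(posts, labels):
    """Update the model on one labeled mini-batch from the social stream."""
    X = vectorizer.transform(posts)
    model.partial_fit(X, labels, classes=[0, 1])  # 1 = spam, 0 = ham

def is_spam(post, threshold=0.5):
    """Score a single incoming post against the current model."""
    return model.predict_proba(vectorizer.transform([post]))[0, 1] > threshold
```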

    Quality-of-Service-Adequate Wireless Receiver Design


    FPGA acceleration of DNA sequence alignment: design analysis and optimization

    Existing FPGA accelerators for short read mapping often sacrifice the complete biological information in sequencing data for the sake of simple hardware design, leading to missed or incorrect alignments. In this work, we propose a runtime-reconfigurable alignment pipeline that considers all information in sequencing data for the biologically accurate acceleration of short read mapping. We focus our efforts on accelerating two string matching techniques commonly used in short read mapping: the FM-index and the Smith-Waterman algorithm with the affine-gap model. We further optimize the FPGA hardware using a design analyzer and merger to improve alignment performance. The contributions of this work are as follows. 1. We accelerate exact-match and mismatch alignment by leveraging the FM-index technique. We optimize memory access by compressing the data structure and interleaving the access with multiple short reads. The FM-index hardware also considers complete information in the read data to maximize accuracy. 2. We propose a seed-and-extend model to accelerate alignment with indels. The FM-index hardware is extended to support the seeding stage, while a Smith-Waterman implementation with the affine-gap model is developed on the FPGA for the extension stage. This model improves the efficiency of indel alignment with accuracy comparable to state-of-the-art software. 3. We present an approach for merging multiple FPGA designs into a single hardware design, so that multiple place-and-route tasks can be replaced by a single task to speed up the functional evaluation of designs. We first experiment with this approach to demonstrate its feasibility for different designs. We then apply it to optimize one of the proposed FPGA aligners for better alignment performance.
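
    For reference, this is what the extension stage computes: Smith-Waterman local alignment under the affine-gap model, where a gap of length k costs gap_open + (k - 1) * gap_extend. The sketch below is a plain-software version (scoring parameters are illustrative; the thesis's contribution is the FPGA realization, not this recurrence):

```python
# Smith-Waterman local alignment with affine gaps (Gotoh recurrence).
# H: best score of an alignment ending at (i, j);
# E/F: best score ending in a gap in sequence a / sequence b.
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    n, m = len(a), len(b)
    NEG = float("-inf")
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG] * (m + 1) for _ in range(n + 1)]
    F = [[NEG] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # gap_open is the total cost of a length-1 gap; extending adds gap_extend
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best

print(smith_waterman_affine("ACACACTA", "AGCACACA"))  # small smoke test
```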

    Language Design for Reactive Systems: On Modal Models, Time, and Object Orientation in Lingua Franca and SCCharts

    Reactive systems play a crucial role in the embedded domain. They continuously interact with their environment, handle concurrent operations, and are commonly expected to provide deterministic behavior to enable application in safety-critical systems. In this context, language design is a key aspect, since carefully tailored language constructs can aid in addressing the challenges faced in this domain, as illustrated by the various concurrency models that prevent the known pitfalls of regular threads. Today, many languages exist in this domain and often provide unique characteristics that make them specifically fit for certain use cases. This thesis revolves around two distinctive languages: the actor-oriented polyglot coordination language Lingua Franca and the synchronous statecharts dialect SCCharts. While they take different approaches to providing reactive modeling capabilities, they share clear similarities in their semantics and complement each other in design principles. This thesis analyzes and compares key design aspects in the context of these two languages. For three particularly relevant concepts, it provides and evaluates lean and seamless language extensions that are carefully aligned with the fundamental principles of the underlying language. Specifically, Lingua Franca is extended toward coordinating modal behavior, while SCCharts receives a timed automaton notation with an efficient execution model using dynamic ticks and an extension toward the object-oriented modeling paradigm.
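
    To give a flavor of the "dynamic ticks" execution model (a toy sketch in Python, not SCCharts' actual runtime), the idea is that instead of polling a timed automaton at a fixed rate, each tick reports the next instant at which a timed transition could fire, and the runtime sleeps exactly until then:

```python
import time

class TrafficLight:
    """Toy timed automaton: GREEN -30s-> YELLOW -5s-> RED -30s-> GREEN."""
    DURATIONS = {"GREEN": 30.0, "YELLOW": 5.0, "RED": 30.0}
    NEXT = {"GREEN": "YELLOW", "YELLOW": "RED", "RED": "GREEN"}

    def __init__(self, now):
        self.state = "GREEN"
        self.deadline = now + self.DURATIONS["GREEN"]

    def tick(self, now):
        """Fire every transition whose deadline has passed; return next deadline."""
        while now >= self.deadline:
            self.state = self.NEXT[self.state]
            self.deadline += self.DURATIONS[self.state]
        return self.deadline

light = TrafficLight(time.monotonic())
for _ in range(3):
    wake_up = light.tick(time.monotonic())
    print(light.state, "until", wake_up)
    time.sleep(max(0.0, wake_up - time.monotonic()))  # sleep exactly until the next event
```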

    Monitoring and analysis system for performance troubleshooting in data centers

    It was not long ago. On Christmas Eve 2012, a war of troubleshooting began in Amazon's data centers. It started at 12:24 PM with a mistaken deletion of the state data of the Amazon Elastic Load Balancing service (ELB for short), which went unnoticed at the time. The mistake first led to a local issue in which a small number of ELB API calls were affected. Within about six minutes, it evolved into a critical one in which EC2 customers were significantly affected. Netflix, for example, which was using hundreds of Amazon ELBs, experienced an extensive streaming outage in which many customers could not watch TV shows or movies on Christmas Eve. It took Amazon engineers 5 hours and 42 minutes to find the root cause, the mistaken deletion, and another 15 hours and 32 minutes to fully recover the ELB service. The war ended at 8:15 AM the next day and brought performance troubleshooting in data centers to the world's attention. As the Amazon ELB case shows, troubleshooting runtime performance issues is crucial in time-sensitive multi-tier cloud services because of their stringent end-to-end timing requirements, but it is also notoriously difficult and time consuming. To address this challenge, this dissertation proposes VScope, a flexible monitoring and analysis system for online troubleshooting in data centers. VScope provides primitive operations that data center operators can use to troubleshoot various performance issues. Each operation is essentially a series of monitoring and analysis functions executed on an overlay network. We design a novel software architecture for VScope so that the overlay networks can be generated, executed, and terminated automatically and on demand. On the troubleshooting side, we design novel anomaly detection algorithms and implement them in VScope, so that data center operators are notified when performance anomalies happen. We also design a graph-based guidance approach, called VFocus, which tracks the interactions among hardware and software components in data centers. VFocus provides primitive operations by which operators can analyze these interactions to find out which components are relevant to a performance issue. VScope's capabilities and performance are evaluated on a testbed with over 1000 virtual machines (VMs). Experimental results show that the VScope runtime perturbs system and application performance only negligibly and requires mere seconds to deploy monitoring and analytics functions on over 1000 nodes. This demonstrates VScope's ability to support fast operation and online queries against a comprehensive set of application- to system/platform-level metrics, and a variety of representative analytics functions. When supporting algorithms with high computational complexity, VScope serves as a 'thin layer' that accounts for no more than 5% of their total latency. Further, by using VFocus, VScope can locate problematic VMs that cannot be found via application-level monitoring alone; in one of the use cases explored in the dissertation, brute-force and most sampling-based approaches incur over 400% more perturbation than VFocus does. We also validate VFocus with real-world data center traces; the experimental results show that VFocus achieves a troubleshooting accuracy of 83% on average.
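
    To illustrate the kind of lightweight, per-node analysis function such a system might deploy (a minimal sketch under assumptions, not VScope's actual algorithms), the detector below keeps an exponentially weighted mean and variance per metric and flags samples with a large z-score, so its state is O(1) and its perturbation stays negligible:

```python
# Streaming anomaly detector: EWMA mean/variance with a z-score threshold.
class EwmaDetector:
    def __init__(self, alpha=0.05, threshold=4.0):
        self.alpha, self.threshold = alpha, threshold
        self.mean, self.var = None, 0.0

    def update(self, value):
        """Feed one metric sample; return True if it looks anomalous."""
        if self.mean is None:          # first sample just initializes the state
            self.mean = value
            return False
        diff = value - self.mean
        anomalous = self.var > 0 and abs(diff) > self.threshold * self.var ** 0.5
        # Update the running statistics regardless of the verdict.
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return anomalous

detector = EwmaDetector()
for latency_ms in [10, 11, 9, 10, 12, 10, 95]:   # last sample is a latency spike
    if detector.update(latency_ms):
        print("anomaly:", latency_ms)
```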

    Interfaces for Modular Surgical Planning and Assistance Systems

    Modern surgery of the 21st century relies in many aspects on computers or, in a wider sense, digital data processing. Department administration, OR scheduling, billing, and, with increasing pervasion, patient data management are performed with the aid of so-called Surgical Information Systems (SIS) or, more generally, Hospital Information Systems (HIS). Computer Assisted Surgery (CAS) summarizes techniques which assist a surgeon in the preparation and conduction of surgical interventions. Still predominantly based on radiology images today, these techniques include the preoperative determination of an optimal surgical strategy and intraoperative systems which aim at increasing the accuracy of surgical manipulations. CAS is a relatively young field of computer science. One of its unsolved "teething troubles" is the absence of technical standards for the interconnectivity of CAS systems. Current CAS systems are usually "islands of information" with no connection to other devices within the operating room or to hospital-wide information systems. Several workshop reports and individual publications point out that this situation leads to ergonomic, logistic, and economic limitations in hospital work. Perioperative processes are prolonged by the manual installation and configuration of an increasing number of technical devices, and intraoperatively, a large share of the surgeons' attention is absorbed by the requirement to monitor and operate these systems. The need for open infrastructures which enable the integration of CAS devices from different vendors, so that information as well as commands can be exchanged among these devices through a network, has been identified by numerous experts with backgrounds in medicine as well as engineering. This thesis contains two approaches to the integration of CAS systems:
    - For perioperative data exchange, the specification of new data structures as an amendment to the existing DICOM standard for radiology image management is presented. The extension of DICOM toward surgical applications allows for the seamless integration of surgical planning and reporting systems into DICOM-based Picture Archiving and Communication Systems (PACS), as they are installed in most hospitals for the exchange and long-term archival of patient images and image-related patient data.
    - For the integration of intraoperatively used CAS devices, such as navigation systems, video image sources, or biosensors, the concept of a surgical middleware is presented. A C++ class library, the TiCoLi, is presented which facilitates the configuration of ad-hoc networks among the modules of a distributed CAS system as well as the exchange of data streams, singular data objects, and commands between these modules. The TiCoLi is the first software library for a surgical field of application to implement all of these services.
    To demonstrate the suitability of the presented specifications and their implementation, two modular CAS applications are presented which utilize the proposed DICOM extensions for the perioperative exchange of surgical planning data, as well as the TiCoLi for establishing an intraoperative network of autonomous, yet not independent, CAS modules.
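
    The three middleware services named above can be pictured with a deliberately tiny, in-process sketch (hypothetical and far simpler than the TiCoLi, which is a networked C++ library): continuous data streams as publish/subscribe, singular data objects as a key-value store, and commands as registered handlers:

```python
# Toy message bus illustrating the TiCoLi's three service categories.
from collections import defaultdict

class Bus:
    def __init__(self):
        self.stream_subs = defaultdict(list)  # stream name -> subscriber callbacks
        self.objects = {}                     # object name -> latest value
        self.commands = {}                    # command name -> handler

    def subscribe(self, stream, callback):    # continuous data streams
        self.stream_subs[stream].append(callback)

    def publish(self, stream, sample):
        for cb in self.stream_subs[stream]:
            cb(sample)

    def put(self, name, obj):                 # singular data objects
        self.objects[name] = obj

    def get(self, name):
        return self.objects[name]

    def register(self, command, handler):     # commands
        self.commands[command] = handler

    def invoke(self, command, **kwargs):
        return self.commands[command](**kwargs)

# Example: a navigation module streams tool positions to a visualization module.
bus = Bus()
bus.subscribe("tool_position", lambda p: print("render tool at", p))
bus.publish("tool_position", (12.3, 4.5, 6.7))
```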

    Accelerating Iterative Computations for Large-Scale Data Processing

    Recent advances in sensing, storage, and networking technologies are creating massive amounts of data at an unprecedented scale and pace. Large-scale data processing is commonly leveraged to make sense of these data, enabling companies, governments, and organizations to make better decisions and bringing convenience to our daily lives. However, the massive amount of data involved makes it challenging to perform data processing in a timely manner. On the one hand, huge volumes of data might not even fit into the disk of a single machine. On the other hand, the data mining and machine learning algorithms usually involved in large-scale data processing typically require time-consuming iterative computations. It is therefore imperative to perform iterative computations efficiently on large computer clusters or in the cloud using highly parallel, shared-nothing distributed systems. This research explores new forms of iterative computation that reduce unnecessary work so as to accelerate large-scale data processing in a distributed environment. We propose iterative computation transformations for well-known data mining and machine learning algorithms, such as expectation-maximization, nonnegative matrix factorization, belief propagation, and graph algorithms (e.g., PageRank), which are used in a wide range of application domains. First, we show how to accelerate expectation-maximization algorithms with frequent updates in a distributed environment. Then, we illustrate how to efficiently scale distributed nonnegative matrix factorization with block-wise updates. Next, we present our approach to scaling distributed belief propagation with prioritized block updates. Last, we illustrate how to efficiently perform distributed incremental computation on evolving graphs (see the sketch after this paragraph for the single-machine intuition). We elaborate on how to implement these transformed iterative computations on existing distributed programming models, such as the MapReduce-based model, and develop new scalable and efficient distributed programming models and frameworks when necessary. The goal of these supporting frameworks is to relieve programmers of the burden of specifying the transformation of iterative computations and the communication mechanisms, and to automatically optimize the execution of the computation. Our techniques are evaluated extensively to demonstrate their efficiency. While the techniques we propose are presented in the context of specific algorithms, they address challenges commonly faced by many other algorithms.
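
    As a single-machine illustration of eliminating unnecessary iterative work (a sketch of the general delta-propagation idea, not the dissertation's distributed frameworks), the PageRank variant below propagates only the change in each vertex's rank since the last iteration, so vertices whose neighborhoods have converged stop doing work:

```python
# Delta-based (accumulative) PageRank: rank starts at (1 - d) and only the
# per-iteration change is pushed along out-edges, scaled by the damping factor.
def delta_pagerank(graph, damping=0.85, eps=1e-6):
    """graph: dict mapping every vertex to its list of out-neighbors."""
    rank = {v: 1.0 - damping for v in graph}
    delta = dict(rank)                           # initial change = initial rank
    while any(d > eps for d in delta.values()):
        new_delta = {v: 0.0 for v in graph}
        for v, d in delta.items():
            if d > eps and graph[v]:
                share = damping * d / len(graph[v])
                for u in graph[v]:
                    new_delta[u] += share        # propagate only the change
        for v, d in new_delta.items():
            rank[v] += d
        delta = new_delta
    return rank

print(delta_pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))
```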

    Distance-Aware Selective Online Query Processing Over Large Distributed Graphs

    Performing online selective queries against graphs is a challenging problem due to the unbounded nature of graph queries, which leads to poor computation locality. It becomes even more difficult when a graph is too large to fit in memory. Although there have been emerging efforts on managing large graphs in distributed and parallel settings, e.g., Pregel and HaLoop, these computing frameworks are designed from the perspective of scalability rather than query efficiency. In this work, we present our solution methodology for online selective graph queries based on shortest-path distance semantics, which find various applications in practice. The essential intuition is to build a distance-aware index for online distance-based query processing and to eliminate redundant graph traversal as much as possible. We discuss how the solution can be applied to two types of research problems, distance join and vertex set bonding, i.e., distance-based graph pattern discovery and finding the structure-wise bonding of vertices, respectively.
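
    One common way to make an index "distance-aware" (an illustrative sketch of the general landmark/embedding idea, not necessarily the thesis's index) is to precompute exact BFS distances from a few landmark vertices of an undirected graph; the triangle inequality then gives instant bounds, |d(l,s) - d(l,t)| <= d(s,t) <= d(l,s) + d(l,t), which can answer many selective distance queries with no traversal at all:

```python
from collections import deque

def bfs_distances(graph, source):
    """Unweighted shortest-path distances from source; graph: vertex -> neighbors."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in graph[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def build_index(graph, landmarks):
    """Distance-aware index: one distance table per landmark vertex."""
    return {l: bfs_distances(graph, l) for l in landmarks}

def distance_bounds(index, s, t):
    """Lower/upper bounds on d(s, t) via the triangle inequality (undirected graph)."""
    lower, upper = 0, float("inf")
    for dist in index.values():
        if s in dist and t in dist:
            lower = max(lower, abs(dist[s] - dist[t]))
            upper = min(upper, dist[s] + dist[t])
    return lower, upper
```

    A query such as "all pairs within distance 2" can then discard any candidate pair whose lower bound already exceeds 2, falling back to actual traversal only when the bounds are inconclusive.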
    • 

    corecore