18,865 research outputs found

    Blazes: Coordination Analysis for Distributed Programs

    Distributed consistency is perhaps the most discussed topic in distributed systems today. Coordination protocols can ensure consistency, but in practice they cause undesirable performance unless used judiciously. Scalable distributed architectures avoid coordination whenever possible, but under-coordinated systems can exhibit behavioral anomalies under fault, which are often extremely difficult to debug. This raises significant challenges for distributed system architects and developers. In this paper we present Blazes, a cross-platform program analysis framework that (a) identifies program locations that require coordination to ensure consistent executions, and (b) automatically synthesizes application-specific coordination code that can significantly outperform general-purpose techniques. We present two case studies, one using annotated programs in the Twitter Storm system, and another using the Bloom declarative language. Comment: Updated to include additional materials from the original technical report: derivation rules, output stream label

    OGSA first impressions: a case study re-engineering a scientific application with the open grid services architecture

    We present a case study of our experience re-engineering a scientific application using the Open Grid Services Architecture (OGSA), a new specification for developing Grid applications using web service technologies such as WSDL and SOAP. During the last decade, UCL's Chemistry department has developed a computational approach for predicting the crystal structures of small molecules. However, each search involves running large iterations of computationally expensive calculations and currently takes a few months to perform. Making use of early implementations of the OGSA specification we have wrapped the Fortran binaries into OGSI-compliant service interfaces to expose the existing scientific application as a set of loosely coupled web services. We show how the OGSA implementation facilitates the distribution of such applications across a large network, radically improving performance of the system through parallel CPU capacity, coordinated resource management and automation of the computational process. We discuss the difficulties that we encountered turning Fortran executables into OGSA services and delivering a robust, scalable system. One unusual aspect of our approach is the way we transfer input and output data for the Fortran codes. Instead of employing a file transfer service we transform the XML encoded data in the SOAP message to native file format, where possible using XSLT stylesheets. We also discuss a computational workflow service that enables users to distribute and manage parts of the computational process across different clusters and administrative domains. We examine how our experience re-engineering the polymorph prediction application led to this approach and to what extent our efforts have succeeded.

    Improving Malware Detection Accuracy by Extracting Icon Information

    Detecting PE malware files is now commonly approached using statistical and machine learning models. While these models commonly use features extracted from the structure of PE files, we propose that icons from these files can also help better predict malware. We propose an innovative machine learning approach to extract information from icons. Our proposed approach consists of two steps: 1) extracting icon features using summary statistics, histogram of gradients (HOG), and a convolutional autoencoder, 2) clustering icons based on the extracted icon features. Using publicly available data and machine learning experiments, we show our proposed icon clusters significantly boost the efficacy of malware prediction models. In particular, our experiments show an average accuracy increase of 10% when icon clusters are used in the prediction model. Comment: Full version. IEEE MIPR 201

    Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture

    We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of "big data". We tell the story of how our system was built twice: our first implementation was built on a typical Hadoop-based analytics stack, but was later replaced because it did not meet the latency requirements necessary to generate meaningful real-time results. The second implementation, which is the system deployed in production, is a custom in-memory processing engine specifically designed for the task. This experience taught us that the current typical usage of Hadoop as a "big data" platform, while great for experimentation, is not well suited to low-latency processing, and points the way to future work on data analytics platforms that can handle "big" as well as "fast" data.

    Algorithmic Clustering of Music

    We present a fully automatic method for music classification, based only on compression of strings that represent the music pieces. The method uses no background knowledge about music whatsoever: it is completely general and can, without change, be used in different areas like linguistic classification and genomics. It is based on an ideal theory of the information content in individual objects (Kolmogorov complexity), information distance, and a universal similarity metric. Experiments show that the method distinguishes reasonably well between various musical genres and can even cluster pieces by composer. Comment: 17 pages, 11 figures
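
    A compression-based similarity of the kind described in this abstract is often approximated in practice by the normalized compression distance (NCD), which swaps the uncomputable Kolmogorov complexity for the output length of a real compressor. The sketch below is only an illustration under stated assumptions: the choice of zlib, the function name `ncd`, and the toy byte strings are not taken from the paper.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for Kolmogorov complexity)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy inputs (hypothetical, not music data): a highly repetitive string
# and a string with no repeated bytes.
repetitive = b"abc" * 200
varied = bytes(range(256))

# A string should be closer to itself than to an unrelated string.
print(ncd(repetitive, repetitive), ncd(repetitive, varied))
```

    Pairwise distances computed this way can be fed to any standard clustering routine; the compressor and the string encoding of each piece are the main design choices and would need to suit the data.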

    A computationally efficient framework for large-scale distributed fingerprint matching

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science, School of Computer Science and Applied Mathematics. May 2017. Biometric features have been widely implemented for forensic and civil applications. Amongst many different kinds of biometric characteristics, the fingerprint is globally accepted and remains the most used biometric characteristic by commercial and industrial societies due to its easy acquisition, uniqueness, stability and reliability. There are currently various effective solutions available; however, fingerprint identification is still not considered a fully solved problem, mainly due to accuracy and computational time requirements. Although many of the fingerprint recognition systems based on minutiae provide good accuracy, systems with very large databases require fast, real-time comparison of fingerprints, and they often either fail to meet the high-speed performance requirements or compromise accuracy. For fingerprint matching that involves databases containing millions of fingerprints, real-time identification can only be obtained through the implementation of optimal algorithms that utilize the given hardware as robustly and efficiently as possible. There is currently no known distributed database and computing framework that offers a real-time solution to the fingerprint recognition problem for databases containing as many as sixty million fingerprints, a size close to that of the South African population. This research proposal intends to serve two main purposes: 1) exploit and scale the best known minutiae matching algorithm for a minimum of sixty million fingerprints; and 2) design a framework for a distributed database to deal with large fingerprint databases based on the results obtained in the former item. GR201