Blazes: Coordination Analysis for Distributed Programs

Authors: Alvaro, Peter; Conway, Neil; Hellerstein, Joseph M.; Maier, David
Publication date: 28/11/2013

Distributed consistency is perhaps the most discussed topic in distributed systems today. Coordination protocols can ensure consistency, but in practice they degrade performance unless used judiciously. Scalable distributed architectures avoid coordination whenever possible, but under-coordinated systems can exhibit behavioral anomalies under fault, which are often extremely difficult to debug. This raises significant challenges for distributed system architects and developers. In this paper we present Blazes, a cross-platform program analysis framework that (a) identifies program locations that require coordination to ensure consistent executions, and (b) automatically synthesizes application-specific coordination code that can significantly outperform general-purpose techniques. We present two case studies, one using annotated programs in the Twitter Storm system, and another using the Bloom declarative language. Comment: Updated to include additional materials from the original technical report: derivation rules, output stream label
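The question Blazes answers is which components need coordination. As a toy illustration only (not Blazes's actual labels or derivation rules), the sketch below assumes a two-way split of dataflow components into confluent ones (output independent of input order, like set union) and order-sensitive ones (output depends on arrival order, like "emit the first n items"). It treats any asynchronous network channel as a source of nondeterministic ordering and flags order-sensitive components that are reachable from such a channel. The `Component` class and the `needs_coordination` function are hypothetical names, and the dataflow graph is assumed to be acyclic.

```python
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class Component:
    name: str
    order_sensitive: bool      # True if output depends on the order inputs arrive in
    inputs: tuple = ()         # names of upstream components
    async_input: bool = False  # True if it reads from an unordered network channel

def needs_coordination(components):
    """Names of order-sensitive components whose input order may be nondeterministic."""
    by_name = {c.name: c for c in components}

    @lru_cache(maxsize=None)
    def input_unordered(name):
        # Input order is nondeterministic if the component reads an async channel or
        # any upstream component's (order-preserving-free) output is nondeterministic.
        c = by_name[name]
        return c.async_input or any(input_unordered(u) for u in c.inputs)

    return sorted(c.name for c in components if c.order_sensitive and input_unordered(c.name))

# Hypothetical pipeline for illustration.
pipeline = [
    Component("ingest", order_sensitive=False, async_input=True),
    Component("dedupe", order_sensitive=False, inputs=("ingest",)),
    Component("emit_first_n", order_sensitive=True, inputs=("dedupe",)),
    Component("local_clock", order_sensitive=False),
    Component("sequencer", order_sensitive=True, inputs=("local_clock",)),
]
print(needs_coordination(pipeline))  # ['emit_first_n']
```

In this toy model only `emit_first_n` is flagged: `sequencer` is also order-sensitive, but its input comes from a local, deterministically ordered source, while the confluent `dedupe` tolerates reordered input without any coordination.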

arXiv.org e-Print Archive

CiteSeerX

Crossref

OGSA First Impressions: A Case Study Re-engineering a Scientific Application with the Open Grid Services Architecture

Authors: Butchart, B.; Chapman, C.; Emmerich, W.
Publication date: 01/01/2003

We present a case study of our experience re-engineering a scientific application using the Open Grid Services Architecture (OGSA), a new specification for developing Grid applications using web service technologies such as WSDL and SOAP. During the last decade, UCL's Chemistry department has developed a computational approach for predicting the crystal structures of small molecules. However, each search involves running many iterations of computationally expensive calculations and currently takes a few months to perform. Making use of early implementations of the OGSA specification, we have wrapped the Fortran binaries in OGSI-compliant service interfaces to expose the existing scientific application as a set of loosely coupled web services. We show how the OGSA implementation facilitates the distribution of such applications across a large network, radically improving the performance of the system through parallel CPU capacity, coordinated resource management and automation of the computational process. We discuss the difficulties that we encountered turning Fortran executables into OGSA services and delivering a robust, scalable system. One unusual aspect of our approach is the way we transfer input and output data for the Fortran codes. Instead of employing a file transfer service, we transform the XML-encoded data in the SOAP message to native file format, where possible using XSLT stylesheets. We also discuss a computational workflow service that enables users to distribute and manage parts of the computational process across different clusters and administrative domains. We examine how our experience re-engineering the polymorph prediction application led to this approach and to what extent our efforts have succeeded.
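The XML-to-native-format step can be sketched as follows. The paper uses XSLT stylesheets where possible; Python's standard library has no XSLT processor, so this sketch performs an equivalent mapping with `xml.etree.ElementTree` instead. The `<molecule>`/`<atom>` element names and the fixed-column output layout (an I5 atom count, then A2 symbols and F10.4 coordinates in Fortran notation) are invented for illustration and are not the application's real formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical SOAP body payload; element and attribute names are assumptions.
SOAP_BODY = """<molecule name="example">
  <atom symbol="C" x="0.000" y="0.000" z="0.000"/>
  <atom symbol="O" x="1.210" y="0.000" z="0.000"/>
</molecule>"""

def to_native_format(xml_text):
    """Map an XML <molecule> element to a hypothetical fixed-column text file.

    Line 1: atom count (Fortran I5). Following lines: symbol (A2) + x, y, z (3F10.4).
    """
    root = ET.fromstring(xml_text)
    atoms = root.findall("atom")
    lines = [f"{len(atoms):5d}"]
    for atom in atoms:
        x, y, z = (float(atom.get(axis)) for axis in ("x", "y", "z"))
        lines.append(f"{atom.get('symbol'):<2}{x:10.4f}{y:10.4f}{z:10.4f}")
    return "\n".join(lines) + "\n"

print(to_native_format(SOAP_BODY), end="")
```

Keeping the mapping in one place means the service boundary can speak XML while the legacy Fortran binary keeps reading the fixed-column input it was written for.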

CiteSeerX

UCL Discovery

Improving Malware Detection Accuracy by Extracting Icon Information

Authors: Akhavan-Masouleh, Sepehr; Li, Li; Silva, Pedro
Publication date: 10/12/2017

Detecting PE malware files is now commonly approached using statistical and machine learning models. While these models commonly use features extracted from the structure of PE files, we propose that icons from these files can also help better predict malware. We propose an innovative machine learning approach to extract information from icons. Our approach consists of two steps: 1) extracting icon features using summary statistics, histograms of oriented gradients (HOG), and a convolutional autoencoder; 2) clustering icons based on the extracted icon features. Using publicly available data and machine learning experiments, we show that our proposed icon clusters significantly boost the efficacy of malware prediction models. In particular, our experiments show an average accuracy increase of 10% when icon clusters are used in the prediction model. Comment: Full version. IEEE MIPR 201
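The two-step pipeline (feature extraction, then clustering) can be sketched in pure Python under heavy simplifying assumptions: only the summary-statistics features are kept (HOG and the convolutional autoencoder are omitted), and plain k-means stands in for the clustering step because the abstract does not name the clustering algorithm. The toy four-pixel "icons" are hypothetical data.

```python
import statistics

def summary_features(pixels):
    """Summary statistics of a grayscale icon given as a flat list of 0-255 intensities."""
    return [statistics.mean(pixels), statistics.pstdev(pixels), min(pixels), max(pixels)]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Lloyd's k-means; centroids start at the first k points so results are deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy 2x2 "icons" (hypothetical data): two dark, two bright.
icons = [
    [10, 20, 15, 12],
    [240, 250, 245, 235],
    [12, 18, 14, 10],
    [230, 255, 250, 240],
]
clusters = kmeans([summary_features(icon) for icon in icons], k=2)
print(clusters)  # [0, 1, 0, 1]
```

Each icon's cluster ID could then be one-hot encoded and appended to the structural PE features given to the malware classifier; that encoding choice is also an assumption rather than the authors' stated design.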

arXiv.org e-Print Archive

Crossref

Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture

Authors: Dalton, Jeff; Li, Zhenghua; Lin, Jimmy; Mishne, Gilad; Sharma, Aneesh
Publication date: 27/10/2012

We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of "big data". We tell the story of how our system was built twice: our first implementation was built on a typical Hadoop-based analytics stack but was later replaced because it did not meet the latency requirements necessary to generate meaningful real-time results. The second implementation, which is the system deployed in production, is a custom in-memory processing engine specifically designed for the task. This experience taught us that the typical current usage of Hadoop as a "big data" platform, while great for experimentation, is not well suited to low-latency processing, and it points the way to future work on data analytics platforms that can handle "big" as well as "fast" data.
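The abstract does not describe the ranking model, so the following is only an assumption-laden sketch of why an in-memory design helps with the "within minutes" requirement: it keeps exponentially decayed co-occurrence scores for queries issued in the same session, so a pair that spikes after breaking news overtakes older pairs almost immediately. The `DecayedCooccurrence` class, the session-based pairing and the 600-second half-life are all hypothetical.

```python
import math

class DecayedCooccurrence:
    """In-memory related-query scores with exponential time decay (hypothetical design)."""

    def __init__(self, half_life_s=600.0):
        self.rate = math.log(2) / half_life_s  # decay rate per second
        self.scores = {}   # (query, related_query) -> score as of its last update
        self.updated = {}  # (query, related_query) -> time of last update, in seconds

    def _current(self, key, now):
        """Score of `key` decayed from its last update time to `now`."""
        last = self.updated.get(key, now)
        return self.scores.get(key, 0.0) * math.exp(-self.rate * (now - last))

    def observe_session(self, queries, now):
        """Credit every ordered pair of distinct queries issued in one session."""
        unique = set(queries)
        for a in unique:
            for b in unique:
                if a != b:
                    key = (a, b)
                    self.scores[key] = self._current(key, now) + 1.0
                    self.updated[key] = now

    def related(self, query, now, k=3):
        """Top-k related queries for `query`, ranked by decayed score at time `now`."""
        candidates = [(self._current(key, now), key[1]) for key in self.scores if key[0] == query]
        return [q for _, q in sorted(candidates, reverse=True)[:k]]

index = DecayedCooccurrence(half_life_s=600.0)
for _ in range(5):
    index.observe_session(["earthquake", "weather"], now=0)
for _ in range(2):
    index.observe_session(["earthquake", "tsunami"], now=3600)
print(index.related("earthquake", now=3600))  # ['tsunami', 'weather']
```

After six half-lives the five older "weather" observations are worth 5/64 of a fresh one, so two fresh "tsunami" sessions rank first; a batch pipeline that recomputes counts hourly could not react on that timescale.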

arXiv.org e-Print Archive

CiteSeerX

Algorithmic Clustering of Music

Authors: Cilibrasi, Rudi; de Wolf, Ronald; Vitanyi, Paul
Publication date: 01/01/2003

We present a fully automatic method for music classification, based only on compression of strings that represent the music pieces. The method uses no background knowledge about music whatsoever: it is completely general and can, without change, be used in different areas like linguistic classification and genomics. It is based on an ideal theory of the information content in individual objects (Kolmogorov complexity), information distance, and a universal similarity metric. Experiments show that the method distinguishes reasonably well between various musical genres and can even cluster pieces by composer. Comment: 17 pages, 11 figures
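The compression-based similarity used in this line of work is usually written as the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the length of s after compression by a real-world compressor standing in for the uncomputable Kolmogorov complexity. The sketch below assumes Python's `bz2` module as the compressor and uses toy note strings in place of real music encodings; the pairwise distances would then feed a clustering or tree-building step, which is not shown.

```python
import bz2
import random

def compressed_size(data: bytes) -> int:
    """C(s): length in bytes of s after bz2 compression (stand-in for Kolmogorov complexity)."""
    return len(bz2.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small for similar inputs, near 1 for unrelated ones."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy "pieces" as note strings (hypothetical encoding, not real music data).
piece_a = b"C D E F G A B " * 50
piece_b = b"C D E F G A B " * 49 + b"C D E F G A C "
rng = random.Random(0)
piece_c = " ".join(rng.choice("CDEFGAB") for _ in range(350)).encode()

print(round(ncd(piece_a, piece_b), 2), round(ncd(piece_a, piece_c), 2))
```

Because `piece_b` is almost a copy of `piece_a`, compressing them together costs little more than compressing either alone, so their distance comes out much smaller than the distance to the unrelated `piece_c`.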

arXiv.org e-Print Archive

CiteSeerX

A computationally efficient framework for large-scale distributed fingerprint matching

Author: Muhammad Atif
Publication date: 01/01/2017

A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of requirements for the degree of Master of Science, School of Computer Science and Applied Mathematics. May 2017.

Biometric features have been widely used in forensic and civil applications. Among the many different kinds of biometric characteristics, the fingerprint is globally accepted and remains the most widely used in commercial and industrial sectors because it is easy to acquire, unique, stable and reliable. Various effective solutions are currently available; however, fingerprint identification is still not considered a fully solved problem, mainly because of accuracy and computation-time requirements. Although many minutiae-based fingerprint recognition systems provide good accuracy, systems with very large databases require fast, real-time comparison of fingerprints, and they often either fail to meet these speed requirements or compromise accuracy. For fingerprint matching against databases containing millions of fingerprints, real-time identification can only be obtained through optimal algorithms that use the available hardware as robustly and efficiently as possible. There is currently no known distributed database and computing framework that provides a real-time solution to the fingerprint recognition problem for databases containing as many as sixty million fingerprints, a size close to that of the South African population. This research proposal intends to serve two main purposes: 1) to exploit and scale the best known minutiae matching algorithm to a minimum of sixty million fingerprints; and 2) to design a distributed database framework for large fingerprint databases based on the results obtained in the first item.
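The distributed-matching idea can be sketched as scatter-gather over a sharded database: each shard scores the probe against its own fingerprints and returns its best candidate, and a coordinator keeps the overall best. The dissertation abstract does not name its matching algorithm or partitioning scheme, so both are assumptions here: `match_score` is a deliberately naive minutiae matcher that presumes pre-aligned prints, threads stand in for separate machines, and all function names and data are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def match_score(probe, candidate, dist_tol=10.0, angle_tol=math.radians(15)):
    """Fraction of probe minutiae (x, y, theta) with a nearby, similarly oriented candidate minutia.

    Assumes both prints are already aligned; real matchers also estimate rotation and translation.
    """
    if not probe:
        return 0.0
    hits = 0
    for x, y, t in probe:
        for cx, cy, ct in candidate:
            dtheta = abs((t - ct + math.pi) % (2 * math.pi) - math.pi)  # wrapped angle difference
            if math.hypot(x - cx, y - cy) <= dist_tol and dtheta <= angle_tol:
                hits += 1
                break
    return hits / len(probe)

def search_shard(shard, probe):
    """Best (score, fingerprint_id) within one shard; (0.0, None) for an empty shard."""
    return max(((match_score(probe, mins), fid) for fid, mins in shard.items()),
               key=lambda r: r[0], default=(0.0, None))

def distributed_search(shards, probe):
    """Scatter the probe to every shard in parallel, then gather the overall best match."""
    with ThreadPoolExecutor() as pool:
        return max(pool.map(search_shard, shards, [probe] * len(shards)), key=lambda r: r[0])

# Hypothetical database split into two shards of {fingerprint_id: [(x, y, theta), ...]}.
shards = [
    {"id-001": [(10, 10, 0.0), (50, 40, 1.0)], "id-002": [(200, 200, 2.0)]},
    {"id-003": [(12, 9, 0.1), (48, 42, 2.0), (90, 90, 0.5)]},
]
probe = [(11, 10, 0.05), (49, 41, 1.02)]
print(distributed_search(shards, probe))  # (1.0, 'id-001')
```

Only one small (score, id) pair crosses the network per shard, so the coordinator's work stays constant as shards are added; at the sixty-million scale the per-shard matcher, not the merge, dominates the cost.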

Wits Institutional Repository on DSPACE