    Hare: a file system for non-cache-coherent multicores

    Hare is a new file system that provides a POSIX-like interface on multicore processors without cache coherence. Hare allows applications on different cores to share files, directories, and file descriptors. The challenge in designing Hare is to support the shared abstractions faithfully enough to run applications that run on traditional shared-memory operating systems, with few modifications, and to do so while scaling with an increasing number of cores. To achieve this goal, Hare must support features (such as shared file descriptors) that traditional network file systems don't support, as well as implement them in a way that scales (e.g., shard a directory across servers to allow concurrent operations in that directory). Hare achieves this goal through a combination of new protocols (including a 3-phase commit protocol to implement directory operations correctly and scalably) and leveraging properties of non-cache-coherent multiprocessors (e.g., atomic low-latency message delivery and shared DRAM). An evaluation on a 40-core machine demonstrates that Hare can run many challenging Linux applications (including a mail server and a Linux kernel build) with minimal or no modifications. The results also show these applications achieve good scalability on Hare, and that Hare's techniques are important to achieving scalability.
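    To make the sharding idea concrete, the sketch below shows one way directory entries could be distributed across file servers by hashing the parent path and entry name; this is a minimal illustration, not Hare's actual code, and the server count and hash choice are assumptions.

```python
# Illustrative sketch of directory sharding, not Hare's implementation.
# Each directory entry is assigned to one of several file servers by
# hashing (directory path, entry name), so operations on different
# names in the same directory can proceed concurrently on different servers.
import hashlib

NUM_SERVERS = 4  # assumed server count

def server_for_entry(directory: str, name: str) -> int:
    """Pick the server responsible for one entry of a directory."""
    key = f"{directory}/{name}".encode()
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SERVERS

# Two entries of the same directory typically land on different servers,
# so concurrent creates in one directory do not serialize on one server.
print(server_for_entry("/home/alice", "notes.txt"))
print(server_for_entry("/home/alice", "build.log"))
```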

    Security Log Analysis Using Hadoop

    Hadoop is widely used in industry as a general-purpose storage and analysis platform for big data. Commercial Hadoop support is available both from large enterprises such as EMC, IBM, Microsoft, and Oracle, and from Hadoop-focused companies such as Cloudera, Hortonworks, and MapR. Hadoop is a framework written in Java that enables distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop application works in an environment that provides distributed storage and computation across clusters of machines, and the framework is designed to scale up from a single server to thousands of machines, each offering local computation and storage. Security breaches happen frequently nowadays, and many of them can be detected by monitoring server logs. This server-log analysis can be done using Hadoop, which takes the analysis to the next level by enabling security forensics on a low-cost platform.
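    As a concrete illustration, log analysis of this kind maps naturally onto MapReduce. The sketch below, written for Hadoop Streaming, counts failed SSH login attempts per source IP; the syslog-style log format, field positions, and file name are assumptions for illustration, not part of the original work.

```python
#!/usr/bin/env python3
# logcount.py -- minimal Hadoop Streaming job (mapper and reducer in
# one file) that counts failed SSH logins per source IP.
# Assumed log line: "... Failed password for root from 10.0.0.5 port ..."
import sys

def mapper():
    for line in sys.stdin:
        if "Failed password" in line and " from " in line:
            ip = line.split(" from ")[1].split()[0]
            print(f"{ip}\t1")

def reducer():
    # Hadoop sorts mapper output by key, so identical IPs arrive together.
    current_ip, count = None, 0
    for line in sys.stdin:
        ip, _, n = line.rstrip("\n").partition("\t")
        if ip != current_ip:
            if current_ip is not None:
                print(f"{current_ip}\t{count}")
            current_ip, count = ip, 0
        count += int(n or 1)
    if current_ip is not None:
        print(f"{current_ip}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

    With Hadoop Streaming this could be launched along the lines of `hadoop jar hadoop-streaming-*.jar -input /logs -output /out -mapper 'python3 logcount.py map' -reducer 'python3 logcount.py reduce' -file logcount.py`, with paths adjusted to the cluster at hand.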

    Performance and Scalability in Sensor Data Storage

    Modern artificial intelligence and machine learning applications build on analysis and training using large datasets. New research and development does not always start with an existing big dataset, but instead accumulates data over time. The same storage solution does not necessarily cover the required scale over the lifetime of the research, especially when scaling up from common workgroup storage technologies. The storage infrastructure at ZenRobotics has grown using standard workgroup technologies. The current approach is starting to show its limits, while storage growth is predicted to continue and accelerate. Successful capacity planning and expansion require a better understanding of how the storage is used and how it grows. We have examined the current storage architecture and the stored data from different perspectives in order to gain a better understanding of the situation, and by performing a number of experiments we determined key properties of the employed technologies. Together these factors allow us to make informed decisions about future storage solutions. Current usage patterns are in many ways inefficient, and changes are needed to make working with larger volumes of data possible. Some changes would allow the current architecture to scale a bit further, but in order to scale horizontally instead of just vertically, scalability must be designed into the future system architecture from the start.
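    The capacity-planning step can be made concrete with a small calculation: given periodic measurements of used storage, fit an exponential growth curve and project when current capacity will be exhausted. The sample figures below are invented for illustration and do not come from the thesis.

```python
# Toy capacity projection under an assumed exponential growth model.
# Monthly measurements of used storage (TB) -- invented example figures.
import math

usage_tb = [40.0, 44.0, 48.8, 54.1, 60.2]   # one sample per month
capacity_tb = 120.0                          # assumed total capacity

# Least-squares fit of log(usage) = a + r*t gives growth rate r per month.
n = len(usage_tb)
ts = list(range(n))
ys = [math.log(u) for u in usage_tb]
t_mean, y_mean = sum(ts) / n, sum(ys) / n
r_num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
r_den = sum((t - t_mean) ** 2 for t in ts)
r = r_num / r_den
a = y_mean - r * t_mean

# Months until the fitted curve crosses the capacity line.
months_left = (math.log(capacity_tb) - a) / r - (n - 1)
print(f"growth ~{(math.exp(r) - 1) * 100:.1f}%/month, "
      f"capacity reached in ~{months_left:.1f} months")
```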

    Analyzing Metadata Performance in Distributed File Systems

    Distributed file systems are important building blocks in modern computing environments. The challenge of increasing I/O bandwidth to files has been largely resolved by the use of parallel file systems and sufficient hardware. However, determining the best means of managing large amounts of metadata, which describes the files and directories stored in a distributed file system, has proved a more difficult challenge. The objective of this thesis is to analyze the role of metadata and to present past and current implementations and access semantics. Understanding the development of current file system interfaces and functionality is key to understanding their performance limitations. Based on this analysis, a distributed metadata benchmark termed DMetabench is presented. DMetabench significantly improves on existing benchmarks and allows metadata operations in a distributed file system to be stress-tested in a parallel manner. Both intra-node and inter-node parallelism, current trends in computer architecture, can be tested explicitly with DMetabench, because a distributed file system can have different semantics within a client node than between multiple nodes. As measurements in larger distributed environments may exhibit performance artifacts that are difficult to explain from average numbers alone, DMetabench uses a time-logging technique to record time-related changes in the performance of metadata operations and also records additional details of the runtime environment for post-benchmark analysis. Using the large production file systems at the Leibniz Supercomputing Center (LRZ) in Munich, the functionality of DMetabench is evaluated by means of measurements on different distributed file systems. The results not only demonstrate the effectiveness of the proposed methods but also provide unique insight into the current state of metadata performance in modern file systems.
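    To illustrate the style of measurement, the sketch below times individual metadata operations (create, stat, unlink) and logs a timestamped latency record for each, so performance changes over the run remain visible rather than being averaged away. It is a simplified stand-in under assumed POSIX semantics, not DMetabench itself, and the script name and arguments are invented.

```python
#!/usr/bin/env python3
# Simplified metadata micro-benchmark in the spirit of DMetabench:
# time each create/stat/unlink individually and log (timestamp, op,
# latency) records for post-benchmark analysis of behaviour over time.
import csv, os, sys, time

def run(target_dir: str, num_files: int, log_path: str) -> None:
    os.makedirs(target_dir, exist_ok=True)
    with open(log_path, "w", newline="") as f:
        log = csv.writer(f)
        log.writerow(["wallclock_s", "operation", "latency_us"])
        for i in range(num_files):
            path = os.path.join(target_dir, f"dmb_{i:06d}")
            for op, fn in (("create", lambda p: open(p, "w").close()),
                           ("stat",   os.stat),
                           ("unlink", os.unlink)):
                start = time.perf_counter()
                fn(path)
                latency = time.perf_counter() - start
                log.writerow([f"{time.time():.6f}", op, f"{latency * 1e6:.1f}"])

if __name__ == "__main__":
    # e.g. python3 metabench.py /mnt/dfs/bench 1000 ops.csv
    run(sys.argv[1], int(sys.argv[2]), sys.argv[3])
```

    Running one instance per process on a node, and one per node across the cluster, would correspond to the intra-node and inter-node parallelism the thesis distinguishes.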