    Hare: a file system for non-cache-coherent multicores

    Hare is a new file system that provides a POSIX-like interface on multicore processors without cache coherence. Hare allows applications on different cores to share files, directories, and file descriptors. The challenge in designing Hare is to support the shared abstractions faithfully enough to run applications that run on traditional shared-memory operating systems, with few modifications, and to do so while scaling with an increasing number of cores. To achieve this goal, Hare must support features (such as shared file descriptors) that traditional network file systems don't support, as well as implement them in a way that scales (e.g., shard a directory across servers to allow concurrent operations in that directory). Hare achieves this goal through a combination of new protocols (including a 3-phase commit protocol to implement directory operations correctly and scalably) and leveraging properties of non-cache-coherent multiprocessors (e.g., atomic low-latency message delivery and shared DRAM). An evaluation on a 40-core machine demonstrates that Hare can run many challenging Linux applications (including a mail server and a Linux kernel build) with minimal or no modifications. The results also show these applications achieve good scalability on Hare, and that Hare's techniques are important to achieving scalability.
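    To make the sharding idea concrete, the sketch below shows one way directory entries could be distributed across file servers by hashing the parent path and entry name; this is a minimal illustration, not Hare's actual code, and the server count and hash choice are assumptions.

```python
# Illustrative sketch of directory sharding, not Hare's implementation.
# Each directory entry is assigned to one of several file servers by
# hashing (directory path, entry name), so operations on different
# names in the same directory can proceed concurrently on different servers.
import hashlib

NUM_SERVERS = 4  # assumed server count

def server_for_entry(directory: str, name: str) -> int:
    """Pick the server responsible for one entry of a directory."""
    key = f"{directory}/{name}".encode()
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SERVERS

# Two entries of the same directory typically land on different servers,
# so concurrent creates in one directory do not serialize on one server.
print(server_for_entry("/home/alice", "notes.txt"))
print(server_for_entry("/home/alice", "build.log"))
```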

    Security Log Analysis Using Hadoop

    Hadoop is widely used in industry as a general-purpose storage and analysis platform for big data. Commercial Hadoop support is available both from large enterprises such as EMC, IBM, Microsoft, and Oracle, and from Hadoop-focused companies such as Cloudera, Hortonworks, and MapR. Hadoop is a framework written in Java that enables distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop application works in an environment that provides distributed storage and computation across clusters of machines, and the framework is designed to scale up from a single server to thousands of machines, each offering local computation and storage. Security breaches happen frequently nowadays, and many of them can be detected by monitoring server logs. This server-log analysis can be done using Hadoop, which takes the analysis to the next level by enabling security forensics on a low-cost platform.
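    As a concrete illustration, log analysis of this kind maps naturally onto MapReduce. The sketch below, written for Hadoop Streaming, counts failed SSH login attempts per source IP; the syslog-style log format, field positions, and file name are assumptions for illustration, not part of the original work.

```python
#!/usr/bin/env python3
# logcount.py -- minimal Hadoop Streaming job (mapper and reducer in
# one file) that counts failed SSH logins per source IP.
# Assumed log line: "... Failed password for root from 10.0.0.5 port ..."
import sys

def mapper():
    for line in sys.stdin:
        if "Failed password" in line and " from " in line:
            ip = line.split(" from ")[1].split()[0]
            print(f"{ip}\t1")

def reducer():
    # Hadoop sorts mapper output by key, so identical IPs arrive together.
    current_ip, count = None, 0
    for line in sys.stdin:
        ip, _, n = line.rstrip("\n").partition("\t")
        if ip != current_ip:
            if current_ip is not None:
                print(f"{current_ip}\t{count}")
            current_ip, count = ip, 0
        count += int(n or 1)
    if current_ip is not None:
        print(f"{current_ip}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

    With Hadoop Streaming this could be launched along the lines of `hadoop jar hadoop-streaming-*.jar -input /logs -output /out -mapper 'python3 logcount.py map' -reducer 'python3 logcount.py reduce' -file logcount.py`, with paths adjusted to the cluster at hand.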

    Performance and Scalability in Sensor Data Storage

    Modern artificial intelligence and machine learning applications build on analysis and training using large datasets. New research and development does not always start with an existing big dataset, but instead accumulates data over time. The same storage solution does not necessarily cover the required scale over the lifetime of the research, especially when scaling up from common workgroup storage technologies. The storage infrastructure at ZenRobotics has grown using standard workgroup technologies. The current approach is starting to show its limits, while storage growth is predicted to continue and accelerate. Successful capacity planning and expansion require a better understanding of how the storage is used and how it grows. We have examined the current storage architecture and the stored data from different perspectives in order to gain a better understanding of the situation, and by performing a number of experiments we determined key properties of the employed technologies. Together these factors allow us to make informed decisions about future storage solutions. Current usage patterns are in many ways inefficient, and changes are needed to make working with larger volumes of data possible. Some changes would allow the current architecture to scale a bit further, but in order to scale horizontally instead of just vertically, scalability must be designed into the future system architecture from the start.
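    The capacity-planning step can be made concrete with a small calculation: given periodic measurements of used storage, fit an exponential growth curve and project when current capacity will be exhausted. The sample figures below are invented for illustration and do not come from the thesis.

```python
# Toy capacity projection under an assumed exponential growth model.
# Monthly measurements of used storage (TB) -- invented example figures.
import math

usage_tb = [40.0, 44.0, 48.8, 54.1, 60.2]   # one sample per month
capacity_tb = 120.0                          # assumed total capacity

# Least-squares fit of log(usage) = a + r*t gives growth rate r per month.
n = len(usage_tb)
ts = list(range(n))
ys = [math.log(u) for u in usage_tb]
t_mean, y_mean = sum(ts) / n, sum(ys) / n
r_num = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
r_den = sum((t - t_mean) ** 2 for t in ts)
r = r_num / r_den
a = y_mean - r * t_mean

# Months until the fitted curve crosses the capacity line.
months_left = (math.log(capacity_tb) - a) / r - (n - 1)
print(f"growth ~{(math.exp(r) - 1) * 100:.1f}%/month, "
      f"capacity reached in ~{months_left:.1f} months")
```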

    Analyzing Metadata Performance in Distributed File Systems

    Distributed file systems are important building blocks in modern computing environments. The challenge of increasing I/O bandwidth to files has been largely resolved by the use of parallel file systems and sufficient hardware. However, determining the best means of managing large amounts of metadata, which describes the files and directories stored in a distributed file system, has proved a more difficult challenge. The objective of this thesis is to analyze the role of metadata and to present past and current implementations and access semantics. Understanding the development of current file system interfaces and functionality is key to understanding their performance limitations. Based on this analysis, a distributed metadata benchmark termed DMetabench is presented. DMetabench significantly improves on existing benchmarks and allows metadata operations in a distributed file system to be stress-tested in a parallel manner. Both intra-node and inter-node parallelism, current trends in computer architecture, can be tested explicitly with DMetabench, because a distributed file system can have different semantics within a client node than between multiple nodes. As measurements in larger distributed environments may exhibit performance artifacts that are difficult to explain from average numbers alone, DMetabench uses a time-logging technique to record time-related changes in the performance of metadata operations and also records additional details of the runtime environment for post-benchmark analysis. Using the large production file systems at the Leibniz Supercomputing Center (LRZ) in Munich, the functionality of DMetabench is evaluated by means of measurements on different distributed file systems. The results not only demonstrate the effectiveness of the proposed methods but also provide unique insight into the current state of metadata performance in modern file systems.
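    To illustrate the style of measurement, the sketch below times individual metadata operations (create, stat, unlink) and logs a timestamped latency record for each, so performance changes over the run remain visible rather than being averaged away. It is a simplified stand-in under assumed POSIX semantics, not DMetabench itself, and the script name and arguments are invented.

```python
#!/usr/bin/env python3
# Simplified metadata micro-benchmark in the spirit of DMetabench:
# time each create/stat/unlink individually and log (timestamp, op,
# latency) records for post-benchmark analysis of behaviour over time.
import csv, os, sys, time

def run(target_dir: str, num_files: int, log_path: str) -> None:
    os.makedirs(target_dir, exist_ok=True)
    with open(log_path, "w", newline="") as f:
        log = csv.writer(f)
        log.writerow(["wallclock_s", "operation", "latency_us"])
        for i in range(num_files):
            path = os.path.join(target_dir, f"dmb_{i:06d}")
            for op, fn in (("create", lambda p: open(p, "w").close()),
                           ("stat",   os.stat),
                           ("unlink", os.unlink)):
                start = time.perf_counter()
                fn(path)
                latency = time.perf_counter() - start
                log.writerow([f"{time.time():.6f}", op, f"{latency * 1e6:.1f}"])

if __name__ == "__main__":
    # e.g. python3 metabench.py /mnt/dfs/bench 1000 ops.csv
    run(sys.argv[1], int(sys.argv[2]), sys.argv[3])
```

    Running one instance per process on a node, and one per node across the cluster, would correspond to the intra-node and inter-node parallelism the thesis distinguishes.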