312 research outputs found

    What broke where for distributed and parallel applications — a whodunit story

    Detecting, diagnosing, and mitigating performance problems in today's large-scale distributed and parallel systems is a difficult task. These systems are composed of various complex software and hardware components. When a system experiences a performance or correctness problem, developers struggle to understand its root cause and fix it in a timely manner. In my thesis, I address these three aspects (detection, diagnosis, and mitigation) of performance problems in computer systems. First, we focus on diagnosing performance problems in large-scale parallel applications running on supercomputers. We developed techniques to localize the performance problem for root-cause analysis. Parallel applications, most of which are complex scientific simulations running on supercomputers, can create up to millions of parallel tasks that run on different machines and communicate using the message-passing paradigm. We developed a highly scalable and accurate automated debugging tool called PRODOMETER, which works in three steps. First, it creates a logical progress dependency graph of the tasks to highlight how the problem spread through the system and manifested as a system-wide performance issue. Second, it uses this progress dependency graph to identify the task where the problem originated. Finally, it pinpoints the code region corresponding to the origin of the bug. Second, we developed a tool-chain that detects performance anomalies using machine-learning techniques and achieves a very low false-positive rate. Our input-aware performance anomaly detection system consists of a scalable data collection framework that collects performance-related metrics from code regions of different granularity, an offline model creation and prediction-error characterization technique, and a threshold-based anomaly detection engine for production runs. 
Our system requires few training runs and can handle unknown inputs and parameter combinations by dynamically calibrating the anomaly detection threshold according to the characteristics of the input data and of the models' prediction error. Third, we developed a performance problem mitigation scheme for erasure-coded distributed storage systems. Repairing failed blocks in an erasure-coded distributed storage system takes a long time in network-constrained data centers. The reason is that, during the repair operation, a large amount of data from multiple nodes is gathered at a single node, where a mathematical operation is performed to reconstruct the missing part. This process severely congests the links toward the destination where the newly recreated data is to be hosted. We proposed a novel distributed repair technique, called Partial-Parallel-Repair (PPR), that performs this reconstruction in parallel on multiple nodes and eliminates network bottlenecks, and as a result greatly speeds up the repair process. Fourth, we study how, for a class of applications, performance can be improved (or performance problems can be mitigated) by selectively approximating some of the computations. For many applications, the main computation happens inside a loop that can be logically divided into a few temporal segments, which we call phases. We found that while approximating the initial phases might severely degrade the quality of the results, approximating the computation in the later phases has a very small impact on the final quality of the result. Based on this observation, we developed an optimization framework that, for a given budget of quality loss, finds the best approximation settings for each phase in the execution
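
The contrast between gathering all data at one node and reconstructing in parallel can be illustrated with a small sketch. It assumes a single XOR parity block rather than the erasure code used in the thesis, and the function names (`centralized_repair`, `tree_repair`) are hypothetical. It only shows why combining partial results pairwise spreads the traffic across nodes; it is not the PPR protocol itself.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def centralized_repair(surviving):
    """Baseline: every surviving block is shipped to one node, which XORs them.

    Returns the rebuilt block and the number of blocks that arrive at the
    single destination (the congested link).
    """
    return reduce(xor_blocks, surviving), len(surviving)


def tree_repair(surviving):
    """Tree-style repair: nodes combine partial results pairwise, round by round.

    Returns the rebuilt block and the number of rounds. In each round a node
    receives at most one block, so no single link carries all the traffic.
    """
    partials = list(surviving)
    rounds = 0
    while len(partials) > 1:
        merged = [xor_blocks(partials[i], partials[i + 1])
                  for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:  # an unpaired partial waits for the next round
            merged.append(partials[-1])
        partials = merged
        rounds += 1
    return partials[0], rounds


# Four data blocks plus one XOR parity block (assumed code, for illustration).
blocks = [bytes([i] * 8) for i in (1, 2, 3, 4)]
parity = reduce(xor_blocks, blocks)

# Block 2 is lost; the other data blocks and the parity survive.
surviving = [blocks[0], blocks[1], blocks[3], parity]
rebuilt_central, inbound = centralized_repair(surviving)
rebuilt_tree, rounds = tree_repair(surviving)
```

Both functions rebuild the same block. In this example the baseline sends four blocks into one node, while the tree variant finishes in two rounds with at most one inbound block per node per round.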

    Distributed Storage for Proximity Based Services

    Over the last couple of years, the amount of mobile data traffic has increased drastically, and so has the storage capacity of mobile devices. 
The main focus of this thesis is designing a distributed storage system that takes advantage of the available storage capacity of mobile terminals in order to decrease the expected power consumption of wireless transmission systems. In this thesis, it is assumed that the storage capacity of mobile devices can be used to store data files or fractions of data files. Furthermore, it is assumed that any user can download data from other users and transmitting a bit from one user to another is less expensive (consumes less energy) than transmitting a bit from a base station to a user. This is a realistic assumption if the base station is far away from the users whilst the users are close to each other. Distributed storage is a means of storing data on several (preferably independent) storage devices. Regenerating codes are erasure codes that are specifically designed for distributed storage. In this work, we investigate if and when regenerating codes should be applied to a system where data can be stored on mobile terminals. For a default system setup, the energy consumption of a system that does not take advantage of the available storage capacity of the user terminals was compared with the energy consumption of systems that apply distributed storage techniques: a method with uncoded distributed storage offered a 15% saving, while a method with traditional erasure coding (parity coding) yielded a 24% saving. Ultimately, our distributed storage method with regenerating codes consumed 26% less energy and was, thus, the most energy efficient solution

    SDSF : social-networking trust based distributed data storage and co-operative information fusion.

    As of 2014, about 2.5 quintillion bytes of data are created each day, and 90% of the data in the world was created in the last two years alone. The storage of this data can be on external hard drives, on unused space in peer-to-peer (P2P) networks or using the more currently popular approach of storing in the Cloud. When the users store their data in the Cloud, the entire data is exposed to the administrators of the services who can view and possibly misuse the data. With the growing popularity and usage of Cloud storage services like Google Drive, Dropbox etc., the concerns of privacy and security are increasing. Searching for content or documents, from this distributed stored data, given the rate of data generation, is a big challenge. Information fusion is used to extract information based on the query of the user, and combine the data and learn useful information. This problem is challenging if the data sources are distributed and heterogeneous in nature where the trustworthiness of the documents may be varied. This thesis proposes two innovative solutions to resolve both of these problems. Firstly, to remedy the situation of security and privacy of stored data, we propose an innovative Social-based Distributed Data Storage and Trust based co-operative Information Fusion Framework (SDSF). The main objective is to create a framework that assists in providing a secure storage system while not overloading a single system using a P2P like approach. This framework allows the users to share storage resources among friends and acquaintances without compromising the security or privacy and enjoying all the benefits that the Cloud storage offers. The system fragments the data and encodes it to securely store it on the unused storage capacity of the data owner's friends' resources. The system thus gives a centralized control to the user over the selection of peers to store the data. 
Secondly, to retrieve the stored distributed data, the proposed system also performs the fusion from distributed sources. The technique uses several algorithms to ensure the correctness of the query that is used to retrieve and combine the data, improving the accuracy and efficiency of information fusion over heterogeneous, distributed and massive data on the Cloud for time-critical operations. We demonstrate that the retrieved documents are genuine when the trust scores are also used while retrieving the data sources. The thesis makes several research contributions. First, we implement Social Storage using erasure coding. Erasure coding fragments the data, encodes it, and, by introducing redundancy, resolves issues resulting from device failures. Second, we exploit the inherent concept of trust that is embedded in social networks to determine the nodes where the fragmented data should be stored and to build a secure network, since the social network consists of friends, family and acquaintances. The trust between the friends and the availability of the devices allow the user to make an informed choice about where the information should be stored using 'k' optimal paths. Thirdly, to retrieve this distributed stored data, we propose information fusion on distributed data using a combination of Enhanced N-grams (to ensure correctness of the query), Semantic Machine Learning (to extract the documents based on the context, not just a bag of words, while also considering the trust score) and Map Reduce (NSM) Algorithms. Lastly, we evaluate the performance of distributed storage in SDSF using erasure coding, identify the social storage providers based on trust, and evaluate their trustworthiness. We also evaluate the performance of our information fusion algorithms in distributed storage systems. 
Thus, a system using the SDSF framework implements the beneficial features of P2P networks and Cloud storage while avoiding the pitfalls of these systems. The multi-layered encryption ensures that all other users, including the system administrators, cannot decode the stored data. The application of the NSM algorithm improves the effectiveness of fusion since a large number of genuine documents are retrieved for fusion
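
The fragment-encode-store idea described above can be sketched with the simplest possible erasure code: `k` data fragments plus one XOR parity fragment, which tolerates the loss of any single fragment. This is an illustrative assumption, not the code used by SDSF, and the names `encode` and `decode` are hypothetical. Encryption and trust-based peer selection are left out.

```python
def _xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int):
    """Split data into k equal fragments (zero-padded) plus one parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(size)
    for fragment in fragments:
        parity = _xor(parity, fragment)
    return fragments + [parity]


def decode(fragments, length: int) -> bytes:
    """Rebuild the data when at most one fragment (marked None) is missing.

    `length` is the original data length, needed to strip the padding.
    """
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    fragments = list(fragments)
    if missing:
        size = len(next(f for f in fragments if f is not None))
        rebuilt = bytes(size)
        for fragment in fragments:
            if fragment is not None:
                rebuilt = _xor(rebuilt, fragment)
        fragments[missing[0]] = rebuilt
    return b"".join(fragments[:-1])[:length]


# Each fragment would be stored on a different friend's device.
data = b"social storage demo"
stored = encode(data, k=4)

# One friend's device is offline: its fragment is unavailable.
stored[1] = None
recovered = decode(stored, len(data))
```

A production system would use a code that tolerates several simultaneous losses, since friends' devices are far less available than data-center nodes.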

    Improving Data Availability in Decentralized Storage Systems

    PhD thesis in Information technology. Preserving knowledge for future generations has been a primary concern for humanity since the dawn of civilization. State-of-the-art methods have included stone carvings, papyrus scrolls, and paper books. With each advance in technology, it has become easier to record knowledge. In the current digital age, humanity may preserve enormous amounts of knowledge on hard drives with the click of a button. The aggregation of several hard drives into a computer forms the basis for a storage system. Traditionally, large storage systems have comprised many distinct computers operated by a single administrative entity. With the rise in popularity of blockchain and cryptocurrencies, a new type of storage system has emerged. This new type of storage system is fully decentralized and comprises a network of untrusted peers cooperating to act as a single storage system. During upload, files are split into chunks and distributed across a network of peers. These storage systems encode files using Merkle trees, a hierarchical data structure that provides integrity verification and lookup services. While decentralized storage systems are popular and have a user base in the millions, many technical aspects are still in their infancy. As such, they have yet to prove themselves viable alternatives to traditional centralized storage systems. In this thesis, we contribute to the technical aspects of decentralized storage systems by proposing novel techniques and protocols. We make significant contributions with the design of three practical protocols that each improve data availability in different ways. Our first contribution is Snarl and entangled Merkle trees. Entangled Merkle trees are resilient data structures that decrease the impact hierarchical dependencies have on data availability. Whenever a chunk loss is detected, Snarl uses the entangled Merkle trees to find parity chunks to repair the lost chunk. 
Our results show that by encoding data as an entangled Merkle tree and using Snarl’s repair algorithm, the storage utilization in current systems could be improved by over five times, with improved data availability. Second, we propose SNIPS, a protocol that efficiently synchronizes the data stored on peers to ensure that all peers have the same data. We designed a Proof of Storage-like construction using a Minimal Perfect Hash Function. Each peer uses the PoS-like construction to create a storage proof for those chunks it wants to synchronize. Peers exchange storage proofs and use them to efficiently determine which chunks they are missing. The evaluation shows that by using SNIPS, the amount of synchronization data can be reduced by three orders of magnitude in current systems. Lastly, in our third contribution, we propose SUP, a protocol that uses cryptographic proofs to check if a chunk is already stored in the network before doing wasteful uploads. We show that SUP may reduce the amount of data transferred by up to 94 % in current systems. The protocols may be deployed independently or in combination to create a decentralized storage system that is more robust to major outages. Each of the protocols has been implemented and evaluated on a large cluster of 1,000 peers
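
As background for the Merkle-tree encoding mentioned above, a plain (non-entangled) Merkle root can be computed as in the sketch below. It assumes SHA-256 and the common convention of duplicating the last hash on odd-sized levels. Neither choice is taken from the thesis, and the sketch does not implement Snarl's entangled trees.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    """SHA-256 digest, used for both leaves and inner nodes in this sketch."""
    return hashlib.sha256(data).digest()


def merkle_root(chunks):
    """Return the Merkle root of a non-empty list of file chunks."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    level = [_hash(chunk) for chunk in chunks]
    while len(level) > 1:
        if len(level) % 2:  # assumed convention: duplicate the last hash
            level.append(level[-1])
        level = [_hash(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# A file split into chunks, as a peer would do during upload.
chunks = [b"chunk-0", b"chunk-1", b"chunk-2"]
root = merkle_root(chunks)

# Any modified chunk changes the root, which is what enables integrity checks.
tampered_root = merkle_root([b"chunk-0", b"chunk-X", b"chunk-2"])
```

Because every inner node depends on its children, losing one chunk or inner node in a plain tree can make a whole subtree unverifiable, which is the hierarchical dependency the entangled variant is designed to soften.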

    Analysis of material efficiency aspects of personal computers product group

    This report has been developed within the project ‘Technical support for environmental footprinting, material efficiency in product policy and the European Platform on Life Cycle Assessment’ (LCA) (2013-2017) funded by the Directorate-General for Environment. The report summarises the findings of the analysis of material-efficiency aspects of the personal-computer (PC) product group, namely durability, reusability, reparability and recyclability. It also aims to identify material-efficiency aspects which can be relevant for the current revision of the Ecodesign Regulation (EU) No 617/2013. Special focus was given to the content of EU critical raw materials (CRMs) in computers and computer components, and how to increase the efficient use of these materials, including material savings thanks to reuse and repair and recovery of the products at end of life. The analysis has been based mainly on the REAPro method developed by the Joint Research Centre for the material-efficiency assessment of products. This work has been carried out in the period June 2016-September 2017, in parallel with the development of The preparatory study on the review of Regulation 617/2013 (Lot 3) - computers and computer servers led by Viegand Maagøe and Vlaamse Instelling voor Technologisch Onderzoek NV (VITO) (2017). During this period, close communication was maintained with the authors of the preparatory study. This ensured consistency between the input data and assumptions of the two studies. Moreover, outcomes of the present research were used as the scientific basis for the preparatory study's analysis of material-efficiency aspects for computers. The research has been differentiated as far as possible for different types of computers (i.e. tablets, notebooks and desktop computers). 
The report starts with the analysis of the technical and scientific background relevant for material-efficiency aspects of computers, such as market sales, expected lifetime, bill of materials, and a focus on the content of CRMs (especially cobalt in batteries, rare earths including neodymium in hard disk drives and palladium in printed circuit boards). The report then analyses the current practices for repair, reuse and recycling of computers. Based on results available from the literature, the material efficiency of the product group has the potential to be improved, in particular through lifetime extension. The residence time of IT equipment put on the market in 2000 versus 2010 generally declined by approximately 10 % (Huisman et al., 2012), while consumers expressed their preference for durable goods, lasting considerably longer than they are typically used (Wieser and Tröger, 2016). Design barriers (such as difficulties for the disassembly of certain components or for their processing for data sanitisation) can hinder the repair and the reuse of products. Malfunction and accident rates are not negligible (IDC, 2016, 2010; SquareTrade, 2009) and difficulties in repair may lead to damaged products being discarded even if still functioning. Once a computer reaches the end of its useful life, it is sent to ‘waste of electrical and electronic equipment’ (WEEE) recycling plants. Recycling of computers is usually based on a combination of manual dismantling of certain components (mainly components containing hazardous substances or valuable materials, e.g. batteries, printed circuit boards, display panels, data-storage components), followed by mechanical processing including shredding. The recycling of traditional desktop computers is perceived as non-problematic by recyclers, with the exception of some miniaturised new models (i.e. 
mini desktop computers), which are still not found in recycling plants and which could present some difficulties for the extraction of printed circuit boards and batteries (if present). The design of notebooks and tablets can cause some difficulties for the dismantling of batteries, especially for computers with compact design. Recycling of plastics from computers of all types is generally challenging due to the large use of different plastics with additives, such as flame retardants. According to all the interviewed recyclers, recycling of WEEE plastics with flame retardant is very poor or null with current technologies. Building on this analysis, the report then focuses on possible actions to improve material efficiency in computers, namely measures to improve (a) waste prevention, (b) repair and reuse and (c) design for recycling. The possible actions identified are listed hereinafter. (a) Waste prevention a.1 Implementation of dedicated functionality for the optimisation of the lifetime of batteries in notebooks: the lifetime of batteries could be extended by systematically implementing a preinstalled functionality on notebooks, which makes it possible to optimise the state of charge (SoC) of the battery when the device is used in grid operation (stationary). By preventing the battery from remaining at full charge when the notebook is in grid operation, the lifetime of batteries can potentially be extended by up to 50 %. Users could be informed about the existence and characteristics of such a functionality and the potential benefits related to its use. a.2 Decoupling external power supplies (EPS) from personal computers: the provision of information on the EPS specifications and the presence/absence of the EPS in the packaging of notebooks and tablets could facilitate the reuse by the consumer of already-available EPS with suitable characteristics. 
Such a measure could promote the use of common EPS across different devices, as well as the reuse of already-owned EPS. This would result in a reduction in material consumption for the production of unnecessary power supplies (and related packaging and transport) and overall a reduction of treatment of electronic waste. The International Electrotechnical Commission (IEC) technical specification (TS) 62700, the Standard Institute of Electrical and Electronics Engineers (IEEE) 1823 and Recommendation ITU-T L.1002 can be used to develop standards for the correct definition of connectors and power specifications. a.3 Provision of information about the durability of batteries: the analysis identified the existence of endurance tests suitable for the assessment of the durability of batteries in computers according to existing standards (e.g. EN 61960). The availability of information about these endurance tests could help users to get an indication on the residual capacity of the battery after a predefined number of charge/discharge cycles. Moreover, such information would allow for comparison between different products and potentially push the market towards longer-lasting batteries. a.4 Provision of information about the ‘liquid ingress protection (IP) class’ for personal computers: this can be assessed for a notebook or tablet by performing specific tests, developed according to existing standards (e.g. IEC 60529). Users can be informed about the level of protection of the computer against the ingress of liquids (e.g. dripping water or spraying water or water jets) and in this way prevent one of the most common causes of computer failure. The yearly rate of estimated material saving if dedicated functionality for the optimisation of the lifetime of batteries (a.1) were used ranges from around 2 360 to 5 400 tonnes (t) of different materials per year. About 450 t of cobalt, 100 t of lithium, 210 t of nickel and 730 t of copper could be saved every year. 
The estimated potential savings of materials when EPS are decoupled from notebooks and tablets (a.2) are in the range 2 300-4 600 t/year (80 % related to the notebook category, and 20 % to tablets). These values can be obtained when 10-20 % of notebooks and tablets are sold without an EPS, as users can reuse already-owned and compatible EPS. Under these conditions, for example, about 190-370 t of copper can be saved every year. This estimate may increase when the same EPS can be used for both notebooks and tablets (at the moment the assessment is based on the assumption that the two product types were kept separated). Further work is needed to assess the potential improvements thanks to the provision of information about the durability of batteries (a.3), and about the ‘liquid-IP class’ (a.4). The former option (a.3) has the potential to boost competition among battery manufacturers, resulting in more durable products. The latter option (a.4) has the potential to reduce computer damage due to liquid spillage, ranked among the most recurrent failure modes. (b) Repair/reuse b.1 and b.2 Provision of information to facilitate computer disassembly: the disassembly of relevant components (such as the display panel, keyboard, data storage, batteries, memory and internal power-supply units) plays a key role to enhance repair and reuse of personal computers. Some actions have therefore been discussed (b.1) to provide professional repair operators with documentation about the sequence of disassembly, extraction, replacement and reassembly operations needed for each relevant component of personal computers, and (b.2) to provide end-users with specific information about the disassembly and replacement of batteries in notebooks and tablets. 
b.3 Secure data deletion for personal computers: this is the process of deliberately, permanently and irreversibly erasing all traces of existing data from storage media, overwriting the data completely in such a way that access to the original data, or parts of them, becomes infeasible for a given level of effort. Secure data deletion is essential for the security of personal data and to allow the reuse of computers by a different user. Secure data deletion for personal computers can be ensured by means of built-in functionality. A number of existing national standards (HMG IS Standard No 5 (the United Kingdom), DIN 66399 (Germany), NIST 800-88r1 (the United States (US))) can be used as a basis to start standardisation activities on secure data deletion. The estimated potential savings of materials due to the provision of information and tools to facilitate computer disassembly were quantified in the range of 150-620 t/year for mobile computers (notebooks and tablets) within the first 2 years of use, and in the range of 610-2 460 t/year for mobile computers older than 2 years. Secure data deletion of personal computers, instead, is considered a necessary prerequisite to enhance reuse. The need to take action on this is related to policies on privacy and protection of personal data, such as the General Data Protection Regulation (EU) 2016/679 and in particular its Article 25 on ‘data protection by design and by default’. Future work is needed to strengthen the analysis; however, it was estimated that secure data deletion has the potential to double the volume of desktop, notebook and tablet computers reused after the first useful lifetime. (c) Recyclability c.1 Provision of information to facilitate computer dismantling: computers could be designed so that crucial components for material aspects (e.g. content of hazardous substances and/or valuable materials) can be easily identified and extracted in order to be processed by means of specific recycling treatments. 
Design for dismantling can focus on components listed in Annex VII of the WEEE directive. The ‘ease of dismantling’ can be supported by the provision of relevant information (such as a diagram of the product showing the location of the components, the content of hazardous substances, instructions on the sequence of operations needed to remove these components, including type and number of fastening techniques to be unlocked, and tool(s) required). c.2 Marking of plastic components: although all plastics are theoretically recyclable, in practice the recyclability of plastics in computers is generally low, mainly due to the large amount of different plastic components with flame retardants (FRs) and other additives. Marking of plastic components according to existing standards (e.g. ISO 11469 and ISO 1043 series) can facilitate identification and sorting of plastic components during the manual dismantling steps of the recycling. c.3 FR content: according to all the recyclers interviewed, FRs are a major barrier to plastics recycling. Current mechanical-sorting processes of shredded plastics are characterised by low efficiency, while innovative sorting systems are still at the pilot stage and have been shown to be effective only in certain cases. Therefore, the provision of information on the content of FRs in plastic components is a first step to contribute to the improvement of plastics recycling. Plastics marking (as discussed above) can contribute to the separation of plastics with FRs during the manual dismantling, allowing for their recycling at higher rates (in line with the prescription of IEC/TR 62635, 2015). However, detailed information about FR content could be given in a more systematised way, for example through the development of specific indexes. These indexes could support recyclers in checking the use of FRs in computers and in developing future processes and technologies suitable for plastics recycling. 
Moreover, these indexes could support policymakers in monitoring the use of FRs in the products and, in the medium to long term, in promoting products that use smaller quantities of FRs. An example of a FR content index is provided in this report. c.4 Battery marks: the identification of the chemistry type of batteries in computers is necessary in order to have efficient identification and sorting, and thus to improve the material efficiency during the recycling. It is proposed to start standardisation activities to establish standard marking symbols for batteries. The examples of the ‘battery-recycle mark’, developed by the Battery Association of Japan (BAJ), and the current standardisation activities for the IEC 62902 (standard marking symbols for batteries with a volume higher than 900 cm3) may be used as references to develop ad hoc standards. The benefits of actions for the design for recycling can be relevant. In particular, the proposed actions should help increase the amounts of materials that will be recycled (6 350-8 900 t/year), in particular plastics (5 950-7 960 t/year of additional plastics), but also metals such as cobalt (55-110 t), copper (240-610 t), rare earths such as neodymium and dysprosium (2-7 t) and various precious metals (gold (0.1-0.4 t), palladium (0.1-0.4 t) and silver (2-7 t)). Compared to the amount of materials recycled in the EU (2012 data), these values would represent a recycling increase of 1-2 % for cobalt, 2-5 % for palladium, and 13-50 % for rare earths. JRC.D.3-Land Resource

    Substituting Failure Avoidance for Redundancy in Storage Fault Tolerance

    The primary mechanism for overcoming faults in modern storage systems is to introduce redundancy in the form of replication and error correcting codes. The costs of such redundancy in hardware, system availability and overall complexity can be substantial, depending on the number and pattern of faults that are handled. This dissertation describes and analyzes, via simulation, a system that seeks to use disk failure avoidance to reduce the need for costly redundancy by using adaptive heuristics that anticipate such failures. While a number of predictive factors can be used, this research focuses on the three leading candidates of SMART errors, age and vintage. This approach can predict where near term disk failures are more likely to occur, enabling proactive movement/replication of at-risk data, thus maintaining data integrity and availability. This strategy can reduce costs due to redundant storage without compromising these important requirements
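
A failure-avoidance heuristic of the kind described above might look like the following sketch. Everything here is hypothetical: the `Disk` fields, the weights, the five-year age cap, and the threshold are illustrative assumptions, not values from the dissertation. It only shows how the three predictive factors (SMART errors, age, vintage) could be combined to pick disks whose data should be moved or replicated proactively.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    smart_errors: int   # count of reported SMART errors
    age_years: float    # time in service
    bad_vintage: bool   # belongs to a production batch with known problems


def risk_score(disk: Disk, w_smart=0.5, w_age=0.2, w_vintage=0.3) -> float:
    """Combine the three factors into a score in [0, 1] (assumed weights)."""
    smart = 1.0 if disk.smart_errors > 0 else 0.0
    age = min(disk.age_years / 5.0, 1.0)  # assumed: risk saturates at 5 years
    vintage = 1.0 if disk.bad_vintage else 0.0
    return w_smart * smart + w_age * age + w_vintage * vintage


def at_risk(disks, threshold: float):
    """Names of disks at or above the threshold, highest risk first."""
    scored = [(risk_score(d), d.name) for d in disks]
    flagged = [(s, n) for s, n in scored if s >= threshold]
    return [n for s, n in sorted(flagged, reverse=True)]


disks = [
    Disk("a", smart_errors=3, age_years=4.0, bad_vintage=True),
    Disk("b", smart_errors=0, age_years=1.0, bad_vintage=False),
    Disk("c", smart_errors=0, age_years=5.0, bad_vintage=True),
]

# Data on the flagged disks would be migrated or replicated first.
to_evacuate = at_risk(disks, threshold=0.4)
```

An adaptive version would tune the weights and threshold from observed failures instead of fixing them up front.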