
    Optimization and Management of Large-scale Scientific Workflows in Heterogeneous Network Environments: From Theory to Practice

    Next-generation computation-intensive scientific applications feature large-scale computing workflows of various structures, ranging from simple linear pipelines to complex Directed Acyclic Graphs (DAGs). Supporting such computing workflows and optimizing their end-to-end network performance are crucial to the success of scientific collaborations that require fast system response, smooth data flow, and reliable distributed operation.

    We construct analytical cost models and formulate a class of workflow mapping problems with different mapping objectives and network constraints. The difficulty of these mapping problems arises essentially from the topological matching nature of the spatial domain, further compounded by resource-sharing complexity in the temporal dimension. We provide detailed computational complexity analysis and design optimal or heuristic algorithms with rigorous correctness proofs or performance analysis. We decentralize the proposed mapping algorithms and also investigate these optimization problems in unreliable network environments for fault tolerance.

    To examine and evaluate the performance of the workflow mapping algorithms before actual deployment, we implement a simulation program that models the execution dynamics of distributed computing workflows. We also develop a scientific workflow automation and management platform based on an existing workflow engine for experiments in real environments. The performance superiority of the proposed mapping solutions is illustrated by extensive simulation-based comparisons with existing algorithms and further verified by large-scale experiments on real-life scientific workflow applications through system implementation and deployment in real networks.
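    Since the abstract only summarizes the cost model, the following is a minimal Python sketch of one way to evaluate end-to-end delay for a candidate task-to-node mapping of a small DAG workflow. The task names, units, network dictionaries, and the brute-force search are illustrative assumptions, not the dissertation's actual formulation (which designs optimal or heuristic algorithms rather than enumeration).

```python
from functools import lru_cache
from itertools import product

# Workflow DAG: task -> list of (successor, data volume in MB); all values invented.
workflow = {
    "load":   [("filter", 200.0), ("index", 50.0)],
    "filter": [("merge", 120.0)],
    "index":  [("merge", 30.0)],
    "merge":  [],
}
complexity = {"load": 4.0, "filter": 9.0, "index": 2.0, "merge": 6.0}  # GFLOP

# Network: node -> speed (GFLOP/s); (src, dst) -> link bandwidth (MB/s).
speed = {"n1": 2.0, "n2": 4.0}
bandwidth = {("n1", "n2"): 100.0, ("n2", "n1"): 100.0}

def end_to_end_delay(mapping):
    """Length of the critical path under a given task -> node mapping."""
    @lru_cache(maxsize=None)
    def finish(task):
        compute = complexity[task] / speed[mapping[task]]
        preds = [(t, v) for t, succs in workflow.items()
                 for s, v in succs if s == task]
        start = 0.0
        for pred, volume in preds:
            # Transfer cost applies only when the edge crosses nodes.
            xfer = (0.0 if mapping[pred] == mapping[task]
                    else volume / bandwidth[(mapping[pred], mapping[task])])
            start = max(start, finish(pred) + xfer)
        return start + compute
    sinks = [t for t, succs in workflow.items() if not succs]
    return max(finish(t) for t in sinks)

# Exhaustive search is exponential in the number of tasks; this is exactly
# the blow-up the dissertation's heuristics are meant to avoid.
tasks = list(workflow)
best = min((dict(zip(tasks, assign)) for assign in
            product(speed, repeat=len(tasks))), key=end_to_end_delay)
print(best, end_to_end_delay(best))
```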

    PA-iMFL: Communication-Efficient Privacy Amplification Method against Data Reconstruction Attack in Improved Multi-Layer Federated Learning

    Recently, big data has seen explosive growth in the Internet of Things (IoT). Multi-Layer Federated Learning (MFL) based on a cloud-edge-end architecture can improve model training efficiency and model accuracy while preserving IoT data privacy. This paper considers an improved MFL (iMFL), in which edge-layer devices own private data and can join the training process. iMFL can improve edge resource utilization and relax the strict requirements on end devices, but it suffers from Data Reconstruction Attacks (DRAs) and unacceptable communication overhead. This paper aims to address these issues with iMFL. We propose a Privacy Amplification scheme on iMFL (PA-iMFL). Differing from standard MFL, we design privacy operations applied in end and edge devices after local training, comprising three sequential components: local differential privacy with the Laplace mechanism, privacy amplification by subsampling, and gradient sign reset. Benefiting from these privacy operations, PA-iMFL reduces communication overhead and achieves privacy preservation. Extensive results demonstrate that, against state-of-the-art (SOTA) DRAs, PA-iMFL effectively mitigates private data leakage and reaches the same level of protection as the SOTA defense model. Moreover, by adopting privacy operations in edge devices, PA-iMFL improves communication efficiency by up to 2.8 times over the SOTA compression method without compromising model accuracy. Comment: 12 pages, 11 figures
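    The abstract lists three sequential privacy operations; below is a hedged Python sketch of how such a pipeline could look. All parameter values (clipping bound, epsilon, subsampling rate) are assumptions, and "gradient sign reset" is interpreted here as sign-only transmission, which is one plausible reading rather than the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_ldp(grad, sensitivity=1.0, epsilon=1.0):
    """Local differential privacy: clip coordinates, then add Laplace noise."""
    clipped = np.clip(grad, -sensitivity, sensitivity)
    return clipped + rng.laplace(scale=sensitivity / epsilon, size=grad.shape)

def subsample(grad, rate=0.1):
    """Privacy amplification by subsampling: keep a random fraction of
    coordinates and zero the rest (which also cuts what must be uploaded)."""
    mask = rng.random(grad.shape) < rate
    return np.where(mask, grad, 0.0)

def sign_reset(grad):
    """Transmit only coordinate signs, reducing the payload to one
    ternary symbol per coordinate (an assumed reading of the step)."""
    return np.sign(grad)

# Apply the three components in the order the abstract lists them.
local_grad = rng.normal(size=10_000)
upload = sign_reset(subsample(laplace_ldp(local_grad), rate=0.1))
# Only ~10% of coordinates survive as nonzero signs -> far cheaper to send.
print(np.count_nonzero(upload), "nonzero entries of", upload.size)
```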

    Economics and Engineering for Preserving Digital Content

    Progress toward practical long-term preservation seems to be stalled. Preservationists cannot afford specially developed technology, but must exploit what is created for the marketplace. Economic and technical facts suggest that most preservation work should be shifted from repository institutions to information producers and consumers. Prior publications describe solutions for all known conceptual challenges of preserving a single digital object, but do not deal with software development or scaling to large collections. Much of the document-handling software needed is available; it has, however, not yet been selected, adapted, integrated, or deployed for digital preservation. The daily tools of both information producers and information consumers can be extended to embed preservation packaging without unduly burdening these users. We describe a practical strategy for detailed design and implementation. Document handling is intrinsically complicated because of human sensitivity to communication nuances. Our engineering section therefore starts by discussing how project managers can master the many pertinent details.

    Power System Stability Assessment with Supervised Machine Learning

    Power system stability assessment has become an important area of research due to the increased penetration of photovoltaics (PV) in modern power systems. This work explores how supervised machine learning can be used to assess power system stability for the Western Electricity Coordinating Council (WECC) service region as part of the Data-driven Security Assessment for the Multi-Timescale Integrated Dynamics and Scheduling for Solar (MIDAS) project. Data-driven methods promise to improve power flow scheduling through machine learning prediction, enabling better energy resource management and reducing the demand for real-time time-domain simulations. Frequency, transient, and small-signal stability datasets were created using the 240-bus and reduced 18-bus models of the WECC system. Supervised machine learning was performed to predict the system’s frequency nadir, critical clearing time, and damping ratio, respectively. In addition to varying algorithm hyperparameters, experiments were performed to evaluate model prediction performance across various data entry methods, data allocation methods during model development, and preprocessing techniques. This work also begins an analysis of Electric Reliability Council of Texas (ERCOT) grid behavior during extreme frequency events, and provides suggestions for potential supervised machine learning applications in the future. Timestamped frequency event data are collected every 100 milliseconds from Frequency Disturbance Recorders (FDRs) installed in the ERCOT service territory by the Power Information Technology Laboratory at the University of Tennessee, Knoxville. The data are filtered, and the maximum Rate of Change of Frequency (ROCOF) is calculated using a windowing technique. Trends in the data are evaluated, and ROCOF prediction performance is verified against another ROCOF calculation technique.
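    Since the abstract describes computing the maximum ROCOF from 100 ms FDR samples with a windowing technique, here is a minimal Python sketch of one such windowed calculation. The window length, the least-squares slope fit, and the synthetic frequency event are illustrative assumptions, not the project's actual filtering pipeline.

```python
import numpy as np

FS = 10.0    # samples per second (100 ms FDR reporting interval)
WINDOW = 5   # 0.5 s sliding window; the window length is an assumed choice

# Synthetic 60 Hz frequency trace with a dip mimicking a generation-loss event.
t = np.arange(0.0, 30.0, 1.0 / FS)
freq = 60.0 - 0.3 * np.exp(-((t - 12.0) ** 2) / 4.0)

def max_rocof(f, fs=FS, window=WINDOW):
    """Return the steepest least-squares slope (Hz/s) over a sliding window;
    fitting a line per window smooths point-to-point measurement noise."""
    x = np.arange(window) / fs
    slopes = np.array([np.polyfit(x, f[i:i + window], 1)[0]
                       for i in range(len(f) - window + 1)])
    return slopes[np.argmax(np.abs(slopes))]

print(f"maximum ROCOF: {max_rocof(freq):+.3f} Hz/s")
```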

    A novel method for sample preparation of fresh lung cancer tissue for proteomics analysis by tumor cell enrichment and removal of blood contaminants

    Background: In-depth proteomics analyses of tumors are frequently biased by the presence of blood components and stromal contamination, which leads to large experimental variation and decreases proteome coverage. We have established a reproducible method to prepare freshly collected lung tumors for proteomics analysis, aiming at tumor cell enrichment and reduction of plasma protein contamination. We obtained enriched tumor-cell suspensions (ETS) from six lung cancer cases (two adenocarcinomas, two squamous-cell carcinomas, two large-cell carcinomas) and from two normal lung samples. The cell content of the resulting ETS was evaluated with immunocytological stainings and compared with the histologic pattern of the original specimens. By means of a quantitative mass spectrometry-based method, we evaluated the reproducibility of the sample preparation protocol and assessed the proteome coverage by comparing lysates from ETS samples with the direct lysate of corresponding fresh-frozen (FF) samples.

    Results: Cytological analyses on cytospin specimens showed that the percentage of tumor cells in the ETS samples ranged from 20% to 70%. In the normal lung samples the percentage of epithelial cells was less than 10%. The reproducibility of the sample preparation protocol was very good, with coefficients of variation at the peptide and protein levels of 13% and 7%, respectively. Proteomics analysis led to the identification of significantly more proteins in the ETS samples than in the FF samples (244 vs 109). Albumin and hemoglobin were among the top 5 most abundant proteins identified in the FF samples, indicating high contamination with blood and plasma proteins, whereas ubiquitin and the mitochondrial ATP synthase 5A1 were among the top 5 most abundant proteins in the ETS samples.

    Conclusion: The method is feasible and reproducible. We obtained a fair enrichment of tumor cells, but the major benefit of the method was the effective removal of red blood cell and plasma protein contaminants, resulting in larger proteome coverage compared with direct lysis of frozen samples. This sample preparation method may be successfully implemented for the discovery of lung cancer biomarkers in tissue samples using mass spectrometry-based proteomics.
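    For readers unfamiliar with the reproducibility metric quoted above, the following small Python sketch shows how a coefficient of variation (CV = standard deviation / mean) across replicate preparations can be computed per peptide and then summarized. The intensity values are invented for illustration only.

```python
import numpy as np

# Rows: peptides; columns: replicate ETS preparations (invented intensities).
intensities = np.array([
    [1.00e6, 1.10e6, 0.95e6],
    [4.2e5,  4.0e5,  4.6e5],
    [8.8e5,  9.5e5,  9.1e5],
])

# Per-peptide CV across replicates (sample std over mean), then the median.
cv = intensities.std(axis=1, ddof=1) / intensities.mean(axis=1)
print(f"median peptide-level CV: {np.median(cv):.1%}")
```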