
    A Multilevel Approach to Topology-Aware Collective Operations in Computational Grids

    The efficient implementation of collective communication operations has received much attention. Initial efforts produced "optimal" trees based on network communication models that assumed equal point-to-point latencies between any two processes. This assumption is violated in most practical settings, however, particularly in heterogeneous systems such as clusters of SMPs and wide-area "computational Grids," with the result that collective operations perform suboptimally. In response, more recent work has focused on creating topology-aware trees for collective operations that minimize communication across slower channels (e.g., a wide-area network). While these efforts yield significant communication benefits, they all limit their view of the network to only two layers. We present a strategy based upon a multilayer view of the network. By creating multilevel topology-aware trees we take advantage of communication cost differences at every level in the network. We used this strategy to implement topology-aware versions of several MPI collective operations in MPICH-G2, the Globus Toolkit(TM)-enabled version of the popular MPICH implementation of the MPI standard. Using information about topology provided by MPICH-G2, we construct these multilevel topology-aware trees automatically during execution. We present results demonstrating the advantages of our multilevel approach by comparing it to the default (topology-unaware) implementation provided by MPICH and a topology-aware two-layer implementation. Comment: 16 pages, 8 figures
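    The following is a minimal sketch of the multilevel idea described above, not MPICH-G2's actual implementation: each process is described by a tuple of topology "colors" (e.g., site, then node), and a broadcast tree is built so that exactly one message crosses each slower boundary, with the construction repeated recursively at every cheaper level. The function name, the color tuples, and the flat per-level fan-out are illustrative assumptions.

```python
from collections import defaultdict

def multilevel_bcast_tree(topology, ranks, root, level=0):
    """topology: {rank: (site, node, ...)}. Returns broadcast edges {child: parent}.

    Illustrative sketch only: at each level the local root sends one message to a
    representative of every remote group, then the same construction is repeated
    inside each group at the next (cheaper) level.
    """
    edges = {}
    depth = len(topology[root])
    if level == depth:
        # Cheapest level reached: the local root sends directly to its peers.
        for r in ranks:
            if r != root:
                edges[r] = root
        return edges

    groups = defaultdict(list)
    for r in ranks:
        groups[topology[r][level]].append(r)

    for color, members in groups.items():
        if color == topology[root][level]:
            local_root = root          # the root already sits inside this group
        else:
            local_root = members[0]    # representative of the remote group
            edges[local_root] = root   # the single message crossing this boundary
        edges.update(multilevel_bcast_tree(topology, members, local_root, level + 1))
    return edges

# Hypothetical example: two sites, three nodes, five processes.
topo = {0: ("siteA", "n0"), 1: ("siteA", "n0"), 2: ("siteA", "n1"),
        3: ("siteB", "n2"), 4: ("siteB", "n2")}
print(multilevel_bcast_tree(topo, list(topo), root=0))
# {1: 0, 2: 0, 3: 0, 4: 3} -- only the 0 -> 3 edge crosses the slow site-to-site link.
```

    A production implementation would typically replace the flat per-level fan-out with binomial trees inside each group, but the key property shown here is the same: slow links carry exactly one message per subgroup.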

    Cluster, Classify, Regress: A General Method For Learning Discontinuous Functions

    This paper presents a method for solving the supervised learning problem in which the output is highly nonlinear and discontinuous. It is proposed to solve this problem in three stages: (i) cluster the pairs of input-output data points, resulting in a label for each point; (ii) classify the data, where the corresponding label is the output; and finally (iii) perform one separate regression for each class, where the training data corresponds to the subset of the original input-output pairs which have that label according to the classifier. To our knowledge, these three fundamental building blocks of machine learning have not previously been combined in this simple and powerful fashion. This can be viewed as a form of deep learning, where any of the intermediate layers can itself be deep. The utility and robustness of the methodology are illustrated on some toy problems, including one example problem arising from simulation of plasma fusion in a tokamak. Comment: 12 files, 6 figures
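    As a rough illustration of the three-stage pipeline, here is a sketch built from off-the-shelf scikit-learn components; the specific choices of KMeans, a random-forest classifier, and per-cluster ridge regressors are stand-ins, not models prescribed by the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

class ClusterClassifyRegress:
    """Sketch of the cluster -> classify -> regress pipeline for discontinuous targets."""

    def __init__(self, n_clusters=2):
        self.n_clusters = n_clusters

    def fit(self, X, y):
        # (i) Cluster the joint (input, output) pairs to obtain a label per point.
        labels = KMeans(n_clusters=self.n_clusters, n_init=10).fit_predict(
            np.column_stack([X, y]))
        # (ii) Learn to predict that label from the inputs alone.
        self.classifier_ = RandomForestClassifier().fit(X, labels)
        # (iii) Fit one regressor per cluster on that cluster's training points.
        self.regressors_ = {k: Ridge().fit(X[labels == k], y[labels == k])
                            for k in np.unique(labels)}
        return self

    def predict(self, X):
        labels = self.classifier_.predict(X)
        y_pred = np.empty(len(X))
        for k, reg in self.regressors_.items():
            mask = labels == k
            if mask.any():
                y_pred[mask] = reg.predict(X[mask])
        return y_pred

# Toy discontinuous target: a jump at x = 0 with different slopes on each side.
X = np.linspace(-1, 1, 400).reshape(-1, 1)
y = np.where(X[:, 0] < 0, -2.0 + X[:, 0], 3.0 + 2.0 * X[:, 0])
model = ClusterClassifyRegress(n_clusters=2).fit(X, y)
```

    A single smooth regressor would blur the jump, whereas here the classifier routes each query to the regressor trained on its own side of the discontinuity.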

    {\sc CosmoNet}: fast cosmological parameter estimation in non-flat models using neural networks

    We present a further development of a method for accelerating the calculation of CMB power spectra, matter power spectra and likelihood functions for use in cosmological Bayesian inference. The algorithm, called {\sc CosmoNet}, is based on training a multilayer perceptron neural network. We compute CMB power spectra (up to $\ell = 2000$) and matter transfer functions over a hypercube in parameter space encompassing the $4\sigma$ confidence region of a selection of CMB (WMAP + high resolution experiments) and large scale structure surveys (2dF and SDSS). We work in the framework of a generic 7 parameter non-flat cosmology. Additionally we use {\sc CosmoNet} to compute the WMAP 3-year, 2dF and SDSS likelihoods over the same region. We find that the average error in the power spectra is typically well below cosmic variance, and that experimental likelihoods are calculated to within a fraction of a log unit. We demonstrate that marginalised posteriors generated with {\sc CosmoNet} spectra agree to within a few percent with those generated by {\sc CAMB} parallelised over 4 CPUs, but are obtained 2-3 times faster on just a \emph{single} processor. Furthermore, posteriors generated directly via {\sc CosmoNet} likelihoods can be obtained in less than 30 minutes on a single processor, corresponding to a speed-up of a factor of $\sim 32$. We also demonstrate the capabilities of {\sc CosmoNet} by extending the CMB power spectra and matter transfer function training to a more generic 10 parameter cosmological model, including tensor modes, a varying equation of state of dark energy and massive neutrinos. {\sc CosmoNet} and interfaces to both {\sc CosmoMC} and {\sc Bayesys} are publicly available at {\tt www.mrao.cam.ac.uk/software/cosmonet}. Comment: 8 pages, submitted to MNRAS
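    A schematic of the emulation idea (not the released CosmoNet code): train a multilayer perceptron that maps cosmological parameter vectors to power-spectrum values, so each likelihood evaluation skips a full Boltzmann-solver (CAMB) run. The arrays below are random placeholders standing in for CAMB training data, and the network sizes are arbitrary assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

n_params, n_ell, n_train = 7, 2000, 200       # sizes chosen only for illustration

# Placeholders: in practice the training set would be CAMB runs sampled over a
# hypercube covering the ~4-sigma confidence region of the parameter space.
theta_train = np.random.rand(n_train, n_params)   # cosmological parameter vectors
cl_train = np.random.rand(n_train, n_ell)         # corresponding C_ell spectra

scaler = StandardScaler().fit(theta_train)
emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=200)
emulator.fit(scaler.transform(theta_train), cl_train)

def fast_cl(theta):
    """Approximate the power spectrum for one parameter vector without a CAMB call."""
    return emulator.predict(scaler.transform(np.atleast_2d(theta)))[0]
```

    Once trained, each call to the emulator costs a few matrix multiplications, which is where the reported factor-of-~32 speed-up over repeated CAMB evaluations comes from.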

    Predicting the Performance of MPI Applications over Different Grid Architectures

    Nowadays, high-speed and accurate optimization algorithms are required. In most cases, researchers need a way to predict certain criteria with acceptable accuracy for later use in their algorithms. In the field of parallel computing, execution time can be considered the most important such criterion. Consequently, this paper presents a new execution-time prediction model for Message Passing Interface (MPI) applications executed over numerous grid scenarios. The model is able to predict the execution time of message passing applications running over any grid configuration, in terms of different numbers of nodes and their computing powers. The experiments are evaluated on the SimGrid simulator, which makes it easy to build many different grid configuration scenarios. Comparing the real and the predicted execution times shows good accuracy: the average relative error between the real and predicted execution times for three benchmarks is 4.36%, 5.79%, and 6.81%, respectively.
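    The abstract does not spell out the model's functional form; as a hedged sketch of one plausible approach, runtime can be regressed on configuration features (a compute-bound term plus a communication term) from a handful of measured or SimGrid-simulated runs and then extrapolated to unseen configurations. Every number below is a made-up placeholder, and the assumed model form is not the one proposed in the paper.

```python
import numpy as np

# Placeholder measurements (entirely made up): node count n, total compute
# power p in GFLOP/s, and measured (or SimGrid-simulated) runtime T in seconds.
runs = np.array([
    [ 4,  40.0, 130.2],
    [ 8,  80.0,  71.5],
    [16, 160.0,  42.8],
    [32, 320.0,  29.4],
])
n, p, T = runs[:, 0], runs[:, 1], runs[:, 2]

# Assumed model form: T ~ a/p (compute phase) + b*log2(n) (tree-structured
# communication phase) + c (fixed overhead), fitted by least squares.
A = np.column_stack([1.0 / p, np.log2(n), np.ones_like(n)])
coef, *_ = np.linalg.lstsq(A, T, rcond=None)

def predict_runtime(n_nodes, power_gflops):
    return float(coef @ np.array([1.0 / power_gflops, np.log2(n_nodes), 1.0]))

print(f"Predicted runtime on a 64-node, 640 GFLOP/s grid: {predict_runtime(64, 640.0):.1f} s")
```

    The relative error between predicted and measured times on held-out configurations would then play the role of the 4.36-6.81% figures reported above.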

    The edge cloud: A holistic view of communication, computation and caching

    The evolution of communication networks shows a clear shift of focus from just improving the communications aspects to enabling new important services, from Industry 4.0 to automated driving, virtual/augmented reality, Internet of Things (IoT), and so on. This trend is evident in the roadmap planned for the deployment of the fifth generation (5G) communication networks. This ambitious goal requires a paradigm shift towards a vision that looks at communication, computation and caching (3C) resources as three components of a single holistic system. The further step is to bring these 3C resources closer to the mobile user, at the edge of the network, to enable very low latency and high reliability services. The scope of this chapter is to show that signal processing techniques can play a key role in this new vision. In particular, we motivate the joint optimization of 3C resources. Then we show how graph-based representations can play a key role in building effective learning methods and devising innovative resource allocation techniques. Comment: to appear in the book "Cooperative and Graph Signal Processing: Principles and Applications", P. Djuric and C. Richard, Eds., Academic Press, Elsevier, 201