Inter-motherboard Memory Scheduling

Author: Serrano Gómez Mónica
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 28/12/2011

Exploring the performance benefits of applying memory scheduling beyond the motherboard.

Serrano Gómez, M. (2009). Inter-motherboard Memory Scheduling. http://hdl.handle.net/10251/14163

RiuNet

A cost-effective heuristic to schedule local and remote memory in cluster computers

Author: Mónica Serrano
Julio Sahuquillo
Salvador Petit
Houcine Hassan
José Duato
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2012

Cluster computers represent a cost-effective alternative to supercomputers. In these systems, it is common to constrain the memory address space of a given processor to the local motherboard. Constraining the system in this way is much cheaper than using a full-fledged shared memory implementation among motherboards. However, memory usage among motherboards can be unevenly balanced. On the other hand, remote memory access (RMA) hardware provides fast interconnects among the motherboards of a cluster. RMA devices can be used to access remote RAM from a local motherboard. This work focuses on this capability in order to achieve a better global use of the total RAM in the system. More precisely, the address space of local applications is extended to remote motherboards and is used to access remote RAM. This paper presents an ideal memory scheduling algorithm and proposes a cost-effective heuristic to allocate local and remote memory among local applications. Compared to the devised ideal algorithm, the heuristic obtains the same (or closely resembling) results while largely reducing the computational cost. In addition, we analyze the impact on the performance of standalone applications when varying the memory distribution among regions (local, local to board, and remote). Then, this study is extended to any number of concurrent applications. Experimental results show that a QoS parameter is needed in order to avoid unacceptable performance degradation.

© 2011 Springer Science+Business Media, LLC.

This work was supported by Spanish CICYT under Grant TIN2009-14475-C04-01 and by Consolider-Ingenio under Grant CSD2006-00046.

Serrano Gómez, M.; Sahuquillo Borrás, J.; Petit Martí, SV.; Hassan Mohamed, H.; Duato Marín, JF. (2012). A cost-effective heuristic to schedule local and remote memory in cluster computers. Journal of Supercomputing. 59(3):1533-1551.
https://doi.org/10.1007/s11227-011-0566-8

Crossref

RiuNet

Performance measurement and analysis of PC based cluster server using SET of Architecture and modeling a scalable High performance cluster

Author: Mehta Mihir J.
Publication date: 01/04/2006

Not available

Etheses - A Saurashtra University Library Service

Designing SSI clusters with hierarchical checkpointing and single I/O space

Author: Chow E
Hwang K
Jin H
Wang CL
Xu Z
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1999

Adopting a new hierarchical checkpointing architecture, the authors develop a single I/O address space for building highly available clusters of computers. They propose a systematic approach to achieving a single system image by integrating existing middleware support with the newly developed features.

HKU Scholars Hub

Computing infrastructure issues in distributed communications systems : a survey of operating system transport system architectures

Author: Schmidt Douglas C.
Suda Tatsuya
Publication venue: eScholarship, University of California
Publication date: 01/01/1992

The performance of distributed applications (such as file transfer, remote login, tele-conferencing, full-motion video, and scientific visualization) is influenced by several factors that interact in complex ways. In particular, application performance is significantly affected both by communication infrastructure factors and computing infrastructure factors. Communication infrastructure factors include channel speed, bit-error rate, and congestion at intermediate switching nodes. Computing infrastructure factors include (among other things) both protocol processing activities (such as connection management, flow control, error detection, and retransmission) and general operating system factors (such as memory latency, CPU speed, interrupt and context switching overhead, process architecture, and message buffering). Due to an increase of several orders of magnitude in network channel speed and an increase in application diversity, performance bottlenecks are shifting from the network factors to the transport system factors.

This paper defines an abstraction called an "Operating System Transport System Architecture" (OSTSA) that is used to classify the major components and services in the computing infrastructure. End-to-end network protocols such as TCP, TP4, VMTP, XTP, and Delta-t typically run on general-purpose computers, where they utilize various operating system resources such as processors, virtual memory, and network controllers. The OSTSA provides services that integrate these resources to support distributed applications running on local and wide area networks.

A taxonomy is presented to evaluate OSTSAs in terms of their support for protocol processing activities. We use this taxonomy to compare and contrast five general-purpose commercial and experimental operating systems including System V UNIX, BSD UNIX, the x-kernel, Choices, and Xinu.

eScholarship - University of California

A new degree of freedom for memory allocation in clusters

Author: Héctor Montaner
Federico Silla
Holger Fröning
José Duato
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2012

Improvements in parallel computing hardware usually involve increments in the number of available resources for a given application such as the number of computing cores and the amount of memory. In the case of shared-memory computers, the increase in computing resources and available memory is usually constrained by the coherency protocol, whose overhead rises with system size, limiting the scalability of the final system. In this paper we propose an efficient and cost-effective way to increase the memory available for a given application by leveraging free memory in other computers in the cluster. Our proposal is based on the observation that many applications benefit from having more memory resources but do not require more computing cores, thus reducing the requirements for cache coherency and allowing a simpler implementation and better scalability. Simulation results show that, when additional mechanisms intended to hide remote memory latency are used, execution time of applications that use our proposal is similar to the time required to execute them in a computer populated with enough local memory, thus validating the feasibility of our proposal. We are currently building a prototype that implements our ideas. The first results from real executions in this prototype demonstrate not only that our proposal works but also that it can efficiently execute applications that make use of remote memory resources.

© 2011 Springer Science+Business Media, LLC.

This work has been supported by PROMETEO from Generalitat Valenciana (GVA) under Grant PROMETEO/2008/060.

Montaner Mas, H.; Silla Jiménez, F.; Fröning, H.; Duato Marín, JF. (2012). A new degree of freedom for memory allocation in clusters. Cluster Computing. 15(2):101-123. https://doi.org/10.1007/s10586-010-0150-7

Crossref

RiuNet

FIT4Green - Energy aware ICT Optimization Policies

Author: Basmadjian Robert
Bunse Christian
Georgiadou Vasiliki
Giuliani Giovanni
Klingert Sonja
Lovasz Gergo
Majanen Mikko
Publication venue: 'Airiti Press, Inc.'
Publication date: 01/01/2010

MAnnheim DOCument Server

VTT Research System

COSPO/CENDI Industry Day Conference

The conference's objective was to provide a forum where government information managers and industry information technology experts could have an open exchange, discuss their respective needs, and compare them to the available, or soon to be available, solutions. Technical summaries and points of contact are provided for the following sessions: secure products, protocols, and encryption; information providers; electronic document management and publishing; information indexing, discovery, and retrieval (IIDR); automated language translators; IIDR - natural language capabilities; IIDR - advanced technologies; IIDR - distributed heterogeneous and large database support; and communications - speed, bandwidth, and wireless.

NASA Technical Reports Server