738,565 research outputs found

    A cost-effective heuristic to schedule local and remote memory in cluster computers

    Full text link
    Cluster computers represent a cost-effective alternative solution to supercomputers. In these systems, it is common to constrain the memory address space of a given processor to the local motherboard. Constraining the system in this way is much cheaper than using a full-fledged shared memory implementation among motherboards. However, memory usage among motherboards can be unfairly balanced. On the other hand, remote memory access (RMA) hardware provides fast interconnects among the motherboards of a cluster. RMA devices can be used to access remote RAM memory from a local motherboard. This work focuses on this capability in order to achieve a better global use of the total RAM memory in the system. More precisely, the address space of local applications is extended to remote motherboards and is used to access remote RAM memory. This paper presents an ideal memory scheduling algorithm and proposes a cost-effective heuristic to allocate local and remote memory among local applications. Compared to the devised ideal algorithm, the heuristic obtains the same (or closely resembling) results while largely reducing the computational cost. In addition, we analyze the impact on the performance of stand alone applications varying the memory distribution among regions (local, local to board, and remote). Then, this study is extended to any number of concurrent applications. Experimental results show that a QoS parameter is needed in order to avoid unacceptable performance degradation. © 2011 Springer Science+Business Media, LLC.This work was supported by Spanish CICYT under Grant TIN2009-14475-C04-01 and by Consolider-Ingenio under Grant CSD2006-00046.Serrano Gómez, M.; Sahuquillo Borrás, J.; Petit Martí, SV.; Hassan Mohamed, H.; Duato Marín, JF. (2012). A cost-effective heuristic to schedule local and remote memory in cluster computers. Journal of Supercomputing. 59(3):1533-1551. https://doi.org/10.1007/s11227-011-0566-8S15331551593IBM journal of Research and Development staff (2008) Overview of the IBM blue gene/P project. IBM J Res Dev 52(1/2):199–220Blocksome M, Archer C, Inglett T, McCarthy P, Mundy M, Ratterman J, Sidelnik A, Smith B, Almási G, Castaños J, Lieber D, Moreira J, Krishnamoorthy S, Tipparaju V, Nieplocha J (2006) Design and implementation of a one-sided communication interface for the IBM eServer Blue Gene® supercomputer. In: Proceedings of the 2006 ACM/IEEE conference on supercomputing, SC ’06, Tampa, FL, USA, November 2006, pp 54–54Kumar S, Dózsa G, Almasi G, Heidelberger P, Chen D, Giampapa M, Blocksome M, Faraj A, Parker J, Ratterman J, Smith BE, Archer C (2008) The deep computing messaging framework: generalized scalable message passing on the blue gene/P supercomputer. In: Proceedings of the 22nd annual international conference on supercomputing, Island of Kos, Greece, June 2008, pp 94–103Tipparaju V, Kot A, Nieplocha J, Bruggencate MT, Chrisochoides N (2007) Evaluation of remote memory access communication on the cray XT3. In: Proceedings of the 21th international parallel and distributed processing symposium, Long Beach, California, USA, March 2007, pp 1–7Nussle M, Scherer M, Bruning U (2009) A resource optimized remote-memory-access architecture for low-latency communication. In: International conference on parallel processing, Sept 2009, pp 220–227http://www.hypertransport.org/Serrano M, Sahuquillo J, Hassan H, Petit S, Duato J (2010) A scheduling heuristic to handle local and remote memory in cluster computers. In: Proceedings of the 12th IEEE international conference on high performance computing, Melbourne, Australia, Sept 2010, pp 35–42Keltcher CN, McGrath KJ, Ahmed A, Conway P (2003) The AMD opteron processor for multiprocessor servers. IEEE MICRO 23(2):66–76Duato J, Silla F, Yalamanchili S (2009) Extending hypertransport protocol for improved scalability. In: First international workshop on hypertransport research and applications.Litz H, Fröening H, Nuessle M, Brüening U (2007) A hypertransport network interface controller for ultra-low latency message transfers. HyperTransport Consortium White Paperhttps://www.simics.net/http://www.cs.wisc.edu/gems/http://www.cs.virginia.edu/stream/Woo SC, Ohara M, Torrie E, Singh JP, Gupta A (1995) The SPLASH-2 programs: Characterization and methodological considerations. In: Proceedings of the 22nd annual international symposium on computer architecture, New York, NY, USA, 1995, pp 24–36Levitin A (2003) Introduction to the design and analysis of algorithms. Addison Wesley, ReadingOleszkiewicz J, Xiao L, Liu Y (2004) Parallel network RAM: Effectively utilizing global cluster memory for large data-intensive parallel programs. In: Proceedings of 33rd international conference on parallel processing, Montreal, Quebec, Canada, pp 353–360Liang S, Noronha R, Panda DK (2005) Swapping to remote memory over infiniband: An approach using a high performance network block device. In: Proceedings of the 2005 IEEE international conference on cluster computing, Boston, Massachusetts, USA, pp 1–10Oguchi M, Kitsuregawa M (2000) Using available remote memory dynamically for parallel data mining application on ATM-connected PC cluster. In: Proceedings of the 14th international parallel & distributed processing symposium, Cancun, Mexico, pp 411–420Werstein P, Jia X, Huang Z (2007) A remote memory swapping system for cluster computers. In: Proceedings of the eighth international conference on parallel and distributed computing, applications and technologies, Adelaide, Australia, pp 75–81Midorikawa H, Kurokawa M, Himeno R, Sato M (2008) DLM: A distributed large memory system using remote memory swapping over cluster nodes. In: Proceedings of the 2008 IEEE international conference on cluster computing, Tsukuba, Japan, October 2008, pp 268–27

    A cluster computer performance predictor for memory scheduling

    Full text link
    Remote Memory Access (RMA) hardware allow a given motherboard in a cluster to directly access the memory installed in a remote motherboard of the same cluster. In recent works, this characteristic has been used to extend the addressable memory space of selected motherboards, which enable a better balance of main memory resources among cluster applications. This way is much more cost-effective than than implementing a full-fledged shared memory system. In this context, the memory scheduler is in charge of finding a suitable distribution of local and remote memory that maximizes the performance and guarantees a minimum QoS among the applications. Note that since changing the memory distribution is a slow process involving several motherboards, the memory scheduler needs to make sure that the target distribution provides better performance than the current one. In this paper, a performance predictor is designed in order to find the best memory distribution for a given set of applications executing in a cluster motherboard. The predictor uses simple hardware counters to estimate the expected impact on performance of the different memory distributions. The hardware counters provide the predictor with the information about the time spent in processor, memory access and network. The performance model used by the predictor has been validated in a detailed microarchitectural simulator using real benchmarks. Results show that the prediction accuracy never deviates more than 5% compared to the real results, being less than 0.5% in most of the cases.This work was supported by Spanish CICYT under Grant TIN2009-14475-C04-01, and by Consolider-Ingenio under Grant CSD2006-00046Serrano Gómez, M.; Sahuquillo Borrás, J.; Hassan Mohamed, H.; Petit Martí, SV.; Duato Marín, JF. (2011). A cluster computer performance predictor for memory scheduling. En Algorithms and Architectures for Parallel Processing. Springer Verlag (Germany). 7017:353-362. doi:10.1007/978-3-642-24669-2_34S3533627017Meuer, H.W.: The top500 project: Looking back over 15 years of supercomputing experience. Informatik-Spektrum 31, 203–222 (2008), doi:10.1007/s00287-008-0240-6Nussle, M., Scherer, M., Bruning, U.: A Resource Optimized Remote-Memory-Access Architecture for Low-latency Communication. In: International Conference on Parallel Processing, pp. 220–227 (September 2009)Blocksome, M., Archer, C., Inglett, T., McCarthy, P., Mundy, M., Ratterman, J., Sidelnik, A., Smith, B., Almási, G., Castaños, J., Lieber, D., Moreira, J., Krishnamoorthy, S., Tipparaju, V., Nieplocha, J.: Design and implementation of a one-sided communication interface for the IBM eServer Blue Gene®supercomputer. In: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, p. 120. ACM, New York (2006)Kumar, S., Dózsa, G., Almasi, G., Heidelberger, P., Chen, D., Giampapa, M., Blocksome, M., Faraj, A., Parker, J., Ratterman, J., Smith, B.E., Archer, C.: The deep computing messaging framework: generalized scalable message passing on the blue gene/P supercomputer. In: ICS, pp. 94–103 (2008)Tipparaju, V., Kot, A., Nieplocha, J., Bruggencate, M.T., Chrisochoides, N.: Evaluation of Remote Memory Access Communication on the Cray XT3. In: IEEE International Parallel and Distributed Processing Symposium, pp. 1–7 (March 2007)HyperTransport Technology Consortium. HyperTransport I/O Link Specification Revision (October 3, 2008)Serrano, M., Sahuquillo, J., Hassan, H., Petit, S., Duato, J.: A scheduling heuristic to handle local and remote memory in cluster computers. In: High Performance Computing and Communications (2010) (accepted for publication)Serrano, M., Sahuquillo, J., Petit, S., Hassan, H., Duato, J.: A cost-effective heuristic to schedule local and remote memory in cluster computers. The Journal of Supercomputing, 1–19 (2011), doi:10.1007/s11227-011-0566-8Ubal, R., Sahuquillo, J., Petit, S., López, P.: Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors. In: Proceedings of the 19th International Symposium on Computer Architecture and High Performance Computing (2007)Keltcher, C.N., McGrath, K.J., Ahmed, A., Conway, P.: The AMD Opteron Processor for Multiprocessor Servers. IEEE Micro 23(2), 66–76 (2003)Duato, J., Silla, F., Yalamanchili, S.: Extending HyperTransport Protocol for Improved Scalability. In: First International Workshop on HyperTransport Research and Applications (2009)Litz, H., Fröening, H., Nuessle, M., Brüening, U.: A HyperTransport Network Interface Controller for Ultra-low Latency Message Transfers. In: HyperTransport Consortium White Paper (2007)Zhuravlev, S., Blagodurov, S., Fedorova, A.: Addressing shared resource contention in multicore processors via scheduling. In: Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 129–142 (2010)Xie, Y., Loh, G.H.: Dynamic Classification of Program Memory Behaviors in CMPs. In: 2nd Workshop on Chip Multiprocessor Memory Systems and Interconnects in conjunction with the 35th International Symposium on Computer Architecture (2008)Xu, C., Chen, X., Dick, R.P., Mao, Z.M.: Cache contention and application performance prediction for multi-core systems. In: IEEE International Symposium on Performance Analysis of Systems and Software, pp. 76–86 (2010)Rai, J.K., Negi, A., Wankar, R., Nayak, K.D.: Performance prediction on multi-core processors. In: 2010 International Conference on Computational Intelligence and Communication Networks (CICN), pp. 633–637 (November 2010)Liang, S., Noronha, R., Panda, D.K.: Swapping to Remote Memory over InfiniBand: An Approach using a High Performance Network Block Device. In: CLUSTER, pp. 1–10. IEEE, Los Alamitos (2005)Werstein, P., Jia, X., Huang, Z.: A Remote Memory Swapping System for Cluster Computers. In: Eighth International Conference on Parallel and Distributed Computing, Applications and Technologies, pp. 75–81 (2007)Midorikawa, H., Kurokawa, M., Himeno, R., Sato, M.: DLM: A distributed Large Memory System using remote memory swapping over cluster nodes. In: IEEE International Conference on Cluster Computing, pp. 268–273 (October 2008

    The k-ary n-direct s-indirect family of topologies for large-scale interconnection networks

    Full text link
    The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-016-1640-zIn large-scale supercomputers, the interconnection network plays a key role in system performance. Network topology highly defines the performance and cost of the interconnection network. Direct topologies are sometimes used due to its reduced hardware cost, but the number of network dimensions is limited by the physical 3D space, which leads to an increase of the communication latency and a reduction of network throughput for large machines. Indirect topologies can provide better performance for large machines, but at higher hardware cost. In this paper, we propose a new family of hybrid topologies, the k-ary n-direct s-indirect, that combines the best features from both direct and indirect topologies to efficiently connect an extremely high number of processing nodes. The proposed network is an n-dimensional topology where the k nodes of each dimension are connected through a small indirect topology of s stages. This combination results in a family of topologies that provides high performance, with latency and throughput figures of merit close to indirect topologies, but at a lower hardware cost. In particular, it doubles the throughput obtained per cost unit compared with indirect topologies in most of the cases. Moreover, their fault-tolerance degree is similar to the one achieved by direct topologies built with switches with the same number of ports.This work was supported by the Spanish Ministerio de Economa y Competitividad (MINECO) and by FEDER funds under Grant TIN2012-38341-C04-01 and by Programa de Ayudas de Investigacion y Desarrollo (PAID) from Universitat Politecnica de Valencia.Peñaranda Cebrián, R.; Gómez Requena, C.; Gómez Requena, ME.; López Rodríguez, PJ.; Duato Marín, JF. (2016). The k-ary n-direct s-indirect family of topologies for large-scale interconnection networks. Journal of Supercomputing. 72(3):1035-1062. https://doi.org/10.1007/s11227-016-1640-z10351062723Connect-IB. http://www.mellanox.com/related-docs/prod_adapter_cards/PB_Connect-IB.pdf . Accessed 3 Feb 2016Mellanox store. http://www.mellanoxstore.com . Accessed 3 Feb 2016Mellanox technology. http://www.mellanox.com . Accessed 3 Feb 2016Myricom. http://www.myri.com . Accessed 3 Feb 2016Quadrics homepage. http://www.quadrics.com . Accessed 22 Sept 2008TOP500 supercomputer site. http://www.top500.org . Accessed 3 Feb 2016Balkan A, Qu G, Vishkin U (2009) Mesh-of-trees and alternative interconnection networks for single-chip parallelism. IEEE Trans Very Large Scale Integr(VLSI) Syst 17(10):1419–1432. doi: 10.1109/TVLSI.2008.2003999Bermudez Garzon D, Gomez ME, Lopez P, Duato J, Gomez C (2014) FT-RUFT: a performance and fault-tolerant efficient indirect topology. In: 22nd Euromicro international conference on parallel, distributed and network-based processing (PDP). IEEE, pp 405–409Bhandarkar SM, Arabnia HR (1995) The Hough transform on a reconfigurable multi-ring network. J Parallel Distrib Comput 24(1):107–114Boku T, Nakazawa K, Nakamura H, Sone T, Mishima T, Itakura K (1996) Adaptive routing technique on hypercrossbar network and its evaluation. Syst Comput Jpn 27(4):55–64Dally W, Towles B (2004) Principles and practices of interconnection networks. Morgan Kaufmann, San FranciscoDas R, Eachempati S, Mishra A, Narayanan V, Das C (2009) Design and evaluation of a hierarchical on-chip interconnect for next-generation CMPs. In: IEEE 15th international symposium on high performance computer architecture (HPCA’09), pp 175–186. doi: 10.1109/HPCA.2009.4798252Mahdaly AI, Mouftah HT, Hanna NN (1990) Topological properties of WK-recursive networks. In: Proceedings of IEEE workshop on future trends of distributed computing systems, pp 374–380. doi: 10.1109/FTDCS.1990.138349Duato J (1996) A necessary and sufficient condition for deadlock-free routing in cut-through and store-and-forward networks. IEEE Trans Parallel Distrib Syst 7:841–854. doi: 10.1109/71.532115Duato J, Yalamanchili S, Lionel N (2002) Interconnection networks: an engineering approach. Morgan Kaufmann Publishers Inc., USAFlich J, Malumbres M, López P, Duato J (2000) Improving routing performance in Myrinet networks. In: International on parallel and distributed processing symposium, p 27. doi: 10.1109/IPDPS.2000.845961García M, Beivide R, Camarero C, Valero M, Rodríguez G, Minkenberg C (2015) On-the-fly adaptive routing for dragonfly interconnection networks. J Supercomput 71(3):1116–1142Gómez C, Gilabert F, Gómez M, López P, Duato J (2007) Deterministic versus adaptive routing in fat-trees. In: IEEE international on parallel and distributed processing symposium (IPDPS’07), pp 1–8. doi: 10.1109/IPDPS.2007.370482Gómez C, Gilabert F, Gómez M, López P, Duato J (2008) RUFT: simplifying the fat-tree topology. In: 14th IEEE international conference on parallel and distributed systems (ICPADS’08), pp 153–160. doi: 10.1109/ICPADS.2008.44Guo C, Lu G, Li D, Wu H, Zhang X, Shi Y, Tian C, Zhang Y, Lu S (2009) BCube: a high performance, server-centric network architecture for modular data centers. In: SIGCOMM ’09: proceedings of the ACM SIGCOMM 2009 conference on data communication. ACM, New York, pp 63–74. doi: 10.1145/1592568.1592577 . http://www.bibsonomy.org/bibtex/23a5da89fbf099e3c70f4559ab38082c5/chesteve . Accessed 22 Sept 2008Gupta A, Dally W (2006) Topology optimization of interconnection networks. Comput Arch Lett 5(1):10–13. doi: 10.1109/L-CA.2006.8Kim J, Dally W, Abts D (2007) Flattened butterfly: a cost-efficient topology for high-radix networks. In: Proceedings of the 34th annual international symposium on computer architecture (ISCA’07). ACM, New York, pp 126–137. doi: 10.1145/1250662.1250679Kim J, Dally W, Scott S, Abts D (2008) Technology-driven, highly-scalable dragonfly topology. In: Proceedings of the 35th annual international symposium on computer architecture (ISCA’08). IEEE Computer Society, Washington, DC, pp 77–88. doi: 10.1109/ISCA.2008.19Leighton F (1992) Introduction to parallel algorithms and architectures: arrays, trees, hypercubes v. 1. M. Kaufmann Publishers, San FranciscoLeiserson CE (1985) Fat-trees: universal networks for hardware-efficient supercomputing. IEEE Trans Comput 34(10):892–901Matsutani H, Koibuchi M, Amano H (2007) Performance, cost, and energy evaluation of fat H-tree: a cost-efficient tree-based on-chip network. In: IEEE international on parallel and distributed processing symposium (IPDPS’07), pp 1–10. doi: 10.1109/IPDPS.2007.370271Rahmati D, Kiasari A, Hessabi S, Sarbazi-Azad H (2006) A performance and power analysis of wk-recursive and mesh networks for network-on-chips. In: International conference on computer design (ICCD’06), pp 142–147. doi: 10.1109/ICCD.2006.4380807Towles B, Dally WJ (2002) Worst-case traffic for oblivious routing functions. In: Proceedings of the fourteenth annual ACM symposium on parallel algorithms and architectures (SPAA’02). ACM, New York, pp 1–8. doi: 10.1145/564870.564872Yang Y, Funahashi A, Jouraku A, Nishi H, Amano H, Sueyoshi T (2001) Recursive diagonal torus: an interconnection network for massively parallel computers. IEEE Trans Parallel Distrib Syst 12(7):701–715. doi: 10.1109/71.94074

    Scalability approaches for causal multicast: a survey

    Get PDF
    The final publication is available at Springer via http://dx.doi.org/10.1007/s00607-015-0479-0Many distributed services need to be scalable: internet search, electronic commerce, e-government... In order to achieve scalability, high availability and fault tolerance, such applications rely on replicated components. Because of the dynamics of growth and volatility of customer markets, applications need to be hosted by adaptive, highly scalable systems. In particular, the scalability of the reliable multicast mechanisms used for supporting the consistency of replicas is of crucial importance. Reliable multicast might propagate updates in a pre-determined order (e.g., FIFO, total or causal). Since total order needs more communication rounds than causal order, the latter appears to be the preferable candidate for achieving multicast scalability, although the consistency guarantees based on causal order are weaker than those of total order. This paper provides a historical survey of different scalability approaches for reliable causal multicast protocols.This work was supported by European Regional Development Fund (FEDER) and Ministerio de Economia y Competitividad (MINECO) under research Grant TIN2012-37719-C03-01.Juan Marín, RD.; Decker, H.; Armendáriz Íñigo, JE.; Bernabeu Aubán, JM.; Muñoz Escoí, FD. (2016). Scalability approaches for causal multicast: a survey. Computing. 98(9):923-947. https://doi.org/10.1007/s00607-015-0479-0S923947989Adly N, Nagi M (1995) Maintaining causal order in large scale distributed systems using a logical hierarchy. In: IASTED Intnl Conf on Appl Inform, pp 214–219Aguilera MK, Chen W, Toueg S (1997) Heartbeat: a timeout-free failure detector for quiescent reliable communication. In: 11th Intnl Wshop on Distrib Alg (WDAG), Saarbrücken, pp 126–140Almeida JB, Almeida PS, Baquero C (2004) Bounded version vectors. In: 18th Intnl Conf Distrib Comput (DISC), Amsterdam, pp 102–116Almeida PS, Baquero C, Fonte V (2008) Interval tree clocks. In: 12th Intnl Conf Distrib Syst (OPODIS), Luxor, pp 259–274Almeida S, Leitão J, Rodrigues LET (2013) ChainReaction: a causal+ consistent datastore based on chain replication. In: 8th EuroSys Conf, Czech Republic, pp 85–98Álvarez A, Arévalo S, Cholvi V, Fernández A, Jiménez E (2008) On the interconnection of message passing systems. Inf Process Lett 105(6):249–254Amir Y, Stanton J (1998) The Spread wide area group communication system. Tech. rep., CDNS-98-4, The Center for Networking and Distributed Systems, The Johns Hopkins UnivAmir Y, Dolev D, Kramer S, Malki D (1992) Transis: a communication subsystem for high availability. In: 22nd Intnl Symp Fault-Tolerant Comp (FTCS), Boston, pp 76–84Anastasi G, Bartoli A, Spadoni F (2001) A reliable multicast protocol for distributed mobile systems: design and evaluation. IEEE Trans Parallel Distrib Syst 12(10):1009–1022Bailis P, Ghodsi A, Hellerstein JM, Stoica I (2013) Bolt-on causal consistency. In: Intnl Conf Mgmnt Data (SIGMOD), New York, pp 761–772Baldoni R, Raynal M, Prakash R, Singhal M (1996) Broadcast with time and causality constraints for multimedia applications. In: 22nd Intnl Euromicro Conf, Prague, pp 617–624Baldoni R, Friedman R, van Renesse R (1997) The hierarchical daisy architecture for causal delivery. In: 17th Intnl Conf Distrib Comput Syst (ICDCS), Maryland, pp 570–577Ban B (2002) JGroups—a toolkit for reliable multicast communication. http://www.jgroups.orgBaquero C, Almeida PS, Shoker A (2014) Making operation-based CRDTs operation-based. In: 14th Intnl Conf Distrib Appl Interop Syst (DAIS), Berlin, pp 126–140Benslimane A, Abouaissa A (2002) Dynamical grouping model for distributed real time causal ordering. Comput Commun 25:288–302Birman KP, Joseph TA (1987) Reliable communication in the presence of failures. ACM Trans Comput Syst 5(1):47–76Birman KP, Schiper A, Stephenson P (1991) Lightweigt causal and atomic group multicast. ACM Trans Comput Syst 9(3):272–314Cachin C, Guerraoui R, Rodrigues LET (2011) Introduction to reliable and secure distributed programming, 2nd edn. Springer, BerlinChandra P, Gambhire P, Kshemkalyani AD (2004) Performance of the optimal causal multicast algorithm: a statistical analysis. IEEE Trans Parall Distr 15(1):40–52Chandra TD, Toueg S (1996) Unreliable failure detectors for reliable distributed systems. J ACM 43(2):225–267de Juan-Marín R, Cholvi V, Jiménez E, Muñoz-Escoí FD (2009) Parallel interconnection of broadcast systems with multiple FIFO channels. In: 11th Intnl Symp on Distrib Obj, Middleware and Appl (DOA), Vilamoura, LNCS, vol 5870, pp 449–466Défago X, Schiper A, Urbán P (2004) Total order broadcast and multicast algorithms: taxonomy and survey. ACM Comput Surv 36(4):372–421Demers AJ, Greene DH, Hauser C, Irish W, Larson J, Shenker S, Sturgis HE, Swinehart DC, Terry DB (1987) Epidemic algorithms for replicated database maintenance. In: 6th ACM Symp on Princ of Distrib Comput (PODC), Canada, pp 1–12Du J, Elnikety S, Roy A, Zwaenepoel W (2013) Orbe: scalable causal consistency using dependency matrices and physical clocks. In: ACM Symp on Cloud Comput (SoCC), Santa Clara, pp 11:1–11:14Fernández A, Jiménez E, Cholvi V (2000) On the interconnection of causal memory systems. In: 19th Annual ACM Symp on Princ of Distrib Comput (PODC), Portland, pp 163–170Fidge CJ (1988) Timestamps in message-passing systems that preserve the partial ordering. In: 11th Australian Comput Conf, pp 56–66Friedman R, Vitenberg R, Chockler G (2003) On the composability of consistency conditions. Inf Process Lett 86(4):169–176Gilbert S, Lynch N (2002) Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 33(2):51–59Gray J, Helland P, O’Neil PE, Shasha D (1996) The dangers of replication and a solution. In: SIGMOD Conf, pp 173–182Hadzilacos V, Toueg S (1993) Fault-tolerant broadcasts and related problems. In: Mullender S (ed) Distributed systems, chap 5, 2nd edn. ACM Press, pp 97–145Johnson S, Jahanian F, Shah J (1999) The inter-group router approach to scalable group composition. In: 19th Intnl Conf on Distrib Comput Syst (ICDCS), Austin, pp 4–14Kalantar MH, Birman KP (1999) Causally ordered multicast: the conservative approach. In: 19th Intnl Conf on Distrib Comput Syst (ICDCS), Austin, pp 36–44Kawanami S, Enokido T, Takizawa M (2004) A group communication protocol for scalable causal ordering. In: 18th Intnl Conf on Adv Inform Netw Appl (AINA), Fukuoka, pp 296–302Kawanami S, Nishimura T, Enokido T, Takizawa M (2005) A scalable group communication protocol with global clock. In: 19th Intnl Conf on Adv Inform Netw Appl (AINA), Taipei, pp 625–630Kshemkalyani AD, Singhal M (1998) Necessary and sufficient conditions on information for causal message ordering and their optimal implementation. Distrib Comput 11(2):91–111Kshemkalyani AD, Singhal M (2011) Distributed computing: principles, algorithms, and systems, 2nd edn. Cambridge University Press, New YorkLadin R, Liskov B, Shrira L, Ghemawat S (1992) Providing high availability using lazy replication. ACM Trans Comput Syst 10(4):360–391Lamport L (1978) Time, clocks, and the ordering of events in a distributed system. Commun ACM 21(7):558–565Laumay P, Bruneton E, de Palma N, Krakowiak S (2001) Preserving causality in a scalable message-oriented middleware. In: Intnl Conf on Distrib Syst Platf (Middleware), pp 311–328Liu N, Liu M, Cao J, Chen G, Lou W (2010) When transportation meets communication: V2P over VANETs. In: 30th Intnl Conf Distrib Comput Syst (ICDCS), GenovaLwin CH, Mohanty H, Ghosh RK (2004) Causal ordering in event notification service systems for mobile users. In: Intnl Conf Inform Tech: Coding Comput (ITCC), Las Vegas, pp 735–740Mahajan P, Alvisi L, Dahlin M (2011) Consistency, availability and covergence. Tech. rep., UTCS TR-11-22, The University of Texas at AustinMatos M, Sousa A, Pereira J, Oliveira R, Deliot E, Murray P (2009) CLON: overlay networks and gossip protocols for cloud environments. In: 11th Intnl Symp on Dist Obj, Middleware and Appl (DOA), Vilamoura, LNCS, vol 5870, pp 549–566Mattern F (1989) Virtual time and global states of distributed systems. In: Parallel and distributed algorithms, North-Holland, pp 215–226Mattern F, Fünfrocken S (1994) A non-blocking lightweight implementation of causal order message delivery. Lect Notes Comput Sci 938:197–213Meldal S, Sankar S, Vera J (1991) Exploiting locality in maintaining potential causality. In: 10th ACM Symp on Princ of Distrib Comp (PODC), Montreal, pp 231–239Meling H, Montresor A, Helvik BE, Babaoglu Ö (2008) Jgroup/ARM: a distributed object group platform with autonomous replication management. Softw Pract Exp 38(9):885–923Mosberger D (1993) Memory consistency models. Oper Syst Rev 27(1):18–26Mostéfaoui A, Raynal M (1993) Causal multicast in overlapping groups: towards a low cost approach. In: 4th Intnl Wshop on Future Trends of Distrib Comp Syst (FTDCS), Lisbon, pp 136–142Mostéfaoui A, Raynal M, Travers C, Patterson S, Agrawal D, El Abbadi A (2005) From static distributed systems to dynamic systems. In: 24th Symp on Rel Distrib Syst (SRDS), Orlando, pp 109–118Nishimura T, Hayashibara N, Takizawa M, Enokido T (2005) Causally ordered delivery with global clock in hierarchical group. In: ICPADS (2), Fukuoka, pp 560–564Parker DS Jr, Popek GJ, Rudisin G, Stoughton A, Walker BJ, Walton E, Chow JM, Edwards DA, Kiser S, Kline CS (1983) Detection of mutual inconsistency in distributed systems. IEEE Trans Softw Eng 9(3):240–247Pascual-Miret L (2014) Consistency models in modern distributed systems. An approach to eventual consistency. Master’s thesis, Depto. de Sistemas Informáticos y Computación, Univ. Politècnica de ValènciaPascual-Miret L, González de Mendívil JR, Bernabéu-Aubán JM, Muñoz-Escoí FD (2015) Widening CAP consistency. Tech. rep., IUMTI-SIDI-2015/003, Univ. Politècnica de València, ValenciaPeterson LL, Buchholz NC, Schlichting RD (1989) Preserving and using context information in interprocess communication. ACM Trans Comput Syst 7(3):217–246Pomares Hernández S, Fanchon J, Drira K, Diaz M (2001) Causal broadcast protocol for very large group communication systems. In: 5th Intnl Conf on Princ of Distrib Syst (OPODIS), Manzanillo, pp 175–188Prakash R, Baldoni R (2004) Causality and the spatial-temporal ordering in mobile systems. Mobile Netw Appl 9(5):507–516Prakash R, Raynal M, Singhal M (1997) An adaptive causal ordering algorithm suited to mobile computing environments. J Parallel Distrib Comput 41(2):190–204Raynal M, Schiper A, Toueg S (1991) The causal ordering abstraction and a simple way to implement it. Inf Process Lett 39(6):343–350Rodrigues L, Veríssimo P (1995a) Causal separators and topological timestamping: An approach to support causal multicast in large-scale systems. Tech. Rep. AR-05/95, Instituto de Engenharia de Sistemas e Computadores (INESC), LisbonRodrigues L, Veríssimo P (1995b) Causal separators for large-scale multicast communication. In: 15th Intnl Conf on Distrib Comput Syst (ICDCS), Vancouver, pp 83–91Schiper A, Eggli J, Sandoz A (1989) A new algorithm to implement causal ordering. In: 3rd Intnl Wshop on Distrib Alg (WDAG), Nice, pp 219–232Schiper N, Pedone F (2010) Fast, flexible and highly resilient genuine FIFO and causal multicast algorithms. In: 25th ACM Symp on Applied Comp (SAC), Sierre, pp 418–422Shapiro M, Preguiça NM, Baquero C, Zawirski M (2011) Convergent and commutative replicated data types. Bull EATCS 104:67–88Shen M, Kshemkalyani AD, Hsu TY (2015) Causal consistency for geo-replicated cloud storage under partial replication. In: Intnl Paral Distrib Proces Symp (IPDPS) Wshop, Hyderabad, pp 509–518Singhal M, Kshemkalyani AD (1992) An efficient implementation of vector clocks. Inf Process Lett 43(1):47–52Sotomayor B, Montero RS, Llorente IM, Foster IT (2009) Virtual infrastructure management in private and hybrid clouds. IEEE Internet Comput 13(5):14–22Stephenson P (1991) Fast ordered multicasts. PhD thesis, Dept. of Comp. Sc., Cornell Univ., IthacaStonebraker M (1986) The case for shared nothing. IEEE Database Eng Bull 9(1):4–9Vogels W (2009) Eventually consistent. Commun ACM 52(1):40–44Wischhof L, Ebner A, Rohling H (2005) Information dissemination in self-organizing intervehicle networks. IEEE Trans Intell Transp 6(1):90–101Yavatkar R (1992) MCP: a protocol for coordination and temporal synchronization in multimedia collaborative applications. In: 12th Intnl Conf on Distrib Comput Syst (ICDCS), Yokohama, pp 606–613Yen LH, Huang TL, Hwang SY (1997) A protocol for causally ordered message delivery in mobile computing systems. Mobile Netw Appl 2(4):365–372Zawirski M, Preguiça N, Duarte S, Bieniusa A, Balegas V, Shapiro M (2015) Write fast, read in the past: causal consistency for client-side applications. In: 16th Intnl Middleware Conf, VancouverZhou S, Cai W, Turner SJ, Lee BS, Wei J (2007) Critical causal order of events in distributed virtual environments. ACM Trans Mult Comp Commun Appl 3(3):1

    A study of the communication cost of the FFT on torus multicomputers

    Get PDF
    The computation of a one-dimensional FFT on a c-dimensional torus multicomputer is analyzed. Different approaches are proposed which differ in the way they use the interconnection network. The first approach is based on the multidimensional index mapping technique for the FFT computation. The second approach starts from a hypercube algorithm and then embeds the hypercube onto the torus. The third approach reduces the communication cost of the hypercube algorithm by pipelining the communication operations. A novel methodology to pipeline the communication operations on a torus is proposed. Analytical models are presented to compare the different approaches. This comparison study shows that the best approach depends on the number of dimensions of the torus and the communication start-up and transfer times. The analytical models allow us to select the most efficient approach for the available machine.Peer ReviewedPostprint (published version

    Distributed-Memory Breadth-First Search on Massive Graphs

    Full text link
    This chapter studies the problem of traversing large graphs using the breadth-first search order on distributed-memory supercomputers. We consider both the traditional level-synchronous top-down algorithm as well as the recently discovered direction optimizing algorithm. We analyze the performance and scalability trade-offs in using different local data structures such as CSR and DCSC, enabling in-node multithreading, and graph decompositions such as 1D and 2D decomposition.Comment: arXiv admin note: text overlap with arXiv:1104.451

    Control versus Data Flow in Parallel Database Machines

    Get PDF
    The execution of a query in a parallel database machine can be controlled in either a control flow way, or in a data flow way. In the former case a single system node controls the entire query execution. In the latter case the processes that execute the query, although possibly running on different nodes of the system, trigger each other. Lately, many database research projects focus on data flow control since it should enhance response times and throughput. The authors study control versus data flow with regard to controlling the execution of database queries. An analytical model is used to compare control and data flow in order to gain insights into the question which mechanism is better under which circumstances. Also, some systems using data flow techniques are described, and the authors investigate to which degree they are really data flow. The results show that for particular types of queries data flow is very attractive, since it reduces the number of control messages and balances these messages over the node

    Distributed Bayesian Matrix Factorization with Limited Communication

    Full text link
    Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank representations of matrices and for predicting missing values and providing confidence intervals. Scaling up the posterior inference for massive-scale matrices is challenging and requires distributing both data and computation over many workers, making communication the main computational bottleneck. Embarrassingly parallel inference would remove the communication needed, by using completely independent computations on different data subsets, but it suffers from the inherent unidentifiability of BMF solutions. We introduce a hierarchical decomposition of the joint posterior distribution, which couples the subset inferences, allowing for embarrassingly parallel computations in a sequence of at most three stages. Using an efficient approximate implementation, we show improvements empirically on both real and simulated data. Our distributed approach is able to achieve a speed-up of almost an order of magnitude over the full posterior, with a negligible effect on predictive accuracy. Our method outperforms state-of-the-art embarrassingly parallel MCMC methods in accuracy, and achieves results competitive to other available distributed and parallel implementations of BMF.Comment: 28 pages, 8 figures. The paper is published in Machine Learning journal. An implementation of the method is is available in SMURFF software on github (bmfpp branch): https://github.com/ExaScience/smurf
    corecore