47 research outputs found
VLAN-based Minimal Paths in PC Cluster with Ethernet on Mesh and Torus
Abstract: In a PC cluster with Ethernet, well-distribute…
Throttling Control for Bufferless Routing in On-Chip Networks
As the number of cores integrated on a single die grows, buffers consume significant energy and occupy chip area. Bufferless deflection routing, which eliminates a router's input port buffers, can considerably help save energy and chip area while providing performance similar to that of existing buffered routing, especially for low-to-medium network loads. However, when congestion increases, bufferless routing frequently causes flit deflections and misrouting, leading to degraded network performance. In this paper, we propose IRT (Injection Rate Throttling), a local throttling mechanism that reduces deflection and misrouting in high-load bufferless networks. IRT controls the injection rate independently at each network node, reducing network congestion. Our results from a cycle-accurate simulator show that IRT reduces average transmission latency by 8.65% compared to traditional bufferless routing.
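The local throttling idea above can be illustrated with a small sketch: each node tracks its recent deflection rate and, once per control interval, lowers or raises its own injection probability. The class name, thresholds, and step sizes below are illustrative assumptions, not details from the paper.

```python
import random

class ThrottledNode:
    """Hypothetical sketch of IRT-style per-node injection rate throttling.

    Each node monitors its recent deflection rate and lowers its injection
    probability when deflections exceed a threshold; all parameters here
    are illustrative assumptions."""

    def __init__(self, threshold=0.3, min_rate=0.1, step=0.05):
        self.threshold = threshold    # deflection rate that triggers throttling
        self.min_rate = min_rate      # floor on injection probability
        self.step = step              # adjustment per control interval
        self.injection_rate = 1.0     # start unthrottled
        self.deflections = 0
        self.routed = 0

    def record_flit(self, deflected):
        """Account for one routed flit, deflected or not."""
        self.routed += 1
        self.deflections += int(deflected)

    def update(self):
        """Run once per control interval, independently at each node."""
        if self.routed == 0:
            return
        rate = self.deflections / self.routed
        if rate > self.threshold:
            # congested: back off, but never below the floor
            self.injection_rate = max(self.min_rate,
                                      self.injection_rate - self.step)
        else:
            # calm: recover toward full injection
            self.injection_rate = min(1.0, self.injection_rate + self.step)
        self.deflections = self.routed = 0

    def may_inject(self, rng=random):
        """Probabilistically gate a new flit injection."""
        return rng.random() < self.injection_rate
```

Because each node acts only on locally observed deflections, no global coordination is needed, which matches the paper's point that throttling is controlled independently per node.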
Cabinet Layout Optimization of Supercomputer Topologies for Shorter Cable Length
Abstract—As the scales of supercomputers increase, total cable length becomes enormous, e.g., up to thousands of kilometers. Recent high-radix switches with dozens of ports make switch layout and system packaging more complex. In this study, we optimize the physical layout of switch topologies on a machine room floor with the goal of reducing cable length. For a given topology, we use graph clustering algorithms to group switches logically into cabinets so that the number of inter-cabinet cables is small. Then, we map the cabinets onto the physical floor space so as to minimize total cable length, by modeling and solving the mapping as a facility location problem. Our evaluation results show that, compared to standard clustering/mapping approaches and for popular network topologies, our clustering approach can reduce the number of inter-cabinet cables by up to 40.3% and our mapping approach can reduce the inter-cabinet cable length by up to 39.6%. Index Terms—Topology, cabinet layout, interconnection networks, high performance computing, high-radix switches
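The first stage described above, grouping switches into cabinets so that few cables cross cabinet boundaries, can be sketched with a simple greedy heuristic: grow each cabinet by repeatedly pulling in the unassigned switch with the most links into it. This is a stand-in for the graph clustering algorithms the paper uses; the function names and the heuristic itself are assumptions.

```python
from collections import defaultdict

def cluster_switches(edges, cabinet_capacity):
    """Greedily group switches into cabinets of bounded size.

    Illustrative sketch: each cabinet grows by adding the unassigned switch
    with the most links into the current cabinet, so inter-cabinet cables
    tend to be few. Not the paper's actual clustering algorithm."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    unassigned = set(adj)
    cabinets = []
    while unassigned:
        seed = min(unassigned)            # deterministic seed choice
        cabinet = {seed}
        unassigned.remove(seed)
        while len(cabinet) < cabinet_capacity and unassigned:
            # outside switch with the most links into the cabinet
            best = max(sorted(unassigned),
                       key=lambda s: len(adj[s] & cabinet))
            if not adj[best] & cabinet:
                break                     # no connected candidate remains
            cabinet.add(best)
            unassigned.remove(best)
        cabinets.append(cabinet)
    return cabinets

def inter_cabinet_cables(edges, cabinets):
    """Count cables whose two endpoints land in different cabinets."""
    where = {s: i for i, cab in enumerate(cabinets) for s in cab}
    return sum(1 for u, v in edges if where[u] != where[v])
```

For example, two 4-switch rings joined by a single cable cluster into two cabinets of four, leaving exactly one inter-cabinet cable; the second-stage mapping onto floor positions (the facility location step) would then place those two cabinets adjacently.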
A Case for Offloading Federated Learning Server on Smart NIC
Federated learning is a distributed machine learning approach in which weight
parameters trained locally by clients are aggregated into global parameters
by a server. The global parameters can be trained without uploading
privacy-sensitive raw data owned by the clients to the server. The
aggregation on the server is simply an averaging of the local weight
parameters, so it is an I/O-intensive task in which network processing
accounts for a large portion of the load compared to computation. The network
processing workload further increases as the number of clients increases. To
mitigate this workload, in this paper the federated learning server is
offloaded to an NVIDIA BlueField-2 DPU, a smart NIC (Network Interface Card)
with eight processing cores. Dedicated processing cores are assigned via DPDK
(Data Plane Development Kit) for receiving the local weight parameters and
sending the global parameters. The aggregation task is parallelized by
exploiting the multiple cores available on the DPU. To further improve
performance, an approximate design that eliminates exclusive access control
between the computation threads is also implemented. Evaluation results show
that the federated learning server on the DPU shortens execution time by 1.32
times compared with the host CPU, with negligible accuracy loss.
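The server-side aggregation described above, an element-wise average of the clients' local weights, can be sketched as follows. Here the parameter vector is split by index across threads, so each thread owns a disjoint slice and no locking is needed; this partitioning scheme is an assumption used for illustration, not necessarily how the DPU implementation divides the work.

```python
import threading

def aggregate(client_weights, num_threads=8):
    """Sketch of the federated learning server's aggregation step.

    Global parameters are the element-wise average of the clients' local
    weight vectors. Work is split across threads by parameter index,
    mirroring how the DPU's eight cores could divide the vector; the
    partitioning scheme is an illustrative assumption."""
    n_params = len(client_weights[0])
    n_clients = len(client_weights)
    global_weights = [0.0] * n_params

    def worker(start, stop):
        # each thread writes only its own slice: no shared-index conflicts
        for i in range(start, stop):
            global_weights[i] = sum(w[i] for w in client_weights) / n_clients

    chunk = (n_params + num_threads - 1) // num_threads
    threads = [threading.Thread(target=worker,
                                args=(t * chunk,
                                      min((t + 1) * chunk, n_params)))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return global_weights
```

Since disjoint index ranges never collide, this layout avoids exclusive access control between computation threads entirely, which is in the same spirit as the lock-free approximate design the abstract mentions.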