Abstract
We demonstrate silicon photonic (SiP) switch-enabled server regrouping using bandwidth steering to improve the performance of distributed deep learning training on a fat-tree testbed. Our proposed SiP switch control scheme enables scaling to large datacenter and HPC systems.
© 2021 The Author(s)