Optica Publishing Group

Reliable and efficient RAR-based distributed model training in computing power network

Abstract

The computing power network (CPN) is a novel network technology that integrates computing power from the cloud, edge, and terminals using IP/optical cross-layer networks for distributed computing. CPNs can provide an effective solution for distributed model training (DMT). As a bandwidth-optimized architecture based on data parallelism, ring all-reduce (RAR) is widely used in DMT. However, any node or link failure on the ring can interrupt or block the requests deployed on it. Meanwhile, because batches of RAR-based DMT requests compete for resources, inappropriate scheduling strategies also lead to low training efficiency or congestion. To our knowledge, no existing research considers the survivability of rings in scheduling strategies for RAR-based DMT. To fill this gap, we propose a scheduling scheme for RAR-based DMT requests in CPNs that optimizes the allocation of computing and wavelength resources over the time dimension while ensuring reliability. In practical scenarios, service providers may focus on different performance metrics. We therefore formulate an integer linear programming (ILP) model and propose a RAR-based DMT deployment algorithm (RDDA) to solve this problem under four optimization objectives, each on the premise of the minimum blocking rate: minimum computing resource consumption, minimum wavelength resource consumption, minimum training time, and maximum reliability. Simulation results demonstrate that our model satisfies the reliability requirements while achieving the corresponding optimal performance for DMT requests under each of the four optimization objectives.
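
To illustrate why a ring is sensitive to failures, the following is a minimal Python sketch of the generic ring all-reduce pattern (reduce-scatter followed by all-gather), not the paper's ILP model or RDDA. The function names `ring_all_reduce` and `ring_availability` are hypothetical. The availability formula is an assumption: it treats the ring as a series system with independent node and link failures, which the abstract does not specify.

```python
import math


def ring_all_reduce(vectors):
    """Simulate a sum all-reduce over n workers arranged in a ring.

    vectors[w] is worker w's local gradient. Assumes every vector has the
    same length and that the length is divisible by n (sketch only).
    """
    n = len(vectors)
    length = len(vectors[0])
    assert length % n == 0, "sketch assumes the vector splits into n equal chunks"
    size = length // n
    # data[w][c] is worker w's current copy of chunk c.
    data = [[list(v[c * size:(c + 1) * size]) for c in range(n)] for v in vectors]

    # Phase 1, reduce-scatter: in step s, worker w sends chunk (w - s) mod n
    # to its successor, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        sends = [(w, (w - s) % n, data[w][(w - s) % n][:]) for w in range(n)]
        for w, c, payload in sends:
            dst = (w + 1) % n
            data[dst][c] = [a + b for a, b in zip(data[dst][c], payload)]

    # Worker w now holds the fully reduced chunk (w + 1) mod n.
    # Phase 2, all-gather: in step s, worker w forwards chunk (w + 1 - s) mod n
    # to its successor, which overwrites its copy.
    for s in range(n - 1):
        sends = [(w, (w + 1 - s) % n, data[w][(w + 1 - s) % n][:]) for w in range(n)]
        for w, c, payload in sends:
            data[(w + 1) % n][c] = payload

    return [[x for chunk in worker for x in chunk] for worker in data]


def ring_availability(node_avail, link_avail):
    """Assumed series model: the ring works only if every node and link works."""
    return math.prod(node_avail) * math.prod(link_avail)


result = ring_all_reduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)  # every worker ends with [12, 15, 18]
print(ring_availability([0.99] * 3, [0.999] * 3))
```

Every step of both phases uses every link of the ring, so a single failed node or link stalls the whole exchange. Under the assumed series model, the ring's availability also shrinks as more nodes and links are added.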

© 2024 Optica Publishing Group


Figures (16)


Tables (3)


Equations (42)
