Fast control plane for flexible and scalable optical interconnects

Yunfeng Lu; Huaxi Gu; Xiaoshan Yu; Peng Li

doi:10.1364/OE.445950

1. Introduction

Interconnection networks are among the most critical components of data centers and high-performance computing systems, and their influence on application performance grows as system scale expands [1,2]. With tens of thousands of network devices, an effective interconnection network can not only exploit the computing power of the system but also reduce equipment cost and power consumption [3,4]. Although traditional electrical networks such as fat-tree can achieve efficient communication between any pair of nodes, they must provide uniformly high bandwidth between servers to handle the worst-case communication pattern, which wastes network resources [5,6]. Moreover, the communication of many applications in data centers and high-performance computing is concentrated among a small number of servers. A static network therefore often uses links inefficiently, with some links idle and others overloaded. Fully utilizing the idle links requires global or adaptive routing, but this can lengthen communication latency and even affect data exchange between other servers [7,8].

Optical interconnects can make the resource allocation in the network more flexible, while avoiding the high cost and power consumption caused by overprovisioned bandwidth [9,10]. In addition, optical switches eliminate the need for transceivers to convert between light and electricity [11]. The rate-independent feature of optical switches also helps save the cost of data center upgrades. With optical switching devices such as micro-electromechanical systems (MEMS) [12,13], arrayed waveguide grating routers (AWGRs) [14,15], and wavelength selective switches (WSS) [16,17], the network topology and communication bandwidth in the system can be adjusted by changing the connections between ports and the number of wavelengths in each link. However, because optical switching devices transmit transparently, they do not process network packets, so a control plane is needed to quickly collect traffic information and calculate the appropriate topology. The less time it takes to collect information, the more accurate the traffic matrix will be; otherwise, the communication requirements received by the controller may be outdated. The topology calculation time also determines how quickly the network responds to traffic fluctuations, so reducing the calculation overhead makes the network more agile. In the face of an increasingly complex network environment and higher communication requirements, the control plane of optical interconnects needs to deal with traffic changes faster.

Many reconfigurable optical networks have been proposed, but their topology calculation algorithms increase the complexity of the control plane, especially in large-scale networks. For example, the Helios architecture transforms the topology calculation into a series of maximum weighted matching problems after receiving the traffic matrix [18]. Each matching problem corresponds to the configuration of one optical switch, and the traffic served by the previous matching is subtracted from the traffic matrix before it is passed to the next problem as input. Helios uses the Edmonds algorithm to solve this problem, which costs 300 µs on a 24-node experimental platform. The results showed that the control process required 264.9 ms, most of which was caused by traffic collection (77.4 ms) and optical switch configuration (168.4 ms). In Optical Switching Architecture (OSA), link allocation is regarded as a weighted b-matching problem, and the distance between Top-of-Racks (ToRs) with high-volume traffic can be shortened by reconfiguring MEMS switches [19]. Weighted b-matching is a graph-theoretic problem, where b is the number of ToRs that each ToR can communicate with at the same time. The control plane latency on the 32-node experimental platform is 290 ms, of which topology calculation and traffic demand assessment occupy 48 ms and 161 ms, respectively. HydRA uses a heuristic topology configuration algorithm, which sorts the communication requirements and checks them round by round [20]. In the i-th round, the algorithm establishes i-hop connections until all requirements are met, the optical ports are used up, or i exceeds the set threshold. The reconfiguration time on the 40-node experimental platform is as long as 1.2 s, mainly because of the overhead of marking the corresponding VLAN ID on the ports of the ToR switches. Although some studies have used customized faster optical switches, their port counts are small and they are still a long way from large-scale use.

In this work, we further reduce the overhead of the control plane and build a small-scale testbed to evaluate the acceleration effect of optical network flexibility on applications. The configuration problem of multiple MEMS switches is transformed into a combination of several fixed MEMS configurations to meet the changing traffic demand in the network. MEMS switches can sacrifice flexibility by implementing only part of the configurations, which can effectively increase the number of ports while reducing configuration time. When the network is reconfigured, we use the Open Shortest Path First (OSPF) protocol to sense topology changes and select the optimal communication path. This method can not only reduce the burden on the control plane, but also contribute to the wide application of optical switching technology in traditional electrical networks. We have implemented several applications on different network configurations, and experimental results show that a proper reconfiguration cycle can reduce the completion time of the 3-D Fast Fourier Transform application by up to 53%.

In section 2, we introduce the components of the network architecture and their functions. In section 3, we describe the specific configuration of the testbed. Section 4 shows the performance of the testbed in different scenarios, and section 5 discusses possible improvements to the scheme in this paper and ideas for subsequent research. Section 6 concludes this work.

2. Network Architecture

The network architecture proposed in this paper includes a control plane and a data plane, as shown in Fig. 1. All devices belong to the data plane except the controller, which belongs to the control plane. The controller at the top layer is responsible for managing the behavior and state of all devices in the network. Optical circuit switches at the middle layer are periodically reconfigured to change the network topology, while electrical packet switches are used to forward control information and indirect traffic. Servers at the bottom layer are connected to the optical circuit switches for data exchange and to the electrical packet switch to notify the controller of their communication requirements. Servers are connected to optical and electrical switches through ToRs, which effectively reduces the number of ports and the wiring complexity, and improves the scalability of the network. ToRs are not shown separately in Fig. 1 because they are also electrical switches.

Fig. 1. Network architecture includes data plane and control plane.

2.1 Servers

Servers provide users with services such as computing and storage. For large-scale computing tasks or data migration, collaboration between servers is inevitable. Traffic information reflects the communication requirements between servers and plays an important role in network configuration, because it helps the optical switches pair their ports appropriately. The method of collecting traffic information in this paper is similar to that used in c-Through [21]. Each server periodically retrieves the destination addresses and the corresponding TCP send buffer sizes using the netstat command, and then submits them to the controller in UDP packets, which are faster, simpler, and more efficient than TCP packets [22]. When the controller does not receive the traffic information from a server, it actively queries the server. If there is still no reply within a round-trip time, the controller reuses the information obtained last time. A server that fails to respond many times in a row is considered to have failed, and no more optical links are allocated to it. Servers are synchronized through the precision time protocol [23], with sub-microsecond accuracy; without this synchronization, servers could not send traffic information to the controller periodically and in step with one another.
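The per-server collection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Linux `netstat -tn` style output (the paper names only the netstat command), and the JSON payload layout, the `server` and `demand` field names, and the function names are hypothetical.

```python
import json
import socket
from collections import defaultdict


def parse_send_queues(netstat_output):
    """Sum the Send-Q column of `netstat -tn` style output per destination IP."""
    demand = defaultdict(int)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        send_q = int(fields[2])
        dest_ip = fields[4].rsplit(":", 1)[0]  # drop the port number
        if send_q > 0:
            demand[dest_ip] += send_q
    return dict(demand)


def send_report(demand, controller_addr, server_id):
    """Send one traffic report as a single UDP datagram (hypothetical format)."""
    payload = json.dumps({"server": server_id, "demand": demand}).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, controller_addr)


# Sample text standing in for real command output.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0  52344 10.0.0.1:5001           10.0.0.2:44321          ESTABLISHED
tcp        0      0 10.0.0.1:22             10.0.0.9:51000          ESTABLISHED
tcp        0   1000 10.0.0.1:5002           10.0.0.2:44322          ESTABLISHED
tcp        0    700 10.0.0.1:5003           10.0.0.3:40000          ESTABLISHED
"""

print(parse_send_queues(SAMPLE))  # {'10.0.0.2': 53344, '10.0.0.3': 700}
```

Aggregating by destination keeps each report small enough for one datagram in a small network; a lost datagram is covered by the controller-side query and fallback described in the text.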

Because the optical link is disconnected during optical switch reconfiguration, the server can migrate the data to be transmitted on the optical link to the electrical network before the reconfiguration. Otherwise, even if the reconfiguration time is short, a large amount of data would be lost under high-rate transmission. After the optical switch is configured, the optical link is again preferred for communication.

2.2 Switches

The optical circuit switch can redirect the light from an input port to any output port by controlling the rotation angle of a micromirror, establishing a continuous optical link between the two ports. However, because transmission is transparent, the configuration of optical circuit switches depends on external instructions. Therefore, the controller is connected to the optical circuit switch through a 1 Gb/s electrical link to send configuration instructions. The MEMS-based Polatis optical switch provides Standard Commands for Programmable Instruments (SCPI) to pair any two ports and to view the connection status [24]. MEMS-based optical circuit switches face a dilemma, in that fast configuration and a large port count are not compatible, but the design of RotorNet resolves it by reducing the flexibility of the MEMS switch [25]. The more configurations a switch must implement, the more angles the mirror needs to rotate through, and the longer the reconfiguration takes. If an optical switch only needs to match certain port pairs rather than arbitrary ones, it can have thousands of ports and be reconfigured in tens of microseconds. In addition, the Polatis optical switch can store several fixed configurations in advance. The controller selects a configuration according to the communication requirements between the servers and then sends the selection to the optical switch through a TCP socket. Since the Polatis optical switch only provides a manual configuration interface and a TCP communication interface, we use the latter to realize the automatic configuration of the optical switch. This approach simplifies the commands sent by the controller to the optical switch and saves a lot of overhead compared to sending commands for each pair of ports.

Electrical packet switches are mainly responsible for carrying traffic information and control commands between the servers and the controller. During reconfiguration, traffic on the optical links can be routed through the electrical network to avoid communication interruptions. The network can also be kept in service by configuring the MEMS switches alternately rather than all at once. When the number of optical switches is limited or there are different quality-of-service requirements, the electrical network can also undertake part of the communication services [26,27]. Because the OSPF protocol is deployed, ToRs built on Layer 3 switches can quickly perceive changes in the network topology and complete the migration of data between the optical and electrical networks.

2.3 Controller

To allocate network resources efficiently, a C++ application was developed to act as a centralized controller on a dedicated server. As the management center, the controller configures the optical circuit switches to meet the traffic demands between the servers. Periodically, the controller sorts the traffic information of the N servers or ToRs into a traffic matrix according to the source and destination addresses, where N is an even number. The N×N traffic matrix is then split into N/2 sub-traffic matrices, all symmetric about the diagonal, because this perfectly matches the pairwise configuration characteristic of MEMS switches. For a system with a given N, the Polatis optical switch needs to store N-1 configurations in advance. Since each ToR connects to all MEMS switches, the controller randomly assigns the calculated configurations to the MEMS switches. In Fig. 2, the horizontal and vertical coordinates of the traffic matrix respectively denote the source and destination servers or racks. The color blocks in the traffic matrix indicate that the corresponding source and destination servers or racks have communication requirements, and the Roman numerals in the color blocks correspond to the MEMS configurations that can meet their requirements. As can be seen from Fig. 2, the combination of these configurations can cover all communication possibilities, avoiding traffic accumulation or detours as much as possible. With sufficient MEMS switches, the whole combination of configurations can be used at the same time; with a limited number of MEMS switches, the combination can instead be cycled through over a period of time to meet the communication requirements between servers. To save network cost and make full use of the electrical network, a limited number of MEMS switches are preferentially allocated to servers with higher communication requirements, while servers with less communication use electrical packet switches to forward data.
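One way to obtain N-1 fixed pairwise configurations that together cover every source-destination pair is the classic circle (round-robin) method. The sketch below assumes that construction; the paper does not say how its stored configurations are generated, and the function names are illustrative. It also shows how the traffic that a configuration would carry directly can be read off a traffic matrix.

```python
def pairing_configurations(n):
    """Return n-1 perfect matchings of ports 0..n-1 (circle method, n even)."""
    if n < 2 or n % 2:
        raise ValueError("n must be an even number >= 2")
    ring = list(range(1, n))           # port 0 stays fixed, the rest rotate
    configs = []
    for _ in range(n - 1):
        order = [0] + ring
        configs.append([tuple(sorted((order[i], order[n - 1 - i])))
                        for i in range(n // 2)])
        ring = ring[-1:] + ring[:-1]   # rotate by one position
    return configs


def config_volume(traffic, config):
    """Traffic, in both directions, served directly by one configuration."""
    return sum(traffic[a][b] + traffic[b][a] for a, b in config)


configs = pairing_configurations(4)
# [[(0, 3), (1, 2)], [(0, 2), (1, 3)], [(0, 1), (2, 3)]]

traffic = [[0, 5, 0, 1],
           [5, 0, 2, 0],
           [0, 2, 0, 7],
           [1, 0, 7, 0]]
volumes = [config_volume(traffic, c) for c in configs]
print(volumes)  # [6, 0, 24]
```

Because every pair appears in exactly one configuration, the per-configuration volumes sum to the total off-diagonal traffic, which makes them directly usable as proportions.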

Fig. 2. Correspondence between traffic matrix and MEMS configuration.

For flexibility and scalability, MEMS switches implement only a few fixed configurations, as shown in Fig. 2. The number of MEMS switches assigned to each configuration (N_c) is calculated according to the proportion of its sub-traffic to the total traffic, both of which are computed by Algorithm 1. The value of N_c is the product of this proportion and the total number of MEMS switches, rounded to an integer. The allocation is carried out in descending order of proportion. When the number of MEMS switches is small, the configuration with the smallest proportion is allocated to the electrical switches. The proportion of traffic routed to the electrical switched network depends on the number of optical and electrical switches. These configurations are then sent to each optical switch by the controller, and the servers or ToRs are notified to make corresponding adjustments at the same time. It is worth noting that the controller performs the above operations only when the new configuration differs from the current one; otherwise the network maintains its original working state. This algorithm has low overhead and is suitable for large-scale networks.
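The allocation rule in the paragraph above can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of Algorithm 1: round-half-up rounding is assumed (the paper says only "rounded"), switches left over after rounding stay unassigned, and the function names are hypothetical.

```python
import math


def allocate_switches(volumes, num_switches):
    """Map configuration index -> number of MEMS switches assigned to it.

    Configurations absent from the result are left to the electrical network.
    """
    total = sum(volumes)
    allocation = {}
    if total == 0:
        return allocation
    remaining = num_switches
    # Visit configurations from the largest traffic share to the smallest.
    order = sorted(range(len(volumes)), key=lambda i: volumes[i], reverse=True)
    for idx in order:
        if remaining == 0:
            break
        share = volumes[idx] / total
        # Assumed rounding rule: round half up.
        count = min(remaining, math.floor(share * num_switches + 0.5))
        if count > 0:
            allocation[idx] = count
            remaining -= count
    return allocation


def needs_reconfiguration(current, proposed):
    """Reconfigure only when the proposed allocation differs from the current one."""
    return current != proposed


current = allocate_switches([60, 30, 10], 2)   # {0: 1, 1: 1}
proposed = allocate_switches([90, 5, 5], 2)    # {0: 2}
print(needs_reconfiguration(current, proposed))  # True
```

In the first call the configuration carrying 10% of the traffic receives no switch, so its traffic would fall back to the electrical network, matching the behavior the text describes for a small number of MEMS switches.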

[Algorithm 1: image oe-30-3-3316-i001]

3. Testbed

We have built an experimental platform to prove the feasibility of introducing optical switching technology into traditional electrical networks, as shown in Fig. 3. Our prototype has four servers, each with two 10-core Intel Xeon E5-2630 2.20 GHz processors, 32 GB of DDR4-2133 RAM, one Intel I350 dual-port GigE NIC, and one Intel 82599ES dual-port 10 GigE NIC. The server runs as a virtual rack with two virtual machines and one virtual ToR. Each virtual machine acts as an end-host and is connected to optical and electrical switches through virtual ToR. The 32-port Polatis optical circuit switch is partitioned into multiple virtual 8-port optical switches to provide the capability of network topology reconfiguration with the help of the controller. Each server connects to optical circuit switch through two 10 Gb/s SFP+ transceivers, and the wavelength used in the test is 1550 nm. The electrical packet switch connects the servers and the controller through 1G Ethernet to transmit the traffic and control information.

Fig. 3. (a) Network topology and (b) prototype.

To become aware of changes in the network topology in a timely manner and to forward indirect traffic, we deployed the OSPF protocol in the virtual ToRs. In an OSPF network, routers frequently exchange link-state information, so all routers eventually obtain the topology of the entire network. Compared with the routing information protocol (RIP), OSPF is more suitable for large-scale heterogeneous networks [28].

As shown in Fig. 4, the combination of MEMS configurations in this experimental platform can build three network topologies. For ease of observation, the server and electrical switches are omitted in the figure. It can be found that switching between any two of the three network topologies only needs to change one of the MEMS switches, which also avoids the overhead caused by all data migration to the electrical network. We apply the electrical link in different scenarios by modifying its cost value in OSPF [29]. Scenario one (S1) is to use electrical link when the number of communication hops is greater than one, and the optical network is only responsible for the one-hop reachable transmission between ToRs. The other (S2) is to use the electrical switch to forward data when the number of communication hops is greater than two, and the optical network is only responsible for the communication that is reachable within two hops between ToRs. The similarity between the two scenarios is that during the reconfiguration of the optical network, data is no longer transmitted on the failed optical link, but is transmitted through the electrical network. In the scenario of fixed network topology, the optical switch performs all data forwarding tasks because there is no network reconfiguration.
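The effect of the electrical link cost on path selection in S1 and S2 can be illustrated with a small shortest-path computation, since OSPF routes along the minimum total cost. Everything numeric here is an assumption made for illustration: the optical cost of 10, the electrical costs of 8 and 13 per ToR-to-switch link, the ring of four ToRs, and the modeling of the electrical switch as a routed node `E`. The paper does not report its actual cost values.

```python
import heapq

OPTICAL_COST = 10  # assumed cost of one optical hop


def build_links(optical_pairs, electrical_cost, tors=("T0", "T1", "T2", "T3")):
    """Optical ToR-ToR links plus one electrical link from each ToR to switch E."""
    links = {pair: OPTICAL_COST for pair in optical_pairs}
    for tor in tors:
        links[(tor, "E")] = electrical_cost
    return links


def shortest_path(links, src, dst):
    """Dijkstra over undirected links given as {(u, v): cost}."""
    graph = {}
    for (u, v), cost in links.items():
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))
    best = {src: 0}
    queue = [(0, src, [src])]
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == dst:
            return dist, path
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, cost in graph.get(node, []):
            cand = dist + cost
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                heapq.heappush(queue, (cand, nxt, path + [nxt]))
    return None


RING = [("T0", "T1"), ("T1", "T3"), ("T3", "T2"), ("T2", "T0")]

# S1: electrical path (2 x 8 = 16) beats two optical hops (20) but not one (10).
print(shortest_path(build_links(RING, 8), "T0", "T3"))   # (16, ['T0', 'E', 'T3'])
# S2: electrical path (2 x 13 = 26) loses to two optical hops (20).
print(shortest_path(build_links(RING, 13), "T0", "T3"))
# During reconfiguration: with one MEMS switch's links down, traffic falls back to E.
print(shortest_path(build_links([("T0", "T1"), ("T3", "T2")], 13), "T0", "T3"))
```

Placing the electrical path cost between one and two optical hops gives S1 behavior, and placing it between two and three optical hops gives S2 behavior. In both cases a failed optical link shifts traffic to the electrical network without any controller involvement.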

Fig. 4. (a), (b) and (c) are the network topologies under different MEMS configurations.

4. Experimental results

With the help of a fast and efficient control plane, the flexibility of the optical network can be fully utilized. We select (a) in Fig. 4 as the fixed topology and compare the performance of several Message Passing Interface (MPI) parallel applications on fixed and reconfigurable networks. The Numerical Aerodynamic Simulation (NAS) parallel benchmarks executed on the testbed consist of five kernels and three pseudo-applications, which are based on several large aviation science applications used on supercomputers at NASA Ames Research Center [30]. We selected three of the NAS parallel benchmarks to run on the experimental platform, namely Multigrid (MG), Conjugate Gradient (CG), and 3-D fast Fourier Transform (FT). MG includes regular short-distance and long-distance data communication tests. CG uses unstructured matrix vector multiplication, which tests irregular long-distance communications. FT tests the performance of the network under all-to-all communication. The problem sizes of these benchmarks are predefined and represented by classes A to D, with later letters indicating larger problems. In the results below, CG.C refers to class C of CG, and the problem size of CG.C is smaller than that of CG.D.

The application completion time in different scenarios is shown in Fig. 5. For this and subsequent figures, we execute the above applications, and show 95% confidence intervals from 15 trials for different problem sizes. The network is reconfigured every 300 seconds in both of the scenarios (S1 and S2) mentioned in section 3. The experimental results show that the flexibility of the network does not significantly reduce the completion time when the amount of data is small, and even brings more overhead. Most of the applications in Fig. 5(a) are only configured once at the beginning, because their completion time is less than 300 seconds. The completion time of CG and MG in scenario 2 is similar to that of the fixed network. This is because the traffic patterns of CG and MG are relatively stable, and the topology selected during network initialization is the same as that of the fixed network. Communication in FT is more random, so the initial configuration of the network is different from the topology of the fixed network, which also leads to more communication overhead. The one-time configuration in the network is easily affected by the randomness of traffic. In Fig. 6(a), the increase in the number of reconfigurations improves the completion time of the FT application, which is similar to that of the fixed network. Each application in Fig. 5(a) takes the longest to complete in scenario 1, because the bandwidth of the electrical link is smaller than that of the optical link.
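The paper does not state how the 95% confidence intervals are computed. A common choice for a small number of trials, assumed here rather than taken from the source, is the Student t interval on the sample mean. For n completion times x_1, ..., x_n:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad
\mathrm{CI}_{95\%} = \bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
```

With n = 15 trials, t_{0.975,14} ≈ 2.145, so the half-width of the interval is about 0.554 s for every second of sample standard deviation.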

Fig. 5. (a) and (b) are the completion time of each application under different problem sizes.

Fig. 6. (a) and (b) are the completion time of each application under different problem sizes and different reconfiguration frequencies.

As the size of the problem grows, the flexibility of the optical interconnects comes into play. The results in Fig. 5(b) show that the completion time of CG and MG in scenario 2 is reduced by 10% and 18%, respectively, compared to the fixed network. FT benefited the most in these applications, with a reduction in completion time of about 36%. The increase in traffic leads to better bandwidth utilization, and the increase in the number of reconfigurations reduces traffic forwarding. In scenario 1, even though the bandwidth of the electrical link is small, the benefits from the flexibility of optical interconnects reduce the completion time of each application to varying degrees.

The expansion of the problem size enables the advantage of network flexibility to be reflected because the increase in application completion time changes the number of network reconfigurations. Therefore, we also studied the impact of reconfiguration frequency on application completion time in scenario 2. Figure 6 gives the application completion time under different problem sizes and different reconfiguration frequencies. The number after R- in the figure indicates the reconfiguration cycle. The completion time of the CG in Fig. 6(a) is similar under the reconfiguration cycle of 30 and 100 seconds, as well as 50 and 150 seconds. This may be related to the point in time at which the reconfiguration is triggered, since no reconfiguration will occur in our scheme if the topology calculation is the same as the last one. This also means that reconfiguration cycles with trigger times close to each other may lead to similar completion time, and we will continue to study in future work. The completion time of MG and FT are similar for several reconfiguration cycles in Fig. 6(a). When the size of the problem is small, the impact of the reconfiguration cycle on the completion time of MG and FT is not obvious.

Figure 6(b) shows the effect of reconfiguration frequency in the case of a large amount of data. It is worth noting that the optimal reconfiguration cycle is not the same for different applications. The completion time of CG is the smallest when the reconfiguration cycle is 200 seconds, which is 40% lower than that of the fixed network in Fig. 5(b). Both MG and FT have the best performance when the reconfiguration cycle is 150 seconds, which is 28% and 53% lower than that of the fixed network, respectively. A shorter reconfiguration cycle does not further reduce the completion time, which suggests that other factors are involved. Both the traffic characteristics of the application and the triggering method of reconfiguration are worthy of further study. For reconfiguration in a fixed cycle, the result of each topology calculation depends on the traffic characteristics in a short period of time before the reconfiguration. If a traffic burst occurs just before the reconfiguration is triggered, the result of the topology calculation may be distorted. Although frequent reconfiguration can make the network respond to traffic changes more quickly and improve network performance, such benefits depend heavily on the accuracy of the collected traffic data. However, this does not negate the advantages brought by the flexibility of the network. According to the overall trend of the experimental results, reconfiguration effectively reduces the application completion time. In the next section, we discuss the method of traffic collection and the triggering method of reconfiguration.

In general, the flexibility of optical interconnects has shown a good acceleration effect when running applications that solve large-scale problems. However, the acceleration effect is not the same for different reconfiguration cycles. The process of reconfiguration involves traffic collection, topology calculation, and optical switch configuration, all of which have an influence on application performance. Combined with the description of other experimental platforms in Section 1, we compared and analyzed them with our experimental platform, as shown in Table 1. In our experiment, we measured the cost of traffic collection to be 21 ms, which is strongly related to the scale of the network. Topology calculation benefits from the low complexity of our scheme and its overhead is only 90 µs. The switching delay of the Polatis optical circuit switch is less than 25 ms [31]. These results are average values calculated after 15 experiments. It can be seen from Table 1 that our experimental platform reduces the overhead of each stage in the control plane except for MEMS configuration under a similar scale. The switching delay of commercial MEMS switches is generally tens of milliseconds, but that of some customized optical switches can be as low as nanoseconds [5]. Therefore, the overhead of MEMS configuration can be further reduced. In addition, the lower cost and power consumption of optical switches also contribute to the widespread use of optical interconnects. Currently, the cost per port of an optical switch is about $100 [32], while that of an electrical switch is about $500 [33]. As the data rate and port count increase, the power consumption per port of an electrical switch is much higher than that of an optical switch [34].
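As a worked check on the figures quoted above, the three stage costs can be summed into an upper bound on one control cycle. The sketch assumes the stages run strictly one after another, which the paper does not state explicitly, and it uses 25 ms as the bound for the switching delay.

```python
# Stage costs reported for the testbed, in milliseconds.
STAGES_MS = {
    "traffic collection": 21.0,
    "topology calculation": 0.090,   # 90 microseconds
    "MEMS configuration": 25.0,      # upper bound: "less than 25 ms"
}


def control_plane_budget(stages_ms):
    """Return the summed latency and each stage's share of it."""
    total = sum(stages_ms.values())
    shares = {name: cost / total for name, cost in stages_ms.items()}
    return total, shares


total_ms, shares = control_plane_budget(STAGES_MS)
print(f"upper bound on one control cycle: {total_ms:.2f} ms")  # 46.09 ms
for name, share in shares.items():
    print(f"  {name}: {share:.1%}")
```

Under these assumptions the bound is about 46 ms, compared with the 264.9 ms and 290 ms control-plane latencies quoted for Helios and OSA in Section 1. Those platforms differ in size from this testbed, so the comparison is indicative only. Topology calculation accounts for well under 1% of the total, which leaves traffic collection and switch configuration as the stages to optimize.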

Table 1. Parameters of Experimental Platforms in Different Networks

Since this paper focuses on the control plane rather than the specific network topology, we only verify the scalability of the algorithm. For the scalability of the network topology, readers can refer to our previous work X-NEST [2]. The performance of the topology calculation under different network scales is studied by generating random traffic matrices in simulation. The curve in Fig. 7 is derived from the average of 15 simulations. In the figure, the overhead of topology calculation increases slowly with the expansion of the network scale. Even if the value of N is 1000, the topology calculation takes less than 20 ms. N can represent the number of servers in a small network, but it typically represents the number of ToRs in a large network. Therefore, the number of servers in a network with 1000 ToRs is far more than 1000. The algorithm thus performs well even in a larger-scale network.

Fig. 7. The time cost of topology calculation under different network sizes

5. Discussion

In a large-scale network, the method of traffic collection needs to be further improved. As a critical step that guides the configuration of the optical switch, the traffic matrix must accurately reflect the communication requirements in the network, and collecting it must not take so long that the requirements become outdated. The most common solution is to simplify the traffic information. However, when the number of communication connections is large, the overhead is still high because the demand between each pair of connections must be checked [35]. Another method is to learn the pattern of traffic changes with the help of machine learning, in order to predict future demands and calculate the corresponding topology in advance [36–38]. The experimental results in Section 4 show that the triggering method of reconfiguration is also a problem worthy of study. Reconfiguration in a fixed cycle may not respond to traffic fluctuations in time. Triggering reconfiguration by setting a threshold is also a good candidate, but a fixed threshold can cause too many or too few reconfigurations, and it is susceptible to burst traffic. How to set a reasonable or dynamically changing threshold in combination with applications is also an interesting question. In addition, the switching between optical and electrical links in a hybrid network is also significant in practice. We use the OSPF protocol and set the cost value of the electrical links to implement the switch before and after reconfiguration. In this way, optical interconnects can be better applied to the traditional electrical network. Software-Defined Networking (SDN) can be more flexible in setting routing rules for this operation [39,40]. In the SDN context, the controller needs to send the calculated topology to each SDN switch in the form of a forwarding table. The SDN switch starts to transmit data after the optical switch is reconfigured. However, traditional electrical switches do not support SDN, which means replacing a lot of switching equipment and incurring higher costs.

6. Conclusion

The flexibility of optical interconnects has not been widely applied in data centers (DC) and high-performance computing (HPC) because optical switching generates a lot of additional overhead, such as traffic collection, topology calculation, and optical switch configuration. However, this flexibility also brings benefits such as reducing the number of communication hops and avoiding network congestion. If this benefit is greater than the cost (i.e., the overhead of optical switching), the application can be accelerated. The work of this paper is to establish a fast control plane and reduce the cost of network reconfiguration. We built an experimental platform to verify our scheme. Experimental results show that applications can be accelerated to varying degrees in solving large-scale problems, and the completion time of some applications is reduced by more than 50%. In the experiment, we found that the reconfiguration cycle also has a significant influence on the acceleration effect, and we will study this problem further in future work.

Funding

National Key Research and Development Program of China (2018YFE0202800); National Natural Science Foundation of China (61634004, 61901314, 61934002); Natural Science Foundation of Shaanxi Province for Distinguished Young Scholars (No. 2020JC-26); Fundamental Research Funds for the Central Universities (No. JB210110, XJS200119); State Key Laboratory of Computer Architecture (No. CARCH201919); Youth Innovation Team of Shaanxi Universities.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. G. Liu, R. Proietti, M. Fariborz, P. Fotouhi, X. Xiao, and S. J. Ben Yoo, “Architecture and Performance Studies of 3D-Hyper-FleX-LION for Reconfigurable All-to-All HPC Networks,” SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020, pp. 1–16.

2. Y. Lu, H. Gu, X. Yu, and P. Li, “X-NEST: A Scalable, Flexible, and High-Performance Network Architecture for Distributed Machine Learning,” J. Lightwave Technol. 39(13), 4247–4254 (2021).

3. M. Y. Teh, Y. H. Hung, G. Michelogiannakis, S. Yan, M. Glick, J. Shalf, and K. Bergman, “TAGO: Rethinking Routing Design in High Performance Reconfigurable Networks,” SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020, pp. 1–16.

4. Y. Shen, M. H. N. Hattink, P. Samadi, Q. Cheng, Z. Hu, A. Gazman, and K. Bergman, “Software-defined networking control plane for seamless integration of multiple silicon photonic switches in Datacom networks,” Opt. Express 26(8), 10914–10929 (2018).

5. X. Xue, F. Nakamura, K. Prifti, B. Pan, F. Yan, F. Wang, X. Guo, H. Tsuda, and N. Calabretta, “SDN enabled flexible optical data center network with dynamic bandwidth allocation based on photonic integrated wavelength selective switch,” Opt. Express 28(6), 8949–8958 (2020).

6. G. Michelogiannakis, Y. Shen, M. Y. Teh, X. Meng, B. Aivazi, T. Groves, J. Shalf, M. Glick, M. Ghobadi, L. Dennison, and K. Bergman, “Bandwidth steering in HPC using silicon nanophotonics,” SC19: International Conference for High Performance Computing Networking, Storage, and Analysis, 2019, pp. 1–25.

7. K. Wen, P. Samadi, S. Rumley, C. P. Chen, Y. Shen, M. Bahadori, K. Bergman, and J. Wike, “Flexfly: Enabling a Reconfigurable Dragonfly through Silicon Photonics,” SC16: International Conference for High Performance Computing, Networking, Storage and Analysis, 2016, pp. 166–177.

8. M. Ghobadi, R. Mahajan, A. Phanishayee, N. Devanur, J. Kulkarni, G. Ranade, P. A. Blanche, H. Rastegarfar, M. Glick, and D. Kilper, “ProjecToR: Agile Reconfigurable Data Center Interconnect,” In Proceedings of the 2016 ACM SIGCOMM Conference, 216–229, 2016.

9. X. Pan, S. Zhao, H. Yang, S. Tang, and Z. Zhu, “Scheduling Virtual Network Reconfigurations in Parallel in Hybrid Optical/Electrical Datacenter Networks,” J. Lightwave Technol. 39(17), 5371–5382 (2021).

10. L. Gong, X. Zhou, X. Liu, W. Zhao, W. Lu, and Z. Zhu, “Efficient Resource Allocation for All-Optical Multicasting Over Spectrum-Sliced Elastic Optical Networks,” J. Opt. Commun. Netw. 5(8), 836–847 (2013).

11. F. Yan, W. Miao, O. Raz, and N. Calabretta, “Opsquare: A flat DCN architecture based on flow-controlled optical packet switches,” J. Opt. Commun. Netw. 9(4), 291–303 (2017).

12. J. Kim, C. J. Nuzman, B. Kumar, D. F. Lieuwen, J. S. Kraus, A. Weiss, C. P. Lichtenwalner, A. R. Papazian, R. E. Frahm, N. R. Basavanhally, D. A. Ramsey, V. A. Aksyuk, F. Pardo, M. E. Simon, V. Lifton, H. B. Chan, M. Haueis, A. Gasparyan, H. R. Shea, S. Arney, C. A. Bolle, P. R. Kolodner, R. Ryf, D. T. Neilson, and J. V. Gates, “1100 × 1100 port MEMS-based optical crossconnect with 4-dB maximum loss,” IEEE Photonics Technol. Lett. 15(11), 1537–1539 (2003).

13. M. C. Wu, T. J. Seok, K. Kwon, J. Henriksson, and J. Luo, “Large Scale Silicon Photonics Switches Based on MEMS Technology,” 2019 Optical Fiber Communications Conference and Exhibition (OFC), 2019, pp. 1–3.

14. R. Stabile, A. Rohit, and K. A. Williams, “Monolithically Integrated 8 × 8 Space and Wavelength Selective Cross-Connect,” J. Lightwave Technol. 32(2), 201–207 (2014).

15. Y. Yin and S. J. B. Yoo, “LIONS: An AWGR-based low-latency optical switch for high-performance computing and data centers,” IEEE J. Sel. Top. Quantum Electron. 19(2), 3600409 (2013).

16. K. Prifti and N. Calabretta, “System Performance Evaluation of a Nanoseconds Modular Photonic Integrated WDM WSS for Optical Data Center Networks,” 2019 Optical Fiber Communications Conference and Exhibition (OFC), 2019.

17. Z. Zhu and K. Chen, “Fully programmable and scalable optical switching fabric for petabyte data center,” Opt. Express 23(3), 3563–3580 (2015).

18. N. Farrington, G. Porter, S. Radhakrishnan, H. H. Bazzaz, V. Subramanya, Y. Fainman, G. Papen, and A. Vahdat, “Helios: A Hybrid Electrical/Optical Switch Architecture for Modular Data Centers,” In Proceedings of the 2010 ACM SIGCOMM Conference, 339–350, 2010.

19. K. Chen, A. Singla, A. Singh, K. Ramachandran, L. Xu, Y. Zhang, X. Wen, and Y. Chen, “OSA: An Optical Switching Architecture for Data Center Networks with Unprecedented Flexibility,” IEEE/ACM Trans. Networking 22(2), 498–511 (2014).

20. K. Christodoulopoulos, D. Lugones, K. Katrinis, M. Ruffini, and D. O’Mahony, “Performance Evaluation of a Hybrid Optical/Electrical Interconnect,” J. Opt. Commun. Netw. 7(3), 193–204 (2015).

21. G. Wang, D. G. Andersen, M. Kaminsky, K. Papagiannaki, T. S. E. Ng, M. Kozuch, and M. Ryan, “c-Through: Part-time Optics in Data Centers,” In Proceedings of the 2010 ACM SIGCOMM Conference, 327–338, 2010.

22. Y. Sahraoui, A. Ghanam, S. Zaidi, S. Bitam, and A. Mellouk, “Performance evaluation of TCP and UDP based video streaming in vehicular ad-hoc networks,” 2018 International Conference on Smart Communications in Network Technologies (SaCoNeT), 2018, pp. 67–72.

23. D. A. Popescu and A. W. Moore, “Measuring Network Conditions in Data Centers Using the Precision Time Protocol,” IEEE Trans. Netw. Serv. Manag. 18(3), 3753–3770 (2021).

24. Polatis, URL: https://www.polatis.com

25. W. M. Mellette, R. McGuinness, A. Roy, A. Forencich, G. Papen, A. C. Snoeren, and G. Porter, “RotorNet: A Scalable, Low-complexity, Optical Datacenter Network,” In Proceedings of the 2017 ACM SIGCOMM Conference, 267–280, 2017.

26. Z. Zhu, W. Lu, L. Zhang, and N. Ansari, “Dynamic Service Provisioning in Elastic Optical Networks with Hybrid Single-/Multi-Path Routing,” J. Lightwave Technol. 31(1), 15–22 (2013).

27. J. Liu, W. Lu, F. Zhou, P. Lu, and Z. Zhu, “On Dynamic Service Function Chain Deployment and Readjustment,” IEEE Trans. Netw. Serv. Manag. 14(3), 543–553 (2017).

28. C. G. Dumitrache, G. Predusca, L. D. Circiumarescu, N. Angelescu, and D. C. Puchianu, “Comparative study of RIP, OSPF and EIGRP protocols using Cisco Packet Tracer,” 2017 5th International Symposium on Electrical and Electronics Engineering (ISEEE), 2017, pp. 1–6.

29. R. Adrian, A. Dahlan, and K. Anam, “OSPF cost impact analysis on SDN network,” 2017 2nd International conferences on Information Technology, Information Systems and Electrical Engineering (ICITISEE), 2017, pp. 198–201.

30. NAS Parallel Benchmarks, URL: https://www.nas.nasa.gov/software/npb.html

31. Polatis Data Sheet, URL: https://www.viavisolutions.com/en-us/literature/polatis-series-6000-osm-network-switch-module-en.pdf

32. Q. Cheng, S. Rumley, M. Bahadori, and K. Bergman, “Photonic switching in high performance datacenters [Invited],” Opt. Express 26(12), 16022–16043 (2018).

33. M. Y. Teh, J. J. Wilke, K. Bergman, and S. Rumley, Design Space Exploration of the Dragonfly Topology (Springer International Publishing, 2017).

34. R. G. Beausoleil, M. McLaren, and N. P. Jouppi, “Photonic Architectures for High-Performance Data Centers,” IEEE J. Sel. Top. Quantum Electron. 19(2), 3700109 (2013).

35. G. Porter, R. Strong, N. Farrington, A. Forencich, P. C. Sun, T. Rosing, Y. Fainman, G. Papen, and A. Vahdat, “Integrating Microsecond Circuit Switching into the Data Center,” In Proceedings of the 2013 ACM SIGCOMM Conference, 447–458, 2013.

36. J. Guo and Z. Zhu, “When Deep Learning Meets Inter-Datacenter Optical Network Management: Advantages and Vulnerabilities,” J. Lightwave Technol. 36(20), 4761–4773 (2018).

37. A. Yu, H. Yang, W. Bai, L. He, H. Xiao, and J. Zhang, “Leveraging Deep Learning to Achieve Efficient Resource Allocation with Traffic Evaluation in Datacenter Optical Networks,” 2018 Optical Fiber Communications Conference and Exposition (OFC), 2018, pp. 1–3.

38. W. Lu, L. Liang, B. Kong, B. Li, and Z. Zhu, “Leveraging Predictive Analytics to Achieve Knowledge-Defined Orchestration in a Hybrid Optical/Electrical DC Network: Collaborative Forecasting and Decision Making,” 2018 Optical Fiber Communications Conference and Exposition (OFC), 2018, pp. 1–3.

39. H. Chen, Z. Qiao, and S. Fu, “Applying SDN based data network on HPC Big Data Computing – Design, Implementation, and Evaluation,” 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 6007–6009.

40. Y. Xiong, Y. Li, B. Zhou, R. Wang, and G. Rouskas, “SDN Enabled Restoration with Triggered Precomputation in Elastic Optical Inter-Datacenter Networks,” J. Opt. Commun. Netw. 10(1), 24–34 (2018).

Optics Express

Network	Helios	OSA	HydRA	Fast Control Plane
Number of hosts	24	32	40	8
Number of ToRs	4	8	4	4
Traffic collection	77.4 ms	161 ms	/	21 ms
Topology calculation	19.4 ms	48 ms	/	90 µs
MEMS configuration	168.4 ms	9 ms	1.2 s	25 ms
Application	Synthetic traffic	Synthetic traffic	Benchmark	Benchmark
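
As a worked example of the arithmetic behind this comparison, the sketch below sums the three overhead stages for each system. The values are copied from the table and converted to milliseconds; treating "/" as "not reported" is an assumption, and the names `OVERHEAD_MS` and `total_overhead_ms` are illustrative. Because the systems differ in host and ToR counts, the totals are indicative rather than a like-for-like comparison.

```python
from typing import Dict, Optional

# Per-stage overheads from the table, in milliseconds.
# None marks an entry shown as "/" (assumed here to mean "not reported").
OVERHEAD_MS: Dict[str, Dict[str, Optional[float]]] = {
    "Helios": {"collection": 77.4, "calculation": 19.4, "configuration": 168.4},
    "OSA": {"collection": 161.0, "calculation": 48.0, "configuration": 9.0},
    "HydRA": {"collection": None, "calculation": None, "configuration": 1200.0},
    "Fast Control Plane": {"collection": 21.0, "calculation": 0.09, "configuration": 25.0},
}


def total_overhead_ms(stages: Dict[str, Optional[float]]) -> Optional[float]:
    """Sum all stages, or return None if any stage is not reported."""
    values = list(stages.values())
    if any(v is None for v in values):
        return None
    return sum(values)


totals = {name: total_overhead_ms(stages) for name, stages in OVERHEAD_MS.items()}

for name, total in totals.items():
    label = "n/a (incomplete data)" if total is None else f"{total:.2f} ms"
    print(f"{name}: {label}")
```

With the table's values, the totals come to about 265.2 ms for Helios, 218 ms for OSA, and 46.09 ms for the fast control plane; HydRA has no total because two of its stages are not listed.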