
Passive optical interconnects at top of the rack: offering high energy efficiency for datacenters

Open Access

Abstract

This paper introduces a new concept, namely passive optical interconnects at the top of the rack in datacenter networks, and investigates several architectures that use only passive optical components to interconnect servers. In this manner, the proposed schemes are able to offer higher bandwidth and significantly better energy efficiency than their electronic counterpart based on commodity switches. The proposed passive optical interconnect schemes are experimentally demonstrated in order to validate their transmission performance. In addition, an assessment in terms of energy consumption and cost has been carried out, which shows that the proposed concept can significantly outperform conventional commodity switches in energy efficiency while keeping the cost at a similar level.

© 2015 Optical Society of America

1. Introduction

Driven by the popularity of Internet-based applications, e.g., cloud computing, social network services, multimedia content distribution, etc., the bandwidth requirement of datacenter networks is increasing exponentially. It has been reported that for every byte of data carried over the Internet, approximately 1 gigabyte (= 10^9 bytes) of traffic is transmitted inside a datacenter or among different datacenters [1]. The bandwidth capacity within a supercomputing node (such as the large-scale datacenters developed by Google, Facebook, etc.) is expected to grow by a factor of 20 every 4 years (e.g., 20Pb/s in 2016 and 400Pb/s in 2020 [2]). Meanwhile, another key factor that should be considered in the development of datacenters is energy consumption. Currently, the total power consumption of a datacenter can reach several megawatts and beyond [3]. About 20% of the total power in a datacenter is consumed by the network devices [4]. In 2012, the electricity consumption of the network equipment in datacenters worldwide (excluding cooling and power provisioning overhead) had already exceeded 1 TWh, and this number is expected to continue increasing in the future [5]. However, due to the thermal dissipation problem, the power consumption cannot be allowed to increase as fast as the capacity grows [2]. Obviously, with business as usual, neither capacity nor energy can sustain future datacenter traffic.

For a datacenter or supercomputer, the interconnection network is typically divided into several tiers: edge tier, aggregation tier, and core tier. In the edge tier, interconnects at the top of the rack (ToR) are responsible for the communications among the servers within the same rack, while the aggregation/core tiers handle the traffic flows among different racks as well as those from/to the Internet. To solve the aforementioned problems of capacity and power consumption, many optical interconnects have been proposed for datacenters, such as hybrid electronic/optical approaches (e.g., Helios [6], hybrid photonic Ethernet switch [7]) and purely optical schemes. The existing purely optical schemes are mainly based on optical packet switching (OPS) (e.g., LIONS [8], flat datacenter network architecture [9]), optical circuit switching (OCS) (e.g., OSA [10]) and a combination of these two optical switching paradigms (e.g., hybrid OPS/OCS switch [11]). However, so far most of the research work has focused only on solutions for aggregation and core switches in datacenters, while conventional commodity switches are still required at ToR. Currently, many ToR switches offering 10GBase-T on the down-facing ports (e.g., Arista 7050TX, H3C S5820V2-52Q and IBM G8246T) are able to support data rates up to 10Gb/s per server. However, they are still quite costly and consume a large amount of energy. With business as usual, it could be extremely difficult to have energy-efficient switches at ToR for higher data rates in the future.

It should be noted that the ratio of the traffic handled by interconnects at the edge tier (such as intra-rack flows) to the total traffic can be up to 80%, particularly for cloud based services [2]. Moreover, as the number of ToR switches needed in datacenters is much larger than the number of switches in the aggregation/core tier, the power consumed at the edge tier is typically dominant. It has been shown that up to 90% of the total power spent on switching is consumed by ToR interconnects in conventional commodity switch based datacenter networks. When the core/aggregation switches are upgraded to optical approaches (e.g., using the ones presented in [7, 8]) and ToR switches guarantee a data rate per server beyond 10Gb/s, even more than 95% of the total power consumed by all types of interconnects could occur at the edge tier [12].

Meanwhile, the traffic generated by servers is very bursty [13], and hence an appropriate technique that can handle highly dynamic flows is of extreme importance for optical interconnect design at ToR. Considering the existing optical switching technologies (e.g., those proposed in [8–11] for aggregation/core switches in datacenters), the granularity of OCS is too coarse, while OPS still has some fundamental problems with buffering and signal processing in the optical domain. In contrast to optical switching, the passive optical network (PON) is an alternative architecture using passive components to connect the central office and individual users. It is widely known as a very successful approach for broadband access that can handle the highly bursty traffic generated by individual users [14–16]. PON has several attractive advantages, such as easy maintenance, high reliability, large bandwidth, low cost and low energy consumption thanks to its passive nature. There are several types of PONs, e.g., time division multiplexing (TDM)-PON, wavelength division multiplexing (WDM)-PON and time and wavelength division multiplexing (TWDM)-PON. TWDM-PON combines both WDM and TDM technologies and offers large capacity, low energy consumption and flexible resource allocation [17]. 10Gb/s TWDM-PON has already been demonstrated by many vendors (e.g., Ericsson, Huawei, etc.) and higher speed PONs (such as 40Gb/s and beyond) have also been reported in the literature [18, 19]. However, PON is a point-to-multipoint structure and cannot be used directly for optical interconnects at ToR, which should offer multipoint-to-multipoint connections, enabling each server to send/receive traffic to/from all the others in the same rack as well as the interface to the outside of the rack.

With this in mind, this paper proposes passive optical interconnects at ToR and explores three different architectural options, which further extend our previous work in [20] and the conventional broadcast-and-select architecture in [21]. All three presented passive optical interconnects utilize WDM components and optical couplers to provide multipoint-to-multipoint connections at the edge tier, so that they inherit the aforementioned advantages of TWDM-PON. Since the passive components interconnecting the servers do not introduce any processing or queuing delay, our proposed ToR schemes could have a potential latency advantage compared to conventional commodity switch based solutions. We experimentally demonstrate the proposed passive optical interconnects at ToR and verify their transmission performance. In addition, we carry out an assessment in terms of energy consumption, which shows that the proposed schemes can significantly improve energy efficiency, particularly for the case with high data rates. The remainder of this paper is organized as follows. In Section 2, we introduce our proposed passive optical interconnects at ToR and describe three different schemes as well as their interfaces to the core/aggregation tiers. In Section 3, we experimentally evaluate the transmission performance and validate that the proposed concept has high scalability, supporting a large number of servers in the same rack. Section 4 analyzes the energy consumption and cost of the proposed schemes and performs a comparison with the commodity switch based solution. Finally, we draw conclusions in Section 5.

2. Passive optical interconnects at ToR

In this section, we present three passive optical interconnects at ToR (shown in Fig. 1) as well as their interfaces to the core/aggregation tiers. All three schemes utilize only passive optical components, including arrayed waveguide gratings (AWGs) and couplers (i.e., combiners/splitters), to interconnect the servers within the rack, and they implement wavelength tunability at different places. The first two schemes (shown in Fig. 1(a) and Fig. 1(b), respectively) use wavelength tunable transmitters (WTTs), while the third one (shown in Fig. 1(c)) is inspired by a conventional broadcast-and-select architecture [21], where a wavelength tunable filter (WTF) is employed right before the receiver to select the wavelength of interest.


Fig. 1 Passive optical interconnects at top of the rack: (a) Scheme I with a single receiver at each server, (b) Scheme II with two receivers at each server and (c) Scheme III with wavelength tunable filter at each server.


2.1 Interconnections inside the rack

In the first two schemes, the optical interface (OI) at each server can send traffic to different destinations (i.e., another server in the same rack or the outside of the rack) by tuning the optical carrier to a certain wavelength. Here, each server has a given wavelength label as its address for routing. The label is assigned when the system is initialized. Within the same rack, each wavelength label is unique, while the same set of wavelengths can be reused and allocated independently in different racks of the datacenter. For Server k in Rack m with a wavelength label of λk, all the signals sent on λk are merged by combiners and routed to Server k through an AWG. In this scheme, the combiners used to aggregate all the signals are couplers (or, say, splitters). In contrast to the AWG, the combiners are colorless. Hence, a server can send signals on different wavelengths in different time slots via the same input port of the combiner. To avoid conflicts (i.e., signals from different servers within the rack carried by the same wavelength arriving at the same destination simultaneously), a bandwidth allocation mechanism similar to the multi-point control protocol widely used in TDM based PONs [22, 23] is required. In each OI, a fiber Bragg grating (FBG) is used to separate the inter-rack flows (i.e., traffic to the outside of the rack) from the intra-rack flows. The central wavelength of the FBG is set to the labeled wavelength of the server. If Server k transmits on λj (j≠k), the signals first pass through the FBG and are then routed to Server j in the same rack. Otherwise, the signals on λk are reflected by the FBG and sent to the outside of the rack. All the servers in the rack are allowed to send signals to the outside of the rack simultaneously, as different wavelengths are used. Meanwhile, up to N (where N is the total number of servers in the rack) intra-rack flows can be carried simultaneously.
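
To make the wavelength-label routing concrete, the following minimal sketch models the forwarding decision at a server's optical interface, assuming the behaviour described above; the function and names are illustrative only, not part of any implementation in the paper:

```python
OUTSIDE_RACK = None  # illustrative sentinel for traffic that leaves the rack

def select_wavelength(src_label: int, dst_label) -> int:
    """Return the wavelength label the server's WTT should tune to."""
    if dst_label is OUTSIDE_RACK:
        return src_label   # own label: reflected by the FBG onto the uplink
    if dst_label == src_label:
        raise ValueError("a server does not send intra-rack traffic to itself")
    return dst_label       # other label: passes the FBG, combiner and AWG

print(select_wavelength(3, 7))             # -> 7 (intra-rack flow to Server 7)
print(select_wavelength(3, OUTSIDE_RACK))  # -> 3 (reflected, leaves the rack)
```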

In Scheme I (shown in Fig. 1(a)), the OI in each server includes one WTT and one receiver. A commercial tunable 10Gb/s small form-factor pluggable (XFP) transceiver can be used at the OI. The type of receiver (PIN photodiode or avalanche photodiode, APD) should be chosen carefully according to the required optical power budget. In this scheme, two stages of combiners are used. The first one combines the intra-rack signals from different servers in the rack (at Point A in Fig. 1(a)), while the second one (at Point B in Fig. 1(a)) merges the signals from the outside of the rack with the intra-rack flows. After passing through the Nx1 combiner at Point A, the intra-rack signal suffers a large insertion loss. On the other hand, since the traffic from the outside of the rack only passes through the combiner at Point B, its optical power can still be kept at a relatively high level. Therefore, we select a coupler with an uneven combination ratio, e.g., 90:10 (90 for intra-rack flows) at Point B, which minimizes the optical power loss for the intra-rack signals. Furthermore, two AWGs are used in Scheme I. The first one routes all the traffic to the different servers, while the second one multiplexes the traffic sent from different servers to the outside of the rack.
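
The benefit of the uneven coupler at Point B can be seen from the ideal splitting loss of a coupler port carrying a power fraction r (excess loss neglected); the numbers below are a worked example under this simple model, not values taken from Table 1:

```latex
L(r) = -10\log_{10} r
\quad\Rightarrow\quad
L(0.9) \approx 0.46~\mathrm{dB}\ \ (\text{90\% port, intra-rack flows}),\qquad
L(0.1) = 10~\mathrm{dB}\ \ (\text{10\% port, traffic entering from outside the rack}).
```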

To further reduce the optical power loss of Scheme I, Scheme II is introduced, in which the uneven combiner is avoided. The traffic from/to the outside of the rack is completely separated from the intra-rack flows and is multiplexed/demultiplexed at the second AWG (i.e., AWG2 in Fig. 1(b)). Two independent receivers and one more circulator are needed in each server, which causes additional complexity and cost but also brings other advantages. For instance, a server can simultaneously receive the signals from the other servers in the same rack and those from the outside of the rack. Meanwhile, the receiver in Scheme I requires a large dynamic range, as one single receiver is responsible for both the intra-rack flows and the flows from the outside of the rack, which can experience completely different optical power losses. This is avoided in Scheme II because the two receivers handle these two types of traffic separately. Besides, the complexity and cost of the additional receiver in Scheme II might be reduced by advanced optical integration technology and mass production in the future.

In contrast to Schemes I and II, Scheme III employs wavelength tunability at the receiver side (shown in Fig. 1(c)). In this scheme, the wavelength of the transmitter in the OI can be fixed. The signals from the different servers are multiplexed by an AWG, and the merged signals are broadcast to all the servers as well as to the outside of the rack through a splitter. Each server uses a WTF that is dynamically tuned to the wavelength of interest. This scheme is compatible with multicast, where the same signals need to be sent from one server to several destinations at the same time. It should be noted that both the AWG and the splitter in Scheme III are 1 × (N + 1), as one port is reserved as the interface to the outside of the rack.

2.2 Interfaces to core/aggregation tiers

In general, our proposed concept of passive optical interconnects is self-contained and can work with any aggregation/core switches by adding a proper interface (e.g., optical-electrical (OE) conversion). In particular, the proposed passive optical interconnects can be directly connected to optical core/aggregation switches, which minimizes or eliminates OE conversions and improves the energy efficiency of the overall datacenter network. Here, we take one simple example, which employs only an optical switch matrix (OSM) in the core tier, to show how the optical datacenter network works. A schematic view of the structure is shown in Fig. 2, where Scheme I is used at the edge tier for explanation.


Fig. 2 The structure of the datacenter network using Scheme I at ToR


For the datacenter network architecture shown in Fig. 2, we assume that each rack has the same configuration of wavelength labels for its servers. The inter-rack flows can be divided into two types. The first type is the inter-rack communication between servers that have the same wavelength label in different racks (e.g., the flow from Server 1 in Rack 1 with wavelength label λ1 to Server 1 in Rack 2). The signals sent by Server 1 in Rack 1 are first switched to Rack 2 via the core switch, and then directly routed to Server 1 by AWG1 in Rack 2 (e.g., the blue dashed line shown in Fig. 2). The second type of inter-rack flow is communication between servers with different wavelength labels in different racks. Here we take the traffic from Server 1 in Rack 1 to Server N in Rack 2 as an example. As the wavelength labels are different, the signals from Server 1 in Rack 1 cannot be directly switched via the OSM to the target server in Rack 2. Therefore, this type of communication requires a two-hop path, comprising one intra-rack hop and one inter-rack hop, which can be realized by two methods. In the first method, the traffic from Server 1 in Rack 1 is first sent to Server 1 in Rack 2 (via the blue dashed line); then Server 1 in Rack 2 transmits the signals to Server N in Rack 2 through the intra-rack connection in Rack 2 (i.e., via the yellow solid line within Rack 2). In the second method, the flow is first sent within the rack to Server N in Rack 1 (via the yellow solid line within Rack 1), and then transmitted to the target server (via the yellow dashed line between Rack 1 and Rack 2). With the proposed hopping solution, we can keep the overall datacenter network all-optical, which avoids OE conversion between the tiers and hence achieves high energy efficiency. On the other hand, the hopping could affect the network performance. Considering that some applications (e.g., cloud services [2]) have intra-rack traffic accounting for up to 80% of the total datacenter traffic, the influence of the hopping on network performance could be limited.
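
The two-hop forwarding rule can be summarised by the sketch below, which selects the hop sequence for an inter-rack flow depending on whether the source and destination labels match; the choice between the two methods described above is exposed as a simple parameter, and all names are illustrative rather than part of any protocol defined in the paper:

```python
def inter_rack_path(src_rack, src_label, dst_rack, dst_label, relay_in_dst_rack=True):
    """Return the sequence of (rack, label) hops for an inter-rack flow."""
    if dst_label == src_label:
        # Same label: a single hop through the optical switch matrix (OSM).
        return [(src_rack, src_label), (dst_rack, dst_label)]
    if relay_in_dst_rack:
        # Method 1: OSM first, then an intra-rack hop in the destination rack.
        return [(src_rack, src_label), (dst_rack, src_label), (dst_rack, dst_label)]
    # Method 2: intra-rack hop in the source rack first, then the OSM.
    return [(src_rack, src_label), (src_rack, dst_label), (dst_rack, dst_label)]

# Example: Server 1 (label 1) in Rack 1 sending to Server N (label 64) in Rack 2
print(inter_rack_path("Rack1", 1, "Rack2", 64))
# [('Rack1', 1), ('Rack2', 1), ('Rack2', 64)]
```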

This hopping scheme for inter-rack communications applies to Scheme II as well. For Scheme III, a two-hop path is not necessary because of its broadcast-and-select manner. However, Scheme III has a serious insertion loss issue for inter-rack connections realized by any optical core/aggregation switch. The corresponding calculation of the optical power budget is carried out in Section 3.

3. Transmission performance evaluation

In this section, we focus on the transmission performance of the proposed passive optical interconnects at ToR and set up experiments to verify their feasibility. Our evaluation of transmission performance comprises link loss calculations along with experimental measurements in terms of bit error rate (BER), eye diagrams and optical spectra.

Calculation of optical link loss — Table 1 lists the input data for the link loss calculation of intra-rack communication in all three investigated passive optical interconnect schemes. As shown in Table 1, Scheme I requires the highest optical power budget (i.e., it has the highest total link loss) for intra-rack transmission among the considered schemes, but the loss is still less than 25dB for a rack with up to 64 servers. Furthermore, the 1xN coupler always contributes the dominant part of the overall insertion loss. If the launch power of the transmitter (Tx) is approximately 0dBm (a typical value for many commercial products), the receiver (Rx) sensitivity should be −18.0dBm or better for the case with 16 servers per rack, which can still be achieved by a PIN photodiode. If the rack has to support more servers (e.g., up to 64), the Rx sensitivity has to be −24.7dBm or lower, and an APD is needed in that case. Considering a channel spacing of 100GHz, the C-band (with 61 wavelengths in total according to ITU-T G.692) is sufficient for a typical number of servers in a rack (no more than 50). If more servers are required per rack, we can either decrease the channel spacing or extend the waveband (e.g., using the L-band as well).
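
Since the table entries are not reproduced in this text, the following minimal sketch recomputes the Scheme I intra-rack link loss from generic component losses; all per-component values are placeholder assumptions (not the Table 1 inputs), with the 1xN coupler modelled by its ideal splitting loss of 10·log10(N) plus a small excess term:

```python
import math

# Placeholder component insertion losses in dB (assumptions, not Table 1 values)
AWG_LOSS = 5.0          # 1xN AWG routing to the destination server
FBG_CIRC_LOSS = 1.0     # FBG plus circulator at the transmitting OI
UNEVEN_90_LOSS = 0.5    # 90% port of the 90:10 coupler at Point B
EXCESS_PER_COUPLER = 0.3

def coupler_loss(n_ports: int) -> float:
    """Ideal splitting loss of a 1xN coupler plus excess loss, in dB."""
    return 10 * math.log10(n_ports) + EXCESS_PER_COUPLER

def scheme1_intra_rack_loss(n_servers: int) -> float:
    """Approximate total intra-rack link loss of Scheme I."""
    return (coupler_loss(n_servers)   # Nx1 combiner at Point A
            + UNEVEN_90_LOSS          # 90:10 coupler at Point B (90% port)
            + AWG_LOSS
            + FBG_CIRC_LOSS)

for n in (16, 32, 64):
    print(n, round(scheme1_intra_rack_loss(n), 1), "dB")
# With these assumed values the 16- and 64-server cases land near the
# ~18dB and ~24.7dB budgets quoted in the text.
```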


Table 1. Optical Link Loss of Intra-Rack Communications

We also calculate the link loss of the inter-rack communications, where an OSM is employed as the core/aggregation switch, as shown in Fig. 2. Cases with different numbers of servers in a rack are considered, and a connection loss of 2dB for inter-rack communications is assumed. As shown in Table 2, the link loss is not sensitive to the number of servers per rack in Schemes I and II, since the inter-rack flow does not go through the 1 × N combiner, whose insertion loss depends on the splitting ratio. On the other hand, Scheme III has a much higher link loss, which varies with the number of servers per rack, because the inter-rack signals pass through two 1 × N combiners in two different racks. By employing optical amplifiers before all of the switching components, as in [24], Scheme III could overcome the high inter-rack link loss and prevent transient power fluctuations in the optical amplifiers during circuit reconfiguration. However, it is still difficult to cope with the very high insertion loss (e.g., 48.9dB in the case with 64 servers per rack). Therefore, Scheme III might not be suitable for large-scale all-optical datacenter networks, where there is no OE/EO conversion.
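
Under the same placeholder assumptions as in the intra-rack sketch, the qualitative difference reported in Table 2 can be illustrated as follows: in Schemes I and II the inter-rack path bypasses the 1 × N combiner and its loss is essentially independent of N, whereas in Scheme III the signal traverses a 1 × (N + 1) splitter in both the source and the destination rack. The exact component values and the assumed Scheme III receive path are for illustration only:

```python
import math

# Placeholder insertion losses in dB (assumptions, not Table 2 inputs)
OSM_LOSS = 2.0       # assumed connection loss through the optical switch matrix
AWG_LOSS = 5.0
FBG_CIRC_LOSS = 1.0
WTF_LOSS = 3.0       # wavelength tunable filter at the receiver (Scheme III)
EXCESS = 0.3

def split_loss(n_ports: int) -> float:
    return 10 * math.log10(n_ports) + EXCESS

def scheme1_2_inter_rack_loss(n_servers: int) -> float:
    # Tx FBG -> AWG mux -> OSM -> AWG demux in the destination rack
    return FBG_CIRC_LOSS + AWG_LOSS + OSM_LOSS + AWG_LOSS   # independent of N

def scheme3_inter_rack_loss(n_servers: int) -> float:
    # AWG mux -> 1x(N+1) splitter (source rack) -> OSM ->
    # 1x(N+1) splitter (destination rack) -> WTF
    return AWG_LOSS + 2 * split_loss(n_servers + 1) + OSM_LOSS + WTF_LOSS

print(round(scheme1_2_inter_rack_loss(64), 1))  # constant, of the order of ~13 dB
print(round(scheme3_inter_rack_loss(64), 1))    # ~47 dB, the order quoted in Table 2
```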


Table 2. Optical Link Loss of Inter-rack Communications

Experimental validation of transmission performance — We have carried out experiments to investigate the feasibility of the proposed Schemes I and II, where the wavelength tunability is achieved at the transmitter side. The experimental setup is shown in Fig. 3. The data rate is 10Gb/s. Due to the limitations of the equipment in our lab, we demonstrate a simplified structure with 3 servers. The three servers have the wavelength labels of 1546.1nm (Channel 39 in ITU-T G.692), 1539.8nm (Channel 47 in ITU-T G.692) and 1532.7nm (Channel 56 in ITU-T G.692), respectively. As shown in Fig. 3, we use pattern generators (PG, Anritsu MT1810A and CENTELLAX TG2P1A) to emulate the traffic generated at the servers in order to test the transmission performance. Server 1 has a tunable transmitter (Tx, Agilent 81960A modulated by a Conquer KG-AMBOX) sending a 2^31−1 pseudo-random binary sequence (PRBS) as the signal for the BER test. The FBG filter in the experimental setup integrates both an FBG and a circulator. Its central wavelength is 1546.1nm, the same as the wavelength label of Server 1. When the transmitting wavelength of Tx 1 is tuned to 1546.1nm, the signals go to the outside of the rack (i.e., Point 4 in Fig. 3). Otherwise, the signals go through Point 2 and arrive at the other servers. The signals sent by Server 2 (i.e., a 2^15−1 PRBS on the wavelength of 1528.8nm) and Server 3 (i.e., a 2^15−1 PRBS on the wavelength of 1537.1nm) in this setup act as interference channels in order to investigate the impact of several simultaneous intra-rack flows on BER performance. The three intra-rack flows sent by the different servers are first merged by a combiner at Point 2, and then pass through a variable optical attenuator, which introduces different insertion losses for BER testing. An AWG is implemented to separate the different intra-rack signals to their destination ports. The channel spacing of the AWG in our setup is 100GHz. When the wavelength of Tx 1 is tuned to 1532.7nm, the signals from Server 1 can reach Server 3. An APD (Optone XFP-10G-ZR module) is employed as the receiver at Server 3. A power meter and a bit error rate tester (BERT, Anritsu MT1810A) are used to measure the BER performance at different levels of received optical power (ROP). The results of BER versus ROP in different situations are shown in Fig. 4(a). We tested the BER in four cases: 1) back-to-back (B2B), 2) 1-channel working, 3) 3-channel working and 4) to the outside of the rack. The B2B scenario serves as the benchmark, Cases 2 and 3 test intra-rack communications without/with the impact of crosstalk, and the last case measures the signal quality for the traffic to the outside of the rack.


Fig. 3 Schematic diagram of experimental setup (bit rate = 10Gb/s). The inset optical eye-diagrams and spectrum diagrams show the results measured at different points.



Fig. 4 (a) Results of BER vs. received optical power (ROP) in different tested cases and electrical eye-diagrams at (b) BER = 10^−9, (c) BER = 10^−3 for the case of 1-channel working and at (d) BER = 10^−9, (e) BER = 10^−3 for the case of 3-channel working.


As shown in Fig. 4(a), the curves for the first three cases are quite close, which means that the impairments introduced by the passive components as well as the crosstalk are negligible. The signal quality is mainly determined by the launch power of the transmitter and the sensitivity of the receiver. The sensitivity of the APD implemented in the experiments is −28.5dBm (at BER = 10^−9), and the launch power of all the employed transmitters is −1dBm. Thus the maximal link loss in the experiments can be up to 27.5dB, which offers more than a 2dB margin compared to the calculated value for the 64-server scenario (i.e., 24.7dB in Scheme I). Besides, we also record the eye diagrams for the cases of 1-channel and 3-channel working (see Figs. 4(b)-4(e)) at BER = 10^−9 and BER = 10^−3. They are nearly identical at the same BER level.
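
Expressed as a worked power-budget check, the measured numbers above give

```latex
B_{\max} = P_{\mathrm{launch}} - P_{\mathrm{sens}}
         = -1~\mathrm{dBm} - (-28.5~\mathrm{dBm}) = 27.5~\mathrm{dB},
\qquad
\Delta = B_{\max} - L_{\mathrm{intra},64}
       = 27.5~\mathrm{dB} - 24.7~\mathrm{dB} = 2.8~\mathrm{dB}.
```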

Furthermore, we examine the transmission performance of the signals sent to the outside of the rack. The optical eye diagrams measured at Point 1 and Point 4 are shown in Fig. 3. It can be seen that the signals measured at Point 1 are slightly better than those at Point 4, which are affected by passing through the FBG. In Fig. 4(a), it can be seen that the quality of the signals to the outside of the rack is worse than that of the intra-rack traffic at the same ROP level. This is attributed to the side-mode suppression of the FBG. However, the maximal optical power budget for the inter-rack connection measured in the experiments at BER = 10^−9 can still reach more than 26 dB, which clearly exceeds the calculated inter-rack link loss of Schemes I and II according to Table 2. For Scheme III, however, it could be difficult to realize an all-optical datacenter network by connecting it directly to an optical core/aggregation switch, due to its large insertion loss. In this case, either an optical amplifier or OE conversion is needed, at the expense of extra cost and energy consumption.

As shown in the experimental results, 10Gb/s can be supported by our schemes with current technologies. Meanwhile, as reported in the literature, e.g., in [19, 25], 40Gb/s optical transceivers can offer more than 30dB of link budget for long-haul transmission or PON, and could become technically and economically viable alternatives for datacenter applications in the near future.

4. Analysis of energy consumption and cost

Because of the absence of active devices, the passive interconnects are expected to have high energy efficiency. In this section, we evaluate the energy consumption and cost of all three presented optical interconnect schemes. For comparison purposes, we also carry out the analysis for the commodity switch based solution, where each server has one network interface card to send/receive data and a commodity switch is employed at ToR. The considered commodity switch includes downlink network interfaces towards the servers and uplink network interfaces towards the core/aggregation tier. Besides, we also investigate the influence of the number of servers and the bandwidth per server on the energy consumption.

4.1 Analysis of energy consumption

Methodology and input data — The models used for the calculation of power consumption are taken from [12], and the input data are listed in Table 3. All the network interface cards shown in Table 3 include the module for traffic control and management. The values for the optical components considered in our proposed schemes are obtained from [26, 27]. The devices employed in the commodity switch based solution are listed in the footnotes of Table 3.


Table 3. Input Data for Power Consumption and Cost

Results — Fig. 5 shows the power consumption of all the evaluated schemes. It can be seen that the proposed passive optical interconnects have significantly lower power consumption than their electronic counterpart, leading to up to 80% energy savings. The larger the number of servers per rack, the higher the energy savings achieved by the proposed optical interconnects. When the bandwidth per server increases from 1Gb/s to 10Gb/s, the commodity switch based solution almost doubles its power consumption, while that of our proposed optical interconnects grows much more slowly (≤ 20%). This verifies that the optical interconnects are much more energy-efficient than their electronic counterpart, particularly for the case with high capacity, e.g., 10Gb/s (and possibly even higher). Meanwhile, it can be noted that the amount of power consumed by the three proposed optical approaches is very similar. Since one more receiver is required at each server, Scheme II consumes slightly more energy than the other two schemes.
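
As a minimal sketch of how such a comparison can be set up (the wattages below are illustrative placeholders, not the Table 3 inputs, and the structure only loosely follows the model of [12]): the passive schemes consume power only in the per-server optical network interfaces, while the commodity solution additionally powers a ToR switch whose per-port consumption grows with the line rate.

```python
# Illustrative placeholder power figures in watts (NOT the Table 3 values)
P_NIC_OPT = {1: 2.5, 10: 3.0}      # optical NIC with tunable transmitter
P_NIC_EL = {1: 3.0, 10: 6.0}       # electrical NIC in the commodity solution
P_SWITCH_PORT = {1: 3.0, 10: 6.0}  # commodity ToR switch, per port

def passive_power(n_servers, rate_gbps, scheme=1, extra_rx=0.5):
    p = n_servers * P_NIC_OPT[rate_gbps]
    if scheme == 2:                  # Scheme II: one extra receiver per server
        p += n_servers * extra_rx
    return p                         # passive ToR components consume nothing

def commodity_power(n_servers, rate_gbps, uplinks=4):
    return (n_servers * P_NIC_EL[rate_gbps]
            + (n_servers + uplinks) * P_SWITCH_PORT[rate_gbps])

n, r = 48, 10
saving = 1 - passive_power(n, r) / commodity_power(n, r)
print(f"illustrative energy saving: {saving:.0%}")
```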


Fig. 5 Results of power consumption: (a) as a function of the number of servers within the rack with bit rate of 10 Gb/s and (b) as a function of bit rate for a rack with 48 servers.


4.2 Analysis of cost

Methodology and input data — In terms of cost, the commodity switch, which is commercially available, can currently be much cheaper than the passive optical interconnects (due to the lack of mass production). Therefore, we consider the cost values with a view toward the future. The values of the optical components considered in our proposed schemes are obtained from [26, 27], where they are predicted for next-generation optical access network systems in the Horizon 2020 timeframe. To make the evaluation comparable, the cost listed in Table 3 for the commodity switch is estimated based on recent prices (i.e., those of 2014), assuming a 25% reduction per year to obtain the Horizon 2020 values [28]. We use a basic cost unit (CU) defined as the cost of a 1Gb/s optical network interface card with a fixed transmitter, which is estimated at around 50 USD [26]. Moreover, the cost of the cables/fibers is much lower than that of the network equipment, and hence is not included in our calculations. We note that the results presented in this section are based on input received from vendors and operators. These data could vary depending on the country, the time, the manufacturer, the scaling ratio, etc. Therefore, we also include a sensitivity analysis, which studies the impact of input data variation on the cost assessment.
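
For reference, extrapolating the 2014 commodity-switch prices to the Horizon 2020 timeframe with the assumed 25% yearly reduction amounts to

```latex
C_{2020} = (1 - 0.25)^{6}\,C_{2014} = 0.75^{6}\,C_{2014} \approx 0.18\,C_{2014}.
```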

Results — Fig. 6 shows the cost results for our proposed passive optical interconnects as well as for the commodity switch based approach. In contrast to power consumption, the cost advantage of the passive optical interconnects over their electronic counterpart is not as obvious. The proposed schemes using wavelength tunable transmitters are more expensive than the commodity switch. However, it can be clearly seen that this difference shrinks as the data rate increases. Therefore, we expect that a more obvious cost benefit for these two schemes might appear in case a higher data rate (e.g., 40Gb/s and beyond) is needed. Meanwhile, Scheme III is the least costly solution among all the evaluated approaches at a data rate of 10Gb/s, benefiting from the lower price of an optical network interface card with a fixed transmitter.


Fig. 6 Results of cost: (a) as a function of the number of servers within the rack with bit rate of 10 Gb/s and (b) as a function of bit rate for a rack with 48 servers.


Figure 7 shows the sensitivity analysis of the cost assessment, where we divide the total cost of each scheme into two parts, namely in-server and out-of-server. The in-server part includes the network interface card (NIC) at each server, while the out-of-server part covers the components for interconnection at ToR. The results show the total cost variation when the expense of the components in the in-server or out-of-server part varies in the range between −20% and +20%. All three passive optical interconnects perform very similarly. When the number of servers increases, the total cost becomes more sensitive to the cost of the in-server part for all the considered schemes. For the commodity switch, the total cost varies by 8.1%–13.4% under a 20% cost variation of the in-server part when the number of servers per rack increases from 24 to 48, while this value ranges from 12.1%/13.5%/13.6% to 16.4%/17.2%/17.3% for the three passive solutions, respectively. Meanwhile, at higher data rates the impact of the in-server part on the total cost increases for all three considered passive optical interconnects, while it decreases for the commodity switch. This is because the cost increase of the commodity switch (i.e., the out-of-server part) when handling 10Gb/s per port instead of 1Gb/s per port is much higher than the cost rise of the in-server part. On the other hand, the passive components for interconnection are not sensitive to the data rate. Therefore, increasing the data rate only affects the cost of the in-server part for the passive solutions.
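
These sensitivity figures follow directly from the share s_in of the in-server part in the total cost: a relative variation δ of that part changes the total cost by s_in·δ. For example, the 13.4% variation reported for the commodity switch with 48 servers corresponds to an in-server cost share of roughly 67%:

```latex
\frac{\Delta C_{\mathrm{total}}}{C_{\mathrm{total}}} = s_{\mathrm{in}}\,\delta,
\qquad
0.67 \times 20\% \approx 13.4\%.
```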


Fig. 7 Sensitivity analysis of cost assessment (a) with different numbers of servers per rack at data rate of 10Gb/s and (b) at different data rates per server with 48 servers per rack


5. Conclusions and future work

This paper focuses on the edge tier of datacenter networks and investigates different architectural options that inherit the advantages of utilizing passive optical components to interconnect the servers within a rack, such as high capacity and energy efficiency. Moreover, the proposed passive optical interconnects could coordinate with optical aggregation/core switches to realize an all-optical datacenter network handling the traffic generated by a large number of servers. We have experimentally validated the feasibility of the proposed schemes and analyzed the results for both intra- and inter-rack communications. The transmission performance has been evaluated in terms of total link loss, BER, eye diagrams and optical spectra. Scheme III, inspired by [21], has a much higher insertion loss for inter-rack connections compared to the proposed Schemes I and II, making it difficult to connect directly to optical aggregation/core switches. Besides, we have also carried out an assessment of power consumption and cost along with a sensitivity analysis. Our results show that the proposed passive optical interconnects are much more energy-efficient than commodity switches based on current technologies, while maintaining the cost at a similar level in the Horizon 2020 vision, particularly for cases with higher data rates. The sensitivity analysis indicates that the network interface card is the key factor in the overall cost of the presented passive optical interconnects at ToR.

Some existing control plane architectures (e.g., the one proposed in [29]) can be used for the proposed passive optical interconnects at ToR. However, a tailored control protocol and bandwidth allocation algorithm still need to be developed in order to achieve good network performance. It should be noted that for many commercial tunable transceivers, including those used in our experiments, the average wavelength tuning time is still on the order of milliseconds, which could affect network performance, e.g., bandwidth utilization, delay and jitter. There have been many studies on wavelength tunable transceivers with fast tuning times, such as the sampled grating distributed Bragg reflector (SG-DBR) laser [30, 31] and the V-cavity laser [32], where the wavelength switching time can be much shorter, e.g., on the order of microseconds or nanoseconds. On the other hand, as mentioned before, the passive components between the servers do not introduce any processing or queuing delay, and hence we expect that the average latency of the passive optical interconnects could still be lower than that of conventional commodity switch based solutions. In order to fully explore this issue, we will further study the design of a control protocol and bandwidth allocation algorithm that can schedule resources in both the wavelength and time domains to efficiently accommodate bursty traffic and mitigate the effect of the tuning speed of the WTT and WTF on network performance.

Acknowledgment

The authors would like to thank Ting Lu, Hao Dai and Runzhou Zhang at Zhejiang University for their help with the experimental setup. The work described in this paper was carried out with the support of the National High Technology Research and Development Program (863) of China (No. 2012AA012201), the project "Enabling Scalable and Sustainable Data Center Networks", funded by the Swedish Foundation for Strategic Research, and the project "Towards Flexible and Energy-Efficient Datacentre Networks", funded by the Swedish Research Council.

References and links

1. G. Astfalk, "Why optical data communications and why now?" Appl. Phys., A Mater. Sci. Process. 95(4), 933–940 (2009).

2. C. Kachris and I. Tomkos, "A survey on optical interconnects for data centers," IEEE Commun. Surv. Tut. 14(4), 1021–1036 (2012).

3. K. Kant, "Data center evolution: a tutorial on state of the art, issues, and challenges," Comput. Netw. 53(17), 2939–2965 (2009).

4. Green Data Project, "Where does power go?" (Green Data Project, 2008), http://www.greendataproject.org.

5. S. Lambert, W. Van Heddeghem, W. Vereecken, B. Lannoo, D. Colle, and M. Pickavet, "Worldwide electricity consumption of communication networks," Opt. Express 20(26), B513–B524 (2012).

6. N. Farrington, G. Porter, S. Radhakrishnan, H. H. Bazzaz, V. Subramanya, Y. Fainman, G. Papen, and A. Vahdat, "Helios: a hybrid electrical/optical switch architecture for modular data centers," in Proceedings of ACM SIGCOMM (2010), pp. 339–350.

7. H. Ma, X. Yang, H. Mehrvar, Y. Wang, S. Li, A. Graves, D. Wang, H. Y. Fu, D. Geng, D. Goodwill, and E. Bernier, "Hybrid photonic ethernet switch for datacenters," in Optical Fiber Communication Conference (2014), paper M3E.6.

8. Y. Yin, R. Proietti, X. Ye, C. J. Nitta, V. Akella, and S. J. B. Yoo, "LIONS: an AWGR-based low-latency optical switch for high-performance computing and data centers," IEEE J. Sel. Top. Quant. 19(2), 3600409 (2013).

9. M. Wang, L. Jun, D. L. Stefano, D. Harm, and C. Nicola, "Novel flat datacenter network architecture based on scalable and flow-controlled optical switch system," in 39th European Conference and Exhibition on Optical Communication (2013), pp. 1266–1268.

10. K. Chen, A. Singla, A. Singh, K. Ramachandran, L. Xu, Y. Zhang, X. Wen, and Y. Chen, "OSA: an optical switching architecture for data center networks with unprecedented flexibility," IEEE/ACM Trans. Netw. 22(2), 498–511 (2014).

11. Y. Shu, G. Zervas, Y. Yan, S. Peng, S. Yan, E. Hugues-Salas, and D. Simeonidou, "Programmable optical packet/circuit switched data centre interconnects: traffic modeling and evaluation," in 40th European Conference and Exhibition on Optical Communication (2014), pp. 1–3.

12. M. Fiorani, S. Aleksic, M. Casoni, L. Wosinska, and J. Chen, "Energy-efficient elastic optical interconnect architecture for data centers," IEEE Commun. Lett. 18(9), 1531–1534 (2014).

13. T. Benson, A. Akella, and D. Maltz, "Network traffic characteristics of data centers in the wild," in Proceedings of ACM SIGCOMM (2010), pp. 267–280.

14. X. Yin, B. Moeneclaey, X. Z. Qiu, J. Verbrugghe, K. Verheyen, J. Bauwelinck, J. Vandewege, M. Achouche, and Y. Chang, "A 10Gb/s APD-based linear burst-mode receiver with 31dB dynamic range for reach-extended PON systems," Opt. Express 20(26), B462–B469 (2012).

15. B. Chen, J. Chen, and S. He, "Efficient and fine scheduling algorithm for bandwidth allocation in ethernet passive optical networks," IEEE J. Sel. Top. Quant. 12(4), 653–660 (2006).

16. J. Chen, B. Chen, and L. Wosinska, "Joint bandwidth scheduling to support differentiated services and multiple service providers in 1G and 10G EPONs," J. Opt. Commun. Netw. 1(4), 343–351 (2009).

17. Y. Luo, X. Zhou, F. Effenberger, X. Yan, G. Peng, Y. Qian, and Y. Ma, "Time- and wavelength-division multiplexed passive optical network (TWDM-PON) for next-generation PON Stage 2 (NG-PON2)," J. Lightwave Technol. 31(4), 587–593 (2013).

18. P. Vetter, "Next generation optical access technologies," in European Conference and Exhibition on Optical Communication, Amsterdam (2012), paper Tu.3.G.

19. L. Yi, Z. Li, M. Bi, W. Wei, and W. Hu, "Symmetric 40-Gb/s TWDM-PON with 39-dB power budget," IEEE Photonic Tech. L. 25(7), 644–647 (2013).

20. Y. Gong, Y. Lu, X. Hong, S. He, and J. Chen, "Passive optical interconnects at top of the rack for data center networks," in 2014 International Conference on Optical Network Design and Modeling (2014), pp. 78–83.

21. P. Green, Jr., L. A. Coldren, K. M. Johnson, J. G. Lewis, C. M. Miller, J. F. Morrison, R. Olshansky, R. Ramaswami, and E. H. Smith, "All-optical packet-switched metropolitan-area network proposal," J. Lightwave Technol. 11(5), 754–763 (1993).

22. IEEE Standard 802.3ah, "IEEE Standard for Information technology–Local and metropolitan area networks–Part 3: CSMA/CD Access Method and Physical Layer Specifications Amendment: Media Access Control Parameters, Physical Layers, and Management Parameters for Subscriber Access Networks," 2004.

23. G. Kramer, B. Mukherjee, and G. Pesavento, "IPACT: a dynamic protocol for an ethernet PON (EPON)," IEEE Commun. Mag. 40(2), 74–80 (2002).

24. N. Farrington, A. Forencich, G. Porter, P.-C. Sun, J. Ford, Y. Fainman, G. Papen, and A. Vahdat, "A multiport microsecond optical circuit switch for data center networking," IEEE Photonic Tech. L. 25(16), 1589–1592 (2013).

25. B. Mason, S. Chandrasekhar, A. Ougazzaden, C. Lentz, J. M. Geary, L. L. Buhl, L. Peticolas, K. Glogovsky, J. M. Freund, L. Reynolds, G. Przybylek, F. Walters, A. Sirenko, J. Boardman, T. Kercher, M. Rader, J. Grenko, D. Monroe, and L. Ketelsen, "Photonic integrated receiver for 40 Gbit/s transmission," Electron. Lett. 38(20), 1196–1197 (2002).

26. OASE Project 2010–2013, "Optical Access Seamless Evolution," European Community's Seventh Framework Programme (FP7/2010–2013) under grant agreement no. 249025 (ICT-OASE).

27. DISCUS Project 2012–2015, "The Distributed Core for unlimited bandwidth supply for all Users and Services," European Community's Seventh Framework Programme (FP7/2012–2015) under grant agreement no. 318137 (ICT-DISCUS).

28. M. Quagliotti and L. A. B., Telecom Italia, Via Olivetti 6, 10148 Torino, Italy (personal communication, 2014).

29. M. Fiorani, S. Aleksic, M. Casoni, L. Wosinska, and J. Chen, "Energy-efficient elastic optical interconnect architecture for data centers," IEEE Commun. Lett. 18(9), 1531–1534 (2014).

30. F. Delorme, G. Alibert, C. Ougier, S. Slempkes, and H. Nakajima, "Sampled-grating DBR lasers with 181 wavelengths over 44 nm and optimized power variation for WDM applications," in Optical Fiber Communication Conference and Exhibit (1998), pp. 379–381.

31. M. Ogusu, K. Ide, and S. Ohshima, "Fast and precise wavelength switching of an SG-DBR laser for 1.07-b/s/Hz DWDM systems," in Optical Fiber Communication Conference (2005), paper OTuE4.

32. S. Zhang, J. Meng, S. Guo, L. Wang, and J. J. He, "Simple and compact V-cavity semiconductor laser with 50×100 GHz wavelength tuning," Opt. Express 21(11), 13564–13571 (2013).
