
Making path selection faster: a routing algorithm for ONoC

Open Access

Abstract

Optical network-on-chip (ONoC) is an effective communication architecture for achieving high performance and energy efficiency. Diverse routing algorithms have been proposed to avoid congestion, tolerate faults, and reduce insertion loss or energy consumption. However, existing algorithms do not consider the optical circuit-switching characteristic of ONoC, which aggravates network congestion and severely degrades performance. In this paper, by exploiting a congestion prediction technique, we propose a new routing algorithm for ONoC, named loophole-routing, to improve the success rate of path setup and decrease latency. The congestion prediction technique is used to analyze the latency and predict the port conditions caused by network congestion. Theoretical analysis and experimental results for different synthetic traffic patterns show that loophole-routing improves network latency over XY routing by 15.56%, 25.71%, 18.92%, 66.67% and 42.86% under the uniform, hotspot1, hotspot2, transpose2 and transpose3 traffic patterns, respectively, while improving the saturation throughput over XY routing by 31.43%, 34.33%, 35.29%, 67.86% and 99.5% under the same traffic patterns on average. In addition, the proposed loophole-routing offers high path diversity and adaptive degree, low computing complexity and overhead, and the potential to make fault-tolerant path selections.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

With the development of silicon nanophotonics technology for on-chip communication, ONoC is becoming one of the emerging technologies for future high-performance many-core processors [1–6]. ONoC can overcome the performance limitations faced by ENoC (Electronic Network-on-Chip) by offering high bandwidth, power efficiency, and lower latency [7–12].

ENoC design has a long history of effectively using hardware description languages (such as Verilog and VHDL) and CMOS technology to manage design complexity. The various implementations of ONoC can be divided into two categories: Optical Packet Switching (OPS) and Optical Circuit Switching (OCS) [2–4,12]. When a router in an OPS-based network receives a packet, it typically stores the packet, performs arbitration, and forwards the data to the next router; this behavior is described by the term Store-and-Forward (SaF). Because there is no practical module to store and process packets in the optical domain, optical packet switching remains impractical. Although optical packet switching offers ultra-high bandwidth and low latency, ONoC still uses optical circuit switching to transfer messages because light cannot be buffered or logically processed on chip.

Optical circuit switching proceeds in three states [2,4,13–16]: the optical path-setup state, the optical signal transmission state, and the optical path-teardown state. The optical path-setup state is a process of link bandwidth reservation, in which the two ends acknowledge each other through a request packet and a response packet. In the optical signal transmission state, the configuration of the selected path is fixed; if the nodes along this path do not receive a path-teardown packet, the path remains reserved. In the optical path-teardown state, the source node sends the path-teardown packet, and the reserved optical path is released for other node pairs. This implementation of OCS is generally used in ONoC architectures. When a communication pair occupies a path, other pairs of communicating nodes cannot use any link of that path [2,4,14,16]. Consequently, the path-setup packets of other communication pairs are blocked if their routing paths overlap with the occupied path, which incurs high network congestion and resource contention. It is therefore important to choose appropriate paths for packets. However, most existing routing algorithms were designed for ENoC and do not take the characteristics of OCS-based ONoC into consideration. Under high traffic load, the characteristics of OCS cause more severe network congestion and performance degradation than in ENoC. Therefore, routing algorithms that consider the characteristics of OCS need to be carefully designed.
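To make the three OCS phases concrete, the following minimal Python sketch models a single node pair reserving, using, and releasing the links of a path; the class and method names (OpticalCircuitNetwork, setup_path, and so on) are illustrative assumptions rather than part of any ONoC implementation described here.

```python
# Minimal sketch (hypothetical names) of the three OCS phases:
# path setup reserves every link, transmission uses the fixed path,
# and teardown releases the links for other node pairs.
class OpticalCircuitNetwork:
    def __init__(self):
        self.reserved = set()          # links currently held by some circuit

    def setup_path(self, path_links):
        """Reserve all links; fail (blocked setup packet) if any is occupied."""
        if any(link in self.reserved for link in path_links):
            return False               # overlaps an existing circuit -> contention
        self.reserved.update(path_links)
        return True                    # stands in for the destination's response packet

    def transmit(self, path_links, payload):
        """Optical transmission over the fixed, already-configured path."""
        assert all(link in self.reserved for link in path_links)
        return len(payload)            # stand-in for the optical data transfer

    def teardown_path(self, path_links):
        """Release the circuit so other node pairs can use these links."""
        self.reserved.difference_update(path_links)


net = OpticalCircuitNetwork()
path = [((0, 0), (0, 1)), ((0, 1), (0, 2))]   # links of a 2-hop path
if net.setup_path(path):
    net.transmit(path, b"data")
    net.teardown_path(path)
```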

In this paper, we design a congestion-aware routing algorithm that considers the characteristics of OCS. The contribution of this paper includes three parts: 1) loophole-routing is proposed to alleviate congestion and latency by quickly selecting efficient paths for packets through the network; 2) a congestion-caused latency model is established to predict the router latency of packets, and the proposed loophole-routing applies this prediction and learning method to formulate and optimize routing decisions; 3) simulation results show that loophole-routing achieves lower latency and improves throughput by 10%, 16.7%, 15.8% and 9.5% compared with existing popular routing algorithms under four kinds of traffic patterns.

The rest of the paper is organized as follows. In Sect. II, we review routing algorithms proposed to lower congestion, reduce optical loss, and save power consumption in 2-D mesh-based ONoC. Sect. III provides the preliminaries of this paper, analyzes the components of end-to-end latency in ONoC, and establishes the congestion-caused latency model. In Sect. IV, we introduce the new routing algorithm in detail. Sect. V presents theoretical evaluations of this algorithm. Simulation results compared with XY routing are presented in Sect. VI. Sect. VII concludes the paper and discusses future research.

2. Related work

Recently, many routing algorithms have been proposed to handle unbalanced traffic, power consumption, and faults in NoCs, but routing algorithms specifically designed for ONoC are still not well studied. Routing algorithms for ONoC can be divided into two categories: traditional ONoC-compatible routing algorithms adopted from ENoC, and routing algorithms specially designed by considering the characteristics of ONoC. Traditional ONoC-compatible algorithms such as XY routing are widely used in ONoC because of their simplicity, deadlock-freedom, and livelock-freedom. XY routing is a deterministic routing in which packets first travel along the x dimension and then along the y dimension [12], and it is simple to implement. Compared with deterministic routing, adaptive routing can provide more paths for packets to bypass hotspots and balance the traffic load. Routing algorithms based on turn models are typical adaptive routing algorithms [17–19]; they are deadlock-free because they restrict turns. The basic idea of the Odd–Even turn model [20,21] is to limit turns to certain directions. The DyXY algorithm, which uses XY or YX routing according to the congestion, was proposed by Ming Li in [22].

In addition to these customary ONoC-compatible routing algorithms, there are also many methods to determine paths in ONoC. PROTON [23] and PROTON+ [24] are automatic tools for placement and routing on ONoC topologies. They place micro-rings (MRs) with appropriate spacing and route waveguides to minimize insertion loss. PROTON+ is an extension of PROTON dedicated to placement and routing in 3D ONoC with more flexibility. Similar work was also proposed in [25,26]. Tan et al. [9] proposed a novel photonic-electrical hybrid on-chip architecture and a wavelength-routed method based on optical interconnection. By using space-division multiplexing (SDM) and a hybrid wavelength-assignment method, this ONoC architecture achieves low end-to-end latency. Zhu et al. [27] proposed a new 7 × 7 non-blocking optical router for ONoCs, also designed on the basis of wavelength assignment, to improve the latency and throughput performance of ONoCs. The approaches above are all designed for wavelength-routed ONoC [9,13,27]. Lee et al. [28] proposed three routing algorithms for fat-tree-based ONoCs to reduce insertion loss and energy consumption by finding the path with minimum MR drop loss. Guo et al. [29–31] proposed several methods for 3D ONoCs, including fault-tolerant routing, optical router (OR) structure design, high-reliability mapping algorithm design, and analysis of insertion loss and crosstalk performance. For instance, the fault-tolerant routing FTRA-BL improves the reliability of ONoC by utilizing the bidirectional waveguides in disabled ORs as backup resources.

The routing algorithms in [32,33] mainly aim at reducing loss and improving thermal reliability. In [14], a routing technique is proposed to resolve path collisions; this algorithm requires a specific electrical router structure and flow-control mechanism to implement. An efficient hybrid switching mechanism [34] for ONoC systems was proposed to decrease contention in OCS-based ONoCs. The proposed mechanism allows data and control signals to be transferred over the circuit control network layer in order to reduce packet congestion and the consequent latency and power overheads.

3. Prediction model and effectiveness analysis

3.1 Latency in optical network-on-chip

In this section, we analyze the packet latency of optical network-on-chip to address the key problem of path selection in ONoC.

The common structure of the ONoC architecture is illustrated in Fig. 1(a). The structure is composed of three functional layers: a processing layer, an electronic control layer, and a photonic data layer. The processing layer is where the processing cores sit; the cores act as sources and sinks for all on-chip communications. The photonic data layer is the top layer, providing optical links between any pair of communicating cores. However, because the photonic layer cannot perform buffering and logical processing, an electronic control layer is provided to carry out configuration and routing. Therefore, the analysis of router latency is based on the electronic router, as shown in Fig. 1(b).


Fig. 1. a) Schematic of Optical Network-on-Chip, b) electronic router structure of ONoC.


The end-to-end latency of packets $T_{\mathrm{eteDelay}}$ in OCS-based ONoC can be formulated as follows:

$$T_{\mathrm{eteDelay}} = \frac{L_{\mathrm{path}}}{v_{\mathrm{wire}}} + \frac{L_{\mathrm{path\text{-}setup}}}{B_{\mathrm{electrical}}} + H \times T_{\mathrm{E\text{-}router}} + \frac{L_{\mathrm{path}}}{v_{\mathrm{waveguide}}} + \frac{L_{\mathrm{packet}}}{B_{\mathrm{optical}}} + T_{\mathrm{E/O}} + T_{\mathrm{O/E}}$$
where $L_{\mathrm{path}}$ is the length of the wires or waveguides between a source and a destination IP core in the network, $v_{\mathrm{wire}}$ is the propagation velocity in the wires of the electrical layer, $L_{\mathrm{path\text{-}setup}}$ is the size of the path-setup packet transmitted in the electrical network, and $B_{\mathrm{electrical}}$ is the electrical channel bandwidth. $H$ is the hop count of each packet, which is determined by the routing algorithm. $T_{\mathrm{E\text{-}router}}$ is the processing latency of the path-setup packet in a single electrical router. $v_{\mathrm{waveguide}}$ is the propagation velocity in the waveguides of the optical layer, $L_{\mathrm{packet}}$ is the size of the data payload packet transmitted in the optical network, and $B_{\mathrm{optical}}$ is the bandwidth of the optical waveguide. $T_{\mathrm{E/O}}$ and $T_{\mathrm{O/E}}$ are the conversion times from electrical to optical signals and from optical to electrical signals, respectively.

The first component of $T_{\mathrm{eteDelay}}$ corresponds to the time spent traversing the electrical network layer. The second component corresponds to the serialization time for the path-setup packet transmitted across a channel of bandwidth $B_{\mathrm{electrical}}$. Of the remaining components, $L_{\mathrm{path}}/v_{\mathrm{waveguide}}$, $L_{\mathrm{packet}}/B_{\mathrm{optical}}$, $T_{\mathrm{E/O}}$ and $T_{\mathrm{O/E}}$ depend on known or fixed parameters and cannot be reduced by the routing algorithm, and $H$ is known once minimal routing is employed. $T_{\mathrm{E\text{-}router}}$, however, is the sum of the processing pipeline latency and the additional waiting latency due to network congestion, and this waiting latency can be optimized by a congestion-aware routing algorithm.
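For concreteness, Eq. (1) can be evaluated directly once the physical and architectural parameters are known. The short Python sketch below simply restates the formula; the function and argument names are our own, and no parameter values from the paper are assumed.

```python
def ete_delay(L_path, v_wire, L_path_setup, B_electrical,
              H, T_E_router, v_waveguide, L_packet, B_optical,
              T_EO, T_OE):
    """End-to-end latency of Eq. (1): electrical propagation and serialization
    of the path-setup packet, per-hop router processing, optical propagation
    and serialization of the payload, plus the two conversion times."""
    electrical = L_path / v_wire + L_path_setup / B_electrical
    routers = H * T_E_router
    optical = L_path / v_waveguide + L_packet / B_optical
    return electrical + routers + optical + T_EO + T_OE
```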

3.2 Prediction of port congestion latency

Since congestion in the path introduces additional latency and degrades performance, this section analyzes the router latency $T_{\mathrm{E\text{-}router}}$, i.e., the time consumed by path setup in a router. On this basis, the congestion-caused latency model is established.

When a packet is sent to a router, the routing computation unit in the router selects an output port for the packet. If this output port is free, the packet is sent immediately. If the output port is occupied, the packet waits in the output buffer, which is a FIFO queue, and packets are sent in first-in-first-out order. For a given packet, the router latency consists of two major delays: the delay without congestion and the delay caused by congestion. The delay without congestion ($T_{\mathrm{DWC}}$), which is the period during which all packets ahead of this packet occupy the output port, can be expressed as follows:

$$T_{\mathrm{DWC}} = t_0 + t_1 + t_2 + \ldots + t_{N_\mathrm{P}} = \sum\limits_{j = 0}^{N_\mathrm{P}} t_j$$

In Eq. (2), $N_\mathrm{P}$ is the number of packets waiting in the corresponding output buffer queue, $j$ is the serial number of a packet waiting in the queue, and $t_j$ is the time for which the output port is occupied by the $j$-th packet in the absence of congestion. This value is determined by the roll-around period, the hop count between the source and destination nodes, the modulator rate, and the detector rate. In other words, it is the time cost of contention with other packets in the local node: the incoming packet must wait for every packet ahead of it to be serviced by the output port and pass through the router before it can occupy the port itself.

Since contention also occurs in other routers, a packet usually has to wait longer than the expected $T_{\mathrm{DWC}}$. The time exceeding $T_{\mathrm{DWC}}$, which we define as the delay caused by congestion ($T_{\mathrm{DBC}}$), is too complicated to compute directly because of the many possible contention situations. In this paper, instead of computing it, we predict $T_{\mathrm{DBC}}$ from the historical delay caused by congestion.

Based on the analysis above, the congestion-caused latency information model of an output port can be established. For a packet whose candidate output ports are all occupied, the model is evaluated for every candidate output port. This model obtains path congestion knowledge by taking both $T_{\mathrm{DWC}}$ and $T_{\mathrm{DBC}}$ into account; that is, it considers the waiting time caused by packets served before the incoming packet as well as the waiting time caused by network contention in other nodes. Therefore, the congestion-caused latency information model has a high potential to capture congestion properties and improve selection quality. The model is given as follows:

$$T_{\mathrm{E\text{-}router}} = T^P = T_{\mathrm{DWC}}^P + T_{\mathrm{DBC}}^P$$

In Eq. (3), $P$ is the serial number of the output port, and $T^P$ is the expected time the incoming packet would have to wait to be sent through the $P$-th output port. $T_{\mathrm{DWC}}^P$ is the processing latency of all packets already waiting in the $P$-th output buffer when the incoming packet arrives, and $T_{\mathrm{DBC}}^P$ is the congestion latency caused by contention in other routers. $T_{\mathrm{DWC}}^P$ is given by

$$T_{\mathrm{DWC}}^P = t_0^P + t_1^P + t_2^P + \ldots + t_{N_\mathrm{c}^P}^P = \sum\limits_{j = 0}^{N_\mathrm{c}^P} t_j^P$$

In Eq. (4), $N_\mathrm{c}^P$ is the number of packets waiting in the $P$-th output buffer corresponding to the $P$-th output port, $j$ is the serial number of a packet waiting in the $P$-th output queue, and $t_j^P$ is the holding time for which the $P$-th output port is occupied by the $j$-th of the $N_\mathrm{c}^P$ packets in the absence of congestion. The delay caused by congestion is modeled as follows:

$$T_{\mathrm{DBC}}^P = T_\mathrm{R} - t_0^P - t_1^P - \ldots - t_{N_\mathrm{f}^P}^P = T_\mathrm{R} - \sum\limits_{j = 0}^{N_\mathrm{f}^P} t_j^P$$

In Eq. (5), $N_\mathrm{f}^P$ is the number of packets that were waiting in the $P$-th output buffer queue when the packet currently occupying the $P$-th output port arrived at the local node, $j$ has the same meaning as in Eqs. (2) and (4), and $t_j^P$ is the time for which the $P$-th output port is occupied by the $j$-th of these $N_\mathrm{f}^P$ packets in the absence of congestion. $T_\mathrm{R}$ is the real waiting time experienced by the packet currently occupying the $P$-th output port, measured from the moment it entered the $P$-th output buffer queue until it was serviced. Hence, the model can estimate the level of congestion by collecting the historical latency of each output port. Based on this congestion information, the output port for the incoming packet is selected for better latency performance.
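The per-port model of Eqs. (3)–(5) can be summarized in code. The sketch below is a minimal, hypothetical data layout (class and field names are ours): each port keeps the holding times of the packets currently queued, which gives $T_{\mathrm{DWC}}^P$ by Eq. (4), and the most recently observed congestion overshoot of Eq. (5) is stored as the predicted $T_{\mathrm{DBC}}^P$.

```python
# Minimal sketch (hypothetical data layout) of the per-port latency model of
# Eqs. (3)-(5). For each output port we keep the holding times of the packets
# currently queued (for T_DWC) and the last observed congestion overshoot
# (for T_DBC, which is predicted from history instead of computed).
class PortCongestionModel:
    def __init__(self, num_ports):
        self.queued_holding_times = [[] for _ in range(num_ports)]  # t_j^P of waiting packets
        self.last_dbc = [0.0] * num_ports                           # historical T_DBC^P

    def t_dwc(self, p):
        """Eq. (4): time until all packets already queued at port p are served."""
        return sum(self.queued_holding_times[p])

    def predict(self, p):
        """Eq. (3): predicted router latency if the incoming packet uses port p."""
        return self.t_dwc(p) + self.last_dbc[p]

    def record_service(self, p, real_wait, expected_wait):
        """Eq. (5): when a packet finally seizes port p, its measured waiting
        time T_R minus the no-congestion estimate becomes the new T_DBC^P."""
        self.last_dbc[p] = max(0.0, real_wait - expected_wait)
```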

3.3 Effectiveness of the predicted port latency

In this section, we demonstrate the accuracy of the port congestion latency prediction model through network simulation and analysis of port congestion.

To verify the usefulness of the historical delay caused by congestion, we evaluate an 8×8 mesh ONoC with OPNET, a general network simulator, and collect and analyze the output port delay caused by congestion, as shown in Fig. 2 and Fig. 3.


Fig. 2. Output port delay caused by congestion of port1 and port2 of R(4,4).



Fig. 3. Output port delay caused by congestion of port0 and port3 of R(4,4).


These two figures show the output port delay of packets waiting at the corresponding port because of congestion. In Fig. 2 and Fig. 3, the x-axis is the sequence number of the packet and the y-axis is the output congestion latency of the corresponding packet. In Fig. 2, the port congestion latency of two ports is collected.

R(4,4) denotes the router with coordinates (4,4). The line marked 4.4.1 shows the port congestion latency of port1 of router R(4,4), and the line marked 4.4.2 shows the port congestion latency of port2 of R(4,4). In this simulation experiment, port1 and port2 are two adjacent ports of R(4,4). From Fig. 2, we note that when congestion is low, the output congestion latency of a port is low and its trend is gentle (solid line). When congestion increases, the output congestion latency of the corresponding port rises rapidly, and even though its trend fluctuates strongly (dotted line), it remains higher than the solid line. Hence, if these two ports are candidate ports for a packet being transmitted, selecting the port with the lower output congestion latency saves waiting time in the router.

In Fig. 3, the port congestion latency of port0 and port3 is collected for R(4,4); the line marked 4.4.3 shows the port congestion latency of port3 of router R(4,4). Similarly, port0 and port3 are two adjacent ports of R(4,4). From Fig. 3, we can conclude that high congestion tends to persist for a while, and low congestion changes slowly. It can therefore be observed that selecting a path according to the port congestion latency is generally valid.

4. Congestion-aware loophole-routing algorithm

The congestion-aware loophole-routing algorithm consists of a deadlock-free routing function and a congestion-aware selection function. Deadlock freedom is achieved by employing the retransmission mechanism in [16]. Since loophole-routing is a shortest-path routing, livelock is prevented.

Figure 4 shows the complete congestion-aware selection function of the proposed loophole-routing. The input is the addresses of the destination node and the local node; the output is the computed output port for the incoming packet. For each incoming packet, the algorithm initializes the available output port set $\varphi$ by the rule of minimal routing (lines 2-6) according to the position relationship between the destination node and the local node. Loophole-routing then examines the number of candidate ports in set $\varphi$. If there is only one candidate port in $\varphi$, it is selected as the output port for the packet (lines 7-9). If there are two candidate ports in $\varphi$, loophole-routing checks the idle state of both. If both candidate ports are free, loophole-routing selects the port corresponding to the output direction used in the upstream node (lines 11-13). For example, if the incoming packet arrived from the east port, its output direction in the upstream node was west, so the west output port is selected. This choice does not change the direction of the packet, which reduces the number of turns in the path; in most optical routers the insertion loss is primarily caused by changes of direction, so reducing turns reduces the loss. If only one candidate port is free, loophole-routing selects the free port as the output port (lines 14-16). If both candidate ports are occupied, loophole-routing computes the probable latency of the two candidate ports with the congestion-caused latency model of Eqs. (3)-(5), denoted $T^1$ and $T^2$ (lines 17-19). To make a trade-off between latency and loss, the congestion dispersion rate $k$ of the two candidate ports is computed by:

$$k = \frac{|T^1 - T^2|}{\max \{T^1, T^2\}} \times 100\%$$
$K$ is an adjustable threshold that sets the trade-off between latency and loss: the smaller $K$ is, the lower the delay of the selected port. It can be modified to obtain the desired trade-off between latency and loss. If loss or power consumption is the key metric, $K$ can be set to 1 to obtain better loss performance; if network latency is the key metric, $K$ can be set to 0 to lower the network latency. If $k$ is greater than $K$, the port with the lower probable latency is selected as the output port; otherwise, the algorithm selects the port corresponding to the direction used in the upstream node to reduce the communication loss.


Fig. 4. The pseudo-code of congestion-aware selection function.

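The selection logic of Fig. 4 can be paraphrased as the following Python sketch. The helpers candidate_ports() and is_free(), the port encoding, and the tie-breaking used when the upstream direction is not among the candidates are our assumptions; only the branching structure follows the description above.

```python
# Sketch of the congestion-aware selection function (paraphrasing Fig. 4).
# candidate_ports() returns the minimal-routing output ports; is_free() reports
# whether a port is idle; model is a PortCongestionModel-like predictor.
def select_output_port(local, dest, upstream_dir, model, K, is_free, candidate_ports):
    ports = candidate_ports(local, dest)          # minimal-routing candidates
    if len(ports) == 1:
        return ports[0]                           # only one minimal direction

    p1, p2 = ports
    free1, free2 = is_free(p1), is_free(p2)
    if free1 and free2:
        # Keep going straight to avoid a turn (turns dominate insertion loss).
        return upstream_dir if upstream_dir in ports else p1
    if free1 != free2:
        return p1 if free1 else p2                # take the only free port

    # Both occupied: compare predicted latencies (Eq. (3)) and apply Eq. (6).
    T1, T2 = model.predict(p1), model.predict(p2)
    k = abs(T1 - T2) / max(T1, T2) if max(T1, T2) > 0 else 0.0
    if k > K:
        return p1 if T1 < T2 else p2              # latency wins the trade-off
    return upstream_dir if upstream_dir in ports else p1   # loss wins
```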

5. Characteristic of loophole-routing

5.1 Path diversity and adaptive degree of routing algorithm

Path diversity is the number of candidate paths from a source node to a destination node. If topology A provides more shortest paths than topology B between a given source and destination pair, then A has greater path diversity than B. The path diversity of a topology gives the routing algorithm more design space to balance the traffic load or to bypass faults in the network. Path diversity is influenced by three factors: the routing algorithm, the topology of the network, and the distance between the given source and destination pair. When the topology is fixed, the path diversity can be defined by the following function:

$${R_{s\text{-}d}} = f(R, D(source\_node, destination\_node))$$
where $R_{s\text{-}d}$ is the path diversity of routing algorithm $R$ and $D$ is the minimum distance between the given source and destination node pair. From the perspective of routing algorithms, relaxing the restrictions on turns increases the path diversity; the farther apart the source-destination pair is, the larger $R_{s\text{-}d}$ becomes.

XY routing is a deterministic routing and provides only one path for each transmission. The turn models in [17–19] have higher path diversity than XY routing; however, they still place restrictions on the positions of turns and thus retain only part of the path diversity. The proposed loophole-routing places no restriction on path selection other than considering the congestion situation of the network. It therefore provides better path diversity than the other routing algorithms and increases the effectiveness of path setup in the network. Because of its minor restrictions on path selection, it also has a high adaptive degree for finding a free path.
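As a concrete instance of Eq. (7) on a 2-D mesh, a fully adaptive minimal routing can use any of $\binom{\Delta x + \Delta y}{\Delta x}$ shortest paths between nodes that are $\Delta x$ columns and $\Delta y$ rows apart, whereas deterministic XY routing always has exactly one. The sketch below merely evaluates these two counts; it is an illustration, not part of the paper's formalism.

```python
from math import comb

def minimal_path_diversity(src, dst, routing="adaptive"):
    """Number of shortest paths on a 2-D mesh between src=(x1,y1) and dst=(x2,y2).
    Fully adaptive minimal routing: C(dx+dy, dx).  Deterministic XY routing: 1."""
    dx, dy = abs(dst[0] - src[0]), abs(dst[1] - src[1])
    return 1 if routing == "xy" else comb(dx + dy, dx)

print(minimal_path_diversity((0, 0), (3, 2)))        # 10 candidate shortest paths
print(minimal_path_diversity((0, 0), (3, 2), "xy"))  # 1
```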

5.2 Computing complexity and implementation overhead

An adaptive routing algorithm can improve network performance, but it usually comes at the cost of time overhead. Therefore, it is important to evaluate the time efficiency of each routing algorithm, which can be represented by the computing complexity of the algorithm.

Here we use the number of loops in each algorithm to quantify the computing complexity. For XY routing, the computing complexity is $O(n^2)$ because it contains two loops. The computing complexity of both loophole-routing and OE-turn routing is $O(n^3)$. Thus, the time overheads of these two algorithms are the same, and both are higher than that of XY routing.

Regarding the implementation overhead of the proposed routing algorithm, we draw the following conclusion. Loophole-routing is a distributed routing algorithm: it does not need a central controller to collect real-time network information in order to make decisions. It is implemented by adding a few bits to packets to carry the predicted latency information, so the hardware overhead is also acceptable.

5.3 Fault-tolerant capability of the routing algorithm

Two key factors contribute to performance and reliability concerns in ONoCs: fabrication-induced process variation and temperature fluctuation at run time. The main optical devices in ONoCs, microring resonators (MRs), are very sensitive to thermal and process variations, both of which may shift the resonance wavelength. Therefore, when variations occur in the fabrication process or in the temperature of the network, the affected MRs may no longer transmit messages correctly, which leads to low bandwidth or even a complete breakdown of the whole ONoC system. As a result, the capability to tolerate different kinds of faults is necessary for a reliable ONoC system.

The loophole-routing presented in this paper also has the potential to make fault-tolerant path selections when the network has component failures. Faulty nodes or links in the network can be detected by existing techniques. Assuming the fault information of the network is available, the routing can exclude paths containing faulty nodes and select a reliable path without any faulty node or link. In other words, loophole-routing can also serve as a fault-tolerant routing algorithm.

6. Simulation performance evaluation

6.1 Simulation environment

We evaluate the loophole-routing mechanism with OPNET, a network simulator. In this section, different packet injection rates, different mesh sizes, six traffic patterns, and different routing algorithms for the OCS-based mesh network are used for performance analysis. The end-to-end latency (ETE latency), maximum saturation throughput, and power consumption are the metrics used to evaluate our routing algorithm and the traditional methods. The loophole-routing scheme is compared with XY routing and OE-turn routing under the uniform, three transpose, and two hotspot traffic patterns. The packet size is configured to be 1024 bits. The acknowledgement packet size and the path-setup packet size are assumed to be 32 bits. The roll-around period is set to 3 ns. The clock frequency is 1 GHz, and the modulation speed is 12.5 Gb/s. We set K to 0 to achieve ultra-low latency.

The different traffic patterns are defined as follows. In the uniform traffic pattern, a communication node sends messages to every other node in the network with equal probability. We simulate three transpose patterns. In the first transpose traffic pattern, the node identified by coordinates (i, j) only sends messages to the node identified by (14-i, 14-j). In the second transpose pattern, called the matrix-transpose pattern, node (i, j) only sends messages to node (14-j, 14-i). In the third transpose traffic pattern, node (i, j) only sends messages to node (j, i). Hotspot traffic means that some nodes communicate with other nodes much more frequently. We simulate two types of hotspot traffic patterns: the first has only one hotspot node in the central area of the network, and the second has several hotspot nodes in the central area. Apart from the hotspot nodes, the other nodes communicate under the uniform traffic pattern.
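The destination mappings above can be expressed compactly. In the sketch below, the constant 14 from the description is kept as a parameter M, and the hotspot probability and node sets are hypothetical placeholders rather than values used in the simulations.

```python
# Illustrative destination mappings for the synthetic traffic patterns above.
# M = 14 follows the paper's wording; hotspot selection is a hypothetical example.
import random

def transpose_destination(i, j, kind, M=14):
    """Destination of node (i, j) under the three transpose patterns."""
    if kind == "transpose1":
        return (M - i, M - j)
    if kind == "transpose2":           # matrix-transpose pattern
        return (M - j, M - i)
    return (j, i)                      # transpose3

def hotspot_destination(i, j, nodes, hotspots, p_hot=0.2):
    """With probability p_hot send to a hotspot node, otherwise uniform traffic."""
    if random.random() < p_hot:
        return random.choice(hotspots)
    return random.choice([n for n in nodes if n != (i, j)])
```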

6.2 Analysis on end-to-end latency performance

In this section, we analyze the three routing algorithms under the uniform, three transpose, and two hotspot traffic patterns on 4 × 4, 6 × 6, and 8 × 8 mesh networks.

Figure 5 shows the end-to-end latency (ETE latency) versus offered load for loophole-routing, XY routing, and OE-turn routing under six traffic patterns, uniform, transpose1, hotspot1, hotspot2, transpose2, and transpose3, on a 4 × 4 mesh network. From Fig. 5 we can see that, for all six traffic patterns, the three routing algorithms have low latency under low offered load. However, as the offered load increases, loophole-routing still achieves low latency under the uniform, hotspot1, hotspot2, transpose2, and transpose3 traffic patterns, while XY routing and OE routing have already reached their latency saturation points. Under the uniform traffic pattern, loophole-routing improves the saturation latency by 15.56% compared to XY routing and 36.4% compared to OE routing. Under the hotspot1 traffic pattern, the improvements are 25.71% over XY routing and 37.5% over OE routing; under hotspot2, 18.92% over XY routing and 46.67% over OE routing; under transpose2, 66.67% over XY routing and 13.64% over OE routing; and under transpose3, 42.86% over XY routing and 60% over OE routing. The improved performance of loophole-routing mainly comes from its higher adaptiveness and faster path setup. Under the transpose1 traffic pattern, however, loophole-routing has higher latency than XY routing. This is explained by the fact that, in this traffic pattern, the probability of channel dependence cycles may be higher, which induces more deadlocks and additional latency from packet retransmissions.


Fig. 5. (a) The ETE latency vs. offered load under uniform, (b) transpose1, (c) hotspot1, (d) hotspot2, (e) transpose2, (f) transpose3 traffic pattern of 4 × 4 mesh.


Figure 6 shows the ETE latency of the three routing algorithms under different traffic patterns on a 6 × 6 mesh network. Similar to the 4 × 4 mesh network, the results for the 6 × 6 mesh indicate that, for the uniform, hotspot1, hotspot2, transpose2, and transpose3 traffic patterns, the proposed loophole-routing outperforms the XY and OE routing algorithms at high traffic load, while at low traffic load all three algorithms perform about the same. Loophole-routing improves the saturation latency by 21.74% compared to XY routing and 53.57% compared to OE routing under the uniform traffic pattern, by 31.82% and 45% under hotspot1, by 59.1% and 75% under hotspot2, by 76.5% and 25% under transpose2, and by 66.7% and 45% under transpose3, respectively.


Fig. 6. (a) The ETE latency vs. offered load under uniform, (b) transpose1, (c) hotspot1, (d) hotspot2, (e) transpose2, (f) transpose3 traffic pattern of 6 × 6 mesh.


Figure 7 shows the ETE latency performance of the three routing algorithms under different traffic patterns on an 8 × 8 mesh network. Similar to the 4 × 4 and 6 × 6 mesh networks, the simulation results for the 8 × 8 mesh show similar latency behavior relative to the XY and OE-turn routing algorithms under the six traffic patterns, and the proposed algorithm still outperforms the other two routing algorithms under five of them. Thus, it can be concluded that the proposed loophole-routing algorithm also scales to larger ONoCs.


Fig. 7. (a) The ETE latency vs. offered load under uniform, (b) transpose1, (c) hotspot1, (d) hotspot2, (e) transpose2, (f) transpose3 traffic pattern of 8 × 8 mesh.


6.3 Analysis on throughput performance

In this section, we analyze the saturation throughput of the three routing algorithms under the uniform, three transpose, and two hotspot traffic patterns on different mesh sizes: 4 × 4, 6 × 6, and 8 × 8.

Figure 8 shows the saturation throughput of loophole-routing, XY routing, and OE-turn routing under different traffic patterns. Consistent with the ETE-latency results, the throughput comparison shows the same trend. The proposed loophole-routing achieves higher saturation throughput than XY routing and OE-turn routing under the uniform, hotspot1, hotspot2, transpose2, and transpose3 traffic patterns, but lower saturation throughput under the transpose1 traffic pattern. The improvements of loophole-routing in saturation throughput over XY routing are 31.43%, 34.33%, 35.29%, 67.86%, and 99.5%, and over OE-turn routing 53.33%, 57.14%, 28%, 2.17%, and 17.95%, under the uniform, hotspot1, hotspot2, transpose2, and transpose3 traffic patterns, respectively.


Fig. 8. The saturation throughput of three routing algorithms under different traffic patterns of 4 × 4 mesh.


Figure 9 and Fig. 10 show the throughput results of the three routing algorithms under different traffic patterns on the 6 × 6 and 8 × 8 mesh networks. Similar to the 4 × 4 mesh network, the results for these two mesh sizes show that, for the uniform, hotspot1, hotspot2, transpose2, and transpose3 traffic patterns, the proposed loophole-routing outperforms the XY and OE-turn routing algorithms. As the network scale grows, the saturation throughput of all three algorithms decreases under the different traffic patterns, but the proposed algorithm maintains higher saturation throughput at larger scales than XY and OE-turn routing.


Fig. 9. The saturation throughput of three routing algorithms under different traffic patterns of 6 × 6 mesh.



Fig. 10. The saturation throughput of three routing algorithms under different traffic patterns of 8 × 8 mesh.


6.4 Analysis on power consumption performance

Energy consumption is another important issue in ONoC. We calculate the optical laser power when loophole-routing is employed in a Cygnus-based ONoC [35]. Considering the worst case in the network, the optical insertion loss $IL$ can be formulated as follows:

$$IL = IL_{bend} + IL_{cross} + IL_{through} + IL_{drop}$$

In Eq. (8), $IL_{bend}$, $IL_{cross}$, $IL_{through}$ and $IL_{drop}$ represent the waveguide bending loss, the waveguide crossing insertion loss, the microring through insertion loss, and the microring drop insertion loss during the communication, respectively. The parameters with their meanings and values are shown in Table 1.


Table 1. Parameters for Optical Power

We set the minimum power required by the receiver to −22.3 dBm to calculate the laser power. The optical laser power can be calculated as follows:

$$P_{laser} = \frac{10^{(-22.3 + IL)/10}}{P_{LE} \times P_{CC}}$$


Fig. 11. (a) The insertion loss, (b) The minimum required laser power performance of three routing algorithms.


In Eq. (9), $P_{LE}$ represents the laser efficiency, which is set to 30%, and $P_{CC}$ represents the coupling coefficient of the laser, which is set to 90%. Figure 11 shows the insertion loss and the minimum laser power for the three routing algorithms. Loophole-routing has higher insertion loss and higher minimum required laser power than XY routing and the OE-turn model routing. This is because the routing paths of loophole-routing may contain more turns, which are introduced to avoid network congestion. A trade-off between loss cost and performance is therefore needed: if the system requires lower loss and power consumption, a larger $K$ can be chosen to improve the loss and power performance; if the system requires better latency and throughput, a smaller $K$ can be chosen to meet the performance demand.
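Equations (8) and (9) can be combined into a single helper. The sketch below uses the receiver sensitivity (−22.3 dBm), laser efficiency (30%), and coupling coefficient (90%) quoted above; the per-component loss values are left as arguments and would come from Table 1. The function name and the choice of mW output are our own.

```python
def laser_power_mw(il_bend, il_cross, il_through, il_drop,
                   rx_sensitivity_dbm=-22.3, laser_eff=0.30, coupling=0.90):
    """Worst-case laser power of Eqs. (8)-(9). Loss terms are in dB (Table 1);
    the result is in mW because the receiver sensitivity is given in dBm."""
    IL = il_bend + il_cross + il_through + il_drop                 # Eq. (8): total loss in dB
    return 10 ** ((rx_sensitivity_dbm + IL) / 10) / (laser_eff * coupling)   # Eq. (9)
```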

7. Conclusion

In this paper, we propose a routing algorithm called loophole-routing for OCS-based ONoC to make path selection and setup faster. Loophole-routing considers the influence of OCS and jointly accounts for the delay without congestion and the delay brought about by congestion. By using the historical delay information, the algorithm can predict the delay caused by congestion. Simulation results show that loophole-routing achieves 15.56%, 25.71%, 18.92%, 66.67% and 42.86% lower ETE latency than the traditional routing algorithm under the uniform, hotspot1, hotspot2, transpose2 and transpose3 traffic patterns, while improving the saturation throughput by 31.43%, 34.33%, 35.29%, 67.86% and 99.5%, respectively. Beyond that, the proposed loophole-routing provides high path diversity and adaptive degree with low computing complexity and overhead while making path selections. In the near future, we intend to work on a more comprehensive performance analysis of the congestion latency along the routing path. Although fault tolerance is not considered in this proposal, the effect of ONoC faults caused by fabrication-induced process variation and run-time temperature fluctuation is also an interesting aspect to investigate.

Funding

National Natural Science Foundation of China (61634004, 61934002); Natural Science Foundation of Shaanxi Province for Distinguished Young Scholars (2020JC-26); Youth Innovation Team of Shaanxi Universities.

Disclosures

The authors declare no conflicts of interest.

References

1. M. R. Yahya, N. Wu, Z. A. Ali, and Y. Khizar, “Optical Versus Electrical: Performance Evaluation of Network On-Chip Topologies for UWASN Manycore Processors,” Wireless Pers. Commun. 116(2), 963–991 (2021). [CrossRef]  

2. L. Zhou, Y. Lu, Y. Fu, H. Ma, and C. Du, “Design of a hybrid on-chip waveguide with giant backward stimulated Brillouin scattering,” Opt. Express 27(18), 24953–24971 (2019). [CrossRef]  

3. Z. Zhu, W. Lu, L. Zhang, and N. Ansari, “Dynamic Service Provisioning in Elastic Optical Networks with Hybrid Single-/Multi-Path Routing,” J. Lightwave Technol. 31(1), 15–22 (2013). [CrossRef]  

4. A. Shacham, K. Bergman, and L. P. Carloni, “Photonic Networks-on-Chip for Future Generations of Chip Multiprocessors,” IEEE Trans. Comput. 57(9), 1246–1260 (2008). [CrossRef]  

5. L. Gong and Z. Zhu, “Virtual Optical Network Embedding (VONE) over Elastic Optical Networks,” J. Lightwave Technol. 32(3), 450–460 (2014). [CrossRef]  

6. C. P. Chen, J. B. Driscoll, R. R. Grote, B. Souhan, R. M. Osgood, and K. Bergman, “Mode and Polarization Multiplexing in a Si Photonic Chip at 40Gb/s Aggregate Data Bandwidth,” IEEE Photonics Technol. Lett. 27(1), 22–25 (2015). [CrossRef]  

7. Z. Chen, H. Gu, Y. Yang, and D. Fan, “A hierarchical optical network-on-chip using central-controlled subnet and wavelength assignment,” J. Lightwave Technol 32(5), 930–938 (2014). [CrossRef]  

8. K. Chen, H. Gu, Y. Yang, and D. Fan, “A novel two-layer passive optical interconnection network for on-chip communication,” J. Lightwave Technol 32(9), 1770–1776 (2014). [CrossRef]  

9. W. Tan, H. Gu, Y. Yang, K. Wang, and X. Wang, “Venus: A low- latency, low-loss 3D hybrid network-on-chip for kilocore systems,” J. Lightwave Technol. 35(24), 5448–5455 (2017). [CrossRef]  

10. R. Yao and Y. Ye, “Toward a High-Performance and Low-Loss Clos–Benes-Based Optical Network-on-Chip Architecture,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 39(12), 4695–4706 (2020). [CrossRef]  

11. W. Zhang and Y. Ye, “A Table-Free Approximate Q-Learning-Based Thermal-Aware Adaptive Routing for Optical NoCs,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 40(1), 199–203 (2021). [CrossRef]  

12. H. Gu, J. Xu, and Z. Wang, “A novel optical mesh network-on-chip for gigascale systems-on-chip,” IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Macao, China, pp. 1728–1731.

13. S. Koohi and S. Hessabi, “All-optical wavelength-routed architecture for a power-efficient network on chip,” IEEE Trans. Comput. 63(3), 777–792 (2014). [CrossRef]  

14. J. H. Lee, Y. S. Kim, C. L. Li, and T. H. Han, “A shortest path adaptive routing technique for minimizing path collisions in hybrid optical network-on-chip,” J. Systems Architecture 59(10), 1334–1347 (2013). [CrossRef]  

15. Y. Yin, H. Zhang, M. Zhang, M. Xia, Z. Zhu, S. Dahfort, and S. J. B. Yoo, “Spectral and Spatial 2D Fragmentation-Aware Routing and Spectrum Assignment Algorithms in Elastic Optical Networks,” J. Opt. Commun. Netw. 5(10), A100–A106 (2013). [CrossRef]  

16. W. Tan, H. Gu, Y. Yang, M. Fadhel, and B. Zhang, “Network condition-aware communication mechanism for circuit-switched optical networks-on-chips,” J. Opt. Commun. Netw. 8(10), 813–821 (2016). [CrossRef]  

17. M. Kumar, V. Laxmi, M. S. Gaur, M. Daneshtalab, and M. Zwolinski, “A novel non-minimal turn model for highly adaptive routing in 2D NoCs,” International Conference on Very Large Scale Integration (VLSI-SoC), Playa del Carmen, Mexico, pp. 1–6, (2014).

18. P. Bahrebar and D. Stroobandt, “Improving hamiltonian-based routing methods for on-chip networks: a turn model approach,” Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, pp. 1–4, (2014).

19. L. Gong, X. Zhou, X. Liu, W. Zhao, W. Lu, and Z. Zhu, “Efficient Resource Allocation for All-Optical Multicasting over Spectrum-Sliced Elastic Optical Networks,” J. Opt. Commun. Netw. 5(8), 836–847 (2013). [CrossRef]  

20. G. M. Chiu, “The odd–even turn model for adaptive routing,” IEEE Trans. Parallel Distrib. Syst. 11(7), 729–738 (2000). [CrossRef]  

21. W. Zhang, L. Hou, J. Wang, S. Geng, and W. Wu, “Comparison Research between XY and Odd-Even Routing Algorithm of a 2-Dimension 3X3 Mesh Topology Network-on-Chip,” WRI Global Congress on Intelligent Systems, Xiamen, China, pp. 329–333, (2009).

22. M. Li, Q. A. Zeng, and W. B. Jone, “DyXY - a proximity congestion-aware deadlock-free dynamic routing method for network on chip,” Design Automation Conference, San Francisco, CA, USA, pp. 849–852, (2006).

23. A. Boos, L. Ramini, U. Schlichtmann, and D. Bertozzi, “Proton: An automatic place-and-route tool for optical networks-on-chip,” IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 2013, pp. 138–145, (2013).

24. A. V. Beuningen, L. Ramini, D. Bertozzi, and U. Schlichtmann, “PROTON+: A Placement and Routing Tool for 3D Optical Networks-on-Chip with a Single Optical Layer,” J. Emerg. Technol. Comput. Syst. 12(4), 1–28 (2016). [CrossRef]  

25. A. V. Beuningen and U. Schlichtmann, “PLATON: A Force-Directed Placement Algorithm for 3D Optical Networks-on-Chip,” Proceedings on International Symposium on Physical Design (ISPD), New York, USA, pp. 27–34, (2016).

26. Y. Chuang, K. Chen, K. Lin, S. Fang, B. Li, and U. Schlichtmann, “PlanarONoC: Concurrent Placement and Routing Considering Crossing Minimization for Optical Networks-on-Chip*,” ACM/ESDA/IEEE Design Automation Conference (DAC), San Francisco, CA, pp. 1–6, (2018).

27. K. Zhu, H. Gu, Y. Yang, W. Tan, and B. Zhang, “A 3D multilayer optical network on chip based on mesh topology,” Photonic Netw. Commun. 32(2), 293–299 (2016). [CrossRef]  

28. J. H. Lee, M. S. Kim, and T. H. Han, “Insertion Loss-Aware Routing Analysis and Optimization for a Fat-Tree-Based Optical Network-on-Chip,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 37(3), 559–572 (2018). [CrossRef]  

29. P. Guo, W. Hou, L. Guo, W. Sun, C. Liu, H. Bao, L. Duong, and W. Liu, “Fault-Tolerant Routing Mechanism in 3D Optical Network-on-Chip Based on Node Reuse,” IEEE Trans. Parallel Distrib. Syst. 31(3), 547–564 (2020). [CrossRef]  

30. P. Guo, W. Hou, L. Guo, Q. Yang, Y. Ge, and H. Liang, “Low Insertion Loss and Non-Blocking Microring-Based Optical Router for 3D Optical Network-on-Chip,” IEEE Photonics J. 10(2), 1–10 (2018). [CrossRef]  

31. P. Guo, W. Hou, L. Guo, Q. Cai, Y. Zong, and D. Huang, “Reliable routing in 3D optical network-on-chip based on fault node reuse,” Proc. 7th Int. Workshop Reliable Netw. Des. Modeling, 92–98 (2015).

32. B. Asadi, M. Reshadi, and A. Khademzadeh, “A routing algorithm for reducing optical loss in photonic Networks-on-Chip,” Photon. Netw. Commun. 34(1), 52–62 (2017). [CrossRef]  

33. M. Li, W. Liu, L. Yang, P. Chen, D. Liu, and N. Guan, “Routing in optical network-on-chip: minimizing contention with guaranteed thermal reliability,” Proceedings of the Asia and South Pacific Design Automation Conference (ASPDAC), New York, USA, pp. 364–369, (2019).

34. A. B. Ahmed, Y. Okuyama, and A. B. Abdallah, “Contention-Free Routing for Hybrid Photonic Mesh-Based Network-on-Chip Systems,” International Symposium on Embedded Multicore/Many-core Systems-on-Chip, Turin, pp. 235–242, (2015).

35. H. Gu, K. H. Mo, J. Xu, and W. Zhang, “A Low-power Low-cost Optical Router for Optical Networks-on-Chip in Multiprocessor Systems-on-Chip,” IEEE Computer Society Annual Symposium on VLSI, Tampa, FL, pp. 19–24, (2009).

36. P. Lu, L. Zhang, X. Liu, J. Yao, and Z. Zhu, “Highly-Efficient Data Migration and Backup for Big Data Applications in Elastic Optical Inter-Data-Center Networks,” IEEE Netw. 29(5), 36–42 (2015). [CrossRef]  
