Technical Disclosure Commons
Defensive Publications Series
January 2022

CO-PACKAGED OPTICS MAXIMUM RATE ALLOCATION
ALGORITHM
Marco Mazzini
Alberto Cervasio

Follow this and additional works at: https://www.tdcommons.org/dpubs_series

Recommended Citation
Mazzini, Marco and Cervasio, Alberto, "CO-PACKAGED OPTICS MAXIMUM RATE ALLOCATION
ALGORITHM", Technical Disclosure Commons, (January 27, 2022)
https://www.tdcommons.org/dpubs_series/4863

This work is licensed under a Creative Commons Attribution 4.0 License.
This Article is brought to you for free and open access by Technical Disclosure Commons. It has been accepted for
inclusion in Defensive Publications Series by an authorized administrator of Technical Disclosure Commons.

Mazzini and Cervasio: CO-PACKAGED OPTICS MAXIMUM RATE ALLOCATION ALGORITHM

CO-PACKAGED OPTICS MAXIMUM RATE ALLOCATION ALGORITHM
AUTHORS:
Marco Mazzini
Alberto Cervasio
ABSTRACT
With the introduction of co-packaged optics (CPO)-based hosts, systems can be
designed in a manner that is more flexible than conventional systems. It is possible to view
the CPO as an electro-optical matrix rather than a battery of pluggable modules as with
conventional hosts. Such an approach provides the opportunity to design CPO-based
switches for flexibility, protection, and resilience up to 102.4 Terabits or beyond.
Pluggable solutions, in addition to being more expensive, may not have such capabilities.
Leveraging such opportunities, this proposal provides new techniques for defining the
connectivity between CPO hosts in a manner that allows for maximum Ethernet flexibility
in terms of delay, skew, and electrical/optical Physical Medium Dependent (PMD)
sublayers that are available on a CPO circuit. Aspects of the presented techniques support
a fully-flexible CPO host where, for example, the link rate will not be limited to the
'nominal' Institute of Electrical and Electronics Engineers (IEEE) rates (e.g., 100, 200, 400,
800, etc.) but rather may assume any value (e.g., 325, 515, 865, etc.) that can also include
the aggregation of the IEEE 'nominal' rates (e.g., 300, 500, 700, etc.).
DETAILED DESCRIPTION
As the industry focuses on different ways in which bandwidth may be grown, one
approach considers the use of embedded optics instead of optical transceivers that must
necessarily sit on the front plate of a host. A pictorial example is presented in Figure 1,
below, in which optical engines (i.e., co-packaged optics (CPO)) surround a switch
application-specific integrated circuit (ASIC).

Published by Technical Disclosure Commons, 2022


Defensive Publications Series, Art. 4863 [2022]

[Figure 1 diagram: a High Performance Switch ASIC surrounded by twelve CPO blocks]

Figure 1: Exemplary CPO Surrounding Switch ASIC
While the density advantage is clear, another significant advantage, not currently
considered, is that a fully integrated system such as the one depicted above removes the
'standard' interface points when the ASIC and the CPO are fully owned by the same
developer.
Figure 2, below, indicates the test points that still must be considered in such an
embedded system.

[Figure 2 diagram: a host component (ASIC transmitter and receiver) and a module (CPO)
component (transmitter and receiver), joined by the host channel and the CPO channel, with
test points TP0 through TP5]

Figure 2: Standard Model Applicable to Pluggable Transceiver and CPO
Given an ASIC transmitter with certain characteristics at TP0, the host channel
connects that transmitter to the module (CPO) connector (socket). The module (CPO)
channel has its own insertion and return loss characteristics up to its functional circuit,
where electrical-to-optical (E/O) conversion occurs, and an optical signal is then
transmitted at TP2. When the signal is received at TP3, the inverse
optical-to-electrical (O/E) conversion is completed and, after propagating through the
module’s (CPO’s) own channel, the electrical signal is present at TP4.
Considering this model, while a transmitter must ensure that the electrical
impairments are controlled from TP0 to TP2 to meet the compliance standard, a receiver
must ensure that certain characteristics of a signal from a remote compliant transmitter
through a channel are properly delivered through TP3 and TP4, so as to be detected at TP5.
Such requirements limit a design by constraining the maximum length that can be
achieved between an ASIC and a CPO, given that a maximum transmission rate must be
met. However, if the embedded optics host is part of a network that is built from the same
kind of systems, then new capabilities may be enabled.
To address such challenges, this proposal provides techniques that may be utilized
to enhance Ethernet connectivity between hosts that share the same embedded optics types
and are governed by network software.
When working with standard Ethernet hosts, an embedded optics host will be
configured to follow the Ethernet media access controller (MAC) and Reconciliation
sublayers and the Physical Coding Sublayer (PCS) for 64B/66B encoding for 200GBASE-R
and 400GBASE-R, as well as the future 800GBASE-R. The 64B/66B encoding supports the
transmission of data and control characters. The 64B/66B code is then transcoded to a
256B/257B encoding to reduce the overhead and to make room for forward error correction
(FEC). The 256B/257B-encoded data is then FEC encoded before being transmitted.
Figure 3, below, illustrates elements of the relationship between various of the
200GBASE-R and 400GBASE-R sublayers.


[Figure 3 diagram: Ethernet layer stacks for 200GBASE-R and 400GBASE-R, mapped to the OSI
reference model. Below the higher layers sit the LLC/MAC Client, MAC Control (optional),
MAC, and Reconciliation sublayers; each stack then runs through 200GMII or 400GMII, the
200GBASE-R or 400GBASE-R PCS, PMA, PMD, and MDI to the medium.]

Figure 3: Relationship Between 200GBASE-R and 400GBASE-R Sublayers
Figure 3 depicts the relationship between the 200GBASE-R
and 400GBASE-R sublayers (which are shown shaded in the figure), the Ethernet MAC
and Reconciliation sublayers, and the higher layers. As illustrated in Figure 3, the upper
PCS interface for each of the 200GBASE-R and 400GBASE-R sublayers connects to the
Reconciliation Sublayer via a corresponding Media Independent Interface (i.e., 200GMII
for the 200GBASE-R sublayer and 400GMII for the 400GBASE-R sublayer). Further, each
lower PCS interface connects to a corresponding Physical Medium Attachment (PMA)
sublayer, which further supports a Physical Medium Dependent (PMD) sublayer for each
corresponding 200GBASE-R sublayer and 400GBASE-R sublayer.
The 200GBASE-R PCS has a nominal rate of 26.5625 gigatransfers per second
(Gtransfers/s) on each of eight PCS lanes at the PMA service interface. This provides a
MAC data rate capacity of 200 gigabits per second (Gb/s). Similarly, the 400GBASE-R
PCS supports the same nominal rate of 26.5625 Gtransfers/s on each of 16 PCS lanes,
which provides a MAC data rate capacity of 400 Gb/s.
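As a hedged illustration of where the 26.5625 Gtransfers/s figure comes from, the sketch below recomputes the per-lane rate from the MAC rate. It assumes the overhead factors of 256B/257B transcoding and RS(544,514) FEC, which are not stated in the text above; the function name pcs_lane_rate is illustrative only.

```python
from fractions import Fraction

# Assumed overhead factors (not stated in the text above): 256B/257B
# transcoding and RS(544,514) FEC, as used by 200GBASE-R and 400GBASE-R.
TRANSCODE = Fraction(257, 256)
FEC = Fraction(544, 514)


def pcs_lane_rate(mac_rate_gbps: int, pcs_lanes: int) -> Fraction:
    """Per-lane rate (Gtransfers/s) at the PMA service interface."""
    return mac_rate_gbps * TRANSCODE * FEC / pcs_lanes


for mac_rate, lanes in [(200, 8), (400, 16)]:
    rate = pcs_lane_rate(mac_rate, lanes)
    print(f"{mac_rate} Gb/s over {lanes} PCS lanes: {float(rate)} Gtransfers/s per lane")
```

Using exact fractions avoids rounding noise, so both cases evaluate to the same per-lane rate that the text quotes.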
It is anticipated that a similar instantiation will be adopted for 800 Gb/s (i.e., 16
PCS lanes at 53.125 Gtransfers/s or eight PCS lanes at 106.25 Gtransfers/s), that it will be
extended to 212.5 Gtransfers/s, and that it will also be adopted for 1600 Gb/s (i.e., 16 PCS
lanes at 106.25 Gtransfers/s or eight PCS lanes at 212.5 Gtransfers/s), in each case
connecting to the Reconciliation Sublayer above and to the PMA sublayer below.
The IEEE standard notes that while interfaces are defined in terms of bits,
octets, and frames, different data-path widths may be selected for different
implementations, based on convenience. Based on this, aspects of the techniques presented
herein support a flexible host that can define and negotiate any rate, rather than only the
nominal ones, at the PMA service interface. While the aggregation of the PCS lanes can still
be assumed to be at a 400, 800, or 1600 Gb/s data rate, each individual PCS lane can be
associated with PMAs (and thus physical lanes) of different rates that aggregate to exactly
the defined MAC rate.
Any other instance that is subject to the same Ethernet protocol may, of course,
also be customized, having all PCS lanes at the same rate but transmitting at an
aggregate rate other than the IEEE rates (such as, for example, 600 Gigabit Ethernet (GE) or
1000 GE).
An illustrative example, encompassing the PMD sublayer and medium for type
200GBASE-FR4, is helpful for explaining how different PCS rates can be useful. For this
example, consider four lanes at different wavelengths – WL1, WL2, WL3, and WL4 – in
which each wavelength is sending a signal at 26.5625 gigabaud per second (Gbaud/s), Pulse
Amplitude Modulation 4-level (PAM4) modulated, thus carrying 53.125 Gb/s for a total
rate of 212.5 Gb/s.
With such a PMD, it is possible to have four PMA lanes at 26.5625
Gbaud/s and eight PCS lanes at 26.5625 Gtransfers/s, so that two PCS lanes are
assigned to each physical PMA lane.
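The arithmetic of this 200GBASE-FR4 example can be checked with a short sketch. The names below (lane_bit_rate, pcs_per_physical_lane, and so on) are illustrative and do not come from the original text; the only physical fact assumed is that a PAM4 symbol carries two bits.

```python
BITS_PER_PAM4_SYMBOL = 2  # PAM4 has four levels, i.e. two bits per symbol


def lane_bit_rate(symbol_rate_gbaud: float) -> float:
    """Bit rate (Gb/s) of one PAM4 lane running at the given symbol rate."""
    return symbol_rate_gbaud * BITS_PER_PAM4_SYMBOL


wavelengths = ["WL1", "WL2", "WL3", "WL4"]
pcs_lanes = 8

per_wavelength = lane_bit_rate(26.5625)                 # Gb/s on each wavelength
total = per_wavelength * len(wavelengths)               # aggregate Gb/s
pcs_per_physical_lane = pcs_lanes // len(wavelengths)   # PCS lanes per PMA lane
pcs_rate = per_wavelength / pcs_per_physical_lane       # Gtransfers/s per PCS lane

print(per_wavelength, total, pcs_per_physical_lane, pcs_rate)
```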
Under aspects of the techniques presented herein, these PCS lanes can instead be
considered flexible, aggregating the overall transmitted signal at the same rate of 212.5
Gb/s but with PCS lanes at different rates depending upon various factors. Several of those
factors will be described below.
A first example is the mitigation of optical and electrical impairments. Consider a
PMD on which WL4 exhibits a propagation penalty at a distance longer than the standard
one (e.g., at 10 kilometers (km)), which causes the link to be broken, while the other lanes,
incurring lower dispersion, are intrinsically more robust. Also, some intrinsic delay
between the lanes can be expected because of dispersion, which can depend upon the media
type (e.g., the fiber type and its zero-dispersion wavelength, Lambda0) as well.
Because a dispersion penalty scales with the square of the bit rate, having
a flexible solution (according to aspects of the techniques presented herein) can optimize
such a PMD by making it possible to reduce the transmission rate of WL4 and to
rebalance the total transmitted rate across the other wavelengths.
One solution would be, for example, to modulate WL1 at 54.125 Gb/s, WL2 at
53.125 Gb/s, WL3 at 53.125 Gb/s, and WL4 at 52.125 Gb/s so as to reduce the propagation
penalty over that particular wavelength. The total rate is the same (i.e., 212.5 Gb/s) and
there will still be eight PCS lanes, but the two PCS lanes that are associated with WL1 will
run at 27.0625 Gtransfers/s each and the two PCS lanes that are associated with WL4 will
run at 26.0625 Gtransfers/s each.
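A minimal sketch of this rebalancing follows, using the exemplary values from the text. The helper shift_rate and the 1.0 Gb/s step are illustrative assumptions, and the relative penalty simply applies the rate-squared scaling described above rather than any full link model.

```python
NOMINAL_GBPS = 53.125   # per-wavelength rate of the nominal 4 x 53.125 Gb/s case
PCS_PER_WAVELENGTH = 2  # eight PCS lanes spread over four wavelengths


def shift_rate(rates: dict, src: str, dst: str, delta: float) -> dict:
    """Move `delta` Gb/s from wavelength `src` to `dst`, keeping the total unchanged."""
    out = dict(rates)
    out[src] -= delta
    out[dst] += delta
    return out


nominal = {wl: NOMINAL_GBPS for wl in ("WL1", "WL2", "WL3", "WL4")}
balanced = shift_rate(nominal, src="WL4", dst="WL1", delta=1.0)

# The aggregate rate is unchanged by the rebalancing.
assert sum(balanced.values()) == sum(nominal.values()) == 212.5

# Each wavelength still carries two PCS lanes, now at different rates.
pcs_rates = {wl: rate / PCS_PER_WAVELENGTH for wl, rate in balanced.items()}

# Relative dispersion penalty on WL4, assuming it scales with rate squared.
relative_penalty = (balanced["WL4"] / NOMINAL_GBPS) ** 2

print(balanced)
print(pcs_rates)
print(round(relative_penalty, 4))
```

Under these assumptions the penalty on WL4 drops by a few percent while WL1 absorbs the difference, which is the trade-off that the example describes.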
It is important to note that the above values are exemplary only. It is also important
to note that all of the link optimizations may be done while still maintaining the upper
bound on the propagation delays through the network. This implies that MAC, MAC
Control sublayer, and PHY implementers will still conform to certain delay maxima,
and that network planners and administrators will conform to constraints regarding the
cable topology and concatenation of devices and links.
As an example, Table 116-6 of the IEEE specification 802.3 contains the values for
maximum sublayer delay (i.e., sum of transmit and receive delays at one end of a link) in
bit times as specified by the IEEE and pause quanta as specified for 400 GE.
Because aspects of the techniques presented herein support an auto-negotiation
sublayer, the delay of the auto-negotiation sublayer should be included within the delay of
the PMD and the medium. Aspects of the techniques presented herein can then be viewed
as an iterative optimization of the link towards the IEEE maximum boundaries.
Another example would be the management of the electrical link, which may be
needed to fix design constraints coming from the layout of CPOs around a host ASIC.
Referring to Figure 1, the CPOs that are at each corner with respect to the ASIC
are the ones that need more care during design, since routing from the ASIC will lead to a
longer ‘ball-to-ball’ distance and more complex routing. A flexible (but static) allocation
of the maximum PMA (PCS) rate may then be considered, to ensure that the multi-link
channel (electrical ASIC to CPO, optical channel, and electrical CPO to ASIC) is optimized.
A flexible and dynamic allocation of such a maximum PMA (PCS) rate may
additionally make another portion of the CPO circuit and ASIC available when
this (unknown) limit is reached, to satisfy a demand for faster connectivity.
Aspects of the techniques presented herein optimize and manage a future increase
of bandwidth for a link depending upon media and circuit availability. For embedded
optics systems (i.e., CPO and ASIC) it is possible to leverage maximum flexibility instead
of dealing with a fixed layout and transceiver configuration. Among other things, a local
host can negotiate the maximum transmission rate over the actual link for each of the ports,
considering the electrical speed of its Serializer/Deserializers (SerDes) and its own routing
limitations.
As noted previously, aspects of the techniques presented herein support a link
auto-negotiation capability. Such a capability will, among other things, establish a maximum
speed first so that information is passed when a fiber or cable is connected between points
A and D, A and C, or D and C (as depicted in Figure 4, below).
[Figure 4 diagram: High Performance Switch ASICs, each surrounded by CPO blocks]

Figure 4: Exemplary Mixed Connectivity Between CPO and Legacy Hosts


As noted above, according to aspects of the techniques presented herein the
auto-negotiation of the link will occur by establishing a maximum speed first so that
information is passed at link initialization. During such a phase, both CPO hosts
communicate with each other bidirectionally (e.g., A to D and D to A, as illustrated in
Figure 4, above) and share the 'design' conditions for which each has been defined and
which depend upon the ports that have been connected.
The principal difference between the auto-negotiation of a link, as supported by
aspects of the techniques presented herein, and the classic Ethernet auto-negotiation process
is that under the techniques presented herein there is no fixed granularity in terms of rate
(e.g., 200, 400, 800) and the negotiation considers the actual medium and multi-link
conditions.
A CPO system is assumed to know its own electrical link parameters that are
associated with each port. The media (e.g., optical link) characteristics can be retrieved
through power-on measurements during initialization and through other techniques that
allow for the discovery of the chromatic dispersion impairment of the actual fiber medium.
Referring to Figure 4, above, elements of systems A-D should communicate to one
another their own 'static' characteristics (e.g., maximum electrical speed per port, actual
port allocation, compensation for ageing, etc.) and negotiate the maximum link speed on
the actual link. If the ports that are actually connected cannot support the customer's
needs, those needs can be associated with another link, because by then system A has
acquired the information of system D and vice versa.
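The negotiation described above can be sketched as follows, under stated assumptions. The PortInfo record, the port names, and the rule that the negotiated rate is the lower of the two per-port maxima are all hypothetical illustrations; the original text does not define a message format or a selection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortInfo:
    """Hypothetical 'static' characteristics a host shares for one port."""
    port: str
    max_rate_gbps: float   # maximum rate the port's electrical path supports
    allocated: bool = False


def negotiate(local: PortInfo, remote: PortInfo) -> float:
    """Maximum link rate both ends can sustain, with no fixed granularity."""
    return min(local.max_rate_gbps, remote.max_rate_gbps)


def find_link(requested_gbps, local_ports, remote_ports):
    """Return the first free (local, remote, rate) combination meeting the request, or None."""
    for lp in local_ports:
        for rp in remote_ports:
            if lp.allocated or rp.allocated:
                continue
            rate = negotiate(lp, rp)
            if rate >= requested_gbps:
                return lp.port, rp.port, rate
    return None


host_a = [PortInfo("A1", 325.0), PortInfo("A2", 515.0)]
host_d = [PortInfo("D1", 400.0), PortInfo("D2", 865.0)]

print(negotiate(host_a[0], host_d[0]))   # rate for the connected pair A1-D1
print(find_link(500.0, host_a, host_d))  # another pair if A1-D1 is not enough
```

Because each host already holds the other's port information after the exchange, the fallback search needs no further messages in this sketch.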
Figure 5, below, depicts as a flowchart elements of a possible algorithm for how
system parameters may be passed between CPO hosts A and D (following the arrangement
that was depicted in Figure 4, above) according to aspects of the techniques presented
herein.

8
https://www.tdcommons.org/dpubs_series/4863

6709
9

Mazzini and Cervasio: CO-PACKAGED OPTICS MAXIMUM RATE ALLOCATION ALGORITHM

Figure 5: Exemplary Parameter Exchange Algorithm
For a maximum speed that is required for some particular connectivity, a certain
part of the CPO circuit may then be defined. The parameters that are used to optimize the
overall link performance during auto-negotiation are the ones that are present locally on
hosts A and D. Depending upon the connection, these local parameters may comprise:


• The available (i.e., not connected) ports.
• The TP2 power and optical modulation amplitude (OMA) that is available per
port and wavelength (WL) at a laser default value.
• A port (e.g., laser) temperature (valid for shared lasers).
• The available laser extra power.
• The maximum electrical speed (PCS rate) per lane.
• The maximum physical rate of each of the PCS lanes.

Following connection, some parameters are exchanged across the nodes. Those
parameters may include:


• The link loss.
• The link dispersion versus WL.
• The calculated TP3 RX signal-to-noise ratio (SNR) per input port (D).
• The estimated transmitter and dispersion eye closure quaternary (TDECQ)
from the SNR.
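The two parameter sets listed above might be represented as in the sketch below. All field names, units, and sample values are assumptions made for illustration, and the two helper functions apply only basic reasoning (a lane is capped by its slower limit, and received power in dBm is launch power minus loss in dB); they are not the algorithm of Figure 5.

```python
from dataclasses import dataclass


@dataclass
class LocalLaneParams:
    """Parameters a host holds locally for one lane (illustrative names)."""
    tp2_power_dbm: float              # TP2 power at the laser default value
    extra_laser_power_db: float       # available laser extra power
    max_electrical_rate_gbps: float   # maximum electrical speed per lane
    max_physical_rate_gbps: float     # maximum physical rate of the lane


@dataclass
class ExchangedLinkParams:
    """Parameters exchanged across the nodes after connection (illustrative names)."""
    link_loss_db: float
    dispersion_ps_per_nm: float       # assumed unit for link dispersion


def lane_rate_cap(local: LocalLaneParams) -> float:
    """A lane cannot run faster than the slower of its two limits."""
    return min(local.max_electrical_rate_gbps, local.max_physical_rate_gbps)


def received_power_dbm(local, link, use_extra_power=False) -> float:
    """Optical power expected at TP3: launch power minus link loss (dB arithmetic)."""
    boost = local.extra_laser_power_db if use_extra_power else 0.0
    return local.tp2_power_dbm + boost - link.link_loss_db


lane = LocalLaneParams(1.0, 1.5, 54.125, 56.0)
link = ExchangedLinkParams(link_loss_db=4.0, dispersion_ps_per_nm=20.0)

print(lane_rate_cap(lane))
print(received_power_dbm(lane, link))
print(received_power_dbm(lane, link, use_extra_power=True))
```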

Referring again to Figure 4, above, there are cases in which the algorithm discussed
above may be partially applicable or where the IEEE standard protocol may be followed.
For example, assume that some ports of Node D or Node C do not allow transmit
(TX) and receive (RX) optimization but can handle a rate change. In such a case, a
maximum link rate may be negotiated by the two CPO hosts through their switch (SW)
control where this part of the algorithm would apply.
In the case of the connectivity of CPO system A to a 'legacy' system B, standard
connectivity can still be ensured by SW control, with no automatic negotiation between
the two hosts. Another option that avoids an auto-negotiation sublayer, and its impact on
the total delay and thus on the optimization, is the use of a built-in manufacturing
database that may be applied depending upon the medium type, distance, reach (PID) type,
and chip-to-module (C2M) channel for each of the port combinations of the optical circuit
and multi-link that can be enabled.
Such a database, containing the TX and RX characteristics of a CPO node, allows
for the calculation of all possible permutations of each of the parts of the multi-links (i.e.,
electrical and optical). The permutations are stored together with their maximum rates and
can be retrieved, managed, and updated through firmware (FW) whenever new PIDs or
connectivity are made available.
Figure 6, below, depicts as a flowchart elements of a possible algorithm for the
approach that was discussed above (i.e., involving a database) that considers the best
solution across all of the possible permutations of connectivity based upon a specific
request.

Figure 6: Exemplary Algorithm when a Manufacturing Database is Available
A database as described above may be managed at a high level by a centralized SW
control that associates each of the links and optimizes their transmission towards CPO
and/or legacy hosts, with a control algorithm that defines the number of permutations that
are available in the CPO host and thus the residual connectivity and rates that are available
for ports that have not yet been provisioned on the host by the same algorithm.
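A database-driven selection of this kind could look like the sketch below. The table contents, the key layout (port, medium, distance, C2M channel), and the "least excess rate" rule used to pick the best permutation are all hypothetical; the original text does not specify the database schema or the selection criterion.

```python
# Hypothetical manufacturing database: maximum rate (Gb/s) for each
# combination of port, medium type, distance (km), and C2M channel.
DATABASE = {
    ("P1", "SMF", 2, "short"): 865.0,
    ("P1", "SMF", 10, "short"): 515.0,
    ("P2", "SMF", 10, "long"): 325.0,
    ("P3", "SMF", 10, "short"): 700.0,
}


def best_port(requested_gbps, medium, distance_km, provisioned):
    """Pick the free port whose stored maximum rate meets the request with the least excess."""
    candidates = [
        (rate, port)
        for (port, med, dist, _c2m), rate in DATABASE.items()
        if med == medium and dist == distance_km
        and port not in provisioned and rate >= requested_gbps
    ]
    return min(candidates)[1] if candidates else None


def residual(provisioned, medium, distance_km):
    """Ports and maximum rates still available after provisioning."""
    return {
        port: rate
        for (port, med, dist, _c2m), rate in DATABASE.items()
        if med == medium and dist == distance_km and port not in provisioned
    }


provisioned = set()
choice = best_port(500.0, "SMF", 10, provisioned)
provisioned.add(choice)
left = residual(provisioned, "SMF", 10)

print(choice)
print(left)
```

Choosing the smallest sufficient rate is only one possible policy; it keeps faster ports free for later requests, which matches the idea of tracking residual connectivity.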
The creation and the management of a database as described above may represent
a drawback compared with the simpler algorithm of the low-level data exchange case
that was noted previously. Additionally, the link length being provisioned by the
customer must be known.
When dealing with a rate change, a clock distribution circuit should be referenced
to each of the engines associated with the individual lane rates. As noted previously, the
relative frequency and parts per million (PPM) variation should take into account the
maximum sublayer delay, which depends upon the auto-negotiated and calculated rate
corresponding to the link, PID, and technique.
As described and illustrated in the above narrative, aspects of the techniques
presented herein support the automatic adjustment of the IEEE physical rate to the best rate
over an actual channel, which reflects the two electrical C2M channels, an optical channel,
and any relative impairments. Further aspects of the presented techniques support
maintaining the same total aggregated rate but adjusting the PCS rate to improve a link.
To some extent, aspects of the presented techniques may be seen as an extension of
FlexEthernet (FlexE). Under aspects of the presented techniques the link rate is not limited
and may also include the aggregation of the IEEE 'nominal' rates (e.g., 300, 500, 700, etc.),
somewhat similar to FlexE 'channelization,' but would in principle also include any
sub-rate (e.g., 325, 515, 865, etc.), which may be different from FlexE.
Aspects of the techniques presented herein may encompass elements of a fractional
bit muxing PMA. While such an approach may introduce a level of complexity, any CPO
system is intrinsically more exposed to link impairments because there are no replaceable
parts such as pluggable modules. If one part of the circuit degrades, a customer currently
has few options to adjust for it without a manual replacement in the field. For this reason,
the added complexity would provide great incremental value in CPO systems, which would
be intrinsically more robust and optimized for the real maximum capacity and/or maximum
life of the product.
A CPO line card may be tested against the maximum rate that is physically
achievable for any lane. In terms of optical receiver sensitivity (such as intrinsic noise,
bandwidth, etc.) and transmitter parameters (such as OMA, TDECQ, etc.), this
characterization may be accomplished at a socket level over process, voltage, and
temperature (PVT) for the CPO, which would have optical circuits, drivers, and a
transimpedance amplifier (TIA) embedded in the same physical part. The electrical channel
and the overall mating of the CPO, in the case of being socketed to the CPO card, may be
characterized by verifying that these same rates can be achieved over the actual line card
electrical channel and settings. For example, the RX bit error rate (BER) and TX SNR
impairment between the socket and line card testing conditions can indicate the quality of
the electrical channel from the ASIC and the relative maximum rate achievable for any port.
Use of the various techniques presented herein may offer a number of advantages.
For example, the multipurpose host may maximize flexibility given that connectivity can
be adapted to the media, rate, and PMD. The connectivity is further improved by
optimizing the bandwidth of a link once it is established. Additionally, the CPO can be used
as protection against impairments, aging, or damage by rebalancing across channels while
maintaining the same throughput. Different metrics for the CPO connectivity may be
defined to facilitate the lowest power consumption for a link, adapting to the actual
connection. Further, high throughput capacity may be provided, potentially more than
what can be foreseen by using pluggable modules as interfaces.
The techniques presented herein further allow high speed optics to be available at
the same time as the introduction of a CPO-based line card because the ASIC development
will be the long pole of CPO development rather than the optics. The foreseen costs for
the optics are expected to be lower than for other small form factors that are suitable for
high-speed application development.
In summary, techniques have been presented herein that support a new way of defining
the connectivity between CPO hosts that allows for maximum Ethernet flexibility in terms
of delay, skew, and electrical/optical PMDs that are available on a CPO circuit. Aspects
of the presented techniques support a fully-flexible CPO host where, for example, the link
rate will not be limited to the 'nominal' IEEE rates (e.g., 100, 200, 400, 800, etc.) but rather
may assume any value (e.g., 325, 515, 865, etc.) that can also include the aggregation of
the IEEE 'nominal' rates (e.g., 300, 500, 700, etc.).

