If you are new to the world of design and verification, you probably have a LOT of questions! One of them may pertain to an important element: Clock Data Recovery (CDR). In this blog, we try to demystify this process.
The purpose of any communication protocol is to transfer a set of information (data) from one place to another. Serial data communication is often used to transmit the data at high speed. At the receiver end, the transmitted data has to be retrieved, together with its timing information, without losing its integrity. This process is called Clock and Data Recovery.
In this article, we elaborate on why CDR is required, how it works, and how to tackle issues like jitter and PPM when modeling a CDR.
Different Techniques of Data Communication:
Before starting on CDR, we will have a look at different techniques of data communication, which are:
1. Serial Data Communication
In serial communication data bits are transmitted sequentially one by one.
2. Parallel Data Communication
In parallel communication data bits are driven on multiple wires simultaneously.
Looking at the above figures, it is tempting to conclude that parallel communication will be much faster than serial communication.
But then the question arises: why is serial communication preferred over parallel communication?
This is because, in practice, parallel communication is not faster than serial communication, for the following reasons:
a) Skew
The path length travelled by each bit is slightly different. As a result, some bits can arrive earlier than others, which may corrupt the information.
To solve this, you can pad the bits. But this comes at the cost of speed, as it reduces the speed of every link to that of the slowest one.
b) Inter symbol interference and Cross talk
With several parallel links, inter-symbol interference (ISI) and crosstalk are introduced into the system, and they get more severe as the length of the link increases. This limits the length of a connection.
c) Limitation of I/O pin count
Parallel data communication requires a lot more I/O pins than what is required by serial data communication.
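To make the skew argument a little more concrete, here is a minimal Python sketch. The lane delays are made-up numbers, and the rule that the skew must stay below one bit period is a simplifying assumption for illustration, not a figure from any specification.

```python
def lane_skew(lane_delays_ns):
    """Skew is the gap between the slowest and the fastest lane."""
    return max(lane_delays_ns) - min(lane_delays_ns)


def word_is_aligned(lane_delays_ns, bit_period_ns):
    """Simplified check: all bits of one word can be latched by a single
    clock edge only if the skew is smaller than one bit period."""
    return lane_skew(lane_delays_ns) < bit_period_ns


# Hypothetical 4-lane bus whose traces differ slightly in length.
delays_ns = [1.00, 1.25, 1.75, 1.50]
print(lane_skew(delays_ns))             # 0.75
print(word_is_aligned(delays_ns, 2.0))  # True: slow bus, the skew fits inside one bit
print(word_is_aligned(delays_ns, 0.5))  # False: fast bus, bits of different words mix
```

The same skew that is harmless at a long bit period becomes a problem as the bit period shrinks, which is why skew caps the speed of a parallel link.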
What is Clock Data Recovery?
Since most high-speed serial interfaces do not have an accompanying clock, the receiver needs to recover the clock in order to sample the data on the serial lines.
To recover the sampling clock, the receiver needs a reference clock of approximately the same frequency. To generate the recovered clock, the receiver phase-aligns the reference clock to the transitions on the incoming data stream. This is called clock recovery.
Sampling the incoming data signal with the recovered clock to generate a bit stream is called data recovery. Together, this is called Clock Data Recovery, or CDR.
CDR is required to recover data from the incoming data stream in the absence of any accompanying clock signal, without bit errors due to over- or under-sampling.
How Does Clock Data Recovery Work?
The two main functions of a CDR are frequency detection and phase alignment.
I) Frequency Detection
It is the process of locking onto a frequency retrieved from the incoming data stream. This is done by measuring the time difference between two consecutive edges on the data stream.
This locked frequency is used to regenerate the transmitted bit stream.
To make you more familiar with frequency detection, let me give you an analogy of punctuation in a sentence. You may have observed that whenever a road undergoes repair work the construction company puts up a display message on a board to slow down the vehicles passing by. That message is something like this –
SLOW, MEN AT WORK
Now if the same message is written without proper punctuation then it may imply something totally different! –
SLOW MEN AT WORK
Punctuation is like the detected frequency. Locking onto the wrong frequency leads to incorrect data sampling!
So the question here is how to sample an incoming data bit stream correctly?
One solution that comes to mind instantly is to sample the bit stream at the same frequency at which it was transmitted.
To do that, one has to generate a clock at the receiver with the same frequency at which the data was transmitted. But two different clock generators cannot produce exactly the same frequency, even if they have the same specifications.
Nor is it possible to generate a clock with a perfectly precise frequency.
At the same time, a minute difference in sampling frequency can lead to a bit error as described in the following diagram:
As shown in the figure above, a single bit gets sampled twice due to a minute difference between the TX and RX frequencies.
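The effect of a small frequency mismatch can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the receiver samples in the middle of each of its own clock periods, both clocks start in phase, and the function name and the exaggerated periods (10 versus 9 or 11 time units) are chosen purely for illustration.

```python
def first_sampling_error(tx_period, rx_period, max_samples=1_000_000):
    """Return (sample_number, kind) for the first TX bit that is sampled
    twice or skipped by a free-running RX clock, or None if none is found."""
    previous_bit = -1
    for k in range(max_samples):
        sample_time = (k + 0.5) * rx_period       # middle of the k-th RX period
        bit_index = int(sample_time // tx_period)  # TX bit on the wire at that time
        if bit_index == previous_bit:
            return k, "sampled twice"
        if bit_index > previous_bit + 1:
            return k, "skipped"
        previous_bit = bit_index
    return None


print(first_sampling_error(10, 9))   # (5, 'sampled twice'): RX clock too fast
print(first_sampling_error(10, 11))  # (5, 'skipped'): RX clock too slow
print(first_sampling_error(10, 10, max_samples=1000))  # None: ideal, identical clocks
```

With a realistic mismatch of a few hundred parts per million the first error simply appears later; it does not go away.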
How else can a clock with the same frequency as the TX clock be generated?
This can be done by checking the edges on the incoming data bit stream. However, in this process, the initial bits that are used to detect the frequency are lost. To solve this, a particular set of bit sequences is transmitted before the valid data. These are called training sequences. Training sequences possess a very high edge density, so that the receiver can easily lock onto a frequency by checking consecutive edges on the wire before the start of valid data. The figure below shows a sequence with high edge density.
The frequency has now been recovered from the incoming data bit stream, and the RX clock can be generated from it.
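As a rough illustration of frequency detection on a training sequence, the sketch below estimates the bit period from edge timestamps. It assumes an alternating 1010... training pattern (so that every bit boundary carries an edge) and an ideal, jitter-free line; the function names are hypothetical.

```python
def edge_times(bits, bit_period):
    """Timestamps at which the line level changes, for an ideal transmitter."""
    return [i * bit_period for i in range(1, len(bits)) if bits[i] != bits[i - 1]]


def edge_density(bits):
    """Fraction of bit boundaries that carry a transition."""
    transitions = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return transitions / (len(bits) - 1)


def estimate_bit_period(times):
    """Average gap between consecutive edges.
    This equals the bit period only if every bit boundary has an edge."""
    if len(times) < 2:
        raise ValueError("need at least two edges to measure a period")
    return (times[-1] - times[0]) / (len(times) - 1)


training = [1, 0] * 8                      # assumed alternating training pattern
print(edge_density(training))              # 1.0
print(estimate_bit_period(edge_times(training, 10.0)))  # 10.0
```

Averaging over many edges, rather than trusting a single gap, is a design choice that makes the estimate less sensitive to the edge shifts discussed later in the article.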
The recovered frequency above is fine for the ideal case, when no noise is introduced in transmission, i.e. the TX clock frequency stays the same throughout and the data always lasts an integral multiple of the TX clock period. However, that is not true in practice, as a number of attributes affect the data transmission and distort the uniformity of the clock.
The figure below depicts a real-world clock, which has variations in its period.
There are mainly two attributes which affect most high-speed serial data communications:
A) Jitter:
Jitter is a shift in the edges of a periodic signal. It breaks the periodicity of the signal.
Jitter is a short-term effect. It follows a Gaussian distribution, which is why the mean of the jitter is zero, i.e. the cumulative effect of jitter is null.
Since jitter shifts the edges of the clock signal, the question is: what is the optimum position to sample a bit?
A bit should be sampled at its centre. This is the optimum position, where the maximum shift in the edges on either side (left to right or right to left) can be tolerated. However, if the shift in an edge becomes greater than half of the bit period, there will be a bit error.
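The half-bit-period margin can be explored with a small simulation. The sketch below models jitter as zero-mean Gaussian edge shifts, as described above; the standard deviation, the seed and the function names are illustrative assumptions.

```python
import random
import statistics


def simulate_jitter(num_edges, sigma, seed=0):
    """Draw one zero-mean Gaussian shift per edge (in time units)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(num_edges)]


def bit_error_fraction(shifts, bit_period):
    """Fraction of edges shifted by more than half a bit period,
    i.e. past a sampling point placed at the centre of the bit."""
    half = bit_period / 2
    return sum(1 for s in shifts if abs(s) > half) / len(shifts)


shifts = simulate_jitter(10_000, sigma=1.0)
print(statistics.mean(shifts))           # close to zero
print(bit_error_fraction(shifts, 10.0))  # margin of 5 sigma: errors are very rare
print(bit_error_fraction(shifts, 1.0))   # margin of 0.5 sigma: most edges cross the centre
```

Sampling at the centre gives the same margin on both sides, which is why it tolerates the largest symmetric shift.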
B) PPM (parts per million):
PPM is the inaccuracy of certain components in a circuit (the quartz crystal, in the case of a clock generator), which leads to a signal with an inaccurate period. PPM does not break the periodicity of a signal. As its name states, PPM is a long-term effect which denotes the inaccuracy in the bit period over a million clock cycles. PPM is additive or subtractive in nature.
Only if the cumulative effect of jitter or PPM in TX CLK becomes more than half of the RX CLK period will there be errors due to over- or under-sampling.
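Because a PPM offset accumulates, it is easy to estimate how long an uncorrected receiver can run before the drift reaches half a bit period. The sketch below assumes a constant frequency offset and no phase correction in between; the function names and the 100 ppm figure are illustrative.

```python
def drift_after(bits, ppm_offset):
    """Accumulated drift after `bits` bit periods, as a fraction of one bit period."""
    return bits * ppm_offset / 1_000_000


def bits_until_half_period_drift(ppm_offset):
    """Number of bit periods after which the drift reaches half a bit period."""
    if ppm_offset <= 0:
        raise ValueError("ppm_offset must be positive")
    return 500_000 / ppm_offset


print(bits_until_half_period_drift(100))  # 5000.0
print(drift_after(5000, 100))             # 0.5
```

In other words, with a 100 ppm mismatch the sampling point would walk off the bit after about 5000 bits unless something realigns it, which is the job of phase alignment.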
An example below shows how ongoing variations in incoming data stream can affect the sampling of data. This same example will be considered to resolve the issues as we progress further.
RX CLK(FD) is the clock locked during frequency detection. As the incoming data stream is sampled on RX CLK(FD), a single bit gets sampled twice, as depicted in the red box. This occurs because of the variations in the incoming data bit stream.
To counter these variations in the frequency of TX CLK, the second function of CDR, phase alignment, comes into the picture. It readjusts the RX CLK edges.
II) Phase Alignment
Phase alignment is the process of matching the phase of one signal with that of another. Here, it means matching the phase of the clock recovered during frequency detection with the incoming data bit stream.
Let me give you an analogy for phase alignment.
You may have seen an analog radio. It has two knobs, namely coarse tune and fine tune. When one wants to listen to a station, the coarse tune knob is used to lock onto a frequency where the signal is audible but with some disturbances. Here, coarse tuning is akin to frequency detection, and the disturbances are jitter and PPM. To remove these disturbances and make the voice clear, the fine tune knob is used, which nudges the pre-locked frequency slightly in either direction to get a perfectly audible signal. Here, fine tuning is akin to phase alignment.
The following rules need to be followed for Phase Alignment:
- If a transition is detected on the wire, set the level of RX CLK(FD+PD) to 1.
- If a full RX CLK(FD) period has elapsed after a posedge on RX CLK(FD+PD) and no transition has been detected on the wire, assert a posedge on RX CLK(FD+PD).
Here RX CLK(FD) is the clock locked during the frequency detection process, and RX CLK(FD+PD) is the clock produced by the phase alignment process.
It is time to look into the working of phase alignment. Let us take the same example considered earlier for PPM and jitter.
Here the clock period of RX CLK(FD) = 10
Clock cycles that are not labelled with a period in the figure have the default period of 10.
Previously, we saw that the last bit was sampled twice as a result of a constant, continuous variation (from 10 to 12) in TX CLK, which can be caused by PPM. Now, as depicted in the red box, the bit is sampled correctly.
In the first TX clock cycle, the period is 10 time units, which is the period locked during frequency detection and is also reflected on RX CLK(FD+PD). According to rule 1, the edge on the data sets the level of RX CLK(FD+PD) to 1 (denoted by the first dotted arrow). A negedge is then asserted on RX CLK(FD+PD) after half of the RX CLK(FD) period. After that, a posedge is asserted on RX CLK(FD+PD) either after another half of the RX CLK(FD) period or on a transition on DATA, whichever comes first (rule 2).
In the sixth and seventh TX clock cycles, the DATA bits are 0 and 0, so there is no transition on the line. RX CLK(FD+PD) therefore follows rule 2 and keeps the RX CLK(FD) period, as depicted in the first cycle of RX CLK(FD+PD) in the red box. Since the period of the seventh clock cycle is 12 time units, the transition on DATA occurs 2 time units later than expected. By then, RX CLK(FD+PD) has already asserted its posedge and is waiting for half of the RX CLK(FD) period to elapse before asserting the negedge. However, after 2 time units a transition is detected, which restarts the half-period wait. This leads to a negedge on RX CLK(FD+PD) after 7 time units instead of 5 (i.e. half of the RX CLK(FD) period), followed by a posedge 5 time units later (the second cycle of RX CLK(FD+PD) in the red box). In this way, phase alignment adjusts the clock period to follow constant variations in the incoming data stream.
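The walk-through above can be checked with a small event-driven model of the two rules. This is a behavioral sketch, not a definitive implementation: the function name, the initial clock level of 0, and the bit pattern 1 0 1 0 1 0 0 1 0 (chosen so that the sixth and seventh bits are both 0 and the seventh lasts 12 time units) are assumptions made for illustration.

```python
def recovered_clock(data_edges, fd_period, end_time):
    """Return the (time, level) changes of RX CLK(FD+PD).

    A data edge forces the clock to 1 and restarts the half-period wait;
    a half-period timeout with no data edge toggles the clock.
    """
    half = fd_period / 2
    level = 0          # assumed initial level
    timer_start = 0    # time at which the half-period wait was last restarted
    pending = sorted(data_edges)
    i = 0
    changes = []
    while True:
        timeout = timer_start + half
        if i < len(pending) and pending[i] <= timeout:
            t, new_level = pending[i], 1      # rule 1: a data edge wins
            i += 1
        else:
            t, new_level = timeout, 1 - level  # timeout: toggle the clock
        if t >= end_time:
            break
        if new_level != level:
            changes.append((t, new_level))
            level = new_level
        timer_start = t
    return changes


# Data edges for bits 1 0 1 0 1 0 0 1 0, where the seventh bit lasts 12 units.
edges = [0, 10, 20, 30, 40, 50, 72, 82]
clock = recovered_clock(edges, fd_period=10, end_time=92)
rising = [t for t, lvl in clock if lvl == 1]
falling = [t for t, lvl in clock if lvl == 0]
print(rising)
print(falling)
```

In this run the clock rises at 70 on the timer, the late data edge at 72 restarts the wait, and the clock falls at 77, so the high phase lasts 7 time units instead of 5; the next rising edge follows 5 time units later at 82, matching the description above.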
The figure below depicts the negative jitter case.
The third TX clock period has been shortened from 10 time units to 7 time units due to negative jitter. This variation is handled correctly by phase alignment, as depicted in the red box.
The posedge on RX CLK(FD+PD) occurs when a transition is detected on DATA. The negedge on RX CLK(FD+PD) occurs after half of the RX CLK(FD) period. A transition is then seen on DATA before another half of the RX CLK(FD) period has elapsed, which causes a level transition from 0 to 1 on RX CLK(FD+PD) (shown by the third curved dotted arrow).
The figure below depicts the positive jitter case.
The fifth TX clock period has been stretched from 10 time units to 13 time units due to positive jitter. This variation is handled correctly by phase alignment, as depicted in the red box.
RX CLK(FD+PD) follows rule 2 and keeps the RX CLK(FD) period, as depicted in the first cycle of RX CLK(FD+PD) in the red box. Since the period of the fifth clock cycle is 13 time units, the transition on DATA occurs 3 time units later than expected. By then, RX CLK(FD+PD) has already asserted its posedge and is waiting for half of the RX CLK(FD) period to elapse before asserting the negedge. However, after 3 time units a transition is detected, which restarts the half-period wait. This leads to a negedge on RX CLK(FD+PD) after 8 time units instead of 5 (i.e. half of the RX CLK(FD) period), followed by a posedge 5 time units later (the second cycle of RX CLK(FD+PD) in the red box).
That was all about working process of phase alignment.
There is one issue that is not covered by frequency detection and phase alignment! Phase alignment works on transitions in the incoming data stream. However, it is possible to have a long run of identical bits with no transitions in it. In this case, if the cumulative shift in an edge becomes more than half of the recovered clock period (RX CLK(FD)), the data will be sampled incorrectly, leading to bit errors.
To solve this problem, bit sequences are processed with various types of encoding before being transmitted on the wire. This limits the number of consecutive identical bits, which reduces the probability of the cumulative shift exceeding half of the RX CLK(FD) period.
For example, in USB 3.0 the data bits are processed with 8b/10b encoding before being transmitted on the wire.
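The benefit of bounding the run length can be sketched numerically. The code below measures the longest run of identical bits and checks whether the drift accumulated across a run stays under half a bit period. The per-bit period error of 5% is a deliberately exaggerated, hypothetical value; the run length of 5 used in the example is the maximum that 8b/10b allows.

```python
from itertools import groupby


def longest_run(bits):
    """Length of the longest run of consecutive identical bits."""
    return max((len(list(group)) for _, group in groupby(bits)), default=0)


def run_is_safe(run_length, period_error_fraction):
    """True while the drift accumulated over a transition-free run stays
    below half a bit period (no realignment is possible inside the run)."""
    return run_length * abs(period_error_fraction) < 0.5


print(longest_run([1, 1, 0, 0, 0, 0, 0, 1]))  # 5
print(run_is_safe(5, 0.05))    # True: 0.25 of a bit period
print(run_is_safe(12, 0.05))   # False: the drift exceeds half a bit period
```

The shorter the longest possible run, the larger the per-bit error the receiver can tolerate between two realignments.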
The figure below shows the behavioral block diagram of a CDR.
The incoming data stream is passed as input to the FD (frequency detector), the ED (edge detector) and a D flip-flop. The frequency detector generates a frequency based on the training sequences. The edge detector produces an output whenever it detects a transition on the incoming data. The outputs of the frequency detector and the edge detector are passed as inputs to the clock generator block, which generates a clock to sample the data. This generated clock and the incoming data are passed to the D flip-flop to regenerate the bit stream.
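Putting the blocks together, an end-to-end behavioral model might look like the sketch below. It is written under several assumptions that are not stated in the article: the frequency detector has already produced `fd_period`, the flip-flop samples half a period after each data edge and then once per period until the next edge, and `transmit` is a hypothetical helper that builds the waveform on the wire.

```python
def transmit(bits, durations):
    """Return (transitions, end_time); transitions are (time, new_level) pairs."""
    transitions, t, previous = [], 0, None
    for bit, duration in zip(bits, durations):
        if bit != previous:
            transitions.append((t, bit))
            previous = bit
        t += duration
    return transitions, t


def cdr_receive(transitions, fd_period, end_time):
    """Recover the bit stream from the line transitions."""
    half = fd_period / 2
    level = transitions[0][1]                # level seen by the flip-flop
    next_sample = transitions[0][0] + half   # first transition acts as an edge
    i = 1
    recovered = []
    while True:
        next_edge = transitions[i][0] if i < len(transitions) else float("inf")
        if next_edge <= next_sample:
            # Edge detector fires: realign the sampling point to this edge.
            level = transitions[i][1]
            next_sample = next_edge + half
            i += 1
        else:
            if next_sample >= end_time:
                break
            recovered.append(level)          # D flip-flop captures the data
            next_sample += fd_period
    return recovered


bits = [1, 0, 1, 0, 1, 0, 0, 1, 0]
durations = [10, 10, 10, 10, 10, 10, 12, 10, 10]   # seventh bit stretched to 12
wire, end = transmit(bits, durations)
print(cdr_receive(wire, fd_period=10, end_time=end) == bits)  # True
```

Realigning the sampling point on every edge is what keeps the receiver in step even when every bit is longer than the locked period, as long as transitions keep arriving.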
The figure below depicts the flow chart for clock generation after phase alignment.
RX CLK(FD+PD) is initialized to either 1 or 0. After initialization, two parallel processes start: a wait timer for half of the RX CLK(FD) period, and edge detection on the incoming data. Whichever of the two completes first disables the other. If the timer times out before any edge is detected, the current level of RX CLK(FD+PD) is checked. If the level is 1, go to the “RX CLK(FD+PD)=0” activity and reset RX CLK(FD+PD) to 0; otherwise go to the “RX CLK(FD+PD)=1” activity. Then restart both parallel processes and wait for either one to complete. If an edge is detected before the wait timer completes, move to the “RX CLK(FD+PD)=1” activity and set the level of RX CLK(FD+PD) to 1. Then restart both parallel processes again and wait for either one to complete.
I hope this article will help engineers to model the clock and data recovery.
Author: Deepak Nagaria with Aditya Mittal