VHDL coding tips and tricks: April 2010

Monday, April 26, 2010

How to implement State machines in VHDL?

    A finite state machine (FSM), or simply a state machine, is a model of behavior composed of a finite number of states, transitions between those states, and actions. It is like a "flow graph" in which we can see how the logic runs when certain conditions are met.
    In this article I have implemented a Mealy type state machine in VHDL. The state machine bubble diagram in the figure below shows the operation of a four-state machine that reacts to a single input "input" as well as previous-state conditions.
The code is given below:

library ieee;
use IEEE.std_logic_1164.all;

entity mealy is
port (clk : in std_logic;
      reset : in std_logic;
      input : in std_logic;
      output : out std_logic
  );
end mealy;

architecture behavioral of mealy is

type state_type is (s0,s1,s2,s3);  --type of state machine.
signal current_s,next_s: state_type;  --current and next state declaration.

begin

process (clk,reset)
begin
 if (reset='1') then
  current_s <= s0;  --default state on reset.
elsif (rising_edge(clk)) then
  current_s <= next_s;   --state change.
end if;
end process;

--state machine process.
process (current_s,input)
begin
  case current_s is
     when s0 =>        --when current state is "s0"
     if(input ='0') then
      output <= '0';
      next_s <= s1;
    else
      output <= '1';
      next_s <= s2;
     end if;  

     when s1 =>        --when current state is "s1"
    if(input ='0') then
      output <= '0';
      next_s <= s3;
    else
      output <= '0';
      next_s <= s1;
    end if;

    when s2 =>       --when current state is "s2"
    if(input ='0') then
      output <= '1';
      next_s <= s2;
    else
      output <= '0';
      next_s <= s3;
    end if;


  when s3 =>         --when current state is "s3"
    if(input ='0') then
      output <= '1';
      next_s <= s3;
    else
      output <= '1';
      next_s <= s0;
    end if;
  end case;
end process;

end behavioral;

I think the code is self-explanatory. The next state is computed from the input and the current state, and at the rising edge of the clock the current state is made equal to the next state. A "case" statement is used to select the logic for each state.
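A small testbench makes the Mealy behaviour visible: the output reacts to a change on "input" before any clock edge arrives. The sketch below is only an illustration. The entity name "mealy_tb", the 10 ns clock period and the stimulus timing are my own assumptions, and the expected values in the assertions are read off the case statement above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mealy_tb is
end mealy_tb;

architecture sim of mealy_tb is
signal clk : std_logic := '0';
signal reset : std_logic := '1';
signal input : std_logic := '0';
signal output : std_logic;
signal done : boolean := false;
constant clk_period : time := 10 ns;  --assumed value, not from the article.
begin

uut : entity work.mealy
  port map (clk => clk, reset => reset, input => input, output => output);

--clock with rising edges at 5 ns, 15 ns, 25 ns, ...
clk_process : process
begin
  while not done loop
    clk <= '0';
    wait for clk_period/2;
    clk <= '1';
    wait for clk_period/2;
  end loop;
  wait;
end process;

stim_proc : process
begin
  wait for 12 ns;
  reset <= '0';      --release reset between clock edges; state is s0.
  wait for 1 ns;
  assert output = '0' report "s0, input=0: expected output 0" severity error;
  input <= '1';      --no clock edge yet, but a Mealy output reacts at once.
  wait for 1 ns;
  assert output = '1' report "s0, input=1: expected output 1" severity error;
  wait until rising_edge(clk);   --s0 -> s2 because input='1'.
  wait for 1 ns;
  assert output = '0' report "s2, input=1: expected output 0" severity error;
  done <= true;      --stops the clock so the simulation can end.
  wait;
end process;

end sim;
```

If all three assertions pass silently, the output has followed both the input (without a clock edge) and the state change (after the clock edge).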
The code was synthesised using Xilinx XST and the results are shown below:

---------------------------------------------------------

States                      4                                            
Transitions                 8                                            
Inputs                      1                                            
Outputs                     4                                            
Clock                       clk (rising_edge)                  
Reset                       reset (positive)                
Reset type                  asynchronous                          
Reset State                 s0                        
Power Up State              s0                              
Encoding                    Automatic                        
Implementation              LUT

---------------------------------------------------------

Optimizing FSM on signal with Automatic encoding.
-------------------
 State | Encoding
-------------------
 s0    | 00
 s1    | 01
 s2    | 11
 s3    | 10
-------------------

   Minimum period: 0.926ns (Maximum Frequency: 1080.030MHz)
   Minimum input arrival time before clock: 1.337ns
   Maximum output required time after clock: 3.305ns
   Maximum combinational path delay: 3.716ns

The technology schematic is shown below:

     As you can see from the schematic, XST has used two flip-flops to implement the state machine. The design can be implemented in hardware using many FSM encoding algorithms. The algorithm used here is "Auto", which lets XST choose an encoding during the synthesis process. Other available algorithms include one-hot, compact, gray, sequential, Johnson and speed1. The required algorithm can be selected from the main menu under Process -> Properties -> HDL options -> FSM encoding algorithm, using the drop-down list.
More information about these options can be found here.
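The encoding can also be requested in the source file, which keeps the choice next to the state signal it applies to. The sketch below assumes the XST synthesis attribute "fsm_encoding"; please check the attribute name and the accepted values in the XST User Guide for your ISE version, because other synthesis tools use different attributes. The entity "fsm_enc_demo" and its simple four-state body are made up for illustration only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fsm_enc_demo is
port (clk : in std_logic;
      reset : in std_logic;
      input : in std_logic;
      output : out std_logic
  );
end fsm_enc_demo;

architecture behavioral of fsm_enc_demo is

type state_type is (s0,s1,s2,s3);
signal current_s : state_type;

--Assumed XST attribute: asks the tool for one-hot encoding of "current_s".
attribute fsm_encoding : string;
attribute fsm_encoding of current_s : signal is "one-hot";

begin

process (clk,reset)
begin
  if (reset='1') then
    current_s <= s0;
  elsif (rising_edge(clk)) then
    case current_s is
      when s0 =>
        if (input='1') then
          current_s <= s1;   --leave s0 only when input is '1'.
        end if;
      when s1 => current_s <= s2;
      when s2 => current_s <= s3;
      when s3 => current_s <= s0;
    end case;
  end if;
end process;

output <= '1' when current_s = s3 else '0';

end behavioral;
```

Whether the attribute or the GUI property wins when both are set is tool-dependent, so it is safer to use only one of them.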

A very popular encoding method for FSMs is one-hot, where only one state variable bit is set, or "hot", for each state. The synthesis details for the above state machine using one-hot encoding are given below:

Optimizing FSM on signal with one-hot encoding.
-------------------
 State | Encoding
-------------------
 s0    | 0001
 s1    | 0010
 s2    | 0100
 s3    | 1000
-------------------

   Minimum period: 1.035ns (Maximum Frequency: 966.464MHz)
   Minimum input arrival time before clock: 1.407ns
   Maximum output required time after clock: 3.418ns
   Maximum combinational path delay: 3.786ns

The Technology schematic is shown below:
    The main disadvantage of the one-hot encoding method can be seen from the schematic. It uses 4 flip-flops, while the 2-bit encoding shown at the beginning of this article uses only 2 flip-flops. In general, to implement a state machine with (2^n) states, a binary encoding takes n flip-flops while one-hot encoding takes (2^n) flip-flops.
But there are some advantages to the one-hot method:
1)Because only two bits change per transition, power consumption is small.
2)They are easy to implement in schematics.

Monday, April 19, 2010

8 bit Binary to BCD converter - Double Dabble algorithm

    I have written a function for converting an 8-bit binary signal into a 12-bit BCD value (consisting of 3 BCD digits). An algorithm known as "double dabble" is used for this. The explanation for the algorithm can be found here.

The function code is given below:

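--Note: the ">" and "+" operators used on std_logic_vector below need
--"use ieee.std_logic_unsigned.all;" in the enclosing design unit.
function to_bcd ( bin : std_logic_vector(7 downto 0) ) return std_logic_vector is
variable bcd : std_logic_vector(11 downto 0) := (others => '0');
variable bint : std_logic_vector(7 downto 0) := bin;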

begin
for i in 0 to 7 loop  -- repeating 8 times.
bcd(11 downto 1) := bcd(10 downto 0);  --shifting the bits.
bcd(0) := bint(7);
bint(7 downto 1) := bint(6 downto 0);
bint(0) :='0';


if(i < 7 and bcd(3 downto 0) > "0100") then --add 3 if BCD digit is greater than 4.
bcd(3 downto 0) := bcd(3 downto 0) + "0011";
end if;

if(i < 7 and bcd(7 downto 4) > "0100") then --add 3 if BCD digit is greater than 4.
bcd(7 downto 4) := bcd(7 downto 4) + "0011";
end if;

if(i < 7 and bcd(11 downto 8) > "0100") then  --add 3 if BCD digit is greater than 4.
bcd(11 downto 8) := bcd(11 downto 8) + "0011";
end if;


end loop;
return bcd;
end to_bcd;

Some sample inputs and the corresponding outputs are shown below:
bin = "01100011"   ,     output = "0000 1001 1001"  (99).
bin = "11111110"   ,     output = "0010 0101 0100"  (254).
bin = "10111011"   ,     output = "0001 1000 0111"  (187).

The code is synthesisable, and the cell usage statistics for a Virtex-5 FPGA are shown below:


# BELS                             : 24
#      GND                         : 1
#      LUT3                        : 1
#      LUT4                        : 2
#      LUT5                        : 12
#      LUT6                        : 7
#      MUXF7                       : 1
# IO Buffers                       : 20
#      IBUF                        : 8
#      OBUF                        : 12


Note :- The code can be modified to convert a binary number of any length to BCD digits. This requires very little change in the code.
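One possible generalisation is sketched below. It is not the function above with different constants: it uses the numeric_std "unsigned" type instead of std_logic_vector, and it does the "add 3" step before each shift (the usual textbook ordering of double dabble), which is equivalent to shifting first and skipping the correction after the last shift. The package name "bcd_pkg", the function name "to_bcd_n" and the "digits" parameter are names I have assumed for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package bcd_pkg is
  --bin    : binary value of any length.
  --digits : number of BCD digits in the result (caller must make it large enough).
  function to_bcd_n ( bin : unsigned; digits : positive ) return unsigned;
end bcd_pkg;

package body bcd_pkg is

  function to_bcd_n ( bin : unsigned; digits : positive ) return unsigned is
    variable bint : unsigned(bin'length-1 downto 0) := bin;
    variable bcd : unsigned(4*digits-1 downto 0) := (others => '0');
  begin
    for i in 0 to bin'length-1 loop   --one iteration per input bit.
      --add 3 to every BCD digit that is greater than 4.
      for d in 0 to digits-1 loop
        if (bcd(4*d+3 downto 4*d) > 4) then
          bcd(4*d+3 downto 4*d) := bcd(4*d+3 downto 4*d) + 3;
        end if;
      end loop;
      --shift left by one bit, moving the MSB of bint into bcd.
      bcd := shift_left(bcd, 1);
      bcd(0) := bint(bint'high);
      bint := shift_left(bint, 1);
    end loop;
    return bcd;
  end to_bcd_n;

end bcd_pkg;
```

With "use work.bcd_pkg.all;" in place, a call such as to_bcd_n(unsigned(bin), 3) would cover the 8-bit case, and a 16-bit input would need digits = 5.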

Recursive functions in VHDL

     In this article I will write about recursive functions in VHDL. Recursion is a method of defining functions in which the function being defined calls itself. The idea is to execute a task repeatedly, in a self-similar way.
When it comes to VHDL, you cannot blindly write recursive code in your design. Recursive logic has the disadvantage of eating up a lot of resources in the FPGA. I have explained these points with the help of an example.

    Let us see the objective of the code I have written. Here I am doing a repeated XOR operation on a 16-bit signal, until the result obtained is 2 bits wide. For example, if the signal is "1111001100010101", then I am doing the following steps:
MSB 8 bits of signal : 11110011
LSB 8 bits of signal  : 00010101
XOR operation        : 11100110   ( call this signal as 'x1')

MSB 4 bits of x1 : 1110
LSB 4 bits of x1  : 0110
XOR operation   : 1000   ( call this signal as 'x2')

MSB 2 bits of x2 : 10
LSB 2 bits of x2  : 00
XOR operation   : 10   ( this should be the result obtained when the code is simulated)

Normally, to write such a design you have to define a lot of temporary signals for x1, x2 etc. This makes the code more complicated and difficult to understand.
But to make the code short and simple, you can write a recursive function as shown below:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity recursion is
port ( num : in std_logic_vector(15 downto 0);
       exor_out : out std_logic_vector(1 downto 0)
         );
end recursion;

architecture Behavioral of recursion is

function exor( num : std_logic_vector ) return std_logic_vector is
variable numf : std_logic_vector(num'length-1 downto 0):=(others => '0');
variable exorf : std_logic_vector(1 downto 0):=(others => '0');  --every call returns 2 bits.

begin
numf := num;
if(num'length = 4) then
exorf := numf(1 downto 0) xor numf(3 downto 2);
else
exorf := exor(numf(num'length-1 downto num'length/2)) xor exor(numf((num'length/2)-1 downto 0));
end if;
return exorf;
end exor;

begin
exor_out <= exor(num);
end Behavioral;

Now let us analyse the code.

1)function exor( num : std_logic_vector ) return std_logic_vector is :
     The function is defined in a general sense. This means the function "exor" takes a std_logic_vector of any length as input and returns another std_logic_vector. For the recursion to terminate, the length has to be a power of two and at least 4.

2)variable numf : std_logic_vector(num'length-1 downto 0):=(others => '0');
     This is a local copy of the signal on which the XOR operation has to be performed. The num'length attribute is used to get the size of the input argument, and the copy normalises the index range to (num'length-1 downto 0). The variable is initialized to zero and then assigned the input.

3)variable exorf : std_logic_vector(1 downto 0):=(others => '0');
     This is the result of the XOR operation. The result is always 2 bits wide, whatever the size of the input, because each recursive call returns a 2-bit value and the XOR of two 2-bit values is again 2 bits wide.

4) if(num'length = 4) then
       exorf := numf(1 downto 0) xor numf(3 downto 2);
    else
      exorf := exor(numf(num'length-1 downto num'length/2)) xor exor(numf((num'length/2)-1 downto 0));
    end if; 
     In the "else" branch I am calling the function exor in a recursive way. Each call passes only one half of the signal to the function, and the two 2-bit results are XORed together. Since XOR is associative and commutative, combining the halves this way gives the same result as the step-by-step folding shown in the example above. The recursion continues until the size of the signal becomes 4. Note how the exor function calls itself, each time passing only a part of the input applied to it, until the input reaches a critically small size (which is 4 here).

Now let us see how this code is synthesised. The Technology schematic view of the design is shown below:
By analyzing the figure you can see that 4 LUTs are used to implement the logic: two 4-input LUTs and two 5-input LUTs. The connections can be understood from the block diagram below:
  From the figure you can see that relatively more resources are used to implement the logic. This is the disadvantage of recursive functions in VHDL. In C and other high-level languages recursion is implemented using a stack, and there the main issue is stack overflow. But in an HDL like VHDL, the resources may get heavily used even for simple code. The synthesis tool implements the logic by replicating the function in separate hardware components. This means that if a function calls itself 10 times, the resources used will be nearly 10 times those needed for a single block.
  If we implement the same thing with the help of a clock, then you need only two XOR gates. But in that case the block will take 8 clock cycles to compute the output. The recursive function defined here uses more logic resources to compute the output in less time.

Note :- Use recursive functions when you have enough logic gates in your FPGA, and speed is your main concern.
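For comparison, the same folding can be written without recursion. Because XOR is associative, folding the halves repeatedly down to 2 bits is the same as XORing all the 2-bit slices of the input together. For the example value "1111001100010101" the slices are 11, 11, 00, 11, 00, 01, 01, 01 and their XOR is 10, which matches the worked example. The entity "exor_loop_demo" and function "exor_loop" below are names I have made up, and this is only a sketch that assumes an input of even length; I have not compared its synthesis results with the recursive version.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity exor_loop_demo is
port ( num : in std_logic_vector(15 downto 0);
       exor_out : out std_logic_vector(1 downto 0)
     );
end exor_loop_demo;

architecture Behavioral of exor_loop_demo is

function exor_loop( num : std_logic_vector ) return std_logic_vector is
--local copy with a normalised (length-1 downto 0) index range.
variable numf : std_logic_vector(num'length-1 downto 0) := num;
variable acc : std_logic_vector(1 downto 0) := "00";
begin
--XOR every 2-bit slice of the input into the accumulator.
for k in 0 to (num'length/2)-1 loop
acc := acc xor numf(2*k+1 downto 2*k);
end loop;
return acc;
end exor_loop;

begin
exor_out <= exor_loop(num);
end Behavioral;
```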

Wednesday, April 14, 2010

VLSI Interview Questions - Part 2

This is part 2 of the interview questions series. Hope it is useful.

1)For the circuit shown below, what should the function F be so that it produces an output of the same frequency (function F1), and an output of double the frequency (function F2)?

a. F1= NOR gate and F2= OR gate
b. F1=NAND gate and F2= AND gate
c. F1=AND gate and F2=XOR gate
d. None of the above
Ans : (d) . (Hint : Assume a small delta delay in the NOT gate).

2)The maximum number of minterms realizable with two inputs (A,B) is:
Ans : For n inputs the maximum number of minterms is (2^n).
For n=2, no. of minterms = (2^2) =  4.
http://www.iberchip.net/VII/cdnav/pdf/75.pdf

3)The maximum number of boolean expressions with two inputs (A,B) is:
Ans : For n inputs the maximum number of distinct boolean expressions is 2^(2^n).
For n=2, no. of boolean expressions = 2^(2^2) =  2^4 = 16.
http://www.iberchip.net/VII/cdnav/pdf/75.pdf

4) A ring counter that counts from 63 to 0 will have ______ D flip-flops,
but a binary counter that counts from 63 to 0 will have _____ D flip-flops
Ans : For the ring counter, 64. For the binary counter, 6.

5) Why is cache memory used in computers?
    Cache memory is used to increase the speed of memory access by the processor. Unlike the main (physical) memory, cache memory is small and has a very short access time. The most recent data accessed by the processor is stored in cache memory. This helps the processor save time, because time is not wasted in accessing the same data from the main memory again and again.
A good example: if the processor is executing a loop 1000 times involving many variables (so that the available CPU registers are all used up), then the values of these variables can be held in cache memory. This will make the loop execution faster.
In designing a cache, the miss probability and hit probability determine the efficiency of the cache and the extent to which the average memory access time can be reduced.

6) How will you design a sequence detector?
See this link:
http://web.cs.mun.ca/~paul/cs3724/material/web/notes/node23.html

7) What is setup time and hold time?
Setup time is the minimum amount of time before the clock’s active edge by which the data must be stable for it to be detected correctly. Any violation in this will cause incorrect data to be captured.
(Analogy for setup time: Suppose you have to catch a train and the train leaves at 8:00. Say you live 20 minutes away from the station. When should you leave your house?
Ans : At 7:40 -> setup time is 20 mins in this case)

Hold time is the minimum amount of time after the clock’s active edge during which the data must be stable. Any violation in this required time causes incorrect data to be latched.
(Suppose your friend needs help in boarding the train and the train only allows 5 mins for boarding. How long should you stay after you have arrived?
Ans : At least 5 mins -> hold time is 5 mins)
A very good tutorial with examples about setup time and hold time can be found at this link:
http://nigamanth.net/vlsi/2007/09/13/setup-and-hold-times/

8)What is the difference between Moore and Mealy state machines?
Ans : Moore and Mealy state machines are two ways of designing a state machine. In a Moore state machine the outputs are a function of the current state only, so they change only when the state changes. In a Mealy state machine the outputs are a function of the current state and the inputs, so they may change with a change of state OR with a change of inputs. A Moore state machine may require more states (but less complexity) than a Mealy state machine to accomplish the same task.

VLSI Interview questions - Part 1

     Here are some common interview questions asked by some VLSI companies. Try to learn the concept used in solving the questions rather than blindly going through the answers. If you have any doubts, drop me a note in the comment section.

1)Design a full adder using halfadders.

Ans :
2) Find the value of A,B,C in the following circuit ,after 3 clock cycles.  (ST Microelectronics)
This is a simple ring counter. An n-bit ring counter has n states. The 3-bit counter shown above has 3 states, and they are: 100, 010, 001, 100 and so on.
So after 3 clock cycles  A,B,C = 100.

3) Design XOR gate using 2:1 MUX.    (Intel)
Ans :       

4) If A=10 and B=20, how can you interchange the two values without using a temporary register?   (Intel)
Ans :
    Perform the following operations sequentially:
         A = A xor B;
         B = A xor B;
         A = A xor B;
  Now A=20 and B=10.
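In VHDL the three steps only work as written with variables, because variable assignments take effect immediately while signal assignments inside a process are all scheduled from the old values. The small self-checking testbench below is a sketch of this. The entity name "xor_swap_tb" and the 8-bit width are my own choices.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity xor_swap_tb is
end xor_swap_tb;

architecture sim of xor_swap_tb is
begin

process
  variable a : unsigned(7 downto 0) := to_unsigned(10, 8);
  variable b : unsigned(7 downto 0) := to_unsigned(20, 8);
begin
  a := a xor b;   --a now holds (A xor B).
  b := a xor b;   --(A xor B) xor B = A, so b now holds the old A.
  a := a xor b;   --(A xor B) xor A = B, so a now holds the old B.
  assert (a = 20 and b = 10) report "XOR swap failed" severity failure;
  report "a = " & integer'image(to_integer(a)) & ", b = " & integer'image(to_integer(b));
  wait;
end process;

end sim;
```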

5)What is the expression for output 'y' in the following circuit?
Ans : (In the notation I have used, A' means not(A), and AB means (A and B).)
y = ( A'B'C + AB'C' + A'BC + ABC' )
  = ( A'C (B+B') + AC' (B+B') )
  = A'C + AC'
  = A xor C.

6)The input combination to find the stuck at '0' fault in the following circuit is:  (Texas Instruments)


Ans : X is always zero in the above circuit. So P is always zero whatever the value of A, B, C or D is.
To check the fault at X, make either input C or D zero, and A, B as '1'. So the input combination is "1101".

7)Consider a two-level memory hierarchy system M1 & M2. M1 is accessed first and on miss M2 is accessed. The access of M1 is 2 nanoseconds and the miss penalty (the time to get the data from M2 in case of a miss) is 100 nanoseconds. The probability that a valid data is found in M1 is 0.97. The average memory access time is:
Ans : This question is based on cache hit and miss probability.
Average memory access time = (Time_m1 * success_prob ) + ( (Time_m1 + Time_m2) * miss_prob)
                    = ( 2* 0.97 ) + ( (2+100) * (1- 0.97) )
                    =  1.94 + 3.06 = 5 ns.

8)Interrupt latency is the time elapsed between:
a. Occurrence of an interrupt and its detection by the CPU
b. Assertion of an interrupt and the start of the associated ISR
c. Assertion of an interrupt and the completion of the associated ISR
d. Start and completion of associated ISR.
Ans : (b). ISR means Interrupt service routine.

These are only some of the questions I have seen. More questions will be up soon. Get the updates by subscribing to VHDLGURU. If you want answers to questions related to VLSI, digital design etc., you can contact me.

What is a Gated clock and how it reduces power consumption?

    A gated clock is a well-known method for reducing power consumption in synchronous digital circuits. With this method the clock signal is not applied to the flip-flop when the circuit is idle. This reduces the power consumption.
In a digital circuit the power consumption can be attributed to the following factors:
1) Power consumed by combinatorial logic whose values are changing on each clock edge 
2) Power consumed by flip-flops.
Of the above two, the second one contributes most of the power usage.
A flip-flop consumes power whenever the applied clock signal changes, due to the charging and discharging of its internal capacitances. If the frequency of the clock is high then the power consumed is also high. Clock gating is a method to reduce the number of clock transitions that the flip-flop sees.

Consider the following VHDL code:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity normalclk is
port( clk : in std_logic;
      load : in std_logic;
      i : in std_logic;
     o : out std_logic
      );    
end normalclk;


architecture Behavioral of normalclk is

BEGIN
process(clk)
begin
if(rising_edge(clk)) then
if(load ='1') then
o <= i;
end if;
end if;
end process;

end Behavioral;

The code, if synthesized, will look like this: (RTL schematic on the left side and technology schematic on the right side)
As you can see, the clock is always applied to the flip-flop, and this results in considerable power loss due to frequent charging and discharging of its capacitances.

Now let us modify the above piece of code by using a gated clock.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity gatedclk is
port( clk : in std_logic;
      load : in std_logic;
      i : in std_logic;
      o : out std_logic
      );    
end gatedclk;

architecture Behavioral of gatedclk is
signal gclk : std_logic;

BEGIN
gclk <= clk and load;
process(gclk)
begin
if(rising_edge(gclk)) then
o <= i;
end if;
end process;

end Behavioral;

The synthesized design will look like this: (RTL schematic on top and technology schematic on bottom)
   Note the AND operation between the load and clk signals. Here the clock to the flip-flop "FD" is said to be gated. The code's purpose is that the output has to change only when load is '1' at the rising edge of the clock. So it is useless to drive the flip-flop when the load signal is '0'. If the load signal is rarely active, then the above gated clock code will result in a low power design.
   Apart from the advantage of reducing power consumption, it has some disadvantages:
1) As you can see, when you use a gated clock the buffer used is of type IBUF (input buffer). But when the clock is not gated, the synthesis tool uses a BUFGP buffer (which is faster and normally used as a buffer for clocks). This may result in a small delay.
2) From a synthesis point of view, the gate controller takes more area and makes the design more complicated.

Tuesday, April 13, 2010

Synchronous Vs Asynchronous resets in VHDL

     In your design you may need to initialize all your signals to a predetermined state. This is done by applying a reset signal. A reset signal can act on the system in two ways: synchronously and asynchronously. In this article I have tried to explain the advantages and disadvantages of both methods and how exactly each is implemented in hardware.

Synchronous Reset :

A synchronous reset signal can be applied as shown below :

process(clk)
begin
if(rising_edge(clk)) then
if(reset = '0') then  --reset is checked only at the rising edge of clock.
o <= i;
else
o <= '0';
end if;
end if;
end process;

The code is synthesised to the following block in the FPGA: (the truth table of the FDR flip-flop is also given)
In the schematic diagram FDR is a single D-type flip-flop with data (D) and synchronous reset (R) inputs and data output (Q). The synchronous reset (R) input, when High, overrides all other inputs and resets the Q output Low on the 0 to 1 clock (C) transition. The data on the D input is loaded into the flip-flop when R is Low during the 0 to 1 clock transition. If you analyze the above code you can see that the value of reset changes the signal 'o' only at the rising edge of the clock. This method has the following advantages:
1)The reset applied to all the flip-flops is fully synchronized with the clock and always meets the reset recovery time.
2)In some cases, a synchronous reset will synthesize to smaller flip-flops.
Synchronous resets have some disadvantages also:
1)If the reset is applied for only a short duration, no clock edge may fall within the pulse and the reset is missed. So if you are using synchronous resets, make sure that your reset signal stays active long enough to be captured by the clock.
2)Also, a change in reset is not immediately reflected in the associated signals.

Asynchronous reset :

Now let us have a look at the asynchronous reset :

process(clk,reset)
begin
if(reset = '1') then  --change in reset gets immediately reflected on signal 'o'.
o <= '0';
elsif(rising_edge(clk)) then
o <= i;
end if;
end process;

The code is synthesised to the following in the FPGA. (The truth table of the particular flip-flop is also given.)

In the schematic FDC is a single D-type flip-flop with data (D) and asynchronous clear (CLR) inputs and data output (Q). The asynchronous CLR, when High, overrides all other inputs and sets the Q output Low. The data on the D input is loaded into the flip-flop when CLR is Low on the 0 to 1 clock transition. If you analyse the code you can see that when the reset goes high, signal 'o' immediately becomes '0'. It doesn't wait for a clock edge. Now let us look at the advantages of this method:
1)High speed can be achieved.
2)Data can be reset without waiting for the clock edge.
The disadvantages are:
1) Asynchronous resets have metastability problems. By metastability I mean that the clock and reset have no timing relationship. So if the reset changes from 1 to 0 close to the rising edge of the clock, the output is not determinate. The reset input has to follow the reset recovery time rule. This is a kind of setup-time condition on a flip-flop that defines the minimum amount of time between the de-assertion of reset and the next rising clock edge. If the signal doesn't meet this recovery time then it may create metastability problems.

Note :- As you can see, both methods have their own advantages and disadvantages, and selecting one of them depends upon your design requirement. There is another method of resetting the signals known as "Asynchronous Assertion, Synchronous De-assertion", which is the best method of resetting signals. This will be discussed in the next article.
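To give an idea of what "Asynchronous Assertion, Synchronous De-assertion" usually looks like, here is a minimal sketch of the common two flip-flop reset synchronizer. It is not taken from the follow-up article. The entity name "reset_sync" and the port names "arst" and "rst_out" are assumptions of mine.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity reset_sync is
port( clk : in std_logic;
      arst : in std_logic;      --raw asynchronous reset, active high.
      rst_out : out std_logic   --reset to distribute to the rest of the design.
    );
end reset_sync;

architecture Behavioral of reset_sync is
signal ff1, ff2 : std_logic := '1';
begin

process(clk,arst)
begin
if(arst = '1') then            --assertion: takes effect immediately.
ff1 <= '1';
ff2 <= '1';
elsif(rising_edge(clk)) then   --de-assertion: released two clock edges later.
ff1 <= '0';
ff2 <= ff1;
end if;
end process;

rst_out <= ff2;

end Behavioral;
```

The idea is that "rst_out" goes high as soon as "arst" does, but goes low only in step with the clock, so the flip-flops that use it as their asynchronous reset see a release that respects the recovery time.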

Process sensitivity list Vs Synthesis-ability

For some of you this is a common warning while synthesising the code:
"One or more signals are missing in the process sensitivity list". In this article I will explain whether there is any relation between the process sensitivity list and the synthesis results.

Let us take an example.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity blog1 is
port( a,b : in unsigned(3 downto 0);
        c,d : out unsigned(3 downto 0);
        clk : in std_logic;
        rst : in std_logic
      );    
end blog1;

architecture Behavioral of blog1 is

BEGIN
--Synchronous process(some flipflop's are used for implementation)
process(clk,rst)    
begin
    if(rst = '1') then
        c<="0000";
    elsif(rising_edge(clk)) then
        c<=a;
    end if;
end process;
--combinational process(some LUT's are used for implementation)
process(a,b)
begin
    d <= a and b;
end process;

end Behavioral;

The testbench code used for testing the functionality of the code is given below:

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY tb IS
END tb;

ARCHITECTURE behavior OF tb IS
   --Inputs
   signal a : unsigned(3 downto 0) := (others => '0');
   signal b : unsigned(3 downto 0) := (others => '0');
   signal clk : std_logic := '0';
   signal rst : std_logic := '0';
    --Outputs
   signal c : unsigned(3 downto 0);
   signal d : unsigned(3 downto 0);
   -- Clock period definitions
   constant clk_period : time := 10 ns;
   
BEGIN

    -- Instantiate the Unit Under Test (UUT)
   uut: entity work.blog1 PORT MAP (
          a => a,
          b => b,
          c => c,
          d => d,
          clk => clk,
          rst => rst
        );
   -- Clock process definitions
   clk_process :process
   begin
        clk <= '0';
        wait for clk_period/2;
        clk <= '1';
        wait for clk_period/2;
   end process;
   -- Stimulus process
   stim_proc: process
   begin       
      b<="1111";
        a<="1001";
        wait for 20 ns;
        a<="1010";
        rst<='1';
        wait for 30 ns;
        rst<='0';
        a<="1011";
      wait;
   end process;

END;

We will first check the simulation results for the above code. See the image below:
OK. So the code is working well as per the simulation results. Now let us synthesize the code using Xilinx ISE. The synthesis process also finished successfully. For future reference, save a copy of the synthesis report somewhere.

Now let us make a small change in the process sensitivity list of the above code.
Use process(rst)  instead of process(clk,rst).
Also use process(b) instead of process(a,b).
Simulate the design once more using the same testbench code. The waveform I got is given below:
What difference do you notice between the new waveform and the old one? Since we have removed "clk" and "a" from the process sensitivity lists, the output signals have stopped changing in response to changes in the inputs. So effectively the code is not working in simulation. That is a big problem. But is this change going to be reflected in the synthesis results also?

Let us synthesize the new design and see. We got the following warning after synthesis:
"One or more signals are missing in the process sensitivity list. To enable synthesis of FPGA/CPLD hardware, XST will assume that all necessary signals are present in the sensitivity list. Please note that the result of the synthesis may differ from the initial design specification. The missing signals are:".

Compare the new and old synthesis reports to check whether this warning has any effect on the synthesis results. You may be surprised to see that there is no difference between the two reports. Both versions of the code produce the same synthesis result.

So what about the warning? After going through some of the forums I found the following reasons:
1) Usually the behavior described by the equations inside a process is what is intended; the sensitivity list is just a bookkeeping chore. It doesn't have anything to do with synthesis.
2) Technically, what XST (the Xilinx synthesis tool) has implemented is not what your VHDL code says to do as per the VHDL language definition. It is taking somewhat of a guess about what you really intended to do. By violating the language specification, it implements the design the way it thinks you 'really' want it and kicks out a warning to let you know that the actual implementation will operate differently than your simulation shows.
           (Thanks to KJ)

Conclusion :- The sensitivity list has nothing to do with synthesis. But without the proper sensitivity list, the process will not work in simulation. So as a good practice, include in the sensitivity list all the signals which are read inside a combinational process, and the clock and any asynchronous reset for a clocked process. The results may vary if you are using some other tool. I have used Xilinx ISE version 12.1 for the analysis.
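If your tools support VHDL-2008, the bookkeeping for combinational processes can be left to the compiler by writing "process(all)", which makes the process sensitive to every signal it reads. The sketch below rewrites only the combinational part of the example. The entity name "blog1_all" is made up, and older tool versions may not accept this syntax, so check the VHDL-2008 support of your simulator and synthesis tool first.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity blog1_all is
port( a,b : in unsigned(3 downto 0);
      d : out unsigned(3 downto 0)
    );
end blog1_all;

architecture Behavioral of blog1_all is

BEGIN
--VHDL-2008 only: "all" stands for every signal read inside the process,
--so "a" and "b" cannot be forgotten.
process(all)
begin
    d <= a and b;
end process;

end Behavioral;
```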

Thursday, April 8, 2010

Difference between rising_edge(clk) and (clk'event and clk='1')

     Only a few VHDL programmers know that there is something called the "rising_edge()" function. Even those who know about it often stick to the old-fashioned clk'event and clk='1' method of finding an edge transition of the clock. So in this article I will explain the difference between the rising_edge or falling_edge functions and clk'event based edge detection.

Consider the following snippet:

clk_process :process
   begin
        clk <= '0';
        wait for clk_period/2;  --for 0.5 ns signal is '0'.
        clk <= '1';
        wait for clk_period/2;  --for next 0.5 ns signal is '1'.
   end process;

process(clk)
begin
if(rising_edge(clk)) then
xr<= not xr;
end if;

if(clk'event and clk='1') then
x0 <= not x0;
end if;

end process;

If you run the above code the output will look like this:
Now you may ask, where is the difference? There is no difference in this case. But let us see another example:

clk_process :process
   begin
        clk <= 'Z';    ----------Here is the change('Z' instead of '0').
        wait for clk_period/2;  --for 0.5 ns signal is 'Z'.
        clk <= '1';
        wait for clk_period/2;  --for next 0.5 ns signal is '1'.
   end process;

process(clk)
begin
if(rising_edge(clk)) then
xr<= not xr;
end if;

if(clk'event and clk='1') then
x0 <= not x0;
end if;

end process;

Now the output will look like this:

Does this ring any bells? You can see that the signal 'xr' doesn't change at all, but x0 changes as in the first code. This is the basic difference between the two methods. To get a clear view, look at the rising_edge function as implemented in the std_logic_1164 package:

    FUNCTION rising_edge  (SIGNAL s : std_ulogic) RETURN BOOLEAN IS
    BEGIN
        RETURN (s'EVENT AND (To_X01(s) = '1') AND
                            (To_X01(s'LAST_VALUE) = '0'));
    END;

    As you can see the function returns a value "TRUE" only when the present value is '1' and the last value is '0'.If the past value is something like 'Z','U' etc. then it will return a "FALSE" value.This makes the code, bug free, beacuse the function returns only valid clock transitions,that means '0' to '1'.All the rules and examples said above equally apply to falling_edge() function also.

But the expression (clk'event and clk='1') is TRUE whenever there is an event on clk and the present value is '1'. It doesn't check whether the previous value was '0' or not.
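
The difference between the two checks can be confirmed with a self-checking testbench. The following is only a minimal sketch: the entity name edge_compare_tb, the "done" flag and the 10 ns clock period are my own assumptions, chosen so the check times fall on whole nanoseconds.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench name; not part of the original article.
entity edge_compare_tb is
end edge_compare_tb;

architecture sim of edge_compare_tb is
  constant clk_period : time := 10 ns;   -- assumed period
  signal clk  : std_logic := 'Z';
  signal xr   : std_logic := '0';        -- toggled by rising_edge(clk)
  signal x0   : std_logic := '0';        -- toggled by clk'event and clk='1'
  signal done : boolean   := false;      -- stops the clock when the checks finish
begin

  -- Clock that alternates between 'Z' and '1' (never a clean '0').
  clk_process : process
  begin
    while not done loop
      clk <= 'Z';
      wait for clk_period/2;
      clk <= '1';
      wait for clk_period/2;
    end loop;
    wait;
  end process;

  process(clk)
  begin
    if rising_edge(clk) then
      xr <= not xr;
    end if;
    if clk'event and clk = '1' then
      x0 <= not x0;
    end if;
  end process;

  check : process
  begin
    wait for 7 ns;    -- just after the first 'Z' -> '1' change at 5 ns
    assert x0 = '1' report "x0 should have toggled" severity failure;
    assert xr = '0' report "xr should not have toggled" severity failure;
    wait for 10 ns;   -- just after the second 'Z' -> '1' change at 15 ns
    assert x0 = '0' report "x0 should have toggled again" severity failure;
    assert xr = '0' report "xr should still be unchanged" severity failure;
    done <= true;
    wait;
  end process;

end sim;
```

If the simulation finishes without an assertion failure, x0 reacted to every 'Z' to '1' change while xr ignored them.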

The nine possible std_logic values are 'U', 'X', '0', '1', 'Z', 'W', 'L', 'H' and '-'.
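
Since rising_edge() passes both the present and the previous value through To_X01, it helps to see what that function does with each std_logic value. The sketch below checks the mapping in simulation; the entity name to_x01_check and the "expected" table are names I made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity name; the design has no ports.
entity to_x01_check is
end to_x01_check;

architecture sim of to_x01_check is
  type map_t is array (std_ulogic) of std_ulogic;
  -- Expected result of To_X01 for each of the nine values.
  constant expected : map_t := (
    'U' => 'X', 'X' => 'X', '0' => '0', '1' => '1', 'Z' => 'X',
    'W' => 'X', 'L' => '0', 'H' => '1', '-' => 'X');
begin
  process
  begin
    for v in std_ulogic loop
      assert To_X01(v) = expected(v)
        report "unexpected To_X01 result for " & std_ulogic'image(v)
        severity failure;
    end loop;
    wait;
  end process;
end sim;
```

Only '0', '1', 'L' and 'H' keep a known level. Everything else becomes 'X', which is why a 'Z' to '1' change is not a rising edge for rising_edge().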

Note :- Use rising_edge() and falling_edge() functions instead of (clk'event and clk='1') statements in your designs.

Thursday, April 1, 2010

Combinatorial Frequency Multiplier Circuit in VHDL

     This article is about a frequency multiplier in VHDL. Frequency multiplier? Did you mean frequency divider? No, you heard me right, I mean multiplier. And the interesting thing is that this is a combinational circuit which doubles the frequency applied to it. In the circuit, the clk signal is fed directly to one input of an XOR gate and through a NOT gate to the other input.
Now analyse this circuit. Assume a practical NOT gate, with some propagation delay, instead of an ideal one. Then you get the following timing behaviour:


Because of the delay in the NOT gate, the "notclk" signal is not an exact inverse of the clk signal. After the XOR operation you get the "output" signal, which is high most of the time and pulses low for one gate delay after every edge of clk. If you analyse the frequency of the output signal, it is double that of the clk signal. But the duty cycle of the output signal is not 50%.
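
The timing described above can be reproduced in simulation if the NOT-gate delay is modelled explicitly. The following is a simulation-only sketch, not synthesizable code: the "after" delay of 1 ns, the 10 ns clock period and the testbench name doubler_tb are all assumptions of mine.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical simulation-only testbench.
entity doubler_tb is
end doubler_tb;

architecture sim of doubler_tb is
  constant clk_period : time := 10 ns;  -- assumed clock period
  constant not_delay  : time := 1 ns;   -- assumed NOT-gate delay
  signal clk       : std_logic := '0';
  signal notclk    : std_logic := '1';
  signal output    : std_logic := '1';
  signal clk_edges : natural   := 0;    -- rising edges seen on clk
  signal out_edges : natural   := 0;    -- rising edges seen on output
  signal done      : boolean   := false;
begin

  clk_process : process
  begin
    while not done loop
      clk <= '0';
      wait for clk_period/2;
      clk <= '1';
      wait for clk_period/2;
    end loop;
    wait;
  end process;

  notclk <= not clk after not_delay;  -- "practical" inverter
  output <= clk xor notclk;           -- low pulse after every clk edge

  process(clk)
  begin
    if rising_edge(clk) then
      clk_edges <= clk_edges + 1;
    end if;
  end process;

  process(output)
  begin
    if rising_edge(output) then
      out_edges <= out_edges + 1;
    end if;
  end process;

  check : process
  begin
    -- Ten clock periods plus a margin for the last delayed pulse.
    wait for 10 * clk_period + 2 ns;
    assert clk_edges = 10 report "expected 10 clk edges" severity failure;
    assert out_edges = 20 report "expected 20 output edges" severity failure;
    done <= true;
    wait;
  end process;

end sim;
```

With these values the output shows two rising edges per clock period, one after each edge of clk, which is the doubling described above.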

VHDL code can be written for this circuit. But since a behavioral simulation (in the Xilinx ISE simulator, for example) does not model gate delays, we cannot see the above output at the simulation level. Also, to actually get a delay in hardware you need to use a clock buffer, as shown in the code.

--Library declarations
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
Library UNISIM;   --needed for the "BUFG" clock buffer primitive
use UNISIM.vcomponents.all;
--------------------------------------------

entity test is
port (clk : in std_logic;  --input clock
      a : out std_logic;   --output signal which changes with "clk" signal
      b : out std_logic    --output signal which changes with "frequency multiplied clock" signal
     );
end test;

architecture Behavioral of test is
signal c1,O,a2 : std_logic:='0';
signal count,count2 : std_logic_vector(31 downto 0):=(others => '0');
begin
   --clock buffer, used here to delay the inverted clock
   BUFG_inst : BUFG
   port map (
      O => O,     -- Clock buffer output
      I => c1      -- Clock buffer input
   );
   c1 <= not clk;     --inverted clock
   a2 <= clk xor O;   --frequency multiplied clock
   b <= count(28);    --driven by the counter on the multiplied clock
   a <= count2(28);   --driven by the counter on the original clock

   --counter with modified clock
   process(a2)
   begin
      if(rising_edge(a2)) then
         count <= count+'1';   --wraps to zero on overflow
      end if;
   end process;

   --counter with original clock
   process(clk)
   begin
      if(rising_edge(clk)) then
         count2 <= count2+'1';   --wraps to zero on overflow
      end if;
   end process;

end Behavioral;

     If you analyse the code, you can see that I have two counters running at different frequencies. The actual output signals are bit 28 of these counter values. Why is this extra piece of code needed?
As I said, the delay in the NOT gate is neglected in the simulation results. So the only way to verify the result is to synthesize the design and try it on an FPGA board. I tried this code on a Virtex 5 FPGA (package : ff136, Device : XC5VSX50T). The clock frequency I gave was 100 MHz. The doubling of the frequency is verified with 2 LEDs, one driven from the input clock and the other from the output clock. But at 100 MHz the ON/OFF switching of the LEDs cannot be seen by the human eye. So you have to divide both frequencies by the same fixed number. That is why I have chosen a 32 bit counter and used bit 28 to drive the LED. Bit 28 toggles once every 2^28 clock cycles, which is roughly 2.7 seconds at 100 MHz and half of that at the doubled frequency.
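
The divider idea can also be written with the standard numeric_std package instead of std_logic_unsigned. The sketch below is only an illustration of that: the entity led_divider, its generic BIT_INDEX and the testbench led_divider_tb are hypothetical names, and the testbench uses bit 2 rather than bit 28 so that the simulation stays short.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical divider: led is bit BIT_INDEX of a free-running counter.
entity led_divider is
  generic (BIT_INDEX : natural := 28);
  port (clk : in  std_logic;
        led : out std_logic);
end led_divider;

architecture rtl of led_divider is
  signal count : unsigned(BIT_INDEX downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      count <= count + 1;  -- unsigned addition wraps around on overflow
    end if;
  end process;
  led <= count(BIT_INDEX); -- toggles every 2**BIT_INDEX clock cycles
end rtl;

library ieee;
use ieee.std_logic_1164.all;

entity led_divider_tb is
end led_divider_tb;

architecture sim of led_divider_tb is
  signal clk  : std_logic := '0';
  signal led  : std_logic;
  signal done : boolean := false;
begin
  uut : entity work.led_divider
    generic map (BIT_INDEX => 2)   -- small value for a short simulation
    port map (clk => clk, led => led);

  -- 10 ns clock: rising edges at 5 ns, 15 ns, 25 ns, ...
  clk_process : process
  begin
    while not done loop
      clk <= '0';
      wait for 5 ns;
      clk <= '1';
      wait for 5 ns;
    end loop;
    wait;
  end process;

  check : process
  begin
    wait for 30 ns;  -- 3 rising edges so far, count = 3
    assert led = '0' report "led should still be off" severity failure;
    wait for 10 ns;  -- 4th edge at 35 ns, count = 4
    assert led = '1' report "led should be on" severity failure;
    wait for 40 ns;  -- 8th edge at 75 ns, count wraps to 0
    assert led = '0' report "led should be off again" severity failure;
    done <= true;
    wait;
  end process;
end sim;
```

With BIT_INDEX set to 28 and a 100 MHz clock, the same entity would give the slow LED blinking used in the experiment.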
When I ran the code I got the following results:
  LED1 (original clk)    LED2 (doubled clk)
       off                    off
       off                    on
       on                     off
       on                     on
       off                    off

From the way the LEDs turn ON and OFF, we can see that the clock has actually doubled in frequency! LED2 toggles twice for every toggle of LED1, so the two LEDs together behave like a 2-bit counter.

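Note :- A delay-and-XOR circuit of this kind is the usual way to double a frequency with purely combinatorial logic. The output pulse width depends on the gate delay, so in a real design the FPGA's dedicated clock management resources are the safer choice.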