VHDL coding tips and tricks: July 2010

Sunday, July 25, 2010

Binary counter IP core in Xilinx Core Generator

      Even though a counter is easy to write by hand in VHDL, Xilinx Core Generator provides a free Binary Counter IP core. It is worth using if you want to save time and your design needs several of the extra features the core offers. The counter can be up to 256 bits wide.
      This tutorial covers Binary Counter version 11.0, and I am using Xilinx ISE 12.1 WebPACK for this project. The same steps should apply to other versions of the IP without major changes.
The counter IP is listed under the category Basic Elements -> Counters. I wrote an earlier tutorial on how to create a Core Generator project and simulate it. If this is your first project using Core Generator, I suggest you go through it first. It is available here.
Now double-click on the Binary Counter IP to set its parameters as required. A new window similar to the one shown below will appear:

 Now I will go through the general settings and their meaning.
1)Component name : The name you want to give the generated module.
2)Implement using : You have two options here: DSP blocks and Fabric. If you select DSP, Core Generator builds the counter from the built-in DSP blocks available in the FPGA. Selecting Fabric uses only look-up tables and flip-flops. If you want a very wide counter, select the Fabric option.
3)Output width : Width of the counter output. If you choose ‘n’, the counter can count up to a maximum value of 2^n-1.
4)Increment value : The amount by which the count value is incremented or decremented at each step.
5)Loadable : Make the counter loadable if you want to be able to change the count value at any time. This adds a std_logic input (load) and a std_logic_vector input (l) to the IP port list. When load=1, the value on the std_logic_vector input replaces the count value after the configured latency.
6)Restrict Count : Fixes a maximum count value. Use this if your final count value is not 2^n-1.
7)Count mode : You have three options: UP, DOWN and UPDOWN. They have their usual meanings for a counter.
8)Sync Threshold output : If you select this, a std_logic output (thresh0) goes high for one clock cycle when the count value equals the given “Threshold value”.
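9)Clock Enable : Adds a clock enable input. The count value changes only on clock edges where this input is high, so it must be high for the core to count. Note that this is not a gated clock: the clock itself keeps running, and the enable only controls whether the flip-flops update.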
10)Synchronous clear : Clears the output at the clock edge when activated.
11)Synchronous set : Sets the output at the clock edge when activated.
12)Synchronous Init : Initializes the count value (output) at the clock edge when activated. You choose the initialization value.
13)Power-on reset init value : The count value when the device powers up.
14)Latency configuration : Sets the latency (delay in clock cycles) of the core. Select “automatic” for the optimal latency setting.
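
To make the meaning of these settings more concrete, here is a rough behavioural sketch of a small counter with the same port names as the generated core. It is only a hand-written stand-in, not the code Core Generator produces. The entity name counter_model and the generics WIDTH, STEP and THRESHOLD are names I made up for illustration. I have also assumed that sset takes priority over load, that sset drives all output bits high, that every control takes effect one clock edge later, and that the threshold output is a simple combinational compare. The real core may differ on any of these points.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

--Hypothetical behavioural stand-in for the counter, NOT the generated core.
entity counter_model is
    generic (
        WIDTH     : positive := 3;   --"Output width"
        STEP      : positive := 1;   --"Increment value"
        THRESHOLD : natural  := 5    --"Threshold value" (assumed for this sketch)
    );
    port (
        clk     : in  std_logic;
        sset    : in  std_logic;
        load    : in  std_logic;
        l       : in  std_logic_vector(WIDTH-1 downto 0);
        thresh0 : out std_logic;
        q       : out std_logic_vector(WIDTH-1 downto 0)
    );
end counter_model;

architecture Behavioral of counter_model is
    --The initial value plays the role of the "Power-on reset init value".
    signal count : unsigned(WIDTH-1 downto 0) := (others => '0');
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if sset = '1' then           --synchronous set (assumed priority, all bits high)
                count <= (others => '1');
            elsif load = '1' then        --synchronous load from l
                count <= unsigned(l);
            else                         --count up, wrapping at 2**WIDTH
                count <= count + STEP;
            end if;
        end if;
    end process;

    q       <= std_logic_vector(count);
    --Simplification: plain compare against the threshold value.
    thresh0 <= '1' when count = THRESHOLD else '0';
end Behavioral;
```

With the default generics this gives a 3-bit up counter, the same width as the core tested below, so it can be dropped into a testbench as a quick reference when you do not have the generated netlist at hand.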

I have tested some of the features of the IP core. The settings I have used are given in the image shown above. The testbench code I have used is given below:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity blog_cg is
end blog_cg;

architecture Behavioral of blog_cg is

signal clk,sset,load,thresh0 : std_logic:='0';
signal l,q : std_logic_vector(2 downto 0):="000";

component counter is
        port (
        clk: in std_logic;
        sset: in std_logic;
        load: in std_logic;
        l: in std_logic_vector(2 downto 0);
        thresh0: out std_logic;
        q: out std_logic_vector(2 downto 0));
end component;


begin

UUT : counter port map(clk,sset,load,l,thresh0,q);

--Generate 2 ns clock.
clock : process begin
                clk <= '0';
                wait for 1 ns;
                clk <= '1';
                wait for 1 ns;
end process;

--Generate the required inputs for testing/simulation.
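simulate : process begin
sset <= '1';  --assumed: apply synchronous set at the start.
wait for 3 ns;
sset <= '0';
wait for 2 ns;
load <='1';
l <= "010";   --assumed load value, for illustration only.
wait for 2 ns;
load <= '0';
wait;         --stop here so the stimulus is applied only once.
end process;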

end Behavioral;

Here is the simulation waveform:
If you look carefully, you can see a delay of one clock cycle between applying the sset and load control signals and the change in the output value. This is the latency, which I left as "automatic" in the IP core settings. If you set it to manual, you can choose this delay yourself.

Saturday, July 24, 2010

KEEP HIERARCHY : A synthesis option in XST.

    For those who do not know what this topic is about: I am talking about the "Keep Hierarchy" option available in the Xilinx synthesis options. This option, along with many others, can be viewed or changed through the following steps:
1)Right-click on Synthesize - XST in the Processes tab.
2)Click on Properties to get the window shown in the image below.
3)Click on Synthesis Options in the Category pane.
4)On the right side of the window, scroll down to the "Keep Hierarchy" option and change it to No or Yes.

Now I want to list out some points about this "Keep Hierarchy" option:
1)This option matters for designs that consist of more than one VHDL module, that is, a design with a top module and some components.
2)Whenever XST starts the synthesis process, it tries to optimize the design for the selected architecture. When we set "Keep Hierarchy" to Yes, XST optimizes only one component at a time. This is faster in terms of synthesis time, but the optimization has limited reach.
3)When we set "Keep Hierarchy" to No, XST optimizes the whole design in a single pass, across component boundaries. This takes more time but gives better optimization results.
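
As far as I know from the XST documentation, the same choice can also be made per block by attaching a keep_hierarchy attribute to an architecture in the VHDL source, instead of setting it globally in the Properties window. The sketch below shows where such an attribute would go. The jk_ff entity here is a hypothetical sub-block written only for illustration and is not the flip-flop code from the counter example used later in this post. Please check the XST User Guide for your ISE version before relying on the attribute.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

--Hypothetical sub-block; the name and ports are for illustration only.
entity jk_ff is
    port (
        clk, j, k : in  std_logic;
        q         : out std_logic
    );
end jk_ff;

architecture Behavioral of jk_ff is
    --XST synthesis attribute: ask XST to keep this block as its own
    --level of hierarchy instead of merging it into the parent.
    attribute keep_hierarchy : string;
    attribute keep_hierarchy of Behavioral : architecture is "yes";

    signal state : std_logic := '0';
begin
    process (clk)
    begin
        if rising_edge(clk) then
            if j = '1' and k = '1' then
                state <= not state;   --toggle
            elsif j = '1' then
                state <= '1';         --set
            elsif k = '1' then
                state <= '0';         --reset
            end if;                   --j = k = '0': hold
        end if;
    end process;

    q <= state;
end Behavioral;
```

Simulators ignore this attribute, so it changes nothing in simulation. It only affects how XST treats the block during synthesis.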

Let me demonstrate the above points with an example. The code I have used is already published on this blog. It is "4 bit Synchronous UP counter(with reset) using JK flip-flops".

I copied the code into a new project directory and first synthesized the design with the option "Keep Hierarchy: yes". The following synthesis results showed up:

Device utilization summary:
Selected Device : 5vlx30ff324-3

Slice Logic Utilization:
 Number of Slice Registers:               4  out of  19200     0%
 Number of Slice LUTs:                    6  out of  19200     0%
    Number used as Logic:                 6  out of  19200     0%

Slice Logic Distribution:
 Number of LUT Flip Flop pairs used:     10
   Number with an unused Flip Flop:       6  out of     10    60%
   Number with an unused LUT:             4  out of     10    40%
   Number of fully used LUT-FF pairs:     0  out of     10     0%
   Number of unique control sets:         4

Minimum period: 1.565ns (Maximum Frequency: 639.141MHz)

Then I ran XST again with the option "Keep Hierarchy: no". This time I got different results:

Slice Logic Utilization:
 Number of Slice Registers:               4  out of  19200     0%
 Number of Slice LUTs:                    4  out of  19200     0%
    Number used as Logic:                 4  out of  19200     0%

Slice Logic Distribution:
 Number of LUT Flip Flop pairs used:      8
   Number with an unused Flip Flop:       4  out of      8    50%
   Number with an unused LUT:             4  out of      8    50%
   Number of fully used LUT-FF pairs:     0  out of      8     0%
   Number of unique control sets:         3

Minimum period: 1.101ns (Maximum Frequency: 908.348MHz)

You can see that the maximum operating frequency rises by about 270 MHz (from about 639 MHz to about 908 MHz) when the hierarchy is not kept. Resource usage is also lower in the second case: 4 slice LUTs instead of 6.
Another observation can be made from the Technology Schematic viewer in XST. In the first case the technology schematic shows sub-blocks inside the main block, and these contain the LUTs and flip-flops. In the second case there are no sub-blocks at all. Only LUTs and flip-flops appear directly under the main block. This is what I meant by saying that XST optimizes the design as a whole. You can verify this yourself by clicking on the "View Technology Schematic" option under Synthesize - XST.