VHDL coding tips and tricks: xilinx tips


Friday, November 13, 2015

Video tutorial on how to simulate a VHDL code using Xilinx ISE

This is the first video in a series of video tutorials I am planning to upload.

In this video, I want to show you:

How to create a new project.
How to add VHDL code to it.
How to compile and simulate the code.
How to see internal signals in the waveform window.

I have used Xilinx ISE version 14.6 for this demo. The steps should be almost the same in other versions of Xilinx ISE too.

Wednesday, July 11, 2012

Synthesised code is too big for the fpga device - What to do?

After a lot of hard work you have completed your HDL project. The simulation results verify that the code is functionally correct. To check how it performs in hardware, you synthesize the design. Unluckily, you realize that the code you just wrote is too big for the FPGA. What can you do? Don't panic. There are a few ways you can tackle this problem.

1) If possible, choose a larger FPGA device:

This is the simplest thing you can do. Check whether the lab or a friend has a bigger FPGA device that can accommodate your design. If you don't really want to test the design in hardware, but just want to see the synthesis results, simply select the largest device available in the list.

2) Is the FPGA out of pins?

Sometimes the synthesis tool will give an "Out of resources" warning because the design has more signals in its port list than the device has pins. This happens when you try to input or output large arrays or vectors.
In such cases, use a multiplexed input or output scheme. Rather than inputting everything in one go, do it stepwise, a few bits per clock cycle. Check it out here.
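As a minimal sketch of the stepwise approach, the module below collects a 32-bit word through an 8-bit port, one byte per clock cycle. The entity name byte_loader, the load strobe, and the widths are hypothetical choices for illustration. word_out is meant to feed logic inside the FPGA, not device pins.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity byte_loader is
port( Clk      : in  std_logic;
      load     : in  std_logic;  -- high while a valid byte is on data_in
      data_in  : in  std_logic_vector(7 downto 0);
      word_out : out std_logic_vector(31 downto 0) );
end byte_loader;

architecture Behavioral of byte_loader is

    signal word_reg : std_logic_vector(31 downto 0) := (others => '0');

begin

    word_out <= word_reg;

    process(Clk)
    begin
        if rising_edge(Clk) then
            if load = '1' then
                -- Shift the previous bytes up and append the new byte.
                word_reg <= word_reg(23 downto 0) & data_in;
            end if;
        end if;
    end process;

end Behavioral;
```

After four cycles with load high, word_reg holds the four bytes with the first one received in the most significant position. Only 8 data pins are used instead of 32, at the cost of three extra clock cycles.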

3) Changing synthesis tool settings:

Synthesis tools let you choose what the optimization should favor. In XST, for example, the Optimization Goal property defaults to Speed and can be changed to Area, so that the tool optimizes for lower resource usage. This may reduce the speed a little, but may significantly reduce the resource usage.

4) Re-use of resources:

Analyze the design carefully and see if any parts of it can time-share resources. To do this, the whole design has to be synchronized with a clock.

Time sharing means using the same resource for several operations of the same kind, such as additions or multiplications. Suppose you want to do an operation like

y = a+b+c+d; which uses 3 adder circuits.

Then split the operation over 3 clock cycles like this:

y= a + b; --in first clock cycle.
y= y + c; --in second clock cycle.
y= y + d; --in third clock cycle.

This way only one adder is used for the whole operation. It takes longer to generate the output, but it reduces logic usage.
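The three-cycle split can be sketched in VHDL as below. This is only an illustration: the entity name shared_adder, the start and done handshake signals, and the 8-bit operand width are assumptions of mine. The inputs must be held stable for the three cycles of the computation.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity shared_adder is
port( Clk   : in  std_logic;
      start : in  std_logic;  -- begins a new computation when idle
      a, b, c, d : in unsigned(7 downto 0);
      y     : out unsigned(9 downto 0);
      done  : out std_logic );  -- high for one cycle when y is valid
end shared_adder;

architecture Behavioral of shared_adder is

    signal acc  : unsigned(9 downto 0) := (others => '0');
    signal step : integer range 0 to 2 := 0;
    signal op1  : unsigned(9 downto 0);
    signal op2  : unsigned(7 downto 0);
    signal sum  : unsigned(9 downto 0);

begin

    -- Multiplexers choose the operands for the current step.
    op1 <= resize(a, 10) when step = 0 else acc;

    with step select
        op2 <= b when 0,
               c when 1,
               d when others;

    -- The only adder in the design.
    sum <= op1 + resize(op2, 10);

    y <= acc;

    process(Clk)
    begin
        if rising_edge(Clk) then
            done <= '0';
            case step is
                when 0 =>
                    if start = '1' then
                        acc  <= sum;  -- a + b
                        step <= 1;
                    end if;
                when 1 =>
                    acc  <= sum;      -- (a + b) + c
                    step <= 2;
                when others =>
                    acc  <= sum;      -- (a + b + c) + d
                    step <= 0;
                    done <= '1';
            end case;
        end if;
    end process;

end Behavioral;
```

Note that the operand multiplexers and the step counter cost some logic themselves, so time sharing pays off mainly when the shared operator is expensive (wide adders, multipliers) or is reused many times.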

5) Look for any mathematical simplifications:

Analyze the mathematical formula you are implementing and look for any simplification. For instance, take this operation:

y=x / 5;

In digital hardware, a division circuit is bigger than a multiplication circuit. So make a small change in the formula like this:

y = x * (1/5) = x * 0.2;
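Since 0.2 cannot be written exactly in binary, the multiplication is usually done in fixed point. The sketch below assumes an 8-bit unsigned x and approximates 1/5 by 205/1024, which is my choice of constant for illustration. For inputs in the range 0 to 255 the result should equal the truncated value of x/5, because the error x/5120 is always below 0.05.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity div_by_5 is
port( x : in  unsigned(7 downto 0);
      y : out unsigned(7 downto 0) );
end div_by_5;

architecture Behavioral of div_by_5 is

    -- 205/1024 = 0.2002, a fixed-point approximation of 1/5.
    constant RECIP : unsigned(7 downto 0) := to_unsigned(205, 8);
    signal product : unsigned(15 downto 0);

begin

    -- 8 bits x 8 bits gives a 16-bit product.
    product <= x * RECIP;

    -- Dividing by 1024 is a right shift by 10 bits.
    y <= resize(shift_right(product, 10), 8);

end Behavioral;
```

For wider inputs, the constant and the shift amount need more bits to keep the result exact.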

6) Simplify the design based on the nature of the inputs:

The code may be written for generic use. But in real cases, the range of inputs may be small and predictable. In such cases you can simplify the formula further.

One good example is multiplication or division of a variable by a number which is a power of 2. If the multiplier or divisor is a power of 2, you can implement the operation using a left shift or right shift respectively. This is an excellent optimization method in some cases.
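A small sketch of the shift-based version, using the shift_left and shift_right functions from numeric_std. The entity name and the widths are hypothetical. The multiplication output is widened by 3 bits so that no bits are lost.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity shift_ops is
port( x      : in  unsigned(7 downto 0);
      y_mul8 : out unsigned(10 downto 0);  -- x * 8
      y_div4 : out unsigned(7 downto 0) ); -- x / 4, truncated
end shift_ops;

architecture Behavioral of shift_ops is
begin

    -- Multiplying by 8 = 2**3 is a left shift by 3 bits.
    y_mul8 <= shift_left(resize(x, 11), 3);

    -- Dividing by 4 = 2**2 is a right shift by 2 bits.
    y_div4 <= shift_right(x, 2);

end Behavioral;
```

A shift by a constant amount is only wiring, so it needs no adder or multiplier at all.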

Friday, July 29, 2011

Some tips on reducing power consumption in Xilinx FPGA's

Power estimation and power reduction are an important part of any design. Especially in wireless devices, power reduction is a very important factor. In this article I will note down some points on how to reduce the power consumption of Xilinx-based designs.

1) BRAM enable signal:
Every BRAM has an enable signal which is always high by default. Most HDL coders never care to disable it, even when the BRAM is not in use. But while this enable signal is ON, the BRAM consumes a lot of power, whether or not you change the address or write data. So always have control logic which drives the BRAM enable signal.
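A minimal sketch of a single-port RAM description with an enable input is given below. The entity name and the port named en are my own choices. Whether the tool maps en onto the BRAM enable pin depends on the tool version and device, so it is worth confirming in the synthesis report.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity ram_with_enable is
port( Clk     : in  std_logic;
      en      : in  std_logic;  -- drive low whenever the memory is idle
      we      : in  std_logic;
      address : in  unsigned(7 downto 0);
      data_i  : in  std_logic_vector(7 downto 0);
      data_o  : out std_logic_vector(7 downto 0) );
end ram_with_enable;

architecture Behavioral of ram_with_enable is

    type ram_t is array (0 to 255) of std_logic_vector(7 downto 0);
    signal ram : ram_t := (others => (others => '0'));

begin

    process(Clk)
    begin
        if rising_edge(Clk) then
            -- Both read and write happen only while en is high.
            if en = '1' then
                if we = '1' then
                    ram(to_integer(address)) <= data_i;
                end if;
                data_o <= ram(to_integer(address));
            end if;
        end if;
    end process;

end Behavioral;
```

While en is low, data_o simply holds its last value.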

2) Low power option in coregen for BRAMs:
You can create a BRAM entity file using Xilinx's Core Generator software. There are several options available in coregen to help you achieve what you want. For low power designs, select the "Low power" option in coregen.

3) Decide on LUT or BRAM:
Suppose you want to instantiate a memory in your design. Rather than going straight for BRAM or LUT-RAM, give it some thought. Xilinx says that for small memory blocks (less than 4 Kbits) LUT-RAM consumes less power than BRAM. Similarly, for large memory blocks (more than 4 Kbits) BRAM uses less power than LUT-RAM. So choose LUT-RAM or BRAM for each design depending on the size of the memory block.

4) Global reset:
All FPGA devices have an internal global reset path. When the device is switched OFF and then ON, all the flip-flops and memories are reset to their initial state. But when we define one more reset signal in the HDL code, the tools create a second reset network. This second reset is relatively slow and hence not recommended. But if you still want to use one, make sure it is synchronous, so that the number of control signals in your design stays low.

5) Initialization of registers:
It is recommended that we initialize the registers in our design. Normally we do this for safe simulation. But during synthesis, these initialization values will also be applied to the INIT attribute of the flip-flops, so the registers power up with those values. Remember that this will work only for bits and bit vectors. It will not work for integer or natural types.
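As an illustration, the counter below has no reset input and relies on the initial value in the signal declaration instead. The entity name and the start value "1010" are arbitrary examples.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity init_counter is
port( Clk   : in  std_logic;
      count : out unsigned(3 downto 0) );
end init_counter;

architecture Behavioral of init_counter is

    -- The value given here is used at the start of simulation and
    -- as the power-up value of the flip-flops after configuration.
    signal count_reg : unsigned(3 downto 0) := "1010";

begin

    count <= count_reg;

    process(Clk)
    begin
        if rising_edge(Clk) then
            count_reg <= count_reg + 1;
        end if;
    end process;

end Behavioral;
```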

6) DSP slice utilization:
Depending on how advanced your FPGA is, it will have some number of DSP slices. These components are highly efficient, with low power consumption and high speed. All the DSP blocks in Xilinx FPGAs are synchronous. So when we describe asynchronous behavior (for example an asynchronous reset) for these operations, XST can't implement it entirely in DSP slices. This will decrease the efficiency of your design.

Note:- I picked up these tips from a Xilinx tutorial video I watched recently. You can watch it here too.
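A sketch of a DSP-friendly description is a registered multiplier with a synchronous reset, as below. The entity name and the 18-bit operand width are assumptions for illustration. Whether the tool actually places it in a DSP slice depends on the device family and synthesis settings.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity mult_sync is
port( Clk : in  std_logic;
      rst : in  std_logic;  -- synchronous reset
      a   : in  signed(17 downto 0);
      b   : in  signed(17 downto 0);
      p   : out signed(35 downto 0) );
end mult_sync;

architecture Behavioral of mult_sync is

    signal p_reg : signed(35 downto 0) := (others => '0');

begin

    p <= p_reg;

    process(Clk)
    begin
        if rising_edge(Clk) then
            -- The reset is checked inside the clocked branch,
            -- so it takes effect only at a rising clock edge.
            if rst = '1' then
                p_reg <= (others => '0');
            else
                p_reg <= a * b;
            end if;
        end if;
    end process;

end Behavioral;
```

If rst were added to the sensitivity list and tested before rising_edge(Clk), the reset would become asynchronous, which is the style this tip warns against.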

Tuesday, February 15, 2011

How to stop using "buffer" ports in VHDL?

Buffer ports are used when a particular port needs to be both read and written inside the design. This mode is different from inout mode: the source of a buffer port can only be internal. For example, if you need a signal to be an output, but also want to read it inside the design, you can declare it with mode buffer.

But buffer ports are not recommended by Xilinx, who advise reducing their use wherever possible. According to Xilinx, buffers may cause problems during synthesis. If a signal is used internally and as an output port, it must be declared as a buffer at every level of your hierarchical design. So let me show how to reduce buffer usage with an example.

The following code uses a buffer. I am not going through the functionality of the code since it is very simple.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity with_buffer is
port( A   : in unsigned(3 downto 0);
      B   : in unsigned(3 downto 0);
      Clk : in std_logic;
      C   : buffer unsigned(3 downto 0) );
end with_buffer;

architecture BEHAVIORAL of with_buffer is

begin

    process(Clk)
    begin
        if ( rising_edge(Clk) ) then
            C <= A + B + C;
        end if;
    end process;

end BEHAVIORAL;

As you can see, the signal C is read back in the addition on every clock cycle, and it is also an output of the module. So we declared it as a buffer. Another way to code the same functionality without a buffer is given below:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity without_buffer is
port( A   : in unsigned(3 downto 0);
      B   : in unsigned(3 downto 0);
      Clk : in std_logic;
      C   : out unsigned(3 downto 0) );
end without_buffer;

architecture BEHAVIORAL of without_buffer is

    --intermediate signal to avoid the use of buffers.
    signal C_dummy : unsigned(3 downto 0);

begin

    C <= C_dummy; --Assign the intermediate signal to output port.

    process(Clk)
    begin
        if ( rising_edge(Clk) ) then
            C_dummy <= A + B + C_dummy; --Use the intermediate signal in actual calculation.
        end if;
    end process;

end BEHAVIORAL;

What I have done is use an intermediate (dummy) signal inside the process statement. The value that was previously read from C is now read from this signal, named C_dummy. Outside the process, we assign the value of C_dummy to the output port C. This is how we reduce buffer usage in VHDL. Avoiding buffers is particularly useful in hierarchical designs.

Note:- Both versions of the code were synthesised successfully using Xilinx Webpack 12.1. The results may or may not vary for Altera FPGAs.

Sunday, January 23, 2011

Block and distributed RAM's on Xilinx FPGA's

Distributed RAMs:

The configurable logic blocks (CLBs) in most Xilinx FPGAs contain small single-port or dual-port RAMs. This RAM is spread out over many LUTs across the FPGA rather than concentrated in a single block, and so it is called "distributed RAM". A look-up table on a Xilinx FPGA can be configured as a 16x1-bit RAM, ROM, LUT or 16-bit shift register. This multiple functionality is not possible with Altera FPGAs.

In the Spartan-3 series, each CLB contains up to 64 bits of single-port RAM or 32 bits of dual-port RAM. As the size indicates, a single CLB may not be enough to implement a large memory. Also, most of these small RAMs have 1-bit wide inputs and outputs. To implement larger and wider memories, you can connect several distributed RAMs in parallel. Fortunately you need not know how these things are done, because the Xilinx synthesis tool will infer what you want from your VHDL/Verilog code and do all this for you automatically.

Block RAMs:

A block RAM is a dedicated two-port memory (it cannot be used to implement other functions like digital logic) containing several kilobits of RAM. Depending on how advanced your FPGA is, there may be several of them. For example, Spartan-3 devices have a total block RAM ranging from 72 Kbits to 1872 Kbits, while Spartan-6 devices have up to 4824 Kbits of block RAM in total.

Difference between distributed and block RAMs:

As you can see from the definition of distributed RAM, a large RAM is implemented using a parallel array of a large number of small elements. This makes distributed RAM ideal for small memories. But for large memories, it may cause extra wiring delays.
Block RAMs, on the other hand, are fixed RAM modules which come in sizes of 9 Kbits or 18 Kbits. If you implement a small RAM with a block RAM, the rest of the space in that block is wasted.
So use block RAM for large memories and distributed RAM for small memories or FIFOs.
Another notable difference is how they are operated. In both, the WRITE operation is synchronous (data is written to the RAM only at the rising edge of the clock). But for the READ operation, distributed RAM is asynchronous (data is read from memory as soon as the address is given, without waiting for a clock edge) while block RAM is synchronous.
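The asynchronous read style can be sketched as below: the write is inside the clocked process, while the read is a plain concurrent assignment. The entity name and the 16 x 8 size are hypothetical. A description like this is a natural candidate for distributed RAM, since block RAM cannot provide the unclocked read.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity dist_ram_sketch is
port( Clk     : in  std_logic;
      we      : in  std_logic;
      address : in  unsigned(3 downto 0);
      data_i  : in  std_logic_vector(7 downto 0);
      data_o  : out std_logic_vector(7 downto 0) );
end dist_ram_sketch;

architecture Behavioral of dist_ram_sketch is

    type ram_t is array (0 to 15) of std_logic_vector(7 downto 0);
    signal ram : ram_t := (others => (others => '0'));

begin

    -- Synchronous write: data is stored at the rising clock edge.
    process(Clk)
    begin
        if rising_edge(Clk) then
            if we = '1' then
                ram(to_integer(address)) <= data_i;
            end if;
        end if;
    end process;

    -- Asynchronous read: the output follows the address directly.
    data_o <= ram(to_integer(address));

end Behavioral;
```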

How to tell XST which type of RAM you want to use?

When you declare a RAM in your code, XST (Xilinx Synthesis Technology) may implement it as either block RAM or distributed RAM. But if you want, you can force the implementation to use block RAM or distributed RAM resources. This is done using the ram_style constraint. See the following code to understand how it is done:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ram_example is
port (Clk     : in std_logic;
      --address is constrained to the RAM depth, so that no
      --out-of-range index can occur.
      address : in integer range 0 to 255;
      we      : in std_logic;
      data_i  : in std_logic_vector(7 downto 0);
      data_o  : out std_logic_vector(7 downto 0)
     );
end ram_example;

architecture Behavioral of ram_example is

    --Declaration of type and signal of a 256 element RAM
    --with each element being 8 bit wide.
    type ram_t is array (0 to 255) of std_logic_vector(7 downto 0);
    signal ram : ram_t := (others => (others => '0'));

begin

    --process for read and write operation.
    PROCESS(Clk)
    BEGIN
        if(rising_edge(Clk)) then
            if(we='1') then
                ram(address) <= data_i;
            end if;
            data_o <= ram(address);
        end if;
    END PROCESS;

end Behavioral;

The above code declares and defines a single-port RAM, including how the read and write operations are performed. When we synthesize this design, XST uses block RAM resources by default for implementing the memory. In certain cases you may want to change this. For instance, if you want the memory to be implemented using distributed RAM, add the following two lines before the begin statement in the architecture section:

attribute ram_style: string;

attribute ram_style of ram : signal is "distributed";

Here ram is the signal name. By changing the word distributed to block, we can force XST to use block RAM resources. The default value of the ram_style attribute is auto.

Note:- The code was synthesized successfully using Xilinx Webpack version 12.1. The results may vary if you are using an older version of XST.

Saturday, December 4, 2010

Tips for running a successful simulation in Xilinx ISim.

Though I have given plenty of examples for learning VHDL, I haven't written much about using the software until now. In this article I will cover some basics of running a simulation in Xilinx ISim, and point out some basic mistakes people make when simulating their code in ISim.
For the explanation, I have used an example from one of my earlier posts: Explaining testbench code using a counter design. Let's go step by step. See the images for easier understanding of the steps, and open them in a new browser tab if they are not clear enough.

1) Once the coding is done (both the testbench and the design to be tested), make sure you select the top entity (the testbench code) in the Xilinx window as shown below. Many people just select some other file and click the compile button.
Note the red markings in the image below. The points to be noted are:

Choose View as Simulation.
Select the top entity, i.e. the testbench. If the wrong file is selected for simulation, the waveform window in ISim will be blank.
Double click on Behavioral Check Syntax to compile the design and find any syntax errors.
If the syntax check is successful, double click on Simulate Behavioral Model. If the syntax check reports errors, fix them in your code first.

2) Now ISim will open in a new window with waveforms. Note the toolbar at the bottom. Check the image below to see what each button does. You can also hover your mouse over a button to display its function.

3) By default the signals in the waveforms will mostly be displayed as binary numbers or integers. But you can change this setting. See the image below.

Click on the signal whose display format you want to change. Go to Radix and then select the format. Some of the options available are Binary, Hexadecimal, Octal etc. Choose the display format that suits your code. My code was a counter, so unsigned decimal was the best format in this case.

4) Another useful thing you can do is add internal signals, which are not displayed by default, to the waveform. By default ISim displays only the signals which are declared in the testbench code. But if there are many sub-entities, you may need to see their signals for debugging. See the image below for how to do it.

Go to the Instances and Processes panel on the left side of the ISim window.
Select the instance whose internal signals you want to observe.
All the signals declared in that instance will now be displayed in the panel immediately to the right, under Simulation Objects.
Select the signals you want to display. You can use the Shift and Ctrl keys to select multiple signal names.
Drag and drop the selected signals into the Name column of the waveform window, immediately to the right.
To update the values of these signals, restart the simulation and run it again.

5) You may have noticed that in ISim, all the additional signals you added in step (4) are gone when you close the ISim window. This is a little annoying, since you have to add all the internal signals again. But you need not worry about it. Follow these steps:

Add the required signals to the waveform as described in step 4.
Save the waveform file by pressing Ctrl + S. Give the wave file an appropriate name.
Close ISim and go back to the Xilinx ISE window.
Right click on Simulate Behavioral Model.
Choose the option Process Properties.
A new window will open, as shown in the image below.
Select the check box Use Custom Waveform Configuration File.
Under Custom Waveform Configuration File, choose the waveform file you just saved.

That's it for now. I hope this explains things better. Thanks.

Friday, August 13, 2010

FPGA: Tips to Reduce Power Consumption in RAM's

I want to discuss some points which will be helpful for reducing the power consumed in an FPGA. I will focus particularly on the power dissipated by the RAMs. These points are selected from the Xilinx white paper on Virtex-5 system power design considerations, but I will note down only the points which apply to any Xilinx FPGA.

Types of power consumption in FPGA:

There are two primary types of power consumption in FPGAs: static and dynamic.

Static power is consumed due to transistor leakage, while dynamic power is consumed by toggling nodes as a function of voltage, frequency, and capacitance. The leakage current increases with the speed of the process, the operating voltage, and the junction (or die) temperature.

So static power increases from Virtex 4 to Virtex 5. On the other hand, dynamic power reduces from Virtex 4 (a 90 nm device) to Virtex 5 (a 65 nm device). This is because dynamic power is proportional to the capacitance (this includes the transistor parasitic capacitance and the metal interconnect capacitance) and to the square of the operating voltage. From Virtex 4 to Virtex 5, both of these parameters decrease, and so we get around a 40% reduction in dynamic power.

You can get more details from the white paper mentioned above.
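The scaling argument can be written out as a short worked equation. It uses the usual first-order CMOS model. The activity factor α and the core voltages of 1.2 V (Virtex 4) and 1.0 V (Virtex 5) are assumptions I have added for illustration, so check them against the data sheets.

```latex
P_{dyn} = \alpha \, C \, V^{2} f
\qquad\Longrightarrow\qquad
\frac{P_{dyn}^{V5}}{P_{dyn}^{V4}}
  = \frac{C^{V5}}{C^{V4}} \left( \frac{1.0}{1.2} \right)^{2}
  \approx 0.69 \, \frac{C^{V5}}{C^{V4}}
```

This holds at the same clock frequency and switching activity. Under these assumptions the voltage drop alone accounts for roughly a 30% reduction, and a further capacitance reduction of around 15% would bring the total close to the 40% figure.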

Tips to reduce power consumption:

Xilinx has given some tips for reducing power consumption by designing RAMs intelligently.

1. Choose the right RAM primitive for your design.

When choosing a RAM organization within the target architecture, the width, depth and functionality must be considered. Choosing the right memory organization lets the design use the most power-efficient resource.

2. Ensure that the block RAM is only enabled when data is needed from it.

This is because the power requirement of a block RAM is directly proportional to the amount of time it is enabled. Normally, for ease of coding, the enable signal is always "ON". But for power-sensitive applications, take some extra effort to make use of the enable signal of the RAM.

Another tip regarding the enable signal is explained in the following example. Say you want a 2k x 8 bit RAM in your design. Build it from four 512 x 8 bit RAMs, with a separate enable signal for each RAM. This needs some extra logic, but at any time only one RAM will be ON, so we can save around 75% of the RAM power.
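The banked arrangement can be sketched as below, with the two upper address bits selecting which of the four 512 x 8 banks is enabled. The entity name, the port names and the generate-based structure are my own illustration, not code from the white paper. Whether each bank is inferred as a separate block RAM with its own enable should be checked in the synthesis report.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity banked_ram is
port( Clk     : in  std_logic;
      en      : in  std_logic;
      we      : in  std_logic;
      address : in  unsigned(10 downto 0);  -- 2k locations
      data_i  : in  std_logic_vector(7 downto 0);
      data_o  : out std_logic_vector(7 downto 0) );
end banked_ram;

architecture Behavioral of banked_ram is

    type bank_t is array (0 to 511) of std_logic_vector(7 downto 0);
    type data_array_t is array (0 to 3) of std_logic_vector(7 downto 0);

    signal bank_q       : data_array_t;            -- read data of each bank
    signal bank_sel_reg : integer range 0 to 3 := 0;

begin

    banks : for i in 0 to 3 generate
        signal ram : bank_t := (others => (others => '0'));
    begin
        process(Clk)
        begin
            if rising_edge(Clk) then
                -- Bank i is active only when the upper address bits select it.
                if en = '1' and to_integer(address(10 downto 9)) = i then
                    if we = '1' then
                        ram(to_integer(address(8 downto 0))) <= data_i;
                    end if;
                    bank_q(i) <= ram(to_integer(address(8 downto 0)));
                end if;
            end if;
        end process;
    end generate banks;

    -- Remember which bank was accessed, so that the output multiplexer
    -- stays aligned with the one-cycle read latency.
    process(Clk)
    begin
        if rising_edge(Clk) then
            if en = '1' then
                bank_sel_reg <= to_integer(address(10 downto 9));
            end if;
        end if;
    end process;

    data_o <= bank_q(bank_sel_reg);

end Behavioral;
```

The address decoding and the output multiplexer are the "extra logic" mentioned above.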

3. Ensure the WRITE_MODE of the RAM is set properly.

If the block RAM contents are never read during a write, the RAM power can be reduced significantly by selecting the NO_CHANGE mode rather than the default WRITE_FIRST mode. This mode can be set easily if you are using the Core Generator GUI to create the RAM module.

Note:- It would be wise to read the Xilinx white paper on power reduction for your particular FPGA before you start coding your design. It will give you useful tips which are device specific.

Friday, March 12, 2010

Exploring Your VHDL Design: Leveraging RTL and Technology Schematics in Xilinx ISE

THIS ARTICLE IS APPLICABLE ONLY FOR XILINX ISE TOOL

This tutorial introduces some cool features of the Xilinx ISE synthesis tool. I assume you're familiar with starting a project in Xilinx ISE and running simulations. If not, watch How to compile and simulate a VHDL code using Xilinx ISE first before continuing with this article.

Once you've successfully done functional simulation of your design, the next step is to test it on a real FPGA. The compiled VHDL code needs to be translated, or mapped, into the resources available in the FPGA. This is known as synthesis and can be done using Xilinx Synthesis Technology (XST), included in the Xilinx ISE software. XST generates Xilinx-specific net-list files known as NGC files. It's important to note that NGC files may vary even for the same VHDL code, depending on the chosen family and device. Each NGC file consists of two parts: logical design data and constraints.

Let's see how we can synthesize our design. Copy the VHDL code for a 4-bit counter with a reset input from here. Add this code to your Xilinx ISE project and click on Synthesize - XST to start the synthesis process. Once the design is synthesized successfully without any errors, a report will be generated with the following information.

View Synthesis Report:

To obtain the overall synthesis report, click on View Synthesis Report. This opens a file in a new tab with the extension .syr. From this file, you can gather the following information:

Synthesis Options Summary
HDL Compilation
Design Hierarchy Analysis
HDL Analysis
HDL Synthesis
    HDL Synthesis Report
Advanced HDL Synthesis
    Advanced HDL Synthesis Report
Low Level Synthesis
Partition Report
Final Report
    Device utilization summary
    Partition Resource Summary
    Timing Summary

Additionally, synthesis results can be visualized by viewing two types of schematics:

View RTL schematic:

Click on View RTL Schematic to see the RTL net-list generated by XST. The file opened will have the extension .ngr and is for viewing only. It provides a schematic representation of the pre-optimized design using generic symbols that are independent of the targeted Xilinx device, such as adders, multipliers, counters, AND gates and OR gates. In this case, the .ngr schematic will resemble this:

You'll notice that the schematic doesn't display any device-specific hardware elements. To identify the hardware elements used by the tool to implement this logic, you need to open the technology schematic.

View Technology Schematic:

Click on View Technology Schematic. This opens a schematic in a new tab, generated from the file with the extension .ngc. It provides detailed information on the exact elements utilized in the FPGA chip.

Refer to the figure below:

Looking at the diagram above, you can see that your design is implemented using one 4-input LUT (Lookup Table), one 3-input LUT, one 2-input LUT, and 4 FDCs (D flip-flops with asynchronous clear input). Additionally, other elements utilized include input buffers, output buffers, inverters, and so on.

Next, double-click on any of the LUTs (let's say LUT2). This opens a new window, displaying the following details:

The schematic of the gate elements employed within that specific LUT.
The Truth Table illustrating the function implemented by the LUT.
The Karnaugh map corresponding to the truth table.

These schematics are often useful for understanding how your code translates into hardware. For those who want to dive deeper into the field of FPGAs, this feature is a blessing. I remember feeling thrilled when I first encountered these schematics.
