VHDL coding tips and tricks: Block and distributed RAMs on Xilinx FPGAs

Sunday, January 23, 2011

Distributed RAMs:

The configurable logic blocks (CLBs) in most Xilinx FPGAs contain small single-port or dual-port RAMs. This RAM is distributed throughout the FPGA (it is spread out over many LUTs) rather than concentrated in a single block, so it is called "distributed RAM". A lookup table (LUT) on a Xilinx FPGA with 4-input LUTs, such as Spartan-3, can be configured as a 16x1-bit RAM, a ROM, a logic function or a 16-bit shift register. This multiple functionality is not possible with Altera FPGAs.
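As an illustration of the shift-register mode, here is a minimal sketch of a 16-bit serial shift register. It assumes a Spartan-3-class device with 4-input LUTs, and the entity and port names (srl16_example, ce, d_i, q_o) are hypothetical. Because the register has no reset, XST can usually pack all 16 stages into one LUT configured as a shift register instead of using 16 flip-flops, although the exact mapping depends on the tool version and settings.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical example: 16-bit serial-in, serial-out shift register.
-- No reset is used, so the synthesizer is free to map it onto a single
-- LUT configured as a shift register rather than 16 flip-flops.
entity srl16_example is
    port (
        Clk : in  std_logic;
        ce  : in  std_logic;  -- clock enable
        d_i : in  std_logic;  -- serial data in
        q_o : out std_logic   -- output of the last (16th) stage
    );
end srl16_example;

architecture Behavioral of srl16_example is
    signal shreg : std_logic_vector(15 downto 0) := (others => '0');
begin
    process(Clk)
    begin
        if rising_edge(Clk) then
            if ce = '1' then
                -- shift by one position, inserting the new bit at index 0
                shreg <= shreg(14 downto 0) & d_i;
            end if;
        end if;
    end process;

    q_o <= shreg(15);
end Behavioral;
```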

In the Spartan-3 series, each CLB contains up to 64 bits of single-port RAM or 32 bits of dual-port RAM. As the size suggests, a single CLB may not be enough to implement a large memory. Also, most of these small RAMs are only 1 bit wide at their input and output. To implement larger and wider memories, you can connect several distributed RAMs in parallel. Fortunately, you need not know how this is done, because the Xilinx synthesis tool infers what you want from your VHDL or Verilog code and does all of this for you automatically.
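Here is a minimal sketch of a memory that XST would be expected to build from distributed RAM: 16 words of 8 bits, written on the clock edge and read asynchronously. The entity and port names are hypothetical. Each of the eight bit columns is a 16x1 memory, so the expected result is eight LUT RAMs working in parallel, although the actual mapping is decided by the synthesizer.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical 16 x 8-bit RAM. Each bit column is a 16x1 memory, so it is
-- expected to map onto eight 16x1 LUT RAMs placed side by side.
entity dist_ram_16x8 is
    port (
        Clk    : in  std_logic;
        we     : in  std_logic;
        addr   : in  std_logic_vector(3 downto 0);
        data_i : in  std_logic_vector(7 downto 0);
        data_o : out std_logic_vector(7 downto 0)
    );
end dist_ram_16x8;

architecture Behavioral of dist_ram_16x8 is
    type ram_t is array (0 to 15) of std_logic_vector(7 downto 0);
    signal ram : ram_t := (others => (others => '0'));
begin
    -- Synchronous write
    process(Clk)
    begin
        if rising_edge(Clk) then
            if we = '1' then
                ram(to_integer(unsigned(addr))) <= data_i;
            end if;
        end if;
    end process;

    -- Asynchronous read: the output follows addr without waiting for a
    -- clock edge, which a block RAM read port cannot do.
    data_o <= ram(to_integer(unsigned(addr)));
end Behavioral;
```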

Block RAMs:

A block RAM is a dedicated two-port memory (it cannot be used to implement other functions such as digital logic) containing several kilobits of RAM. Depending on how advanced your FPGA is, there may be several of them. For example, the total block RAM in Spartan-3 devices ranges from 72 Kbits to 1872 Kbits, while Spartan-6 devices have up to 4824 Kbits of block RAM.

Differences between distributed and block RAMs:

As the definition suggests, a large distributed RAM is built from a parallel array of many small elements. This makes distributed RAM ideal for small memories, but for large memories the routing between all those elements can add extra wiring delay.
Block RAMs, on the other hand, are fixed RAM modules that come in sizes of 9 Kbits or 18 Kbits. If you implement a small RAM in a block RAM, the rest of that block's space is wasted.
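A rough worked example of that wastage, using the 256 x 8 RAM from the code later in this post and assuming one 18-Kbit block RAM (18 x 1024 bits) and the Spartan-3 figure of 64 bits of single-port distributed RAM per CLB given above:

```latex
\begin{aligned}
\text{RAM size} &= 256 \times 8 = 2048 \text{ bits} \\
\text{block RAM utilization} &= \frac{2048}{18 \times 1024} = \frac{2048}{18432} \approx 11\% \\
\text{distributed RAM} &= \frac{2048}{64} = 32 \text{ CLBs (at least, before any multiplexing logic)}
\end{aligned}
```

So roughly 89% of the block would be left unused, while the same memory in distributed RAM would occupy at least 32 CLBs.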
So use block RAM for large memories, and distributed RAM for small memories or FIFOs.
Another notable difference is how they are operated. In both, the WRITE operation is synchronous (data is written to the RAM only at the rising edge of the clock). For the READ operation, however, distributed RAM is asynchronous (data is read from memory as soon as the address is given, without waiting for a clock edge), whereas block RAM is synchronous.
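To contrast with the asynchronous read of distributed RAM, here is a minimal sketch of a memory written in the style a block RAM expects: a simple dual-port RAM in which port A writes and port B reads, both on the rising clock edge. The entity and port names and the 1024 x 16 size are assumptions, chosen so the memory fits in one 18-Kbit block; whether XST actually selects block RAM still depends on the device and the ram_style setting.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical simple dual-port RAM: port A writes, port B reads.
-- Both ports are clocked, which matches how a block RAM operates.
entity bram_sdp_example is
    port (
        Clk    : in  std_logic;
        we_a   : in  std_logic;
        addr_a : in  std_logic_vector(9 downto 0);
        din_a  : in  std_logic_vector(15 downto 0);
        addr_b : in  std_logic_vector(9 downto 0);
        dout_b : out std_logic_vector(15 downto 0)
    );
end bram_sdp_example;

architecture Behavioral of bram_sdp_example is
    -- 1024 x 16 bits = 16 Kbits, small enough for one 18-Kbit block RAM
    type ram_t is array (0 to 1023) of std_logic_vector(15 downto 0);
    signal ram : ram_t := (others => (others => '0'));
begin
    process(Clk)
    begin
        if rising_edge(Clk) then
            -- Synchronous write on port A
            if we_a = '1' then
                ram(to_integer(unsigned(addr_a))) <= din_a;
            end if;
            -- Synchronous read on port B: dout_b changes only at the rising
            -- edge, showing the word at the address sampled on that edge.
            dout_b <= ram(to_integer(unsigned(addr_b)));
        end if;
    end process;
end Behavioral;
```

Because dout_b is assigned inside the clocked process, a new value on addr_b only reaches dout_b after the next rising edge. Moving that assignment outside the process would make the read asynchronous, and the memory could then only be mapped to distributed RAM.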

How do you tell XST which type of RAM to use?

When you declare a RAM in your code, XST (Xilinx Synthesis Technology, the Xilinx synthesis tool) may implement it as either block RAM or distributed RAM. If you want, you can force the implementation style to use block RAM or distributed RAM resources. This is done using the ram_style constraint. See the following code to understand how this is done:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ram_example is
    port (
        Clk     : in  std_logic;
        address : in  integer range 0 to 255;  -- constrained to the RAM depth
        we      : in  std_logic;
        data_i  : in  std_logic_vector(7 downto 0);
        data_o  : out std_logic_vector(7 downto 0)
    );
end ram_example;

architecture Behavioral of ram_example is

    -- Declaration of type and signal of a 256-element RAM
    -- with each element being 8 bits wide.
    type ram_t is array (0 to 255) of std_logic_vector(7 downto 0);
    signal ram : ram_t := (others => (others => '0'));

begin

    -- Process for the read and write operations. Both happen on the
    -- rising clock edge, so the read is synchronous.
    process(Clk)
    begin
        if rising_edge(Clk) then
            if we = '1' then
                ram(address) <= data_i;
            end if;
            -- Returns the word stored before this edge's write (read-first)
            data_o <= ram(address);
        end if;
    end process;

end Behavioral;

The code above declares and defines a single-port RAM, and the process specifies how the read and write operations are implemented. When we synthesize this design, XST uses block RAM resources by default to implement the memory. In certain cases you may want to change this. For instance, if you want the memory to be implemented using distributed RAM, add the following two lines before the begin statement of the architecture:

attribute ram_style: string;

attribute ram_style of ram : signal is "distributed";

Here ram is the signal name. By changing the word "distributed" to "block", we can force XST to use block RAM resources instead. The default value of the ram_style attribute is "auto", which lets XST choose.
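For reference, here is a sketch of the complete example with the two attribute lines placed in the architecture's declarative region, between the signal declaration and begin. It assumes XST as the synthesis tool; other synthesis tools may expect a different attribute name. The address port is declared as integer range 0 to 255 so that it matches the 256-word depth.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ram_example is
    port (
        Clk     : in  std_logic;
        address : in  integer range 0 to 255;
        we      : in  std_logic;
        data_i  : in  std_logic_vector(7 downto 0);
        data_o  : out std_logic_vector(7 downto 0)
    );
end ram_example;

architecture Behavioral of ram_example is
    type ram_t is array (0 to 255) of std_logic_vector(7 downto 0);
    signal ram : ram_t := (others => (others => '0'));

    -- XST attribute: "distributed" forces LUT RAM, "block" forces block RAM.
    -- Leaving it out is equivalent to "auto" (XST decides).
    attribute ram_style : string;
    attribute ram_style of ram : signal is "distributed";
begin
    process(Clk)
    begin
        if rising_edge(Clk) then
            if we = '1' then
                ram(address) <= data_i;
            end if;
            -- With distributed RAM, this registered read uses flip-flops
            -- on the LUT RAM output.
            data_o <= ram(address);
        end if;
    end process;
end Behavioral;
```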

Note: The code was synthesized successfully using Xilinx ISE WebPACK version 12.1. Results may vary if you are using an older version of XST.

13 comments:

Unknown, February 17, 2011 at 4:12 AM
Hi vhdlguru

nice article on RAM inference for Xilinx FPGAs!

I have built (around 2007) a block RAM generator for Xilinx FPGAs, termed mprfgen (multi-port register file generator). This is useful when you have to map a multi-port memory to multiple block RAMs.

This is a different problem of having just to map a large memory with a few ports (recent ISEs handle this properly).

You can grab it from here:
http://www.nkavvadias.co.cc/misc/mprfgen.zip

Best regards,
Nikolaos Kavvadias
Unknown, September 5, 2012 at 9:59 PM
Hi ,,

Wondering if you could give me some idea on how to implement a Built in Test in VHDL?
Is it just about read/write to mem locs and comparing the values ?

Thanks
Venkat
vishnu, December 11, 2013 at 9:02 AM
plz can u provide vhdl code on ram read and write
blogzworld, February 3, 2014 at 3:35 PM
Is using block ram an efficient way to store registers, like control and data storage registers (16-bit).
Unknown, March 5, 2014 at 8:40 PM
hi,
I have a small project on implementing my projects on a fpga using vhdl or Verilog.
can anyone give me ideas for some implementation of flashing custom kernel and ROMs on android phones ? The main idea of flashing custom kernel or custom ROMS is for flashing better kernels made to have a better power management.
So, if you have any idea based on all these to be done on FPGA, please let me know ( mail or just post here )
Thank you in advance!
Unknown, August 7, 2014 at 11:05 AM
Sir, I don't understand how to reduce the size of code written in an FPGA language. Please help me, because my simulator does not support the 200MB code.
Rehan Mollick, February 24, 2015 at 4:38 PM
sir plz give me the vhdl code of adddress block for 128 point fft processor design
Unknown, October 20, 2015 at 2:39 PM
guys please rply to this ASAP...
I have defined a module named processing element, and in it there is a BRAM which has some initial data. In the top module this processing element is generated 64 times, with the same data for every single processing element. But in my design I want every single processing element to have a BRAM with different data. Is there any way to give different data to the BRAM every time the processing element is instantiated?