VHDL coding tips and tricks: core generator
Showing posts with label core generator.

Wednesday, October 25, 2017

How to create a Floating Point IP using CORE Generator on Xilinx ISE

As you learn VHDL, sooner or later you will work on projects that require operations on floating point (FP) numbers. In most programming languages, dealing with real numbers is as easy as dealing with integers. Not so in VHDL or Verilog.

If you want the design to be synthesisable, the real numbers have to be stored in a floating or fixed point format in hardware. Arithmetic operations on these numbers are not easy to implement if you have to write the code from scratch.

Fortunately, FPGA vendors such as Xilinx and Altera provide IP cores that handle these numbers, and these cores are easy to customize. Xilinx groups such IP cores under the CORE Generator system. In this post, I want to show you how to create a simple FP multiplier and how to simulate it.

Let's go step by step.

1) Create a new project in Xilinx ISE.

2) In ISE, right-click in the Design (sources) window and click "New Source".

3) Select IP (CORE Generator & Architecture Wizard) as the source type. Enter the file name, change the file location if needed, and click Next.

4) A window opens saying that it is creating the selector for the specified device. After a few moments, a window shows all the IP cores available in coregen.
Click the + sign next to Math Functions.
Then click the + sign next to Floating Point.
Select the plain Floating-point IP and click Next.

5) A window now opens where you can customize the selected IP. The left side shows the IP symbol with its input and output signals, and the right side shows the IP parameters that can be customized. Select the settings as shown below (a Multiply operation on single precision numbers) and then click Generate on the last page.

So what kind of IP have we created here? It's an FP multiplier with two inputs and one output, all of them single precision FP numbers. There is also a clock input (CLK) and an RDY (ready) output. A high on RDY indicates that the current output is valid.

6) Coregen will generate all the files required for simulating or synthesizing the IP in your project. When it finishes, CORE Generator adds a file called component_name.xco to the project. This file records all the customization parameters used to create the core, along with the project options in effect when the core was generated. Double-clicking this file reopens the customization window, so you can edit the IP later if needed.

7) With the View set to Implementation, select the .xco file; the processes available for it are listed in the lower part of the Design tab. Double-click View HDL Instantiation Template. A file named component_name.vho opens, showing the component declaration and an instantiation template. This is useful when we want to instantiate the IP later on.


8) Now let's test this IP. For that purpose, I have written a testbench. Create a new source in the same project, name it tb, and paste the following code into it. The testbench assumes that the IP was named ip_test in step 3; if you used another name, change the component name accordingly.

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY tb IS
END tb;

ARCHITECTURE behavior OF tb IS 

    -- Component Declaration for the Unit Under Test (UUT) 
    COMPONENT ip_test
    PORT(
         a : IN  std_logic_vector(31 downto 0);
         b : IN  std_logic_vector(31 downto 0);
         clk : IN  std_logic;
         result : OUT  std_logic_vector(31 downto 0);
         rdy : OUT  std_logic
        );
    END COMPONENT;
   
   -- Inputs
   signal a : std_logic_vector(31 downto 0) := (others => '0');
   signal b : std_logic_vector(31 downto 0) := (others => '0');
   signal clk : std_logic := '0';

   -- Outputs
   signal result : std_logic_vector(31 downto 0);
   signal rdy : std_logic;

   -- Clock period definitions
   constant clk_period : time := 10 ns;

BEGIN

    -- Instantiate the Unit Under Test (UUT)
   uut: ip_test PORT MAP (
          a => a,
          b => b,
          clk => clk,
          result => result,
          rdy => rdy
        );

   -- Clock process definitions
   clk_process :process
   begin
        clk <= '0';
        wait for clk_period/2;
        clk <= '1';
        wait for clk_period/2;
   end process;


   -- Stimulus process
   stim_proc: process
   begin
      a <= x"4148F5C3";  -- 12.56
      b <= x"42C80000";  -- 100.0
      -- result should be 1256.0 = x"449D0000"
      wait;
   end process;

END;

9) Switch the View to Simulation, select tb, and double-click Simulate Behavioral Model. The ISim window will open with the simulation waveform, which should look like this:


Analysis:

The inputs applied were 12.56 and 100.0. These first had to be converted to their single precision FP representations (x"4148F5C3" and x"42C80000"). I used an online tool for this, and confirmed the result (x"449D0000" = 1256.0) with the same tool.
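
The conversion can also be checked inside the simulator. The block below is a minimal sketch, assuming normalized numbers only (zero, denormals, infinity and NaN are not handled): the hypothetical function sp_to_real splits a 32-bit single precision word into its sign (bit 31), biased exponent (bits 30 to 23) and fraction (bits 22 to 0), and evaluates (-1)^sign * (1 + fraction/2^23) * 2^(exponent - 127). The entity name fp_decode_tb is also hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fp_decode_tb is
end entity;

architecture sim of fp_decode_tb is

  -- Hypothetical helper: decode a normalized IEEE-754 single precision word.
  -- Zero, denormals, infinity and NaN are NOT handled (assumption).
  function sp_to_real(w : std_logic_vector(31 downto 0)) return real is
    variable exp_field : integer;  -- biased exponent, bits 30..23
    variable frac      : real;     -- fraction, bits 22..0, scaled to [0, 1)
    variable value     : real;
  begin
    exp_field := to_integer(unsigned(w(30 downto 23)));
    frac      := real(to_integer(unsigned(w(22 downto 0)))) / 2.0**23;
    value     := (1.0 + frac) * 2.0**(exp_field - 127);
    if w(31) = '1' then            -- sign bit set: negative number
      value := -value;
    end if;
    return value;
  end function;

begin

  check: process
  begin
    -- operands and expected product used in the testbench of step 8
    assert abs(sp_to_real(x"4148F5C3") - 12.56) < 1.0e-5
      report "x4148F5C3 does not decode to ~12.56" severity error;
    assert abs(sp_to_real(x"42C80000") - 100.0) < 1.0e-5
      report "x42C80000 does not decode to 100.0" severity error;
    assert abs(sp_to_real(x"449D0000") - 1256.0) < 1.0e-5
      report "x449D0000 does not decode to 1256.0" severity error;
    report "sp_to_real checks finished";
    wait;
  end process;

end architecture;
```

For example, x"42C80000" has an exponent field of 133 and a fraction of 0.5625, so it decodes to 1.5625 * 2^6 = 100.0. Note that 12.56 has no exact binary representation, which is why its check uses a small tolerance.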

Have you noticed the 6 clock cycle delay between the input and the output? This is the latency of the design. If you remember, while creating the IP in coregen, we could have changed this value on page 3. A lower latency will reduce the maximum clock frequency of the design and might also increase resource usage. Think carefully before changing the default value.
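
The delay comes from pipelining: the core registers intermediate results, and each register stage adds one clock cycle. The block below only illustrates that idea and is not a model of the Xilinx core: the hypothetical entity valid_pipe passes a one-bit "new data" flag (nd, a hypothetical port) through LATENCY registers to rdy, and the hypothetical testbench latency_tb measures the delay the way you would read it off the waveform, from the clock edge after which the input is applied to the clock edge after which rdy goes high.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-in, NOT the Xilinx Floating-Point core:
-- a chain of LATENCY registers carrying a "new data" flag to rdy.
-- Assumes LATENCY >= 2.
entity valid_pipe is
  generic (LATENCY : positive := 6);
  port (
    clk : in  std_logic;
    nd  : in  std_logic;  -- hypothetical "new data" strobe
    rdy : out std_logic
  );
end entity;

architecture rtl of valid_pipe is
  signal stages : std_logic_vector(LATENCY - 1 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      -- move the flag one register further on every rising edge
      stages <= stages(LATENCY - 2 downto 0) & nd;
    end if;
  end process;
  rdy <= stages(LATENCY - 1);  -- the last stage flags a valid output
end architecture;

library ieee;
use ieee.std_logic_1164.all;

entity latency_tb is
end entity;

architecture sim of latency_tb is
  constant CLK_PERIOD : time := 10 ns;
  constant LAT        : positive := 6;
  signal clk, nd, rdy : std_logic := '0';
  signal done         : boolean := false;
begin
  -- free-running clock, stopped once the measurement is done
  clk <= not clk after CLK_PERIOD / 2 when not done else '0';

  uut: entity work.valid_pipe
    generic map (LATENCY => LAT)
    port map (clk => clk, nd => nd, rdy => rdy);

  stim: process
    variable t_in   : time;
    variable cycles : integer;
  begin
    wait until rising_edge(clk);   -- apply the input just after an edge
    nd   <= '1';
    t_in := now;
    wait until rising_edge(clk);   -- the first register samples nd here
    nd <= '0';
    wait until rdy = '1';          -- output flagged as valid
    cycles := (now - t_in) / CLK_PERIOD;
    report "measured latency = " & integer'image(cycles) & " clock cycles";
    assert cycles = LAT report "unexpected latency" severity error;
    done <= true;
    wait;
  end process;
end architecture;
```

With LAT = 6 the report reads "measured latency = 6 clock cycles". Changing LAT changes the measured delay in the same way that the Latency setting changes the delay of the real core.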

The tools used for this post were Xilinx ISE 14.6 and ISim for simulation.

Monday, April 27, 2015

A simple image processing example in VHDL using Xilinx ISE

Unlike in Matlab, where image processing is a simple task, in VHDL even simple tasks can give you a few sleepless nights. But once you know the basic initial steps, it becomes much easier.

Image processing in VHDL is a big topic, and it's impossible to cover all of it in a single post. What I try to do here is explain some of the basics with an example.

An image is almost always a 2D matrix, but handling it as a 2D array inside an FPGA is usually not a good idea: it can use excessive resources and lead to long delays. So we convert the 2D image into a linear 1D array (here, row by row), which can be stored in a RAM or ROM. To get the most efficient memory module, it's recommended to use the Block Memory Generator available in coregen.

In this example, I am going to read the pixels of an image (of size 3*4) stored in a ROM, and store the transpose of the image (of size 4*3) in a RAM.
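
Because the image is stored as a 1D array row by row, pixel (r, c) of the 3*4 image sits at address r*4 + c, and after transposing it becomes pixel (c, r) of the 4*3 image, at address c*3 + r. This is the same expression, col_index*3 + row_index, that the VHDL code in step 4 uses for the RAM address. Below is a minimal simulation-only sketch (the entity name transpose_check is hypothetical) that applies this mapping to the example image and compares the result with the expected transpose.

```vhdl
entity transpose_check is
end entity;

architecture sim of transpose_check is
  constant ROWS : positive := 3;  -- original image: 3 rows x 4 columns
  constant COLS : positive := 4;
  type pixels_t is array (0 to ROWS*COLS - 1) of natural;

  -- original image, row-major: pixel (r, c) is at index r*COLS + c
  constant SRC : pixels_t := (22,  12, 200, 126,
                              127, 128, 129, 255,
                              10,   0,   1,  98);

  -- expected transpose (4 rows x 3 columns), also row-major
  constant EXPECTED : pixels_t := (22, 127, 10,
                                   12, 128,  0,
                                   200, 129,  1,
                                   126, 255, 98);
begin
  process
    variable dst : pixels_t := (others => 0);
  begin
    for r in 0 to ROWS - 1 loop
      for c in 0 to COLS - 1 loop
        -- pixel (r, c) becomes pixel (c, r) of an image whose rows
        -- are ROWS pixels long, so its 1D address is c*ROWS + r
        dst(c*ROWS + r) := SRC(r*COLS + c);
      end loop;
    end loop;
    assert dst = EXPECTED report "transpose mismatch" severity error;
    report "transpose check finished";
    wait;
  end process;
end architecture;
```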

In brief, the steps are:
  1. Create a .coe file with the image pixel data.
  2. Use coregen in Xilinx ISE to create a simple single port ROM of the required size, and load the ROM with the data from step 1.
  3. Use coregen in Xilinx ISE to create a simple single port RAM of the same size as the ROM.
  4. Write the code in which both the RAM and the ROM are instantiated as components, and a process computes the transpose of the image stored in the ROM.
  5. To verify that the RAM contains the correct transposed image, read its contents one by one.

Let's go through the steps in detail now. I have used Xilinx ISE 13.1 for this, with the device xc6slx9-2csg324. These steps might differ slightly in other versions of ISE, but the underlying ideas are the same.

If you have never used coregen, you might want to go through these examples before proceeding.

1. Creating .coe file with image pixels:

Open Notepad (or any text editor) and paste the following text:

memory_initialization_radix=10;
memory_initialization_vector=22,12,200,126,127,128,129,255,10,0,1,98;

Save the file as "bram_data.coe".
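
To make the meaning of the .coe file concrete, here is a minimal behavioral sketch of a ROM holding the same 12 values, in address order (22 at address 0, 98 at address 11). It is a hypothetical stand-in (entity names image1_model and image1_model_tb are made up), not what coregen generates. It assumes a one clock cycle read latency with no extra output registers, and it returns 0 for the unused addresses 12 to 15; both are assumptions about the generated core's configuration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral stand-in for the coregen ROM "image1".
-- Assumptions: one cycle read latency, no output registers,
-- unused addresses 12..15 read as 0.
entity image1_model is
  port (
    clka  : in  std_logic;
    addra : in  std_logic_vector(3 downto 0);
    douta : out std_logic_vector(7 downto 0)
  );
end entity;

architecture behavioral of image1_model is
  type rom_t is array (0 to 15) of natural range 0 to 255;
  -- memory_initialization_vector from bram_data.coe (radix 10)
  constant ROM : rom_t := (22, 12, 200, 126, 127, 128, 129, 255, 10, 0, 1, 98,
                           others => 0);
begin
  process(clka)
  begin
    if rising_edge(clka) then  -- synchronous read: data appears after the edge
      douta <= std_logic_vector(to_unsigned(ROM(to_integer(unsigned(addra))), 8));
    end if;
  end process;
end architecture;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity image1_model_tb is
end entity;

architecture sim of image1_model_tb is
  signal clk  : std_logic := '0';
  signal addr : std_logic_vector(3 downto 0) := (others => '0');
  signal dout : std_logic_vector(7 downto 0);
  signal done : boolean := false;
begin
  clk <= not clk after 5 ns when not done else '0';

  uut: entity work.image1_model
    port map (clka => clk, addra => addr, douta => dout);

  process
  begin
    for i in 0 to 11 loop
      addr <= std_logic_vector(to_unsigned(i, 4));
      wait until rising_edge(clk);   -- the ROM samples the address on this edge
      wait for 1 ns;                 -- douta is valid shortly after the edge
      report "address " & integer'image(i) & " -> " &
             integer'image(to_integer(unsigned(dout)));
    end loop;
    assert to_integer(unsigned(dout)) = 98
      report "last pixel should be 98" severity error;
    done <= true;
    wait;
  end process;
end architecture;
```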

2. Create the ROM module:

Look at the screenshots posted below; they should be self-explanatory. If a page of settings is not shown, leave its settings at their default values.

Click Generate, and coregen will create the necessary files for you.

3. Create the RAM module:

Once again look at the screenshots below.


4. VHDL code:

This code instantiates the ROM and RAM created above and calculates the transpose of the input image. The code also acts as a testbench and reads the data back from the RAM to verify the working of the design.
It is not synthesisable, because I have built testbench functionality into it: the clock is generated inside the design with an "after" delay, and the entity has no ports. But if you remove the testbench part, it is synthesisable.

The code is self-explanatory, with line-by-line comments.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity image_process is
end image_process;

architecture Behavioral of image_process is

COMPONENT image1
  PORT (
    clka : IN STD_LOGIC;
    addra : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
    douta : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)
  );
END COMPONENT;

COMPONENT image2
  PORT (
    clka : IN STD_LOGIC;
    wea : IN STD_LOGIC_VECTOR(0 DOWNTO 0);
    addra : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
    dina : IN STD_LOGIC_VECTOR(7 DOWNTO 0);
    douta : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)
  );
END COMPONENT;

signal done,clk : std_logic := '0';
signal wr_enable : STD_LOGIC_VECTOR(0 DOWNTO 0) := "0";
signal addr_rom,addr_ram : STD_LOGIC_VECTOR(3 DOWNTO 0) := (others => '0');
signal data_rom,data_in_ram,data_out_ram : STD_LOGIC_VECTOR(7 DOWNTO 0) := (others => '0');
signal row_index,col_index : integer := 0;

begin

--the original image of size 3*4 stored here in rom.
--[22,12,200,126,
--127,128,129,255,
--10,0,1,98]
image_rom : image1 port map(Clk,addr_rom,data_rom);
--the transpose of image1, of size 4*3, is stored here in ram.
--[22,127,10,
--12,128,0,
--200,129,1,
--126,255,98]
image_ram : image2 port map(Clk,wr_enable,addr_ram,data_in_ram,data_out_ram);

--generate the clock.
clk <= not clk after 5 ns;


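--transpose the image1 into image2.
--To do this I have to store the pixel at location (a,b) into location (b,a).
--The process runs on the falling edge, so the ROM output (updated on the
--rising edge) has settled by the time it is sampled here.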
process(clk)
begin
    if(falling_edge(clk)) then
        if(done = '0') then
            addr_rom <= addr_rom + "0001"; --start reading each pixel from rom
            
            --row and column index of the image.
            if(col_index = 3) then  --check if the last column has been reached.
                col_index <= 0; --reset it to zero.
                if(row_index = 2) then --check if the last row has been reached.
                    row_index <= 0; --reset it to zero
                    done <= '1'; --the processing is done.
                else    
                    row_index <= row_index + 1; --increment row index.
                end if;
            else
                col_index <= col_index + 1; --increment column index.
            end if;     
            
            wr_enable <= "1"; --write enable for the RAM
            data_in_ram <= data_rom; --store the current read data from rom into ram.
            addr_ram <= conv_std_logic_vector((col_index*3 + row_index),4); --set the address for RAM.
        else
        --this segment reads the transposed image(data written into RAM).
            wr_enable <= "0";  --after processing write enable is disabled
            addr_rom <= "0000"; 
            if(addr_ram = "1011") then --last RAM address (11) reached: wrap around.
                addr_ram <= "0000";
            else
                addr_ram <= addr_ram + 1; --read the next RAM location.
            end if; 
        end if; 
    end if;     
end process;    

end Behavioral;


5. Simulated waveform:

The design was simulated using Xilinx ISim. The waveform should look like the following: