VHDL coding tips and tricks: multipliers
Showing posts with label multipliers. Show all posts
Showing posts with label multipliers. Show all posts

Wednesday, October 25, 2017

How to create a Floating Point IP using CORE Generator on Xilinx ISE

As you learn VHDL, soon or later, you will do projects which require you to do operations on floating point(FP) numbers. In most of the programming languages, dealing with real numbers is as easy as dealing with integers. But not so in VHDL or Verilog.

If you want the design to be synthesisable, then the real numbers has to be stored in floating or fixed point format in hardware. Arithmetic operations on these numbers aren't so easy, if you have to write the code from scratch.

Fortunately, tools like Xilinx/Altera have IP's, which deals with these numbers. These IP's can be easily customized too. Xilinx grouped all these IP's under the CORE Generator system. In this post, I want to show you how to create a simple FP multiplier and how to simulate it.

Let's go step by step.

1) Create a new project in Xilinx ISE.

2) Right click on the design sources window, in ISE and click on "New Source".

3) Next select source type as IP(CORE Generator & Architecture Wizard). Enter the file name, and change the file location if needed. Click Next.

4) A new window will open up saying creating selector for specified device. After some moments, a window will show all the IP's available in coregen.
Click on + sign of Math Functions. 
Then click on + sign of Floating Point.
Select the normal Floating-Point IP and click Next.

5) Upon this, a window will be opened where you can customize the selected IP.  As you can see, the left side will show the IP symbol with input and output signals, and on right side, we have the parameters of IP which can be customized. Select the settings as shown below and then click Generate at the end.




 


So what kind of IP have we created here? Its a FP multiplier with 2 inputs and one output of single precision FP numbers. There is also a Clock input and RDY (Ready) output. A high on RDY signal shows that the current output is valid. 

6) Coregen will generate all the files which are required for simulating or synthesizing the IP in your project. Upon completion, the CORE Generator creates and adds to the project, a file called component_name.xco. This is a file that records all the customization parameters used to create the core and the project options in effect when the core was generated. Double clicking on this file, will open the customization window, and you will be able to edit the IP later on if needed.

7) When the View mode is Implementation, select the .xco file, and then you can see various options under the Design tab down. Double click on the View HDL Instantiation Template. A file named component_name.vho will open, which shows the component declaration. This is useful, if we want to instantiate the IP later on.


8) Now we want to test this IP. For that purpose, I have written a testbench. Create a New Source in the same project, and copy and paste the following code in there. You can name the file as tb.

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY tb IS
END tb;

ARCHITECTURE behavior OF tb IS 

    -- Component Declaration for the Unit Under Test (UUT) 
    COMPONENT ip_test
    PORT(
         a : IN  std_logic_vector(31 downto 0);
         b : IN  std_logic_vector(31 downto 0);
         clk : IN  std_logic;
         result : OUT  std_logic_vector(31 downto 0);
         rdy : OUT  std_logic
        );
    END COMPONENT;
   
   --Inputs
   signal a : std_logic_vector(31 downto 0) := (others => '0');
   signal b : std_logic_vector(31 downto 0) := (others => '0');
   signal clk : std_logic := '0';
    --Outputs
   signal result : std_logic_vector(31 downto 0);
   signal rdy : std_logic;
   -- Clock period definitions
   constant clk_period : time := 10 ns;

BEGIN

    -- Instantiate the Unit Under Test (UUT)
   uut: ip_test PORT MAP (
          a => a,
          b => b,
          clk => clk,
          result => result,
          rdy => rdy
        );

   -- Clock process definitions
   clk_process :process
   begin
        clk <= '0';
        wait for clk_period/2;
        clk <= '1';
        wait for clk_period/2;
   end process;


   -- Stimulus process
   stim_proc: process
   begin        
      a <= x"4148F5C3";  --12.56
        b <= x"42C80000";  --100.0
        --result should be 1256 = x"449D0000"
      wait;
   end process;

END;

9) Double click on Simulate Behavioral Model. The ISIM window will open with the simulation waveform. The waveform should look like this:


Analysis:

The inputs applied were 12.56 and 100.0. These needed to be converted to FP representation before. I used an online tool for this. The result was also confirmed using the same tool.

Have you noticed the 6 clock cycle delay between the input and output? This is called latency of the design. If you remember, while creating the IP using coregen, we could have changed this value on Page 3. A lower latency will reduce the maximum clock frequency of the design and might also increase the usage of resources. Think carefully before changing the default value.

The tool used for this post is Xilinx ISE 14.6 and ISIM for simulation. 

Sunday, August 4, 2013

VHDL code for Wallace tree multiplication

A Wallace tree is an efficient hardware architecture for multiplying two integers. Compared to a normal multiplier Wallace tree multiplier is much faster.

In this post I have written the vhdl code for a 4 bit Wallace tree multiplier. The code uses three components.
a) half_adder  - Typical half adder module with 2 input bits and 2 ouputs - sum and carry.
b) full_adder -  Typical full adder module with 3 input bits and 2 ouputs - sum and carry.
c) wallace4 -  4 bit wallace multiplier which uses half_adder and full_adder as components.

Half_adder code:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity half_adder is
    Port ( a : in  STD_LOGIC;
           b : in  STD_LOGIC;
           sum : out  STD_LOGIC;
           carry : out  STD_LOGIC);
end half_adder;

architecture Behavioral of half_adder is

begin

sum <= a xor b;
carry <= a and b;

end Behavioral;

Full_adder code:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity full_adder is
    Port ( a : in  STD_LOGIC;
           b : in  STD_LOGIC;
           c : in  STD_LOGIC;
           sum : out  STD_LOGIC;
           carry : out  STD_LOGIC);
end full_adder;

architecture Behavioral of full_adder is

begin

sum <= (xor b xor c);
carry <= (and b) xor (and (xor b));

end Behavioral;

Wallace4 code:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity wallace4 is
    Port ( A : in  STD_LOGIC_VECTOR (3 downto 0);
           B : in  STD_LOGIC_VECTOR (3 downto 0);
           prod : out  STD_LOGIC_VECTOR (7 downto 0));
end wallace4;

architecture Behavioral of wallace4 is

component full_adder is
    Port ( a : in  STD_LOGIC;
           b : in  STD_LOGIC;
           c : in  STD_LOGIC;
           sum : out  STD_LOGIC;
           carry : out  STD_LOGIC);
end component;

component half_adder is
    Port ( a : in  STD_LOGIC;
           b : in  STD_LOGIC;
           sum : out  STD_LOGIC;
           carry : out  STD_LOGIC);
end component;

signal s11,s12,s13,s14,s15,s22,s23,s24,s25,s26,s32,s33,s34,s35,s36,s37 : std_logic;
signal c11,c12,c13,c14,c15,c22,c23,c24,c25,c26,c32,c33,c34,c35,c36,c37 : std_logic;
signal p0,p1,p2,p3 : std_logic_vector(6 downto 0);

begin

process(A,B)
begin
    for i in 0 to 3 loop
        p0(i) <= A(i) and B(0);
        p1(i) <= A(i) and B(1);
        p2(i) <= A(i) and B(2);
        p3(i) <= A(i) and B(3);
    end loop;       
end process;
    
prod(0) <= p0(0);
prod(1) <= s11;
prod(2) <= s22;
prod(3) <= s32;
prod(4) <= s34;
prod(5) <= s35;
prod(6) <= s36;
prod(7) <= s37;

--first stage
ha11 : half_adder port map(p0(1),p1(0),s11,c11);
fa12 : full_adder port map(p0(2),p1(1),p2(0),s12,c12);
fa13 : full_adder port map(p0(3),p1(2),p2(1),s13,c13);
fa14 : full_adder port map(p1(3),p2(2),p3(1),s14,c14);
ha15 : half_adder port map(p2(3),p3(2),s15,c15);

--second stage
ha22 : half_adder port map(c11,s12,s22,c22);
fa23 : full_adder port map(p3(0),c12,s13,s23,c23);
fa24 : full_adder port map(c13,c32,s14,s24,c24);
fa25 : full_adder port map(c14,c24,s15,s25,c25);
fa26 : full_adder port map(c15,c25,p3(3),s26,c26);

--third stage
ha32 : half_adder port map(c22,s23,s32,c32);
ha34 : half_adder port map(c23,s24,s34,c34);
ha35 : half_adder port map(c34,s25,s35,c35);
ha36 : half_adder port map(c35,s26,s36,c36);
ha37 : half_adder port map(c36,c26,s37,c37);

end Behavioral;


Note:- The code is synthesisable and works in simulation. Its been tested in Xilinx ISE 13.1, but should work with other tools as well.