
Sunday, November 29, 2020

Writing a Gate Level VHDL design (and Testbench) from Scratch

    In this video I want to show you how you can take a logic circuit diagram and write the corresponding VHDL code along with its testbench. 




The VHDL code presented in the video is given below:

xor_gate.vhd:


library ieee;
use ieee.std_logic_1164.all;

entity xor_gate is
    port (
        A,B : in std_logic;
        C : out std_logic
    );
end entity;


architecture gate_level of xor_gate is

signal An,Bn,t1,t2 : std_logic := '0';

begin

An <= not A;
Bn <= not B;
t1 <= An and B;
t2 <= Bn and A;

C <= t1 or t2;

end architecture;

tb_xor.vhd:


library ieee;
use ieee.std_logic_1164.all;

entity tb_xor is
end entity;

architecture behav of tb_xor is

component xor_gate is
    port (
        A,B : in std_logic;
        C : out std_logic
    );
end component;

signal A,B,C : std_logic := '0';

begin

UUT : xor_gate port map (A,B,C);

stimulus : process
begin
    A <= '0';
    B <= '0';
    wait for 100 ns;
    A <= '0';
    B <= '1';
    wait for 100 ns;
    A <= '1';
    B <= '0';
    wait for 100 ns;
    A <= '1';
    B <= '1';
    wait;
end process;    

end architecture;
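
    The testbench above only applies the inputs, so the output has to be checked by eye in the waveform. As a minimal sketch of how the same four input patterns could be checked automatically, the variant below compares C against the built-in xor operator. The name tb_xor_check is my own and is not from the video, and the sketch assumes xor_gate has already been compiled into the work library.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical self-checking testbench; the name tb_xor_check is an assumption.
entity tb_xor_check is
end entity;

architecture behav of tb_xor_check is

signal A,B,C : std_logic := '0';

begin

-- Direct entity instantiation: assumes xor_gate is compiled into library "work".
UUT : entity work.xor_gate port map (A => A, B => B, C => C);

stimulus : process
    type vec_array is array (0 to 3) of std_logic_vector(1 downto 0);
    constant patterns : vec_array := ("00", "01", "10", "11");
begin
    for n in patterns'range loop
        A <= patterns(n)(1);
        B <= patterns(n)(0);
        wait for 100 ns;
        -- Compare the gate level output against the built-in xor operator.
        assert C = (A xor B)
            report "xor_gate mismatch for input pattern " & integer'image(n)
            severity error;
    end loop;
    report "tb_xor_check finished";
    wait;
end process;

end architecture;
```

    If the design is correct, the simulator prints only the final note. Any assertion message points to the input pattern that failed.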


The Logic circuit diagram is given below:





Simulation Waveform from Modelsim:






Saturday, November 28, 2020

Synthesizable Matrix Multiplier in VHDL

    A long time ago I posted a simple matrix multiplier which works well in simulation but can't be synthesized. Many people have since asked for a synthesizable version of that code. So here we go.

    The design takes two matrices of 3 by 3 and outputs a matrix of 3 by 3. Each element is stored as unsigned 8 bits. This is not a generic multiplier, but if you watch the video explaining the code, you might be able to extend it to a different sized multiplier. 



    Each matrix has 9 elements, each of which is 8 bits in size. So I am passing the matrix as a 72 bit 1-Dimensional array in the design. The following table shows how the 2-D elements are mapped into the 1-D array.

Row    Column    Bit positions in 1-D array
0      0         7:0
0      1         15:8
0      2         23:16
1      0         31:24
1      1         39:32
1      2         47:40
2      0         55:48
2      1         63:56
2      2         71:64
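
    The table follows one rule: element (row, column) has the index row*3 + column, and it occupies bits index*8+7 down to index*8. As a small sketch of that rule, the testbench below wraps it in a helper function and checks a few entries of the table. The names tb_flat_index, get_elem and M are my own and are only for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical check of the 2-D to 1-D mapping; all names here are assumptions.
entity tb_flat_index is
end entity;

architecture behav of tb_flat_index is

    -- Returns element (row, col) of a 3 by 3 matrix stored as a 72 bit vector.
    function get_elem (flat : unsigned(71 downto 0); row, col : natural) return unsigned is
        constant idx : natural := row*3 + col;
    begin
        return flat(idx*8 + 7 downto idx*8);
    end function;

    -- Elements 1 to 9, with element (0,0) = 1 in bits 7:0 and element (2,2) = 9 in bits 71:64.
    constant M : unsigned(71 downto 0) := X"090807060504030201";

begin

check : process
begin
    assert get_elem(M, 0, 0) = 1 report "wrong element at (0,0)" severity error;
    assert get_elem(M, 0, 1) = 2 report "wrong element at (0,1)" severity error;
    assert get_elem(M, 1, 0) = 4 report "wrong element at (1,0)" severity error;
    assert get_elem(M, 2, 2) = 9 report "wrong element at (2,2)" severity error;
    report "tb_flat_index finished";
    wait;
end process;

end architecture;
```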


Let me share the codes now...

matrix_mult.vhd:


--3 by 3 matrix multiplier. Each element of the matrix is 8 bit wide. 
--Inputs are called A and B and output is called C. 
--Each matrix has 9 elements, each of which is 8 bits wide. So each input is 9*8=72 bits long.
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all; 

entity matrix_mult is
    port (  Clock: in std_logic; 
            reset : in std_logic;   --active high reset
            start : in std_logic;   --A '1' starts the matrix multiplication process.
            A,B : in unsigned(71 downto 0);
            C : out unsigned(71 downto 0);
            done : out std_logic    --a '1' indicates that multiplication is done and result is available at C.
            );
end entity;

architecture Behav of matrix_mult is

type matType is array(0 to 2,0 to 2) of unsigned(7 downto 0);
signal matA, matB, matC : matType := (others => (others => X"00"));
type state_type is (init,do_mult,apply_outputs);
signal state : state_type := init;
signal i,j,k : integer range 0 to 2 := 0;   --range constraint keeps the counters 2 bits wide after synthesis.

begin 

sm : process (Clock,reset)    --process implementing the state machine for multiplying the matrices.
variable temp : unsigned(15 downto 0) := (others => '0');
begin
    if(reset = '1') then
        state <= init;
        i <= 0;
        j <= 0;
        k <= 0;
        done <= '0';
        matA <= (others => (others => X"00"));
        matB <= (others => (others => X"00"));
        matC <= (others => (others => X"00"));
    elsif rising_edge(Clock) then
        case state is
            when init =>    --the matrices which are in a 1-D array are converted to 2-D matrices first.
                if(start = '1') then
                    done <= '0';    --clear the flag left over from a previous run.
                    matC <= (others => (others => X"00"));  --clear the accumulators so that an earlier result is not added in.
                    for i in 0 to 2 loop    --run through the rows
                        for j in 0 to 2 loop    --run through the columns
                            matA(i,j) <= A((i*3+j+1)*8-1 downto (i*3+j)*8);
                            matB(i,j) <= B((i*3+j+1)*8-1 downto (i*3+j)*8);
                        end loop;
                    end loop;
                    state <= do_mult;
                end if;
            when do_mult =>
                temp := matA(i,k)*matB(k,j);
                matC(i,j) <= matC(i,j) + temp(7 downto 0);
                if(k = 2) then
                    k <= 0;
                    if(j = 2) then
                        j <= 0;
                        if (i= 2) then
                            i <= 0;
                            state <= apply_outputs;
                        else
                            i <= i + 1;
                        end if;
                    else
                        j <= j+1;
                    end if;        
                else
                    k <= k+1;
                end if;     
            when apply_outputs =>   --convert 3 by 3 matrix into a 1-D matrix.
                for i in 0 to 2 loop    --run through the rows
                    for j in 0 to 2 loop    --run through the columns
                        C((i*3+j+1)*8-1 downto (i*3+j)*8) <= matC(i,j);
                    end loop;
                end loop;   
                done <= '1';
                state <= init;  
        end case;
    end if;
end process;
 
end architecture;


tb_matrix_mult.vhd:


--Testbench for testing the 3 by 3 matrix multiplier.
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all; 

entity tb_matrix_mult is  --testbench entity is always empty. No input or output ports.
end entity;

architecture behav of tb_matrix_mult is

component matrix_mult is
    port (  Clock: in std_logic; 
            reset : in std_logic;   --active high reset
            start : in std_logic;   --A '1' starts the matrix multiplication process.
            A,B : in unsigned(71 downto 0);
            C : out unsigned(71 downto 0);
            done : out std_logic    --a '1' indicates that multiplication is done and result is available at C.
            );
end component;

signal A,B,C : unsigned(71 downto 0);
signal Clock,reset, start, done : std_logic := '0';
type matType is array(0 to 2,0 to 2) of unsigned(7 downto 0);
signal matC : matType := (others => (others => X"00")); 

begin

matrix_multiplier : matrix_mult port map (Clock, reset, start, A,B, C,done);

--generate a 50 MHz clock for testing the design.
Clk_generator : process
begin
    wait for 10 ns;
    Clock <= not Clock;
end process;

apply_inputs : process
begin
    reset <= '1';
    wait for 100 ns;
    reset <= '0';
    wait for 20 ns;
    A <= X"09" & X"08" & X"07" & X"06" & X"05" & X"04" & X"03" & X"02" & X"01";
    B <= X"01" & X"09" & X"08" & X"07" & X"06" & X"05" & X"04" & X"03" & X"02";
    start <= '1';
    wait for 20 ns;
    start <= '0';
    wait until done = '1';
    --The result C should be (93,150,126,57,96,81,21,42,36), listed from element (2,2) down to element (0,0).
    wait for 5 ns;
    for i in 0 to 2 loop --run through the rows
        for j in 0 to 2 loop --run through the columns
            matC(i, j) <= C((i * 3 + j + 1) * 8 - 1 downto (i * 3 + j) * 8);
        end loop;
    end loop;
    wait;
end process;

end behav;


Simulation Results:

    The design was simulated successfully using Modelsim SE version 10.4a. A screenshot of the simulation waveform is shown below:




    Please let me know if you are unable to get the code to work or if it's not synthesisable. Good luck with your projects. 

 

Thursday, November 26, 2020

Signals and Variables in VHDL

    Every programming language has objects for storing values, and VHDL has them too. Two of these object types are called Signals and Variables. They might look very similar to a beginner, but there are a few fundamental differences between them. 

  • Variables are assigned using the := operator, and signals are assigned with the <= operator.
  • Variables are declared and used within a process/function/procedure, whereas Signals are declared outside processes (for example in the declarative part of the architecture) and can be used anywhere in that architecture.


A very fundamental difference:

    In a block of statements, assignments to variables take effect immediately, very similar to how it works in programming languages like C. But in a group of statements with Signals on the left hand side, the signals do not take their new values until the process has suspended (either reached the bottom or hit a wait statement).

This can be further explained with the following example scenario. 

Suppose I want to implement a swapping function in VHDL using Signals. 

I can simply write, 

signal x,y : std_logic := '0';
process(Clk)
begin
if(rising_edge(Clk)) then
  x <= y;
  y <= x;
end if;
end process;

    What happens above is that, although 'x' is assigned the value of 'y' first in the sequence, the new value isn't written to 'x' until we "exit" the process. So from a practical point of view, the two assignments look like they happen in parallel. 

    Now if I have to use variables for implementing a swapping function, I need three statements. Like below:

process(Clk)
variable x,y,temp : std_logic := '0';
begin
if(rising_edge(Clk)) then
  temp := x;   --save the old value of x
  x := y;
  y := temp;   --y gets the old value of x
end if;
end process;


Since variables take the values assigned to them right away, we need a temporary variable to hold the value of 'x' before assigning 'y' to it.
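
    To see both behaviours side by side in a simulator, here is a minimal sketch that runs the two swaps on a single rising clock edge and checks the results with assertions. The entity name tb_swap_demo and the names p, q and temp are my own choices for this illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo; tb_swap_demo is not part of any code linked in this post.
entity tb_swap_demo is
end entity;

architecture behav of tb_swap_demo is

signal Clk : std_logic := '0';
signal x : std_logic := '0';
signal y : std_logic := '1';

begin

-- A single rising edge at 10 ns is enough for this demonstration.
Clk <= '1' after 10 ns;

swap_signals : process(Clk)
begin
    if rising_edge(Clk) then
        x <= y;   -- reads the old value of y
        y <= x;   -- reads the old value of x, so the two values swap
    end if;
end process;

swap_variables : process(Clk)
    variable p : std_logic := '0';
    variable q : std_logic := '1';
    variable temp : std_logic := '0';
begin
    if rising_edge(Clk) then
        temp := p;   -- p is overwritten on the next line, so save it first
        p := q;
        q := temp;
        assert p = '1' and q = '0' report "variable swap failed" severity error;
    end if;
end process;

check : process
begin
    wait for 15 ns;   -- 5 ns after the clock edge
    assert x = '1' and y = '0' report "signal swap failed" severity error;
    report "tb_swap_demo finished";
    wait;
end process;

end architecture;
```

    Replacing the last assignment in swap_variables with q := p makes the first assertion fire, because p already holds the new value at that point.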

  • Variables declared in different processes cannot communicate with each other. They are local to the process. On the other hand, signals declared in a VHDL entity can be used anywhere in the entity.
  • You cannot declare or use a Signal inside a VHDL Function. Functions are purely combinatorial in VHDL and thus you have to use variables. 


    If you want the code to be synthesised, then beware of the consequences of using a variable. A variable that is read before it is assigned forces the tool to remember its old value, which in a combinational process means a latch on the FPGA, and synthesis tools usually issue a warning when this happens. Unless you really need them, it's good to avoid latches in your design. 

    Though using variables might seem to make the work easier, the code might not pass the synthesis stage. For many who come to VHDL from a C background, using variables is very tempting.


Be easy with the use of Variables:

    Check this Matrix Multiplication code using Variables to see some of the dangers involved with them. Multiplication of two matrices requires a large number of multipliers and adders. In C, you would use some nested "for" loops to achieve this. With the use of variables you can do the same thing in VHDL too, as you can see from the link. 

    But implementing this same piece of code on a real FPGA is close to impossible. Either the design won't pass the synthesis stage or synthesis will take days to finish. 

    All those individual additions and multiplications get done in "one" clock cycle. None of the adders and multipliers get reused, and the loops get unrolled into a long chain of hardware resources. 

    In such a case it's necessary to use signals and split the whole operation over many clock cycles. This reduces the resource usage and, more importantly, gives you a chance to get your design synthesised. 
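
    As a rough sketch of what splitting an operation over clock cycles can look like, the hypothetical entity below computes a 3 element dot product with one multiply-accumulate per clock, so a single multiplier is reused three times. The entity name dot3_seq and all of its ports are my own assumptions. It is only meant to illustrate the idea and is not taken from the linked code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity; names and port list are assumptions.
entity dot3_seq is
    port (
        Clock      : in std_logic;
        start      : in std_logic;   --a '1' latches the inputs and starts the calculation.
        a0, a1, a2 : in unsigned(7 downto 0);
        b0, b1, b2 : in unsigned(7 downto 0);
        result     : out unsigned(17 downto 0);   --3*255*255 fits in 18 bits.
        done       : out std_logic   --'1' for one clock when result is valid.
    );
end entity;

architecture rtl of dot3_seq is

type vec3 is array (0 to 2) of unsigned(7 downto 0);
signal va, vb : vec3 := (others => (others => '0'));
signal acc : unsigned(17 downto 0) := (others => '0');
signal k : integer range 0 to 2 := 0;
signal busy : std_logic := '0';

begin

process (Clock)
begin
    if rising_edge(Clock) then
        done <= '0';
        if busy = '0' then
            if start = '1' then
                va <= (a0, a1, a2);
                vb <= (b0, b1, b2);
                acc <= (others => '0');
                k <= 0;
                busy <= '1';
            end if;
        else
            --one multiply-accumulate per clock cycle.
            acc <= acc + resize(va(k) * vb(k), acc'length);
            if k = 2 then
                busy <= '0';
                done <= '1';
            else
                k <= k + 1;
            end if;
        end if;
    end if;
end process;

result <= acc;

end architecture;
```

    The price is latency: the result arrives three clock cycles after the inputs are latched instead of in the same cycle.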


Sunday, November 22, 2020

Simulating a VHDL/Verilog code using Modelsim SE

    This is a simple How-To video for ModelSim SE version 10.4a. If you are already familiar with this software tool then you may not need to watch this video.


In this video, I am trying to show you:
  1. How to create a new project in ModelSim SE.
  2. How to add VHDL code to this project.
  3. How to compile and simulate the code.
  4. A few tips on the simulation part of the tool.





Friday, November 6, 2020

Quaternary Signed Digit (QSD) Based Fast Adder In VHDL

    Quaternary Signed Digit is a base-4 number system where each digit takes one of the following 7 values: -3, -2, -1, 0, 1, 2, 3. The advantage of this number system is that it allows carry-free addition, thus speeding up the addition process.

    Fast adders based on QSD are common and there are several papers on them. In this post I have written the VHDL code for a 4-digit QSD adder (each input being 12 bits wide). With a bit of editing, this code can be extended to handle larger input numbers.

    One thing to be careful about: while checking online for information on QSD adders, I came upon several papers with small mistakes here and there. Even though these typos are small, they can take hours of your debugging time, as they did in my case. So I recommend cross-checking any circuit diagram you see online against several references.

The Block diagram for the design is given below:




A QSD adder has two stages. 

In the first stage, we operate on a single digit from each operand to form an intermediate carry and sum. The carry is 2 bits wide and can have one of three values: -1, 0 or +1. 
The sum is 3 bits wide, which is enough to hold any of the 7 digit values from -3 to +3.

In the second stage, the intermediate carry and sum are simply added to form a single 3-bit sum digit, which lies between -3 and +3.

For an N-digit QSD adder, we have two input operands, each N*3 bits in size. The carry output is 2 bits in size and the sum output is N*3 bits in size. 

For an N-digit QSD adder, we need N carry-sum generators and N-1 adders. How these blocks are connected together is shown in the block diagram above.

The boolean equations for these blocks are available on page 4 of the second pdf shared in this blog. Some of these equations are not correct, but the circuit diagram given on page 5 of the same pdf is correct and you can refer to it to form the correct boolean equations.

The carry-sum generator can be better understood by looking at Tables 2 and 3 of the first pdf, and Table 5 gives more clarity on how the second-step adder works.
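
Before going down to the gate level, it can help to look at the first stage as plain integer arithmetic. The sketch below is my own reading of the usual QSD scheme and is not copied from the papers: when the digit sum is above 2 it emits a carry of +1 and subtracts 4, and when it is below -2 it emits a carry of -1 and adds 4. The names tb_qsd_digit_model and cs_gen_model are hypothetical. The assertions check that the carry and sum together still represent a + b, and that the intermediate sum stays between -2 and +2, so that adding a carry of -1, 0 or +1 in the second stage cannot leave the range -3 to +3.

```vhdl
-- Hypothetical integer model of one QSD digit position; only built-in types are used.
entity tb_qsd_digit_model is
end entity;

architecture behav of tb_qsd_digit_model is

    -- Assumed behaviour of the first stage, expressed with integers.
    procedure cs_gen_model (a, b : in integer; carry, sum : out integer) is
        variable d : integer;
    begin
        d := a + b;   -- ranges from -6 to +6
        if d > 2 then
            carry := 1;
            sum := d - 4;
        elsif d < -2 then
            carry := -1;
            sum := d + 4;
        else
            carry := 0;
            sum := d;
        end if;
    end procedure;

begin

check : process
    variable c, s : integer;
begin
    for a in -3 to 3 loop
        for b in -3 to 3 loop
            cs_gen_model(a, b, c, s);
            -- the carry has a weight of 4, so this must equal a + b.
            assert 4*c + s = a + b report "value not preserved" severity error;
            -- room must be left for an incoming carry of -1 or +1.
            assert s >= -2 and s <= 2 report "intermediate sum out of range" severity error;
        end loop;
    end loop;
    report "tb_qsd_digit_model finished";
    wait;
end process;

end architecture;
```

For example, 3 + 3 = 6 becomes a carry of +1 and a sum of 2, and -3 + 0 becomes a carry of -1 and a sum of 1.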

The VHDL codes are given below:

First step : Carry Sum Generator

--QSD carry sum generator.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity QSD_cs_gen is
    Port ( A,B : in  signed(2 downto 0);
           S : out  signed(2 downto 0);
	   		C : out  signed(1 downto 0)	
	);
end QSD_cs_gen;

architecture Behavioral of QSD_cs_gen is

begin

process(A,B)
variable anot,bnot : signed(2 downto 0);
variable ss : signed(2 downto 0);
variable cc : signed(1 downto 0);
variable temp1,temp2,temp3,temp4,temp5 : std_logic;
begin
	anot := not A;
	bnot := not B;
	temp1 := not(A(1) or B(1));
	temp2 := A(2) and bnot(0);
	temp3 := B(2) and anot(0);
	temp4 := temp1 and (temp2 or temp3);
	cc(1) := (A(2) and B(2) and not(A(0) and B(0) and A(1) and B(1))) or temp4;
	cc(0) := cc(1) or ((anot(2) and bnot(2)) and 
			((A(1) and B(1)) or (B(1) and B(0)) or (B(1) and A(0)) or (B(0) and A(1)) or (A(1) and A(0))));

	ss(0) := A(0) xor B(0);
	ss(1) := A(1) xor B(1) xor (A(0) and B(0));
	temp1 := (ss(0) and (A(1) xor B(1)));
	temp2 := (B(2) and anot(1) and bnot(0));
	temp3 := (A(2) and bnot(1) and anot(0));
	temp4 := ( A(0) and B(0) and anot(1) and bnot(1) and (A(2) or B(2)) );
	temp5 := ( A(0) and B(0) and A(1) and B(1) and A(2) and B(2) );
	ss(2) := temp1 or temp2 or temp3 or temp4 or temp5;

	S <= ss;
	C <= cc;
end process;

end Behavioral;


Second step : Addition of Intermediate Carry and Sum


--QSD step 2: adder for adding intermediate carry and sum.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity QSD_adder is
    Port ( A : in  signed(1 downto 0);
           B : in  signed(2 downto 0);
	   	   S : out  signed(2 downto 0)	
	);
end QSD_adder;

architecture Behavioral of QSD_adder is

begin

process(A,B)
variable sum : signed(2 downto 0);
variable temp1,temp2,temp3,temp4 : std_logic;
begin
	sum(0) := A(0) xor B(0);
	sum(1) := A(1) xor B(1) xor (A(0) and B(0));
	temp1 := A(1) and B(1);
	temp2 := A(1) xor B(1);
	temp3 := A(0) and B(0);
	temp4 := temp1 or (temp2 and temp3);
	sum(2) := A(1) xor B(2) xor temp4;
	S <= sum;
end process;

end Behavioral;


4 Digit QSD Adder:

--4 digit QSD adder. 
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity QSDAdder is
    Port ( A,B : in  signed(11 downto 0);
           Cout : out  signed(1 downto 0);
	   	   S : out  signed(11 downto 0)	
	);
end QSDAdder;

architecture Behavioral of QSDAdder is
 
component QSD_cs_gen is
    Port ( A,B : in  signed(2 downto 0);
           S : out  signed(2 downto 0);
	   		C : out  signed(1 downto 0)	
	);
end component;

component QSD_adder is
    Port ( A : in  signed(1 downto 0);
           B : in  signed(2 downto 0);
	   	   S : out  signed(2 downto 0)	
	);
end component;

signal S0,S1,S2,S3 : signed(2 downto 0);
signal C0,C1,C2,C3 : signed(1 downto 0);

begin

--First stage to QSD addition : The 4 carry-sum generators.
carry_sum_gen1 : QSD_cs_gen port map (
          A => A(2 downto 0),
          B => B(2 downto 0),
		  S => S(2 downto 0),
		  C => C0
        );

carry_sum_gen2 : QSD_cs_gen port map (
          A => A(5 downto 3),
          B => B(5 downto 3),
		  S => S1,
		  C => C1
        );

carry_sum_gen3 : QSD_cs_gen port map (
          A => A(8 downto 6),
          B => B(8 downto 6),
		  S => S2,
		  C => C2
        );

carry_sum_gen4 : QSD_cs_gen port map (
          A => A(11 downto 9),
          B => B(11 downto 9),
		  S => S3,
		  C => Cout
        );
 
--Second stage to QSD addition : The addition of intermediate carries and sums
adder1 : QSD_adder port map (
          A => C0,
          B => S1,
		  S => S(5 downto 3)
        );

adder2 : QSD_adder port map (
          A => C1,
          B => S2,
		  S => S(8 downto 6)
        );

adder3 : QSD_adder port map (
          A => C2,
          B => S3,
		  S => S(11 downto 9)
        );


end Behavioral;


Testbench for the 4 Digit QSD Adder:

--Testbench code which tests all combinations of inputs to a 4 digit QSD adder
library IEEE;
use IEEE.Std_logic_1164.all;
use IEEE.Numeric_Std.all;

entity QSDAdder_tb is
end;

architecture bench of QSDAdder_tb is

  component QSDAdder
      Port ( A,B : in  signed(11 downto 0);
             Cout : out  signed(1 downto 0);
  	   	    S : out  signed(11 downto 0)	
  	);
  end component;

  signal A,B: signed(11 downto 0);
  signal Cout: signed(1 downto 0);
  signal S: signed(11 downto 0) ;

	--A function to convert any length QSD number to a signed integer.
	function  qsd2int  ( A : SIGNED ) return signed is

	variable res : signed(31 downto 0) := (others => '0');
	variable num_digits : integer := (A'high+1)/3;
	variable temp : signed(31 downto 0) := (others => '0');
	variable ones : signed(31 downto 0) := (others => '1');
	variable zeros : signed(31 downto 0) := (others => '0');
	
	begin
	for i in 0 to num_digits-1 loop
		if(A(2+3*i) = '1') then  --this part just does sign extension
			temp := ones(31 downto 3) & A(2+3*i downto 3*i);
		else
			temp := zeros(31 downto 3) & A(2+3*i downto 3*i);
		end if;
		res := res + shift_left(temp,2*i); --shift left and accumulate.
	end loop;
	return res;
	
	end qsd2int;
  
signal A_dec,B_dec,S_dec,S_act : signed(31 downto 0) := (others => '0');
signal error : integer := 0;

begin

  uut: QSDAdder port map ( A    => A,
                           B    => B,
                           Cout => Cout,
                           S    => S );

 --this is where we generate inputs to apply to the adder.
 --4 digits for one number. and we have two numbers. 
 --so 8 for-loops to generate all combination of values for all digits.
  stimulus: process
  begin
	wait for 5 ns;
	for i in -3 to 3 loop
  		for j in -3 to 3 loop
			for k in -3 to 3 loop
				for l in -3 to 3 loop
					A <= to_signed(i,3) & to_signed(j,3) & to_signed(k,3) & to_signed(l,3);
					for m in -3 to 3 loop
				  		for n in -3 to 3 loop
							for o in -3 to 3 loop
								for p in -3 to 3 loop
									B <= to_signed(m,3) & to_signed(n,3) & to_signed(o,3) & to_signed(p,3);	
									wait for 10 ns;
								end loop;									
							end loop;
						end loop;
					end loop;
				end loop;
			end loop;
		end loop;
	end loop;
	wait;
  end process;

--the outputs are checked here for error with actual sum.
check_results: process
variable A_dec1,B_dec1,S_dec1,S_act1 : signed(31 downto 0) := (others => '0');
begin
	for i in 1 to 7**8 loop  --7^8 total set of inputs.
		wait for 10 ns;
		A_dec1 := qsd2int(A);
		B_dec1 := qsd2int(B);
		--if carry out is -1 we subtract 256. if carry out is +1 we add 256.
		if(Cout = "11") then  
			S_dec1 := qsd2int(S)-256;
		elsif(Cout = "01") then
			S_dec1 := qsd2int(S)+256;
		else  --carry out is zero.
			S_dec1 := qsd2int(S);
		end if;
		S_act1 := A_dec1+B_dec1;
		--if result from adder and actual sum don't match, increment "error"
		if(S_dec1 /= S_act1) then
			error <= error+1;  
		end if;
		A_dec <= A_dec1;
		B_dec <= B_dec1;
		S_dec <= S_dec1;
		S_act <= S_act1;
	end loop;
	wait;
end process;

end;

    A bit of explanation on the VHDL codes:

    The first two codes, QSD_cs_gen and QSD_adder, are simply based on the boolean equations and circuit diagram presented in the second pdf. They are gate level code. Note that I have broken the long equations into several lines by using temporary variables. This adds clarity and also makes the code you write less prone to error.

    The third code, QSDAdder, is the 4 digit QSD adder, which connects the above two blocks in a structural level design.

    The fourth code, QSDAdder_tb, is the testbench for testing the functionality of our adder. This is relatively complicated compared to the other three blocks of code.

    The testbench has a function named qsd2int, which converts a QSD number of any length into a signed number. Each digit of the QSD number is sign-extended to 32 bits and then left-shifted by a multiple of 2 bits before being accumulated into the result. Left shifting here simply means multiplying by 1, 4, 16, 64 etc., based on the index of the digit.
    In the testbench I want to test the design for all possible combinations of inputs. There are two 4-digit QSD numbers and each digit has 7 possible values, which means that the number of input sets is 7^(4+4) = 7^8 = 5764801. This is achieved in the process named stimulus.
    The sums from the adder module are compared with the actual results in another process named check_results. If there is a mismatch in this comparison, a signal named error is incremented by 1. The adder is fully working if error is still 0 at the end of the simulation.
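
    The conversion done by qsd2int can also be written with integers, which makes the digit weights of 1, 4, 16 and 64 easy to see. The sketch below is a hypothetical, fixed 4-digit variant for illustration only. The names tb_qsd_value and qsd4_to_int are mine, and the testbench above does not use them.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical check of the QSD digit weights; names are assumptions.
entity tb_qsd_value is
end entity;

architecture behav of tb_qsd_value is

    -- Integer returning variant of the conversion, for 4 digit numbers only.
    function qsd4_to_int (q : signed(11 downto 0)) return integer is
        variable res : integer := 0;
    begin
        for i in 0 to 3 loop
            -- digit i is a 3 bit two's complement value with a weight of 4**i.
            res := res + to_integer(q(3*i + 2 downto 3*i)) * 4**i;
        end loop;
        return res;
    end function;

    -- digits (1, -2, 0, 3) : 1*64 - 2*16 + 0*4 + 3*1 = 35
    constant Q1 : signed(11 downto 0) := "001110000011";
    -- digits (-3, -3, -3, -3) : -3*(64 + 16 + 4 + 1) = -255
    constant Q2 : signed(11 downto 0) := "101101101101";

begin

check : process
begin
    assert qsd4_to_int(Q1) = 35 report "wrong value for Q1" severity error;
    assert qsd4_to_int(Q2) = -255 report "wrong value for Q2" severity error;
    report "tb_qsd_value finished";
    wait;
end process;

end architecture;
```

    The second constant is the most negative 4-digit QSD number, -255. The most positive one is +255, which is why a carry out digit carries a weight of 256 in check_results.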

    The VHDL code and the papers which I referred to while writing it can be downloaded as a zipped file from here

    Note that the Boolean equations in the second paper have some mistakes. But you can check the circuit diagram, which seems to be correct. Cross check with the VHDL codes if you are not sure. 

    The codes were simulated and tested successfully using Modelsim 10.4a.