VHDL coding tips and tricks: Matrix multiplication in VHDL

## Wednesday, March 24, 2010

### Matrix multiplication in VHDL

Here is a function for doing matrix multiplication in VHDL.
For storing matrix elements I have declared the following data types:

type t11 is array (0 to numcols1-1) of unsigned(15 downto 0);
type t1 is array (0 to numrows1-1) of t11;
type t22 is array (0 to numcols2-1) of unsigned(15 downto 0);
type t2 is array (0 to numrows2-1) of t22;
type t33 is array (0 to numcols3-1) of unsigned(31 downto 0);
type t3 is array (0 to numrows3-1) of t33;

Depending on the size of your matrices, set the values of numcols1, numcols2, numcols3, numrows1, numrows2 and numrows3. For a valid matrix multiplication, numcols1 = numrows2.
For the resultant matrix, numrows3 = numrows1 and numcols3 = numcols2.

The function is given below:
function matmul ( a : t1; b : t2 ) return t3 is
    -- i, j and k are declared implicitly by the for loops, so they need no variable declarations.
    variable prod : t3 := (others => (others => (others => '0')));
begin
    for i in 0 to numrows1-1 loop
        for j in 0 to numcols2-1 loop
            for k in 0 to numcols1-1 loop
                -- 16-bit * 16-bit gives a 32-bit product, which matches the element width of t3.
                prod(i)(j) := prod(i)(j) + (a(i)(k) * b(k)(j));
            end loop;
        end loop;
    end loop;
    return prod;
end matmul;
In the above function, replace the names numrows1, numcols1, numcols2, etc. with the appropriate values.
For example, to multiply a 4*3 matrix with a 3*5 matrix:
numcols1 = 3, numcols2 = 5, numcols3 = 5, numrows1 = 4, numrows2 = 3 and numrows3 = 4.
So the type declarations will look like this:

type t11 is array (0 to 2) of unsigned(15 downto 0);
type t1 is array (0 to 3) of t11;
type t22 is array (0 to 4) of unsigned(15 downto 0);
type t2 is array (0 to 2) of t22;
type t33 is array (0 to 4) of unsigned(31 downto 0);
type t3 is array (0 to 3) of t33;

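Note: I have declared the matrix elements as 16-bit unsigned and the product elements as 32-bit unsigned. If you want to change the size of the operands, you can easily do that in the type declarations. This will not alter the function logic, but the product elements must be at least as wide as the sum of the two operand widths, because that is the width numeric_std returns for an unsigned multiplication. The accumulation is done at the product width, so a sum that does not fit wraps around.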
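Instead of replacing the names by hand, the dimensions can be kept as named constants so that the types and loop bounds follow from them. The following is a minimal sketch of that idea, not the original package: the package name mat_dims is hypothetical, and the 4*3 by 3*5 sizes are just the example values used above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package: dimensions are set in one place only.
package mat_dims is
    constant numrows1 : positive := 4;
    constant numcols1 : positive := 3;
    constant numrows2 : positive := numcols1;  -- required for a valid product
    constant numcols2 : positive := 5;
    constant numrows3 : positive := numrows1;  -- result rows
    constant numcols3 : positive := numcols2;  -- result columns

    type t11 is array (0 to numcols1-1) of unsigned(15 downto 0);
    type t1  is array (0 to numrows1-1) of t11;
    type t22 is array (0 to numcols2-1) of unsigned(15 downto 0);
    type t2  is array (0 to numrows2-1) of t22;
    type t33 is array (0 to numcols3-1) of unsigned(31 downto 0);
    type t3  is array (0 to numrows3-1) of t33;

    function matmul (a : t1; b : t2) return t3;
end mat_dims;

package body mat_dims is
    function matmul (a : t1; b : t2) return t3 is
        variable prod : t3 := (others => (others => (others => '0')));
    begin
        for i in 0 to numrows1-1 loop
            for j in 0 to numcols2-1 loop
                for k in 0 to numcols1-1 loop
                    prod(i)(j) := prod(i)(j) + (a(i)(k) * b(k)(j));
                end loop;
            end loop;
        end loop;
        return prod;
    end matmul;
end mat_dims;
```

With this layout, changing the matrix sizes should only require editing the three independent constants (numrows1, numcols1, numcols2).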

1. Many people have been asking for a testbench for the above program. Here it is. The following code multiplies a 4*3 matrix by a 3*5 matrix. The resulting matrix has size 4*5.

First copy the code below and save it as mat_ply.vhd. This is the package file.

--package definition.
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use ieee.numeric_std.all;

package mat_ply is

type t11 is array (0 to 2) of unsigned(15 downto 0);
type t1 is array (0 to 3) of t11; --4*3 matrix
type t22 is array (0 to 4) of unsigned(15 downto 0);
type t2 is array (0 to 2) of t22; --3*5 matrix
type t33 is array (0 to 4) of unsigned(31 downto 0);
type t3 is array (0 to 3) of t33; --4*5 matrix as output

function matmul ( a : t1; b:t2 ) return t3;

end mat_ply;

package body mat_ply is

function matmul ( a : t1; b : t2 ) return t3 is
    -- i, j and k are declared implicitly by the for loops.
    variable prod : t3 := (others => (others => (others => '0')));
begin
    for i in 0 to 3 loop --(number of rows in the first matrix - 1)
        for j in 0 to 4 loop --(number of columns in the second matrix - 1)
            for k in 0 to 2 loop --(number of rows in the second matrix - 1)
                prod(i)(j) := prod(i)(j) + (a(i)(k) * b(k)(j));
            end loop;
        end loop;
    end loop;
    return prod;
end matmul;

end mat_ply;

Copy the code below and save it as test_mat.vhd. This is the main module, which calls the function.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.numeric_std.ALL;

library work;
use work.mat_ply.all;

entity test_mat is
port (clk : in std_logic;
a : in t1;
b : in t2;
prod : out t3
);
end test_mat;

architecture Behavioral of test_mat is
begin
    process(clk)
    begin
        if rising_edge(clk) then
            prod <= matmul(a,b); --function is called here; the result is registered on the rising clock edge.
        end if;
    end process;
end Behavioral;

Now comes the testbench code. Copy the code below and save it as mat_tb.vhd.

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;
library work;
use work.mat_ply.all;

ENTITY mat_tb IS
END mat_tb;

ARCHITECTURE behavior OF mat_tb IS
--signals declared and initialized to zero.
signal clk : std_logic := '0';
signal a : t1:=(others => (others => (others => '0')));
signal b : t2:=(others => (others => (others => '0')));
signal x: unsigned(15 downto 0):=(others => '0'); --zero base value; the test matrices are written as offsets from it
signal prod : t3:=(others => (others => (others => '0')));
-- Clock period definitions
constant clk_period : time := 1 ns;

BEGIN
-- Instantiate the Unit Under Test (UUT)
uut: entity work.test_mat PORT MAP (clk,a,b,prod);

-- Clock process definitions
clk_process :process
begin
clk <= '0';
wait for clk_period/2;
clk <= '1';
wait for clk_period/2;
end process;

-- Stimulus process
stim_proc: process
begin
    --first set of inputs..
    a <= ((x,x+1,x+4),(x+2,x,x+1),(x+1,x+5,x),(x+1,x+1,x));
    b <= ((x,x+1,x+4,x+2,x+7),(x,x+1,x+3,x+2,x+4),(x,x+2,x+3,x+4,x+5));
    wait for 2 ns;
    --second set of inputs can be given here and so on.
    wait; --stop here so the stimulus is not applied again from the top.
end process;

END;
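The testbench above only applies inputs, so the result still has to be read off the waveform. As a rough sketch of a self-checking alternative, the testbench below calls matmul directly and compares the result against values worked out by hand for the same input matrices. It assumes mat_ply.vhd from above has been compiled into the work library; the entity name mat_check_tb and the helper types int_row and int_mat are made up for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.mat_ply.all;  -- package from mat_ply.vhd above

entity mat_check_tb is  -- hypothetical name
end mat_check_tb;

architecture sim of mat_check_tb is
    -- Expected 4*5 product, computed by hand for the inputs below.
    type int_row is array (0 to 4) of natural;
    type int_mat is array (0 to 3) of int_row;
    constant expected : int_mat := (
        (0, 9, 15, 18, 24),
        (0, 4, 11,  8, 19),
        (0, 6, 19, 12, 27),
        (0, 2,  7,  4, 11));
begin
    check : process
        variable x : unsigned(15 downto 0) := (others => '0');
        variable a : t1;
        variable b : t2;
        variable p : t3;
    begin
        -- Same input values as the stimulus process in mat_tb.vhd.
        a := ((x,x+1,x+4),(x+2,x,x+1),(x+1,x+5,x),(x+1,x+1,x));
        b := ((x,x+1,x+4,x+2,x+7),(x,x+1,x+3,x+2,x+4),(x,x+2,x+3,x+4,x+5));
        p := matmul(a, b);

        for i in 0 to 3 loop
            for j in 0 to 4 loop
                assert to_integer(p(i)(j)) = expected(i)(j)
                    report "mismatch at (" & integer'image(i) & "," &
                           integer'image(j) & ")"
                    severity error;
            end loop;
        end loop;

        report "matmul check finished";
        wait;  -- run once, then stop
    end process;
end sim;
```

If any element is wrong, the simulator reports the row and column of the mismatch, so the waveform does not have to be inspected.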

Hope this helps..

1. So can I use this code to multiply two 32*32 matrices? For the testbench I want to take the inputs from my external SRAM and my internal SRAM (inside the FPGA), because one matrix has to be transposed first and then multiplied by its own transpose.

2. Could you please tell me how to test this code?
I want to multiply a 3x3 matrix with a 3x1 matrix to obtain a 3x1 output.
I tried to test it but I am getting wrong answers.
It's urgent, please let me know how to go about the pin assignments.

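3. @swati : In your case the function needs a small change, because you are using a column matrix.
The function will look like this:

function matmul ( a : t1; b : t2 ) return t3 is
    variable prod : t3 := (others => (others => '0'));
begin
    for i in 0 to 2 loop
        for k in 0 to 2 loop
            -- The 16-bit * 16-bit product is 32 bits wide, so the result elements are 32 bits too.
            prod(i) := prod(i) + (a(i)(k) * b(k));
        end loop;
    end loop;
    return prod;
end matmul;

The type definitions will be like this:

type t11 is array (0 to 2) of unsigned(15 downto 0);
type t1 is array (0 to 2) of t11; --3*3 matrix
type t2 is array (0 to 2) of unsigned(15 downto 0); --3*1 input matrix
type t3 is array (0 to 2) of unsigned(31 downto 0); --3*1 output matrix

The applied inputs in the testbench will look like this (integer literals have to be converted with to_unsigned):
a <= ((to_unsigned(1,16),to_unsigned(2,16),to_unsigned(3,16)),
      (to_unsigned(4,16),to_unsigned(5,16),to_unsigned(3,16)),
      (to_unsigned(7,16),to_unsigned(2,16),to_unsigned(1,16)));
b <= (to_unsigned(3,16),to_unsigned(4,16),to_unsigned(5,16));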
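For anyone who wants to try the 3*3 by 3*1 case end to end, here is a small self-contained sketch with its own package and a self-checking testbench. The names matvec_pkg, matvec_tb and the helper u16 are invented for this example, and the result elements are assumed to be 32 bits wide so that they can hold the 16-bit by 16-bit products. The expected values 26, 47 and 34 were worked out by hand for the inputs a = ((1,2,3),(4,5,3),(7,2,1)) and b = (3,4,5).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package matvec_pkg is  -- hypothetical package name
    type t11 is array (0 to 2) of unsigned(15 downto 0);
    type t1  is array (0 to 2) of t11;                    -- 3*3 matrix
    type t2  is array (0 to 2) of unsigned(15 downto 0);  -- 3*1 input
    type t3  is array (0 to 2) of unsigned(31 downto 0);  -- 3*1 output
    function matmul (a : t1; b : t2) return t3;
end matvec_pkg;

package body matvec_pkg is
    function matmul (a : t1; b : t2) return t3 is
        variable prod : t3 := (others => (others => '0'));
    begin
        for i in 0 to 2 loop
            for k in 0 to 2 loop
                prod(i) := prod(i) + (a(i)(k) * b(k));
            end loop;
        end loop;
        return prod;
    end matmul;
end matvec_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.matvec_pkg.all;

entity matvec_tb is
end matvec_tb;

architecture sim of matvec_tb is
    -- Helper (hypothetical): shorthand for a 16-bit unsigned literal.
    function u16 (n : natural) return unsigned is
    begin
        return to_unsigned(n, 16);
    end function;
begin
    process
        variable a : t1;
        variable b : t2;
        variable p : t3;
    begin
        a := ((u16(1), u16(2), u16(3)),
              (u16(4), u16(5), u16(3)),
              (u16(7), u16(2), u16(1)));
        b := (u16(3), u16(4), u16(5));
        p := matmul(a, b);

        assert to_integer(p(0)) = 26 report "row 0 is wrong" severity error;
        assert to_integer(p(1)) = 47 report "row 1 is wrong" severity error;
        assert to_integer(p(2)) = 34 report "row 2 is wrong" severity error;
        report "matvec check finished";
        wait;
    end process;
end sim;
```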

4. Could you please tell me how to test this code?
I want to multiply a 3x3 matrix with a 3x3 matrix to obtain a 3x3 output.
I tried to test it but I am getting wrong answers.
It's urgent, please let me know how to go about the pin assignments.

how do you obtain:
The applied inputs in the testbench will look like this:
a <= ((1,2,3),(4,5,3),(7,2,1));
b <= (3,4,5);

5. @carlos : Send your code to me via this form:
http://vhdlguru.blogspot.com/p/contact-me.html

1. Please tell me the VHDL code for a generalized 3*3 matrix multiplication that shows the results on an FPGA.

6. I would like to know how to take a matrix transpose in Verilog HDL, for 4x4 and 8x8,
or e-mail me: cavalli_italy_@gmail.com

HD.

7. The transpose is a simple concept. I will give you the VHDL function here. If you know the Verilog basics, it should be easy for you to port it to Verilog.

function transpose ( a : t1 ) return t1 is
    -- The result has the same type as the input, so this version is valid for square matrices (numrows = numcols) such as 4x4 or 8x8.
    variable result : t1 := (others => (others => (others => '0')));
begin
    for i in 0 to numrows-1 loop
        for j in 0 to numcols-1 loop
            result(i)(j) := a(j)(i);
        end loop;
    end loop;
    return result;
end transpose;

Hope that helped.
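For a matrix that is not square, the transposed result has swapped dimensions, so it needs its own array type. The sketch below shows one way this could look for a 2*3 input; the package name transpose_pkg, the type names and the small testbench are all made up for this illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package transpose_pkg is  -- hypothetical package name
    type row_in  is array (0 to 2) of unsigned(15 downto 0);
    type mat_in  is array (0 to 1) of row_in;   -- 2*3 input
    type row_out is array (0 to 1) of unsigned(15 downto 0);
    type mat_out is array (0 to 2) of row_out;  -- 3*2 result
    function transpose (a : mat_in) return mat_out;
end transpose_pkg;

package body transpose_pkg is
    function transpose (a : mat_in) return mat_out is
        variable result : mat_out;
    begin
        -- i runs over the rows of the result, j over its columns.
        for i in mat_out'range loop
            for j in row_out'range loop
                result(i)(j) := a(j)(i);
            end loop;
        end loop;
        return result;
    end transpose;
end transpose_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.transpose_pkg.all;

entity transpose_tb is
end transpose_tb;

architecture sim of transpose_tb is
begin
    process
        variable a : mat_in;
        variable t : mat_out;
    begin
        -- a = ((1,2,3),(4,5,6))
        a := ((to_unsigned(1,16), to_unsigned(2,16), to_unsigned(3,16)),
              (to_unsigned(4,16), to_unsigned(5,16), to_unsigned(6,16)));
        t := transpose(a);

        -- Every element of the result must equal the mirrored input element.
        for i in mat_out'range loop
            for j in row_out'range loop
                assert t(i)(j) = a(j)(i)
                    report "transpose mismatch" severity error;
            end loop;
        end loop;
        assert to_integer(t(2)(1)) = 6 report "t(2)(1) should be 6" severity error;
        assert to_integer(t(0)(1)) = 4 report "t(0)(1) should be 4" severity error;
        report "transpose check finished";
        wait;
    end process;
end sim;
```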

8. Could you please tell me the code for a fully pipelined 4x5 multiplied by a 5x3 to get a 4x3, with the testbench included? Please email it to me at jroccurrie15@gmail.com

Thanks

1. Did you ever figure out the code for this? I have the same question.

9. @vipin : Could you write the whole program, including the input and output ports, please?

10. @vipin: I am using the mat_ply.vhd and test_mat.vhd files,
and there is an error in line 7 of test_mat.vhd.

The line is "use work.mat_ply.all;"

The error is "Declaration all can not be be made visible in this scope since design unit mat_ply is not a package."

It would be easier to understand if you wrote the whole program.

11. The code given here is already tested. As I am busy with my job, I am not able to update this blog properly. If you want any additional help with the code or a tutorial, you can pay me to do so.
Thanks.

12. Can you please tell me how to write VHDL code for inverting a rectangular matrix?

13. @jayam : Please contact me for code that is not listed in my blog. If I have time I will post it here, or else I will charge a fee for it, as I am working as a freelancer for VHDL coding.

14. Could you tell me how to multiply matrices of large dimensions, e.g. 200*20 with 20*10 to get 200*10?
I tried your code but the resource utilisation is too high.
What is the solution?

15. @DON : You have to pass the elements of the matrix one by one. This code will not work in such situations. You have to rewrite the entire logic.

1. What would the code look like for passing the elements one by one?
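One possible shape for an element-by-element design is a single multiply-accumulate unit that receives one pair of elements per clock cycle and builds up one entry of the product matrix at a time. The sketch below is only an illustration of that idea and not code from this blog: the entity name mac_stream and its ports clear, valid, a_in, b_in and acc are all assumptions, and the control logic that reads the matrices from memory and sequences the rows and columns is left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical streaming multiply-accumulate unit.
-- Feed a(i)(k) and b(k)(j) for k = 0, 1, 2, ... on successive clocks;
-- after the last pair, acc holds prod(i)(j).
entity mac_stream is
    port (
        clk   : in  std_logic;
        clear : in  std_logic;  -- '1' for one cycle before each new element
        valid : in  std_logic;  -- '1' when a_in and b_in carry a valid pair
        a_in  : in  unsigned(15 downto 0);
        b_in  : in  unsigned(15 downto 0);
        acc   : out unsigned(31 downto 0)
    );
end mac_stream;

architecture rtl of mac_stream is
    signal acc_reg : unsigned(31 downto 0) := (others => '0');
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if clear = '1' then
                -- clear has priority: a pair presented in this cycle is ignored
                acc_reg <= (others => '0');
            elsif valid = '1' then
                -- one multiplier is reused for every element pair;
                -- the 32-bit sum wraps around if it overflows
                acc_reg <= acc_reg + (a_in * b_in);
            end if;
        end if;
    end process;

    acc <= acc_reg;
end rtl;
```

Because only one multiplier is instantiated, the resource usage stays roughly constant as the matrices grow, at the cost of needing one clock cycle per multiplication.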

16. Sir,
could you explain a little about higher dimensions? Every time I write the code with a different method, the IOB and DSP48 utilisation is too high. How do I pass the elements of the matrix one by one, and why can't looping work?
Could you help me?

17. Sir,

can you please tell me how to add text I/O functions to your code? I want to use this code for multiplying matrices of size 510*510 and 510*1020.

I want to copy the simulation output to a file.

Can you help me?

vidhi

18. @vidhi : see this post:

19. Can you help me with a matrix multiplication of (1x64) and (64 x 128)

20. SIR
I WANT TO MULTIPLY 3 MATRICES (XYZ) AT A TIME USING SYSTOLIC ARCHITECTURE, WHERE X AND Z ARE RECTANGULAR MATRICES AND Y IS A DIAGONAL SQUARE MATRIX . CAN U PLZ HELP ME . ITS URGENT...

21. Can you tell me how to get ieee.numeric_std.all?

I'm working in MAX+PLUS II and this library is not available.
Best regards.

22. i have to implement a matrix multiplication of 3 matrices of 64x64 to find approximation coefficient of an image. Is it possible to implement matrix multiplication of these matrices in FPGA with VHDL coding?

24. Great blog. I have a question on this particular implementation: this appears to be all sequential and doesn't implement any parallelism or pipelining, correct? From that, one couldn't expect much of an FPGA performance gain for this computation over a standard sequential processor. In fact, there are even faster strictly sequential algorithms (that use dynamic programming). Do you know of any resources for parallelized matrix multiplication?

25. Very nice VHDL code , could you please help to perform that matrices multiplication in the following equation :

u =-c*(a+q*I4/abs(c*x)+k*I4)*x -k*s ;
where :
c=1x4 , a=4x4 , q=1x1 , I4=4x4 , x=4x1 , k=1x1 , s=1x1 .

26. Can you give me the code for a QPSK modulator, Complementary Code Keying, and Direct-Sequence Spread Spectrum?

27. How do I link the matmul function to the test_mat.vhd file?

28. Can anyone help me with how to generate a generator matrix for a Hamming code?

29. I tried that code, but why couldn't it be compiled?

30. @vipin I need 3*3 with 3*3. Please send me the code.

31. How do I give the input matrices in the form of a text file and get the output in the form of a text file?

32. Does anybody know how to write VHDL code to find the eigenvalues of a matrix?

33. Sir, I have a doubt about how to take 3 matrices as input in VHDL. Can you give me the full code?