VHDL coding tips and tricks: Behavior level model
Showing posts with label Behavior level model. Show all posts

Sunday, December 20, 2020

Image Processing: RGB to Gray scale Converter in VHDL

    Implementing image processing algorithms in VHDL is a scary thing for many. Though I agree that it's much more difficult to do in VHDL than in a high level programming language like C or Matlab, it needn't be that scary.

    In this post I am going to share the code for a simple image processing algorithm: an RGB to Gray scale image converter.

    There are many ways, from simple to complex, in which you can do this. I have done it in a way that makes sense to me, touching upon a few related topics along the way: reading the image data from a text file, storing it in a RAM, accessing the data within the code and then manipulating it.

    I have used the standard Matlab image Lenna.bmp for this. The original image was 512*512*3 in size. This takes a long time to load and run in Modelsim. So I first scaled it down by a factor of 8 along each dimension, making it a 64*64*3 image. Each value ranges from 0 to 255 and the dimension "3" indicates the presence of Red, Green and Blue components.

    VHDL text file operations aren't ideal for reading multiple pixels from the same row. So I converted the 3-dimensional image data into a 1-dimensional array and then used the Matlab command dlmwrite to write it to a text file named rgb.txt, one value per line. This will be our input image.

    In Matlab, I manually converted the above RGB image into a gray scale image using the formula:

 Grayscale Image = ( (0.3 * R) + (0.59 * G) + (0.11 * B) ).

    The above grayscale pixels were converted to a 1-D matrix and written to a text file called gray.txt. This text file is read by our VHDL testbench to verify that the results from our VHDL design are the same as the ideal results obtained from Matlab.

The Matlab program which I used for achieving all this is shared below:


I=imread('Lenna.bmp');  %read the image into memory
I=imresize(I,1/8);  %reduce the size by 8 times.
%convert image to 1-D array and write it to a text file.
dlmwrite('rgb.txt',reshape(I,64*64*3,1,1));
I4=double(I);   %convert it to double format
%convert rgb pixels to gray manually as per formula.
for i=1:64
    for j=1:64
        I2(i,j) = I4(i,j,1)*0.3 + I4(i,j,2)*0.59 + I4(i,j,3)*0.11;
    end
end
%the converted image is changed to 1-D and then written to a text file.
I3=reshape(I2,64*64,1);
dlmwrite('gray.txt',I3);


    The standard RGB format Lenna image, the resized version of it and the converted Grayscale image are shared below:




    Now that we have prepared the above basic data to work with, we can go and explore the VHDL designs. There are three VHDL files in this project:

1) im_ram.vhd:

    The input RGB image is stored internally in a RAM. The RAM is declared and initialized with the pixel values from the text file rgb.txt. The std.textio package is used for the file reading operations.


--RAM entity for storing the image.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
--the below libraries are needed for reading text file.
library std;
use std.textio.all;

entity im_ram is
    generic(
        ADDR_WIDTH : integer := 16; --Address bus size of the Image Ram.
        IM_SIZE_D1 : integer := 64; --Size along Dimension 1
        IM_SIZE_D2 : integer := 64  --Size along Dimension 2
    );
    port (
        Clk : in std_logic;
        addr_in : in unsigned(ADDR_WIDTH-1 downto 0);   --Address bus to the Image Ram.
        rgb_out : out std_logic_vector(23 downto 0) --24 bit RGB pixel output
    );
end im_ram;

architecture Behav of im_ram is

--custom array declaration.
type im_ram_type is array (0 to IM_SIZE_D1*IM_SIZE_D2-1) of std_logic_vector(23 downto 0);

--function for reading the image pixels from text file and use
--it to initialize the RAM.
impure function im_ram_initialize return im_ram_type is
    variable line_var : line;
    file text_var : text;
    variable pixel : integer;
    variable image_pixels : im_ram_type;
begin        
    --Open the file in read mode.
    file_open(text_var,"rgb.txt",read_mode);    
    while(NOT ENDFILE(text_var)) loop   --until end of file is reached
        for k in 1 to 3 loop    --through R, G and B.
            for j in 0 to IM_SIZE_D2-1 loop
                for i in 0 to IM_SIZE_D1-1 loop
                    readline(text_var,line_var);   --read one row. Each row contains one pixel.
                    read(line_var,pixel);   --From the read line, read the integer value.
                    --save the pixel in the RAM.
                    image_pixels(i*IM_SIZE_D2+j)(k*8-1 downto k*8-8) := std_logic_vector(to_unsigned(pixel,8));
                end loop;
            end loop;
        end loop;
    end loop;
    file_close(text_var); --close the file after reading.
    return image_pixels;    
end function;

--declare and initialize the image ram.
signal ram : im_ram_type := im_ram_initialize;

begin

--read the R,G and B pixels from RAM with the addr_in input.
rgb_out <= ram(to_integer(addr_in));

end architecture;
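
The readline/read pair used in im_ram_initialize can be tried out in isolation. The following is a minimal sketch and not part of the original project: the entity name textio_demo and the file name demo_pixels.txt are made up for illustration. It writes three integers to a file, one per line, and then reads them back the same way the RAM initializer reads rgb.txt.

```vhdl
--Stand-alone textio sketch (hypothetical names, simulation only).
library std;
use std.textio.all;

entity textio_demo is
end entity;

architecture sim of textio_demo is
begin
    process
        file text_var : text;
        variable line_var : line;
        variable pixel : integer;
        variable total : integer := 0;
    begin
        --Write the values 10, 20 and 30, one value per line.
        file_open(text_var, "demo_pixels.txt", write_mode);
        for n in 1 to 3 loop
            write(line_var, n * 10);
            writeline(text_var, line_var);
        end loop;
        file_close(text_var);

        --Read them back: readline fetches one line, read parses one integer.
        file_open(text_var, "demo_pixels.txt", read_mode);
        while not endfile(text_var) loop
            readline(text_var, line_var);
            read(line_var, pixel);
            total := total + pixel;
        end loop;
        file_close(text_var);

        assert total = 60
            report "unexpected sum: " & integer'image(total)
            severity failure;
        wait;   --stop here.
    end process;
end architecture;
```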


2) rgb2gray.vhd:

    This is the top level entity which converts the image from RGB format to Grayscale. The RAM block is instantiated as a component inside this entity. 


--Convert a internally stored RGB image into gray image.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rgb2gray is
    generic(
        ADDR_WIDTH : integer := 16; --Address bus size of the Image Ram.
        IM_SIZE_D1 : integer := 64; --Size along Dimension 1
        IM_SIZE_D2 : integer := 64  --Size along Dimension 2
    );
    port (
        Clk : in std_logic;
        reset : in std_logic;   --active high asynchronous reset
        data_valid : out  std_logic;    --High when gray_out has valid output.
        gray_out : out unsigned(7 downto 0) --8 bit gray pixel output
    );
end rgb2gray;

architecture Behav of rgb2gray is

    component im_ram is
        generic(
            ADDR_WIDTH : integer := 16; --Address bus size of the Image Ram.
            IM_SIZE_D1 : integer := 64; --Size along Dimension 1
            IM_SIZE_D2 : integer := 64  --Size along Dimension 2
        );
        port (
            Clk : in std_logic;
            addr_in : in unsigned(ADDR_WIDTH-1 downto 0);   --Address bus to the Image Ram.
            rgb_out : out std_logic_vector(23 downto 0) --24 bit RGB pixel output
        );
    end component;

    signal rgb_out : std_logic_vector(23 downto 0);
    signal addr_in : unsigned(ADDR_WIDTH-1 downto 0);

begin

    --Instantiation of Image RAM. Internally stored image.
    image_ram : im_ram generic map(ADDR_WIDTH,  IM_SIZE_D1 ,IM_SIZE_D2)
        port map(Clk, addr_in, rgb_out);

    --Process to convert RGB to Gray image.
    CONVERTER_PROC : process(Clk,reset)
        --temporary variables
        variable temp1,temp2,temp3,temp4 : unsigned(15 downto 0);
    begin
        if(reset = '1') then    --active high asynchronous reset
            addr_in <= (others => '0');
            data_valid <= '0';
        elsif rising_edge(Clk) then
            --output is complete when the last address in the ram has been reached.
            if(to_integer(addr_in) = IM_SIZE_D1*IM_SIZE_D2-1) then
                addr_in <= (others => '0');
                data_valid <= '0';
            else    --otherwise keep incrementing the address value.
                addr_in <= addr_in + 1;
                data_valid <= '1';  --indicates output is ready
            end if;
            --Gray pixel = 0.3*Red pixel + 0.59*Green pixel + 0.11*Blue pixel
            --the 24 bit value is split into R,G and B components and multiplied
            --with their respective weights and then added together.
            temp1 := "01001100" * unsigned(rgb_out(7 downto 0));        --(0.3 * R)  
            temp2 := "10010111" * unsigned(rgb_out(15 downto 8));       --(0.59 * G) 
            temp3 := "00011100" * unsigned(rgb_out(23 downto 16));  --(0.11 * B)
            temp4 := temp1 + temp2 + temp3;
            --Most significant bit of the LSB portion is added to the MSB portion. 
            --To round off the result.
            gray_out <= temp4(15 downto 8) + ("0000000" & temp4(7));
        end if;
    end process;

end architecture;
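
The three binary constants in the converter process are the weights scaled by 256 and truncated: "01001100" is 76 (0.3*256 = 76.8), "10010111" is 151 (0.59*256 = 151.04) and "00011100" is 28 (0.11*256 = 28.16). They add up to 255 rather than 256, so the result can come out slightly lower than the Matlab value. The following is a rough sketch that wraps the same arithmetic in a function so it can be checked on a few known colours. The entity name gray_weights_check and the function name to_gray are made up for illustration.

```vhdl
--Self-checking sketch of the fixed-point gray conversion (hypothetical names).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity gray_weights_check is
end entity;

architecture sim of gray_weights_check is

    --Same arithmetic as CONVERTER_PROC: weights scaled by 256, then rounded.
    function to_gray(r, g, b : unsigned(7 downto 0)) return unsigned is
        variable acc : unsigned(15 downto 0);
        variable result : unsigned(7 downto 0);
    begin
        acc := to_unsigned(76, 8) * r + to_unsigned(151, 8) * g + to_unsigned(28, 8) * b;
        result := acc(15 downto 8);     --divide by 256
        if acc(7) = '1' then            --round using the top bit of the fraction
            result := result + 1;
        end if;
        return result;
    end function;

begin
    process
    begin
        --black stays black.
        assert to_gray(x"00", x"00", x"00") = 0 report "black failed" severity failure;
        --pure red: 76*255 = 19380 = 75*256 + 180, which rounds up to 76.
        assert to_gray(x"FF", x"00", x"00") = 76 report "red failed" severity failure;
        --white: 255*255 = 65025 = 254*256 + 1, so the result is 254 and not 255.
        assert to_gray(x"FF", x"FF", x"FF") = 254 report "white failed" severity failure;
        report "all gray checks passed";
        wait;
    end process;
end architecture;
```

The white pixel case shows where a difference of 1 against the floating point result can come from.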


3) tb.vhd:

    This is the testbench for our design. The output image received from the design is compared with the expected result, which we saved earlier from Matlab in a file called gray.txt.


--Testbench
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
--the below libraries are needed for reading text file.
library std;
use std.textio.all;

--Testbench entity is always empty.
entity tb is
end tb;

architecture sim of tb is

    --the generic parameters are declared and initialized as constants here.
    constant IM_SIZE_D1: integer := 64;
    constant IM_SIZE_D2: integer := 64; --we have a 64*64 image.
    constant ADDR_WIDTH: integer := 12; --64*64=4096 which needs 12 bits.
    
    --this array is used to store the output gray pixels.
    type image_type is array (1 to  IM_SIZE_D1, 1 to  IM_SIZE_D2) of integer;
    signal image_pixels : image_type := (others => (others => 0));

    --component declaration.
    component rgb2gray is
        generic(
            ADDR_WIDTH : integer := 16;
            IM_SIZE_D1 : integer := 64;
            IM_SIZE_D2 : integer := 64
        );
        port (
            Clk : in std_logic;
            reset : in std_logic;
            data_valid : out  std_logic;
            gray_out : out unsigned(7 downto 0)
        );
    end component;

    --temporary signal declarations.
    signal Clk,reset,data_valid : std_logic := '0';
    signal gray_out : unsigned(7 downto 0);
    constant Clk_period : time := 10 ns;    --clock period.

begin

    --generate the clock signal.
    Clk <= not Clk after Clk_period/2;

    --Instantiate the Unit under test.
    UUT : rgb2gray generic map(ADDR_WIDTH, IM_SIZE_D1, IM_SIZE_D2)
            port map(Clk, reset, data_valid, gray_out);


    --Process where we apply inputs, read outputs and verify the result.        
    STIMULUS_PROC : process
        variable i,j : integer := 1;    --loop indices.
        variable line_var : line;
        file text_var : text;
        variable pixel : integer;
        variable error : integer := 0;  --this value should be zero at the end of simulation.
        variable diff : image_type := (others => (others => 0));    
    begin
        reset <= '1';
        wait for Clk_period;
        reset <= '0';   --reset is applied for one clock cycle.
        wait until data_valid = '1';    --wait for valid data at the output port.
        while(data_valid = '1') loop
            wait until (falling_edge(Clk)); --sample outputs at the falling edge of clock
            image_pixels(i,j) <= to_integer(gray_out);  --save pixel as integer.
            --generate indices to save the pixels in the correct place.
            if(j = IM_SIZE_D2) then
                j := 1;
                if(i = IM_SIZE_D1) then
                    i := 1;
                else
                    i := i+1;
                end if;
            else
                j := j+1;
            end if;
            wait until (rising_edge(Clk));  --pause until rising edge of the clock 
        end loop;
        --all output gray pixels are read. Activate reset again.
        reset <= '1';

        --Now check if the results are the same as in Matlab
        --Open the file in read mode. gray.txt contains pixels calculated using Matlab.
        file_open(text_var,"gray.txt",read_mode);   
        while(NOT ENDFILE(text_var)) loop   --until end of file is reached
            for j in 1 to IM_SIZE_D2 loop
                for i in 1 to IM_SIZE_D1 loop
                    readline(text_var,line_var);   --read one row. Each row contains one pixel.
                    read(line_var,pixel);
                    --calculate the difference between actual gray pixels from Matlab
                    --and pixel values from our rgb2gray VHDL design.
                    diff(i,j) := abs(image_pixels(i,j) - pixel);
                    --If the difference is 2 or more then, we take it as an error
                    --and increment the variable 'error' by 1.
                    if(diff(i,j) > 1) then
                        error := error+1;
                    end if;
                    wait for 1 ns;  --pause for 1 ns. 
                end loop;
            end loop;
        end loop;
        file_close(text_var); --close the file after reading.

        wait;   --wait eternally after finishing simulation.
    end process;

end architecture;   --End of Testbench


    Note that IM_SIZE_D1 and IM_SIZE_D2 are the number of rows and columns respectively of the image matrix. You can read it as a short form for "Image size along dimension 1 (or) 2".
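
The testbench sets ADDR_WIDTH to 12 by hand, because 64*64 = 4096 locations need 12 address bits. One possible way to derive that value from the image size instead is sketched below. The function addr_bits and the entity addr_width_demo are hypothetical and not part of the downloadable code.

```vhdl
--Sketch: compute the address width from the image dimensions (hypothetical names).
entity addr_width_demo is
end entity;

architecture sim of addr_width_demo is

    --Smallest number of bits that can address 'depth' locations.
    function addr_bits(depth : positive) return natural is
        variable bits : natural := 0;
        variable span : positive := 1;
    begin
        while span < depth loop
            span := span * 2;
            bits := bits + 1;
        end loop;
        return bits;
    end function;

    constant IM_SIZE_D1 : integer := 64;
    constant IM_SIZE_D2 : integer := 64;
    constant ADDR_WIDTH : integer := addr_bits(IM_SIZE_D1 * IM_SIZE_D2);

begin
    --4096 locations need 12 address bits.
    assert ADDR_WIDTH = 12 report "unexpected address width" severity failure;
end architecture;
```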

     In the testbench we have a variable matrix called "diff". This matrix contains the absolute difference between the gray image pixels from Matlab and from VHDL. A difference of "1" is considered okay, as binary multiplications can cause some loss of least significant bits. A difference of more than "1" is counted as an error. At the end of the simulation, we didn't get any "error".
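
In the testbench the error count is only visible by inspecting the variable in the simulator. As a small sketch of how the same tolerance check could print its verdict to the console, the example below compares two short arrays of made-up sample values. The names tolerance_check_demo, DUT_PIXELS and REF_PIXELS are hypothetical. The counter is called error_count here because a variable named "error" would hide the severity level of the same name inside the process.

```vhdl
--Sketch: tolerance check with a console report (hypothetical names and data).
entity tolerance_check_demo is
end entity;

architecture sim of tolerance_check_demo is
    type int_array is array (natural range <>) of integer;
    --Made-up sample values standing in for the VHDL and Matlab pixels.
    constant DUT_PIXELS : int_array(0 to 3) := (254, 76, 150, 28);
    constant REF_PIXELS : int_array(0 to 3) := (255, 76, 150, 28);
begin
    process
        variable error_count : integer := 0;
    begin
        for n in DUT_PIXELS'range loop
            --a difference of 2 or more is counted as an error.
            if abs(DUT_PIXELS(n) - REF_PIXELS(n)) > 1 then
                error_count := error_count + 1;
                report "pixel " & integer'image(n) & " differs by more than 1"
                    severity warning;
            end if;
        end loop;
        assert error_count = 0
            report "mismatching pixels: " & integer'image(error_count)
            severity error;
        report "comparison finished";
        wait;
    end process;
end architecture;
```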

    There is no address output from the top level entity rgb2gray. In our design, I thought it was unnecessary, because the pixel outputs were coming out in a sequential manner. So the testbench "knew" where to place each output pixel.

    I hope this post was helpful for you and gave you some ideas on how to do image processing using VHDL.

    You can download the VHDL code, images etc. from here.


Saturday, November 28, 2020

Synthesizable Matrix Multiplier in VHDL

    A long time ago I posted a simple matrix multiplier which works well in simulation but couldn't be synthesized. Many people have since requested a synthesizable version of that code. So here we go.

    The design takes two matrices of 3 by 3 and outputs a matrix of 3 by 3. Each element is stored as unsigned 8 bits. This is not a generic multiplier, but if you watch the video explaining the code, you might be able to extend it to a different sized multiplier. 



    Each matrix has 9 elements, each of which is 8 bits in size. So I am passing the matrix as a 72 bit 1-Dimensional array in the design. The following table shows how the 2-D elements are mapped into the 1-D array.

Row    Column    Bit positions in 1-D array
0      0         7:0
0      1         15:8
0      2         23:16
1      0         31:24
1      1         39:32
1      2         47:40
2      0         55:48
2      1         63:56
2      2         71:64
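
The mapping in the table boils down to one rule: element (row, column) is byte number row*3 + column, counted from the least significant end. The following is a minimal sketch of that rule as a helper function. The names mat_pack_demo and get_elem are made up for illustration and do not appear in the design.

```vhdl
--Sketch: pick one 8 bit element out of the packed 72 bit matrix (hypothetical names).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mat_pack_demo is
end entity;

architecture sim of mat_pack_demo is

    --Returns element (row, col) of a 3 by 3 matrix packed as in the table.
    function get_elem(m : unsigned(71 downto 0); row, col : natural) return unsigned is
        constant n : natural := row*3 + col;    --byte number, 0 to 8
    begin
        return m(n*8 + 7 downto n*8);
    end function;

    --Elements 1 to 9, with element (0,0) in the lowest byte.
    constant A : unsigned(71 downto 0) := X"090807060504030201";

begin
    process
    begin
        assert get_elem(A, 0, 0) = 1 report "(0,0) failed" severity failure;   --bits 7:0
        assert get_elem(A, 1, 2) = 6 report "(1,2) failed" severity failure;   --bits 47:40
        assert get_elem(A, 2, 2) = 9 report "(2,2) failed" severity failure;   --bits 71:64
        report "mapping checks passed";
        wait;
    end process;
end architecture;
```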


Let me share the code now...

matrix_mult.vhd:


--3 by 3 matrix multiplier. Each element of the matrix is 8 bit wide. 
--Inputs are called A and B and output is called C. 
--Each matrix has 9 elements, each of which is 8 bits wide. So each input is 9*8=72 bits long.
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all; 

entity matrix_mult is
    port (  Clock: in std_logic; 
            reset : in std_logic;   --active high reset
            start : in std_logic;   --A '1' starts the matrix multiplication process.
            A,B : in unsigned(71 downto 0);
            C : out unsigned(71 downto 0);
            done : out std_logic    --a '1' indicates that multiplication is done and result is available at C.
            );
end entity;

architecture Behav of matrix_mult is

type matType is array(0 to 2,0 to 2) of unsigned(7 downto 0);
signal matA, matB, matC : matType := (others => (others => X"00"));
type state_type is (init,do_mult,apply_outputs);
signal state : state_type := init;
signal i,j,k : integer range 0 to 2 := 0;   --constrained range keeps the counters 2 bits wide.

begin 

sm : process (Clock,reset)    --process implementing the state machine for multiplying the matrices.
variable temp : unsigned(15 downto 0) := (others => '0');
begin
    if(reset = '1') then
        state <= init;
        i <= 0;
        j <= 0;
        k <= 0;
        done <= '0';
        matA <= (others => (others => X"00"));
        matB <= (others => (others => X"00"));
        matC <= (others => (others => X"00"));
    elsif rising_edge(Clock) then
        case state is
            when init =>    --the matrices which are in a 1-D array are converted to 2-D matrices first.
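                if(start = '1') then
                    done <= '0';    --a new multiplication is starting.
                    matC <= (others => (others => X"00"));  --clear the result of any previous run.
                    for i in 0 to 2 loop    --run through the rows
                        for j in 0 to 2 loop    --run through the columns
                            matA(i,j) <= A((i*3+j+1)*8-1 downto (i*3+j)*8);
                            matB(i,j) <= B((i*3+j+1)*8-1 downto (i*3+j)*8);
                        end loop;
                    end loop;
                    state <= do_mult;
                end if;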
            when do_mult =>
                temp := matA(i,k)*matB(k,j);
                matC(i,j) <= matC(i,j) + temp(7 downto 0);
                if(k = 2) then
                    k <= 0;
                    if(j = 2) then
                        j <= 0;
                        if (i= 2) then
                            i <= 0;
                            state <= apply_outputs;
                        else
                            i <= i + 1;
                        end if;
                    else
                        j <= j+1;
                    end if;        
                else
                    k <= k+1;
                end if;     
            when apply_outputs =>   --convert 3 by 3 matrix into a 1-D matrix.
                for i in 0 to 2 loop    --run through the rows
                    for j in 0 to 2 loop    --run through the columns
                        C((i*3+j+1)*8-1 downto (i*3+j)*8) <= matC(i,j);
                    end loop;
                end loop;   
                done <= '1';
                state <= init;  
        end case;
    end if;
end process;
 
end architecture;


tb_matrix_mult.vhd:


--Testbench for testing the 3 by 3 matrix multiplier.
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all; 

entity tb_matrix_mult is  --testbench entity is always empty. No input or output ports.
end entity;

architecture behav of tb_matrix_mult is

component matrix_mult is
    port (  Clock: in std_logic; 
            reset : in std_logic;   --active high reset
            start : in std_logic;   --A '1' starts the matrix multiplication process.
            A,B : in unsigned(71 downto 0);
            C : out unsigned(71 downto 0);
            done : out std_logic    --a '1' indicates that multiplication is done and result is available at C.
            );
end component;

signal A,B,C : unsigned(71 downto 0);
signal Clock,reset, start, done : std_logic := '0';
type matType is array(0 to 2,0 to 2) of unsigned(7 downto 0);
signal matC : matType := (others => (others => X"00")); 

begin

matrix_multiplier : matrix_mult port map (Clock, reset, start, A,B, C,done);

--generate a 50 MHz clock for testing the design.
Clk_generator : process
begin
    wait for 10 ns;
    Clock <= not Clock;
end process;

apply_inputs : process
begin
    reset <= '1';
    wait for 100 ns;
    reset <= '0';
    wait for 20 ns;
    A <= X"09" & X"08" & X"07" & X"06" & X"05" & X"04" & X"03" & X"02" & X"01";
    B <= X"01" & X"09" & X"08" & X"07" & X"06" & X"05" & X"04" & X"03" & X"02";
    start <= '1';
    wait for 20 ns;
    start <= '0';
    wait until done = '1';
    --The result C should be (93,150,126,57,96,81,21,42,36)
    wait for 5 ns;
    for i in 0 to 2 loop --run through the rows
        for j in 0 to 2 loop --run through the columns
            matC(i, j) <= C((i * 3 + j + 1) * 8 - 1 downto (i * 3 + j) * 8);
        end loop;
    end loop;
    wait;
end process;

end behav;
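
The expected result quoted in the testbench comment can also be checked automatically. The following is a rough sketch of a reference function that multiplies two packed matrices in one go and compares the outcome with that expected value. The names mat_ref_check and mat_mult_ref are hypothetical. Like the design, it keeps only the lower 8 bits of each product and sum, so it is only meaningful while the true results fit in 8 bits.

```vhdl
--Sketch: reference model for the 3 by 3 multiplier (hypothetical names).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mat_ref_check is
end entity;

architecture sim of mat_ref_check is

    function mat_mult_ref(A, B : unsigned(71 downto 0)) return unsigned is
        variable C : unsigned(71 downto 0) := (others => '0');
        variable sum : unsigned(7 downto 0);
        variable prod : unsigned(15 downto 0);
    begin
        for i in 0 to 2 loop            --rows of A
            for j in 0 to 2 loop        --columns of B
                sum := (others => '0');
                for k in 0 to 2 loop
                    prod := A((i*3+k)*8+7 downto (i*3+k)*8) * B((k*3+j)*8+7 downto (k*3+j)*8);
                    sum := sum + prod(7 downto 0);  --lower 8 bits only, as in the design
                end loop;
                C((i*3+j)*8+7 downto (i*3+j)*8) := sum;
            end loop;
        end loop;
        return C;
    end function;

    --Same input values as in the testbench above.
    constant A : unsigned(71 downto 0) := X"090807060504030201";
    constant B : unsigned(71 downto 0) := X"010908070605040302";
    --(93,150,126,57,96,81,21,42,36) written in hexadecimal.
    constant C_EXPECTED : unsigned(71 downto 0) := X"5D967E396051152A24";

begin
    process
    begin
        assert mat_mult_ref(A, B) = C_EXPECTED
            report "reference result does not match the expected matrix"
            severity failure;
        report "reference check passed";
        wait;
    end process;
end architecture;
```

In a testbench, the same function could be compared against the output C once done goes high.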


Simulation Results:

    The design was simulated successfully using Modelsim SE 10.4a. A screenshot of the simulation waveform is shown below:




    Please let me know if you are unable to get the code to work or if it's not synthesisable. Good luck with your projects. 

 

Thursday, November 2, 2017

VHDL code for a Dual Port RAM with Testbench

There are many types of RAM. These differ in terms of the number of ports, synchronous or asynchronous modes of operation etc.

Here in this post, I have written the VHDL code for a simple dual port RAM with two ports, 0 and 1. Writing is allowed only through port 0, on the positive edge of the clock. Reading is done from both ports asynchronously, which means we don't have to wait for the clock edge to read from the memory.

I have fixed the RAM size as 16*8 bits, meaning 16 elements of 8 bits each. These specifications can be changed easily by altering a few lines in the code.

VHDL code:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity dual_port_ram is
port(   clk: in std_logic; --clock
        wr_en : in std_logic;   --write enable for port 0
        data_in : in std_logic_vector(7 downto 0);  --Input data to port 0.
        addr_in_0 : in std_logic_vector(3 downto 0);    --address for port 0
        addr_in_1 : in std_logic_vector(3 downto 0);    --address for port 1
        port_en_0 : in std_logic;   --enable port 0.
        port_en_1 : in std_logic;   --enable port 1.
        data_out_0 : out std_logic_vector(7 downto 0);  --output data from port 0.
        data_out_1 : out std_logic_vector(7 downto 0)   --output data from port 1.
    );
end dual_port_ram;

architecture Behavioral of dual_port_ram is

--type and signal declaration for RAM.
type ram_type is array(0 to 15) of std_logic_vector(7 downto 0);
signal ram : ram_type := (others => (others => '0'));

begin

process(clk)
begin
    if(rising_edge(clk)) then
        --For port 0. Writing.
        if(port_en_0 = '1') then    --check enable signal
            if(wr_en = '1') then    --see if write enable is ON.
                ram(conv_integer(addr_in_0)) <= data_in;
            end if;
        end if;
    end if;
end process;

--always read when port is enabled.
data_out_0 <= ram(conv_integer(addr_in_0)) when (port_en_0 = '1') else
            (others => 'Z');
data_out_1 <= ram(conv_integer(addr_in_1)) when (port_en_1 = '1') else
            (others => 'Z');
            
end Behavioral;
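
The fixed 16*8 size can be turned into generics so that the depth and width are set at instantiation time. The following is a minimal sketch of that idea, not the code from the download: the entity name dual_port_ram_generic and the generics ADDR_WIDTH and DATA_WIDTH are assumptions, and it uses numeric_std for the address conversion instead of std_logic_unsigned.

```vhdl
--Sketch: the same dual port RAM with generic depth and width (hypothetical names).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dual_port_ram_generic is
    generic(
        ADDR_WIDTH : positive := 4;     --2**4 = 16 locations
        DATA_WIDTH : positive := 8      --8 bits per location
    );
    port(
        clk : in std_logic;
        wr_en : in std_logic;   --write enable for port 0
        data_in : in std_logic_vector(DATA_WIDTH-1 downto 0);
        addr_in_0 : in std_logic_vector(ADDR_WIDTH-1 downto 0);
        addr_in_1 : in std_logic_vector(ADDR_WIDTH-1 downto 0);
        port_en_0 : in std_logic;
        port_en_1 : in std_logic;
        data_out_0 : out std_logic_vector(DATA_WIDTH-1 downto 0);
        data_out_1 : out std_logic_vector(DATA_WIDTH-1 downto 0)
    );
end entity;

architecture Behavioral of dual_port_ram_generic is
    type ram_type is array(0 to 2**ADDR_WIDTH-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
    signal ram : ram_type := (others => (others => '0'));
begin

    --synchronous write through port 0.
    process(clk)
    begin
        if rising_edge(clk) then
            if port_en_0 = '1' and wr_en = '1' then
                ram(to_integer(unsigned(addr_in_0))) <= data_in;
            end if;
        end if;
    end process;

    --asynchronous read from both ports, high impedance when disabled.
    data_out_0 <= ram(to_integer(unsigned(addr_in_0))) when port_en_0 = '1' else (others => 'Z');
    data_out_1 <= ram(to_integer(unsigned(addr_in_1))) when port_en_1 = '1' else (others => 'Z');

end architecture;
```

With the default generics this sketch is intended to behave like the fixed-size version above.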

Testbench code:

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
 
ENTITY tb IS
END tb;
 
ARCHITECTURE behavior OF tb IS 
 
    -- Component Declaration for the Unit Under Test (UUT)
    COMPONENT dual_port_ram
    PORT(
         clk : IN  std_logic;
         wr_en : IN  std_logic;
         data_in : IN  std_logic_vector(7 downto 0);
         addr_in_0 : IN  std_logic_vector(3 downto 0);
         addr_in_1 : IN  std_logic_vector(3 downto 0);
         port_en_0 : IN  std_logic;
         port_en_1 : IN  std_logic;
         data_out_0 : OUT  std_logic_vector(7 downto 0);
         data_out_1 : OUT  std_logic_vector(7 downto 0)
        );
    END COMPONENT;
    

   --Inputs
   signal clk : std_logic := '0';
   signal wr_en : std_logic := '0';
   signal data_in : std_logic_vector(7 downto 0) := (others => '0');
   signal addr_in_0 : std_logic_vector(3 downto 0) := (others => '0');
   signal addr_in_1 : std_logic_vector(3 downto 0) := (others => '0');
   signal port_en_0 : std_logic := '0';
   signal port_en_1 : std_logic := '0';
    --Outputs
   signal data_out_0 : std_logic_vector(7 downto 0);
   signal data_out_1 : std_logic_vector(7 downto 0);
   -- Clock period definitions
   constant clk_period : time := 10 ns;
 
BEGIN
 
    -- Instantiate the Unit Under Test (UUT)
   uut: dual_port_ram PORT MAP (
          clk => clk,
          wr_en => wr_en,
          data_in => data_in,
          addr_in_0 => addr_in_0,
          addr_in_1 => addr_in_1,
          port_en_0 => port_en_0,
          port_en_1 => port_en_1,
          data_out_0 => data_out_0,
          data_out_1 => data_out_1
        );

   -- Clock process definitions
   clk_process :process
   begin
        clk <= '1';
        wait for clk_period/2;
        clk <= '0';
        wait for clk_period/2;
   end process;

   -- Stimulus process
   stim_proc: process
   begin        
        --these 4 lines show that when the port is not enabled, we cannot perform a write or read operation.
        port_en_0 <= '0';
        wr_en <= '1';
        data_in <= X"FF";
        addr_in_0 <= X"1";  
        wait for 20 ns;
        --Write all the locations of RAM
        port_en_0 <= '1';   
        for i in 1 to 16 loop
            data_in <= conv_std_logic_vector(i,8);
            addr_in_0 <= conv_std_logic_vector(i-1,4);
            wait for 10 ns;
        end loop;
        wr_en <= '0';
        port_en_0 <= '0';   
        --Read from port 1, all the locations of RAM.
        port_en_1 <= '1';   
        for i in 1 to 16 loop
            addr_in_1 <= conv_std_logic_vector(i-1,4);
            wait for 10 ns;
        end loop;
        port_en_1 <= '0';   
        --Wait eternally.
        wait;
   end process;

END;

The code was synthesised and simulated using the Xilinx ISE 14.6 tool.

Simulation waveform:


Synthesis:

The synthesis was done for a Virtex 6 device. Note that for this particular device the built-in BRAMs were not used to implement this RAM, because of the asynchronous read operations. Making the reads synchronous might allow the tool to use the available BRAMs.
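
As a rough sketch of what a synchronous-read variant could look like, the version below registers both read outputs on the clock edge. The entity name dual_port_ram_sync is made up, and this variant has not been run through ISE. Whether a synthesis tool actually maps it to a BRAM depends on the tool and its settings, and for a memory as small as 16*8 it may still choose distributed RAM. Note the behaviour changes: read data appears one clock later, and a disabled port holds its last value instead of going to 'Z'.

```vhdl
--Sketch: dual port RAM with registered (synchronous) reads (hypothetical name).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dual_port_ram_sync is
    port(
        clk : in std_logic;
        wr_en : in std_logic;   --write enable for port 0
        data_in : in std_logic_vector(7 downto 0);
        addr_in_0 : in std_logic_vector(3 downto 0);
        addr_in_1 : in std_logic_vector(3 downto 0);
        port_en_0 : in std_logic;
        port_en_1 : in std_logic;
        data_out_0 : out std_logic_vector(7 downto 0);
        data_out_1 : out std_logic_vector(7 downto 0)
    );
end entity;

architecture Behavioral of dual_port_ram_sync is
    type ram_type is array(0 to 15) of std_logic_vector(7 downto 0);
    signal ram : ram_type := (others => (others => '0'));
begin

    process(clk)
    begin
        if rising_edge(clk) then
            --port 0: write and registered read.
            if port_en_0 = '1' then
                if wr_en = '1' then
                    ram(to_integer(unsigned(addr_in_0))) <= data_in;
                end if;
                --reads the old content when writing to the same address.
                data_out_0 <= ram(to_integer(unsigned(addr_in_0)));
            end if;
            --port 1: registered read only.
            if port_en_1 = '1' then
                data_out_1 <= ram(to_integer(unsigned(addr_in_1)));
            end if;
        end if;
    end process;

end architecture;
```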