VHDL coding tips and tricks: image processing

Implementing image processing algorithms in VHDL is a scary thing for many. Though I agree that its much more difficult to do it in VHDL than in a high level programming language like C, Matlab etc, it needn't be that scary.

In this post I am going to share the code for a simple image processing algorithm - A RGB to Gray scale image converter.

There are many ways, from simple to complex, in which you can do this. I have done it in a way, which makes sense to me. Touching upon few topics related to this subject. For example reading the image data from a text file, storing it in RAM and accessing the data within the code and then manipulating them etc...

I have used the standard Matlab image Lenna.bmp for this. The original image was 512*512*3 pixels in size. This takes a long time to load and run in Modelsim. So I first reduced its size to 1/8th of its original size, making it a 64*64*3 pixel image. Each pixel ranges from 0 to 255 and the dimension "3" indicates the presence of Red, Green and Blue components.

VHDL text file operations aren't ideal for reading multiple pixels from the same row. So I converted the 3 Dimensional image data into a 1 Dimensional matrix. And then used dlmwrite Matlab command to write it to a text file named rgb.txt. This will be our input image.

In Matlab, I manually converted the above RGB image into a gray scale image using the formula:

Grayscale Image = ( (0.3 * R) + (0.59 * G) + (0.11 * B) ).

The above grayscale pixels were converted to a 1-D matrix and written to a text file called gray.txt. This text file would be read by our VHDL testbench to verify that the results from our VHDL design is the same as the ideal result obtained from Matlab.

The Matlab program which I used for achieving all this is shared below:

I=imread('Lenna.bmp'); %read the image into memory

I=imresize(I,1/8); %reduce the size by 8 times.

%convert image to 1-D array and write it to a text file.

dlmwrite('rgb.txt',reshape(I,64*64*3,1,1));

I4=double(I); %convert it to double format

%convert rgb pixels to gray manually as per formula.

for i=1:64

for j=1:64

I2(i,j) = I4(i,j,1)*0.3 + I4(i,j,2)*0.59 + I4(i,j,3)*0.11;

end

%the converted image is changed to 1-D and then written to a text file.

I3=reshape(I2,64*64,1);

dlmwrite('gray.txt',I3);

The standard RGB format Lenna image, the resized version of it and the converted Grayscale image are shared below:

Now that we have prepared the above basic data to work with, we can go and explore the VHDL designs. There are three VHDL files in this project:

1) im_ram.vhd:

The input RGB image is internally stored in a RAM. The RAM is declared and initialized with the pixel values from the text file rgb.txt. The std.textio library is used, to do the file reading operations.

--RAM entity for storing the image.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
--the below libraries are needed for reading text file.
library std;
use std.textio.all;

entity im_ram is
    generic(
        ADDR_WIDTH : integer := 16; --Address bus size of the Image Ram.
        IM_SIZE_D1 : integer := 64; --Size along Dimension 1
        IM_SIZE_D2 : integer := 64  --Size along Dimension 2
    );
    port (
        Clk : in std_logic;
        addr_in : in unsigned(ADDR_WIDTH-1 downto 0);   --Address bus to the Image Ram.
        rgb_out : out std_logic_vector(23 downto 0) --24 bit RGB pixel output
    );
end im_ram;

architecture Behav of im_ram is

--custom array declaration.
type im_ram_type is array (0 to  IM_SIZE_D1*IM_SIZE_D2-1) of std_logic_vector(23 downto 0);

--function for reading the image pixels from text file and use
--it to initialize the RAM.
impure function im_ram_initialize return im_ram_type is
    variable line_var : line;
    file text_var : text;
    variable pixel : integer;
    variable image_pixels : im_ram_type;
begin        
    --Open the file in read mode.
    file_open(text_var,"rgb.txt",read_mode);    
    while(NOT ENDFILE(text_var)) loop   --until end of file is reached
        for k in 1 to 3 loop    --through R, G and B.
            for j in 0 to IM_SIZE_D2-1 loop
                for i in 0 to IM_SIZE_D1-1 loop
                    readline(text_var,line_var);   --read one row. Each row contains one pixel.
                    read(line_var,pixel);   --From the read line, read the integer value.
                    --save the pixel in the RAM.
                    image_pixels(i*IM_SIZE_D2+j)(k*8-1 downto k*8-8) := std_logic_vector(to_unsigned(pixel,8));
                end loop;
            end loop;
        end loop;
    end loop;
    file_close(text_var); --close the file after reading.
    return image_pixels;    
end function;

--declare and initialize the image ram.
signal ram : im_ram_type := im_ram_initialize;

begin

--read the R,G and B pixels from RAM with the addr_in input.
rgb_out <= ram(to_integer(addr_in));

end architecture;

2) rgb2gray.vhd:

This is the top level entity which converts the image from RGB format to Grayscale. The RAM block is instantiated as a component inside this entity.

--Convert a internally stored RGB image into gray image.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rgb2gray is
    generic(
        ADDR_WIDTH : integer := 16; --Address bus size of the Image Ram.
        IM_SIZE_D1 : integer := 64; --Size along Dimension 1
        IM_SIZE_D2 : integer := 64  --Size along Dimension 2
    );
    port (
        Clk : in std_logic;
        reset : in std_logic;   --active high asynchronous reset
        data_valid : out  std_logic;    --High when gray_out has valid output.
        gray_out : out unsigned(7 downto 0) --8 bit gray pixel output
    );
end rgb2gray;

architecture Behav of rgb2gray is

    component im_ram is
        generic(
            ADDR_WIDTH : integer := 16; --Address bus size of the Image Ram.
            IM_SIZE_D1 : integer := 64; --Size along Dimension 1
            IM_SIZE_D2 : integer := 64  --Size along Dimension 2
        );
        port (
            Clk : in std_logic;
            addr_in : in unsigned(ADDR_WIDTH-1 downto 0);   --Address bus to the Image Ram.
            rgb_out : out std_logic_vector(23 downto 0) --24 bit RGB pixel output
        );
    end component;

    signal rgb_out : std_logic_vector(23 downto 0);
    signal addr_in : unsigned(ADDR_WIDTH-1 downto 0);

begin

    --Instantiation of Image RAM. Internally stored image.
    image_ram : im_ram generic map(ADDR_WIDTH,  IM_SIZE_D1 ,IM_SIZE_D2)
        port map(Clk, addr_in, rgb_out);

    --Process to convert RGB to Gray image.
    CONVERTER_PROC : process(Clk,reset)
        --temperary variables
        variable temp1,temp2,temp3,temp4 : unsigned(15 downto 0);
    begin
        if(reset = '1') then    --active high asynchronous reset
            addr_in <= (others => '0');
            data_valid <= '0';
        elsif rising_edge(Clk) then
            --output is ready when the last address in the ram has reached.
            if(to_integer(addr_in) = IM_SIZE_D1*IM_SIZE_D2-1) then  
                addr_in <= (others => '0');
                data_valid <= '0';
            else    --otherwise keep incrementing the address value.
                addr_in <= addr_in + 1;
                data_valid <= '1';  --indicates output is ready
            end if;
            --Gray pixel = 0.3*Red pixel + 0.59*Green pixel + 0.11*Blue pixel
            --the 24 bit value is split into R,G and B components and multiplied
            --with their respective weights and then added together.
            temp1 := "01001100" * unsigned(rgb_out(7 downto 0));        --(0.3 * R)  
            temp2 := "10010111" * unsigned(rgb_out(15 downto 8));       --(0.59 * G) 
            temp3 := "00011100" * unsigned(rgb_out(23 downto 16));  --(0.11 * B)
            temp4 := temp1 + temp2 + temp3;
            --Most significant bit of the LSB portion is added to the MSB portion. 
            --To round off the result.
            gray_out <= temp4(15 downto 8) + ("0000000" & temp4(7));
        end if;
    end process;

end architecture;

3) tb.vhd:

This is the testbench for testing our main designs. The output image is received from the main design and compared with the actual result. Previously using Matlab, we have saved the actual result in a file called gray.txt .

--Testbench
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
--the below libraries are needed for reading text file.
library std;
use std.textio.all;

--Testbench entity is always empty.
entity tb is
end tb;

architecture sim of tb is

    --the generic parameters are declared and initialized as constants here.
    constant IM_SIZE_D1: integer := 64;
    constant IM_SIZE_D2: integer := 64; --we have a 64*64 image.
    constant ADDR_WIDTH: integer := 12; --64*64=4096 which needs 12 bits.
    
    --this array is used to store the output gray pixels.
    type image_type is array (1 to  IM_SIZE_D1, 1 to  IM_SIZE_D2) of integer;
    signal image_pixels : image_type := (others => (others => 0));

    --component declaration.
    component rgb2gray is
        generic(
            ADDR_WIDTH : integer := 16;
            IM_SIZE_D1 : integer := 64;
            IM_SIZE_D2 : integer := 64
        );
        port (
            Clk : in std_logic;
            reset : in std_logic;
            data_valid : out  std_logic;
            gray_out : out unsigned(7 downto 0)
        );
    end component;

    --temperory signal declarations.
    signal Clk,reset,data_valid : std_logic := '0';
    signal gray_out : unsigned(7 downto 0);
    constant Clk_period : time := 10 ns;    --clock period.

begin

    --generate the clock signal.
    Clk <= not Clk after Clk_period/2;

    --Instantiate the Unit under test.
    UUT : rgb2gray generic map(ADDR_WIDTH, IM_SIZE_D1, IM_SIZE_D2)
            port map(Clk, reset, data_valid, gray_out);


    --Process where we apply inputs, read outputs and verify the result.        
    STIMULUS_PROC : process
        variable i,j : integer := 1;    --loop indices.
        variable line_var : line;
        file text_var : text;
        variable pixel : integer;
        variable error : integer := 0;  --this value should be zero at the end of simulation.
        variable diff : image_type := (others => (others => 0));    
    begin
        reset <= '1';
        wait for Clk_period;
        reset <= '0';   --reset is applied for one clock cycle.
        wait until data_valid = '1';    --wait for valid data at the output port.
        while(data_valid = '1') loop
            wait until (falling_edge(Clk)); --sample outputs at the falling edge of clock
            image_pixels(i,j) <= to_integer(gray_out);  --save pixel as integer.
            --generate indices to save the pixels in the correct place.
            if(j = IM_SIZE_D2) then
                j := 1;
                if(i = IM_SIZE_D1) then
                    i := 1;
                else
                    i := i+1;
                end if;
            else
                j := j+1;
            end if; 
            wait until (rising_edge(Clk));  --pause until rising edge of the clock 
        end loop;
        --all output gray pixels are read. Activate reset again.
        reset <= '1';

        --Now check if the results are the same as in Matlab
        --Open the file in read mode. gray.txt contains pixels calculated using Matlab.
        file_open(text_var,"gray.txt",read_mode);   
        while(NOT ENDFILE(text_var)) loop   --until end of file is reached
            for j in 1 to IM_SIZE_D2 loop
                for i in 1 to IM_SIZE_D1 loop
                    readline(text_var,line_var);   --read one row. Each row contains one pixel.
                    read(line_var,pixel);
                    --calculate the difference between actual gray pixels from Matlab
                    --and pixel values from our rgb2gray VHDL design.
                    diff(i,j) := abs(image_pixels(i,j) - pixel);
                    --If the difference is 2 or more then, we take it as an error
                    --and increment the variable 'error' by 1.
                    if(diff(i,j) > 1) then
                        error := error+1;
                    end if;
                    wait for 1 ns;  --pause for 1 ns. 
                end loop;
            end loop;
        end loop;
        file_close(text_var); --close the file after reading.

        wait;   --wait eternally after finishing simulation.
    end process;

end architecture;   --End of Testbench

Note that IM_SIZE_D1 and IM_SIZE_D2 are the number of rows and columns respectively of the image matrix. You can read it as a short form for "Image size along dimension 1 (or) 2".

In the testbench we have a variable matrix called "diff". This matrix contains the absolute value of difference between gray image pixels, from Matlab and VHDL. A difference of "1" is considered okay, as binary multiplications can cause some loss of least significant bits. But more than a difference of "1" is considered as an error. We can see that at the end of simulation, we didn't get any "error".

There is no address output from the top level entity rgb2gray. In our design, I thought it was unnecessary, because the pixel outputs were coming out in a sequential manner. So the testbench "knew" where to place each output pixel.

I hope this post was helpful for you and gave you some ideas on how to do image processing using VHDL.

You can Download the VHDL codes, images etc from Here.

Unlike with Matlab, where image processing is such a simple task, VHDL can give you few sleepless nights, even for simple tasks. But once you know the basic initial steps, it would become much more easier.

Image processing in VHDL is big topic and its impossible to cover all the areas in a single post. What I try to do here is explain some of the basics with an example.

An image is almost always a 2D matrix. But processing a 2D image in FPGA might not be a good idea. It might lead to excessive delays and resources. So we convert the 2D image into a linear 1 D array. This data can be stored in a RAM or ROM. To get the most efficient memory module, its recommended that, we use the Block Memory Generator module available in coregen to do this.

In this example, I am going to read the pixels of an image(of size 3*4), stored in a ROM, and store the transpose of the image(of size 4*3) in a RAM.

In brief the steps are:

Create a .coe file with the image pixels data.
Use coregen in Xilinx ISE to create a simple single port ROM of the required size and load the ROM with the data in steps 1.
Use coregen in Xilinx ISE to create a simple single port RAM of the same size as ROM.
Write the code where both these RAM and ROM are initiated as components and a process is written to get the transpose of the image stored in ROM.
To verify that the RAM contains the correct transposed image, read its contents one by one.

Lets go through the steps in detail now. I have used Xilinx ISE 13.1 for this. The device selected was xc6slx9-2csg324. These steps might be a little different for a different version of Xilinx, but remember that the underlying ideas are still the same.

If you have never used coregen, you might want to go through these examples, before proceeding.

1. Creating .coe file with image pixels:

Open notepad and paste the following text.

memory_initialization_radix=10;

memory_initialization_vector=22,12,200,126,127,128,129,255,10,0,1,98;

Save the file as "bram_data.coe".

2. Create the ROM module:

Look at the screenshots posted below. They should be self explanatory. If a page of settings is missing below, then assume that they remain at their default values.

Click generate and coregen would create the necessary files for you.

3.Create the RAM module:

Once again look at the screenshots below.

4. VHDL code:

This code initiates the RAM and ROM created above and calculates the transpose of the input image. The code also acts as a testbench and reads the data from RAM, to verify the working of the design.

Its not synthesisable because I have incorporated the functionalites of a testbench into this. But if you remove the testbench part its synthesisable.

The code is self explanatory with line by line comments.

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity image_process is
end image_process;

architecture Behavioral of image_process is

COMPONENT image1
  PORT (
clka : IN STD_LOGIC;
addra : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
douta : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)
  );
END COMPONENT;

COMPONENT image2
  PORT (
clka : IN STD_LOGIC;
wea : IN STD_LOGIC_VECTOR(0 DOWNTO 0);
addra : IN STD_LOGIC_VECTOR(3 DOWNTO 0);
dina : IN STD_LOGIC_VECTOR(7 DOWNTO 0);
douta : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)
  );
END COMPONENT;

signal done,clk : std_logic := '0';
signal wr_enable : STD_LOGIC_VECTOR(0 DOWNTO 0) := "0";
signal addr_rom,addr_ram : STD_LOGIC_VECTOR(3 DOWNTO 0) := (others => '0');
signal data_rom,data_in_ram,data_out_ram : STD_LOGIC_VECTOR(7 DOWNTO 0) := (others => '0');
signal row_index,col_index : integer := 0;

begin

--the original image of size 3*4 stored here in rom.
--[22,12,200,126,
--127,128,129,255,
--10,0,1,98]
image_rom : image1 port map(Clk,addr_rom,data_rom);
--the transpose of image1, of size 4*3, is stored here in ram.
--[22,127,10,
--12,128,0,
--200,129,1,
--126,255,98]
image_ram : image2 port map(Clk,wr_enable,addr_ram,data_in_ram,data_out_ram);

--generate the clock.
clk <= not clk after 5 ns;

--transpose the image1 into image2.
--To do this I have to store the pixel at location (a,b) into location (b,a).
process(clk)
begin
  if(falling_edge(clk)) then
  if(done = '0') then
addr_rom <= addr_rom + "0001"; --start reading each pixel from rom

  --row and column index of the image.
  if(col_index = 3) then  --check if last column has reached
col_index <= 0; --reset it to zero.
  if(row_index = 2) then --check if last row has reached.
row_index <= 0; --reset it to zero
done <= '1'; --the processing is done.
  else
row_index <= row_index + 1; --increment row index.
  end if;
  else
col_index <= col_index + 1; --increment column index.
  end if;

wr_enable <= "1"; --write enable for the RAM
data_in_ram <= data_rom; --store the current read data from rom into ram.
addr_ram <= conv_std_logic_vector((col_index*3 + row_index),4); --set the address for RAM.
  else
  --this segment reads the transposed image(data written into RAM).
wr_enable <= "0";  --after processing write enable is disabled
addr_rom <= "0000";
  if(addr_ram = "1011") then
addr_ram <= "0000";
  else
addr_ram <= addr_ram + 1;
  end if;
  end if;
  end if;
end process;

end Behavioral;

5. Simulated waveform:

The design was simulated using Xilinx ISIM. The waveform should look like the following: