VHDL coding tips and tricks: 2020

Sunday, December 20, 2020

Image Processing: RGB to Gray scale Converter in VHDL

    Implementing image processing algorithms in VHDL is a scary thing for many. Though I agree that it's much more difficult to do in VHDL than in a high level programming language like C or Matlab, it needn't be that scary.

    In this post I am going to share the code for a simple image processing algorithm: an RGB to Gray scale image converter.

    There are many ways, from simple to complex, in which you can do this. I have done it in a way which makes sense to me, touching upon a few related topics along the way: reading the image data from a text file, storing it in a RAM, accessing the data within the code and then manipulating it.

    I have used the standard Matlab image Lenna.bmp for this. The original image was 512*512*3 in size. This takes a long time to load and run in Modelsim, so I first scaled each dimension down to 1/8th of its original size, making it a 64*64*3 image. Each value ranges from 0 to 255 and the dimension "3" indicates the presence of Red, Green and Blue components.

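    VHDL text file operations aren't ideal for reading multiple pixels from the same row. So I converted the 3 dimensional image data into a 1 dimensional array and then used the Matlab command dlmwrite to write it to a text file named rgb.txt, with one value per line. Matlab flattens the data in column-major order: the row index changes fastest, then the column index, then the colour plane. This will be our input image.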

    In Matlab, I manually converted the above RGB image into a gray scale image using the formula:

 Grayscale Image = ( (0.3 * R) + (0.59 * G) + (0.11 * B) ).

    The above grayscale pixels were converted to a 1-D matrix and written to a text file called gray.txt. This text file is read by our VHDL testbench to verify that the results from our VHDL design are the same as the reference results obtained from Matlab.

The Matlab program which I used for achieving all this is shared below:


I=imread('Lenna.bmp');  %read the image into memory
I=imresize(I,1/8);  %reduce the size by 8 times.
%convert image to 1-D array and write it to a text file.
dlmwrite('rgb.txt',reshape(I,64*64*3,1,1));
I4=double(I);   %convert it to double format
%convert rgb pixels to gray manually as per formula.
for i=1:64
    for j=1:64
        I2(i,j) = I4(i,j,1)*0.3 + I4(i,j,2)*0.59 + I4(i,j,3)*0.11;
    end
end
%the converted image is changed to 1-D and then written to a text file.
I3=reshape(I2,64*64,1);
dlmwrite('gray.txt',I3);


    The standard RGB format Lenna image, the resized version of it and the converted Grayscale image are shared below:




    Now that we have prepared the above basic data to work with, we can go and explore the VHDL designs. There are three VHDL files in this project:

1) im_ram.vhd:

    The input RGB image is internally stored in a RAM. The RAM is declared and initialized with the pixel values from the text file rgb.txt. The std.textio package is used for the file reading operations.


--RAM entity for storing the image.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
--the below libraries are needed for reading text file.
library std;
use std.textio.all;

entity im_ram is
    generic(
        ADDR_WIDTH : integer := 16;  --Address bus size of the Image Ram.
        IM_SIZE_D1 : integer := 64;  --Size along Dimension 1
        IM_SIZE_D2 : integer := 64  --Size along Dimension 2
    );
    port (
        Clk : in std_logic;
        addr_in : in unsigned(ADDR_WIDTH-1 downto 0);   --Address bus to the Image Ram.
        rgb_out : out std_logic_vector(23 downto 0)  --24 bit RGB pixel output
    );
end im_ram;

architecture Behav of im_ram is

--custom array declaration.
type im_ram_type is array (0 to IM_SIZE_D1*IM_SIZE_D2-1) of std_logic_vector(23 downto 0);

--function for reading the image pixels from text file and use
--it to initialize the RAM.
impure function im_ram_initialize return im_ram_type is
    variable line_var : line;
    file text_var : text;
    variable pixel : integer;
    variable image_pixels : im_ram_type;
begin        
    --Open the file in read mode.
    file_open(text_var,"rgb.txt",read_mode);    
    while(NOT ENDFILE(text_var)) loop   --until end of file is reached
        for k in 1 to 3 loop    --through R, G and B.
            for j in 0 to IM_SIZE_D2-1 loop
                for i in 0 to IM_SIZE_D1-1 loop
                    readline(text_var,line_var);   --read one row. Each row contains one pixel.
                    read(line_var,pixel);   --From the read line, read the integer value.
                    --save the pixel in the RAM.
                    image_pixels(i*IM_SIZE_D2+j)(k*8-1 downto k*8-8) := std_logic_vector(to_unsigned(pixel,8));
                end loop;
            end loop;
        end loop;
    end loop;
    file_close(text_var); --close the file after reading.
    return image_pixels;    
end function;

--declare and initialize the image ram.
signal ram : im_ram_type := im_ram_initialize;

begin

--read the R,G and B pixels from RAM with the addr_in input.
rgb_out <= ram(to_integer(addr_in));

end architecture;


2) rgb2gray.vhd:

    This is the top level entity which converts the image from RGB format to Grayscale. The RAM block is instantiated as a component inside this entity. 


--Convert an internally stored RGB image into a gray image.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rgb2gray is
    generic(
        ADDR_WIDTH : integer := 16;  --Address bus size of the Image Ram.
        IM_SIZE_D1 : integer := 64;  --Size along Dimension 1
        IM_SIZE_D2 : integer := 64  --Size along Dimension 2
    );
    port (
        Clk : in std_logic;
        reset : in std_logic;   --active high asynchronous reset
        data_valid : out  std_logic;    --High when gray_out has valid output.
        gray_out : out unsigned(7 downto 0)  --8 bit gray pixel output
    );
end rgb2gray;

architecture Behav of rgb2gray is

    component im_ram is
        generic(
            ADDR_WIDTH : integer := 16;  --Address bus size of the Image Ram.
            IM_SIZE_D1 : integer := 64;  --Size along Dimension 1
            IM_SIZE_D2 : integer := 64  --Size along Dimension 2
        );
        port (
            Clk : in std_logic;
            addr_in : in unsigned(ADDR_WIDTH-1 downto 0);   --Address bus to the Image Ram.
            rgb_out : out std_logic_vector(23 downto 0)  --24 bit RGB pixel output
        );
    end component;

    signal rgb_out : std_logic_vector(23 downto 0);
    signal addr_in : unsigned(ADDR_WIDTH-1 downto 0);

begin

    --Instantiation of Image RAM. Internally stored image.
    image_ram : im_ram generic map(ADDR_WIDTH,  IM_SIZE_D1 ,IM_SIZE_D2)
        port map(Clk, addr_in, rgb_out);

    --Process to convert RGB to Gray image.
    CONVERTER_PROC : process(Clk,reset)
        --temporary variables
        variable temp1,temp2,temp3,temp4 : unsigned(15 downto 0);
    begin
        if(reset = '1') then    --active high asynchronous reset
            addr_in <= (others => '0');
            data_valid <= '0';
        elsif rising_edge(Clk) then
            --when the last address in the ram is reached, wrap around and drop data_valid.
            if(to_integer(addr_in) = IM_SIZE_D1*IM_SIZE_D2-1) then
                addr_in <= (others => '0');
                data_valid <= '0';
            else    --otherwise keep incrementing the address value.
                addr_in <= addr_in + 1;
                data_valid <= '1';  --indicates output is ready
            end if;
            --Gray pixel = 0.3*Red pixel + 0.59*Green pixel + 0.11*Blue pixel
            --the 24 bit value is split into R,G and B components and multiplied
            --with their respective weights and then added together.
            temp1 := "01001100" * unsigned(rgb_out(7 downto 0));        --(0.3 * R)  
            temp2 := "10010111" * unsigned(rgb_out(15 downto 8));       --(0.59 * G) 
            temp3 := "00011100" * unsigned(rgb_out(23 downto 16));  --(0.11 * B)
            temp4 := temp1 + temp2 + temp3;
            --Most significant bit of the LSB portion is added to the MSB portion. 
            --To round off the result.
            gray_out <= temp4(15 downto 8) + ("0000000" & temp4(7));
        end if;
    end process;

end architecture;


3) tb.vhd:

    This is the testbench for testing our main designs. The output image is received from the main design and compared with the actual result. Previously using Matlab, we have saved the actual result in a file called gray.txt .


--Testbench
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
--the below libraries are needed for reading text file.
library std;
use std.textio.all;

--Testbench entity is always empty.
entity tb is
end tb;

architecture sim of tb is

    --the generic parameters are declared and initialized as constants here.
    constant IM_SIZE_D1: integer := 64;
    constant IM_SIZE_D2: integer := 64;  --we have a 64*64 image.
    constant ADDR_WIDTH: integer := 12;  --64*64=4096 which needs 12 bits.
    
    --this array is used to store the output gray pixels.
    type image_type is array (1 to  IM_SIZE_D1, 1 to  IM_SIZE_D2) of integer;
    signal image_pixels : image_type := (others => (others => 0));

    --component declaration.
    component rgb2gray is
        generic(
            ADDR_WIDTH : integer := 16;
            IM_SIZE_D1 : integer := 64;
            IM_SIZE_D2 : integer := 64
        );
        port (
            Clk : in std_logic;
            reset : in std_logic;
            data_valid : out  std_logic;
            gray_out : out unsigned(7 downto 0)
        );
    end component;

    --temporary signal declarations.
    signal Clk,reset,data_valid : std_logic := '0';
    signal gray_out : unsigned(7 downto 0);
    constant Clk_period : time := 10 ns;    --clock period.

begin

    --generate the clock signal.
    Clk <= not Clk after Clk_period/2;

    --Instantiate the Unit under test.
    UUT : rgb2gray generic map(ADDR_WIDTH, IM_SIZE_D1, IM_SIZE_D2)
            port map(Clk, reset, data_valid, gray_out);


    --Process where we apply inputs, read outputs and verify the result.        
    STIMULUS_PROC : process
        variable i,j : integer := 1;    --loop indices.
        variable line_var : line;
        file text_var : text;
        variable pixel : integer;
        variable error : integer := 0;  --this value should be zero at the end of simulation.
        variable diff : image_type := (others => (others => 0));    
    begin
        reset <= '1';
        wait for Clk_period;
        reset <= '0';   --reset is applied for one clock cycle.
        wait until data_valid = '1';    --wait for valid data at the output port.
        while(data_valid = '1') loop
            wait until (falling_edge(Clk)); --sample outputs at the falling edge of clock
            image_pixels(i,j) <= to_integer(gray_out);  --save pixel as integer.
            --generate indices to save the pixels in the correct place.
            if(j = IM_SIZE_D2) then
                j := 1;
                if(i = IM_SIZE_D1) then
                    i := 1;
                else
                    i := i+1;
                end if;
            else
                j := j+1;
            end if;
            wait until (rising_edge(Clk));  --pause until rising edge of the clock 
        end loop;
        --all output gray pixels are read. Activate reset again.
        reset <= '1';

        --Now check if the results are the same as in Matlab
        --Open the file in read mode. gray.txt contains pixels calculated using Matlab.
        file_open(text_var,"gray.txt",read_mode);   
        while(NOT ENDFILE(text_var)) loop   --until end of file is reached
            for j in 1 to IM_SIZE_D2 loop
                for i in 1 to IM_SIZE_D1 loop
                    readline(text_var,line_var);   --read one row. Each row contains one pixel.
                    read(line_var,pixel);
                    --calculate the difference between actual gray pixels from Matlab
                    --and pixel values from our rgb2gray VHDL design.
                    diff(i,j) := abs(image_pixels(i,j) - pixel);
                    --If the difference is 2 or more then, we take it as an error
                    --and increment the variable 'error' by 1.
                    if(diff(i,j) > 1) then
                        error := error+1;
                    end if;
                    wait for 1 ns;  --pause for 1 ns. 
                end loop;
            end loop;
        end loop;
        file_close(text_var); --close the file after reading.

        wait;   --wait eternally after finishing simulation.
    end process;

end architecture;   --End of Testbench


    Note that IM_SIZE_D1 and IM_SIZE_D2 are the number of rows and columns respectively of the image matrix. You can read it as a short form for "Image size along dimension 1 (or) 2".

     In the testbench we have a variable matrix called "diff". This matrix contains the absolute value  of difference between gray image pixels, from Matlab and VHDL. A difference of "1" is considered okay, as binary multiplications can cause some loss of least significant bits. But more than a difference of "1" is considered as an error. We can see that at the end of simulation, we didn't get any "error". 
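
    To see where a difference of 1 can come from, it helps to write out the fixed-point arithmetic used in rgb2gray.vhd. The three 8-bit constants in the code are 76, 151 and 28, which are the weights scaled by 256 and truncated. The worked example below uses a pure white pixel (R = G = B = 255); that pixel is my own choice for illustration and is not taken from the simulation results. The block assumes the amsmath package.

```latex
\begin{align*}
\lfloor 0.30 \times 256 \rfloor &= \lfloor 76.8 \rfloor = 76 \\
\lfloor 0.59 \times 256 \rfloor &= \lfloor 151.04 \rfloor = 151 \\
\lfloor 0.11 \times 256 \rfloor &= \lfloor 28.16 \rfloor = 28 \\
76 + 151 + 28 &= 255 \quad (\text{one short of } 256) \\
\text{temp4} &= 76 \cdot 255 + 151 \cdot 255 + 28 \cdot 255 = 65025 = \mathrm{FE01}_{16} \\
\text{gray\_out} &= \mathrm{FE}_{16} + \text{bit 7 of } \mathrm{01}_{16} = 254 + 0 = 254
\end{align*}
```

    For the same pixel the Matlab formula gives 0.3*255 + 0.59*255 + 0.11*255 = 255, so the two results differ by exactly 1, which is within the tolerance used by the testbench.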

    There is no address output from the top level entity rgb2gray. In our design, I thought it was unnecessary, because the pixel outputs were coming out in a sequential manner. So the testbench "knew" where to place each output pixel.

    I hope this post was helpful for you and gave you some ideas on how to do image processing using VHDL.

    You can Download the VHDL codes, images etc from Here.


Tuesday, December 8, 2020

Synthesizable Clocked Square Root Calculator In VHDL

    A long time ago I shared a VHDL function for finding the square root of a number. That function was synthesisable too, but as it was a function, it was purely combinatorial. If you wanted to find the square root of a relatively large number, the resource usage was very high.

    In such cases, it makes sense to use a clocked design. A clocked design lets us reuse one set of resources over and over. The advantage is that it uses far fewer resources; the disadvantage is the lower speed.

    For example, in the design I have shared in this post, to find the square root of a N-bit number, you need to wait N/2 clock cycles. 

    The code is written based on Figure (8) from this paper: A New Non-Restoring Square Root Algorithm and Its VLSI Implementations.

    The codes are well commented, so I won't write much about how it works here. Please refer to the block diagram from the paper in case you have some doubts.
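
    For readers who want to see the iteration without the clocking, here is a minimal simulation-only sketch that runs the same add/subtract step N/2 times inside a function and checks the result for every input. The names isqrt_sketch_tb and isqrt_model, and the choice of N = 8, are my own assumptions for illustration and are not part of the downloadable project.

```vhdl
--Hypothetical sketch: the per-clock step of square_root.vhd written as a plain loop.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity isqrt_sketch_tb is
end isqrt_sketch_tb;

architecture sim of isqrt_sketch_tb is

    constant N : integer := 8;  --input width, assumed even and at least 4.

    --Runs all N/2 iterations in one call instead of one per clock cycle.
    function isqrt_model(value : unsigned(N-1 downto 0)) return unsigned is
        variable a : unsigned(N-1 downto 0);
        variable left,right,r : unsigned(N/2+1 downto 0) := (others => '0');
        variable q : unsigned(N/2-1 downto 0) := (others => '0');
    begin
        a := value;
        for step in 1 to N/2 loop
            right := q & r(N/2+1) & '1';
            left := r(N/2-1 downto 0) & a(N-1 downto N-2);
            a := a(N-3 downto 0) & "00";  --shifting left by 2 bit.
            if(r(N/2+1) = '1') then   --add or subtract as per this bit.
                r := left + right;
            else
                r := left - right;
            end if;
            q := q(N/2-2 downto 0) & (not r(N/2+1));
        end loop;
        return q;
    end function;

begin

    --Check that root*root <= i < (root+1)*(root+1) for every N bit input.
    CHECK_PROC : process
        variable root : integer;
    begin
        for i in 0 to 2**N-1 loop
            root := to_integer(isqrt_model(to_unsigned(i,N)));
            assert (root*root <= i) and ((root+1)*(root+1) > i)
                report "isqrt_model failed for input " & integer'image(i)
                severity error;
        end loop;
        report "isqrt_model check finished";
        wait;
    end process;

end architecture;
```

    Because the check uses only integer arithmetic, this sketch does not need ieee.math_real. The clocked design below performs exactly one pass of the loop body per clock cycle, which is why it needs N/2 cycles.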

Let me share the codes now:

square_root.vhd:


--Synthesisable Design for Finding Square root of a number.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity square_root is
    generic(N : integer := 32);
    port (
        Clk : in std_logic;     --Clock
        rst : in std_logic;     --Asynchronous active high reset.
        input : in unsigned(N-1 downto 0);  --this is the number for which we want to find square root.
        done : out std_logic;   --This signal goes high when output is ready
        sq_root : out unsigned(N/2-1 downto 0)  --square root of 'input'
    );
end square_root;

architecture Behav of square_root is

begin

    SQROOT_PROC : process(Clk,rst)
        variable a : unsigned(N-1 downto 0);  --original input.
        variable left,right,r : unsigned(N/2+1 downto 0):=(others => '0');  --input to adder/sub.r-remainder.
        variable q : unsigned(N/2-1 downto 0) := (others => '0');  --result.
        variable i : integer := 0;  --index of the loop. 
    begin
        if(rst = '1') then  --reset the variables.
            done <= '0';
            sq_root <= (others => '0');
            i := 0;
            a := (others => '0');
            left := (others => '0');
            right := (others => '0');
            r := (others => '0');
            q := (others => '0');
        elsif(rising_edge(Clk)) then
            --Before we start the first clock cycle get the 'input' to the variable 'a'.
            if(i = 0) then
                a := input;
                done <= '0';    --reset 'done' signal.
                i := i+1;   --increment the loop index.
            elsif(i < N/2) then --keep incrementing the loop index.
                i := i+1;  
            end if;
            --These statements below are derived from the block diagram.
            right := q & r(N/2+1) & '1';
            left := r(N/2-1 downto 0) & a(N-1 downto N-2);
            a := a(N-3 downto 0) & "00";  --shifting left by 2 bit.
            if ( r(N/2+1) = '1') then   --add or subtract as per this bit.
                r := left + right;
            else
                r := left - right;
            end if;
            q := q(N/2-2 downto 0) & (not r(N/2+1));
            if(i = N/2) then    --This means the max value of loop index has been reached.
                done <= '1';    --make 'done' high because output is ready.
                i := 0;  --reset loop index for beginning the next cycle.
                sq_root <= q;   --assign 'q' to the output port.
                --reset other signals for using in the next cycle.
                left := (others => '0');
                right := (others => '0');
                r := (others => '0');
                q := (others => '0');
            end if;
        end if;    
    end process;

end architecture;


Testbench: tb.vhd


--Testbench for our square root calculator design.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;

--empty entity as its a testbench
entity tb is
end tb;

architecture sim of tb is

    --Declare the component which we want to test.
    component square_root is
        generic(N : integer := 32);
        port (
            Clk : in std_logic;
            rst : in std_logic;
            input : in unsigned(N-1 downto 0);
            done : out std_logic;
            sq_root : out unsigned(N/2-1 downto 0)
        );
    end component;

    constant clk_period : time := 10 ns;    --set the clock period for simulation.
    constant N : integer := 16;    --width of the input.
    signal Clk,rst,done : std_logic := '0';
    signal input : unsigned(N-1 downto 0) := (others => '0');
    signal sq_root : unsigned(N/2-1 downto 0) := (others => '0');
    signal error : integer := 0;    --this indicates the number of errors encountered during simulation.
    

begin

    Clk <= not Clk after clk_period / 2;    --generate clock by toggling 'Clk'.

    --entity instantiation.
    DUT : entity work.square_root generic map(N => N)
             port map(Clk,rst,input,done,sq_root);

    --Apply the inputs to the design and check if the results are correct. 
    --The number of inputs for which the results were wrongly calculated are counted by 'error'.     
    SEQUENCER_PROC : process
        variable actual_result,i : integer := 0;
    begin
        --First we apply reset input for one clock period.
        rst <= '1';
        wait for clk_period;
        rst <= '0';
        --Test the design for all the combination of inputs.
        --Since we have 2^16 inputs (0 to (2^16)-1), we test all of them one by one.
        while(i <= 2**N-1) loop
            input <= to_unsigned(i,N);  --convert 'i' from integer to unsigned format.
            wait until done='1';    --wait until the 'done' output signal goes high.
            wait until falling_edge(Clk);   --we sample the output at the falling edge of the clock.
            actual_result := integer(floor(sqrt(real(i)))); --Calculate the actual result.
            --if actual result and calculated result are different increment 'error' by 1.
            if (actual_result /= to_integer(sq_root)) then  
                error <= error + 1;
            end if;
            i := i+1;   --increment the loop index.
        end loop;
        rst <= '1';   --all inputs are tested. Apply reset
        input <= (others => '0');   --reset the 'input'
        wait;
    end process;

end architecture;


Simulation Waveform from ModelSim:



        To reach the end of the testbench, you need to simulate for only about 5.5 ms of simulation time: each of the 2^16 inputs takes N/2 = 8 clock cycles of 10 ns, which adds up to roughly 5.24 ms.


Monday, December 7, 2020

Synthesizable Polynomial Equation Calculator in VHDL

     A polynomial equation is an equation which is formed with variables, coefficients and exponents. They can be written in the following format:

                        y = a_n x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0

Here a_n, a_{n-1}, ..., a_1, a_0 are the coefficients,

n, n-1, ... etc are exponents and x is the variable. 

In this post, I want to share VHDL code for computing the value of 'y' if you know the value of 'x' and the coefficients forming the polynomial.

Directly calculating the polynomial with the above equation is very inefficient. So, first we reformat the equation as per Horner's rule:

                        y = (...((a_n x + a_{n-1}) x + a_{n-2}) x + ... + a_1) x + a_0

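The design in this post is represented by the following block diagram. Note that it does not nest the multiplications exactly as in the Horner form: it generates each power of x from the previous one, multiplies it by its coefficient and accumulates the products.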





There are 3 sub-entities in the above block diagram.

1) Block number 1:

    This block keeps generating successive powers of the input x.

In the first clock cycle it generates x^0 = 1.

In the 2nd clock cycle it generates x^1.

In the 3rd clock cycle it generates x^2.

In the nth clock cycle it generates x^(n-1).

    Every clock cycle, the previous outputs are multiplied with 'x' to get the next power of x.

2) Block number 2:

    The powers of x generated in the first block are multiplied with the corresponding coefficients of the polynomial in Block numbered 2. The coefficients are declared and initialized in the top level entity and can be changed easily.

3) Block number 3:

    This block is basically an accumulative adder which accumulates all the products generated by the multiplier. Once it does 'n' additions, we have the final result available in output 'y'. An output 'done' is set as high to indicate that the output is ready.
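
    For comparison with the three blocks described above, the nested form of Horner's rule can be sketched as a plain VHDL function and checked in simulation. This is only an illustration under my own assumptions: the names horner_sketch_tb, horner, int_array and COEFFS are hypothetical, and the expected values 149 and -2308 were worked out by hand for the polynomial 3*x^3 - 2*x^2 - 4*x + 5 at x = 4 and x = -9.

```vhdl
--Hypothetical sketch: Horner's rule as a function, for simulation only.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity horner_sketch_tb is
end horner_sketch_tb;

architecture sim of horner_sketch_tb is

    constant N : integer := 32;
    type int_array is array (natural range <>) of integer;
    --Index k holds the coefficient of x^k. Eq : 3*x^3 - 2*x^2 - 4*x + 5;
    constant COEFFS : int_array(0 to 3) := (5, -4, -2, 3);

    function horner(x : signed(N-1 downto 0); c : int_array) return signed is
        variable acc : signed(N-1 downto 0);
        variable prod : signed(2*N-1 downto 0);
    begin
        acc := to_signed(c(c'high),N);  --start with the highest coefficient.
        for k in c'high-1 downto c'low loop
            prod := acc*x;  --one multiplication per remaining coefficient.
            --The MSB half is ignored. We assume all intermediate values fit in N bits.
            acc := prod(N-1 downto 0) + to_signed(c(k),N);
        end loop;
        return acc;
    end function;

begin

    CHECK_PROC : process
    begin
        --((3*4 - 2)*4 - 4)*4 + 5 = 149
        assert to_integer(horner(to_signed(4,N),COEFFS)) = 149
            report "horner failed for x = 4" severity error;
        --((3*(-9) - 2)*(-9) - 4)*(-9) + 5 = -2308
        assert to_integer(horner(to_signed(-9,N),COEFFS)) = -2308
            report "horner failed for x = -9" severity error;
        report "horner check finished";
        wait;
    end process;

end architecture;
```

    The function needs one multiplication and one addition per coefficient after the first. The same truncation assumption as in the rest of this post applies: every intermediate value must fit in N bits.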


Let me share the VHDL codes for these entities along with the testbench.

1) power_module.vhd


--This module generates successive powers of input x in each clock cycle.
--For example 1,x,x^2,x^3,x^4 etc. 
--The output from the previous cycle is multiplied again by x to get the next power.
--The output from this entity is passed to the multiplier module, where it gets
--multiplied by the corresponding coefficient.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity power_module is
    generic(N : integer := 32);
    port (
        Clock : in std_logic;
        reset : in std_logic;
        x : in signed(N-1 downto 0);
        x_powered : out signed(N-1 downto 0)
    );
end power_module;

architecture Behav of power_module is

    signal x_pow : signed(N-1 downto 0) := to_signed(1,N);

begin

    POWER_PROC : process(Clock,reset)
        variable temp_prod : signed(2*N-1 downto 0);
    begin
        if (reset='1') then
            x_pow <= to_signed(1,N);
        elsif(rising_edge(Clock)) then
            temp_prod := x*x_pow;
            --The MSB half of the result is ignored. We assume that all intermediate powers of x
            --can be represented by a max of N bits.
            x_pow <= temp_prod(N-1 downto 0);
        end if;
    end process;

    x_powered <= x_pow;

end architecture;


2) multiplier.vhd


--The successive powers of x are multiplied by the corresponding coefficients here. 
--The results are sent to the adder module which acts as an "accumulator".
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity multiplier is
    generic(N : integer := 32);
    port (
        Clock : in std_logic;
        x,y : in signed(N-1 downto 0);
        prod : out signed(N-1 downto 0)
    );
end multiplier;

architecture Behav of multiplier is

begin

    MULT_PROC : process(Clock)
        variable temp_prod : signed(2*N-1 downto 0) := (others => '0');
    begin
        if(rising_edge(Clock)) then
            temp_prod := x*y;
            --The MSB half of the result is ignored. We assume that all intermediate numbers
            --can be represented by a max of N bits.
            prod <= temp_prod(N-1 downto 0);
        end if;    
    end process;
    

end architecture;


3) adder.vhd


--The output from multiplier module is received by this module and accumulated to
--form the output. If the polynomial equation has a degree of 'n' then 'n' additions have to take place
--to get the final result.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder is
    generic(N : integer := 32);
    port (
        Clock : in std_logic;
        reset : in std_logic;
        x : in signed(N-1 downto 0);
        sum : out signed(N-1 downto 0)
    );
end adder;

architecture Behav of adder is

    signal temp_sum : signed(N-1 downto 0) := (others => '0');

begin

    ADDER_PROC : process(Clock,reset)
    begin
        if (reset = '1') then
            temp_sum <= (others => '0');
        elsif(rising_edge(Clock)) then
            temp_sum <= temp_sum + x;
        end if;
    end process;

    sum <= temp_sum;

end architecture;


4) Top Level Entity : poly_eq_calc.vhd


--This is the top level entity for the polynomial equation calculator.
--The power_module, multiplier and adders entities are instantiated inside this top level block.
--When the output signal done is High, the output available at the 'y' output is the result we want.
--Ignore all other values of 'y' when done is Low.
--We have assumed that all the intermediate numbers calculated to reach the final result, can be represented
--by a maximum of N bits. This simplifies the design very much.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity poly_eq_calc is
    generic(N : integer := 32);
    port (
        Clock : in std_logic;
        reset : in std_logic;
        x : in signed(N-1 downto 0);
        y : out signed(N-1 downto 0);
        done : out std_logic
    );
end poly_eq_calc;

architecture Behav of poly_eq_calc is

    component power_module is
        generic(N : integer := 32);
        port (
            Clock : in std_logic;
            reset : in std_logic;
            x : in signed(N-1 downto 0);
            x_powered : out signed(N-1 downto 0)
        );
    end component;

    component multiplier is
        generic(N : integer := 32);
        port (
            Clock : in std_logic;
            x,y : in signed(N-1 downto 0);
            prod : out signed(N-1 downto 0)
        );
    end component;

    component adder is
        generic(N : integer := 32);
        port (
            Clock : in std_logic;
            reset : in std_logic;
            x : in signed(N-1 downto 0);
            sum : out signed(N-1 downto 0)
        );
    end component;

    signal x_powered,coeff,product_from_mult : signed(N-1 downto 0) := (others => '0');
    constant NUM_COEFFS : integer := 4;  --Change here to change the degree of the polynomial. 
    type arr_type is array (0 to NUM_COEFFS-1) of signed(N-1 downto 0);  
     --Eq : 3*x^3 - 2*x^2 - 4*x + 5;
    --The coefficients belonging to higher powers of x are stored in higher addresses in Coeffs array.
    signal Coeffs : arr_type := (to_signed(5,N),  --change coefficients here. 
                                to_signed(-4,N),
                                to_signed(-2,N),
                                to_signed(3,N));
 
    signal reset_adder :  std_logic := '0';

begin

    --Instantiate the 3 sub-entities.    
    calc_power_of_x : power_module generic map(N => N)
        port map(Clock,reset,x,x_powered);

    multiply : multiplier  generic map(N => N)
        port map(Clock,x_powered,coeff,product_from_mult);

    add : adder  generic map(N => N)
        port map(Clock,reset_adder,product_from_mult,y);

    --The process controlling the 3 sub-entities and also supplying them with coefficients.    
    MAIN_PROC : process(Clock,reset)
        variable coeff_index : integer := 0;  
    begin
        if(reset = '1') then
            done <= '0';
            coeff_index := 0;
            coeff <= Coeffs(0);
            reset_adder <= '1';
        elsif(rising_edge(Clock)) then
            reset_adder <= '0';  --the disabling of 'reset of adder' gets noticed by adder entity in the next clock cycle.
            if(coeff_index < NUM_COEFFS-1) then
                coeff_index := coeff_index+1;
                coeff <= Coeffs(coeff_index);   --send the coefficients one by one to the multiplier entity. 
            elsif(coeff_index = NUM_COEFFS-1) then
                coeff_index := coeff_index+1;
            else
                done <= '1';    --The final result is available in 'y' now.
            end if;    
        end if;
    end process;

end architecture;


5) Testbench : tb_poly_eq_calc.vhd


--Testbench code.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

--The testbench entity is always empty.
entity tb_poly_eq_calc is
end tb_poly_eq_calc;

architecture sim of tb_poly_eq_calc is

    constant clk_period : time := 10 ns;  --set the clock period here.

    signal Clk : std_logic := '1';
    signal reset : std_logic := '1';
    constant N : integer := 32;  --change the width of input and output here.
    signal x,y : signed(N-1 downto 0) := (others => '0');
    signal done : std_logic := '0';

begin

    Clk <= not Clk after clk_period / 2;    --generate clock

    --Instantiating Design Under Test.
    DUT : entity work.poly_eq_calc generic map(N => N)
        port map (Clk, reset, x, y, done);

    --Apply inputs here. Only 'x' can be changed here. 
    --To change coefficients and degree of polynomial edit the top level entity directly.
    SEQUENCER_PROC : process
    begin
        --first input.
        reset <= '1';
        wait for clk_period * 2.5;
        reset <= '0';
        x <= to_signed(4, N);
        wait for clk_period;
        wait until done='1';    --wait for result to be out.
        wait for clk_period;

        --second input.
        reset <= '1';
        wait for clk_period;
        reset <= '0';
        x <= to_signed(7, N);
        wait for clk_period;
        wait until done='1';   --wait for result to be out.
        wait for clk_period;

        --third input.
        reset <= '1';
        wait for clk_period;
        reset <= '0';
        x <= to_signed(15, N);
        wait for clk_period;
        wait until done='1';   --wait for result to be out.
        wait for clk_period;

        --fourth input.
        reset <= '1';
        wait for clk_period;
        reset <= '0';
        x <= to_signed(-9, N);
        wait for clk_period;
        wait until done='1';   --wait for result to be out.
        wait for clk_period;
        reset <= '1';

        wait;
    end process;

    --Check if the results are correct and report the result.
    CHECK_RESULTS_PROC : process(Clk)
        variable actual_res : integer := 0;
        variable input_num,x_int : integer := 0;
    begin
        if(rising_edge(Clk)) then
            if(done = '1') then
                x_int := to_integer(x);
                --Change this equation if you change the polynomial eq inside the top level entity.
                actual_res := 3*x_int*x_int*x_int - 2*x_int*x_int - 4*x_int + 5;  
                input_num := input_num+1;
                if(actual_res = to_integer(y)) then
                    report "Input number "  & integer'image(input_num) & " Worked Well";
                else
                    report "Input number "  & integer'image(input_num) & " Has Error";
                end if;   
            end if;                   
        end if;
    end process;

end architecture;


The code was simulated successfully using Modelsim 10.4a version. The simulation waveform below shows the signals for the first two sets of inputs:



The design was synthesized successfully using Vivado 2020.2 version.