Long back I had shared VHDL function for finding the square root of a number. This function too was synthesisable, but as it was a function, it was purely combinatorial. If you want to find the square root of a relatively larger number, then the resource usage was very high.
In such cases, it makes sense to use a clocked design. Such a clocked design enables us to reuse one set of resources over and over. The advantage of such a design is that it uses far less resources while the disadvantage being the low speed.
For example, in the design I have shared in this post, to find the square root of a N-bit number, you need to wait N/2 clock cycles.
The code is written based on Figure (8) from this paper: A New Non-Restoring Square Root Algorithm and Its VLSI Implementations.
The codes are well commented, so I wont write much about how it works here. Please refer to the block diagram from the paper in case you have some doubts.
Let me share the codes now:
square_root.vhd:
--Synthesisable Design for Finding Square root of a number.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity square_root is
generic(N : integer := 32);
port (
Clk : in std_logic; --Clock
rst : in std_logic; --Asynchronous active high reset.
input : in unsigned(N-1 downto 0); --this is the number for which we want to find square root.
done : out std_logic; --This signal goes high when output is ready
sq_root : out unsigned(N/2-1 downto 0) --square root of 'input'
);
end square_root;
architecture Behav of square_root is
begin
SQROOT_PROC : process(Clk,rst)
variable a : unsigned(N-1 downto 0); --original input.
variable left,right,r : unsigned(N/2+1 downto 0):=(others => '0'); --input to adder/sub.r-remainder.
variable q : unsigned(N/2-1 downto 0) := (others => '0'); --result.
variable i : integer := 0; --index of the loop.
begin
if(rst = '1') then --reset the variables.
done <= '0';
sq_root <= (others => '0');
i := 0;
a := (others => '0');
left := (others => '0');
right := (others => '0');
r := (others => '0');
q := (others => '0');
elsif(rising_edge(Clk)) then
--Before we start the first clock cycle get the 'input' to the variable 'a'.
if(i = 0) then
a := input;
done <= '0'; --reset 'done' signal.
i := i+1; --increment the loop index.
elsif(i < N/2) then --keep incrementing the loop index.
i := i+1;
end if;
--These statements below are derived from the block diagram.
right := q & r(N/2+1) & '1';
left := r(N/2-1 downto 0) & a(N-1 downto N-2);
a := a(N-3 downto 0) & "00"; --shifting left by 2 bit.
if ( r(N/2+1) = '1') then --add or subtract as per this bit.
r := left + right;
else
r := left - right;
end if;
q := q(N/2-2 downto 0) & (not r(N/2+1));
if(i = N/2) then --This means the max value of loop index has reached.
done <= '1'; --make 'done' high because output is ready.
i := 0; --reset loop index for beginning the next cycle.
sq_root <= q; --assign 'q' to the output port.
--reset other signals for using in the next cycle.
left := (others => '0');
right := (others => '0');
r := (others => '0');
q := (others => '0');
end if;
end if;
end process;
end architecture;
Testbench: tb.vhd
--Testbench for out square root calculator design.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use ieee.math_real.all;
--empty entity as its a testbench
entity tb is
end tb;
architecture sim of tb is
--Declare the component which we want to test.
component square_root is
generic(N : integer := 32);
port (
Clk : in std_logic;
rst : in std_logic;
input : in unsigned(N-1 downto 0);
done : out std_logic;
sq_root : out unsigned(N/2-1 downto 0)
);
end component;
constant clk_period : time := 10 ns; --set the clock period for simulation.
constant N : integer := 16; --width of the input.
signal Clk,rst,done : std_logic := '0';
signal input : unsigned(N-1 downto 0) := (others => '0');
signal sq_root : unsigned(N/2-1 downto 0) := (others => '0');
signal error : integer := 0; --this indicates the number of errors encountered during simulation.
begin
Clk <= not Clk after clk_period / 2; --generate clock by toggling 'Clk'.
--entity instantiation.
DUT : entity work.square_root generic map(N => N)
port map(Clk,rst,input,done,sq_root);
--Apply the inputs to the design and check if the results are correct.
--The number of inputs for which the results were wrongly calculated are counted by 'error'.
SEQUENCER_PROC : process
variable actual_result,i : integer := 0;
begin
--First we apply reset input for one clock period.
rst <= '1';
wait for clk_period;
rst <= '0';
--Test the design for all the combination of inputs.
--Since we have (2^16)-1 inputs, we test all of them one by one.
while(i <= 2**N-1) loop
input <= to_unsigned(i,N); --convert 'i' from integer to unsigned format.
wait until done='1'; --wait until the 'done' output signal goes high.
wait until falling_edge(Clk); --we sample the output at the falling edge of the clock.
actual_result := integer(floor(sqrt(real(i)))); --Calculate the actual result.
--if actual result and calculated result are different increment 'error' by 1.
if (actual_result /= to_integer(sq_root)) then
error <= error + 1;
end if;
i := i+1; --increment the loop index.
end loop;
reset <= '1'; --all inputs are tested. Apply reset
input <= (others => '0'); --reset the 'input'
wait;
end process;
end architecture;
Simulation Waveform from ModelSim:
To reach the end of the testbench, you need to simulate only for 5.5 msec of simulation time.