VHDL coding tips and tricks: June 2011

Monday, June 27, 2011

Non-synthesisable VHDL code for 8 point FFT algorithm

A Fast Fourier Transform(FFT) is an efficient algorithm for calculating the discrete Fourier transform of a set of data. A DFT basically decomposes a set of data in time domain into different frequency components. DFT is defined by the following equation:
$X_k = \sum_{n=0}^{N-1} x_n e^{-{i 2\pi k \frac{n}{N}}} \qquad k = 0,\dots,N-1.$
A FFT algorithm uses some interesting properties of the above formula to simply the calculations. You can read more about these FFT algorithms here.
Many students have been asking doubts regarding vhdl implementation of FFT, so I decided to write a sample code. I have selected 8 point decimation in time(DIT) FFT algorithm for this purpose. In short I have wrote the code for this flow diagram of FFT.
This is just a sample code, which means it is not synthesisable. I have used real data type for the inputs and outputs and all calculations are done using the math_real library. The inputs can be complex numbers too.
To define the basic arithmetic operations between two complex numbers I have defined some new functions which are available in the package named fft_pkg. The component named, butterfly , contains the basic butterfly calculations for FFT as shown in this flow diagram.
I wont be going any deep into the theory behind FFT here. Please visit the link given above or Google for in depth theory. There are 4 vhdl codes in the design,including the testbench code, and are given below.

The package file - fft_pkg.vhd:

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.MATH_REAL.ALL;

package fft_pkg is

type complex is
record
r : real;
i : real;
end record;

type comp_array is array (0 to 7) of complex;
type comp_array2 is array (0 to 3) of complex;

function add (n1,n2 : complex) return complex;
function sub (n1,n2 : complex) return complex;
function mult (n1,n2 : complex) return complex;

end fft_pkg;

package body fft_pkg is

--addition of complex numbers
function add (n1,n2 : complex) return complex is

variable sum : complex;

begin
sum.r:=n1.r + n2.r;
sum.i:=n1.i + n2.i;
return sum;
end add;

--subtraction of complex numbers.
function sub (n1,n2 : complex) return complex is

variable diff : complex;

begin
diff.r:=n1.r - n2.r;
diff.i:=n1.i - n2.i;
return diff;
end sub;

--multiplication of complex numbers.
function mult (n1,n2 : complex) return complex is

variable prod : complex;

begin
prod.r:=(n1.r * n2.r) - (n1.i * n2.i);
prod.i:=(n1.r * n2.i) + (n1.i * n2.r);
return prod;
end mult;

end fft_pkg;

The top level entity - fft8.vhd:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.MATH_REAL.ALL;
library work;
use work.fft_pkg.ALL;

entity fft8 is
port( s : in comp_array; --input signals in time domain
y : out comp_array --output signals in frequency domain
);
end fft8;

architecture Behavioral of fft8 is

component butterfly is
port(
s1,s2 : in complex; --inputs
w :in complex; -- phase factor
g1,g2 :out complex -- outputs
);
end component;

signal g1,g2 : comp_array := (others => (0.0,0.0));
--phase factor, W_N = e^(-j*2*pi/N) and N=8 here.
--W_N^i = cos(2*pi*i/N) - j*sin(2*pi*i/N); and i has range from 0 to 7.
constant w : comp_array2 := ( (1.0,0.0), (0.7071,-0.7071), (0.0,-1.0), (-0.7071,-0.7071) );

begin

--first stage of butterfly's.
bf11 : butterfly port map(s(0),s(4),w(0),g1(0),g1(1));
bf12 : butterfly port map(s(2),s(6),w(0),g1(2),g1(3));
bf13 : butterfly port map(s(1),s(5),w(0),g1(4),g1(5));
bf14 : butterfly port map(s(3),s(7),w(0),g1(6),g1(7));

--second stage of butterfly's.
bf21 : butterfly port map(g1(0),g1(2),w(0),g2(0),g2(2));
bf22 : butterfly port map(g1(1),g1(3),w(2),g2(1),g2(3));
bf23 : butterfly port map(g1(4),g1(6),w(0),g2(4),g2(6));
bf24 : butterfly port map(g1(5),g1(7),w(2),g2(5),g2(7));

--third stage of butterfly's.
bf31 : butterfly port map(g2(0),g2(4),w(0),y(0),y(4));
bf32 : butterfly port map(g2(1),g2(5),w(1),y(1),y(5));
bf33 : butterfly port map(g2(2),g2(6),w(2),y(2),y(6));
bf34 : butterfly port map(g2(3),g2(7),w(3),y(3),y(7));

end Behavioral;

Butterfly component - butterfly.vhd:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
library work;
use work.fft_pkg.ALL;

entity butterfly is
port(
s1,s2 : in complex; --inputs
w :in complex; -- phase factor
g1,g2 :out complex -- outputs
);
end butterfly;

architecture Behavioral of butterfly is

begin

--butterfly equations.
g1 <= add(s1,mult(s2,w));
g2 <= sub(s1,mult(s2,w));

end Behavioral;

Testbench code - tb_fft8.vhd:

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
library work;
use work.fft_pkg.all;

ENTITY tb IS
END tb;

ARCHITECTURE behavior OF tb IS
signal s,y : comp_array;

BEGIN

-- Instantiate the Unit Under Test (UUT)
uut: entity work.fft8 PORT MAP (
s => s,
y => y
);

-- Stimulus process
stim_proc: process
begin
--sample inputs in time domain.
s(0) <= (-2.0,1.2);
s(1) <= (-2.2,1.7);
s(2) <= (1.0,-2.0);
s(3) <= (-3.0,-3.2);
s(4) <= (4.5,-2.5);
s(5) <= (-1.6,0.2);
s(6) <= (0.5,1.5);
s(7) <= (-2.8,-4.2);
wait;
end process;

END;

Copy and paste the above codes into their respective files and simulate. You will get the output in the output signal 'y'. Once again, the code is not synthesisable.

Hope the codes are helpful for you. In case you want a synthesisable version of the codes or want a customized FFT vhdl code then contact me.

Saturday, June 25, 2011

VHDL code for a 4 tap FIR filter

Finite Impulse Response(FIR) filters are one of the two main type of filters available for signal processing. As the name suggests the output of a FIR filter is finite and it settles down to zero after some time. For a basic FAQ on FIR filters see this post by dspguru.

A FIR filter output, 'y' can be defined by the following equation:

$y[n] = \sum_{i=0}^{N} b_i x[n-i]$

Here, 'y' is the filter output, 'x' in the input signal and 'b' is the filter coefficients. 'N' is the filter order. The higher the value of N is, the more complex the filter will be.

For writing the code in VHDL I have referred to the paper, VHDL generation of optimized FIR filters. You can say I have coded the exact block diagram available in the paper, "Figure 2".

This is a 4 tap filter. That means the order of the filter is 4 and so it has 4 coefficients. I have defined the input as signed type of 8 bits wide. The output is also of signed type with 16 bits width. The design contains two files. One is the main file with all multiplications and adders defined in it, and another for defining the D flip flop operation.

The main file is given below:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity fir_4tap is
port( Clk : in std_logic; --clock signal
Xin : in signed(7 downto 0); --input signal
Yout : out signed(15 downto 0) --filter output
);
end fir_4tap;

architecture Behavioral of fir_4tap is

component DFF is
port(
Q : out signed(15 downto 0); --output connected to the adder
Clk :in std_logic; -- Clock input
D :in signed(15 downto 0) -- Data input from the MCM block.
);
end component;

signal H0,H1,H2,H3 : signed(7 downto 0) := (others => '0');
signal MCM0,MCM1,MCM2,MCM3,add_out1,add_out2,add_out3 : signed(15 downto 0) := (others => '0');
signal Q1,Q2,Q3 : signed(15 downto 0) := (others => '0');

begin

--filter coefficient initializations.
--H = [-2 -1 3 4].
H0 <= to_signed(-2,8);
H1 <= to_signed(-1,8);
H2 <= to_signed(3,8);
H3 <= to_signed(4,8);

--Multiple constant multiplications.
MCM3 <= H3*Xin;
MCM2 <= H2*Xin;
MCM1 <= H1*Xin;
MCM0 <= H0*Xin;

--adders
add_out1 <= Q1 + MCM2;
add_out2 <= Q2 + MCM1;
add_out3 <= Q3 + MCM0;

--flipflops(for introducing a delay).
dff1 : DFF port map(Q1,Clk,MCM3);
dff2 : DFF port map(Q2,Clk,add_out1);
dff3 : DFF port map(Q3,Clk,add_out2);

--an output produced at every positive edge of clock cycle.
process(Clk)
begin
if(rising_edge(Clk)) then
Yout <= add_out3;
end if;
end process;

end Behavioral;

VHDL code for the component DFF is given below:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity DFF is
port(
Q : out signed(15 downto 0); --output connected to the adder
Clk :in std_logic; -- Clock input
D :in signed(15 downto 0) -- Data input from the MCM block.
);
end DFF;

architecture Behavioral of DFF is

signal qt : signed(15 downto 0) := (others => '0');

begin

Q <= qt;

process(Clk)
begin
if ( rising_edge(Clk) ) then
qt <= D;
end if;
end process;

end Behavioral;

I have written a small test bench code for testing the design. It contains 8 test inputs which are serially applied to the filter module. See below:

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY tb IS
END tb;

ARCHITECTURE behavior OF tb IS

signal Clk : std_logic := '0';
signal Xin : signed(7 downto 0) := (others => '0');
signal Yout : signed(15 downto 0) := (others => '0');
constant Clk_period : time := 10 ns;

BEGIN

-- Instantiate the Unit Under Test (UUT)
uut: entity work.fir_4tap PORT MAP (
Clk => Clk,
Xin => Xin,
Yout => Yout
);

-- Clock process definitions
Clk_process :process
begin
Clk <= '0';
wait for Clk_period/2;
Clk <= '1';
wait for Clk_period/2;
end process;

-- Stimulus process
stim_proc: process
begin
wait for Clk_period*2;
Xin <= to_signed(-3,8); wait for clk_period*1;
Xin <= to_signed(1,8); wait for clk_period*1;
Xin <= to_signed(0,8); wait for clk_period*1;
Xin <= to_signed(-2,8); wait for clk_period*1;
Xin <= to_signed(-1,8); wait for clk_period*1;
Xin <= to_signed(4,8); wait for clk_period*1;
Xin <= to_signed(-5,8); wait for clk_period*1;
Xin <= to_signed(6,8); wait for clk_period*1;
Xin <= to_signed(0,8);

wait;
end process;

END;

The simulation waveform is given below:

The code is synthesisable and with a few changes can be ported for a higher order filter.

Thursday, June 23, 2011

VHDL code for a simple ALU

ALU(Arithmetic Logic Unit) is a digital circuit which does arithmetic and logical operations. Its a basic block in any processor.
In this article I have shared vhdl code for a simple ALU. Note that this is one of the simplest architecture of an ALU. Most of the ALU's used in practical designs are far more complicated and requires good design experience.
The block diagram of the ALU is given below. As you can see, it receives two input operands 'A' and 'B' which are 8 bits long. The result is denoted by 'R' which is also 8 bit long. The input signal 'Op' is a 3 bit value which tells the ALU what operation to be performed by the ALU. Since 'Op' is 3 bits long we can have 2^3=8 operations.

Our ALU is capable of doing the following operations:

ALU Operation	Description
Add Signed	R = A + B : Treating A, B, and R as signed two's complement integers.
Subtract Signed	R = A - B : Treating A, B, and R as signed two's complement integers.
Bitwise AND	R(i) = A(i) AND B(i).
Bitwise NOR	R(i) = A(i) NOR B(i).
Bitwise OR	R(i) = A(i) OR B(i).
Bitwise NAND	R(i) = A(i) NAND B(i).
Bitwise XOR	R(i) = A(i) XOR B(i).
Biwise NOT	R(i) = NOT A(i).

These functions are implemented using a case statement. The ALU calculates the outputs at every positive edge of clock cycle. As soon as the outputs are calculated it is available at the port signal 'R'. See the code below, to know how it is done:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity simple_alu is
port( Clk : in std_logic; --clock signal
A,B : in signed(7 downto 0); --input operands
Op : in unsigned(2 downto 0); --Operation to be performed
R : out signed(7 downto 0) --output of ALU
);
end simple_alu;

architecture Behavioral of simple_alu is

--temporary signal declaration.
signal Reg1,Reg2,Reg3 : signed(7 downto 0) := (others => '0');

begin

Reg1 <= A;
Reg2 <= B;
R <= Reg3;

process(Clk)
begin

if(rising_edge(Clk)) then --Do the calculation at the positive edge of clock cycle.
case Op is
when "000" =>
Reg3 <= Reg1 + Reg2; --addition
when "001" =>
Reg3 <= Reg1 - Reg2; --subtraction
when "010" =>
Reg3 <= not Reg1; --NOT gate
when "011" =>
Reg3 <= Reg1 nand Reg2; --NAND gate
when "100" =>
Reg3 <= Reg1 nor Reg2; --NOR gate
when "101" =>
Reg3 <= Reg1 and Reg2; --AND gate
when "110" =>
Reg3 <= Reg1 or Reg2; --OR gate
when "111" =>
Reg3 <= Reg1 xor Reg2; --XOR gate
when others =>
NULL;
end case;
end if;

end process;

end Behavioral;

The testbench code used for testing the ALU code is given below:

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY tb IS
END tb;

ARCHITECTURE behavior OF tb IS

signal Clk : std_logic := '0';
signal A,B,R : signed(7 downto 0) := (others => '0');
signal Op : unsigned(2 downto 0) := (others => '0');
constant Clk_period : time := 10 ns;

BEGIN

-- Instantiate the Unit Under Test (UUT)
uut: entity work.simple_alu PORT MAP (
Clk => Clk,
A => A,
B => B,
Op => Op,
R => R
);

-- Clock process definitions
Clk_process :process
begin
Clk <= '0';
wait for Clk_period/2;
Clk <= '1';
wait for Clk_period/2;
end process;

-- Stimulus process
stim_proc: process
begin
wait for Clk_period*1;
A <= "00010010"; --18 in decimal
B <= "00001010"; --10 in decimal
Op <= "000"; wait for Clk_period; --add A and B
Op <= "001"; wait for Clk_period; --subtract B from A.
Op <= "010"; wait for Clk_period; --Bitwise NOT of A
Op <= "011"; wait for Clk_period; --Bitwise NAND of A and B
Op <= "100"; wait for Clk_period; --Bitwise NOR of A and B
Op <= "101"; wait for Clk_period; --Bitwise AND of A and B
Op <= "110"; wait for Clk_period; --Bitwise OR of A and B
Op <= "111"; wait for Clk_period; --Bitwise XOR of A and B
wait;
end process;

END;

This is the simulation waveform:

The code is sythesisable. I haven't used std_logic_vector in this code. You can read this post to know why.

The block diagram was adopted from this original article. To make sure you have understood the concepts well, try implementing the block diagram given in the original article.

Thursday, June 16, 2011

Using real data types in VHDL

Apart from the standard types like integer and std_logic_vector's VHDL also offer real data types. But a real data type has a big disadvantage. It is not synthesis-able. It can be used only for simulation purposes. This disadvantage limits its use to a large extend, but there are plenty of projects where we look only for simulation results.

Before starting the coding part of a VHDL project,one has to decide whether the project to be implemented on a real FPGA or just a computer simulation is required. If it has to be ran on FPGA, then forget about the real package and use only synthesis-able data types like std_logic,integer etc... Otherwise you can reduce the time and complexity of your project by using real data types.

The real data type is defined in the library called MATH_REAL. So you have to include the following line before the entity declaration in the code:
use ieee.math_real.all;

The math_real package also offers some elementary mathematical functions for real data types. You can see the math_real.vhd file at the following address.

See the below code to get an idea on how to use these functions:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.MATH_REAL.ALL;

entity real_demo is
end real_demo;

architecture Behavioral of real_demo is

--signals declared with the REAL data type.
--MATH_PI is a constant defined in the math_real package.
signal X : real := -MATH_PI/3.0; --A real variable X, initialized to pi/3(60 degreee).
signal sign_result,ceil_result,floor_result,round_result,trunc_result : real := 0.0;
signal max,min,root,cube,power1,power2,exp_result : real := 0.0;
signal log_result,log2_result,log10_result,log_result2,sine,cosine,tangent : real := 0.0;
signal sin_inv,cos_inv,tan_inv,sin_hyp,cos_hyp,tan_hyp : real := 0.0;
signal inv_sin_hyp,inv_cos_hyp,inv_tan_hyp : real := 0.0;

begin

process
begin

sign_result <= SIGN(X); --sign of X
ceil_result <= CEIL(X); --smallest integer value not less than X
floor_result <= FLOOR(X); --largest integer value not greater than X
round_result <= ROUND(X); --round to the nearest integer.
trunc_result <= TRUNC(X); --truncation.
max <= REALMAX(4.5,4.6); --return the maximum
min <= REALMIN(2.3,3.2); --return the minimum
root <= SQRT(4.0); --square root
cube <= CBRT(64.0); --cube root
power1 <= 2**3.0; --power of an integer
power2 <= 3.0**3.0; --power of a real
exp_result <= EXP(1.0); --returns e**X.
log_result <= LOG(2.73); --natural logarithm
log2_result <= LOG2(16.0); --log to the base 2.
log10_result <= LOG10(100.0); --log to the base 10.
log_result2 <= LOG(27.0,3.0); --log to the given base.
sine <= SIN(X); --sine of the given angle(in rad)
cosine <= COS(X);--cosine of the given angle(in rad)
tangent <= TAN(X);--tangent of the given angle(in rad)
sin_inv <= ARCSIN(SIN(X)); --sine inverse.
cos_inv <= ARCCOS(COS(X)); --cosine inverse.
tan_inv <= ARCTAN(TAN(X)); --tangent inverse.
sin_hyp <= SINH(X); --Hyperbolic sine
cos_hyp <= COSH(X); --Hyperbolic cosine.
tan_hyp <= TANH(X); --Hyperbolic tangent.
inv_sin_hyp <= ARCSINH(SINH(X)); --Inverse hyperbolic sine.
inv_cos_hyp <= ARCCOSH(COSH(X)); --Inverse hyperbolic cosine.
inv_tan_hyp <= ARCTANH(TANH(X)); --Inverse hyperbolic tangent.
wait;

end process;

end Behavioral;

Almost all the functions available in the real package are shown above and comments are provided on what they are used for. For your reference I have attached the simulation result below:

As you can see its a pretty useful library. Dont forget to use it if you have a simulation project in VHDL. Contact me for any kind of help. I always use this package when I need to convert a matlab code to corresponding VHDL(for just simulation). Using this package, helps me to write the vhdl code like a pseudo code.

VHDL coding tips and tricks

Pages

Monday, June 27, 2011

Non-synthesisable VHDL code for 8 point FFT algorithm

Saturday, June 25, 2011

VHDL code for a 4 tap FIR filter

Thursday, June 23, 2011

VHDL code for a simple ALU

Thursday, June 16, 2011

Using real data types in VHDL

Get updates

VHDL: Simple Digital Clock with Testbench

VHDL: 4 to 1 Multiplexer(MUX) Using Case statements with Testbench

Block and distributed RAM's on Xilinx FPGA's

VHDL: 8 bit Binary to BCD converter with Testbench

File reading and writing in VHDL - Part 2

Total Pageviews

Facebook