Thursday, March 3, 2011

FFT v8.0 AXI with scaled output

The output of the FFT core can be set to "Scaled" to save some logic resources if full precision is not required. A few things are worth mentioning to get the scaling to work:

  • Scaling is done in stages. The scaling schedule input has 2 bits for each stage, which scale that stage's result by 0 to 3 bits.
  • The scaling schedule needs to be set BEFORE the start of the transform.
  • It's always a good idea to bring the overflow output out and monitor it to make sure no overflow occurs during scaling.
  • The bit-accurate C model also has a scaling option and a scaling schedule input, in a slightly different format from the FFT SysGen block. When comparing results between the FFT SysGen block and the C model, make sure both use the same scaling settings.
  • When comparing results between the FFT SysGen block and the Matlab FFT function, remember to scale the output of the FFT function by the same scaling factor.
Below are more details about the points above:

Scaling Input Format:

The description in the FFT datasheet is pretty good, so I just pasted it here for your reference:
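As a rough sketch of how a per-stage schedule maps onto the core's scaling schedule input, the Python below packs one 2-bit field per stage into a single integer. It assumes the first stage sits in the two least-significant bits, and that the schedule vector is written last stage first, as in the C model example further down ([sch3 sch2 sch1 sch0], with sch0 taken to be the first stage). Both are assumptions to check against the datasheet for your architecture, and the function names are hypothetical.

```python
def pack_scale_sch(stages_last_to_first):
    """Pack a per-stage scaling schedule into one integer bit field.

    stages_last_to_first: e.g. [sch3, sch2, sch1, sch0].
    Assumption: 2 bits per stage, first stage (sch0) in bits [1:0].
    """
    word = 0
    for value in stages_last_to_first:
        if not 0 <= value <= 3:
            raise ValueError(f"stage scaling must be 0..3, got {value}")
        # Shift the fields packed so far up by 2 bits and append this stage,
        # so the last list element (the first stage) ends up in bits [1:0].
        word = (word << 2) | value
    return word


def unpack_scale_sch(word, num_stages):
    """Inverse of pack_scale_sch: returns values ordered last stage first."""
    first_to_last = [(word >> (2 * i)) & 0b11 for i in range(num_stages)]
    return first_to_last[::-1]
```

For example, under these assumptions [3, 2, 2, 2] packs to 234 (binary 11 10 10 10).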

Scaling Schedule Setup

In FFT 8.0, the scaling schedule is set via the AXI configuration channel, while the samples are sent to the core via the data input channel. For the scaling schedule to take effect, make sure the core accepts the configuration data before the input samples. Remember that in AXI, data are only transferred when both TVALID and TREADY are asserted. In this case, check the waveform (see below) to confirm that s_axis_config_tvalid=1 and s_axis_config_tready=1, with the desired s_axis_config_scale_sch, occur before s_axis_data_tvalid=1 and s_axis_data_tready=1.
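The same handshake condition can also be checked on signals dumped from a simulation rather than by eye. The Python sketch below assumes each signal is available as a hypothetical list of 0/1 values, one per clock cycle, and finds the first cycle where TVALID and TREADY are both high on each channel. It conservatively requires the configuration transfer to happen on an earlier cycle than the first data transfer.

```python
def first_transfer(tvalid, tready):
    """Index of the first cycle with TVALID and TREADY both 1 (an AXI
    transfer), or None if no transfer happens in the trace."""
    for cycle, (valid, ready) in enumerate(zip(tvalid, tready)):
        if valid and ready:
            return cycle
    return None


def config_before_data(cfg_tvalid, cfg_tready, data_tvalid, data_tready):
    """True if the config channel transfers on an earlier cycle than the
    first data-channel transfer."""
    cfg = first_transfer(cfg_tvalid, cfg_tready)
    data = first_transfer(data_tvalid, data_tready)
    if cfg is None:
        return False  # the scaling schedule was never accepted
    return data is None or cfg < data
```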

Overflow output

The bit growth of the FFT core in full-precision mode is NFFT+1 bits, where NFFT is log2(FFT size). For example, for a 256-point FFT the bit growth is 9 bits. When the scaled output is used, the output has the same width as the input samples, so in theory the scaling schedule needs to scale back the full bit growth to guarantee no overflow. For the 256-point FFT, the scaling schedule can be something like [3 2 2 2], which accounts for the additional 9 bits. However, in some applications the signal never reaches the full bit growth as it goes through the FFT core, so a smaller scaling schedule can be used to get more dynamic range. This is when you need to monitor the overflow output closely to make sure overflow doesn't happen. The best way I have found to select a scaling schedule for maximum dynamic range is to run the bit-accurate C model with a large set of real-world input samples and adjust the scaling schedule as needed.
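The bit-growth arithmetic above is easy to script when trying out schedules. This Python sketch (hypothetical function names) computes NFFT+1 for a power-of-two FFT size and reports how many bits of worst-case growth a given schedule leaves unscaled, assuming each stage entry is 0 to 3 as described earlier.

```python
def full_bit_growth(fft_size):
    """Full-precision bit growth: NFFT + 1, where NFFT = log2(FFT size)."""
    if fft_size < 2 or fft_size & (fft_size - 1):
        raise ValueError("FFT size must be a power of two >= 2")
    nfft = fft_size.bit_length() - 1
    return nfft + 1


def unscaled_bits(fft_size, scaling_sch):
    """Worst-case growth (in bits) that the schedule does not scale back.

    0  -> the schedule covers the full NFFT + 1 bits of growth.
    >0 -> more dynamic range, but overflow is possible; watch the overflow output.
    <0 -> the schedule scales more than needed and throws away precision.
    """
    if any(not 0 <= s <= 3 for s in scaling_sch):
        raise ValueError("each stage scales by 0 to 3 bits")
    return full_bit_growth(fft_size) - sum(scaling_sch)
```

For the 256-point FFT, [3, 2, 2, 2] leaves 0 bits unscaled, while [2, 2, 2, 2] leaves 1 bit, so that smaller schedule relies on the signal never reaching full growth.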

The snapshot below shows that overflow is observed on the m_axis_data_ovflow output as well as the m_axis_status_ovflow output when the scaling schedule is set too small. 

The snapshot below shows a frame transformed without overflow when the scaling schedule is set properly.

Set Up Scaling in the Bit-Accurate C Model

Using the same 256-point FFT as an example, below is what needs to be set up to use scaling with the bit-accurate C model:

generics.C_HAS_SCALING = 1; % Set to 0 if C_USE_FLT_PT = 1
% Scaling schedule: one value per stage, each scaling by 0 to 3 bits
scaling_sch0 = 2;
scaling_sch1 = 2;
scaling_sch2 = 2;
scaling_sch3 = 2;

scaling_sch = [scaling_sch3, scaling_sch2, scaling_sch1, scaling_sch0];
% nfft, input and direction must already be defined before calling the C model
[fft_ba, blkexp, overflow] = xfft_v8_0_bitacc_mex(generics, nfft, input, scaling_sch, direction);

Scale Matlab FFT Output

When comparing the Matlab FFT function with the FFT SysGen block, make sure the FFT function output is also scaled by the same scaling factor, e.g.:

scaling_factor = 2^sum(scaling_sch);
fft_mat = fft(input);
mag_mat = abs(fft_mat)/scaling_factor;
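Dividing the Matlab result by 2^sum(scaling_sch) works because scaling each stage of a linear transform is equivalent to a single division at the end. The Python sketch below illustrates this with a floating-point radix-2 FFT that divides by 2^s after each pair of radix-2 stages, treating each pair as one Radix-4 stage (with a lone final radix-2 stage when log2(N) is odd). It assumes the schedule is ordered last stage first, and it models no rounding, truncation or overflow, so it shows only the ideal relationship, not what the core outputs bit for bit.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used here only as an unscaled reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def scaled_fft(x, scaling_sch):
    """Floating-point radix-2 DIT FFT with per-stage scaling.

    scaling_sch is ordered last stage first (e.g. [sch3, sch2, sch1, sch0]).
    Each entry covers two radix-2 stages; if log2(N) is odd, the last entry
    covers the single remaining radix-2 stage.
    """
    n = len(x)
    levels = n.bit_length() - 1
    if n < 2 or 1 << levels != n:
        raise ValueError("length must be a power of two >= 2")
    if len(scaling_sch) != (levels + 1) // 2:
        raise ValueError("need one schedule entry per Radix-4 stage")
    per_stage = list(reversed(scaling_sch))  # first stage first

    # Bit-reversed reordering of the input for in-place DIT butterflies.
    a = [x[int(format(i, f"0{levels}b")[::-1], 2)] for i in range(n)]

    size = 2
    for level in range(levels):
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)
                t = w * a[start + k + half]
                a[start + k + half] = a[start + k] - t
                a[start + k] = a[start + k] + t
        # Apply this stage's scaling after every second radix-2 stage,
        # and after the final stage.
        if level % 2 == 1 or level == levels - 1:
            shift = per_stage[level // 2]
            a = [v / 2 ** shift for v in a]
        size *= 2
    return a
```

For a 16-point input, scaled_fft(x, [1, 2]) matches dft(x) divided by 2^3 to within floating-point error.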


OK, now comes my favorite 256-point FFT example, which I have used in a few other blog posts. I updated it to turn on scaling and bring out the overflow output. Below is what the model looks like, followed by a plot of the comparison.

