Wednesday, December 16, 2009

SysGen: Create New HWCOSIM Target with NMM Ports

Hardware Co-Simulation (HWCOSIM) is a great feature in System Generator (SysGen) that lets users run all or part of a SysGen design on the FPGA, which speeds up simulation dramatically. SysGen already includes HWCOSIM plugins for commonly used DSP demo boards. Users can use the SysGen Board Description Builder (SBDBuilder) to create new HWCOSIM plugins for unsupported boards, or for unsupported features on existing boards (for example, non-memory-mapped, or NMM, ports). Below are step-by-step instructions for creating a new HWCOSIM plugin with NMM ports, using the ML506 as an example.

Step 1: Open "System Generator" properties window and select "New Compilation Target".

Step 2: The SBDBuilder window pops up.
Board Name: use a descriptive name so you know what the target is later.
System Clock Frequency/Pin Location: enter the actual clock frequency and pin location for your board.
JTAG Options Boundary Scan Position: the position of the target device in the JTAG chain. You can use IMPACT to help fill this in (see a snapshot below for the JTAG chain detected by IMPACT). As shown in the JTAG chain snapshot, the target device xc5vsx50t is the 5th in the chain.
JTAG Options IR Lengths: click the "Detect" button next to it to auto-fill it.
Target Devices: click the "Add" button to select the target device.


Step 3: Click the "Add" button in the "Non-Memory Mapped Ports" section to add NMM ports for HWCOSIM. This brings up the "Configure a Port" window below. Enter a port name, select the port direction and pin LOC, and select PULLUP or PULLDOWN if needed. Click the "Add Pin" button and the newly added pin will show up in the Pin List below. Click "Save and Start New" if you want to add more ports, or click "Save and Close" if you are done adding NMM ports.

Step 4: The SBDBuilder with all information entered is shown below. Click the "Install" button to install the new board configuration to the SysGen plugins directory.

Step 5: A window with tokens for the NMM ports will pop up once the installation is complete.

Step 6: Save the library with the NMM port tokens and they can now be used in your SysGen model like any other SysGen block. A test model is shown below.


Step 7: Open the "System Generator" properties window again; you should now see the newly created board "ML506 JTAG NMM" in the HWCOSIM target list. Select it as the compilation target.


Step 8: That's it. The remaining steps for using the new compilation target are the same as for any other HWCOSIM target. The only difference is that Simulink no longer has control or visibility of the two NMM ports; they are now implemented as IOs on the FPGA. In this example, while the simulation is running, toggling GPIO SW1 on the board turns GPIO LED0 on and off.

Saturday, November 28, 2009

IFFT in System Generator

The FFT block in Xilinx Blockset can be used to calculate both DFT and IDFT because the two equations are almost identical:
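For reference, the standard textbook definitions of the N-point DFT and IDFT are written out below. Labeling them Equation 1 and Equation 2 is an assumption, chosen to agree with the later reference to the 1/N factor in Equation 2.

```latex
% Equation 1: forward transform (DFT)
X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \qquad k = 0, 1, \ldots, N-1

% Equation 2: inverse transform (IDFT)
x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{+j 2\pi n k / N}, \qquad n = 0, 1, \ldots, N-1
```

The two differ only in the sign of the exponent and in the 1/N scale factor.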


By default, the FFT block is configured to calculate DFT. The setup and timing of control/data signals for IDFT are the same as DFT except for two things:
  1. The FFT block needs to be set up for IDFT by setting the fwd_inv_we signal to 1 and the fwd_inv signal to 0 before the start of the transform.
  2. The FFT output needs to be manually scaled to account for the factor 1/N in Equation 2 above. The scaling can be done either by using the scaling schedule input or by shifting the FFT output if the FFT block is set to "unscaled".
The picture below shows the timing of control/data signals at the beginning of a data frame.
Now let's use a simple 8-point IDFT example to show how everything is put together. Below is the IDFT calculation of a test vector xn_re in Matlab:
>>xn_re=[0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8];
>>ifft(xn_re)
ans =
Columns 1 through 5
0.4500  -0.0500-0.1207i  -0.0500-0.0500i  -0.0500-0.0207i  -0.0500

Columns 6 through 8
-0.0500+0.0207i  -0.0500+0.0500i  -0.0500+0.1207i
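
The same 8-point result can be cross-checked without Matlab. The sketch below is a direct O(N^2) inverse DFT in plain Python; the function name idft is made up for illustration and is not part of any Xilinx or Matlab API.

```python
import cmath

def idft(x):
    """Direct inverse DFT: (1/N) * sum_k x[k] * exp(+j*2*pi*n*k/N)."""
    n_points = len(x)
    return [
        sum(x[k] * cmath.exp(2j * cmath.pi * n * k / n_points)
            for k in range(n_points)) / n_points
        for n in range(n_points)
    ]

# Same test vector as in the Matlab session above
xn_re = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
result = idft(xn_re)

for value in result:
    print(f"{value.real:+.4f} {value.imag:+.4f}i")
```

Because the input is real, the outputs come in complex-conjugate pairs around the middle element, which is a quick sanity check when reading xk_re and xk_im off a waveform.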

The idft_test Simulink/SysGen model for the 8-point IDFT can be downloaded here. The model includes a block called WaveScope, which is a "hidden" gem in System Generator for debugging SysGen designs, especially for hardware engineers who are used to viewing waveforms in HDL simulators.

The picture below shows the waveform at the beginning of the simulation in WaveScope. fwd_inv_we=1 and fwd_inv=0 for 1 cycle to set up the block for IDFT. The scale_sch input is also set to "010101" at the beginning to scale the FFT result down by 8 (the 1/N factor in Equation 2).
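
To see why "010101" divides by 8, the schedule can be decoded stage by stage. This is a minimal sketch under two assumptions that are not spelled out in this post: each stage gets two bits holding a right shift of 0 to 3, and the rightmost pair belongs to the first stage. The helper name decode_scale_sch is hypothetical.

```python
def decode_scale_sch(bits, bits_per_stage=2):
    """Split a scaling-schedule bit string into per-stage right shifts.

    Assumption: two bits per stage, rightmost pair = first stage.
    """
    if len(bits) % bits_per_stage:
        raise ValueError("schedule length must be a multiple of bits_per_stage")
    pairs = [bits[i:i + bits_per_stage] for i in range(0, len(bits), bits_per_stage)]
    # Reverse so that index 0 is the first stage
    return [int(pair, 2) for pair in reversed(pairs)]

shifts = decode_scale_sch("010101")
total_shift = sum(shifts)
scale_factor = 2 ** total_shift

print(shifts, total_shift, scale_factor)
```

With one bit of shift in each of the three stages, the total is a right shift by 3, which is the 1/8 needed for an 8-point IDFT.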

The picture below shows the waveform in WaveScope at the end of the simulation. The xk_re and xk_im outputs when dv=1 match the Matlab results above once quantization errors are taken into account.

Tuesday, November 24, 2009

Really impressed with 7-Zip

Update Aug 20, 2011:  Just found out that 7-Zip can also extract CD/DVD image files (.iso). Nice! Below are all the file formats it supports:
  • Packing / unpacking: 7z, XZ, BZIP2, GZIP, TAR, ZIP and WIM
  • Unpacking only: ARJ, CAB, CHM, CPIO, CramFS, DEB, DMG, FAT, HFS, ISO, LZH, LZMA, MBR, MSI, NSIS, NTFS, RAR, RPM, SquashFS, UDF, VHD, WIM, XAR and Z.
I have been using WinZip to compress files for as long as I can remember. A colleague mentioned another compression tool called 7-Zip to me a couple of weeks ago, and I am really impressed with the high compression ratio it achieves. Besides, it's free and supports all the compression file formats (.zip, .rar, .gz, .7z) that I know of. Below is a chart that compares the compressed file sizes between WinZip 10 and 7-Zip 4.65 on several files that I use in my daily work:

File                                  Uncompressed   .zip     .7z     % smaller
Virtex-6 UG365                        15.8MB         5.4MB    4MB     26%
ADEPT 0.38.6                          12.6MB         5.3MB    4.4MB   17%
SP601 Base Reference Design rdf0003   56.3MB         17.5MB   10MB    43%
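
The "% smaller" column appears to be the size reduction of the .7z file relative to the .zip file. That reading is an assumption, but the short check below reproduces the listed percentages from the sizes in the chart.

```python
# (zip size in MB, 7z size in MB) taken from the chart above
sizes_mb = {
    "Virtex-6 UG365": (5.4, 4.0),
    "ADEPT 0.38.6": (5.3, 4.4),
    "SP601 Base Reference Design rdf0003": (17.5, 10.0),
}

def percent_smaller(zip_mb, sevenz_mb):
    """Reduction of the .7z size relative to the .zip size, in whole percent."""
    return round(100.0 * (zip_mb - sevenz_mb) / zip_mb)

savings = {name: percent_smaller(z, s) for name, (z, s) in sizes_mb.items()}

for name, pct in savings.items():
    print(f"{name}: {pct}% smaller")
```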

Tuesday, October 27, 2009

Don't optimize my LUT please!

Sometimes you manually instantiate a LUT primitive (e.g. LUT6_2) to add routing delay to a signal path or to precisely control the routing resources used, only to find out that the tool either optimizes it out or swaps its pins. This can be prevented by using the LOCK_PINS and SAVE NET FLAG (S) constraints (Xilinx Constraints Guide). Below are the code snippets for Verilog and VHDL that work in ISE 11.3:

Sunday, October 11, 2009

Wait, what about DDR OFFSET IN/OUT using DCM clock with phase shift?

I recently wrote two blogs about DDR OFFSET constraints:
  • DDR OFFSET IN/OUT constraints with DCM
  • OFFSET IN constraints for source synchronous DDR inputs
Looks like I'm going to make a career out of talking about the OFFSET constraints on DDR IOs ;). Here comes another one, on DDR IOs clocked by a DCM clock with a phase shift.

The design example used here is exactly the same as in DDR OFFSET IN/OUT constraints with DCM, except that a 30 degree phase shift is added to the DCM CLK0 output. The clock period in this example is 20ns, so a 30 degree phase shift translates to ~1.7ns (20ns*30/360). The timing reports on the OFFSET constraints are almost the same. The only difference is that a ~1.7ns delay is now added to the arrival time of the clock rising and falling edges.
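
The conversion above can be checked with a few lines of Python. The sketch also shows one possible explanation for why the reports below list 1.641ns rather than 1.667ns: it assumes the DCM fixed phase shift is programmed as an integer number of 1/256ths of the clock period, which is not stated in this post.

```python
CLOCK_PERIOD_NS = 20.0
PHASE_SHIFT_DEG = 30.0

def phase_to_ns(degrees, period_ns):
    """Convert a phase shift in degrees to a delay in nanoseconds."""
    return period_ns * degrees / 360.0

ideal_ns = phase_to_ns(PHASE_SHIFT_DEG, CLOCK_PERIOD_NS)

# Assumption: the phase shift is quantized to whole steps of period/256
STEPS_PER_PERIOD = 256
steps = int(STEPS_PER_PERIOD * PHASE_SHIFT_DEG / 360.0)   # 21.33 truncates to 21
quantized_ns = CLOCK_PERIOD_NS * steps / STEPS_PER_PERIOD

print(f"ideal: {ideal_ns:.3f} ns, steps: {steps}, quantized: {quantized_ns:.3f} ns")
```

Under that assumption the quantized value rounds to the clock arrival time that the timing analyzer reports.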

Below are the timing reports showing the effect of the 30 degree (or 1.7ns) phase shift (highlighted in red) from the DCM:

Timing constraint: TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE COMP "clk_i"         "RISING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected. (0 setup errors, 0 hold errors)
 Minimum allowable offset is   2.718ns.
--------------------------------------------------------------------------------
Slack (setup path):     2.282ns (requirement - (data path - clock path - clock arrival + uncertainty))
  Source:               ddr_d_i (PAD)
  Destination:          IDDR2_inst (FF)
  Destination Clock:    clk1 rising at 1.641ns
  Requirement:          5.000ns
  Data Path Delay:      2.724ns (Levels of Logic = 2)
  Clock Path Delay:     -1.515ns (Levels of Logic = 4)
  Clock Uncertainty:    0.120ns

Timing constraint:  TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE COMP "clk_i"         "FALLING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected. (0 setup errors, 0 hold errors)
 Minimum allowable offset is   2.729ns.
--------------------------------------------------------------------------------
Slack (setup path):      2.271 ns (requirement - (data path - clock path - clock arrival + uncertainty))
  Source:                ddr_d_i  (PAD)
  Destination:           IDDR2_inst  (FF)
  Destination Clock:    clk1 falling at 1.641ns
  Requirement:          5.000ns
  Data Path Delay:      2.724ns (Levels of Logic = 2)
  Clock Path Delay:     -1.526ns (Levels of Logic = 4)
  Clock Uncertainty:    0.120ns

 Timing constraint:  TIMEGRP "TN_ddr_out_pads" OFFSET = OUT 8 ns AFTER COMP "clk_i" "RISING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected.
 Minimum allowable offset is   6.846ns.
--------------------------------------------------------------------------------
Slack (slowest paths):   1.154 ns (requirement - (clock arrival + clock path + data path + uncertainty))
  Source:                ODDR2_inst  (FF)
  Destination:           ddr_d_o  (PAD)
  Source Clock:         clk1 rising at 1.641ns
  Requirement:          8.000ns
  Data Path Delay:      3.561ns (Levels of Logic = 1)
  Clock Path Delay:     1.524ns (Levels of Logic = 4)
  Clock Uncertainty:    0.120ns


Timing constraint:  TIMEGRP "TN_ddr_out_pads" OFFSET = OUT 8 ns AFTER COMP "clk_i" "FALLING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected.
 Minimum allowable offset is   6.850ns.
--------------------------------------------------------------------------------
Slack (slowest paths):   1.150 ns (requirement - (clock arrival + clock path + data path + uncertainty))
  Source:                ODDR2_inst  (FF)
  Destination:           ddr_d_o  (PAD)
  Source Clock:         clk1 falling at 1.641ns
  Requirement:          8.000ns
  Data Path Delay:      3.579ns (Levels of Logic = 1)
  Clock Path Delay:     1.510ns (Levels of Logic = 4)
  Clock Uncertainty:    0.120ns
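
The slack numbers in the four reports follow directly from the formulas printed on each "Slack" line. The sketch below plugs the reported values into those formulas; the function names are made up for illustration.

```python
def offset_in_slack(requirement, data_path, clock_path, clock_arrival, uncertainty):
    """requirement - (data path - clock path - clock arrival + uncertainty)"""
    return requirement - (data_path - clock_path - clock_arrival + uncertainty)

def offset_out_slack(requirement, clock_arrival, clock_path, data_path, uncertainty):
    """requirement - (clock arrival + clock path + data path + uncertainty)"""
    return requirement - (clock_arrival + clock_path + data_path + uncertainty)

# Values copied from the reports above (all in ns)
in_rising = offset_in_slack(5.000, 2.724, -1.515, 1.641, 0.120)
in_falling = offset_in_slack(5.000, 2.724, -1.526, 1.641, 0.120)
out_rising = offset_out_slack(8.000, 1.641, 1.524, 3.561, 0.120)
out_falling = offset_out_slack(8.000, 1.641, 1.510, 3.579, 0.120)

for name, slack in [("IN rising", in_rising), ("IN falling", in_falling),
                    ("OUT rising", out_rising), ("OUT falling", out_falling)]:
    print(f"OFFSET {name}: slack = {slack:.3f} ns")
```

Note the opposite sign of the clock arrival term: the phase shift increases the OFFSET IN setup slack but eats into the OFFSET OUT slack.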

DDR OFFSET IN/OUT constraints with DCM

UG612: Xilinx Timing Constraints User Guide shows two options for constraining the OFFSET for DDR inputs and outputs. Option 1 is how OFFSET values for DDR IOs were constrained in ISE 9.x and earlier versions. Although it still works in the latest ISE versions (i.e. 10.x and 11.x), it has always been difficult and painful to use for a couple of reasons:
  • The offset values in the OFFSET constraints for the falling edge need to be manually adjusted, because the reference point (i.e. time 0) is always the rising edge of the clock.
  • If the clock for the DDR IOs comes from a DCM, you need to watch the ngdbuild report for the TIMESPEC on the DCM input clock not being propagated through the DCM because the TNM is used in more than one constraint.
With that, I highly recommend option 2 for people using the latest IDS. Below is a Spartan6 example with IDDR2, ODDR2, and DCM:

Design top level:
module ss_ddr (
     input  clk_i,
     input  rst_i,
     input  ddr_d_i,
     output ddr_d_o
);

wire clk0, clk180;
wire clk1, clk2; // DCM outputs, declared explicitly instead of relying on implicit nets
wire clkgen1_locked;
wire d_rising_d, d_falling_d;
reg d_rising_r, d_falling_r;

clkgen_dcm clkgen1 (
   // Clock in ports
  .CLK_IN1 (clk_i),
  // Clock out ports
  .CLK_OUT1 (clk1),
  .CLK_OUT2 (clk2),
  // Status and control signals
  .RESET    (rst_i),
  .LOCKED   (clkgen1_locked)
 );

assign clk0 = clk1;
assign clk180 = ~clk1;
    
IDDR2 #(
   .DDR_ALIGNMENT ("NONE"), // Sets output alignment to "NONE", "C0" or "C1" 
   .INIT_Q0       (1'b0), // Sets initial state of the Q0 output to 1'b0 or 1'b1
   .INIT_Q1       (1'b0), // Sets initial state of the Q1 output to 1'b0 or 1'b1
   .SRTYPE        ("SYNC") // Specifies "SYNC" or "ASYNC" set/reset
) IDDR2_inst (
   .Q0 (d_rising_d), // 1-bit output captured with C0 clock
   .Q1 (d_falling_d), // 1-bit output captured with C1 clock
   .C0 (clk0), // 1-bit clock input
   .C1 (clk180), // 1-bit clock input
   .CE (1'b1), // 1-bit clock enable input
   .D  (ddr_d_i),   // 1-bit DDR data input
   .R  (1'b0),   // 1-bit reset input
   .S  (1'b0)    // 1-bit set input
);


always @(posedge clk0)
    d_rising_r <= d_rising_d;

always @(posedge clk180)
    d_falling_r <= d_falling_d;


ODDR2 #(
      .DDR_ALIGNMENT ("NONE"), // Sets output alignment to "NONE", "C0" or "C1" 
      .INIT          (1'b0),    // Sets initial state of the Q output to 1'b0 or 1'b1
      .SRTYPE        ("SYNC") // Specifies "SYNC" or "ASYNC" set/reset
   ) ODDR2_inst (
      .Q  (ddr_d_o),   // 1-bit DDR output data
      .C0 (clk0),   // 1-bit clock input
      .C1 (clk180),   // 1-bit clock input
      .CE (1'b1), // 1-bit clock enable input
      .D0 (d_rising_r), // 1-bit data input (associated with C0)
      .D1 (d_falling_r), // 1-bit data input (associated with C1)
      .R  (1'b0),   // 1-bit reset input
      .S  (1'b0)    // 1-bit set input
 );

endmodule                  

UCF constraints:
NET "clk_i" TNM_NET = "TN_clk_i";
TIMESPEC TS_clk_i = PERIOD "TN_clk_i" 20 ns HIGH 50%;

#UG612: Option 2
INST "ddr_d_i" TNM = TN_ddr_in_pads;
TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE "clk_i" RISING;
TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE "clk_i" FALLING;

INST "ddr_d_o" TNM = TN_ddr_out_pads;
TIMEGRP "TN_ddr_out_pads" OFFSET = OUT 8 ns AFTER "clk_i" RISING;
TIMEGRP "TN_ddr_out_pads" OFFSET = OUT 8 ns AFTER "clk_i" FALLING;

Timing reports: (only OFFSET OUT reports are shown below. Please check this older blog for the OFFSET IN reports)
Timing constraint: TIMEGRP "TN_ddr_out_pads" OFFSET = OUT 8 ns AFTER COMP "clk_i" "RISING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected.
 Minimum allowable offset is   5.205ns.
--------------------------------------------------------------------------------
Slack (slowest paths):  2.795ns (requirement - (clock arrival + clock path + data path + uncertainty))
  Source:               ODDR2_inst (FF)
  Destination:          ddr_d_o (PAD)
  Source Clock:         clk1 rising at 0.000ns
  Requirement:          8.000ns
  Data Path Delay:      3.561ns (Levels of Logic = 1)
  Clock Path Delay:     1.524ns (Levels of Logic = 4)
  Clock Uncertainty:    0.120ns

  Clock Uncertainty:          0.120ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.120ns
    Phase Error (PE):           0.060ns

  Maximum Clock Path: clk_i to ODDR2_inst
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    N4.I                 Tiopi                 1.140   clk_i
                                                       clkgen1/clkin1_buf
                                                       ProtoComp0.IMUX
    BUFIO2_X1Y15.I       net (fanout=1)        0.418   clkgen1/clkin1
    BUFIO2_X1Y15.DIVCLK  Tbufcko_DIVCLK        0.070   SP6_BUFIO_INSERT_ML_BUFIO2_5
    DCM_X0Y1.CLKIN       net (fanout=1)        0.854   clkgen1/dcm_sp_inst_ML_NEW_DIVCLK
    DCM_X0Y1.CLK0        Tdmcko_CLK           -3.868   clkgen1/dcm_sp_inst
    BUFGMUX_X2Y3.I0      net (fanout=1)        0.943   clkgen1/clk0
    BUFGMUX_X2Y3.O       Tgi0o                 0.239   clkgen1/clkout1_buf
    OLOGIC_X0Y10.CLK0    net (fanout=7)        1.728   clk1
    -------------------------------------------------  ---------------------------
    Total                                      1.524ns (-2.419ns logic, 3.943ns route)

  Maximum Data Path: ODDR2_inst to ddr_d_o
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    OLOGIC_X0Y10.OQ      Tockq                 0.775   ODDR2_inst
    V3.O                 net (fanout=1)        0.296   ddr_d_o_OBUF
    V3.PAD               Tioop                 2.490   ddr_d_o_OBUF
                                                       ddr_d_o
    -------------------------------------------------  ---------------------------
    Total                                      3.561ns (3.265ns logic, 0.296ns route)
                                                       (91.7% logic, 8.3% route)

Timing constraint: TIMEGRP "TN_ddr_out_pads" OFFSET = OUT 8 ns AFTER COMP "clk_i" "FALLING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected.
 Minimum allowable offset is   5.209ns.
--------------------------------------------------------------------------------
Slack (slowest paths):  2.791ns (requirement - (clock arrival + clock path + data path + uncertainty))
  Source:               ODDR2_inst (FF)
  Destination:          ddr_d_o (PAD)
  Source Clock:         clk1 falling at 0.000ns
  Requirement:          8.000ns
  Data Path Delay:      3.579ns (Levels of Logic = 1)
  Clock Path Delay:     1.510ns (Levels of Logic = 4)
  Clock Uncertainty:    0.120ns

  Clock Uncertainty:          0.120ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.120ns
    Phase Error (PE):           0.060ns

  Maximum Clock Path: clk_i to ODDR2_inst
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    N4.I                 Tiopi                 1.140   clk_i
                                                       clkgen1/clkin1_buf
                                                       ProtoComp0.IMUX
    BUFIO2_X1Y15.I       net (fanout=1)        0.418   clkgen1/clkin1
    BUFIO2_X1Y15.DIVCLK  Tbufcko_DIVCLK        0.070   SP6_BUFIO_INSERT_ML_BUFIO2_5
    DCM_X0Y1.CLKIN       net (fanout=1)        0.854   clkgen1/dcm_sp_inst_ML_NEW_DIVCLK
    DCM_X0Y1.CLK0        Tdmcko_CLK           -3.868   clkgen1/dcm_sp_inst
    BUFGMUX_X2Y3.I0      net (fanout=1)        0.943   clkgen1/clk0
    BUFGMUX_X2Y3.O       Tgi0o                 0.239   clkgen1/clkout1_buf
    OLOGIC_X0Y10.CLK1    net (fanout=7)        1.714   clk1
    -------------------------------------------------  ---------------------------
    Total                                      1.510ns (-2.419ns logic, 3.929ns route)

  Maximum Data Path: ODDR2_inst to ddr_d_o
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    OLOGIC_X0Y10.OQ      Tockq                 0.793   ODDR2_inst
    V3.O                 net (fanout=1)        0.296   ddr_d_o_OBUF
    V3.PAD               Tioop                 2.490   ddr_d_o_OBUF
                                                       ddr_d_o
    -------------------------------------------------  ---------------------------
    Total                                      3.579ns (3.283ns logic, 0.296ns route)
                                                       (91.7% logic, 8.3% route)
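
The clock uncertainty and the "Minimum allowable offset" lines in the two reports above can be reproduced from the numbers they list. This is a small sketch using the uncertainty formula exactly as printed in the report; the function name is hypothetical.

```python
import math

def clock_uncertainty(tsj, tij, dj, pe):
    """((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE, all values in ns."""
    return (math.sqrt(tsj ** 2 + tij ** 2) + dj) / 2.0 + pe

uncertainty = clock_uncertainty(tsj=0.000, tij=0.000, dj=0.120, pe=0.060)

# Minimum allowable offset = clock arrival + clock path + data path + uncertainty
clock_arrival = 0.000
min_offset_rising = clock_arrival + 1.524 + 3.561 + uncertainty
min_offset_falling = clock_arrival + 1.510 + 3.579 + uncertainty

print(f"uncertainty: {uncertainty:.3f} ns")
print(f"min offset (rising):  {min_offset_rising:.3f} ns")
print(f"min offset (falling): {min_offset_falling:.3f} ns")
```

Subtracting either minimum offset from the 8ns requirement gives the slack shown in the corresponding report.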

OK, I know this blog is long, but here comes the reward for those who stick around: the complete ISE project targeting Spartan6 6slx45t is available for download here.

Monday, October 5, 2009

OFFSET IN constraints for source synchronous DDR inputs

There are several ways to set up OFFSET IN constraints for source synchronous DDR inputs. Personally, I like to put all inputs into a timing group and add the OFFSET IN constraint on the timing group. This method has several advantages:
  • Inputs with different names can be easily grouped together.
  • Only one (SDR) or two (DDR) OFFSET IN constraints are required for each timing group.
  • Timing report is more concise because of fewer OFFSET IN constraints required.
Below is a Spartan6 design example with just one IDDR2 primitive instantiated:
module ss_ddr_in (
     input  clk_i,
     input  ddr_d_i,
     output d_rising_o,
     output d_falling_o
);

wire clk0, clk180;

assign clk0 = clk_i;
assign clk180 = ~clk_i;
    
IDDR2 #(
   .DDR_ALIGNMENT ("NONE"), // Sets output alignment to "NONE", "C0" or "C1" 
   .INIT_Q0       (1'b0), // Sets initial state of the Q0 output to 1'b0 or 1'b1
   .INIT_Q1       (1'b0), // Sets initial state of the Q1 output to 1'b0 or 1'b1
   .SRTYPE        ("SYNC") // Specifies "SYNC" or "ASYNC" set/reset
) IDDR2_inst (
   .Q0 (d_rising_o), // 1-bit output captured with C0 clock
   .Q1 (d_falling_o), // 1-bit output captured with C1 clock
   .C0 (clk0), // 1-bit clock input
   .C1 (clk180), // 1-bit clock input
   .CE (1'b1), // 1-bit clock enable input
   .D  (ddr_d_i),   // 1-bit DDR data input
   .R  (1'b0),   // 1-bit reset input
   .S  (1'b0)    // 1-bit set input
);

endmodule
Below are UCF constraints for the data input.
NET "clk_i" TNM_NET = clk_i;
TIMESPEC TS_clk_i = PERIOD "clk_i" 20 ns HIGH 50%;

INST "ddr_d_i" TNM = TN_ddr_in_pads;
TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE "clk_i" RISING;
TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE "clk_i" FALLING;
Timing report on the OFFSET constraints:
Timing constraint: TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE COMP "clk_i"         "RISING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected. (0 setup errors, 0 hold errors)
 Offset is  -0.666ns.
--------------------------------------------------------------------------------
Slack (setup path):     5.666ns (requirement - (data path - clock path - clock arrival + uncertainty))
  Source:               ddr_d_i (PAD)
  Destination:          IDDR2_inst (FF)
  Destination Clock:    clk_i_BUFGP rising at 0.000ns
  Requirement:          5.000ns
  Data Path Delay:      2.724ns (Levels of Logic = 2)
  Clock Path Delay:     3.390ns (Levels of Logic = 2)
  Clock Uncertainty:    0.000ns

  Maximum Data Path: ddr_d_i to IDDR2_inst
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    W4.I                 Tiopi                 1.140   ddr_d_i
                                                       ddr_d_i_IBUF
                                                       ProtoComp0.IMUX.1
    ILOGIC_X0Y7.D        net (fanout=1)        0.128   ddr_d_i_IBUF
    ILOGIC_X0Y7.CLK0     Tidock                1.456   ProtoComp2.D2OFFBYP_SRC
                                                       IDDR2_inst
    -------------------------------------------------  ---------------------------
    Total                                      2.724ns (2.596ns logic, 0.128ns route)
                                                       (95.3% logic, 4.7% route)

  Minimum Clock Path: clk_i to IDDR2_inst
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    N4.I                 Tiopi                 1.049   clk_i
                                                       SP6_AUTOBUF_BUFGP_ML_IBUF_4
                                                       ProtoComp0.IMUX
    BUFGMUX_X3Y13.I0     net (fanout=1)        0.747   SP6_AUTOBUF_BUFGP_ML_IBUF_4_ML_NEW_I
    BUFGMUX_X3Y13.O      Tgi0o                 0.220   clk_i_BUFGP
    ILOGIC_X0Y7.CLK0     net (fanout=2)        1.374   clk_i_BUFGP
    -------------------------------------------------  ---------------------------
    Total                                      3.390ns (1.269ns logic, 2.121ns route)
                                                       (37.4% logic, 62.6% route)
Timing constraint: TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE COMP "clk_i"         "FALLING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected. (0 setup errors, 0 hold errors)
 Offset is  -0.666ns.
--------------------------------------------------------------------------------
Slack (setup path):     5.666ns (requirement - (data path - clock path - clock arrival + uncertainty))
  Source:               ddr_d_i (PAD)
  Destination:          IDDR2_inst (FF)
  Destination Clock:    clk_i_BUFGP falling at 0.000ns
  Requirement:          5.000ns
  Data Path Delay:      2.724ns (Levels of Logic = 2)
  Clock Path Delay:     3.390ns (Levels of Logic = 2)
  Clock Uncertainty:    0.000ns

  Maximum Data Path: ddr_d_i to IDDR2_inst
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    W4.I                 Tiopi                 1.140   ddr_d_i
                                                       ddr_d_i_IBUF
                                                       ProtoComp0.IMUX.1
    ILOGIC_X0Y7.D        net (fanout=1)        0.128   ddr_d_i_IBUF
    ILOGIC_X0Y7.CLK1     Tidock                1.456   ProtoComp2.D2OFFBYP_SRC
                                                       IDDR2_inst
    -------------------------------------------------  ---------------------------
    Total                                      2.724ns (2.596ns logic, 0.128ns route)
                                                       (95.3% logic, 4.7% route)

  Minimum Clock Path: clk_i to IDDR2_inst
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    N4.I                 Tiopi                 1.049   clk_i
                                                       SP6_AUTOBUF_BUFGP_ML_IBUF_4
                                                       ProtoComp0.IMUX
    BUFGMUX_X3Y13.I0     net (fanout=1)        0.747   SP6_AUTOBUF_BUFGP_ML_IBUF_4_ML_NEW_I
    BUFGMUX_X3Y13.O      Tgi0o                 0.220   clk_i_BUFGP
    ILOGIC_X0Y7.CLK1     net (fanout=2)        1.374   clk_i_BUFGP
    -------------------------------------------------  ---------------------------
    Total                                      3.390ns (1.269ns logic, 2.121ns route)
                                                       (37.4% logic, 62.6% route)
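
To make the report easier to read, the sketch below rebuilds the path totals from the individual rows of the rising-edge report and derives the reported offset and slack from them. The row classification into "logic" and "route" follows the report; the helper name path_totals is made up for illustration.

```python
# (delay type, delay in ns, kind) rows copied from the rising-edge report above
DATA_PATH = [("Tiopi", 1.140, "logic"), ("net", 0.128, "route"), ("Tidock", 1.456, "logic")]
CLOCK_PATH = [("Tiopi", 1.049, "logic"), ("net", 0.747, "route"),
              ("Tgi0o", 0.220, "logic"), ("net", 1.374, "route")]

def path_totals(rows):
    """Return (total, logic, route) delay in ns for a list of report rows."""
    logic = sum(delay for _, delay, kind in rows if kind == "logic")
    route = sum(delay for _, delay, kind in rows if kind == "route")
    return logic + route, logic, route

data_total, data_logic, data_route = path_totals(DATA_PATH)
clock_total, clock_logic, clock_route = path_totals(CLOCK_PATH)

requirement = 5.000
clock_arrival = 0.000
uncertainty = 0.000

# Offset actually needed by the design; negative means data may arrive after the edge
offset = data_total - clock_total - clock_arrival + uncertainty
slack = requirement - offset

print(f"data path:  {data_total:.3f} ns ({100 * data_logic / data_total:.1f}% logic)")
print(f"clock path: {clock_total:.3f} ns ({100 * clock_logic / clock_total:.1f}% logic)")
print(f"offset: {offset:.3f} ns, slack: {slack:.3f} ns")
```

The negative offset comes from the clock path being longer than the data path, which is why the slack is larger than the 5ns requirement.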

Please note that ISE continues to improve the constraint syntax and report format, so you may see things work slightly differently in previous versions of ISE.

Below is a list of references that can be very useful:

Monday, September 28, 2009

OFFSET IN constraints on diff inputs ignored in IDS 11.3. Fixed in 12.4

Below is a simple test case with differential input clock and data.
`timescale 1ns / 1ps

module s3a_ibufds (
    input  clk_i_p, clk_i_n,
    input  d_i_p, d_i_n,                      
    output d_o_p, d_o_n                      
    );


wire d_in, clk_in;
reg d_r1, d_r2;

IBUFDS #(
   .IBUF_DELAY_VALUE("0"),    // Specify the amount of added input delay for
                              //    the buffer: "0"-"16" (Spartan-3A)
   .IFD_DELAY_VALUE("AUTO"),  // Specify the amount of added delay for input
                              //    register: "AUTO", "0"-"8" (Spartan-3A)
   .IOSTANDARD("DEFAULT")     // Specify the input I/O standard
) IBUFDS_clk (
   .O  (clk_in),  // Buffer output
   .I  (clk_i_p),  // Diff_p buffer input (connect directly to top-level port)
   .IB (clk_i_n) // Diff_n buffer input (connect directly to top-level port)
);

IBUFDS #(
   .IBUF_DELAY_VALUE("0"),    // Specify the amount of added input delay for
                              //    the buffer: "0"-"16" (Spartan-3A)
   .IFD_DELAY_VALUE("AUTO"),  // Specify the amount of added delay for input
                              //    register: "AUTO", "0"-"8" (Spartan-3A)
   .IOSTANDARD("DEFAULT")     // Specify the input I/O standard
) IBUFDS_d (
   .O  (d_in),  // Buffer output
   .I  (d_i_p),  // Diff_p buffer input (connect directly to top-level port)
   .IB (d_i_n) // Diff_n buffer input (connect directly to top-level port)
);


always @(posedge clk_in) begin
    d_r1 <= d_in;
    d_r2 <= d_r1;
end

assign d_o_p = d_r2;

endmodule

When the OFFSET IN constraints are specified on the input nets, they are simply ignored by the timing analyzer in ISE 11.3 (see the UCF constraints and TA snapshot below):

NET "clk_i_p" TNM_NET = clk_i_p;
TIMESPEC TS_clk_i_p = PERIOD "clk_i_p" 20 ns HIGH 50%;
NET "d_i_p" OFFSET = IN 2 ns VALID 4 ns BEFORE "clk_i_p" RISING;
NET "d_i_n" OFFSET = IN 2 ns VALID 4 ns BEFORE "clk_i_p" RISING;


This is probably caused by a bug in TA. As a workaround, the OFFSET IN constraints can also be specified with a TIMEGRP, and the tool will then correctly analyze the timing constraint (see the UCF constraints and TA snapshot below).
[Update Feb 10th, 2011: Verified that the bug has been fixed in IDS 12.4] The project archive can be downloaded here in case anybody is interested.

NET "clk_i_p" TNM_NET = clk_i_p;
TIMESPEC TS_clk_i_p = PERIOD "clk_i_p" 20 ns HIGH 50%; 
INST "d_i_p" TNM = TN_d_pads;
INST "d_i_n" TNM = TN_d_pads;
TIMEGRP "TN_d_pads" OFFSET = IN 2 ns VALID 4 ns BEFORE clk_i_p; 
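
As a reading aid for the OFFSET = IN ... VALID ... syntax used in these posts, the sketch below converts a constraint into the data-valid window relative to the clock edge. It reflects my understanding that the VALID duration starts at the OFFSET time before the edge; the function name valid_window is hypothetical.

```python
def valid_window(offset_ns, valid_ns):
    """Return (start, end) of the data-valid window relative to the clock edge at t=0.

    Negative times are before the edge. Assumes the VALID window
    begins offset_ns before the edge and lasts valid_ns.
    """
    start = -offset_ns
    end = start + valid_ns
    return start, end

# OFFSET = IN 2 ns VALID 4 ns BEFORE clk_i_p (this post)
sdr_window = valid_window(2.0, 4.0)

# OFFSET = IN 5 ns VALID 10 ns BEFORE "clk_i" (the DDR posts)
ddr_window = valid_window(5.0, 10.0)

print(sdr_window, ddr_window)
```

In both cases the window is centered on the clock edge, so the implied setup and hold requirements are equal.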
 

Wednesday, September 23, 2009

Virtex6 Configuration Examples

The web links below will bring you directly to the block diagram for the selected configuration mode from the "Virtex6 FPGA Configuration User Guide" (UG360 v2.0 Nov 15, 2009). This is also the landing page for ADEPT (website and blog) when the "Example" button on the "Special Pin Setup" window is pressed.

Friday, September 18, 2009

Virtex4 ODDR tristate control

The tristate control of the tristate buffer in IOB is active low. It's important to code this in RTL for the tool to correctly implement it in HW. e.g.

assign dqbit = (ts_ddrbit == 1'b0) ? out_ddrbit : 1'bz;

A code example with IDDR, ODDRs for both the data and tristate control in Virtex4 is shown below. It's tested with Xilinx IDS 11.3.


`timescale 1ns / 1ps

module tristate_oddr(
input iddr_clk,
input oddr_clk,
inout dqbit,
input ts_ddrbit1,
input ts_ddrbit2,
input dq_bit1,
input dq_bit2,

output test_out1,
output test_out2
);

wire out_ddrbit, ts_ddrbit;

ODDR #(
.DDR_CLK_EDGE("OPPOSITE_EDGE"), // "OPPOSITE_EDGE" or "SAME_EDGE"
.INIT (1'b0), // Initial value of Q: 1'b0 or 1'b1
.SRTYPE ("SYNC") // Set/Reset type: "SYNC" or "ASYNC"
) oddr_out_ddrbit (
.Q (out_ddrbit), // 1-bit DDR output
.C (oddr_clk), // 1-bit clock input
.CE (1'b1), // 1-bit clock enable input
.D1 (dq_bit1), // 1-bit data input (positive edge)
.D2 (dq_bit2), // 1-bit data input (negative edge)
.R (1'b0), // 1-bit reset
.S (1'b0) // 1-bit set
);

ODDR #(
.DDR_CLK_EDGE("OPPOSITE_EDGE"), // "OPPOSITE_EDGE" or "SAME_EDGE"
.INIT (1'b0), // Initial value of Q: 1'b0 or 1'b1
.SRTYPE ("SYNC") // Set/Reset type: "SYNC" or "ASYNC"
) oddr_ts_ddrbit (
.Q (ts_ddrbit), // 1-bit DDR output
.C (oddr_clk), // 1-bit clock input
.CE (1'b1), // 1-bit clock enable input
.D1 (ts_ddrbit1), // 1-bit data input (positive edge)
.D2 (ts_ddrbit2), // 1-bit data input (negative edge)
.R (1'b0), // 1-bit reset
.S (1'b0) // 1-bit set
);


//IMPORTANT: the tristate control of the tristate buffer in IOB is active low.
assign dqbit = (ts_ddrbit == 1'b0) ? out_ddrbit : 1'bz;


IDDR #(
.DDR_CLK_EDGE ("OPPOSITE_EDGE"), // "OPPOSITE_EDGE", "SAME_EDGE"
// or "SAME_EDGE_PIPELINED"
.INIT_Q1 (1'b0), // Initial value of Q1: 1'b0 or 1'b1
.INIT_Q2 (1'b0), // Initial value of Q2: 1'b0 or 1'b1
.SRTYPE ("SYNC") // Set/Reset type: "SYNC" or "ASYNC"
) IDDR_dqbit (
.Q1 (test_out1), // 1-bit output for positive edge of clock
.Q2 (test_out2), // 1-bit output for negative edge of clock
.C (iddr_clk), // 1-bit clock input
.CE (1'b1), // 1-bit clock enable input
.D (dqbit), // 1-bit DDR data input
.R (1'b0), // 1-bit reset
.S (1'b0) // 1-bit set
);

endmodule


Below is what the implementation looks like in FPGA_EDITOR:

IDDR, ODDR and IOBUF:


Inside ODDR:


Inside IOB: