Tuesday, October 27, 2009

Don't optimize my LUT please!

Sometimes you manually instantiate a LUT primitive (e.g. LUT6_2) to add routing delay to a signal path or to precisely control the routing resources used, only to find out that the tool either optimizes it out or swaps its pins. This can be prevented by using the LOCK_PINS and SAVE NET FLAG (S) constraints (see the Xilinx Constraints Guide). Below are the code snippets for Verilog and VHDL that work in ISE 11.3:

Sunday, October 11, 2009

Wait, what about DDR OFFSET IN/OUT using DCM clock with phase shift?

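I recently wrote two posts about DDR OFFSET constraints:
  • DDR OFFSET IN/OUT constraints with DCM
  • OFFSET IN constraints for source synchronous DDR inputs
Looks like I'm going to make a career out of talking about the OFFSET constraints on DDR IOs ;). Here comes another one, this time on DDR IOs clocked by a DCM clock with a phase shift.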

The design example used here is exactly the same as in DDR OFFSET IN/OUT constraints with DCM, except that a 30 degree phase shift is added to the DCM CLK0 output. The clock period in this example is 20ns, so a 30 degree phase shift translates to ~1.7ns (20ns*30/360). The timing reports on the OFFSET constraints are almost the same. The only difference is that the rising and falling edges of the clock now arrive ~1.7ns later.
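
As a quick check of the arithmetic above, here is a minimal Python sketch. The function names are made up for illustration. The second function rests on an assumption of mine: that the fixed phase shift of the DCM is set in whole steps of 1/256 of the clock period. If that holds, 30 degrees rounds to 21 steps, which would explain why the reports show a clock arrival of 1.641ns rather than 1.667ns.

```python
def phase_shift_ns(period_ns, degrees):
    """Convert a phase shift in degrees to a delay in ns."""
    return period_ns * degrees / 360.0


def quantized_phase_shift_ns(period_ns, degrees, steps_per_period=256):
    """Round the shift to a whole number of steps.

    steps_per_period=256 is an assumption about the DCM fixed phase shift
    granularity, not something stated in the timing report.
    """
    steps = round(degrees / 360.0 * steps_per_period)
    return steps, steps * period_ns / steps_per_period


if __name__ == "__main__":
    print("ideal shift:     %.3f ns" % phase_shift_ns(20.0, 30.0))
    steps, shift = quantized_phase_shift_ns(20.0, 30.0)
    print("quantized shift: %d steps = %.3f ns" % (steps, shift))
```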

Below are the timing reports showing the effect of the 30 degree (or ~1.7ns) phase shift from the DCM. It shows up as the clock arrival time (clk1 rising/falling at 1.641ns) in each report:

Timing constraint: TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE COMP "clk_i"         "RISING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected. (0 setup errors, 0 hold errors)
 Minimum allowable offset is   2.718ns.
--------------------------------------------------------------------------------
Slack (setup path):     2.282ns (requirement - (data path - clock path - clock arrival + uncertainty))
  Source:               ddr_d_i (PAD)
  Destination:          IDDR2_inst (FF)
  Destination Clock:    clk1 rising at 1.641ns
  Requirement:          5.000ns
  Data Path Delay:      2.724ns (Levels of Logic = 2)
  Clock Path Delay:     -1.515ns (Levels of Logic = 4)
  Clock Uncertainty:    0.120ns

Timing constraint:  TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE COMP "clk_i"         "FALLING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected. (0 setup errors, 0 hold errors)
 Minimum allowable offset is   2.729ns.
--------------------------------------------------------------------------------
Slack (setup path):      2.271 ns (requirement - (data path - clock path - clock arrival + uncertainty))
  Source:                ddr_d_i  (PAD)
  Destination:           IDDR2_inst  (FF)
  Destination Clock:    clk1 falling at 1.641ns
  Requirement:          5.000ns
  Data Path Delay:      2.724ns (Levels of Logic = 2)
  Clock Path Delay:     -1.526ns (Levels of Logic = 4)
  Clock Uncertainty:    0.120ns

 Timing constraint:  TIMEGRP "TN_ddr_out_pads" OFFSET = OUT 8 ns AFTER COMP "clk_i" "RISING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected.
 Minimum allowable offset is   6.846ns.
--------------------------------------------------------------------------------
Slack (slowest paths):   1.154 ns (requirement - (clock arrival + clock path + data path + uncertainty))
  Source:                ODDR2_inst  (FF)
  Destination:           ddr_d_o  (PAD)
  Source Clock:         clk1 rising at 1.641ns
  Requirement:          8.000ns
  Data Path Delay:      3.561ns (Levels of Logic = 1)
  Clock Path Delay:     1.524ns (Levels of Logic = 4)
  Clock Uncertainty:    0.120ns


Timing constraint:  TIMEGRP "TN_ddr_out_pads" OFFSET = OUT 8 ns AFTER COMP "clk_i" "FALLING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected.
 Minimum allowable offset is   6.850ns.
--------------------------------------------------------------------------------
Slack (slowest paths):   1.150 ns (requirement - (clock arrival + clock path + data path + uncertainty))
  Source:                ODDR2_inst  (FF)
  Destination:           ddr_d_o  (PAD)
  Source Clock:         clk1 falling at 1.641ns
  Requirement:          8.000ns
  Data Path Delay:      3.579ns (Levels of Logic = 1)
  Clock Path Delay:     1.510ns (Levels of Logic = 4)
  Clock Uncertainty:    0.120ns
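
The slack numbers in these four reports can be reproduced from the formulas printed on the Slack lines. Below is a small Python sketch that does so. The function names are mine, and the inputs are copied from the reports above.

```python
def offset_in_slack(requirement, data_path, clock_path, clock_arrival, uncertainty):
    """Setup slack for OFFSET IN, as printed in the report:
    requirement - (data path - clock path - clock arrival + uncertainty)."""
    return requirement - (data_path - clock_path - clock_arrival + uncertainty)


def offset_out_slack(requirement, clock_arrival, clock_path, data_path, uncertainty):
    """Slack for OFFSET OUT, as printed in the report:
    requirement - (clock arrival + clock path + data path + uncertainty)."""
    return requirement - (clock_arrival + clock_path + data_path + uncertainty)


# All values in ns, taken from the four reports above.
SLACKS = {
    "OFFSET IN  rising":  offset_in_slack(5.000, 2.724, -1.515, 1.641, 0.120),
    "OFFSET IN  falling": offset_in_slack(5.000, 2.724, -1.526, 1.641, 0.120),
    "OFFSET OUT rising":  offset_out_slack(8.000, 1.641, 1.524, 3.561, 0.120),
    "OFFSET OUT falling": offset_out_slack(8.000, 1.641, 1.510, 3.579, 0.120),
}

if __name__ == "__main__":
    for name, slack in SLACKS.items():
        print("%-20s slack = %.3f ns" % (name, slack))
```

Note how the 1.641ns clock arrival helps the OFFSET IN slack (it is subtracted inside the parentheses) and hurts the OFFSET OUT slack (it is added).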

DDR OFFSET IN/OUT constraints with DCM

UG612: Xilinx Timing Constraints User Guide shows two options for constraining the OFFSET for DDR inputs and outputs. Option 1 is how OFFSET values for DDR IOs were constrained in ISE 9.x and earlier versions. It still works in the latest ISE versions (i.e. 10.x and 11.x), but it has always been difficult and painful to use, for a couple of reasons:
  • The offset values in the OFFSET constraints for the falling edge need to be manually adjusted, because the reference point (i.e. time 0) is always the rising edge of the clock.
  • If the clock for the DDR IOs comes from a DCM, you need to check the ngdbuild report for the TIMESPEC on the DCM input clock not being propagated through the DCM due to the TNM being used in more than one constraint.
With that, I highly recommend option 2 for people using the latest ISE Design Suite (IDS). Below is a Spartan6 example with IDDR2, ODDR2, and DCM:

Design top level:
module ss_ddr (
     input  clk_i,
     input  rst_i,
     input  ddr_d_i,
     output ddr_d_o
);

wire clk1, clk2;      // DCM outputs (clk2 is not used in this example)
wire clk0, clk180;
wire clkgen1_locked;
wire d_rising_d, d_falling_d;
reg d_rising_r, d_falling_r;

clkgen_dcm clkgen1 (
   // Clock in ports
  .CLK_IN1 (clk_i),
  // Clock out ports
  .CLK_OUT1 (clk1),
  .CLK_OUT2 (clk2),
  // Status and control signals
  .RESET    (rst_i),
  .LOCKED   (clkgen1_locked)
 );

assign clk0 = clk1;
assign clk180 = ~clk1;
    
IDDR2 #(
   .DDR_ALIGNMENT ("NONE"), // Sets output alignment to "NONE", "C0" or "C1" 
   .INIT_Q0       (1'b0), // Sets initial state of the Q0 output to 1'b0 or 1'b1
   .INIT_Q1       (1'b0), // Sets initial state of the Q1 output to 1'b0 or 1'b1
   .SRTYPE        ("SYNC") // Specifies "SYNC" or "ASYNC" set/reset
) IDDR2_inst (
   .Q0 (d_rising_d), // 1-bit output captured with C0 clock
   .Q1 (d_falling_d), // 1-bit output captured with C1 clock
   .C0 (clk0), // 1-bit clock input
   .C1 (clk180), // 1-bit clock input
   .CE (1'b1), // 1-bit clock enable input
   .D  (ddr_d_i),   // 1-bit DDR data input
   .R  (1'b0),   // 1-bit reset input
   .S  (1'b0)    // 1-bit set input
);


always @(posedge clk0)
    d_rising_r <= d_rising_d;

always @(posedge clk180)
    d_falling_r <= d_falling_d;


ODDR2 #(
      .DDR_ALIGNMENT ("NONE"), // Sets output alignment to "NONE", "C0" or "C1" 
      .INIT          (1'b0),    // Sets initial state of the Q output to 1'b0 or 1'b1
      .SRTYPE        ("SYNC") // Specifies "SYNC" or "ASYNC" set/reset
   ) ODDR2_inst (
      .Q  (ddr_d_o),   // 1-bit DDR output data
      .C0 (clk0),   // 1-bit clock input
      .C1 (clk180),   // 1-bit clock input
      .CE (1'b1), // 1-bit clock enable input
      .D0 (d_rising_r), // 1-bit data input (associated with C0)
      .D1 (d_falling_r), // 1-bit data input (associated with C1)
      .R  (1'b0),   // 1-bit reset input
      .S  (1'b0)    // 1-bit set input
 );

endmodule                  

UCF constraints:
NET "clk_i" TNM_NET = "TN_clk_i";
TIMESPEC TS_clk_i = PERIOD "TN_clk_i" 20 ns HIGH 50%;

#UG612: Option 2
INST "ddr_d_i" TNM = TN_ddr_in_pads;
TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE "clk_i" RISING;
TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE "clk_i" FALLING;

INST "ddr_d_o" TNM = TN_ddr_out_pads;
TIMEGRP "TN_ddr_out_pads" OFFSET = OUT 8 ns AFTER "clk_i" RISING;
TIMEGRP "TN_ddr_out_pads" OFFSET = OUT 8 ns AFTER "clk_i" FALLING;

Timing reports (only the OFFSET OUT reports are shown below; for the OFFSET IN reports, please check the older post "OFFSET IN constraints for source synchronous DDR inputs"):
Timing constraint: TIMEGRP "TN_ddr_out_pads" OFFSET = OUT 8 ns AFTER COMP "clk_i" "RISING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected.
 Minimum allowable offset is   5.205ns.
--------------------------------------------------------------------------------
Slack (slowest paths):  2.795ns (requirement - (clock arrival + clock path + data path + uncertainty))
  Source:               ODDR2_inst (FF)
  Destination:          ddr_d_o (PAD)
  Source Clock:         clk1 rising at 0.000ns
  Requirement:          8.000ns
  Data Path Delay:      3.561ns (Levels of Logic = 1)
  Clock Path Delay:     1.524ns (Levels of Logic = 4)
  Clock Uncertainty:    0.120ns

  Clock Uncertainty:          0.120ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.120ns
    Phase Error (PE):           0.060ns

  Maximum Clock Path: clk_i to ODDR2_inst
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    N4.I                 Tiopi                 1.140   clk_i
                                                       clkgen1/clkin1_buf
                                                       ProtoComp0.IMUX
    BUFIO2_X1Y15.I       net (fanout=1)        0.418   clkgen1/clkin1
    BUFIO2_X1Y15.DIVCLK  Tbufcko_DIVCLK        0.070   SP6_BUFIO_INSERT_ML_BUFIO2_5
    DCM_X0Y1.CLKIN       net (fanout=1)        0.854   clkgen1/dcm_sp_inst_ML_NEW_DIVCLK
    DCM_X0Y1.CLK0        Tdmcko_CLK           -3.868   clkgen1/dcm_sp_inst
    BUFGMUX_X2Y3.I0      net (fanout=1)        0.943   clkgen1/clk0
    BUFGMUX_X2Y3.O       Tgi0o                 0.239   clkgen1/clkout1_buf
    OLOGIC_X0Y10.CLK0    net (fanout=7)        1.728   clk1
    -------------------------------------------------  ---------------------------
    Total                                      1.524ns (-2.419ns logic, 3.943ns route)

  Maximum Data Path: ODDR2_inst to ddr_d_o
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    OLOGIC_X0Y10.OQ      Tockq                 0.775   ODDR2_inst
    V3.O                 net (fanout=1)        0.296   ddr_d_o_OBUF
    V3.PAD               Tioop                 2.490   ddr_d_o_OBUF
                                                       ddr_d_o
    -------------------------------------------------  ---------------------------
    Total                                      3.561ns (3.265ns logic, 0.296ns route)
                                                       (91.7% logic, 8.3% route)

Timing constraint: TIMEGRP "TN_ddr_out_pads" OFFSET = OUT 8 ns AFTER COMP "clk_i" "FALLING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected.
 Minimum allowable offset is   5.209ns.
--------------------------------------------------------------------------------
Slack (slowest paths):  2.791ns (requirement - (clock arrival + clock path + data path + uncertainty))
  Source:               ODDR2_inst (FF)
  Destination:          ddr_d_o (PAD)
  Source Clock:         clk1 falling at 0.000ns
  Requirement:          8.000ns
  Data Path Delay:      3.579ns (Levels of Logic = 1)
  Clock Path Delay:     1.510ns (Levels of Logic = 4)
  Clock Uncertainty:    0.120ns

  Clock Uncertainty:          0.120ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.120ns
    Phase Error (PE):           0.060ns

  Maximum Clock Path: clk_i to ODDR2_inst
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    N4.I                 Tiopi                 1.140   clk_i
                                                       clkgen1/clkin1_buf
                                                       ProtoComp0.IMUX
    BUFIO2_X1Y15.I       net (fanout=1)        0.418   clkgen1/clkin1
    BUFIO2_X1Y15.DIVCLK  Tbufcko_DIVCLK        0.070   SP6_BUFIO_INSERT_ML_BUFIO2_5
    DCM_X0Y1.CLKIN       net (fanout=1)        0.854   clkgen1/dcm_sp_inst_ML_NEW_DIVCLK
    DCM_X0Y1.CLK0        Tdmcko_CLK           -3.868   clkgen1/dcm_sp_inst
    BUFGMUX_X2Y3.I0      net (fanout=1)        0.943   clkgen1/clk0
    BUFGMUX_X2Y3.O       Tgi0o                 0.239   clkgen1/clkout1_buf
    OLOGIC_X0Y10.CLK1    net (fanout=7)        1.714   clk1
    -------------------------------------------------  ---------------------------
    Total                                      1.510ns (-2.419ns logic, 3.929ns route)

  Maximum Data Path: ODDR2_inst to ddr_d_o
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    OLOGIC_X0Y10.OQ      Tockq                 0.793   ODDR2_inst
    V3.O                 net (fanout=1)        0.296   ddr_d_o_OBUF
    V3.PAD               Tioop                 2.490   ddr_d_o_OBUF
                                                       ddr_d_o
    -------------------------------------------------  ---------------------------
    Total                                      3.579ns (3.283ns logic, 0.296ns route)
                                                       (91.7% logic, 8.3% route)
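
The Clock Uncertainty line in both reports can be checked the same way. The sketch below is a minimal Python version of the formula printed in the report. Reading "^1/2" as a square root is my interpretation, and the function name is made up for illustration.

```python
import math


def clock_uncertainty(tsj, tij, dj, pe):
    """Clock uncertainty as printed in the report:
    ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE, with ^1/2 read as a square root."""
    return (math.sqrt(tsj ** 2 + tij ** 2) + dj) / 2.0 + pe


if __name__ == "__main__":
    # Values in ns from the report: TSJ = 0, TIJ = 0, DJ = 0.120, PE = 0.060
    print("clock uncertainty = %.3f ns" % clock_uncertainty(0.000, 0.000, 0.120, 0.060))
```

With no system or input jitter, half of the discrete jitter (0.060ns) plus the phase error (0.060ns) gives the 0.120ns shown in the reports.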

OK, I know this post is long, but here comes the reward for those who stuck around: the complete ISE project targeting Spartan6 6slx45t is available for download here.

Monday, October 5, 2009

OFFSET IN constraints for source synchronous DDR inputs

There are several ways to set up OFFSET IN constraints for source synchronous DDR inputs. Personally, I like to put all inputs into a timing group and apply the OFFSET IN constraints to that group. This method has several advantages:
  • Inputs with different names can be easily grouped together.
  • Only one (SDR) or two (DDR) OFFSET IN constraints are required for each timing group.
  • The timing report is more concise because fewer OFFSET IN constraints are required.
Below is a Spartan6 design example with just one IDDR2 primitive instantiated:
module ss_ddr_in (
     input  clk_i,
     input  ddr_d_i,
     output d_rising_o,
     output d_falling_o
);

wire clk0, clk180;

assign clk0 = clk_i;
assign clk180 = ~clk_i;
    
IDDR2 #(
   .DDR_ALIGNMENT ("NONE"), // Sets output alignment to "NONE", "C0" or "C1" 
   .INIT_Q0       (1'b0), // Sets initial state of the Q0 output to 1'b0 or 1'b1
   .INIT_Q1       (1'b0), // Sets initial state of the Q1 output to 1'b0 or 1'b1
   .SRTYPE        ("SYNC") // Specifies "SYNC" or "ASYNC" set/reset
) IDDR2_inst (
   .Q0 (d_rising_o), // 1-bit output captured with C0 clock
   .Q1 (d_falling_o), // 1-bit output captured with C1 clock
   .C0 (clk0), // 1-bit clock input
   .C1 (clk180), // 1-bit clock input
   .CE (1'b1), // 1-bit clock enable input
   .D  (ddr_d_i),   // 1-bit DDR data input
   .R  (1'b0),   // 1-bit reset input
   .S  (1'b0)    // 1-bit set input
);

endmodule

Below are the UCF constraints for the data input:
NET "clk_i" TNM_NET = clk_i;
TIMESPEC TS_clk_i = PERIOD "clk_i" 20 ns HIGH 50%;

INST "ddr_d_i" TNM = TN_ddr_in_pads;
TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE "clk_i" RISING;
TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE "clk_i" FALLING;

Timing report on the OFFSET constraints:
Timing constraint: TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE COMP "clk_i"         "RISING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected. (0 setup errors, 0 hold errors)
 Offset is  -0.666ns.
--------------------------------------------------------------------------------
Slack (setup path):     5.666ns (requirement - (data path - clock path - clock arrival + uncertainty))
  Source:               ddr_d_i (PAD)
  Destination:          IDDR2_inst (FF)
  Destination Clock:    clk_i_BUFGP rising at 0.000ns
  Requirement:          5.000ns
  Data Path Delay:      2.724ns (Levels of Logic = 2)
  Clock Path Delay:     3.390ns (Levels of Logic = 2)
  Clock Uncertainty:    0.000ns

  Maximum Data Path: ddr_d_i to IDDR2_inst
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    W4.I                 Tiopi                 1.140   ddr_d_i
                                                       ddr_d_i_IBUF
                                                       ProtoComp0.IMUX.1
    ILOGIC_X0Y7.D        net (fanout=1)        0.128   ddr_d_i_IBUF
    ILOGIC_X0Y7.CLK0     Tidock                1.456   ProtoComp2.D2OFFBYP_SRC
                                                       IDDR2_inst
    -------------------------------------------------  ---------------------------
    Total                                      2.724ns (2.596ns logic, 0.128ns route)
                                                       (95.3% logic, 4.7% route)

  Minimum Clock Path: clk_i to IDDR2_inst
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    N4.I                 Tiopi                 1.049   clk_i
                                                       SP6_AUTOBUF_BUFGP_ML_IBUF_4
                                                       ProtoComp0.IMUX
    BUFGMUX_X3Y13.I0     net (fanout=1)        0.747   SP6_AUTOBUF_BUFGP_ML_IBUF_4_ML_NEW_I
    BUFGMUX_X3Y13.O      Tgi0o                 0.220   clk_i_BUFGP
    ILOGIC_X0Y7.CLK0     net (fanout=2)        1.374   clk_i_BUFGP
    -------------------------------------------------  ---------------------------
    Total                                      3.390ns (1.269ns logic, 2.121ns route)
                                                       (37.4% logic, 62.6% route)

Timing constraint: TIMEGRP "TN_ddr_in_pads" OFFSET = IN 5 ns VALID 10 ns BEFORE COMP "clk_i"         "FALLING";
 1 path analyzed, 1 endpoint analyzed, 0 failing endpoints
 0 timing errors detected. (0 setup errors, 0 hold errors)
 Offset is  -0.666ns.
--------------------------------------------------------------------------------
Slack (setup path):     5.666ns (requirement - (data path - clock path - clock arrival + uncertainty))
  Source:               ddr_d_i (PAD)
  Destination:          IDDR2_inst (FF)
  Destination Clock:    clk_i_BUFGP falling at 0.000ns
  Requirement:          5.000ns
  Data Path Delay:      2.724ns (Levels of Logic = 2)
  Clock Path Delay:     3.390ns (Levels of Logic = 2)
  Clock Uncertainty:    0.000ns

  Maximum Data Path: ddr_d_i to IDDR2_inst
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    W4.I                 Tiopi                 1.140   ddr_d_i
                                                       ddr_d_i_IBUF
                                                       ProtoComp0.IMUX.1
    ILOGIC_X0Y7.D        net (fanout=1)        0.128   ddr_d_i_IBUF
    ILOGIC_X0Y7.CLK1     Tidock                1.456   ProtoComp2.D2OFFBYP_SRC
                                                       IDDR2_inst
    -------------------------------------------------  ---------------------------
    Total                                      2.724ns (2.596ns logic, 0.128ns route)
                                                       (95.3% logic, 4.7% route)

  Minimum Clock Path: clk_i to IDDR2_inst
    Location             Delay type         Delay(ns)  Logical Resource(s)
    -------------------------------------------------  -------------------
    N4.I                 Tiopi                 1.049   clk_i
                                                       SP6_AUTOBUF_BUFGP_ML_IBUF_4
                                                       ProtoComp0.IMUX
    BUFGMUX_X3Y13.I0     net (fanout=1)        0.747   SP6_AUTOBUF_BUFGP_ML_IBUF_4_ML_NEW_I
    BUFGMUX_X3Y13.O      Tgi0o                 0.220   clk_i_BUFGP
    ILOGIC_X0Y7.CLK1     net (fanout=2)        1.374   clk_i_BUFGP
    -------------------------------------------------  ---------------------------
    Total                                      3.390ns (1.269ns logic, 2.121ns route)
                                                       (37.4% logic, 62.6% route)
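
To see where the -0.666ns offset and the 5.666ns slack come from, the path tables can be added up row by row. The Python sketch below does this for the rising-edge report. The helper names and the logic/route tags are my own labels, and the delays are copied from the tables above.

```python
# (delay type, delay in ns, kind) rows from the rising-edge report above.
DATA_PATH = [
    ("Tiopi", 1.140, "logic"),
    ("net", 0.128, "route"),
    ("Tidock", 1.456, "logic"),
]
CLOCK_PATH = [
    ("Tiopi", 1.049, "logic"),
    ("net", 0.747, "route"),
    ("Tgi0o", 0.220, "logic"),
    ("net", 1.374, "route"),
]


def total(path, kind=None):
    """Sum the delays of a path, optionally only the 'logic' or 'route' rows."""
    return sum(delay for _, delay, k in path if kind is None or k == kind)


def logic_share(path):
    """Percentage of the path delay that is logic delay."""
    return 100.0 * total(path, "logic") / total(path)


def required_offset(data_ns, clock_ns, clock_arrival_ns=0.0, uncertainty_ns=0.0):
    """Offset the design actually needs: data path - clock path - arrival + uncertainty."""
    return data_ns - clock_ns - clock_arrival_ns + uncertainty_ns


if __name__ == "__main__":
    data_ns, clock_ns = total(DATA_PATH), total(CLOCK_PATH)
    offset = required_offset(data_ns, clock_ns)
    print("data path  = %.3f ns (%.1f%% logic)" % (data_ns, logic_share(DATA_PATH)))
    print("clock path = %.3f ns (%.1f%% logic)" % (clock_ns, logic_share(CLOCK_PATH)))
    print("offset     = %.3f ns" % offset)
    print("slack      = %.3f ns" % (5.000 - offset))
```

The offset is negative because the clock path (3.390ns) is longer than the data path (2.724ns): without a DCM to compensate for the clock insertion delay, the data could arrive up to 0.666ns after the clock edge at the pad and still meet setup.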

Please note that ISE continues to improve the constraint syntax and report format, so you may see things work slightly differently in previous versions of ISE.

Below is a list of references that can be very useful: