From c62c80f7becfd04398485e744848086c791de630 Mon Sep 17 00:00:00 2001
From: Keith Rothman <537074+litghost@users.noreply.github.com>
Date: Wed, 7 Apr 2021 14:52:39 -0700
Subject: Initial BEL and site design discussion.

Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
---
 docs/bel_and_site_design.md              | 132 +++++++++++++++++++++++++++++++
 docs/eos_slice.png-057.png               | Bin 0 -> 67356 bytes
 docs/frac_lut4.png                       | Bin 0 -> 14712 bytes
 docs/frac_lut4_a.png                     | Bin 0 -> 16987 bytes
 docs/frac_lut4_b.png                     | Bin 0 -> 17135 bytes
 docs/highlight_bottom_lut6.png           | Bin 0 -> 107058 bytes
 docs/highlight_muxf5.png                 | Bin 0 -> 105240 bytes
 docs/highlight_muxf5_muxf6.png           | Bin 0 -> 105303 bytes
 docs/highlight_top_lut6.png              | Bin 0 -> 107113 bytes
 docs/stratix10_highlight_lut5.png        | Bin 0 -> 65872 bytes
 docs/stratix10_highlight_lut6.png        | Bin 0 -> 65726 bytes
 docs/stratix10_highlight_muxf5_muxf6.png | Bin 0 -> 64485 bytes
 docs/stratix10_slice.png-11.png          | Bin 0 -> 52592 bytes
 docs/stratix2_slice.png-026.png          | Bin 0 -> 84484 bytes
 docs/stratix2_slice.png-026_rotate.png   | Bin 0 -> 95601 bytes
 docs/versal_slice.png-12.png             | Bin 0 -> 181163 bytes
 16 files changed, 132 insertions(+)
 create mode 100644 docs/bel_and_site_design.md
 create mode 100644 docs/eos_slice.png-057.png
 create mode 100644 docs/frac_lut4.png
 create mode 100644 docs/frac_lut4_a.png
 create mode 100644 docs/frac_lut4_b.png
 create mode 100644 docs/highlight_bottom_lut6.png
 create mode 100644 docs/highlight_muxf5.png
 create mode 100644 docs/highlight_muxf5_muxf6.png
 create mode 100644 docs/highlight_top_lut6.png
 create mode 100644 docs/stratix10_highlight_lut5.png
 create mode 100644 docs/stratix10_highlight_lut6.png
 create mode 100644 docs/stratix10_highlight_muxf5_muxf6.png
 create mode 100644 docs/stratix10_slice.png-11.png
 create mode 100644 docs/stratix2_slice.png-026.png
 create mode 100644 docs/stratix2_slice.png-026_rotate.png
 create mode 100644 docs/versal_slice.png-12.png

diff --git a/docs/bel_and_site_design.md b/docs/bel_and_site_design.md
new file mode 100644
index 0000000..943e8c3
--- /dev/null
+++ b/docs/bel_and_site_design.md
@@ -0,0 +1,132 @@
+## Cell, BEL and Site Design
+
+One of the key concepts within the FPGA interchange device resources is the
+relationship between the cell library and the device BEL and site definitions.
+A well designed cell library and a flexible but consise BEL and site
+definition is important for exposing the hardware in an efficient way that
+enables a place and route tool to succeed.
+
+Good design is hard to capture, but this document will talk about some of the
+considerations.
+
+## Granularity of the cell library
+
+It is important to divide the place and route problem and the synthesis
+problem, at least as defined for the purpose of the FPGA interchange. The
+synthesis tool operates on the **cell library**, which should be designed to
+expose logic elements at a useful level of granularity.
+
+As a concrete example, a LUT4 element is techinically just two LUT3 elements,
+connected by a mux (e.g. MUXF4), a LUT3 element is just two LUT2 elements,
+connected by a mux (e.g. MUXF3), etc. If the outputs of those interior muxes
+are not accessible to the place and route tool, then exposing those interior
+function muxes as cells in the cell library is not a useful.
+
+Cell definitions should be granular enough that the synthesis can map to
+them, but not so granular that the place and route tool will be making few if
+any choices. If there is only one legal placement of the cell, its value is
+relatively low.
+
+## Drawing site boundries
+
+When designing an FPGA interchange device resource for a new fabric, one
+important consideration is where to draw the site boundary. The primary goal
+of lumping BELs within a site is to capture some local congestion due to
+fanout limitations. Interior static routing muxes and output muxes may
+accomidate significantly fewer signals than the possible number of BELs that
+drive them. In this case, it is important to draw the site boundary large
+enough to capture these cases so as to enable the local congestion to be
+resolved during either packing for clustered approaches, or during placement
+during unclustered approaches. In either case, local congestion that is
+strongly placement dependent must be resolved prior to general routing,
+unless a fused placement and routing algorithm is used.
+
+## Drawing BEL boundaries
+
+BEL definitions require that creating a boundary around primitive elements of
+the fabric. The choice of where to place that boundary has a strong influence
+on the design of the cell library in the FPGA interchange.
+
+In general, the smaller the BEL boundary, the more complexity is exposed to
+the place and route tool. In some cases exposing this complexity is
+important, because it enables some goal. For example, leaving static routing
+muxes outside of BELs enables a place and route tool to have greater
+flexiblity when resolving site congestion. But as a counter point, if only
+a handful of static mux configurations are useful and those choices can be
+made at synthesis time, then lumping those muxes into synthesis reduces the
+complexity required in the place and route tool.
+
+The most common case where the static routing muxes are typically lumped into
+the BEL is BRAM's and FIFO's address and routing configuration. At synthesis
+time, a choice is made about the address and data widths, which are encoded as
+parameters on the cell. The place and route tool does not typically make
+meaningful choices on to configuration those static routing muxes, but they
+do exist in the hardware.
+
+The most common case where the static routing muxes are almost never lumped
+into the BEL is SLICE-type situations. The remainder of this document will
+show examples of why the BEL boundary should typically exclude the static
+routing muxes, and leave the choice to the place and route tooling.
+
+### Stratix II and Stratix 10 ALM
+
+![Stratix II](stratix2_slice.png-026_rotate.png)
+
+![Stratix 10](stratix10_slice.png-11.png)
+
+Consider both Stratix II and Stratix 10 logic sites. The first thing to note
+is that the architectures at this level are actually mostly the same. Though
+it isn't immediately apparent, both designs are structured around 4 4-LUT
+elements.
+
+Take note of the following structure:
+
+![Stratix II fractured LUT4](frac_lut4.png)
+
+This is actually just two LUT4 elements, where the top select line is
+independent.
+
+See the following two figures:
+
+![Stratix II fractured LUT4 Top](frac_lut4_a.png)
+![Stratix II fractured LUT4 Bottom](frac_lut4_b.png)
+
+In Stratix 10, the LUT4 element is still present, but the top select line
+fracturing was removed.
+
+So now consider the output paths from the 4 LUT4 elements in the Stratix
+II site. Some of the LUT4 outputs route directly to the carry element, so it
+will be important for the place and route tool to be able to place a LUT4 or
+smaller to access that direct connection. But if the output is not used in
+the carry element, then it can only be accessed in Stratix II via the MUXF5
+(blue below) and MUXF6 (red below) elements.
+
+![Stratix II Highlight MUXF5 and MUXF6](highlight_muxf5_muxf6.png)
+
+So given the Stratix II site layout, the following BELs will be requires:
+
+ - 4 LUT4 BELs that connect to the carry
+ - 2 LUT6 BELs that connect to the output FF or output MUX.
+
+The two LUT6 BELs are shown below:
+
+![Stratix II Top LUT6](highlight_top_lut6.png)
+![Stratix II Bottom LUT6](highlight_bottom_lut6.png)
+
+Drawing a smaller BEL boundary has little value, because a LUT5 element would
+still always require routing through the MUXF6 element.
+
+Now consider the Stratix 10 output arrangement. The LUT4 elements' direct
+connections to the carry element are the same, so those BELs would be identical. The Stratix
+10 site now has an output tap directly on the top LUT5, similar to the Xilinx
+Versal LUT6 / LUT5 fracture setup. See diagram below. LUT5 element is shown
+in blue, and LUT6 element is shown in red.
+
+![Stratix 10 2 LUT5](stratix10_highlight_lut5.png)
+![Stratix 10 LUT6](stratix10_highlight_lut6.png)
+
+So given the Stratix 10 site layout, the following BELs will be requires:
+
+ - 4 LUT4 BELs that connect to the carry
+ - 2 LUT5 BELs that connect to the output FF or output MUX
+ - 1 LUT6 BEL that connects to the output FF or output MUX
diff --git a/docs/eos_slice.png-057.png b/docs/eos_slice.png-057.png
new file mode 100644
index 0000000..d0597ca
Binary files /dev/null and b/docs/eos_slice.png-057.png differ
diff --git a/docs/frac_lut4.png b/docs/frac_lut4.png
new file mode 100644
index 0000000..8ae555d
Binary files /dev/null and b/docs/frac_lut4.png differ
diff --git a/docs/frac_lut4_a.png b/docs/frac_lut4_a.png
new file mode 100644
index 0000000..9f70043
Binary files /dev/null and b/docs/frac_lut4_a.png differ
diff --git a/docs/frac_lut4_b.png b/docs/frac_lut4_b.png
new file mode 100644
index 0000000..4974781
Binary files /dev/null and b/docs/frac_lut4_b.png differ
diff --git a/docs/highlight_bottom_lut6.png b/docs/highlight_bottom_lut6.png
new file mode 100644
index 0000000..2f82340
Binary files /dev/null and b/docs/highlight_bottom_lut6.png differ
diff --git a/docs/highlight_muxf5.png b/docs/highlight_muxf5.png
new file mode 100644
index 0000000..94f0228
Binary files /dev/null and b/docs/highlight_muxf5.png differ
diff --git a/docs/highlight_muxf5_muxf6.png b/docs/highlight_muxf5_muxf6.png
new file mode 100644
index 0000000..5512685
Binary files /dev/null and b/docs/highlight_muxf5_muxf6.png differ
diff --git a/docs/highlight_top_lut6.png b/docs/highlight_top_lut6.png
new file mode 100644
index 0000000..a78a1c2
Binary files /dev/null and b/docs/highlight_top_lut6.png differ
diff --git a/docs/stratix10_highlight_lut5.png b/docs/stratix10_highlight_lut5.png
new file mode 100644
index 0000000..ae621a6
Binary files /dev/null and b/docs/stratix10_highlight_lut5.png differ
diff --git a/docs/stratix10_highlight_lut6.png b/docs/stratix10_highlight_lut6.png
new file mode 100644
index 0000000..c14aab1
Binary files /dev/null and b/docs/stratix10_highlight_lut6.png differ
diff --git a/docs/stratix10_highlight_muxf5_muxf6.png b/docs/stratix10_highlight_muxf5_muxf6.png
new file mode 100644
index 0000000..3addc52
Binary files /dev/null and b/docs/stratix10_highlight_muxf5_muxf6.png differ
diff --git a/docs/stratix10_slice.png-11.png b/docs/stratix10_slice.png-11.png
new file mode 100644
index 0000000..a84aa6a
Binary files /dev/null and b/docs/stratix10_slice.png-11.png differ
diff --git a/docs/stratix2_slice.png-026.png b/docs/stratix2_slice.png-026.png
new file mode 100644
index 0000000..c1efec6
Binary files /dev/null and b/docs/stratix2_slice.png-026.png differ
diff --git a/docs/stratix2_slice.png-026_rotate.png b/docs/stratix2_slice.png-026_rotate.png
new file mode 100644
index 0000000..6021abd
Binary files /dev/null and b/docs/stratix2_slice.png-026_rotate.png differ
diff --git a/docs/versal_slice.png-12.png b/docs/versal_slice.png-12.png
new file mode 100644
index 0000000..84eb163
Binary files /dev/null and b/docs/versal_slice.png-12.png differ
-- 
cgit v1.2.3


From cfdbd68b967771069022f403828d93fe022938ed Mon Sep 17 00:00:00 2001
From: Keith Rothman <537074+litghost@users.noreply.github.com>
Date: Thu, 8 Apr 2021 09:19:30 -0700
Subject: Address review feedback.

- Some spelling and grammar fixes
- Added a section on control signals

Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
---
 docs/bel_and_site_design.md | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/docs/bel_and_site_design.md b/docs/bel_and_site_design.md
index 943e8c3..da92c3b 100644
--- a/docs/bel_and_site_design.md
+++ b/docs/bel_and_site_design.md
@@ -2,7 +2,7 @@
 
 One of the key concepts within the FPGA interchange device resources is the
 relationship between the cell library and the device BEL and site definitions.
-A well designed cell library and a flexible but consise BEL and site
+A well designed cell library and a flexible but concise BEL and site
 definition is important for exposing the hardware in an efficient way that
 enables a place and route tool to succeed.
 
@@ -16,24 +16,24 @@ problem, at least as defined for the purpose of the FPGA interchange. The
 synthesis tool operates on the **cell library**, which should be designed to
 expose logic elements at a useful level of granularity.
 
-As a concrete example, a LUT4 element is techinically just two LUT3 elements,
+As a concrete example, a LUT4 element is technically just two LUT3 elements,
 connected by a mux (e.g. MUXF4), a LUT3 element is just two LUT2 elements,
 connected by a mux (e.g. MUXF3), etc. If the outputs of those interior muxes
 are not accessible to the place and route tool, then exposing those interior
-function muxes as cells in the cell library is not a useful.
+function muxes as cells in the cell library is not as useful.
 
 Cell definitions should be granular enough that the synthesis can map to
 them, but not so granular that the place and route tool will be making few if
 any choices. If there is only one legal placement of the cell, its value is
 relatively low.
 
-## Drawing site boundries
+## Drawing site boundaries
 
 When designing an FPGA interchange device resource for a new fabric, one
 important consideration is where to draw the site boundary. The primary goal
 of lumping BELs within a site is to capture some local congestion due to
 fanout limitations. Interior static routing muxes and output muxes may
-accomidate significantly fewer signals than the possible number of BELs that
+accommodate significantly fewer signals than the possible number of BELs that
 drive them. In this case, it is important to draw the site boundary large
 enough to capture these cases so as to enable the local congestion to be
 resolved during either packing for clustered approaches, or during placement
@@ -41,6 +41,22 @@ during unclustered approaches. In either case, local congestion that is
 strongly placement dependent must be resolved prior to general routing,
 unless a fused placement and routing algorithm is used.
 
+### FF control sets routing
+
+A common case worth exploring is FF control sets, e.g. SR type signals and CE
+type signals. In most fabric SLICE types, the SR and CE control signals are
+shared among multiple rows of the SLICE. This is a common example of local
+site congestion, and the site boundary should typically encompass all BELs
+that share this kind of local routing for all the reasons discussed above.
+
+Another consideration with control signals is the presence of control signal
+constraints that cannot be expressed as local routing congestion. For
+example, if a set of BELs share whether the SR control line is a set or reset
+(or async set or async reset), it is common to expand the site boundary to
+cover the BELs that share these implicit configurations. The constraint
+system in the device resources is designed to handle this kind of non-routing
+driven configuration.
+
 ## Drawing BEL boundaries
 
 BEL definitions require that creating a boundary around primitive elements of
@@ -51,7 +67,7 @@ In general, the smaller the BEL boundary, the more complexity is exposed to
 the place and route tool. In some cases exposing this complexity is
 important, because it enables some goal. For example, leaving static routing
 muxes outside of BELs enables a place and route tool to have greater
-flexiblity when resolving site congestion. But as a counter point, if only
+flexibility when resolving site congestion. But as a counter point, if only
 a handful of static mux configurations are useful and those choices can be
 made at synthesis time, then lumping those muxes into synthesis reduces the
 complexity required in the place and route tool.
-- 
cgit v1.2.3


From d1e3a90051fd8c9fda42e86342568099f07f9872 Mon Sep 17 00:00:00 2001
From: Keith Rothman <537074+litghost@users.noreply.github.com>
Date: Thu, 8 Apr 2021 09:28:03 -0700
Subject: Add explicit note about static muxes and bitstream configurations.

Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
---
 docs/bel_and_site_design.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/docs/bel_and_site_design.md b/docs/bel_and_site_design.md
index da92c3b..21e7e92 100644
--- a/docs/bel_and_site_design.md
+++ b/docs/bel_and_site_design.md
@@ -84,6 +84,22 @@ into the BEL is SLICE-type situations. The remainder of this document will
 show examples of why the BEL boundary should typically exclude the static
 routing muxes, and leave the choice to the place and route tooling.
 
+## Static routing muxes and bitstream formats
+
+Something to keep in mind when drawing BEL boundaries to include or exclude
+static routing muxes is the degree of configurability present in the
+underlying bitstream. Some static routing muxes share configuration bits in
+the bitstream, and so expressing them as two separate static routing muxes
+potentially gives the place and route tool more flexibility than the underlying
+fabric can express. This will result in physical netlists that cannot be
+converted to bitstream.
+
+In some cases this can be handled through tight coupling of the cell and
+BEL library. The idea is to limit cell port to BEL pin mappings that avoid
+illegal static routing mux configurations. This approach has it limits.
+In general, how the bitstream expresses static routing muxes must
+be accounted for when drawing BEL boundaries.
+
 ### Stratix II and Stratix 10 ALM
 
 ![Stratix II](stratix2_slice.png-026_rotate.png)
-- 
cgit v1.2.3


From 00e01b0549d54592cefdd74ca9a6a37be56e887e Mon Sep 17 00:00:00 2001
From: Keith Rothman <537074+litghost@users.noreply.github.com>
Date: Thu, 8 Apr 2021 09:42:19 -0700
Subject: Add section on Versal ACAP decomposition.

Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
---
 docs/bel_and_site_design.md | 38 ++++++++++++++++++++++++++++++++------
 docs/versal_lut4.png        | Bin 0 -> 26611 bytes
 docs/versal_lut5.png        | Bin 0 -> 27287 bytes
 docs/versal_lut6.png        | Bin 0 -> 26976 bytes
 docs/versal_luts.png        | Bin 0 -> 26147 bytes
 docs/versal_row.png         | Bin 0 -> 53873 bytes
 6 files changed, 32 insertions(+), 6 deletions(-)
 create mode 100644 docs/versal_lut4.png
 create mode 100644 docs/versal_lut5.png
 create mode 100644 docs/versal_lut6.png
 create mode 100644 docs/versal_luts.png
 create mode 100644 docs/versal_row.png

diff --git a/docs/bel_and_site_design.md b/docs/bel_and_site_design.md
index 21e7e92..b7ad6f1 100644
--- a/docs/bel_and_site_design.md
+++ b/docs/bel_and_site_design.md
@@ -59,7 +59,7 @@ driven configuration.
 
 ## Drawing BEL boundaries
 
-BEL definitions require that creating a boundary around primitive elements of
+BEL definitions require creating a boundary around primitive elements of
 the fabric. The choice of where to place that boundary has a strong influence
 on the design of the cell library in the FPGA interchange.
 
@@ -76,8 +76,8 @@ The most common case where the static routing muxes are typically lumped into
 the BEL is BRAM's and FIFO's address and routing configuration. At synthesis
 time, a choice is made about the address and data widths, which are encoded as
 parameters on the cell. The place and route tool does not typically make
-meaningful choices on to configuration those static routing muxes, but they
-do exist in the hardware.
+meaningful choices on the configuration of those static routing muxes, but
+they do exist in the hardware.
 
 The most common case where the static routing muxes are almost never lumped
 into the BEL is SLICE-type situations. The remainder of this document will
@@ -96,7 +96,7 @@ converted to bitstream.
 
 In some cases this can be handled through tight coupling of the cell and
 BEL library. The idea is to limit cell port to BEL pin mappings that avoid
-illegal static routing mux configurations. This approach has it limits.
+illegal static routing mux configurations. This approach has its limits.
 In general, how the bitstream expresses static routing muxes must
 be accounted for when drawing BEL boundaries.
 
@@ -135,7 +135,7 @@ the carry element, then it can only be accessed in Stratix II via the MUXF5
 
 ![Stratix II Highlight MUXF5 and MUXF6](highlight_muxf5_muxf6.png)
 
-So given the Stratix II site layout, the following BELs will be requires:
+So given the Stratix II site layout, the following BELs will be required:
 
  - 4 LUT4 BELs that connect to the carry
  - 2 LUT6 BELs that connect to the output FF or output MUX.
@@ -157,8 +157,34 @@ in blue, and LUT6 element is shown in red.
 ![Stratix 10 2 LUT5](stratix10_highlight_lut5.png)
 ![Stratix 10 LUT6](stratix10_highlight_lut6.png)
 
-So given the Stratix 10 site layout, the following BELs will be requires:
+So given the Stratix 10 site layout, the following BELs will be required:
 
  - 4 LUT4 BELs that connect to the carry
  - 2 LUT5 BELs that connect to the output FF or output MUX
  - 1 LUT6 BEL that connects to the output FF or output MUX
+
+### Versal ICAP
+
+The Versal ICAP LUT structure is fairly similar to the Stratix 10 combinatorial
+elements.
+
+![Versal ICAP LUTs](versal_luts.png)
+
+Unless the Stratix 10 ALM, it appears only 1 of the LUT4s connects to the
+carry element (the prop signal). The O6 output also has a dedicated
+connection to the carry. See image below:
+
+![Versal SLICE row](versal_row.png)
+
+The Versal LUT structure likely should be decomposed into 4 BELs, shown in
+the next figures:
+
+![Versal ICAP LUT4](versal_lut4.png)
+![Versal ICAP two LUT5](versal_lut5.png)
+![Versal ICAP LUT6](versal_lut6.png)
+
+So given the Versal site layout, the following BELs will be required (per SLICE row):
+
+ - 1 LUT4 BEL that connects to the carry
+ - 2 LUT5 BELs that connect to the output FF or output MUX
+ - 1 LUT6 BEL that connects to the output FF or output MUX
diff --git a/docs/versal_lut4.png b/docs/versal_lut4.png
new file mode 100644
index 0000000..47c958a
Binary files /dev/null and b/docs/versal_lut4.png differ
diff --git a/docs/versal_lut5.png b/docs/versal_lut5.png
new file mode 100644
index 0000000..edf1977
Binary files /dev/null and b/docs/versal_lut5.png differ
diff --git a/docs/versal_lut6.png b/docs/versal_lut6.png
new file mode 100644
index 0000000..31c907a
Binary files /dev/null and b/docs/versal_lut6.png differ
diff --git a/docs/versal_luts.png b/docs/versal_luts.png
new file mode 100644
index 0000000..94d36e7
Binary files /dev/null and b/docs/versal_luts.png differ
diff --git a/docs/versal_row.png b/docs/versal_row.png
new file mode 100644
index 0000000..9af681c
Binary files /dev/null and b/docs/versal_row.png differ
-- 
cgit v1.2.3


From f5aa07a21f5ebe73238065b14af1a460abfd79c7 Mon Sep 17 00:00:00 2001
From: Keith Rothman <537074+litghost@users.noreply.github.com>
Date: Thu, 8 Apr 2021 10:45:18 -0700
Subject: Fix some simple errors.

Signed-off-by: Keith Rothman <537074+litghost@users.noreply.github.com>
---
 docs/bel_and_site_design.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/bel_and_site_design.md b/docs/bel_and_site_design.md
index b7ad6f1..fd85064 100644
--- a/docs/bel_and_site_design.md
+++ b/docs/bel_and_site_design.md
@@ -163,14 +163,14 @@ So given the Stratix 10 site layout, the following BELs will be required:
  - 2 LUT5 BELs that connect to the output FF or output MUX
  - 1 LUT6 BEL that connects to the output FF or output MUX
 
-### Versal ICAP
+### Versal ACAP
 
-The Versal ICAP LUT structure is fairly similar to the Stratix 10 combinatorial
+The Versal ACAP LUT structure is fairly similar to the Stratix 10 combinatorial
 elements.
 
-![Versal ICAP LUTs](versal_luts.png)
+![Versal ACAP LUTs](versal_luts.png)
 
-Unless the Stratix 10 ALM, it appears only 1 of the LUT4s connects to the
+Unlike the Stratix 10 ALM, it appears only 1 of the LUT4s connects to the
 carry element (the prop signal). The O6 output also has a dedicated
 connection to the carry. See image below:
 
@@ -179,9 +179,9 @@ connection to the carry. See image below:
 The Versal LUT structure likely should be decomposed into 4 BELs, shown in
 the next figures:
 
-![Versal ICAP LUT4](versal_lut4.png)
-![Versal ICAP two LUT5](versal_lut5.png)
-![Versal ICAP LUT6](versal_lut6.png)
+![Versal ACAP LUT4](versal_lut4.png)
+![Versal ACAP two LUT5](versal_lut5.png)
+![Versal ACAP LUT6](versal_lut6.png)
 
 So given the Versal site layout, the following BELs will be required (per SLICE row):
 
-- 
cgit v1.2.3
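The cell-granularity argument in this series (a LUT4 is two LUT3 elements behind a MUXF4) and the Stratix II BEL argument (a LUT6 output is only reachable through MUXF5 and MUXF6) rest on one identity: a k-input LUT equals two (k-1)-input LUTs selected by its top input. The Python sketch below checks that identity exhaustively. It assumes that bit `addr` of an init value is the output for the input vector whose binary value is `addr`, with `inputs[0]` as the least significant input. The helpers `lut_eval`, `split_init`, `mux2`, `lut_via_tree` and `check_decomposition` are hypothetical illustrations, not part of any FPGA interchange tooling, and the mux tree is idealized: it ignores the shared and fractured inputs of a real ALM.

```python
from __future__ import annotations

import random


def lut_eval(init: int, k: int, inputs: list[int]) -> int:
    """Evaluate a k-input LUT.

    Assumed (hypothetical) convention: bit `addr` of `init` is the output for
    the input vector whose binary value is `addr`, with inputs[0] as the LSB
    and inputs[k - 1] as the top select input.
    """
    assert len(inputs) == k
    addr = 0
    for i, bit in enumerate(inputs):
        addr |= (bit & 1) << i
    return (init >> addr) & 1


def split_init(init: int, k: int) -> tuple[int, int]:
    """Split a k-input init into the two (k-1)-input halves chosen by the
    top input: (half used when top == 0, half used when top == 1)."""
    half = 1 << (k - 1)  # number of truth-table entries in each half
    mask = (1 << half) - 1
    return init & mask, (init >> half) & mask


def mux2(sel: int, in0: int, in1: int) -> int:
    """A 2:1 function mux, standing in for MUXF4 / MUXF5 / MUXF6."""
    return in1 if sel else in0


def lut_via_tree(init: int, k: int, inputs: list[int], base: int) -> int:
    """Evaluate the same function as base-input LUTs feeding a mux tree.

    k=4, base=3 -> two LUT3s + one MUXF4.
    k=6, base=4 -> four LUT4s + two MUXF5s + one MUXF6 (idealized).
    """
    assert 1 <= base <= k
    if k == base:
        return lut_eval(init, k, inputs)
    low, high = split_init(init, k)
    lower_inputs = inputs[: k - 1]  # the top input drives the mux select
    return mux2(
        inputs[k - 1],
        lut_via_tree(low, k - 1, lower_inputs, base),
        lut_via_tree(high, k - 1, lower_inputs, base),
    )


def check_decomposition(k: int, base: int, trials: int = 50, seed: int = 0) -> bool:
    """Compare direct and mux-tree evaluation for random inits on all inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        init = rng.getrandbits(1 << k)
        for addr in range(1 << k):
            inputs = [(addr >> i) & 1 for i in range(k)]
            if lut_eval(init, k, inputs) != lut_via_tree(init, k, inputs, base):
                return False
    return True


if __name__ == "__main__":
    print("LUT4 as 2x LUT3 + MUXF4:", check_decomposition(4, 3))
    print("LUT6 as 4x LUT4 + MUXF5/MUXF6:", check_decomposition(6, 4))
```

Because the split is purely a function of the init value, the interior muxes carry no information of their own; exposing them as separate cells only pays off when their outputs are actually routable, which is the point the design document makes about cell granularity and BEL boundaries.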