From 48154cb6f452d3bdb4da36cc267b4b6c45588dc9 Mon Sep 17 00:00:00 2001
From: Clifford Wolf
+Project IceStorm aims at documenting the bitstream format of Lattice iCE40
+FPGAs and providing simple tools for analyzing and creating bitstream files.
+This is work in progress.
+
+The bitstream file starts with the bytes 0xFF 0x00, followed by a sequence of
+zero-terminated comment strings, followed by 0x00 0xFF. However, there seems to be
+a bug in the Lattice "bitstream" tool that moves the terminating 0x00 0xFF a few
+bytes into the comment string in some cases.
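As a rough illustration of the header layout described above, the following Python sketch splits the comment strings out of a well-formed file. The function name read_comments is hypothetical and not part of the IceStorm tools. The sketch assumes the terminating 0x00 0xFF is in its expected place, so it does not handle the misplaced terminator mentioned above.

```python
def read_comments(data: bytes):
    """Return (comments, offset of the first byte after the 0x00 0xFF terminator)."""
    if data[:2] != b"\xff\x00":
        raise ValueError("file does not start with 0xFF 0x00")
    # Assumption: the first 0x00 0xFF pair after the start marker ends the comments.
    end = data.find(b"\x00\xff", 2)
    if end < 0:
        raise ValueError("no 0x00 0xFF terminator found")
    # Comment strings are zero-terminated; drop the empty pieces left by split().
    comments = [c.decode("ascii", "replace")
                for c in data[2:end].split(b"\x00") if c]
    return comments, end + 2
```

The returned offset points at whatever follows the comment section, which in a regular file would be the start token of the actual bitstream.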
+
+After the comment sections the token 0x7EAA997E (MSB first) starts the actual
+bit stream. The bitstream consists of one-byte commands, followed by a payload
+word, followed by an optional block of data. The MSB nibble of the command byte
+is the command opcode, the LSB nibble is the length of the command payload in
+bytes. Commands that do not require a payload use the opcode 0, with
+the command encoded in the payload field. Note that "payload" in this
+context refers to a single integer argument, not the block of data that
+follows the command in the case of the CRAM and BRAM commands.
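The command encoding described above can be sketched in a few lines of Python. The names find_command_start and decode_command are hypothetical. The sketch assumes that the payload word is stored MSB first, like the start token; the text above does not state the payload byte order explicitly. The data blocks that follow the CRAM and BRAM commands are not handled here.

```python
SYNC_TOKEN = bytes([0x7E, 0xAA, 0x99, 0x7E])  # 0x7EAA997E, MSB first


def find_command_start(data: bytes) -> int:
    """Return the offset of the first command byte after the start token."""
    idx = data.find(SYNC_TOKEN)
    if idx < 0:
        raise ValueError("start token 0x7EAA997E not found")
    return idx + len(SYNC_TOKEN)


def decode_command(data: bytes, pos: int):
    """Decode one command at pos and return (opcode, payload, next_pos)."""
    cmd = data[pos]
    opcode = cmd >> 4        # MSB nibble: command opcode
    length = cmd & 0x0F      # LSB nibble: payload length in bytes
    raw = data[pos + 1:pos + 1 + length]
    if len(raw) != length:
        raise ValueError("truncated payload")
    # Assumption: the payload integer is big-endian (MSB first).
    payload = int.from_bytes(raw, "big")
    return opcode, payload, pos + 1 + length
```

For example, the byte pair 0x51 0x01 decodes to opcode 5 with a one-byte payload of 1.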
+
+The following commands are known:
+
+Project IceStorm – Bitstream File Format Documentation
+
+General Description of the File Format
+
+
+
+
+Opcode Description
+0 payload=0: CRAM Data
+ payload=3: BRAM Data
+ payload=5: Reset CRC
+ payload=6: Wakeup
+1 Set bank number
+2 CRC check
+5 Set internal oscillator frequency range
+ payload=0: low
+ payload=1: medium
+ payload=2: high
+6 Set bank width
+7 Set bank height
+8 Set bank offset
+9 payload=0: Disable warm boot
+ payload=32: Enable warm boot
+Use iceunpack -vv to display the commands as they are interpreted by the tool. +
+
+Note: The format itself seems to be very flexible. At the moment it is unclear what the FPGA
+devices will do when presented with a bitstream that uses the commands in a different way
+than the bitstreams generated by the Lattice tools.
+
+ ++Most bytes in the bitstream are SRAM data bytes that should be written to the various SRAM banks +in the FPGA. The following sequence is used to program an SRAM cell: +
+
+The bank width and height parameters reflect the width and height of the SRAM bank. A large SRAM can
+be written in smaller chunks. In this case the height parameter may be smaller and the offset parameter
+reflects the vertical start position.
+
+ ++There are four CRAM and four BRAM banks in an iCE40 FPGA. The different devices from the family +use different widths and heights, but the same number of banks. +
+ ++The CRAM banks hold the configuration bits for the FPGA fabric and hard IP blocks, the BRAM +corresponds to the contents of the block ram resources. +
+ ++The ordering of the data bits is in MSB first row-major order. +
+ ++The chip is organized into four quadrants. Each CRAM memory bank contains the configuration bits for one quadrant. +The address 0 is always the corner of the quadrant, i.e. in one quadrant the bit addresses increase with the tile x/y +coordinates, in another they increase with the tile x coordinate but decrease with the tile y coordinate, and so on. +
+ ++For an iCE40 1k device, that has 12 x 16 tiles (not counting the io tiles), the CRAM bank 0 is the one containing the corner tile (1 1), +the CRAM bank 1 contains the corner tile (1 16), the CRAM bank 2 contains the corner tile (12 1) and the CRAM bank 3 contains the +corner tile (12 16). The entire CRAM of such a device is depicted on the right (bank 0 is in the lower left corner in blue/green). +
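The bank assignment for the 1k device can be sketched as a small Python function. The function name cram_bank is hypothetical. The four corner tiles listed above are taken from the text; the assumption that the quadrants split exactly at the middle of the 12 x 16 fabric is ours and is not stated above.

```python
def cram_bank(x: int, y: int, fabric_width: int = 12, fabric_height: int = 16) -> int:
    """Return the CRAM bank (0..3) for a fabric tile at (x, y).

    Coordinates do not count the io tiles, so x runs from 1 to fabric_width
    and y from 1 to fabric_height.
    """
    # Assumption: quadrants are split at the midpoint of the fabric.
    right_half = x > fabric_width // 2
    top_half = y > fabric_height // 2
    return (2 if right_half else 0) | (1 if top_half else 0)
```

With these assumptions the four corner tiles (1 1), (1 16), (12 1) and (12 16) map to banks 0, 1, 2 and 3.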
+
+The checkerboard pattern in the picture visualizes which bits are associated
+with which tile. The height of the configuration block is 16 for all tile
+types, but the width is different for each tile type. IO tiles have
+configurations that are 18 bits wide, LOGIC tiles are 54 bits wide, and
+RAM tiles are 42 bits wide. (Notice the two slightly smaller columns for the RAM tiles.)
+
+
+The IO tiles on the top and bottom of the chip use a strange permutation pattern for their bits. It can be seen in the picture that
+their columns are spread out horizontally. What cannot be seen in the picture is that the columns are also not in order and the bit
+positions are vertically permuted as well. The CramIndexConverter class in icepack.cc encapsulates the calculations
+that are necessary to convert between tile-relative bit addresses and CRAM bank-relative bit addresses.
+
+
+The black pixels in the image correspond to CRAM bits that are not associated with any IO, LOGIC or RAM tile.
+Some of them are unused, others are used by hard IPs or other global resources. The iceunpack tool reports
+such bits, when set, with the ".extra_bit bank x y" statement in the ASCII output format.
+
+ ++This part of the documentation has not been written yet. +
+ ++The CRC is a 16 bit CRC. The (truncated) polynomial is 0x1021 (CRC-16-CCITT). The "Reset CRC" command sets +the CRC to 0xFFFF. No zero padding is performed. +
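A minimal Python sketch of a CRC with these parameters is shown below. It assumes that the bits of each byte are processed MSB first, in line with the bit ordering described above. The function names are hypothetical, and which bytes of the bitstream are covered by the CRC is not specified here.

```python
def crc16_update(crc: int, byte: int) -> int:
    """Feed one byte into a CRC-16-CCITT register (polynomial 0x1021)."""
    for i in range(7, -1, -1):           # assumption: MSB of each byte first
        bit = (byte >> i) & 1
        top = (crc >> 15) & 1
        crc = (crc << 1) & 0xFFFF
        if top ^ bit:
            crc ^= 0x1021
    return crc


def crc16(data: bytes, crc: int = 0xFFFF) -> int:
    """CRC over data, starting from the value set by "Reset CRC" (0xFFFF)."""
    for byte in data:
        crc = crc16_update(crc, byte)
    return crc
```

With this variant, feeding the two CRC bytes (MSB first) after the data leaves the register at zero.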
+
diff --git a/docs/index.html b/docs/index.html
new file mode 100644
index 0000000..dbd8ba6
--- /dev/null
+++ b/docs/index.html
@@ -0,0 +1,321 @@
+
+2015-05-27: We have a working fully Open Source flow with Yosys and Arachne-pnr! Video: http://youtu.be/yUiNlmvVOq8
+2015-04-13: Complete rewrite of IceUnpack, added IcePack, some major documentation updates
+2015-03-22: First public release and short YouTube video demonstrating our work: http://youtu.be/u1ZHcSNDQMM
+
+Project IceStorm aims at documenting the bitstream format of Lattice iCE40 +FPGAs and providing simple tools for analyzing and creating bitstream files. +At the moment the focus of the project is on the HX1K-TQ144 device, but +most of the information is device-independent. +
+
+The iCE40 has a very minimalistic architecture with a very regular structure. There are not many
+different kinds of tiles or special function units. This makes it both ideal for
+reverse engineering and as a reference platform for general purpose FPGA tool development.
+
+ ++Also, with the iCEstick there is +a cheap and easy to use development platform available, which makes the part interesting +for all kinds of projects. +
+ ++We have enough bits mapped that we can create a functional verilog model for almost all +bitstreams generated by Lattice iCEcube2 for the iCE40 HX1K-TQ144, as long as no +block memories or PLLs are used. (Both are fully documented, but the +icebox_vlog.py script does not create verilog models for them yet.) +
+ ++Next on the TODO list: PLLs, Timing Analysis, support for HX8K chips. +
+ ++Synthesis for iCE40 FPGAs can be done with Yosys. +Place-and-route can be done with arachne-pnr. +Here is an example script for implementing and programming the rot example from +arachne-pnr (this example targets the iCEstick development board): +
+
+yosys -p "synth_ice40 -blif rot.blif" rot.v
+arachne-pnr -d 1k -p rot.pcf rot.blif -o rot.txt
+icepack rot.txt rot.bin
+iceprog rot.bin
+
+Here is the current snapshot of our toolchain: icestorm-snapshot-150526.zip
+This is work under construction and highly experimental! Use at your own risk!
+
+All snapshots in reverse chronological order: +
+ ++The iceunpack program converts an iCE40 .bin file into the IceBox ASCII format +that has blocks of 0 and 1 for the config bits for each tile in the chip. The +icepack program converts such an ASCII file back to an iCE40 .bin file. +
+ ++A python library and various tools for working with IceBox ASCII files and accessing +the device database. For example icebox_vlog.py converts our ASCII file +dump of a bitstream into a verilog file that implements an equivalent circuit. +
+
+A small driver program for the FTDI-based programmer used on the iCEstick and HX8K development boards.
+
+ ++The tools are written by Clifford Wolf. IcePack/IceUnpack is based on a reference implementation provided by Mathias Lasser. +
+ ++Recommended reading: +Lattice iCE40 LP/HX Family Datasheet, +Lattice iCE Technology Library +(Especially the three pages on "Architecture Overview", "PLB Blocks", "Routing", and "Clock/Control Distribution Network" in +the Lattice iCE40 LP/HX Family Datasheet. Read that first, then come back here.) +
+ ++The FPGA fabric is divided into tiles. There are IO, RAM and LOGIC tiles. +
+ ++The iceunpack program can be used to convert the bitstream into an ASCII file +that has a block of 0 and 1 characters for each tile. For example: +
+
+.logic_tile 12 12
+000000000000000000000000000000000000000000000000000000
+000000000000000000000011010000000000000000000000000000
+000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000
+000000000000001011000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000
+000000000000000000000000000000000000000000000000000000
+000000000000000000000000001000001000010101010000000000
+000000000000000000000000000101010000101010100000000000
+
+These bits are referred to as By[x] in the documentation. For example, B0 is the first
+line, B0[0] the first bit in the first line, and B15[53] the last bit in the last line.
+
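To make the By[x] notation concrete, here is a small Python sketch that reads one tile block of the ASCII format and looks up a single bit. The helper names parse_tile and bit are hypothetical and not part of IceBox. The sketch assumes a block consists of a header line followed by the bit rows, as in the example above.

```python
def parse_tile(text: str):
    """Parse one tile block and return (kind, x, y, rows)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()            # e.g. [".logic_tile", "12", "12"]
    kind = header[0].lstrip(".")
    x, y = int(header[1]), int(header[2])
    rows = lines[1:]                     # rows[y] is the line called By
    return kind, x, y, rows


def bit(rows, y: int, x: int) -> bool:
    """Return True if By[x] is set."""
    return rows[y][x] == "1"
```

With this indexing, bit(rows, 0, 0) is B0[0] and bit(rows, 15, 53) is B15[53] for a logic tile.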
+ ++The icebox_explain.py program can be used to turn this block of config bits into a description of the cell +configuration: +
+
+.logic_tile 12 12
+LC_7 0101010110101010 0000
+buffer local_g0_2 lutff_7/in_3
+buffer local_g1_4 lutff_7/in_0
+buffer sp12_h_r_18 local_g0_2
+buffer sp12_h_r_20 local_g1_4
+
+IceBox contains a database of the wires and configuration bits that can be found in iCE40 tiles. This database can be accessed +via the IceBox Python API. But IceBox is a large hack. So it is recommended to only use the IceBox API +to export this database into a format that fits the target application. See icebox_chipdb.py for +an example program that does that. +
+ ++The recommended approach for learning how to use this documentation is to synthesize very simple circuits using +Lattice iCEcube2, run our toolchain on the resulting bitstream files, and analyze the results using the HTML export of the database +mentioned above. icebox_vlog.py can be used to convert the bitstream to verilog. The output file of +this tool will also outline the signal paths in comments added to the generated verilog. +
+ ++For example, using the top_bitmap.bin from the following Verilog and PCF files: +
+
+module top (input a, b, output y);
+  assign y = a & b;
+endmodule
+
+set_io a 1
+set_io b 10
+set_io y 11
+
+We would get something like the following icebox_explain.py output: +
+
+$ iceunpack top_bitmap.bin top_bitmap.txt
+$ icebox_explain top_bitmap.txt
+Reading file 'top_bitmap.txt'..
+Fabric size (without IO tiles): 12 x 16
+
+.io_tile 0 10
+IOB_1 PINTYPE_0
+IOB_1 PINTYPE_3
+IOB_1 PINTYPE_4
+IoCtrl IE_0
+IoCtrl IE_1
+IoCtrl REN_0
+buffer local_g1_2 io_1/D_OUT_0
+buffer logic_op_tnr_2 local_g1_2
+
+.io_tile 0 14
+IOB_1 PINTYPE_0
+IoCtrl IE_1
+IoCtrl REN_0
+buffer io_1/D_IN_0 span4_horz_28
+
+.io_tile 0 11
+IOB_0 PINTYPE_0
+IoCtrl IE_0
+IoCtrl REN_1
+
+.logic_tile 1 11
+LC_2 0000000001010101 0000
+buffer local_g1_4 lutff_2/in_3
+buffer local_g3_1 lutff_2/in_0
+buffer neigh_op_lft_4 local_g1_4
+buffer sp4_r_v_b_41 local_g3_1
+
+.logic_tile 2 14
+routing sp4_h_l_41 sp4_v_b_4
+
+And something like the following icebox_vlog.py output: +
+
+$ icebox_vlog top_bitmap.txt
+// Reading file 'top_bitmap.txt'..
+
+module chip (output io_0_10_1, input io_0_11_0, input io_0_14_1);
+
+wire io_0_10_1;
+// io_0_10_1
+// (0, 10, 'io_1/D_OUT_0')
+// (0, 10, 'io_1/PAD')
+// (0, 10, 'local_g1_2')
+// (0, 10, 'logic_op_tnr_2')
+// (0, 11, 'logic_op_rgt_2')
+// (0, 12, 'logic_op_bnr_2')
+// (1, 10, 'neigh_op_top_2')
+// (1, 11, 'lutff_2/out')
+// (1, 12, 'neigh_op_bot_2')
+// (2, 10, 'neigh_op_tnl_2')
+// (2, 11, 'neigh_op_lft_2')
+// (2, 12, 'neigh_op_bnl_2')
+
+wire io_0_11_0;
+// io_0_11_0
+// (0, 11, 'io_0/D_IN_0')
+// (0, 11, 'io_0/PAD')
+// (1, 10, 'neigh_op_tnl_0')
+// (1, 10, 'neigh_op_tnl_4')
+// (1, 11, 'local_g1_4')
+// (1, 11, 'lutff_2/in_3')
+// (1, 11, 'neigh_op_lft_0')
+// (1, 11, 'neigh_op_lft_4')
+// (1, 12, 'neigh_op_bnl_0')
+// (1, 12, 'neigh_op_bnl_4')
+
+wire io_0_14_1;
+// io_0_14_1
+// (0, 14, 'io_1/D_IN_0')
+// (0, 14, 'io_1/PAD')
+// (0, 14, 'span4_horz_28')
+// (1, 11, 'local_g3_1')
+// (1, 11, 'lutff_2/in_0')
+// (1, 11, 'sp4_r_v_b_41')
+// (1, 12, 'sp4_r_v_b_28')
+// (1, 13, 'neigh_op_tnl_2')
+// (1, 13, 'neigh_op_tnl_6')
+// (1, 13, 'sp4_r_v_b_17')
+// (1, 14, 'neigh_op_lft_2')
+// (1, 14, 'neigh_op_lft_6')
+// (1, 14, 'sp4_h_r_41')
+// (1, 14, 'sp4_r_v_b_4')
+// (1, 15, 'neigh_op_bnl_2')
+// (1, 15, 'neigh_op_bnl_6')
+// (2, 10, 'sp4_v_t_41')
+// (2, 11, 'sp4_v_b_41')
+// (2, 12, 'sp4_v_b_28')
+// (2, 13, 'sp4_v_b_17')
+// (2, 14, 'sp4_h_l_41')
+// (2, 14, 'sp4_v_b_4')
+
+assign io_0_10_1 = /* LUT 1 11 2 */ io_0_11_0 ? io_0_14_1 : 0;
+
+endmodule
+
+
+In papers and reports, please refer to Project IceStorm as follows: Clifford Wolf, Mathias Lasser. Project IceStorm. http://www.clifford.at/icestorm/, +e.g. using the following BibTeX code: +
+
+@MISC{IceStorm,
+  author = {Clifford Wolf and Mathias Lasser},
+  title = {Project IceStorm},
+  howpublished = "\url{http://www.clifford.at/icestorm/}"
+}
+
+
+Documentation mostly by Clifford Wolf <clifford@clifford.at> in 2015. Based on research by Mathias Lasser and Clifford Wolf.
+Buy an iCEstick from Lattice and see what you can do with the information provided here. Buy a few because you might break some..
+
+Project IceStorm aims at documenting the bitstream format of Lattice iCE40 +FPGAs and providing simple tools for analyzing and creating bitstream files. +This is work in progress. +
+ ++The image on the right shows the span-wires of a left (or right) io cell (click to enlarge). +
+ ++A left/right io cell has 16 connections named span4_vert_t_0 to span4_vert_t_15 on its top edge and +16 connections named span4_vert_b_0 to span4_vert_b_15 on its bottom edge. The nets span4_vert_t_0 +to span4_vert_t_11 are connected to span4_vert_b_4 to span4_vert_b_15. The span-4 and span-12 wires +of the adjacent logic cell are connected to the nets span4_horz_0 to span4_horz_47 and span12_horz_0 +to span12_horz_23. +
+
+A top/bottom io cell has 16 connections named span4_horz_l_0 to span4_horz_l_15 on its left edge and
+16 connections named span4_horz_r_0 to span4_horz_r_15 on its right edge. The nets span4_horz_l_0
+to span4_horz_l_11 are connected to span4_horz_r_4 to span4_horz_r_15. The span-4 and span-12 wires
+of the adjacent logic cell are connected to the nets span4_vert_0 to span4_vert_47 and span12_vert_0
+to span12_vert_23.
+
+ ++The vertical span4 wires of left/right io cells are connected "around the corner" to the horizontal span4 wires of the top/bottom +io cells. For example span4_vert_b_0 of IO cell (0 1) is connected to span4_horz_l_0 (span4_horz_r_4) +of IO cell (1 0). +
+
+Note that unlike the span-wires connecting LOGIC and RAM tiles, the span-wires
+connecting IO tiles to each other are not pairwise crossed out.
+
+ ++Each IO tile contains two IO blocks. Each IO block essentially implements the SB_IO +primitive from the Lattice iCE Technology Library. +Some inputs are shared between the two IO blocks. The following table lists how the +wires in the logic tile map to the SB_IO primitive ports: +
+ ++
SB_IO Port | IO Block 0 | IO Block 1 |
---|---|---|
D_IN_0 | io_0/D_IN_0 | io_1/D_IN_0 |
D_IN_1 | io_0/D_IN_1 | io_1/D_IN_1 |
D_OUT_0 | io_0/D_OUT_0 | io_1/D_OUT_0 |
D_OUT_1 | io_0/D_OUT_1 | io_1/D_OUT_1 |
OUTPUT_ENABLE | io_0/OUT_ENB | io_1/OUT_ENB |
CLOCK_ENABLE | io_global/cen | |
INPUT_CLK | io_global/inclk | |
OUTPUT_CLK | io_global/outclk | |
LATCH_INPUT_VALUE | io_global/latch |
+Like the inputs to logic cells, the inputs to IO blocks are routed to the IO block via a two-stage process. A signal +is first routed to one of 16 local tracks in the IO tile and then from the local track to the IO block. +
+ ++The io_global/latch signal is shared among all IO tiles on an edge of the chip and is driven by wire_gbuf/in +from one dedicated IO tile on that edge. For the HX1K chips the tiles driving the io_global/latch signal are: +(0, 7), (13, 10), (5, 0), and (8, 17) +
+ ++A logic tile sends the output of its eight logic cells to its neighbour tiles. An IO tile does the same thing with the four D_IN +signals created by its two IO blocks. The D_IN signals map to logic function indices as follows: +
+ ++
Function Index | D_IN Wire |
---|---|
0 | io_0/D_IN_0 |
1 | io_0/D_IN_1 |
2 | io_1/D_IN_0 |
3 | io_1/D_IN_1 |
4 | io_0/D_IN_0 |
5 | io_0/D_IN_1 |
6 | io_1/D_IN_0 |
7 | io_1/D_IN_1 |
+For example the signal io_1/D_IN_0 in IO tile (0, 5) can be seen as neigh_op_lft_2 and neigh_op_lft_6 in LOGIC tile (1, 5). +
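The table above follows a simple pattern, which can be written as a short Python sketch. The function name d_in_wire is hypothetical. The bit arithmetic is read off the table and is not taken from the IceBox sources.

```python
def d_in_wire(function_index: int) -> str:
    """Map a logic function index (0..7) to the D_IN wire of an IO tile."""
    if not 0 <= function_index <= 7:
        raise ValueError("function index must be in 0..7")
    block = (function_index >> 1) & 1    # IO block 0 or 1
    pin = function_index & 1             # D_IN_0 or D_IN_1
    return "io_%d/D_IN_%d" % (block, pin)
```

Indices 2 and 6 both yield io_1/D_IN_0, which matches the neigh_op_lft_2 and neigh_op_lft_6 example above.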
+Each IO Tile has 2 NegClk configuration bits, suggesting that the
+clock signals can be inverted independently for the two IO blocks in the
+tile. However, the Lattice tools refuse to pack two IO blocks with different clock
+polarity into the same IO tile. In our tests we only managed to either set or clear
+both NegClk bits.
+
+ ++Each IO block has two IoCtrl IE bits that enable the input buffers and +two IoCtrl REN bits that enable the pull up resistors. Both bits are active +low, i.e. an unused IO tile will have both IE bits set and both REN bits cleared (the +default behavior is to enable pullup resistors on all unused pins). Note that +icebox_explain.py will ignore all IO tiles that only have the two IoCtrl +IE bits set. +
+
+However, the IoCtrl IE_0/IE_1 and IoCtrl REN_0/REN_1 bits do not
+necessarily configure the IO pins that are connected to the IO blocks in the same tile,
+and if they do, the numbers (0/1) do not necessarily match. As a general rule, the pins
+on the right and bottom side of the chips match up with the IO blocks and for the pins
+on the left and top side the numbers must be swapped. But in some cases the IO block
+and the set of IE/REN bits are not even located in the same tile. The following
+table lists the correlation between IO blocks and IE/REN bits for the
+1K chip:
+
+ ++
+
+
|
+
+
|
+
+
|
+
+
|
+When an input pin pair is used as LVDS pair (IO standard +SB_LVDS_INPUT, bank 3 / left edge only), then the four bits +IoCtrl IE_0/IE_1 and IoCtrl REN_0/REN_1 are all set, as well +as the IoCtrl LVDS bit. +
+ ++In the iCE 8k devices the IoCtrl IE bits are active high. So an unused +IO tile on an 8k chip has all bits cleared. +
+ ++iCE40 FPGAs have 8 global nets. Each global net can be driven directly from an +IO pin. In the FPGA bitstream, routing of external signals to global nets is +not controlled by bits in the IO tile. Instead bits that do not belong to any +tile are used. In IceBox nomenclature such bits are called "extra bits". +
+
+The following table lists which pins / IO blocks may be used to drive
+which global net, and which .extra_bit statements in the IceBox ASCII file
+format represent the corresponding configuration bits:
+
+ + ++
Glb Net | Pin (HX1K-TQ144) | IO Tile + Block # | IceBox Statement |
---|---|---|---|
0 | 93 | 13 8 1 | .extra_bit 0 330 142 |
1 | 21 | 0 8 1 | .extra_bit 0 331 142 |
2 | 128 | 7 17 0 | .extra_bit 1 330 143 |
3 | 50 | 7 0 0 | .extra_bit 1 331 143 |
4 | 20 | 0 9 0 | .extra_bit 1 330 142 |
5 | 94 | 13 9 0 | .extra_bit 1 331 142 |
6 | 49 | 6 0 1 | .extra_bit 0 330 143 |
7 | 129 | 6 17 1 | .extra_bit 0 331 143 |
+Signals internal to the FPGA can also be routed to the global nets. This is done by routing the signal +to the wire_gbuf/in net on an IO tile. The same set of I/O tiles is used for this, but in this +case each of the I/O tiles corresponds to a different global net: +
+ ++
Glb Net | +0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 |
---|---|---|---|---|---|---|---|---|
IO Tile | +7 0 | +7 17 | +13 9 | +0 9 | +6 17 | +6 0 | +0 8 | +13 8 |
+Each LOGIC, IO, and RAMB tile has 8 ColBufCtrl bits, one for each global net. In most tiles these
+bits have no function, but in tiles in rows 4, 5, 12, and 13 (for RAM columns: rows 3, 5, 11, and 13) these bits
+control which global nets are driven to the column of tiles below and/or above that tile (including that tile),
+as illustrated in the image to the right (click to enlarge).
+
+ ++In 8k chips the rows 8, 9, 24, and 25 contain the column buffers. 8k RAMB and +RAMT tiles can control column buffers, so the pattern looks the same for RAM, LOGIC, and +IO columns. +
+ ++The SB_WARMBOOT primitive in iCE40 FPGAs has three inputs and no outputs. The three inputs of that cell +are driven by the wire_gbuf/in signal from three IO tiles. In HX1K chips the tiles connected to the +SB_WARMBOOT primitive are: +
+ ++
Warmboot Pin | IO Tile |
---|---|
BOOT | 12 0 |
S0 | 13 1 |
S1 | 13 2 |
+The PLL primitives in iCE40 FPGAs are configured using the PLLCONFIG_*
+bits in the IO tiles. The configuration for a single PLL cell is spread out
+over many IO tiles. For example, the PLL cell in the 1K chip is configured as
+follows (bits listed from LSB to MSB):
+
+ ++
+
+
|
+
+
|
+The PLL inputs are routed to the PLL via the wire_gbuf/in signal from various IO tiles. The non-clock +PLL outputs are routed via otherwise unused neigh_op_* signals in fabric corners. For example in case +of the 1k chip: +
+ ++
Tile | Net-Segment | SB_PLL40_* Port Name |
---|---|---|
0 1 | wire_gbuf/in | REFERENCECLK |
0 2 | wire_gbuf/in | EXTFEEDBACK |
0 4 | wire_gbuf/in | DYNAMICDELAY |
0 5 | wire_gbuf/in | |
0 6 | wire_gbuf/in | |
0 10 | wire_gbuf/in | |
0 11 | wire_gbuf/in | |
0 12 | wire_gbuf/in | |
0 13 | wire_gbuf/in | |
0 14 | wire_gbuf/in | |
1 1 | neigh_op_bnl_1 | LOCK |
1 0 | wire_gbuf/in | BYPASS |
2 0 | wire_gbuf/in | RESETB |
5 0 | wire_gbuf/in | LATCHINPUTVALUE |
12 1 | neigh_op_bnl_1 | SDO |
4 0 | wire_gbuf/in | SDI |
5 0 | wire_gbuf/in | SCLK |
+The PLL clock outputs are fed directly into the input path of certain IO tiles. +In case of the 1k chip the PORTA clock is fed into PIO 1 of IO Tile (6 0) and +the PORTB clock is fed into PIO 0 of IO Tile (7 0). Because of this, those two +PIOs can only be used as output Pins by the FPGA fabric when the PLL ports +are being used. +
diff --git a/docs/iosp.svg b/docs/iosp.svg
new file mode 100644
index 0000000..e7b130f
--- /dev/null
+++ b/docs/iosp.svg
@@ -0,0 +1,1394 @@
diff --git a/docs/logic_tile.html b/docs/logic_tile.html
new file mode 100644
index 0000000..8e3dcad
--- /dev/null
+++ b/docs/logic_tile.html
@@ -0,0 +1,327 @@
+
+Project IceStorm aims at documenting the bitstream format of Lattice iCE40
+FPGAs and providing simple tools for analyzing and creating bitstream files.
+This is work in progress.
+
+ ++The span-4 and span-12 wires are the main interconnect resource in iCE40 FPGAs. They "span" (have a length of) +4 or 12 cells in horizontal or vertical direction. +
+ ++The bits marked routing in the bitstream do enable switches (transfer gates) that can +be used to connect wire segments bidirectionally to each other in order to create larger +segments. The bits marked buffer in the bitstream enable tristate buffers that drive +the signal in one direction from one wire to another. Both types of bits exist for routing between +span-wires. See the auto generated documentation for the LOGIC Tile configuration bits for details. +
+ ++Only directional tristate buffers are used to route signals between the span-wires and the logic cells. +
+ ++The image on the right shows the horizontal span-4 wires of a logic or ram cell (click to enlarge). +
+
+On the left side of the cell there are 48 connections named sp4_h_l_0 to sp4_h_l_47. The lower 36 of those
+wires are connected to sp4_h_r_12 to sp4_h_r_47 on the right side of the cell. (IceStorm normalizes these
+wire names to sp4_h_r_0 to sp4_h_r_35. Note: the Lattice tools use a different normalization scheme
+for these wire names.) The wires connecting the left and right horizontal span-4 ports are pairwise crossed out.
+
+
+The wires sp4_h_l_36 to sp4_h_l_47 terminate in the cell, as do the wires sp4_h_r_0 to sp4_h_r_11.
+
+
+These wires "span" 4 cells, i.e. they connect 5 cells if you count the cells on
+both ends of the wire.
+
+ ++For example, the wire sp4_h_r_0 in cell (x, y) has the following names: +
+ ++
Cell Coordinates | sp4_h_l_* wire name | sp4_h_r_* wire name |
---|---|---|
x, y | - | sp4_h_r_0 |
x+1, y | sp4_h_l_0 | sp4_h_r_13 |
x+2, y | sp4_h_l_13 | sp4_h_r_24 |
x+3, y | sp4_h_l_24 | sp4_h_r_37 |
x+4, y | sp4_h_l_37 | - |
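The naming pattern in the table above can be reproduced with a short Python sketch. The function name sp4_h_wire_names is hypothetical. The rule used here, that sp4_h_l_i continues as sp4_h_r_((i + 12) XOR 1), is our reading of "connected with an offset of 12 and pairwise crossed out"; it is inferred from the example and not taken from the IceBox sources.

```python
def sp4_h_wire_names(start_index: int):
    """Follow a horizontal span-4 wire that starts as sp4_h_r_<start_index>.

    Returns a list of (dx, left_index, right_index) tuples, one per cell
    (x + dx, y). None means the wire has no port on that side of the cell.
    """
    if not 0 <= start_index <= 11:
        raise ValueError("wires starting in a cell are sp4_h_r_0 to sp4_h_r_11")
    rows = [(0, None, start_index)]
    index = start_index
    dx = 0
    while True:
        dx += 1
        left = index                      # sp4_h_r_i is sp4_h_l_i in the next cell
        if left < 36:
            index = (left + 12) ^ 1       # assumption: offset 12, pairs swapped
            rows.append((dx, left, index))
        else:
            rows.append((dx, left, None)) # sp4_h_l_36..47 terminate in the cell
            return rows
```

Calling sp4_h_wire_names(0) walks through the same five cells as the table.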
+The image on the right shows the vertical span-4 wires of a logic or ram cell (click to enlarge).
+
+
+Similar to the horizontal span-4 wires there are 48 connections on the top (sp4_v_t_0 to sp4_v_t_47) and
+48 connections on the bottom (sp4_v_b_0 to sp4_v_b_47). The wires sp4_v_t_0 to sp4_v_t_35
+are connected to sp4_v_b_12 to sp4_v_b_47 (with pairwise crossing out). Wire names are normalized
+to sp4_v_b_12 to sp4_v_b_47.
+
+
+But in addition to that, each cell also has access to sp4_v_b_0 to sp4_v_b_47 of its right neighbour.
+These are the wires sp4_r_v_b_0 to sp4_r_v_b_47. So overall a single vertical span-4 wire
+connects 9 cells. For example, the wire sp4_v_b_0 in cell (x, y) has the following names:
+
+ ++
Cell Coordinates | sp4_v_t_* wire name | sp4_v_b_* wire name | sp4_r_v_b_* wire name |
---|---|---|---|
x, y | - | sp4_v_b_0 | - |
x, y-1 | sp4_v_t_0 | sp4_v_b_13 | - |
x, y-2 | sp4_v_t_13 | sp4_v_b_24 | - |
x, y-3 | sp4_v_t_24 | sp4_v_b_37 | - |
x, y-4 | sp4_v_t_37 | - | - |
x-1, y | - | - | sp4_r_v_b_0 |
x-1, y-1 | - | - | sp4_r_v_b_13 |
x-1, y-2 | - | - | sp4_r_v_b_24 |
x-1, y-3 | - | - | sp4_r_v_b_37 |
+Similar to the span-4 wires there are also longer horizontal and vertical span-12 wires. +
+ ++There are 24 connections sp12_v_t_0 to sp12_v_t_23 on the top of the +cell and 24 connections sp12_v_b_0 to sp12_v_b_23 on the bottom of the +cell. The wires sp12_v_t_0 to sp12_v_t_21 are connected to +sp12_v_b_2 to sp12_v_b_23 (with pairwise crossing out). The connections +sp12_v_b_0, sp12_v_b_1, sp12_v_t_22, and sp12_v_t_23 +terminate in the cell. Wire names are normalized to sp12_v_b_2 to sp12_v_b_23. +
+
+There are also 24 connections sp12_h_l_0 to sp12_h_l_23 on the left of the
+cell and 24 connections sp12_h_r_0 to sp12_h_r_23 on the right of the
+cell. The wires sp12_h_l_0 to sp12_h_l_21 are connected to
+sp12_h_r_2 to sp12_h_r_23 (with pairwise crossing out). The connections
+sp12_h_r_0, sp12_h_r_1, sp12_h_l_22, and sp12_h_l_23
+terminate in the cell. Wire names are normalized to sp12_h_r_2 to sp12_h_r_23.
+
+The local tracks are the gateway to the logic cell inputs. Signals from the span-wires
+and the logic cell outputs of the eight neighbour cells can be routed to the local tracks and
+signals from the local tracks can be routed to the logic cell inputs.
+
+
+Each logic tile has 32 local tracks. They are organized in 4 groups of 8 wires each:
+local_g0_0 to local_g3_7.
+
+
+The span wires, global signals, and neighbour outputs can be routed to the local tracks. But not
+every one of those signals can be routed to every local track. Instead there is a different
+mix of 16 signals for each local track.
+
+
+The buffer driving the local track has 5 configuration bits. One enable bit and 4 bits that select
+the input wire. For example for local_g0_0 (copy&paste from the bitstream documentation):
+
+ ++
B0[14] | +B1[14] | +B1[15] | +B1[16] | +B1[17] | +Function | Source-Net | Destination-Net |
---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 1 | buffer | sp4_r_v_b_24 | local_g0_0 |
0 | 0 | 0 | 1 | 1 | buffer | sp12_h_r_8 | local_g0_0 |
0 | 0 | 1 | 0 | 1 | buffer | neigh_op_bot_0 | local_g0_0 |
0 | 0 | 1 | 1 | 1 | buffer | sp4_v_b_16 | local_g0_0 |
0 | 1 | 0 | 0 | 1 | buffer | sp4_r_v_b_35 | local_g0_0 |
0 | 1 | 0 | 1 | 1 | buffer | sp12_h_r_16 | local_g0_0 |
0 | 1 | 1 | 0 | 1 | buffer | neigh_op_top_0 | local_g0_0 |
0 | 1 | 1 | 1 | 1 | buffer | sp4_h_r_0 | local_g0_0 |
1 | 0 | 0 | 0 | 1 | buffer | lutff_0/out | local_g0_0 |
1 | 0 | 0 | 1 | 1 | buffer | sp4_v_b_0 | local_g0_0 |
1 | 0 | 1 | 0 | 1 | buffer | neigh_op_lft_0 | local_g0_0 |
1 | 0 | 1 | 1 | 1 | buffer | sp4_h_r_8 | local_g0_0 |
1 | 1 | 0 | 0 | 1 | buffer | neigh_op_bnr_0 | local_g0_0 |
1 | 1 | 0 | 1 | 1 | buffer | sp4_v_b_8 | local_g0_0 |
1 | 1 | 1 | 0 | 1 | buffer | sp12_h_r_0 | local_g0_0 |
1 | 1 | 1 | 1 | 1 | buffer | sp4_h_r_16 | local_g0_0 |
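The table above can be turned into a small lookup, sketched here in Python. The function name local_g0_0_source is hypothetical. Treating B1[17] as the enable bit is our reading of the table (it is 1 in every row), and the behaviour for a cleared enable bit is an assumption.

```python
# Source nets for local_g0_0, indexed by the 4 select bits
# B0[14], B1[14], B1[15], B1[16] (read as a binary number, B0[14] first).
LOCAL_G0_0_SOURCES = [
    "sp4_r_v_b_24", "sp12_h_r_8", "neigh_op_bot_0", "sp4_v_b_16",
    "sp4_r_v_b_35", "sp12_h_r_16", "neigh_op_top_0", "sp4_h_r_0",
    "lutff_0/out", "sp4_v_b_0", "neigh_op_lft_0", "sp4_h_r_8",
    "neigh_op_bnr_0", "sp4_v_b_8", "sp12_h_r_0", "sp4_h_r_16",
]


def local_g0_0_source(b0_14, b1_14, b1_15, b1_16, b1_17):
    """Return the net driving local_g0_0, or None if the buffer is disabled."""
    if not b1_17:                         # assumption: B1[17] is the enable bit
        return None
    select = (b0_14 << 3) | (b1_14 << 2) | (b1_15 << 1) | b1_16
    return LOCAL_G0_0_SOURCES[select]
```

For example, the bit pattern 1 0 0 0 1 selects lutff_0/out as the driver of local_g0_0.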
+Then the signals on the local tracks can be routed to the input pins of the logic cells. Like before,
+not every local track can be routed to every logic cell input pin. Instead there is a different mix
+of 16 local tracks for each logic cell input. For example for lutff_0/in_0:
+
+ ++
B0[26] | +B1[26] | +B1[27] | +B1[28] | +B1[29] | +Function | Source-Net | Destination-Net |
---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 1 | buffer | local_g0_0 | lutff_0/in_0 |
0 | 0 | 0 | 1 | 1 | buffer | local_g2_0 | lutff_0/in_0 |
0 | 0 | 1 | 0 | 1 | buffer | local_g1_1 | lutff_0/in_0 |
0 | 0 | 1 | 1 | 1 | buffer | local_g3_1 | lutff_0/in_0 |
0 | 1 | 0 | 0 | 1 | buffer | local_g0_2 | lutff_0/in_0 |
0 | 1 | 0 | 1 | 1 | buffer | local_g2_2 | lutff_0/in_0 |
0 | 1 | 1 | 0 | 1 | buffer | local_g1_3 | lutff_0/in_0 |
0 | 1 | 1 | 1 | 1 | buffer | local_g3_3 | lutff_0/in_0 |
1 | 0 | 0 | 0 | 1 | buffer | local_g0_4 | lutff_0/in_0 |
1 | 0 | 0 | 1 | 1 | buffer | local_g2_4 | lutff_0/in_0 |
1 | 0 | 1 | 0 | 1 | buffer | local_g1_5 | lutff_0/in_0 |
1 | 0 | 1 | 1 | 1 | buffer | local_g3_5 | lutff_0/in_0 |
1 | 1 | 0 | 0 | 1 | buffer | local_g0_6 | lutff_0/in_0 |
1 | 1 | 0 | 1 | 1 | buffer | local_g2_6 | lutff_0/in_0 |
1 | 1 | 1 | 0 | 1 | buffer | local_g1_7 | lutff_0/in_0 |
1 | 1 | 1 | 1 | 1 | buffer | local_g3_7 | lutff_0/in_0 |
+The 8 global nets on the iCE40 can be routed to the local track via the glb2local_0 to glb2local_3 +nets using a similar two-stage process. The logic block clock-enable and set-reset inputs can be driven +directly from one of 4 global nets or from one of 4 local tracks. The logic block clock input can be driven +from any of the global nets and from a few local tracks. See the bitstream documentation for details. +
+Each logic tile has a logic block containing 8 logic cells. Each logic cell contains a 4-input LUT, a carry
+unit and a flip-flop. Clock, clock enable, and set/reset inputs are shared among the 8 logic cells. So is the
+bit that configures positive/negative edge for the flip-flops. But the three configuration bits that specify if
+the flip-flop should be used, if it is set or reset by the set/reset input, and if the set/reset is synchronous
+or asynchronous exist for each logic cell individually.
+
+ ++Each LUT i has four input wires lutff_i/in_0 to lutff_i/in_3. Input +lutff_i/in_3 can be configured to be driven by the carry output of the previous logic cell, +or by carry_in_mux in case of i=0. Input lutff_i/in_2 can be configured to +be driven by the output of the previous LUT for i>0. The LUT uses its 4 input signals to +calculate lutff_i/out. +
+ ++The carry unit calculates lutff_i/cout = lutff_i/in_1 + lutff_i/in_2 + lutff_(i-1)/cout > 1. In case of i=0, carry_in_mux is used as third input. carry_in_mux can be configured to be constant 0, 1 or the lutff_7/cout signal from the logic tile below. +
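The carry equation above is a majority function of its three inputs. The following Python sketch spells it out and chains it through the eight logic cells of one tile. The function names are hypothetical, and the sketch models only the arithmetic, not the configuration bits.

```python
def carry_out(in_1: int, in_2: int, carry_in: int) -> int:
    """lutff_i/cout = (lutff_i/in_1 + lutff_i/in_2 + carry input) > 1."""
    return int(in_1 + in_2 + carry_in > 1)


def carry_chain(in_1_bits, in_2_bits, carry_in_mux: int):
    """Return the cout of all 8 logic cells, starting from carry_in_mux."""
    couts = []
    carry = carry_in_mux                 # third input of logic cell 0
    for a, b in zip(in_1_bits, in_2_bits):
        carry = carry_out(a, b, carry)   # cout of cell i feeds cell i+1
        couts.append(carry)
    return couts
```

The last element of the returned list corresponds to lutff_7/cout, which the logic tile above can select as its carry_in_mux source.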
+
+Part of the functionality described above is documented as part of the routing
+bitstream documentation (see the buffers for lutff_ inputs). The NegClk
+bit switches all 8 FFs in the tile to negative edge mode. The CarryInSet
+bit drives the carry_in_mux high (it defaults to low when not driven via the buffer from
+carry_in).
+
+
+The remaining functions of the logic cell are configured via the LC_i bits. These
+are 20 bits per logic cell. We have arbitrarily labeled those bits as follows:
+
+ ++
Label | LC_0 | LC_1 | LC_2 | LC_3 | LC_4 | LC_5 | LC_6 | LC_7 |
---|---|---|---|---|---|---|---|---|
LC_i[0] | B0[36] | B2[36] | B4[36] | B6[36] | B8[36] | B10[36] | B12[36] | B14[36] |
LC_i[1] | B0[37] | B2[37] | B4[37] | B6[37] | B8[37] | B10[37] | B12[37] | B14[37] |
LC_i[2] | B0[38] | B2[38] | B4[38] | B6[38] | B8[38] | B10[38] | B12[38] | B14[38] |
LC_i[3] | B0[39] | B2[39] | B4[39] | B6[39] | B8[39] | B10[39] | B12[39] | B14[39] |
LC_i[4] | B0[40] | B2[40] | B4[40] | B6[40] | B8[40] | B10[40] | B12[40] | B14[40] |
LC_i[5] | B0[41] | B2[41] | B4[41] | B6[41] | B8[41] | B10[41] | B12[41] | B14[41] |
LC_i[6] | B0[42] | B2[42] | B4[42] | B6[42] | B8[42] | B10[42] | B12[42] | B14[42] |
LC_i[7] | B0[43] | B2[43] | B4[43] | B6[43] | B8[43] | B10[43] | B12[43] | B14[43] |
LC_i[8] | B0[44] | B2[44] | B4[44] | B6[44] | B8[44] | B10[44] | B12[44] | B14[44] |
LC_i[9] | B0[45] | B2[45] | B4[45] | B6[45] | B8[45] | B10[45] | B12[45] | B14[45] |
LC_i[10] | B1[36] | B3[36] | B5[36] | B7[36] | B9[36] | B11[36] | B13[36] | B15[36] |
LC_i[11] | B1[37] | B3[37] | B5[37] | B7[37] | B9[37] | B11[37] | B13[37] | B15[37] |
LC_i[12] | B1[38] | B3[38] | B5[38] | B7[38] | B9[38] | B11[38] | B13[38] | B15[38] |
LC_i[13] | B1[39] | B3[39] | B5[39] | B7[39] | B9[39] | B11[39] | B13[39] | B15[39] |
LC_i[14] | B1[40] | B3[40] | B5[40] | B7[40] | B9[40] | B11[40] | B13[40] | B15[40] |
LC_i[15] | B1[41] | B3[41] | B5[41] | B7[41] | B9[41] | B11[41] | B13[41] | B15[41] |
LC_i[16] | B1[42] | B3[42] | B5[42] | B7[42] | B9[42] | B11[42] | B13[42] | B15[42] |
LC_i[17] | B1[43] | B3[43] | B5[43] | B7[43] | B9[43] | B11[43] | B13[43] | B15[43] |
LC_i[18] | B1[44] | B3[44] | B5[44] | B7[44] | B9[44] | B11[44] | B13[44] | B15[44] |
LC_i[19] | B1[45] | B3[45] | B5[45] | B7[45] | B9[45] | B11[45] | B13[45] | B15[45] |
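
The table above follows a regular pattern: logic cell i uses rows B(2i) and B(2i+1), and the 20 bits occupy columns 36 to 45 of those two rows. A minimal sketch of that mapping, with a hypothetical helper name `lc_bit_location` that does not come from the IceStorm tools:

```python
def lc_bit_location(cell, bit):
    """Map LC_<cell>[<bit>] to (row, column), meaning B<row>[<column>].

    Derived from the table above; the function name is an assumption
    made for this example.
    """
    if not 0 <= cell <= 7:
        raise ValueError("cell must be in 0..7")
    if not 0 <= bit <= 19:
        raise ValueError("bit must be in 0..19")
    row = 2 * cell + bit // 10   # bits 0..9 in the even row, 10..19 in the odd row
    col = 36 + bit % 10          # columns 36..45
    return row, col
```

For example, `lc_bit_location(3, 10)` returns `(7, 36)`, which corresponds to the B7[36] entry in the LC_3 column.
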
+LC_i[8] is the CarryEnable bit. This bit must be set if the carry logic is used. +
+ ++LC_i[9] is the DffEnable bit. It enables the output flip-flop for the LUT. +
+ ++LC_i[18] is the Set_NoReset bit. When this bit is set, the set/reset signal sets the flip-flop instead of resetting it. +
+ ++LC_i[19] is the AsyncSetReset bit. When this bit is set then the set/reset signal is asynchronous to the clock. +
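The four flag bits described above can be pulled out of the 20 LC_i bits as follows. This is a sketch only: it assumes the bits are available as a 20-element sequence indexed by the labels used in this document, and `decode_lc_flags` is a hypothetical name.

```python
def decode_lc_flags(lc_bits):
    """Extract the named flag bits from the 20 LC_i bits of one logic cell.

    lc_bits is assumed to be a sequence of 20 values (0/1), where
    lc_bits[k] holds LC_i[k] as labeled in this document.
    """
    if len(lc_bits) != 20:
        raise ValueError("expected 20 LC_i bits")
    return {
        "CarryEnable": bool(lc_bits[8]),     # carry logic is used
        "DffEnable": bool(lc_bits[9]),       # output flip-flop is enabled
        "Set_NoReset": bool(lc_bits[18]),    # set/reset signal sets the flip-flop
        "AsyncSetReset": bool(lc_bits[19]),  # set/reset is asynchronous
    }
```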
+ ++The LUT implements the following truth table: +
+ ++
in_3 | in_2 | in_1 | in_0 | out |
---|---|---|---|---|
0 | 0 | 0 | 0 | LC_i[4] |
0 | 0 | 0 | 1 | LC_i[14] |
0 | 0 | 1 | 0 | LC_i[15] |
0 | 0 | 1 | 1 | LC_i[5] |
0 | 1 | 0 | 0 | LC_i[6] |
0 | 1 | 0 | 1 | LC_i[16] |
0 | 1 | 1 | 0 | LC_i[17] |
0 | 1 | 1 | 1 | LC_i[7] |
1 | 0 | 0 | 0 | LC_i[3] |
1 | 0 | 0 | 1 | LC_i[13] |
1 | 0 | 1 | 0 | LC_i[12] |
1 | 0 | 1 | 1 | LC_i[2] |
1 | 1 | 0 | 0 | LC_i[1] |
1 | 1 | 0 | 1 | LC_i[11] |
1 | 1 | 1 | 0 | LC_i[10] |
1 | 1 | 1 | 1 | LC_i[0] |
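
The truth table above amounts to a permutation of 16 of the LC_i bits. A minimal sketch of a LUT evaluator built from it, where `LUT_BIT_ORDER` and `lut_out` are names chosen for this example and are not part of the IceStorm tools:

```python
# LC_i bit index selected by each input combination, listed in the order
# (in_3, in_2, in_1, in_0) = 0000, 0001, ..., 1111 as in the table above.
LUT_BIT_ORDER = [4, 14, 15, 5, 6, 16, 17, 7, 3, 13, 12, 2, 1, 11, 10, 0]


def lut_out(lc_bits, in_3, in_2, in_1, in_0):
    """Return lutff_i/out for the given inputs.

    lc_bits is assumed to be a sequence of 20 values (0/1) holding
    LC_i[0] to LC_i[19]; the four inputs are 0 or 1.
    """
    index = (in_3 << 3) | (in_2 << 2) | (in_1 << 1) | in_0
    return lc_bits[LUT_BIT_ORDER[index]]
```

A 4-input AND, for instance, has only LC_i[0] set, because that is the bit selected when all four inputs are 1.
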
+LUT inputs that are not connected to anything are driven low. The set/reset +signal is also driven low if not connected to any other driver, and the clock +enable signal is driven high when left unconnected. +
+ diff --git a/docs/ram_tile.html b/docs/ram_tile.html new file mode 100644 index 0000000..3121f57 --- /dev/null +++ b/docs/ram_tile.html @@ -0,0 +1,95 @@ ++Project IceStorm aims at documenting the bitstream format of Lattice iCE40 +FPGAs and providing simple tools for analyzing and creating bitstream files. +This is work in progress. +
+ ++With regard to the Span-4 and Span-12 wires, a RAM tile behaves exactly like a LOGIC tile. So for simple +applications that do not need the block RAM resources, the RAM tiles can be handled like LOGIC +tiles without logic cells in them. +
+ ++A pair of RAM tiles (odd and even y-coordinates) provides an interface to a block RAM cell. As with +LOGIC tiles, signals entering the RAM tile have to be routed over local tracks to the block RAM +inputs. Tiles with odd y-coordinates are "bottom" RAM Tiles (RAMB Tiles), and tiles with even y-coordinates +are "top" RAM Tiles (RAMT Tiles). Each pair of RAMB/RAMT tiles implements a SB_RAM40_4K cell. The +cell ports are spread out over the two tiles as follows: +
+ ++
SB_RAM40_4K | RAMB Tile | RAMT Tile |
---|---|---|
RDATA[15:0] | RDATA[7:0] | RDATA[15:8] |
RADDR[10:0] | - | RADDR[10:0] |
WADDR[10:0] | WADDR[10:0] | - |
MASK[15:0] | MASK[7:0] | MASK[15:8] |
WDATA[15:0] | WDATA[7:0] | WDATA[15:8] |
RCLKE | - | RCLKE |
RCLK | - | RCLK |
RE | - | RE |
WCLKE | WCLKE | - |
WCLK | WCLK | - |
WE | WE | - |
+The configuration bit RamConfig PowerUp in the RAMB tile enables the memory. This bit +is active-low in 1k chips, i.e. an unused RAM block has only this bit set. Note that icebox_explain.py +will ignore all RAMB tiles that only have the RamConfig PowerUp bit set. +
+ ++In 8k chips the RamConfig PowerUp bit is active-high. So an unused RAM block has all bits cleared +in the 8k config bitstream. +
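The differing polarity of the RamConfig PowerUp bit can be captured in a small helper. This is a sketch under assumptions: the device labels `"1k"` and `"8k"` and the function name `ram_block_enabled` are made up for this example.

```python
def ram_block_enabled(powerup_bit_set, device):
    """Tell whether a RAM block is enabled, given its RamConfig PowerUp bit.

    device is a hypothetical label: "1k" (bit is active-low) or
    "8k" (bit is active-high), following the description above.
    """
    if device == "1k":
        return not powerup_bit_set   # an unused block has the bit set
    if device == "8k":
        return bool(powerup_bit_set)  # an unused block has the bit cleared
    raise ValueError("unknown device: %r" % (device,))
```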
+ ++The RamConfig CBIT_* bits in the RAMT tile configure the read/write width of the +memory. Those bits map to the SB_RAM40_4K cell parameters as follows: +
+ ++
SB_RAM40_4K | RAMT Config Bit |
---|---|
WRITE_MODE[0] | RamConfig CBIT_0 |
WRITE_MODE[1] | RamConfig CBIT_1 |
READ_MODE[0] | RamConfig CBIT_2 |
READ_MODE[1] | RamConfig CBIT_3 |
+The read/write mode selects the width of the read/write port: +
+ ++
MODE | DATA Width | Used WDATA/RDATA Bits |
---|---|---|
0 | 16 | 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 |
1 | 8 | 14, 12, 10, 8, 6, 4, 2, 0 |
2 | 4 | 13, 9, 5, 1 |
3 | 2 | 11, 3 |
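
The two tables above can be combined into a short decoder. This is an illustrative sketch: the names `USED_DATA_BITS`, `decode_ram_modes` and `data_width` are assumptions for this example, and the bit lists are transcribed directly from the table rather than derived from a formula.

```python
# WDATA/RDATA bits used in each mode, copied from the table above.
USED_DATA_BITS = {
    0: list(range(15, -1, -1)),
    1: [14, 12, 10, 8, 6, 4, 2, 0],
    2: [13, 9, 5, 1],
    3: [11, 3],
}


def decode_ram_modes(cbit_0, cbit_1, cbit_2, cbit_3):
    """Return (WRITE_MODE, READ_MODE) from the RamConfig CBIT_* bits (0/1)."""
    write_mode = cbit_0 | (cbit_1 << 1)
    read_mode = cbit_2 | (cbit_3 << 1)
    return write_mode, read_mode


def data_width(mode):
    """Return the port width in bits for a read or write mode."""
    return len(USED_DATA_BITS[mode])
```

For example, with CBIT_0 and CBIT_3 set, `decode_ram_modes(1, 0, 0, 1)` gives write mode 1 (8 bits wide) and read mode 2 (4 bits wide).
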
+The NegClk bit in the RAMB tile negates the polarity of the WCLK port, +and the NegClk bit in the RAMT tile negates the polarity of the RCLK port. +
+ ++A logic tile sends the output of its eight logic cells to its neighbour tiles. A RAM tile does the same thing +with the RDATA outputs. Each RAMB tile exports its RDATA[7:0] outputs and each RAMT tile +exports its RDATA[15:8] outputs via this mechanism. +
+ diff --git a/docs/sp4h.svg b/docs/sp4h.svg new file mode 100644 index 0000000..cd074eb --- /dev/null +++ b/docs/sp4h.svg @@ -0,0 +1,2076 @@ + + + + diff --git a/docs/sp4v.svg b/docs/sp4v.svg new file mode 100644 index 0000000..2d4a5b0 --- /dev/null +++ b/docs/sp4v.svg @@ -0,0 +1,3982 @@ + + + +