% IEEEtran howto:
% http://ftp.univie.ac.at/packages/tex/macros/latex/contrib/IEEEtran/IEEEtran_HOWTO.pdf
\documentclass[9pt,technote,a4paper]{IEEEtran}

\usepackage[T1]{fontenc} % required for luximono!
\usepackage[scaled=0.8]{luximono} % typewriter font with bold face

% To install the luximono font files:
% getnonfreefonts-sys --all or
% getnonfreefonts-sys luximono
%
% if there are problems you might need to:
% - Create /etc/texmf/updmap.d/99local-luximono.cfg
%   containing the single line: Map ul9.map
% - Run update-updmap followed by mktexlsr and updmap-sys
%
% These commands must be executed as root with a root environment
% (i.e. run "sudo su" and then execute the commands in the root
% shell, don't just prefix the commands with "sudo").

\usepackage[unicode,bookmarks=false]{hyperref}
\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{units}
\usepackage{nicefrac}
\usepackage{eurosym}
\usepackage{graphicx}
\usepackage{verbatim}
\usepackage{algpseudocode}
\usepackage{scalefnt}
\usepackage{xspace}
\usepackage{color}
\usepackage{colortbl}
\usepackage{multirow}
\usepackage{hhline}
\usepackage{listings}
\usepackage{float}

\usepackage{tikz}
\usetikzlibrary{calc}
\usetikzlibrary{arrows}
\usetikzlibrary{scopes}
\usetikzlibrary{through}
\usetikzlibrary{shapes.geometric}

\def\FIXME{{\color{red}\bf FIXME}}

\lstset{basicstyle=\ttfamily,frame=trBL,xleftmargin=0.7cm,xrightmargin=0.2cm,numbers=left}

\begin{document}

\title{Yosys Application Note 011: \\ Interactive Design Investigation}
\author{Clifford Wolf \\ Original Version December 2013}
\maketitle

\begin{abstract}
Yosys \cite{yosys} can be a great environment for building custom synthesis
flows. It can also be an excellent tool for teaching and learning Verilog based
RTL synthesis. In both applications it is very important to be able to easily
analyze the designs it produces.
This Yosys application note covers the generation of circuit diagrams with the
Yosys {\tt show} command and the selection of interesting parts of the circuit
using the {\tt select} command, and briefly discusses advanced investigation
commands for evaluating circuits and solving SAT problems.
\end{abstract}

\section{Installation and Prerequisites}

This Application Note is based on the Yosys \cite{yosys} GIT Rev. {\tt 2b90ba1}
from 2013-12-08. The {\tt README} file covers how to install Yosys. The
{\tt show} command requires a working installation of GraphViz \cite{graphviz}
and xdot \cite{xdot} for generating the actual circuit diagrams.

\section{Overview}

This application note is structured as follows:

Sec.~\ref{intro_show} introduces the {\tt show} command and explains the
symbols used in the circuit diagrams generated by it.

Sec.~\ref{navigate} introduces additional commands used to navigate in the
design, select portions of the design, and print additional information on
the elements in the design that are not contained in the circuit diagrams.

Sec.~\ref{poke} introduces commands to evaluate the design and solve SAT
problems within the design.

Sec.~\ref{conclusion} concludes the document and summarizes the key points.

\section{Introduction to the {\tt show} command}
\label{intro_show}

\begin{figure}[b]
\begin{lstlisting}
$ cat example.ys
read_verilog example.v
show -pause
proc
show -pause
opt
show -pause

$ cat example.v
module example(input clk, a, b, c,
               output reg [1:0] y);
    always @(posedge clk)
        if (c)
            y <= c ? a + b : 2'd0;
endmodule
\end{lstlisting}
\caption{Yosys script with {\tt show} commands and example design}
\label{example_src}
\end{figure}

\begin{figure}[b!]
\includegraphics[width=\linewidth]{APPNOTE_011_Design_Investigation/example_00.pdf}
\includegraphics[width=\linewidth]{APPNOTE_011_Design_Investigation/example_01.pdf}
\includegraphics[width=\linewidth]{APPNOTE_011_Design_Investigation/example_02.pdf}
\caption{Output of the three {\tt show} commands from Fig.~\ref{example_src}}
\label{example_out}
\end{figure}

The {\tt show} command generates a circuit diagram for the design in its
current state. Various options can be used to change the appearance of the
circuit diagram, set the name and format for the output file, and so forth.
When called without any special options, it saves the circuit diagram in
a temporary file and launches {\tt xdot} to display the diagram.
Subsequent calls to {\tt show} re-use the {\tt xdot} instance
(if it is still running).

\subsection{A simple circuit}

Fig.~\ref{example_src} shows a simple synthesis script and a Verilog file that
demonstrate the usage of {\tt show} in a simple setting. Note that {\tt show}
is called with the {\tt -pause} option, which halts execution of the Yosys
script until the user presses the Enter key. The {\tt show -pause} command
also allows the user to enter an interactive shell to further investigate the
circuit before continuing synthesis.

When executed, this script therefore shows the design after each of the three
synthesis commands. The generated circuit diagrams are shown in
Fig.~\ref{example_out}.

The first diagram (from top to bottom) shows the design directly after being
read by the Verilog front-end. Input and output ports are displayed as
octagonal shapes. Cells are displayed as rectangles with inputs on the left
and outputs on the right side. The cell labels are two lines long: The first
line contains a unique identifier for the cell and the second line contains
the cell type. Internal cell types are prefixed with a dollar sign. The Yosys
manual contains a chapter on the internal cell library used in Yosys.
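
Constant nodes, which are described in the next paragraph, are labeled according
to a simple rule that can be written down as a few lines of code. The following
Python function is a toy model written for this note. It is not taken from
Yosys. It assumes that a constant is given MSB first as a string over {\tt 0},
{\tt 1}, {\tt x} and {\tt z}. It also assumes, since the text does not say,
that a fully defined 32-bit constant is printed as an unsigned decimal number.
Under this rule the constant {\tt 2'd0} from Fig.~\ref{example_src} would be
labeled {\tt 2'00}.

\begin{lstlisting}[language=Python,basicstyle=\ttfamily\scriptsize]
def const_label(bits: str) -> str:
    """Toy model of constant labels in show diagrams.

    bits: constant value, MSB first, over 0/1/x/z.
    Assumption: a fully defined 32-bit constant is
    printed as an unsigned decimal number.
    """
    fully_defined = set(bits) <= {"0", "1"}
    if len(bits) == 32 and fully_defined:
        return str(int(bits, 2))
    # Other widths, or any x/z bit: <bit_width>'<bits>
    return f"{len(bits)}'{bits}"


print(const_label("00"))                # 2'00
print(const_label("1x"))                # 2'1x
print(const_label(format(42, "032b")))  # 42
\end{lstlisting}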
Constants are shown as ellipses with the constant value as label. The syntax
{\tt <bit\_width>'<bits>} is used for constants that are not 32-bit wide
and/or contain bits that are not 0 or 1 (i.e. {\tt x} or {\tt z}). Ordinary
32-bit constants are written using decimal numbers.

Single-bit signals are shown as thin arrows pointing from the driver to the
load. Signals that are multiple bits wide are shown as thick arrows.

Finally {\it processes\/} are shown in boxes with round corners. Processes
are Yosys' internal representation of the decision-trees and synchronization
events modelled in a Verilog {\tt always}-block. The label reads {\tt PROC}
followed by a unique identifier in the first line and contains the source
code location of the original {\tt always}-block in the 2nd line. Note how
the multiplexer from the {\tt ?:}-expression is represented as a {\tt \$mux}
cell, while the multiplexer from the {\tt if}-statement is still hidden
within the process.

\medskip

The {\tt proc} command transforms the process from the first diagram into a
multiplexer and a d-type flip-flop, which brings us to the 2nd diagram.

The rhombus shape to the right is a dangling wire. (Wire nodes are only shown
if they are dangling or have ``public'' names, for example names assigned
from the Verilog input.) Also note that the design now contains two instances
of a {\tt BUF}-node. These are artefacts left behind by the {\tt proc}
command. It is quite usual to see such artefacts after calling commands that
change the design, as most commands only care about performing the
transformation in the least complicated way, not about cleaning up after
themselves. The next call to {\tt clean} (or {\tt opt}, which includes
{\tt clean} as one of its operations) will clean up these artefacts. This
operation is so common in Yosys scripts that it can simply be abbreviated
with the {\tt ;;} token, which doubles as separator for commands.
Unless one specifically wants to analyze the artefacts left behind by some
operations, it is therefore recommended to always call {\tt clean} before
calling {\tt show}.

\medskip

In this script we directly call {\tt opt} as the next step, which finally leads
us to the 3rd diagram in Fig.~\ref{example_out}. Here we see that the {\tt opt}
command has not only removed the artefacts left behind by {\tt proc}, but also
correctly determined that it can remove the first {\tt \$mux} cell without
changing the behavior of the circuit.

\begin{figure}[b!]
\includegraphics[width=\linewidth,trim=0 2cm 0 0]{APPNOTE_011_Design_Investigation/splice.pdf}
\caption{Output of {\tt yosys -p 'proc; opt; show' splice.v}}
\label{splice_dia}
\end{figure}

\begin{figure}[b!]
\lstinputlisting{APPNOTE_011_Design_Investigation/splice.v}
\caption{\tt splice.v}
\label{splice_src}
\end{figure}

\begin{figure}[t!]
\includegraphics[height=\linewidth]{APPNOTE_011_Design_Investigation/cmos_00.pdf}
\includegraphics[width=\linewidth]{APPNOTE_011_Design_Investigation/cmos_01.pdf}
\caption{Effects of {\tt splitnets} command and of providing a cell library. (The
circuit is a half-adder built from simple CMOS gates.)}
\label{splitnets_libfile}
\end{figure}

\subsection{Break-out boxes for signal vectors}

As the last example has indicated, Yosys can manage signal vectors (aka.
multi-bit wires or buses) as native objects. This provides great advantages
when analyzing circuits that operate on wide integers. But it also introduces
some additional complexity when the individual bits of a signal vector
are accessed. The example shown in Fig.~\ref{splice_dia} and \ref{splice_src}
demonstrates how such circuits are visualized by the {\tt show} command.

The key elements in understanding this circuit diagram are of course the boxes
with round corners and rows labeled
{\tt <MSB\_LEFT>:<LSB\_LEFT> -- <MSB\_RIGHT>:<LSB\_RIGHT>}.
Each of these boxes has one signal per row on one side and a common signal for
all rows on the other side.
The {\tt <MSB>:<LSB>} tuples specify which bits of the signals are broken out
and connected. So the top row of the box connecting the signals {\tt a} and
{\tt x} indicates that bit 0 (i.e. the range 0:0) of signal {\tt a} is
connected to bit 1 (i.e. the range 1:1) of signal {\tt x}.

Lines connecting such boxes together and lines connecting such boxes to cell
ports have a slightly different look to emphasise that they are not actual
signal wires but a necessity of the graphical representation. This distinction
seems like a technicality, until one wants to debug a problem related to the
way Yosys internally represents signal vectors, for example when writing custom
Yosys commands.

\subsection{Gate level netlists}

Finally Fig.~\ref{splitnets_libfile} shows two common pitfalls when working
with designs mapped to a cell library. The top figure has two problems: First,
Yosys did not have access to the cell library when this diagram was generated,
so all cell ports defaulted to being inputs. This is why all ports are drawn
on the left side and the cells are awkwardly arranged in a large column.
Secondly, the two-bit vector {\tt y} requires breakout-boxes for its individual
bits, resulting in an unnecessarily complex diagram.

For the 2nd diagram Yosys has been given a description of the cell library as
a Verilog file containing blackbox modules. There are two ways to load cell
descriptions into Yosys: First, the Verilog file for the cell library can be
passed directly to the {\tt show} command using the {\tt -lib <filename>}
option. Secondly, it is possible to load cell libraries into the design with
the {\tt read\_verilog -lib <filename>} command. The 2nd method has the great
advantage that the library only needs to be loaded once and can then be used
in all subsequent calls to the {\tt show} command.

In addition to that, the 2nd diagram was generated after {\tt splitnets -ports}
was run on the design.
This command splits all signal vectors into individual signal bits, which is
often desirable when looking at gate-level circuits. The {\tt -ports} option is
required to also split module ports. By default the command only operates on
interior signals.

\subsection{Miscellaneous notes}

By default the {\tt show} command outputs a temporary {\tt dot} file and
launches {\tt xdot} to display it. The options {\tt -format}, {\tt -viewer}
and {\tt -prefix} can be used to change format, viewer and filename prefix.
Note that the {\tt pdf} and {\tt ps} formats are the only formats that support
plotting multiple modules in one run.

In densely connected circuits it is sometimes hard to keep track of the
individual signal wires. For these cases it can be useful to call
{\tt show} with the {\tt -colors <integer>} argument, which randomly assigns
colors to the nets. The integer (> 0) is used as seed value for the random
color assignments. Sometimes it is necessary to try several values to find an
assignment of colors that looks good.

The command {\tt help show} prints a complete listing of all options supported
by the {\tt show} command.

\section{Navigating the design}
\label{navigate}

Plotting circuit diagrams for entire modules in the design only helps in
simple cases. For complex modules the generated circuit diagrams are simply
too big to be of any help. In such cases one first has to select the relevant
portions of the circuit.

In addition to {\it what\/} to display one also needs to carefully decide
{\it when\/} to display it, with respect to the synthesis flow. In general it
is a good idea to troubleshoot a circuit in the earliest state in which a
problem can be reproduced. So if, for example, the internal state before
calling the {\tt techmap} command already fails to verify, it is better to
troubleshoot the coarse-grain version of the circuit before {\tt techmap}
than the gate-level circuit after {\tt techmap}.
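
A binary search over the sequence of intermediate design states is one
systematic way to find this earliest state. It assumes that a problem persists
in all later states once it has been introduced. The following Python sketch
illustrates only the search itself. The predicate {\tt fails} is hypothetical
and stands for whatever check is used, for example the {\tt write\_verilog
-noexpr} simulation flow described in the note below. The state numbering
(state $i$ is the design after the $i$-th command) is also an assumption made
for this illustration.

\begin{lstlisting}[language=Python,basicstyle=\ttfamily\scriptsize]
from typing import Callable, Optional


def earliest_failing(n_states: int,
                     fails: Callable[[int], bool]
                     ) -> Optional[int]:
    """Index of the first failing design state.

    State i is the design after the i-th command of
    the script. Assumes monotonicity: once a state
    fails, all later states fail too. Returns None
    if the final state passes.
    """
    if n_states == 0 or not fails(n_states - 1):
        return None
    lo, hi = 0, n_states - 1   # invariant: hi fails
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid            # problem already at mid
        else:
            lo = mid + 1        # introduced after mid
    return lo


steps = ["read_verilog", "proc", "opt",
         "techmap", "opt"]
# Hypothetical check: pretend the problem is already
# present after the first opt, i.e. before techmap.
idx = earliest_failing(len(steps), lambda i: i >= 2)
print(idx, steps[idx])     # 2 opt
\end{lstlisting}

This approach needs only about $\log_2 n$ checks for $n$ states. That matters
when each check involves writing and simulating the design.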
\medskip

Note: It is generally recommended to verify the internal state of a design by
writing it to a Verilog file using {\tt write\_verilog -noexpr} and using the
simulation models from {\tt simlib.v} and {\tt simcells.v} from the Yosys data
directory (as printed by {\tt yosys-config -{}-datdir}).

\subsection{Interactive Navigation}

\begin{figure}
\begin{lstlisting}
yosys> ls

1 modules:
  example

yosys> cd example

yosys [example]> ls

7 wires:
  $0\y[1:0]
  $add$example.v:5$2_Y
  a
  b
  c
  clk
  y

3 cells:
  $add$example.v:5$2
  $procdff$7
  $procmux$5
\end{lstlisting}
\caption{Demonstration of {\tt ls} and {\tt cd} using {\tt example.v} from Fig.~\ref{example_src}}
\label{lscd}
\end{figure}

\begin{figure}[b]
\begin{lstlisting}
  attribute \src "example.v:5"
  cell $add $add$example.v:5$2
    parameter \A_SIGNED 0
    parameter \A_WIDTH 1
    parameter \B_SIGNED 0
    parameter \B_WIDTH 1
    parameter \Y_WIDTH 2
    connect \A \a
    connect \B \b
    connect \Y $add$example.v:5$2_Y
  end
\end{lstlisting}
\caption{Output of {\tt dump \$2} using the design from Fig.~\ref{example_src}
and Fig.~\ref{example_out}}
\label{dump2}
\end{figure}

Once the right state within the synthesis flow for debugging the circuit has
been identified, it is recommended to simply add the {\tt shell} command at
the matching place in the synthesis script. This command stops the synthesis
at the specified moment and enters shell mode, where the user can
interactively enter commands.

In most cases, the shell will start with the whole design selected (i.e. when
the synthesis script does not already narrow the selection). The command
{\tt ls} can now be used to list all modules. The command {\tt cd} can be used
to switch to one of the modules (type {\tt cd ..} to switch back). Now the
{\tt ls} command lists the objects within that module.
Fig.~\ref{lscd} demonstrates this using the design from Fig.~\ref{example_src}.
One thing to note in Fig.~\ref{lscd} is that the cell names from
Fig.~\ref{example_out} are just abbreviations of the actual cell names, namely
the part after the last dollar sign. Most auto-generated names (the ones
starting with a dollar sign) are rather long and contain some additional
information on the origin of the named object. But in most cases those names
can simply be abbreviated using the last part.

Usually all interactive work is done with one module selected using the
{\tt cd} command. But it is also possible to work from the design-context
({\tt cd ..}). In this case all object names must be prefixed with
{\tt <module\_name>/}. For example {\tt a*/b*} would refer to all objects whose
names start with {\tt b} from all modules whose names start with {\tt a}.

The {\tt dump} command can be used to print all information about an object.
For example {\tt dump \$2} prints the output shown in Fig.~\ref{dump2}. This
can be useful, for example, to determine the names of nets connected to cells,
as the net names are usually suppressed in the circuit diagram if they are
auto-generated.

For the remainder of this document we will assume that the commands are run
from module-context and not design-context.

\subsection{Working with selections}

\begin{figure}[t]
\includegraphics[width=\linewidth]{APPNOTE_011_Design_Investigation/example_03.pdf}
\caption{Output of {\tt show} after {\tt select \$2} or {\tt select t:\$add}
(see also Fig.~\ref{example_out})}
\label{seladd}
\end{figure}

When a module is selected using the {\tt cd} command, all commands (with a few
exceptions, such as the {\tt read\_*} and {\tt write\_*} commands) operate
only on the selected module. This can also be useful for synthesis scripts
where different synthesis strategies should be applied to different modules
in the design.

But for most interactive work we want to further narrow the set of selected
objects. This can be done using the {\tt select} command.
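
The two naming conventions described above can be illustrated with a toy
Python model. The first abbreviates auto-generated names to their last
\$-separated part. The second addresses objects from the design context with
{\tt <module\_name>/<object\_name>} patterns. The model is not Yosys' actual
name-matching code. It assumes plain shell-style wildcards (Python's
{\tt fnmatch} module) and ignores details such as the backslash prefix of
public names. The functions {\tt short\_name} and {\tt match\_design} and the
three-module design are hypothetical.

\begin{lstlisting}[language=Python,basicstyle=\ttfamily\scriptsize]
from fnmatch import fnmatchcase


def short_name(name: str) -> str:
    """Abbreviate an auto-generated name ($...) to
    the part after its last dollar sign."""
    if name.startswith("$"):
        return "$" + name.rsplit("$", 1)[1]
    return name


def match_design(pattern: str, design: dict) -> set:
    """Match a '<module>/<object>' pattern against a
    toy design {module_name: [object_names]}."""
    mod_pat, obj_pat = pattern.split("/", 1)
    return {(mod, obj)
            for mod, objs in design.items()
            if fnmatchcase(mod, mod_pat)
            for obj in objs
            if fnmatchcase(obj, obj_pat)}


# Cell names from the ls output in Fig. lscd
cells = ["$add$example.v:5$2", "$procdff$7",
         "$procmux$5"]
print([short_name(c) for c in cells])
# ['$2', '$7', '$5']

# Hypothetical design with three modules
design = {"alu": ["bus", "a", "b_reg"],
          "adder": ["b", "c"],
          "ctrl": ["busy"]}
print(sorted(match_design("a*/b*", design)))
# [('adder', 'b'), ('alu', 'b_reg'), ('alu', 'bus')]
\end{lstlisting}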
For example, if the command {\tt select \$2} is executed, a subsequent
{\tt show} command will yield the diagram shown in Fig.~\ref{seladd}. Note that
the nets are now displayed in ellipses. This indicates that they are not
selected, but only shown because the diagram contains a cell that is connected
to the net. This of course makes no difference for the circuit that is shown,
but it can be a useful piece of information when manipulating selections.

Objects can not only be selected by their name but also by other properties.
For example {\tt select t:\$add} will select all cells of type {\tt \$add}. In
this case this also yields the diagram shown in Fig.~\ref{seladd}.

\begin{figure}[b]
\lstinputlisting{APPNOTE_011_Design_Investigation/foobaraddsub.v}
\caption{Test module for operations on selections}
\label{foobaraddsub}
\end{figure}

The output of {\tt help select} contains a complete syntax reference for
matching different properties.

Many commands can operate on explicit selections. For example the command
{\tt dump t:\$add} will print information on all {\tt \$add} cells in the
active module. Whenever a command has {\tt [selection]} as its last argument in
its usage help, it uses the engine behind the {\tt select} command to evaluate
additional arguments and uses the resulting selection instead of the selection
created by the last {\tt select} command.

Normally the {\tt select} command overwrites a previous selection. The
commands {\tt select -add} and {\tt select -del} can be used to add
or remove objects from the current selection.

The command {\tt select -clear} can be used to reset the selection to the
default, which is a complete selection of everything in the current module.
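
The effect of these options can be summarized by treating the current
selection as a set of object names. The following Python class is a minimal
toy model written for this note. The class and method names are hypothetical
and do not correspond to Yosys internals. The model uses a subset of the
objects from Fig.~\ref{lscd}, with cells written in their abbreviated form.
It ignores patterns such as {\tt t:\$add} entirely.

\begin{lstlisting}[language=Python,basicstyle=\ttfamily\scriptsize]
class Selection:
    """Toy model of the current selection in a module."""

    def __init__(self, module_objects):
        self.everything = frozenset(module_objects)
        # Default: everything in the module is selected
        self.current = set(self.everything)

    def select(self, objs):     # select <objs>
        # Overwrites the previous selection
        self.current = set(objs) & self.everything

    def add(self, objs):        # select -add <objs>
        self.current |= set(objs) & self.everything

    def delete(self, objs):     # select -del <objs>
        self.current -= set(objs)

    def clear(self):            # select -clear
        self.current = set(self.everything)


sel = Selection(["a", "b", "c", "clk", "y",
                 "$2", "$5", "$7"])
sel.select(["$2"])
sel.add(["$7", "clk"])
sel.delete(["clk"])
print(sorted(sel.current))      # ['$2', '$7']
sel.clear()
print(len(sel.current))         # 8
\end{lstlisting}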
\subsection{Operations on selections}

\begin{figure}[t]
\lstinputlisting{APPNOTE_011_Design_Investigation/sumprod.v}
\caption{Another test module for operations on selections}
\label{sumprod}
\end{figure}

\begin{figure}[b]
\includegraphics[width=\linewidth]{APPNOTE_011_Design_Investigation/sumprod_00.pdf}
\caption{Output of {\tt show a:sumstuff} on Fig.~\ref{sumprod}}
\label{sumprod_00}
\end{figure}

The {\tt select} command is actually much more powerful than it might seem at
first glance. When it is called with multiple arguments, each argument is
evaluated and pushed separately on a stack. After all arguments have been
processed it simply creates the union of all elements on the stack. So the
following command will select all {\tt \$add} cells and all objects with
the {\tt foo} attribute set:

\begin{verbatim}
select t:$add a:foo
\end{verbatim}

(Try this with the design shown in Fig.~\ref{foobaraddsub}. Use the
{\tt select -list} command to list the current selection.)

In many cases simply adding more and more objects to the selection is an
ineffective way of selecting the interesting part of the design. Special
arguments can be used to combine the elements on the stack.
For example the {\tt \%i} argument pops the last two elements from
the stack, intersects them, and pushes the result back on the stack. So the
following command will select all {\tt \$add} cells that have the {\tt foo}
attribute set:

\begin{verbatim}
select t:$add a:foo %i
\end{verbatim}

The listing in Fig.~\ref{sumprod} uses the Yosys non-standard
{\tt \{* ... *\}} syntax to set the attribute {\tt sumstuff} on all cells
generated by the first assign statement. (This works on arbitrarily large
blocks of Verilog code and can be used to mark portions of code for analysis.)

Selecting {\tt a:sumstuff} in this module will yield the circuit diagram shown
in Fig.~\ref{sumprod_00}.
As only the cells themselves are selected, but not the temporary wire
{\tt \$1\_Y}, the two adders are shown as two disjoint parts. This can be very
useful for global signals like clock and reset signals: just unselect them
using a command such as {\tt select -del clk rst} and each cell using them
will get its own net label.

In this case however we would like to see the cells connected properly. This
can be achieved using the {\tt \%x} action, which broadens the selection, i.e.
for each selected wire it selects all cells connected to the wire and vice
versa. So {\tt show a:sumstuff \%x} yields the diagram shown in
Fig.~\ref{sumprod_01}.

\begin{figure}[t]
\includegraphics[width=\linewidth]{APPNOTE_011_Design_Investigation/sumprod_01.pdf}
\caption{Output of {\tt show a:sumstuff \%x} on Fig.~\ref{sumprod}}
\label{sumprod_01}
\end{figure}

\subsection{Selecting logic cones}

Fig.~\ref{sumprod_01} shows what is called the {\it input cone\/} of
{\tt sum}, i.e. all cells and signals that are used to generate the signal
{\tt sum}. The {\tt \%ci} action can be used to select the input cones of all
objects in the top selection in the stack maintained by the {\tt select}
command.

Like the {\tt \%x} action, {\tt \%ci} broadens the selection by one ``step''.
But this time the operation only works against the direction of data flow.
That means wires only select cells via output ports and cells only select
wires via input ports.

Fig.~\ref{select_prod} shows the sequence of diagrams generated by the
following commands:

\begin{verbatim}
show prod
show prod %ci
show prod %ci %ci
show prod %ci %ci %ci
\end{verbatim}

When selecting many levels of logic, repeating {\tt \%ci} over and over again
can be a bit dull. So there is a shortcut for that: the number of iterations
can be appended to the action. So for example the action {\tt \%ci3} is
identical to performing the {\tt \%ci} action three times.
The action {\tt \%ci*} performs the {\tt \%ci} action over and over again until
it has no effect anymore.

\begin{figure}[t]
\hfill \includegraphics[width=4cm,trim=0 1cm 0 1cm]{APPNOTE_011_Design_Investigation/sumprod_02.pdf} \\
\includegraphics[width=\linewidth,trim=0 0cm 0 1cm]{APPNOTE_011_Design_Investigation/sumprod_03.pdf} \\
\includegraphics[width=\linewidth,trim=0 0cm 0 1cm]{APPNOTE_011_Design_Investigation/sumprod_04.pdf} \\
\includegraphics[width=\linewidth,trim=0 2cm 0 1cm]{APPNOTE_011_Design_Investigation/sumprod_05.pdf} \\
\caption{Objects selected by {\tt select prod \%ci...}}
\label{select_prod}
\end{figure}

\medskip

Often there are certain cell types and/or ports that should not be followed by
the {\tt \%ci} action, or we only want to follow certain cell types and/or
ports. This can be achieved using additional patterns that can be appended to
the {\tt \%ci} action.

Let's consider the design from Fig.~\ref{memdemo_src}. It serves no purpose
other than being a non-trivial circuit for demonstrating some of the advanced
Yosys features. We synthesize the circuit using {\tt proc; opt; memory; opt}
and change to the {\tt memdemo} module with {\tt cd memdemo}. If we type
{\tt show} now we see the diagram shown in Fig.~\ref{memdemo_00}.

\begin{figure}[b!]
\lstinputlisting{APPNOTE_011_Design_Investigation/memdemo.v}
\caption{Demo circuit for demonstrating some advanced Yosys features}
\label{memdemo_src}
\end{figure}

\begin{figure*}[t]
\includegraphics[width=\linewidth,trim=0 0cm 0 0cm]{APPNOTE_011_Design_Investigation/memdemo_00.pdf} \\
\caption{Complete circuit diagram for the design shown in Fig.~\ref{memdemo_src}}
\label{memdemo_00}
\end{figure*}

But maybe we are only interested in the tree of multiplexers that select the
output value.
In order to get there, we would start by just showing the output signal and
its immediate predecessors:

\begin{verbatim}
show y %ci2
\end{verbatim}

From this we would learn that {\tt y} is driven by a {\tt \$dff} cell, that
{\tt y} is connected to the output port {\tt Q}, that the {\tt clk} signal goes
into the {\tt CLK} input port of the cell, and that the data comes from an
auto-generated wire into the input {\tt D} of the flip-flop cell.

As we are not interested in the clock signal we add an additional pattern to
the {\tt \%ci} action, which tells it to only follow ports {\tt Q} and {\tt D}
of {\tt \$dff} cells:

\begin{verbatim}
show y %ci2:+$dff[Q,D]
\end{verbatim}

To add a pattern we append a colon followed by the pattern to the {\tt \%ci}
action. The pattern itself starts with {\tt +} or {\tt -}, indicating an
include or an exclude pattern respectively, followed by an optional
comma-separated list of cell types, followed by an optional comma-separated
list of port names in square brackets.

Since we know that the only cell considered in this case is a {\tt \$dff} cell,
we could as well only specify the port names:

\begin{verbatim}
show y %ci2:+[Q,D]
\end{verbatim}

Or we could tell the {\tt \%ci} action not to follow the {\tt CLK} input:

\begin{verbatim}
show y %ci2:-[CLK]
\end{verbatim}

\begin{figure}[b]
\includegraphics[width=\linewidth,trim=0 0cm 0 0cm]{APPNOTE_011_Design_Investigation/memdemo_01.pdf} \\
\caption{Output of {\tt show y \%ci2:+\$dff[Q,D] \%ci*:-\$mux[S]:-\$dff}}
\label{memdemo_01}
\end{figure}

Next we would investigate the next logic level by adding another {\tt \%ci2} to
the command:

\begin{verbatim}
show y %ci2:-[CLK] %ci2
\end{verbatim}

From this we would learn that the next cell is a {\tt \$mux} cell, and we would
add further patterns to narrow the selection to the path we are interested in.
Eventually we would arrive at a command such as

\begin{verbatim}
show y %ci2:+$dff[Q,D] %ci*:-$mux[S]:-$dff
\end{verbatim}

in which the first {\tt \%ci} jumps over the initial d-type flip-flop and the
2nd action selects the entire input cone without going over multiplexer select
inputs and flip-flop cells. The diagram produced by this command is shown in
Fig.~\ref{memdemo_01}.

\medskip

Similar to {\tt \%ci} there is an action {\tt \%co} to select output cones,
which accepts the same syntax for patterns and repetition. The {\tt \%x}
action mentioned previously also accepts this advanced syntax.

These actions for traversing the circuit graph, combined with the actions for
boolean operations such as intersection ({\tt \%i}) and difference
({\tt \%d}), are powerful tools for extracting the relevant portions of the
circuit under investigation.

See {\tt help select} for a complete list of actions available in selections.

\subsection{Storing and recalling selections}

The current selection can be stored in memory with the command
{\tt select -set <name>}. It can later be recalled using
{\tt select @<name>}. In fact, the {\tt @<name>} expression pushes the stored
selection on the stack maintained by the {\tt select} command. So for example

\begin{verbatim}
select @foo @bar %i
\end{verbatim}

will select the intersection between the stored selections {\tt foo} and
{\tt bar}.

\medskip

In larger investigation efforts it is highly recommended to maintain a script
that sets up relevant selections, so they can easily be recalled, for example
when Yosys needs to be re-run after a design or source code change.

The {\tt history} command can be used to list all recent interactive commands.
This feature can be useful for creating such a script from the commands used
in an interactive session.

\section{Advanced investigation techniques}
\label{poke}

When working with very large modules, it is often not enough to just select
the interesting part of the module.
Instead it can be useful to extract the interesting part of the circuit into a
separate module. This is useful, for example, if one wants to run a series of
synthesis commands on the critical part of the module and carefully read all
the debug output created by the commands in order to spot a problem. This kind
of troubleshooting is much easier if the circuit under investigation is
encapsulated in a separate module.

Fig.~\ref{submod} shows how the {\tt submod} command can be used to split the
circuit from Fig.~\ref{memdemo_src} and \ref{memdemo_00} into its components.
The {\tt -name} option is used to specify the name of the new module and also
the name of the new cell in the current module.

\begin{figure}[t]
\includegraphics[width=\linewidth,trim=0 1.3cm 0 0cm]{APPNOTE_011_Design_Investigation/submod_00.pdf} \\
\centerline{\tt memdemo} \vskip1em\par
\includegraphics[width=\linewidth,trim=0 1.3cm 0 0cm]{APPNOTE_011_Design_Investigation/submod_01.pdf} \\
\centerline{\tt scramble} \vskip1em\par
\includegraphics[width=\linewidth,trim=0 1.3cm 0 0cm]{APPNOTE_011_Design_Investigation/submod_02.pdf} \\
\centerline{\tt outstage} \vskip1em\par
\includegraphics[width=\linewidth,trim=0 1.3cm 0 0cm]{APPNOTE_011_Design_Investigation/submod_03.pdf} \\
\centerline{\tt selstage} \vskip1em\par
\begin{lstlisting}[basicstyle=\ttfamily\scriptsize]
select -set outstage y %ci2:+$dff[Q,D] %ci*:-$mux[S]:-$dff
select -set selstage y %ci2:+$dff[Q,D] %ci*:-$dff @outstage %d
select -set scramble mem* %ci2 %ci*:-$dff mem* %d @selstage %d
submod -name scramble @scramble
submod -name outstage @outstage
submod -name selstage @selstage
\end{lstlisting}
\caption{The circuit from Fig.~\ref{memdemo_src} and \ref{memdemo_00} broken up using {\tt submod}}
\label{submod}
\end{figure}

\subsection{Evaluation of combinatorial circuits}

The {\tt eval} command can be used to evaluate combinatorial circuits.
For example (see Fig.~\ref{submod} for the circuit diagram of {\tt selstage}):

{\scriptsize
\begin{verbatim}
yosys [selstage]> eval -set s2,s1 4'b1001 -set d 4'hc -show n2 -show n1

9. Executing EVAL pass (evaluate the circuit given an input).
Full command line: eval -set s2,s1 4'b1001 -set d 4'hc -show n2 -show n1
Eval result: \n2 = 2'10.
Eval result: \n1 = 2'10.
\end{verbatim}
\par}

So the {\tt -set} option is used to set input values and the {\tt -show} option
is used to specify the nets to evaluate. If no {\tt -show} option is specified,
all selected output ports are used by default.

If a necessary input value is not given, an error is produced. The option
{\tt -set-undef} can be used to instead set all unspecified input nets to
undef ({\tt x}).

The {\tt -table} option can be used to create a truth table. For example:

{\scriptsize
\begin{verbatim}
yosys [selstage]> eval -set-undef -set d[3:1] 0 -table s1,d[0]

10. Executing EVAL pass (evaluate the circuit given an input).
Full command line: eval -set-undef -set d[3:1] 0 -table s1,d[0]

  \s1 \d [0] |  \n1  \n2
 ---- ------ | ---- ----
 2'00    1'0 | 2'00 2'00
 2'00    1'1 | 2'xx 2'00
 2'01    1'0 | 2'00 2'00
 2'01    1'1 | 2'xx 2'01
 2'10    1'0 | 2'00 2'00
 2'10    1'1 | 2'xx 2'10
 2'11    1'0 | 2'00 2'00
 2'11    1'1 | 2'xx 2'11

Assumed undef (x) value for the following signals: \s2
\end{verbatim}
}

Note that the {\tt eval} command (as well as the {\tt sat} command discussed in
the next sections) only operates on flattened modules. It cannot analyze
signals that are passed through design hierarchy levels. So the {\tt flatten}
command must be used on modules that instantiate other modules before these
commands can be applied.

\subsection{Solving combinatorial SAT problems}

\begin{figure}[b]
\lstinputlisting{APPNOTE_011_Design_Investigation/primetest.v}
\caption{A simple miter circuit for testing if a number is prime.
But it has a problem (see main text and Fig.~\ref{primesat}).}
\label{primetest}
\end{figure}

\begin{figure*}[!t]
\begin{lstlisting}[basicstyle=\ttfamily\small]
yosys [primetest]> sat -prove ok 1 -set p 31

8. Executing SAT pass (solving SAT problems in the circuit).
Full command line: sat -prove ok 1 -set p 31

Setting up SAT problem:
Import set-constraint: \p = 16'0000000000011111
Final constraint equation: \p = 16'0000000000011111
Imported 6 cells to SAT database.
Import proof-constraint: \ok = 1'1
Final proof equation: \ok = 1'1

Solving problem with 2790 variables and 8241 clauses..
SAT proof finished - model found: FAIL!

   ______                   ___       ___       _ _            _ _
  (_____ \                 / __)     / __)     (_) |          | | |
   _____) )___ ___   ___ _| |__    _| |__ _____ _| | _____  __| | |
  |  ____/ ___) _ \ / _ (_   __)  (_   __|____ | | || ___ |/ _  |_|
  | |   | |  | |_| | |_| || |       | |  / ___ | | || ____( (_| |_
  |_|   |_|   \___/ \___/ |_|       |_|  \_____|_|\_)_____)\____|_|


  Signal Name                 Dec        Hex                   Bin
  -------------------- ---------- ---------- ---------------------
  \a                        15029       3ab5      0011101010110101
  \b                         4099       1003      0001000000000011
  \ok                           0          0                     0
  \p                           31         1f      0000000000011111

yosys [primetest]> sat -prove ok 1 -set p 31 -set a[15:8],b[15:8] 0

9. Executing SAT pass (solving SAT problems in the circuit).
Full command line: sat -prove ok 1 -set p 31 -set a[15:8],b[15:8] 0

Setting up SAT problem:
Import set-constraint: \p = 16'0000000000011111
Import set-constraint: { \a [15:8] \b [15:8] } = 16'0000000000000000
Final constraint equation: { \a [15:8] \b [15:8] \p } = { 16'0000000000000000 16'0000000000011111 }
Imported 6 cells to SAT database.
Import proof-constraint: \ok = 1'1
Final proof equation: \ok = 1'1

Solving problem with 2790 variables and 8257 clauses..
SAT proof finished - no model found: SUCCESS!

                  /$$$$$$      /$$$$$$$$     /$$$$$$$
                 /$$__  $$    | $$_____/    | $$__  $$
                | $$  \ $$    | $$          | $$  \ $$
                | $$  | $$    | $$$$$       | $$  | $$
                | $$  | $$    | $$__/       | $$  | $$
                | $$/$$ $$    | $$          | $$  | $$
                |  $$$$$$/ /$$| $$$$$$$$ /$$| $$$$$$$//$$
                 \____ $$$|__/|________/|__/|_______/|__/
                       \__/
\end{lstlisting}
\caption{Experiments with the miter circuit from Fig.~\ref{primetest}. The first
attempt at proving that 31 is prime failed because the SAT solver found a
creative way of factorizing 31 using integer overflow.}
\label{primesat}
\end{figure*}

Often the opposite of the {\tt eval} command is needed, i.e.\ the circuit's
output is given and we want to find the matching input signals. For small
circuits with only a few input bits this can be accomplished by trying all
possible input combinations, as is done by the {\tt eval -table} command. For
larger circuits, however, Yosys provides the {\tt sat} command, which uses a
SAT~\cite{CircuitSAT} solver~\cite{MiniSAT} to solve this kind of problem.

The {\tt sat} command works very similarly to the {\tt eval} command. The main
difference is that it is now also possible to set output values and find the
corresponding input values. For example:

{\scriptsize
\begin{verbatim}
yosys [selstage]> sat -show s1,s2,d -set s1 s2 -set n2,n1 4'b1001

11. Executing SAT pass (solving SAT problems in the circuit).
Full command line: sat -show s1,s2,d -set s1 s2 -set n2,n1 4'b1001

Setting up SAT problem:
Import set-constraint: \s1 = \s2
Import set-constraint: { \n2 \n1 } = 4'1001
Final constraint equation: { \n2 \n1 \s1 } = { 4'1001 \s2 }
Imported 3 cells to SAT database.
Import show expression: { \s1 \s2 \d }

Solving problem with 81 variables and 207 clauses..
SAT solving finished - model found:

  Signal Name                 Dec        Hex             Bin
  -------------------- ---------- ---------- ---------------
  \d                            9          9            1001
  \s1                           0          0              00
  \s2                           0          0              00
\end{verbatim}
}

Note that the {\tt sat} command supports signal names in both arguments to the
{\tt -set} option.
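
A minimal sketch of the brute-force approach mentioned above, written in C for
illustration, shows why it only suits small circuits. The function
{\tt circuit()} is a hypothetical 4-input, 2-output combinational function
standing in for a small module (it is not {\tt selstage}), and {\tt invert()} is
an illustrative helper, not part of Yosys. Like {\tt eval -table}, it evaluates
every input vector, keeping those that produce the requested output, so its cost
grows as $2^n$ for $n$ input bits; this is the work that {\tt sat} hands to a
SAT solver instead.

```c
#include <assert.h>

/* Hypothetical 4-input, 2-output combinational circuit (illustration only,
   not selstage): inputs packed as in = {a,b,c,d}, output y = {a&b, c^d}. */
static unsigned circuit(unsigned in)
{
    unsigned a = (in >> 3) & 1u, b = (in >> 2) & 1u;
    unsigned c = (in >> 1) & 1u, d = in & 1u;
    return ((a & b) << 1) | (c ^ d);
}

/* Brute-force inversion, the opposite of evaluation: try all 2^n_inputs
   input vectors, store those whose output equals target in sols[] (at most
   max of them), and return the total number of matching vectors. */
static unsigned invert(unsigned n_inputs, unsigned target,
                       unsigned *sols, unsigned max)
{
    unsigned found = 0;
    assert(n_inputs < 8 * sizeof(unsigned)); /* 1u << n_inputs must not overflow */
    for (unsigned in = 0; in < (1u << n_inputs); in++) {
        if (circuit(in) == target) {
            if (found < max)
                sols[found] = in;
            found++;
        }
    }
    return found;
}
```

For example, {\tt invert(4, 3, sols, 4)} returns 2 and stores the input vectors
13 and 14 ({\tt 4'b1101} and {\tt 4'b1110}), the only ones with
{\tt a} = {\tt b} = 1 and {\tt c} $\neq$ {\tt d}.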
In the {\tt sat} example above, {\tt -set s1 s2} constrains {\tt s1} and
{\tt s2} to be equal. When more complex constraints are needed, a wrapper
circuit must be constructed that checks the constraints and signals whether
they were met using an extra output port, which can then be forced to a value
using the {\tt -set} option. (Such a circuit, containing the circuit under test
plus additional constraint-checking circuitry, is called a {\it miter\/}
circuit.)

Fig.~\ref{primetest} shows a miter circuit that is supposed to be used as a
prime number test. If {\tt ok} is 1 for all input values {\tt a} and {\tt b} for
a given {\tt p}, then {\tt p} is prime, or at least that is the idea.

The Yosys shell session shown in Fig.~\ref{primesat} demonstrates that SAT
solvers can even find unexpected solutions to a problem: using integer
overflow, there actually is a way of ``factorizing'' 31. The clean solution
would of course be to perform the test in 32 bits, for example by replacing
{\tt p != a*b} in the miter with {\tt p != \{16'd0,a\}*b}, or by using a
temporary variable for the 32-bit product {\tt a*b}. But as 31 fits well into
8 bits (and as the purpose of this document is to show off Yosys features), we
can also simply force the upper 8 bits of {\tt a} and {\tt b} to zero for the
{\tt sat} call, as is done in the second command in Fig.~\ref{primesat}
(line 31).

The {\tt -prove} option used in this example works similarly to {\tt -set}, but
tries to find a case in which the two arguments are not equal. If no such case
is found, the property is proven to hold for all inputs that satisfy the other
constraints.

It is worth noting that SAT solvers are not particularly efficient at
factorizing large numbers. But if a small factorization problem occurs as part
of a larger circuit problem, the Yosys SAT solver is perfectly capable of
solving it.

\subsection{Solving sequential SAT problems}

\begin{figure}[t!]
\begin{lstlisting}[basicstyle=\ttfamily\scriptsize]
yosys [memdemo]> sat -seq 6 -show y -show d -set-init-undef \
    -max_undef -set-at 4 y 1 -set-at 5 y 2 -set-at 6 y 3

6. Executing SAT pass (solving SAT problems in the circuit).
Full command line: sat -seq 6 -show y -show d -set-init-undef
    -max_undef -set-at 4 y 1 -set-at 5 y 2 -set-at 6 y 3

Setting up time step 1:
Final constraint equation: { } = { }
Imported 29 cells to SAT database.

Setting up time step 2:
Final constraint equation: { } = { }
Imported 29 cells to SAT database.

Setting up time step 3:
Final constraint equation: { } = { }
Imported 29 cells to SAT database.

Setting up time step 4:
Import set-constraint for timestep: \y = 4'0001
Final constraint equation: \y = 4'0001
Imported 29 cells to SAT database.

Setting up time step 5:
Import set-constraint for timestep: \y = 4'0010
Final constraint equation: \y = 4'0010
Imported 29 cells to SAT database.

Setting up time step 6:
Import set-constraint for timestep: \y = 4'0011
Final constraint equation: \y = 4'0011
Imported 29 cells to SAT database.

Setting up initial state:
Final constraint equation: { \y \s2 \s1 \mem[3] \mem[2] \mem[1] \mem[0] }
        = 24'xxxxxxxxxxxxxxxxxxxxxxxx

Import show expression: \y
Import show expression: \d

Solving problem with 10322 variables and 27881 clauses..
SAT model found. maximizing number of undefs.
SAT solving finished - model found:

  Time Signal Name                 Dec        Hex             Bin
  ---- -------------------- ---------- ---------- ---------------
  init \mem[0]                      --         --            xxxx
  init \mem[1]                      --         --            xxxx
  init \mem[2]                      --         --            xxxx
  init \mem[3]                      --         --            xxxx
  init \s1                          --         --              xx
  init \s2                          --         --              xx
  init \y                           --         --            xxxx
  ---- -------------------- ---------- ---------- ---------------
     1 \d                            0          0            0000
     1 \y                           --         --            xxxx
  ---- -------------------- ---------- ---------- ---------------
     2 \d                            1          1            0001
     2 \y                           --         --            xxxx
  ---- -------------------- ---------- ---------- ---------------
     3 \d                            2          2            0010
     3 \y                            0          0            0000
  ---- -------------------- ---------- ---------- ---------------
     4 \d                            3          3            0011
     4 \y                            1          1            0001
  ---- -------------------- ---------- ---------- ---------------
     5 \d                           --         --            001x
     5 \y                            2          2            0010
  ---- -------------------- ---------- ---------- ---------------
     6 \d                           --         --            xxxx
     6 \y                            3          3            0011
\end{lstlisting}
\caption{Solving a sequential SAT problem in the {\tt memdemo} module from Fig.~\ref{memdemo_src}.}
\label{memdemo_sat}
\end{figure}

The SAT solver functionality in Yosys can be used to solve not only
combinatorial problems but also sequential ones. Consider the entire
{\tt memdemo} module from Fig.~\ref{memdemo_src} and suppose we want to know
which sequence of input values for {\tt d} causes the output {\tt y} to produce
the sequence 1, 2, 3 from any initial state. Fig.~\ref{memdemo_sat} shows the
solution to this question, as produced by the following command:

\begin{verbatim}
sat -seq 6 -show y -show d -set-init-undef \
  -max_undef -set-at 4 y 1 -set-at 5 y 2 -set-at 6 y 3
\end{verbatim}

The {\tt -seq 6} option instructs the {\tt sat} command to solve a sequential
problem in 6 time steps. (Experiments with fewer steps have shown that at least
3 cycles are necessary to bring the circuit into a state from which the
sequence 1, 2, 3 can be produced.)

The {\tt -set-init-undef} option tells the {\tt sat} command to initialize all
registers to the undef ({\tt x}) state.
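
A small brute-force sketch in C can make the meaning of such a sequential query
concrete. It assumes a hypothetical circuit much simpler than {\tt memdemo}: two
2-bit registers forming a pipeline ({\tt r1 <= d}, {\tt r2 <= r1},
{\tt y = r2}). The sketch unrolls this circuit over six time steps, as
{\tt -seq 6} does, requires {\tt y} to be 1, 2 and 3 in steps 4, 5 and 6, as the
{\tt -set-at} options do, and accepts an input sequence only if it meets these
constraints from every one of the 16 possible initial register states. The
names {\tt step()}, {\tt works\_from\_any\_state()} and {\tt solve\_seq()} are
illustrative, not part of Yosys.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sequential circuit (illustration only, not memdemo):
   two 2-bit registers forming a pipeline, r1 <= d; r2 <= r1; y = r2. */
struct state { unsigned r1, r2; };

#define STEPS 6   /* number of unrolled time steps, like sat -seq 6 */

/* One clock cycle: return the registered output y of the current state,
   then update the registers from the 2-bit input d. */
static unsigned step(struct state *s, unsigned d)
{
    unsigned y = s->r2;
    assert(d < 4u);
    s->r2 = s->r1;
    s->r1 = d;
    return y;
}

/* True if d[0..STEPS-1] yields y = 1, 2, 3 at time steps 4, 5, 6
   (1-based, as with -set-at) for every one of the 16 initial states. */
static bool works_from_any_state(const unsigned d[STEPS])
{
    for (unsigned init = 0; init < 16u; init++) {
        struct state s = { init & 3u, init >> 2 };
        for (unsigned t = 0; t < STEPS; t++) {
            unsigned y = step(&s, d[t]);
            if (t >= 3 && y != t - 2)   /* 0-based t = 3, 4, 5 -> y = 1, 2, 3 */
                return false;
        }
    }
    return true;
}

/* Enumerate all 4^STEPS input sequences, count those that work, and copy
   the first one found (in enumeration order) into first[]. */
static unsigned solve_seq(unsigned first[STEPS])
{
    unsigned count = 0;
    for (unsigned code = 0; code < (1u << (2 * STEPS)); code++) {
        unsigned d[STEPS];
        for (unsigned t = 0; t < STEPS; t++)
            d[t] = (code >> (2 * t)) & 3u;  /* 2 bits of code per time step */
        if (works_from_any_state(d)) {
            if (count == 0)
                for (unsigned t = 0; t < STEPS; t++)
                    first[t] = d[t];
            count++;
        }
    }
    return count;
}
```

For this pipeline, {\tt solve\_seq()} finds 64 working sequences: all of them
apply {\tt d} = 1, 2, 3 in time steps 2, 3 and 4, while the values in steps 1,
5 and 6 do not matter. The explicit loop over initial states stands in for what
{\tt -set-init-undef} achieves in the {\tt sat} call; enumeration is only
feasible here because the hypothetical circuit has just 4 state bits and 12
input bits in total.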
The way the {\tt x} state is treated in Verilog ensures that the solution works
for any initial state. The {\tt -max\_undef} option instructs the {\tt sat}
command to find a solution with the maximum number of undefs. This makes it easy
to see which input bits are relevant to the solution. Finally, the three
{\tt -set-at} options add constraints for the {\tt y} signal to play the 1, 2, 3
sequence, starting with time step 4.

It is not surprising that the solution sets {\tt d = 0} in the first step, as
this is the only way of setting the {\tt s1} and {\tt s2} registers to a known
value. The input values for the other steps are a bit harder to work out
manually, but the SAT solver finds the correct solution in an instant.

\medskip

There is much more to write about the {\tt sat} command. For example, there is a
set of options that can be used to perform sequential proofs using temporal
induction~\cite{tip}. The command {\tt help sat} prints a list of all options
with short descriptions of their functions.

\section{Conclusion}
\label{conclusion}

Yosys provides a wide range of functions to analyze and investigate designs. In
many cases it is sufficient to simply display circuit diagrams, perhaps using
some additional commands to narrow the scope of the circuit diagrams to the
interesting parts of the circuit. But some cases require more than that. For
these applications Yosys provides commands that can be used to further inspect
the behavior of the circuit, either by evaluating which output values are
generated from certain input values ({\tt eval}) or by finding which input
values and initial conditions can result in a certain behavior at the outputs
({\tt sat}). The {\tt sat} command can even be used to prove (or disprove)
theorems regarding the circuit, in more advanced cases with the additional help
of a miter circuit.
These features can be powerful tools for the circuit designer using Yosys as a
utility for building circuits and the software developer using Yosys as a
framework for new algorithms alike.

\begin{thebibliography}{9}

\bibitem{yosys}
Clifford Wolf. The Yosys Open SYnthesis Suite.
\url{http://www.clifford.at/yosys/}

\bibitem{graphviz}
Graphviz - Graph Visualization Software.
\url{http://www.graphviz.org/}

\bibitem{xdot}
xdot.py - an interactive viewer for graphs written in Graphviz's dot language.
\url{https://github.com/jrfonseca/xdot.py}

\bibitem{CircuitSAT}
{\it Circuit satisfiability problem} on Wikipedia.
\url{http://en.wikipedia.org/wiki/Circuit_satisfiability}

\bibitem{MiniSAT}
MiniSat: a minimalistic open-source SAT solver.
\url{http://minisat.se/}

\bibitem{tip}
Niklas Een and Niklas S\"orensson (2003). Temporal Induction by Incremental SAT Solving.
\url{http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.8161}

\end{thebibliography}

\end{document}
/******************************************************************************
 * x86_emulate.c
 * 
 * Generic x86 (32-bit and 64-bit) instruction decoder and emulator.
 * 
 * Copyright (c) 2005-2007 Keir Fraser
 * Copyright (c) 2005-2007 XenSource Inc.
 * 
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 * 
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 * 
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */

#ifndef __XEN__
#include <stddef.h>
#include <stdint.h>
#include <public/xen.h>
#else
#include <xen/config.h>
#include <xen/types.h>
#include <xen/lib.h>
#include <asm/regs.h>
#undef cmpxchg
#endif
#include <asm-x86/x86_emulate.h>

/* Operand sizes: 8-bit operands or specified/overridden size. */
#define ByteOp      (1<<0) /* 8-bit operands. */
/* Destination operand type. */
#define DstBitBase  (0<<1) /* Memory operand, bit string. */
#define ImplicitOps (1<<1) /* Implicit in opcode. No generic decode. */
#define DstReg      (2<<1) /* Register operand. */
#define DstMem      (3<<1) /* Memory operand. */
#define DstMask     (3<<1)
/* Source operand type. */
#define SrcNone     (0<<3) /* No source operand. */
#define SrcImplicit (0<<3) /* Source operand is implicit in the opcode. */
#define SrcReg      (1<<3) /* Register operand. */
#define SrcMem      (2<<3) /* Memory operand. */
#define SrcMem16    (3<<3) /* Memory operand (16-bit). */
#define SrcImm      (4<<3) /* Immediate operand. */
#define SrcImmByte  (5<<3) /* 8-bit sign-extended immediate operand. */
#define SrcMask     (7<<3)
/* Generic ModRM decode. */
#define ModRM       (1<<6)
/* Destination is only written; never read. */
#define Mov         (1<<7)

static uint8_t opcode_table[256] = {
    /* 0x00 - 0x07 */
    ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
    ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
    ByteOp|DstReg|SrcImm, DstReg|SrcImm, 0, 0,
    /* 0x08 - 0x0F */
    ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
    ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
    ByteOp|DstReg|SrcImm, DstReg|SrcImm, 0, 0,
    /* 0x10 - 0x17 */
    ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
    ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
    ByteOp|DstReg|SrcImm, DstReg|SrcImm, 0, 0,
    /* 0x18 - 0x1F */
    ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
    ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
    ByteOp|DstReg|SrcImm, DstReg|SrcImm, 0, 0,
    /* 0x20 - 0x27 */
    ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
    ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
    ByteOp|DstReg|SrcImm, DstReg|SrcImm, 0, ImplicitOps,
    /* 0x28 - 0x2F */
    ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
    ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
    ByteOp|DstReg|SrcImm, DstReg|SrcImm, 0, ImplicitOps,
    /* 0x30 - 0x37 */
    ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
    ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
    ByteOp|DstReg|SrcImm, DstReg|SrcImm, 0, ImplicitOps,
    /* 0x38 - 0x3F */
    ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
    ByteOp|DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
    ByteOp|DstReg|SrcImm, DstReg|SrcImm, 0, ImplicitOps,
    /* 0x40 - 0x4F */
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    /* 0x50 - 0x5F */
    ImplicitOps|Mov, ImplicitOps|Mov, ImplicitOps|Mov, ImplicitOps|Mov,
    ImplicitOps|Mov, ImplicitOps|Mov, ImplicitOps|Mov, ImplicitOps|Mov,
    ImplicitOps|Mov, ImplicitOps|Mov, ImplicitOps|Mov, ImplicitOps|Mov,
    ImplicitOps|Mov, ImplicitOps|Mov, ImplicitOps|Mov, ImplicitOps|Mov,
    /* 0x60 - 0x67 */
    ImplicitOps, ImplicitOps, DstReg|SrcMem|ModRM, DstReg|SrcMem16|ModRM|Mov,
    0, 0, 0, 0,
    /* 0x68 - 0x6F */
    ImplicitOps|Mov, DstMem|SrcImm|ModRM|Mov,
    ImplicitOps|Mov, DstMem|SrcImmByte|ModRM|Mov,
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    /* 0x70 - 0x77 */
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    /* 0x78 - 0x7F */
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    /* 0x80 - 0x87 */
    ByteOp|DstMem|SrcImm|ModRM, DstMem|SrcImm|ModRM,
    ByteOp|DstMem|SrcImm|ModRM, DstMem|SrcImmByte|ModRM,
    ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
    ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
    /* 0x88 - 0x8F */
    ByteOp|DstMem|SrcReg|ModRM|Mov, DstMem|SrcReg|ModRM|Mov,
    ByteOp|DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
    0, DstReg|SrcNone|ModRM, 0, DstMem|SrcNone|ModRM|Mov,
    /* 0x90 - 0x97 */
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    /* 0x98 - 0x9F */
    ImplicitOps, ImplicitOps, 0, 0, 0, 0, ImplicitOps, ImplicitOps,
    /* 0xA0 - 0xA7 */
    ByteOp|ImplicitOps|Mov, ImplicitOps|Mov,
    ByteOp|ImplicitOps|Mov, ImplicitOps|Mov,
    ByteOp|ImplicitOps|Mov, ImplicitOps|Mov, 0, 0,
    /* 0xA8 - 0xAF */
    ByteOp|DstReg|SrcImm, DstReg|SrcImm,
    ByteOp|ImplicitOps|Mov, ImplicitOps|Mov,
    ByteOp|ImplicitOps|Mov, ImplicitOps|Mov, 0, 0,
    /* 0xB0 - 0xB7 */
    ByteOp|DstReg|SrcImm|Mov, ByteOp|DstReg|SrcImm|Mov,
    ByteOp|DstReg|SrcImm|Mov, ByteOp|DstReg|SrcImm|Mov,
    ByteOp|DstReg|SrcImm|Mov, ByteOp|DstReg|SrcImm|Mov,
    ByteOp|DstReg|SrcImm|Mov, ByteOp|DstReg|SrcImm|Mov,
    /* 0xB8 - 0xBF */
    DstReg|SrcImm|Mov, DstReg|SrcImm|Mov, DstReg|SrcImm|Mov, DstReg|SrcImm|Mov,
    DstReg|SrcImm|Mov, DstReg|SrcImm|Mov, DstReg|SrcImm|Mov, DstReg|SrcImm|Mov,
    /* 0xC0 - 0xC7 */
    ByteOp|DstMem|SrcImm|ModRM, DstMem|SrcImmByte|ModRM,
    ImplicitOps, ImplicitOps,
    0, 0, ByteOp|DstMem|SrcImm|ModRM|Mov, DstMem|SrcImm|ModRM|Mov,
    /* 0xC8 - 0xCF */
    0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xD0 - 0xD7 */
    ByteOp|DstMem|SrcImplicit|ModRM, DstMem|SrcImplicit|ModRM, 
    ByteOp|DstMem|SrcImplicit|ModRM, DstMem|SrcImplicit|ModRM, 
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    /* 0xD8 - 0xDF */
    0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xE0 - 0xE7 */
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    /* 0xE8 - 0xEF */
    ImplicitOps, ImplicitOps, 0, ImplicitOps,
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    /* 0xF0 - 0xF7 */
    0, 0, 0, 0,
    0, ImplicitOps, ByteOp|DstMem|SrcNone|ModRM, DstMem|SrcNone|ModRM,
    /* 0xF8 - 0xFF */
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    ImplicitOps, ImplicitOps, ByteOp|DstMem|SrcNone|ModRM, DstMem|SrcNone|ModRM
};

static uint8_t twobyte_table[256] = {
    /* 0x00 - 0x07 */
    0, 0, 0, 0, 0, ImplicitOps, 0, 0,
    /* 0x08 - 0x0F */
    ImplicitOps, ImplicitOps, 0, 0, 0, ImplicitOps|ModRM, 0, 0,
    /* 0x10 - 0x17 */
    0, 0, 0, 0, 0, 0, 0, 0,
    /* 0x18 - 0x1F */
    ImplicitOps|ModRM, ImplicitOps|ModRM, ImplicitOps|ModRM, ImplicitOps|ModRM,
    ImplicitOps|ModRM, ImplicitOps|ModRM, ImplicitOps|ModRM, ImplicitOps|ModRM,
    /* 0x20 - 0x27 */
    ImplicitOps|ModRM, ImplicitOps|ModRM, ImplicitOps|ModRM, ImplicitOps|ModRM,
    0, 0, 0, 0,
    /* 0x28 - 0x2F */
    0, 0, 0, 0, 0, 0, 0, 0,
    /* 0x30 - 0x37 */
    ImplicitOps, 0, ImplicitOps, 0, 0, 0, 0, 0,
    /* 0x38 - 0x3F */
    0, 0, 0, 0, 0, 0, 0, 0,
    /* 0x40 - 0x47 */
    DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
    DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
    DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
    DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
    /* 0x48 - 0x4F */
    DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
    DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
    DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
    DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem|ModRM|Mov,
    /* 0x50 - 0x5F */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    /* 0x60 - 0x6F */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    /* 0x70 - 0x7F */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    /* 0x80 - 0x87 */
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    /* 0x88 - 0x8F */
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    /* 0x90 - 0x97 */
    ByteOp|DstMem|SrcNone|ModRM|Mov, ByteOp|DstMem|SrcNone|ModRM|Mov,
    ByteOp|DstMem|SrcNone|ModRM|Mov, ByteOp|DstMem|SrcNone|ModRM|Mov,
    ByteOp|DstMem|SrcNone|ModRM|Mov, ByteOp|DstMem|SrcNone|ModRM|Mov,
    ByteOp|DstMem|SrcNone|ModRM|Mov, ByteOp|DstMem|SrcNone|ModRM|Mov,
    /* 0x98 - 0x9F */
    ByteOp|DstMem|SrcNone|ModRM|Mov, ByteOp|DstMem|SrcNone|ModRM|Mov,
    ByteOp|DstMem|SrcNone|ModRM|Mov, ByteOp|DstMem|SrcNone|ModRM|Mov,
    ByteOp|DstMem|SrcNone|ModRM|Mov, ByteOp|DstMem|SrcNone|ModRM|Mov,
    ByteOp|DstMem|SrcNone|ModRM|Mov, ByteOp|DstMem|SrcNone|ModRM|Mov,
    /* 0xA0 - 0xA7 */
    0, 0, 0, DstBitBase|SrcReg|ModRM, 0, 0, 0, 0, 
    /* 0xA8 - 0xAF */
    0, 0, 0, DstBitBase|SrcReg|ModRM, 0, 0, 0, DstReg|SrcMem|ModRM,
    /* 0xB0 - 0xB7 */
    ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM,
    0, DstBitBase|SrcReg|ModRM,
    0, 0, ByteOp|DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem16|ModRM|Mov,
    /* 0xB8 - 0xBF */
    0, 0, DstBitBase|SrcImmByte|ModRM, DstBitBase|SrcReg|ModRM,
    DstReg|SrcMem|ModRM, DstReg|SrcMem|ModRM,
    ByteOp|DstReg|SrcMem|ModRM|Mov, DstReg|SrcMem16|ModRM|Mov,
    /* 0xC0 - 0xC7 */
    ByteOp|DstMem|SrcReg|ModRM, DstMem|SrcReg|ModRM, 0, 0,
    0, 0, 0, ImplicitOps|ModRM,
    /* 0xC8 - 0xCF */
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    ImplicitOps, ImplicitOps, ImplicitOps, ImplicitOps,
    /* 0xD0 - 0xDF */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xE0 - 0xEF */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    /* 0xF0 - 0xFF */
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};

/* Type, address-of, and value of an instruction's operand. */
struct operand {
    enum { OP_REG, OP_MEM, OP_IMM, OP_NONE } type;
    unsigned int  bytes;
    unsigned long val, orig_val;
    union {
        /* OP_REG: Pointer to register field. */
        unsigned long *reg;
        /* OP_MEM: Segment and offset. */
        struct {
            enum x86_segment seg;
            unsigned long    off;
        } mem;
    };
};

/* EFLAGS bit definitions. */
#define EFLG_OF (1<<11)
#define EFLG_DF (1<<10)
#define EFLG_IF (1<<9)
#define EFLG_SF (1<<7)
#define EFLG_ZF (1<<6)
#define EFLG_AF (1<<4)
#define EFLG_PF (1<<2)
#define EFLG_CF (1<<0)

/* Exception definitions. */
#define EXC_DE  0
#define EXC_BR  5
#define EXC_UD  6
#define EXC_GP 13

/*
 * Instruction emulation:
 * Most instructions are emulated directly via a fragment of inline assembly
 * code. This allows us to save/restore EFLAGS and thus very easily pick up
 * any modified flags.
 */

#if defined(__x86_64__)
#define _LO32 "k"          /* force 32-bit operand */
#define _STK  "%%rsp"      /* stack pointer */
#elif defined(__i386__)
#define _LO32 ""           /* force 32-bit operand */
#define _STK  "%%esp"      /* stack pointer */
#endif

/*
 * These EFLAGS bits are restored from saved value during emulation, and
 * any changes are written back to the saved value after emulation.
 */
#define EFLAGS_MASK (EFLG_OF|EFLG_SF|EFLG_ZF|EFLG_AF|EFLG_PF|EFLG_CF)

/* Before executing instruction: restore necessary bits in EFLAGS. */
#define _PRE_EFLAGS(_sav, _msk, _tmp)           \
/* EFLAGS = (_sav & _msk) | (EFLAGS & ~_msk); */\
"push %"_sav"; "                                \
"movl %"_msk",%"_LO32 _tmp"; "                  \
"andl %"_LO32 _tmp",("_STK"); "                 \
"pushf; "                                       \
"notl %"_LO32 _tmp"; "                          \
"andl %"_LO32 _tmp",("_STK"); "                 \
"pop  %"_tmp"; "                                \
"orl  %"_LO32 _tmp",("_STK"); "                 \
"popf; "                                        \
/* _sav &= ~msk; */                             \
"movl %"_msk",%"_LO32 _tmp"; "                  \
"notl %"_LO32 _tmp"; "                          \
"andl %"_LO32 _tmp",%"_sav"; "

/* After executing instruction: write-back necessary bits in EFLAGS. */
#define _POST_EFLAGS(_sav, _msk, _tmp)          \
/* _sav |= EFLAGS & _msk; */                    \
"pushf; "                                       \
"pop  %"_tmp"; "                                \
"andl %"_msk",%"_LO32 _tmp"; "                  \
"orl  %"_LO32 _tmp",%"_sav"; "

/* Raw emulation: instruction has two explicit operands. */
#define __emulate_2op_nobyte(_op,_src,_dst,_eflags,_wx,_wy,_lx,_ly,_qx,_qy)\
do{ unsigned long _tmp;                                                    \
    switch ( (_dst).bytes )                                                \
    {                                                                      \
    case 2:                                                                \
        asm volatile (                                                     \
            _PRE_EFLAGS("0","4","2")                                       \
            _op"w %"_wx"3,%1; "                                            \
            _POST_EFLAGS("0","4","2")                                      \
            : "=m" (_eflags), "=m" ((_dst).val), "=&r" (_tmp)              \
            : _wy ((_src).val), "i" (EFLAGS_MASK),                         \
              "m" (_eflags), "m" ((_dst).val) );                           \
        break;                                                             \
    case 4:                                                                \
        asm volatile (                                                     \
            _PRE_EFLAGS("0","4","2")                                       \
            _op"l %"_lx"3,%1; "                                            \
            _POST_EFLAGS("0","4","2")                                      \
            : "=m" (_eflags), "=m" ((_dst).val), "=&r" (_tmp)              \
            : _ly ((_src).val), "i" (EFLAGS_MASK),                         \
              "m" (_eflags), "m" ((_dst).val) );                           \
        break;                                                             \
    case 8:                                                                \
        __emulate_2op_8byte(_op, _src, _dst, _eflags, _qx, _qy);           \
        break;                                                             \
    }                                                                      \
} while (0)
#define __emulate_2op(_op,_src,_dst,_eflags,_bx,_by,_wx,_wy,_lx,_ly,_qx,_qy)\
do{ unsigned long _tmp;                                                    \
    switch ( (_dst).bytes )                                                \
    {                                                                      \
    case 1:                                                                \
        asm volatile (                                                     \
            _PRE_EFLAGS("0","4","2")                                       \
            _op"b %"_bx"3,%1; "                                            \
            _POST_EFLAGS("0","4","2")                                      \
            : "=m" (_eflags), "=m" ((_dst).val), "=&r" (_tmp)              \
            : _by ((_src).val), "i" (EFLAGS_MASK),                         \
              "m" (_eflags), "m" ((_dst).val) );                           \
        break;                                                             \
    default:                                                               \
        __emulate_2op_nobyte(_op,_src,_dst,_eflags,_wx,_wy,_lx,_ly,_qx,_qy);\
        break;                                                             \
    }                                                                      \
} while (0)
/* Source operand is byte-sized and may be restricted to just %cl. */
#define emulate_2op_SrcB(_op, _src, _dst, _eflags)                         \
    __emulate_2op(_op, _src, _dst, _eflags,                                \
                  "b", "c", "b", "c", "b", "c", "b", "c")
/* Source operand is byte, word, long or quad sized. */
#define emulate_2op_SrcV(_op, _src, _dst, _eflags)                         \
    __emulate_2op(_op, _src, _dst, _eflags,                                \
                  "b", "q", "w", "r", _LO32, "r", "", "r")
/* Source operand is word, long or quad sized. */
#define emulate_2op_SrcV_nobyte(_op, _src, _dst, _eflags)                  \
    __emulate_2op_nobyte(_op, _src, _dst, _eflags,                         \
                  "w", "r", _LO32, "r", "", "r")

/* Instruction has only one explicit operand (no source operand). */
#define emulate_1op(_op,_dst,_eflags)                                      \
do{ unsigned long _tmp;                                                    \
    switch ( (_dst).bytes )                                                \
    {                                                                      \
    case 1:                                                                \
        asm volatile (                                                     \
            _PRE_EFLAGS("0","3","2")                                       \
            _op"b %1; "                                                    \
            _POST_EFLAGS("0","3","2")                                      \
            : "=m" (_eflags), "=m" ((_dst).val), "=&r" (_tmp)              \
            : "i" (EFLAGS_MASK), "m" (_eflags), "m" ((_dst).val) );        \
        break;                                                             \
    case 2:                                                                \
        asm volatile (                                                     \
            _PRE_EFLAGS("0","3","2")                                       \
            _op"w %1; "                                                    \
            _POST_EFLAGS("0","3","2")                                      \
            : "=m" (_eflags), "=m" ((_dst).val), "=&r" (_tmp)              \
            : "i" (EFLAGS_MASK), "m" (_eflags), "m" ((_dst).val) );        \
        break;                                                             \
    case 4:                                                                \
        asm volatile (                                                     \
            _PRE_EFLAGS("0","3","2")                                       \
            _op"l %1; "                                                    \
            _POST_EFLAGS("0","3","2")                                      \
            : "=m" (_eflags), "=m" ((_dst).val), "=&r" (_tmp)              \
            : "i" (EFLAGS_MASK), "m" (_eflags), "m" ((_dst).val) );        \
        break;                                                             \
    case 8:                                                                \
        __emulate_1op_8byte(_op, _dst, _eflags);                           \
        break;                                                             \
    }                                                                      \
} while (0)

/* Emulate an instruction with quadword operands (x86/64 only). */
#if defined(__x86_64__)
#define __emulate_2op_8byte(_op, _src, _dst, _eflags, _qx, _qy)         \
do{ asm volatile (                                                      \
        _PRE_EFLAGS("0","4","2")                                        \
        _op"q %"_qx"3,%1; "                                             \
        _POST_EFLAGS("0","4","2")                                       \
        : "=m" (_eflags), "=m" ((_dst).val), "=&r" (_tmp)               \
        : _qy ((_src).val), "i" (EFLAGS_MASK),                          \
          "m" (_eflags), "m" ((_dst).val) );                            \
} while (0)
#define __emulate_1op_8byte(_op, _dst, _eflags)                         \
do{ asm volatile (                                                      \
        _PRE_EFLAGS("0","3","2")                                        \
        _op"q %1; "                                                     \
        _POST_EFLAGS("0","3","2")                                       \
        : "=m" (_eflags), "=m" ((_dst).val), "=&r" (_tmp)               \
        : "i" (EFLAGS_MASK), "m" (_eflags), "m" ((_dst).val) );         \
} while (0)
#elif defined(__i386__)
#define __emulate_2op_8byte(_op, _src, _dst, _eflags, _qx, _qy)
#define __emulate_1op_8byte(_op, _dst, _eflags)
#endif /* __i386__ */

/* Fetch next part of the instruction being emulated. */
#define insn_fetch_bytes(_size)                                         \
({ unsigned long _x, _eip = _regs.eip;                                  \
   if ( !mode_64bit() ) _eip = (uint32_t)_eip; /* ignore upper dword */ \
   _regs.eip += (_size); /* real hardware doesn't truncate */           \
   generate_exception_if((uint8_t)(_regs.eip - ctxt->regs->eip) > 15,   \
                         EXC_GP);                                       \
   rc = ops->insn_fetch(x86_seg_cs, _eip, &_x, (_size), ctxt);          \
   if ( rc ) goto done;                                                 \
   _x;                                                                  \
})
#define insn_fetch_type(_type) ((_type)insn_fetch_bytes(sizeof(_type)))

#define _truncate_ea(ea, byte_width)            \
({  unsigned long __ea = (ea);                  \
    unsigned int _width = (byte_width);         \
    ((_width == sizeof(unsigned long)) ? __ea : \
     (__ea & ((1UL << (_width << 3)) - 1)));    \
})
#define truncate_ea(ea) _truncate_ea((ea), ad_bytes)
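
/*
 * Worked example (illustrative values, assuming a 64-bit build where
 * sizeof(unsigned long) == 8):
 *   _truncate_ea(0x12345678, 2) == 0x5678      mask is (1UL << 16) - 1
 *   _truncate_ea(0x12345678, 4) == 0x12345678  mask is (1UL << 32) - 1
 *   _truncate_ea(x, 8)          == x           full width, no masking
 * The full-width case is tested separately because 1UL << 64 would be an
 * undefined shift in C.
 */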

#define mode_64bit() (def_ad_bytes == 8)

#define fail_if(p)                                      \
do {                                                    \
    rc = (p) ? X86EMUL_UNHANDLEABLE : X86EMUL_OKAY;     \
    if ( rc ) goto done;                                \
} while (0)

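/*
 * For now an exception condition simply fails emulation with
 * X86EMUL_UNHANDLEABLE and the vector @e is ignored. In future we will be
 * able to generate arbitrary exceptions.
 */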
#define generate_exception_if(p, e) fail_if(p)

/* Not yet implemented: conservatively assume CPL != 0 and insufficient IOPL. */
#define mode_ring0() (0)
#define mode_iopl()  (0)

/* Given byte has even parity (even number of 1s)? */
static int even_parity(uint8_t v)
{
    asm ( "test %%al,%%al; setp %%al"
          : "=a" (v) : "0" (v) );
    return v;
}
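
/*
 * Illustrative values: even_parity(0x00) == 1, even_parity(0x03) == 1
 * (two bits set), even_parity(0x07) == 0 (three bits set). TEST sets PF
 * from the low byte of its result, and SETP copies PF into %al.
 */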

/* Update address held in a register, based on addressing mode. */
#define _register_address_increment(reg, inc, byte_width)               \
do {                                                                    \
    int _inc = (inc); /* signed type ensures sign extension to long */  \
    unsigned int _width = (byte_width);                                 \
    if ( _width == sizeof(unsigned long) )                              \
        (reg) += _inc;                                                  \
    else if ( mode_64bit() )                                            \
        (reg) = ((reg) + _inc) & ((1UL << (_width << 3)) - 1);          \
    else                                                                \
        (reg) = ((reg) & ~((1UL << (_width << 3)) - 1)) |               \
                (((reg) + _inc) & ((1UL << (_width << 3)) - 1));        \
} while (0)
#define register_address_increment(reg, inc) \
    _register_address_increment((reg), (inc), ad_bytes)
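
/*
 * Worked example (illustrative values; the second case assumes a 64-bit
 * build):
 *   not 64-bit mode, byte_width 2, reg 0x1234ffff, inc +1
 *     -> reg becomes 0x12340000: only the low 16 bits wrap, the rest of
 *        the register is preserved.
 *   64-bit mode, byte_width 4, reg 0x12345678ffffffff, inc +1
 *     -> reg becomes 0: the result is truncated to 32 bits, matching the
 *        zero-extension of 32-bit register writes in 64-bit mode.
 */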

#define sp_pre_dec(dec) ({                                              \
    _register_address_increment(_regs.esp, -(dec), ctxt->sp_size/8);    \
    _truncate_ea(_regs.esp, ctxt->sp_size/8);                           \
})
#define sp_post_inc(inc) ({                                             \
    unsigned long __esp = _truncate_ea(_regs.esp, ctxt->sp_size/8);     \
    _register_address_increment(_regs.esp, (inc), ctxt->sp_size/8);     \
    __esp;                                                              \
})
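
/*
 * Worked example (illustrative values) for a 16-bit stack
 * (ctxt->sp_size == 16) outside 64-bit mode, starting from
 * _regs.esp == 0x12340000:
 *   sp_pre_dec(2)  evaluates to 0xfffe; _regs.esp becomes 0x1234fffe.
 *   sp_post_inc(2) then evaluates to 0xfffe; _regs.esp returns to 0x12340000.
 */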

#define jmp_rel(rel)                                                    \
do {                                                                    \
    _regs.eip += (int)(rel);                                            \
    if ( !mode_64bit() )                                                \
        _regs.eip = ((op_bytes == 2)                                    \
                     ? (uint16_t)_regs.eip : (uint32_t)_regs.eip);      \
} while (0)
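
/*
 * Worked example (illustrative values): outside 64-bit mode with
 * op_bytes == 2, _regs.eip == 0xfff0 and rel == 0x20, the sum 0x10010 is
 * truncated to 16 bits, so jmp_rel() leaves _regs.eip == 0x0010.
 */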

static int __handle_rep_prefix(
    struct cpu_user_regs *int_regs,
    struct cpu_user_regs *ext_regs,
    int ad_bytes)
{
    unsigned long ecx = ((ad_bytes == 2) ? (uint16_t)int_regs->ecx :
                         (ad_bytes == 4) ? (uint32_t)int_regs->ecx :
                         int_regs->ecx);

    if ( ecx-- == 0 )
    {
        ext_regs->eip = int_regs->eip;
        return 1;
    }

    if ( ad_bytes == 2 )
        *(uint16_t *)&int_regs->ecx = ecx;
    else if ( ad_bytes == 4 )
        int_regs->ecx = (uint32_t)ecx;
    else
        int_regs->ecx = ecx;
    int_regs->eip = ext_regs->eip;
    return 0;
}
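
/*
 * Worked example (illustrative values) with ad_bytes == 2:
 *   int_regs->ecx == 0x12340003: the 16-bit count 3 is decremented and
 *     written back to the low word only (ecx becomes 0x12340002),
 *     int_regs->eip is reset to ext_regs->eip (the address emulation
 *     started from) so the instruction runs again for the next iteration,
 *     and 0 is returned.
 *   int_regs->ecx == 0x12340000: the 16-bit count is already 0, so ecx is
 *     left alone, int_regs->eip is copied to ext_regs->eip, and 1 is
 *     returned, making handle_rep_prefix() jump to done.
 */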

#define handle_rep_prefix()                                                \
do {                                                                       \
    if ( rep_prefix && __handle_rep_prefix(&_regs, ctxt->regs, ad_bytes) ) \
        goto done;                                                         \
} while (0)

/*
 * Unsigned multiplication with double-word result.
 * IN:  Multiplicand=m[0], Multiplier=m[1]
 * OUT: Return CF/OF (overflow status); Result=m[1]:m[0]
 */
static int mul_dbl(unsigned long m[2])
{
    int rc;
    asm ( "mul %4; seto %b2"
          : "=a" (m[0]), "=d" (m[1]), "=q" (rc)
          : "0" (m[0]), "1" (m[1]), "2" (0) );
    return rc;
}

/*
 * Signed multiplication with double-word result.
 * IN:  Multiplicand=m[0], Multiplier=m[1]
 * OUT: Return CF/OF (overflow status); Result=m[1]:m[0]
 */
static int imul_dbl(unsigned long m[2])
{
    int rc;
    asm ( "imul %4; seto %b2"
          : "=a" (m[0]), "=d" (m[1]), "=q" (rc)
          : "0" (m[0]), "1" (m[1]), "2" (0) );
    return rc;
}
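
/*
 * Worked examples (illustrative values, 64-bit build):
 *   mul_dbl:  m = { 1UL << 63, 4 }  ->  m = { 0, 2 }, returns 1
 *             (the product 2^65 does not fit in the low word).
 *   imul_dbl: m = { (unsigned long)-3, 5 }  ->  m[0] == (unsigned long)-15,
 *             m[1] == ~0UL (sign extension of the low word), returns 0.
 */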

/*
 * Unsigned division of double-word dividend.
 * IN:  Dividend=u[1]:u[0], Divisor=v
 * OUT: Return 1: #DE
 *      Return 0: Quotient=u[0], Remainder=u[1]
 */
static int div_dbl(unsigned long u[2], unsigned long v)
{
    /* The quotient needs more than one word whenever u[1] >= v. */
    if ( (v == 0) || (u[1] >= v) )
        return 1;
    asm ( "div %4"
          : "=a" (u[0]), "=d" (u[1])
          : "0" (u[0]), "1" (u[1]), "r" (v) );
    return 0;
}

/*
 * Signed division of double-word dividend.
 * IN:  Dividend=u[1]:u[0], Divisor=v
 * OUT: Return 1: #DE
 *      Return 0: Quotient=u[0], Remainder=u[1]
 * NB. We don't use idiv directly as it's moderately hard to work out
 *     ahead of time whether it will #DE, which we cannot allow to happen.
 */
static int idiv_dbl(unsigned long u[2], unsigned long v)
{
    int negu = (long)u[1] < 0, negv = (long)v < 0;

    /* u = abs(u) */
    if ( negu )
    {
        u[1] = ~u[1];
        if ( (u[0] = -u[0]) == 0 )
            u[1]++;
    }

    /* abs(u) / abs(v) */
    if ( div_dbl(u, negv ? -v : v) )
        return 1;

    /* Remainder has same sign as dividend. It cannot overflow. */
    if ( negu )
        u[1] = -u[1];

    /* Quotient is overflowed if sign bit is set. */
    if ( negu ^ negv )
    {
        if ( (long)u[0] >= 0 )
            u[0] = -u[0];
        else if ( (u[0] << 1) != 0 ) /* == 0x80...0 is okay */
            return 1;
    }
    else if ( (long)u[0] < 0 )
        return 1;

    return 0;
}
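
/*
 * Worked examples (illustrative values, 64-bit build):
 *   u = { (unsigned long)-7, ~0UL } (dividend -7), v = 2
 *     -> quotient u[0] == (unsigned long)-3, remainder u[1] ==
 *        (unsigned long)-1, returns 0 (rounding towards zero, as IDIV does).
 *   u = { 1UL << 63, ~0UL } (dividend -2^63), v = (unsigned long)-1
 *     -> the quotient 2^63 does not fit in a signed long, returns 1 (#DE).
 */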

static int
test_cc(
    unsigned int condition, unsigned int flags)
{
    int rc = 0;

    switch ( (condition & 15) >> 1 )
    {
    case 0: /* o */
        rc |= (flags & EFLG_OF);
        break;
    case 1: /* b/c/nae */
        rc |= (flags & EFLG_CF);
        break;
    case 2: /* z/e */
        rc |= (flags & EFLG_ZF);
        break;
    case 3: /* be/na */
        rc |= (flags & (EFLG_CF|EFLG_ZF));
        break;
    case 4: /* s */
        rc |= (flags & EFLG_SF);
        break;
    case 5: /* p/pe */
        rc |= (flags & EFLG_PF);
        break;
    case 7: /* le/ng */
        rc |= (flags & EFLG_ZF);
        /* fall through */
    case 6: /* l/nge */
        rc |= (!(flags & EFLG_SF) != !(flags & EFLG_OF));
        break;
    }

    /* Odd condition identifiers (lsb == 1) have inverted sense. */
    return (!!rc ^ (condition & 1));
}
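
/*
 * Worked examples using the low nibble of a Jcc opcode (illustrative):
 *   0x74 (jz):  (4 >> 1) == 2 selects ZF; lsb clear, so not inverted.
 *   0x75 (jnz): same test as 0x74; lsb set, so taken when ZF is clear.
 *   0x7f (jg):  (15 >> 1) == 7 tests ZF or SF != OF, then inverts it.
 */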

void *
decode_register(
    uint8_t modrm_reg, struct cpu_user_regs *regs, int highbyte_regs)
{
    void *p;

    switch ( modrm_reg )
    {
    case  0: p = &regs->eax; break;
    case  1: p = &regs->ecx; break;
    case  2: p = &regs->edx; break;
    case  3: p = &regs->ebx; break;
    case  4: p = (highbyte_regs ?
                  ((unsigned char *)&regs->eax + 1) :
                  (unsigned char *)&regs->esp); break;
    case  5: p = (highbyte_regs ?
                  ((unsigned char *)&regs->ecx + 1) :
                  (unsigned char *)&regs->ebp); break;
    case  6: p = (highbyte_regs ?
                  ((unsigned char *)&regs->edx + 1) :
                  (unsigned char *)&regs->esi); break;
    case  7: p = (highbyte_regs ?
                  ((unsigned char *)&regs->ebx + 1) :
                  (unsigned char *)&regs->edi); break;
#if defined(__x86_64__)
    case  8: p = &regs->r8;  break;
    case  9: p = &regs->r9;  break;
    case 10: p = &regs->r10; break;
    case 11: p = &regs->r11; break;
    case 12: p = &regs->r12; break;
    case 13: p = &regs->r13; break;
    case 14: p = &regs->r14; break;
    case 15: p = &regs->r15; break;
#endif
    default: p = NULL; break;
    }

    return p;
}
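
/*
 * Example (illustrative): modrm_reg == 4 selects %ah (byte 1 of regs->eax)
 * when highbyte_regs is set, as for byte operands without a REX prefix,
 * and regs->esp otherwise (%spl with a REX prefix, or %sp/%esp/%rsp).
 */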

int
x86_emulate(
    struct x86_emulate_ctxt *ctxt,
    struct x86_emulate_ops  *ops)
{
    /* Shadow copy of register state. Committed on successful emulation. */
    struct cpu_user_regs _regs = *ctxt->regs;

    uint8_t b, d, sib, sib_index, sib_base, twobyte = 0, rex_prefix = 0;
    uint8_t modrm, modrm_mod = 0, modrm_reg = 0, modrm_rm = 0;
    unsigned int op_bytes, def_op_bytes, ad_bytes, def_ad_bytes;
    unsigned int lock_prefix = 0, rep_prefix = 0;
    int override_seg = -1, rc = X86EMUL_OKAY;
    struct operand src, dst;

    /* Data operand effective address (usually computed from ModRM). */
    struct operand ea;

    /* Default is a memory operand relative to segment DS. */
    ea.type    = OP_MEM;
    ea.mem.seg = x86_seg_ds;
    ea.mem.off = 0;

    op_bytes = def_op_bytes = ad_bytes = def_ad_bytes = ctxt->addr_size/8;
    if ( op_bytes == 8 )
    {
        op_bytes = def_op_bytes = 4;
#ifndef __x86_64__
        return X86EMUL_UNHANDLEABLE;
#endif
    }

    /* Prefix bytes. */
    for ( ; ; )
    {
        switch ( b = insn_fetch_type(uint8_t) )
        {
        case 0x66: /* operand-size override */
            op_bytes = def_op_bytes ^ 6;
            break;
        case 0x67: /* address-size override */
            ad_bytes = def_ad_bytes ^ (mode_64bit() ? 12 : 6);
            break;
        case 0x2e: /* CS override */
            override_seg = x86_seg_cs;
            break;
        case 0x3e: /* DS override */
            override_seg = x86_seg_ds;
            break;
        case 0x26: /* ES override */
            override_seg = x86_seg_es;
            break;
        case 0x64: /* FS override */
            override_seg = x86_seg_fs;
            break;
        case 0x65: /* GS override */
            override_seg = x86_seg_gs;
            break;
        case 0x36: /* SS override */
            override_seg = x86_seg_ss;
            break;
        case 0xf0: /* LOCK */
            lock_prefix = 1;
            break;
        case 0xf2: /* REPNE/REPNZ */
        case 0xf3: /* REP/REPE/REPZ */
            rep_prefix = 1;
            break;
        case 0x40 ... 0x4f: /* REX */
            if ( !mode_64bit() )
                goto done_prefixes;
            rex_prefix = b;
            continue;
        default:
            goto done_prefixes;
        }

        /* Any legacy prefix after a REX prefix nullifies its effect. */
        rex_prefix = 0;
    }
 done_prefixes:

    if ( rex_prefix & 8 ) /* REX.W */
        op_bytes = 8;
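
    /*
     * Worked example (illustrative): the XOR tricks above toggle between the
     * two legal sizes. Operand size: 4 ^ 6 == 2 and 2 ^ 6 == 4. Address size
     * in 64-bit mode: 8 ^ 12 == 4; otherwise 4 ^ 6 == 2 and 2 ^ 6 == 4.
     * For "66 48 01 c8" (add %rcx,%rax) the 0x66 prefix first sets op_bytes
     * to 2, but the later REX.W (0x48) forces op_bytes to 8.
     */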

    /* Opcode byte(s). */
    d = opcode_table[b];
    if ( d == 0 )
    {
        /* Two-byte opcode? */
        if ( b == 0x0f )
        {
            twobyte = 1;
            b = insn_fetch_type(uint8_t);
            d = twobyte_table[b];
        }

        /* Unrecognised? */
        if ( d == 0 )
            goto cannot_emulate;
    }

    /* Lock prefix is allowed only on RMW instructions. */
    generate_exception_if((d & Mov) && lock_prefix, EXC_GP);

    /* ModRM and SIB bytes. */
    if ( d & ModRM )
    {
        modrm = insn_fetch_type(uint8_t);
        modrm_mod = (modrm & 0xc0) >> 6;
        modrm_reg = ((rex_prefix & 4) << 1) | ((modrm & 0x38) >> 3);
        modrm_rm  = modrm & 0x07;
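
        /*
         * Example (illustrative): modrm 0x44 with REX prefix 0x44 (REX.R)
         * gives modrm_mod 1, modrm_reg 8 (%r8) and modrm_rm 4. With 32/64-bit
         * addressing, modrm_mod != 3 and modrm_rm == 4 mean that a SIB byte
         * and an 8-bit displacement follow.
         */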

        if ( modrm_mod == 3 )
        {
            modrm_rm |= (rex_prefix & 1) << 3;
            ea.type = OP_REG;
            ea.reg  = decode_register(
                modrm_rm, &_regs, (d & ByteOp) && (rex_prefix == 0));
        }
        else if ( ad_bytes == 2 )
        {
            /* 16-bit ModR/M decode. */
            switch ( modrm_rm )
            {
            case 0:
                ea.mem.off = _regs.ebx + _regs.esi;
                break;
            case 1:
                ea.mem.off = _regs.ebx + _regs.edi;
                break;
            case 2:
                ea.mem.seg = x86_seg_ss;
                ea.mem.off = _regs.ebp + _regs.esi;
                break;
            case 3:
                ea.mem.seg = x86_seg_ss;
                ea.mem.off = _regs.ebp + _regs.edi;
                break;
            case 4:
                ea.mem.off = _regs.esi;
                break;
            case 5:
                ea.mem.off = _regs.edi;
                break;
            case 6:
                if ( modrm_mod == 0 )
                    break;
                ea.mem.seg = x86_seg_ss;
                ea.mem.off = _regs.ebp;
                break;
            case 7:
                ea.mem.off = _regs.ebx;
                break;
            }
            switch ( modrm_mod )
            {
            case 0:
                if ( modrm_rm == 6 )
                    ea.mem.off = insn_fetch_type(int16_t);
                break;
            case 1:
                ea.mem.off += insn_fetch_type(int8_t);
                break;
            case 2:
                ea.mem.off += insn_fetch_type(int16_t);
                break;
            }
            ea.mem.off = truncate_ea(ea.mem.off);
        }
        else
        {
            /* 32/64-bit ModR/M decode. */
            if ( modrm_rm == 4 )
            {
                sib = insn_fetch_type(uint8_t);
                sib_index = ((sib >> 3) & 7) | ((rex_prefix << 2) & 8);
                sib_base  = (sib & 7) | ((rex_prefix << 3) & 8);
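                /*
                 * Example (illustrative): sib 0x88 without REX has scale
                 * bits 2 (index << 2, i.e. times 4), index 1 (%ecx) and
                 * base 0 (%eax), so the EA is ecx * 4 + eax plus any
                 * displacement.
                 */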
                if ( sib_index != 4 )
                    ea.mem.off = *(long*)decode_register(sib_index, &_regs, 0);
                ea.mem.off <<= (sib >> 6) & 3;
                if ( (modrm_mod == 0) && ((sib_base & 7) == 5) )
                    ea.mem.off += insn_fetch_type(int32_t);
                else if ( sib_base == 4 )
                {
                    ea.mem.seg  = x86_seg_ss;
                    ea.mem.off += _regs.esp;
                    if ( !twobyte && (b == 0x8f) )
                        /* POP <rm> computes its EA post increment. */
                        ea.mem.off += ((mode_64bit() && (op_bytes == 4))
                                       ? 8 : op_bytes);
                }
                else if ( sib_base == 5 )
                {
                    ea.mem.seg  = x86_seg_ss;
                    ea.mem.off += _regs.ebp;
                }
                else
                    ea.mem.off += *(long*)decode_register(sib_base, &_regs, 0);
            }
            else
            {
                modrm_rm |= (rex_prefix & 1) << 3;
                ea.mem.off = *(long *)decode_register(modrm_rm, &_regs, 0);
                if ( (modrm_rm == 5) && (modrm_mod != 0) )
                    ea.mem.seg = x86_seg_ss;
            }
            switch ( modrm_mod )
            {
            case 0:
                if ( (modrm_rm & 7) != 5 )
                    break;
                ea.mem.off = insn_fetch_type(int32_t);
                if ( !mode_64bit() )
                    break;
                /*
                 * Relative to RIP of the next instruction, so also skip any
                 * immediate operand bytes that have not been fetched yet.
                 */
                ea.mem.off += _regs.eip;
                if ( (d & SrcMask) == SrcImm )
                    ea.mem.off += (d & ByteOp) ? 1 :
                        ((op_bytes == 8) ? 4 : op_bytes);
                else if ( (d & SrcMask) == SrcImmByte )
                    ea.mem.off += 1;
                else if ( ((b == 0xf6) || (b == 0xf7)) &&
                          ((modrm_reg & 7) <= 1) )
                    /* Special case in Grp3: test has immediate operand. */
                    ea.mem.off += (d & ByteOp) ? 1
                        : ((op_bytes == 8) ? 4 : op_bytes);
                break;
            case 1:
                ea.mem.off += insn_fetch_type(int8_t);
                break;
            case 2:
                ea.mem.off += insn_fetch_type(int32_t);
                break;
            }
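            /*
             * Worked example (illustrative, assuming opcode 0x81 is decoded
             * with SrcImm): for "81 3d <disp32> <imm32>", i.e.
             * cmpl $imm32,disp32(%rip) in 64-bit mode, _regs.eip points at
             * the imm32 once disp32 has been fetched, so 4 more bytes are
             * added to reach the start of the next instruction.
             */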
            ea.mem.off = truncate_ea(ea.mem.off);
        }
    }

    if ( override_seg != -1 )
        ea.mem.seg = override_seg;

    /* Special instructions do their own operand decoding. */
    if ( (d & DstMask) == ImplicitOps )
        goto special_insn;

    /* Decode and fetch the source operand: register, memory or immediate. */
    switch ( d & SrcMask )
    {
    case SrcNone:
        break;
    case SrcReg:
        src.type = OP_REG;
        if ( d & ByteOp )
        {
            src.reg = decode_register(modrm_reg, &_regs, (rex_prefix == 0));
            src.val = *(uint8_t *)src.reg;
            src.bytes = 1;
        }
        else
        {
            src.reg = decode_register(modrm_reg, &_regs, 0);
            switch ( (src.bytes = op_bytes) )
            {
            case 2: src.val = *(uint16_t *)src.reg; break;
            case 4: src.val = *(uint32_t *)src.reg; break;
            case 8: src.val = *(uint64_t *)src.reg; break;
            }
        }
        break;
    case SrcMem16:
        ea.bytes = 2;
        goto srcmem_common;
    case SrcMem:
        ea.bytes = (d & ByteOp) ? 1 : op_bytes;
    srcmem_common:
        src = ea;
        if ( src.type == OP_REG )
        {
            switch ( src.bytes )
            {
            case 1: src.val = *(uint8_t  *)src.reg; break;
            case 2: src.val = *(uint16_t *)src.reg; break;
            case 4: src.val = *(uint32_t *)src.reg; break;
            case 8: src.val = *(uint64_t *)src.reg; break;
            }
        }
        else if ( (rc = ops->read(src.mem.seg, src.mem.off,
                                  &src.val, src.bytes, ctxt)) )
            goto done;
        break;
    case SrcImm:
        src.type  = OP_IMM;
        src.bytes = (d & ByteOp) ? 1 : op_bytes;
        if ( src.bytes == 8 ) src.bytes = 4;
        /* NB. Immediates are sign-extended as necessary. */
        switch ( src.bytes )
        {
        case 1: src.val = insn_fetch_type(int8_t);  break;
        case 2: src.val = insn_fetch_type(int16_t); break;
        case 4: src.val = insn_fetch_type(int32_t); break;
        }
        break;
    case SrcImmByte:
        src.type  = OP_IMM;
        src.bytes = 1;
        src.val   = insn_fetch_type(int8_t);
        break;
    }

    /* Decode and fetch the destination operand: register or memory. */
    switch ( d & DstMask )
    {
    case DstReg:
        dst.type = OP_REG;
        if ( d & ByteOp )
        {
            dst.reg = decode_register(modrm_reg, &_regs, (rex_prefix == 0));
            dst.val = *(uint8_t *)dst.reg;
            dst.bytes = 1;
        }
        else
        {
            dst.reg = decode_register(modrm_reg, &_regs, 0);
            switch ( (dst.bytes = op_bytes) )
            {
            case 2: dst.val = *(uint16_t *)dst.reg; break;
            case 4: dst.val = *(uint32_t *)dst.reg; break;
            case 8: dst.val = *(uint64_t *)dst.reg; break;
            }
        }
        break;
    case DstBitBase:
        if ( ((d & SrcMask) == SrcImmByte) || (ea.type == OP_REG) )
        {
            src.val &= (op_bytes << 3) - 1;
        }
        else
        {
            /*
             * EA       += BitOffset DIV op_bytes*8
             * BitOffset = BitOffset MOD op_bytes*8
             * DIV truncates towards negative infinity.
             * MOD always produces a non-negative result.
             */
            if ( op_bytes == 2 )
                src.val = (int16_t)src.val;
            else if ( op_bytes == 4 )
                src.val = (int32_t)src.val;
            if ( (long)src.val < 0 )
            {
                unsigned long byte_offset;
                byte_offset = op_bytes + (((-src.val-1) >> 3) & ~(op_bytes-1));
                ea.mem.off -= byte_offset;
                src.val = (byte_offset << 3) + src.val;
            }
            else
            {
                ea.mem.off += (src.val >> 3) & ~(op_bytes - 1);
                src.val &= (op_bytes << 3) - 1;
            }
        }
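        /*
         * Worked examples (illustrative) with op_bytes == 4, a register bit
         * offset and a memory base operand:
         *   bit offset  40 -> ea.mem.off += 4, bit  8 of that dword.
         *   bit offset  -1 -> ea.mem.off -= 4, bit 31 of that dword.
         *   bit offset -33 -> ea.mem.off -= 8, bit 31 of that dword.
         */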
        /* Becomes a normal DstMem operation from here on: fall through. */
        d = (d & ~DstMask) | DstMem;
    case DstMem:
        ea.bytes = (d & ByteOp) ? 1 : op_bytes;
        dst = ea;
        if ( dst.type == OP_REG )
        {
            switch ( dst.bytes )
            {
            case 1: dst.val = *(uint8_t  *)dst.reg; break;
            case 2: dst.val = *(uint16_t *)dst.reg; break;
            case 4: dst.val = *(uint32_t *)dst.reg; break;
            case 8: dst.val = *(uint64_t *)dst.reg; break;
            }
        }
        else if ( !(d & Mov) ) /* optimisation - avoid slow emulated read */
        {
            if ( (rc = ops->read(dst.mem.seg, dst.mem.off,
                                 &dst.val, dst.bytes, ctxt)) )
                goto done;
            dst.orig_val = dst.val;
        }
        break;
    }

    /* LOCK prefix allowed only on instructions with memory destination. */
    generate_exception_if(lock_prefix && (dst.type != OP_MEM), EXC_GP);

    if ( twobyte )
        goto twobyte_insn;

    switch ( b )
    {
    case 0x04 ... 0x05: /* add imm,%%eax */
        dst.reg = (unsigned long *)&_regs.eax;
        dst.val = _regs.eax;
    case 0x00 ... 0x03: add: /* add */
        emulate_2op_SrcV("add", src, dst, _regs.eflags);
        break;

    case 0x0c ... 0x0d: /* or imm,%%eax */
        dst.reg = (unsigned long *)&_regs.eax;
        dst.val = _regs.eax;
    case 0x08 ... 0x0b: or:  /* or */
        emulate_2op_SrcV("or", src, dst, _regs.eflags);
        break;

    case 0x14 ... 0x15: /* adc imm,%%eax */
        dst.reg = (unsigned long *)&_regs.eax;
        dst.val = _regs.eax;
    case 0x10 ... 0x13: adc: /* adc */
        emulate_2op_SrcV("adc", src, dst, _regs.eflags);
        break;

    case 0x1c ... 0x1d: /* sbb imm,%%eax */
        dst.reg = (unsigned long *)&_regs.eax;
        dst.val = _regs.eax;
    case 0x18 ... 0x1b: sbb: /* sbb */
        emulate_2op_SrcV("sbb", src, dst, _regs.eflags);
        break;

    case 0x24 ... 0x25: /* and imm,%%eax */
        dst.reg = (unsigned long *)&_regs.eax;
        dst.val = _regs.eax;
    case 0x20 ... 0x23: and: /* and */
        emulate_2op_SrcV("and", src, dst, _regs.eflags);
        break;

    case 0x2c ... 0x2d: /* sub imm,%%eax */
        dst.reg = (unsigned long *)&_regs.eax;
        dst.val = _regs.eax;
    case 0x28 ... 0x2b: sub: /* sub */
        emulate_2op_SrcV("sub", src, dst, _regs.eflags);
        break;

    case 0x34 ... 0x35: /* xor imm,%%eax */
        dst.reg = (unsigned long *)&_regs.eax;
        dst.val = _regs.eax;
    case 0x30 ... 0x33: xor: /* xor */
        emulate_2op_SrcV("xor", src, dst, _regs.eflags);
        break;

    case 0x3c ... 0x3d: /* cmp imm,%%eax */
        dst.reg = (unsigned long *)&_regs.eax;
        dst.val = _regs.eax;
    case 0x38 ... 0x3b: cmp: /* cmp */
        emulate_2op_SrcV("cmp", src, dst, _regs.eflags);
        break;

    case 0x62: /* bound */ {
        unsigned long src_val2;
        int lb, ub, idx;
        generate_exception_if(mode_64bit() || (src.type != OP_MEM), EXC_UD);
        if ( (rc = ops->read(src.mem.seg, src.mem.off + op_bytes,
                             &src_val2, op_bytes, ctxt)) )
            goto done;
        ub  = (op_bytes == 2) ? (int16_t)src_val2 : (int32_t)src_val2;
        lb  = (op_bytes == 2) ? (int16_t)src.val  : (int32_t)src.val;
        idx = (op_bytes == 2) ? (int16_t)dst.val  : (int32_t)dst.val;
        generate_exception_if((idx < lb) || (idx > ub), EXC_BR);
        dst.type = OP_NONE;
        break;
    }

    case 0x63: /* movsxd (x86/64) / arpl (x86/32) */
        if ( mode_64bit() )
        {
            /* movsxd */
            if ( src.type == OP_REG )
                src.val = *(int32_t *)src.reg;
            else if ( (rc = ops->read(src.mem.seg, src.mem.off,
                                      &src.val, 4, ctxt)) )
                goto done;
            dst.val = (int32_t)src.val;
        }
        else
        {
            /* arpl */
            uint16_t src_val = dst.val;
            dst = src;
            _regs.eflags &= ~EFLG_ZF;
            _regs.eflags |= ((src_val & 3) > (dst.val & 3)) ? EFLG_ZF : 0;
            if ( _regs.eflags & EFLG_ZF )
                dst.val  = (dst.val & ~3) | (src_val & 3);
            else
                dst.type = OP_NONE;
        }
        break;

    case 0x69: /* imul imm16/32 */
    case 0x6b: /* imul imm8 */ {
        unsigned long reg = *(long *)decode_register(modrm_reg, &_regs, 0);
        _regs.eflags &= ~(EFLG_OF|EFLG_CF);
        switch ( dst.bytes )
        {
        case 2:
            dst.val = ((uint32_t)(int16_t)src.val *
                       (uint32_t)(int16_t)reg);
            if ( (int16_t)dst.val != (uint32_t)dst.val )
                _regs.eflags |= EFLG_OF|EFLG_CF;
            break;
#ifdef __x86_64__
        case 4:
            dst.val = ((uint64_t)(int32_t)src.val *
                       (uint64_t)(int32_t)reg);
            if ( (int32_t)dst.val != dst.val )
                _regs.eflags |= EFLG_OF|EFLG_CF;
            break;
#endif
        default: {
            unsigned long m[2] = { src.val, reg };
            if ( imul_dbl(m) )
                _regs.eflags |= EFLG_OF|EFLG_CF;
            dst.val = m[0];
            break;
        }
        }
        dst.type = OP_REG;
        dst.reg  = decode_register(modrm_reg, &_regs, 0);
        break;
    }
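    /*
     * Worked example (illustrative values) of the 16-bit overflow test used
     * above: 0x0100 * 0x0100 = 0x00010000, whose low 16 bits (0x0000)
     * sign-extend to 0, which differs from the 32-bit product, so OF and CF
     * are set.  For (-2) * 3 the product is 0xfffffffa; (int16_t)0xfffa is
     * -6, which converts back to 0xfffffffa in the comparison, so OF and CF
     * stay clear because the result fits in 16 signed bits.
     */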

    case 0x82: /* Grp1 (x86/32 only) */
        generate_exception_if(mode_64bit(), EXC_UD);
    case 0x80: case 0x81: case 0x83: /* Grp1 */
        switch ( modrm_reg & 7 )
        {
        case 0: goto add;
        case 1: goto or;
        case 2: goto adc;
        case 3: goto sbb;
        case 4: goto and;
        case 5: goto sub;
        case 6: goto xor;
        case 7: goto cmp;
        }
        break;

    case 0xa8 ... 0xa9: /* test imm,%%eax */
        dst.reg = (unsigned long *)&_regs.eax;
        dst.val = _regs.eax;
    case 0x84 ... 0x85: test: /* test */
        emulate_2op_SrcV("test", src, dst, _regs.eflags);
        break;

    case 0x86 ... 0x87: xchg: /* xchg */
        /* Write back the register source. */
        switch ( dst.bytes )
        {
        case 1: *(uint8_t  *)src.reg = (uint8_t)dst.val; break;
        case 2: *(uint16_t *)src.reg = (uint16_t)dst.val; break;
        case 4: *src.reg = (uint32_t)dst.val; break; /* 64b reg: zero-extend */
        case 8: *src.reg = dst.val; break;
        }
        /* Write back the memory destination with implicit LOCK prefix. */
        dst.val = src.val;
        lock_prefix = 1;
        break;

    case 0xc6 ... 0xc7: /* mov (sole member of Grp11) */
        generate_exception_if((modrm_reg & 7) != 0, EXC_UD);
    case 0x88 ... 0x8b: /* mov */
        dst.val = src.val;
        break;

    case 0x8d: /* lea */
        dst.val = ea.mem.off;
        break;

    case 0x8f: /* pop (sole member of Grp1a) */
        generate_exception_if((modrm_reg & 7) != 0, EXC_UD);
        /* 64-bit mode: POP defaults to a 64-bit operand. */
        if ( mode_64bit() && (dst.bytes == 4) )
            dst.bytes = 8;
        if ( (rc = ops->read(x86_seg_ss, sp_post_inc(dst.bytes),
                             &dst.val, dst.bytes, ctxt)) != 0 )
            goto done;
        break;

    case 0xb0 ... 0xb7: /* mov imm8,r8 */
        dst.reg = decode_register(
            (b & 7) | ((rex_prefix & 1) << 3), &_regs, (rex_prefix == 0));
        dst.val = src.val;
        break;

    case 0xb8 ... 0xbf: /* mov imm{16,32,64},r{16,32,64} */
        if ( dst.bytes == 8 ) /* Fetch more bytes to obtain imm64 */
            src.val = ((uint32_t)src.val |
                       ((uint64_t)insn_fetch_type(uint32_t) << 32));
        dst.reg = decode_register(
            (b & 7) | ((rex_prefix & 1) << 3), &_regs, 0);
        dst.val = src.val;
        break;

    case 0xc0 ... 0xc1: grp2: /* Grp2 */
        switch ( modrm_reg & 7 )
        {
        case 0: /* rol */
            emulate_2op_SrcB("rol", src, dst, _regs.eflags);
            break;
        case 1: /* ror */
            emulate_2op_SrcB("ror", src, dst, _regs.eflags);
            break;
        case 2: /* rcl */
            emulate_2op_SrcB("rcl", src, dst, _regs.eflags);
            break;
        case 3: /* rcr */
            emulate_2op_SrcB("rcr", src, dst, _regs.eflags);
            break;
        case 4: /* sal/shl */
        case 6: /* sal/shl */
            emulate_2op_SrcB("sal", src, dst, _regs.eflags);
            break;
        case 5: /* shr */
            emulate_2op_SrcB("shr", src, dst, _regs.eflags);
            break;
        case 7: /* sar */
            emulate_2op_SrcB("sar", src, dst, _regs.eflags);
            break;
        }
        break;

    case 0xd0 ... 0xd1: /* Grp2 */
        src.val = 1;
        goto grp2;

    case 0xd2 ... 0xd3: /* Grp2 */
        src.val = _regs.ecx;
        goto grp2;

    case 0xf6 ... 0xf7: /* Grp3 */
        switch ( modrm_reg & 7 )
        {
        case 0 ... 1: /* test */
            /* Special case in Grp3: test has an immediate source operand. */
            src.type = OP_IMM;
            src.bytes = (d & ByteOp) ? 1 : op_bytes;
            if ( src.bytes == 8 ) src.bytes = 4;
            switch ( src.bytes )
            {
            case 1: src.val = insn_fetch_type(int8_t);  break;
            case 2: src.val = insn_fetch_type(int16_t); break;
            case 4: src.val = insn_fetch_type(int32_t); break;
            }
            goto test;
        case 2: /* not */
            dst.val = ~dst.val;
            break;
        case 3: /* neg */
            emulate_1op("neg", dst, _regs.eflags);
            break;
        case 4: /* mul */
            src = dst;
            dst.type = OP_REG;
            dst.reg  = (unsigned long *)&_regs.eax;
            dst.val  = *dst.reg;
            _regs.eflags &= ~(EFLG_OF|EFLG_CF);
            switch ( src.bytes )
            {
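            case 1:
                /* Only AL is multiplied; the 16-bit product goes to AX. */
                dst.val = (uint8_t)dst.val * (uint8_t)src.val;
                if ( (uint8_t)dst.val != (uint16_t)dst.val )
                    _regs.eflags |= EFLG_OF|EFLG_CF;
                dst.bytes = 2;
                break;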
            case 2:
                /* Only AX is multiplied; DX receives the high half. */
                dst.val = (uint32_t)(uint16_t)dst.val * (uint16_t)src.val;
                if ( (uint16_t)dst.val != (uint32_t)dst.val )
                    _regs.eflags |= EFLG_OF|EFLG_CF;
                *(uint16_t *)&_regs.edx = dst.val >> 16;
                break;
#ifdef __x86_64__
            case 4:
                /* Only EAX is multiplied; EDX receives the high half. */
                dst.val = (uint64_t)(uint32_t)dst.val * (uint32_t)src.val;
                if ( (uint32_t)dst.val != dst.val )
                    _regs.eflags |= EFLG_OF|EFLG_CF;
                _regs.edx = (uint32_t)(dst.val >> 32);
                break;
#endif
            default: {
                unsigned long m[2] = { src.val, dst.val };
                if ( mul_dbl(m) )
                    _regs.eflags |= EFLG_OF|EFLG_CF;
                _regs.edx = m[1];
                dst.val  = m[0];
                break;
            }
            }
            break;
        case 5: /* imul */
            src = dst;
            dst.type = OP_REG;
            dst.reg  = (unsigned long *)&_regs.eax;
            dst.val  = *dst.reg;
            _regs.eflags &= ~(EFLG_OF|EFLG_CF);
            switch ( src.bytes )
            {
            case 1:
                /* Signed AL * r/m8; the 16-bit product goes to AX. */
                dst.val = (int8_t)src.val * (int8_t)dst.val;
                if ( (int8_t)dst.val != (int16_t)dst.val )
                    _regs.eflags |= EFLG_OF|EFLG_CF;
                dst.bytes = 2;
                break;
            case 2:
                dst.val = ((uint32_t)(int16_t)src.val *
                           (uint32_t)(int16_t)dst.val);
                if ( (int16_t)dst.val != (uint32_t)dst.val )
                    _regs.eflags |= EFLG_OF|EFLG_CF;
                *(uint16_t *)&_regs.edx = dst.val >> 16;
                break;
#ifdef __x86_64__
            case 4:
                dst.val = ((uint64_t)(int32_t)src.val *
                           (uint64_t)(int32_t)dst.val);
                if ( (int32_t)dst.val != dst.val )
                    _regs.eflags |= EFLG_OF|EFLG_CF;
                _regs.edx = (uint32_t)(dst.val >> 32);
                break;
#endif
            default: {
                unsigned long m[2] = { src.val, dst.val };
                if ( imul_dbl(m) )
                    _regs.eflags |= EFLG_OF|EFLG_CF;
                _regs.edx = m[1];
                dst.val  = m[0];
                break;
            }
            }
            break;
        case 6: /* div */ {
            unsigned long u[2], v;
            src = dst;
            dst.type = OP_REG;
            dst.reg  = (unsigned long *)&_regs.eax;
            switch ( src.bytes )
            {
            case 1:
                u[0] = (uint16_t)_regs.eax;
                u[1] = 0;
                v    = (uint8_t)src.val;
                generate_exception_if(
                    div_dbl(u, v) || ((uint8_t)u[0] != (uint16_t)u[0]),
                    EXC_DE);
                dst.val = (uint8_t)u[0];
                ((uint8_t *)&_regs.eax)[1] = u[1];
                break;
            case 2:
                u[0] = ((uint32_t)_regs.edx << 16) | (uint16_t)_regs.eax;
                u[1] = 0;
                v    = (uint16_t)src.val;
                generate_exception_if(
                    div_dbl(u, v) || ((uint16_t)u[0] != (uint32_t)u[0]),
                    EXC_DE);
                dst.val = (uint16_t)u[0];
                *(uint16_t *)&_regs.edx = u[1];
                break;
#ifdef __x86_64__
            case 4:
                u[0] = (_regs.edx << 32) | (uint32_t)_regs.eax;
                u[1] = 0;
                v    = (uint32_t)src.val;
                generate_exception_if(
                    div_dbl(u, v) || ((uint32_t)u[0] != u[0]),
                    EXC_DE);
                dst.val   = (uint32_t)u[0];
                _regs.edx = (uint32_t)u[1];
                break;
#endif
            default:
                u[0] = _regs.eax;
                u[1] = _regs.edx;
                v    = src.val;
                generate_exception_if(div_dbl(u, v), EXC_DE);
                dst.val   = u[0];
                _regs.edx = u[1];
                break;
            }
            break;
        }
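        /*
         * Worked example (div r/m8, illustrative values), assuming div_dbl()
         * leaves the quotient in u[0] and the remainder in u[1], as the code
         * above uses them: AX = 0x00ff divided by 0x10 gives quotient 0x0f
         * (AL) and remainder 0x0f (AH).  AX = 0x1000 divided by 0x10 gives a
         * quotient of 0x100, which does not fit in AL, so
         * (uint8_t)u[0] != (uint16_t)u[0] and #DE is raised.
         */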
        case 7: /* idiv */ {
            unsigned long u[2], v;
            src = dst;
            dst.type = OP_REG;
            dst.reg  = (unsigned long *)&_regs.eax;
            switch ( src.bytes )
            {
            case 1:
                u[0] = (int16_t)_regs.eax;
                u[1] = ((long)u[0] < 0) ? ~0UL : 0UL;
                v    = (int8_t)src.val;
                generate_exception_if(
                    idiv_dbl(u, v) || ((int8_t)u[0] != (int16_t)u[0]),
                    EXC_DE);
                dst.val = (int8_t)u[0];
                ((int8_t *)&_regs.eax)[1] = u[1];
                break;
            case 2:
                u[0] = (int32_t)((_regs.edx << 16) | (uint16_t)_regs.eax);
                u[1] = ((long)u[0] < 0) ? ~0UL : 0UL;
                v    = (int16_t)src.val;
                generate_exception_if(
                    idiv_dbl(u, v) || ((int16_t)u[0] != (int32_t)u[0]),
                    EXC_DE);
                dst.val = (int16_t)u[0];
                *(int16_t *)&_regs.edx = u[1];
                break;
#ifdef __x86_64__
            case 4:
                u[0] = (_regs.edx << 32) | (uint32_t)_regs.eax;
                u[1] = ((long)u[0] < 0) ? ~0UL : 0UL;
                v    = (int32_t)src.val;
                generate_exception_if(
                    idiv_dbl(u, v) || ((int32_t)u[0] != u[0]),
                    EXC_DE);
                dst.val   = (int32_t)u[0];
                _regs.edx = (uint32_t)u[1];
                break;
#endif
            default:
                u[0] = _regs.eax;
                u[1] = _regs.edx;
                v    = src.val;
                generate_exception_if(idiv_dbl(u, v), EXC_DE);
                dst.val   = u[0];
                _regs.edx = u[1];
                break;
            }
            break;
        }
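        /*
         * Worked example (idiv r/m8, illustrative values): AX = 0xfff9 is -7,
         * so u[0] = -7 and u[1] = ~0UL carries the sign extension.  x86 IDIV
         * truncates toward zero, so dividing by 2 yields quotient -3
         * (AL = 0xfd) and remainder -1 (AH = 0xff); the remainder takes the
         * sign of the dividend.  This assumes idiv_dbl() implements that
         * rounding.
         */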
        default:
            goto cannot_emulate;
        }
        break;

    case 0xfe: /* Grp4 */
        generate_exception_if((modrm_reg & 7) >= 2, EXC_UD);
    case 0xff: /* Grp5 */
        switch ( modrm_reg & 7 )
        {
        case 0: /* inc */
            emulate_1op("inc", dst, _regs.eflags);
            break;
        case 1: /* dec */
            emulate_1op("dec", dst, _regs.eflags);
            break;
        case 2: /* call (near) */
        case 4: /* jmp (near) */
            if ( ((op_bytes = dst.bytes) != 8) && mode_64bit() )
            {
                dst.bytes = op_bytes = 8;
                if ( dst.type == OP_REG )
                    dst.val = *dst.reg;
                else if ( (rc = ops->read(dst.mem.seg, dst.mem.off,
                                          &dst.val, 8, ctxt)) != 0 )
                    goto done;
            }
            src.val = _regs.eip;
            _regs.eip = dst.val;
            if ( (modrm_reg & 7) == 2 )
                goto push; /* call */
            dst.type = OP_NONE; /* jmp: nothing to write back */
            break;
        case 6: /* push */
            /* 64-bit mode: PUSH defaults to a 64-bit operand. */
            if ( mode_64bit() && (dst.bytes == 4) )
            {
                dst.bytes = 8;
                if ( dst.type == OP_REG )
                    dst.val = *dst.reg;
                else if ( (rc = ops->read(dst.mem.seg, dst.mem.off,
                                          &dst.val, 8, ctxt)) != 0 )
                    goto done;
            }
            if ( (rc = ops->write(x86_seg_ss, sp_pre_dec(dst.bytes),
                                  dst.val, dst.bytes, ctxt)) != 0 )
                goto done;
            dst.type = OP_NONE;
            break;
        case 7:
            generate_exception_if(1, EXC_UD);
        default:
            goto cannot_emulate;
        }
        break;
    }

 writeback:
    switch ( dst.type )
    {
    case OP_REG:
        /* The 4-byte case *is* correct: in 64-bit mode we zero-extend. */
        switch ( dst.bytes )
        {
        case 1: *(uint8_t  *)dst.reg = (uint8_t)dst.val; break;
        case 2: *(uint16_t *)dst.reg = (uint16_t)dst.val; break;
        case 4: *dst.reg = (uint32_t)dst.val; break; /* 64b: zero-ext */
        case 8: *dst.reg = dst.val; break;
        }
        break;
    case OP_MEM:
        if ( !(d & Mov) && (dst.orig_val == dst.val) )
            /* nothing to do */;
        else if ( lock_prefix )
            rc = ops->cmpxchg(
                dst.mem.seg, dst.mem.off, dst.orig_val,
                dst.val, dst.bytes, ctxt);
        else
            rc = ops->write(
                dst.mem.seg, dst.mem.off, dst.val, dst.bytes, ctxt);
        if ( rc != 0 )
            goto done;
    default:
        break;
    }

    /* Commit shadow register state. */
    _regs.eflags &= ~EF_RF;
    *ctxt->regs = _regs;

 done:
    return rc;

 special_insn:
    dst.type = OP_NONE;

    /*
     * The only implicit-operand instructions that may take a LOCK prefix
     * are CMPXCHG{8,16}B, MOV CRn and MOV DRn.
     */
    generate_exception_if(lock_prefix &&
                          ((b < 0x20) || (b > 0x23)) && /* MOV CRn/DRn */
                          (b != 0xc7),                  /* CMPXCHG{8,16}B */
                          EXC_GP);

    if ( twobyte )
        goto twobyte_special_insn;

    switch ( b )
    {
    case 0x27: /* daa */ {
        uint8_t al = _regs.eax;
        unsigned long eflags = _regs.eflags;
        generate_exception_if(mode_64bit(), EXC_UD);
        _regs.eflags &= ~(EFLG_CF|EFLG_AF);
        if ( ((al & 0x0f) > 9) || (eflags & EFLG_AF) )
        {
            *(uint8_t *)&_regs.eax += 6;
            _regs.eflags |= EFLG_AF;
        }
        if ( (al > 0x99) || (eflags & EFLG_CF) )
        {
            *(uint8_t *)&_regs.eax += 0x60;
            _regs.eflags |= EFLG_CF;
        }
        _regs.eflags &= ~(EFLG_SF|EFLG_ZF|EFLG_PF);
        _regs.eflags |= ((uint8_t)_regs.eax == 0) ? EFLG_ZF : 0;
        _regs.eflags |= (( int8_t)_regs.eax <  0) ? EFLG_SF : 0;
        _regs.eflags |= even_parity(_regs.eax) ? EFLG_PF : 0;
        break;
    }
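    /*
     * Worked example (daa): packed-BCD 79 + 35 via ADD leaves AL = 0xae with
     * CF = AF = 0.  The low nibble 0xe exceeds 9, so AL becomes 0xb4 and AF
     * is set; the original AL 0xae exceeds 0x99, so 0x60 is added as well,
     * giving AL = 0x14 with CF set, i.e. the BCD result 114.
     */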

    case 0x2f: /* das */ {
        uint8_t al = _regs.eax;
        unsigned long eflags = _regs.eflags;
        generate_exception_if(mode_64bit(), EXC_UD);
        _regs.eflags &= ~(EFLG_CF|EFLG_AF);
        if ( ((al & 0x0f) > 9) || (eflags & EFLG_AF) )
        {
            _regs.eflags |= EFLG_AF;
            if ( (al < 6) || (eflags & EFLG_CF) )
                _regs.eflags |= EFLG_CF;
            *(uint8_t *)&_regs.eax -= 6;
        }
        if ( (al > 0x99) || (eflags & EFLG_CF) )
        {
            *(uint8_t *)&_regs.eax -= 0x60;
            _regs.eflags |= EFLG_CF;
        }
        _regs.eflags &= ~(EFLG_SF|EFLG_ZF|EFLG_PF);
        _regs.eflags |= ((uint8_t)_regs.eax == 0) ? EFLG_ZF : 0;
        _regs.eflags |= (( int8_t)_regs.eax <  0) ? EFLG_SF : 0;
        _regs.eflags |= even_parity(_regs.eax) ? EFLG_PF : 0;
        break;
    }
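    /*
     * Worked example (das): packed-BCD 35 - 47 via SUB leaves AL = 0xee with
     * CF = AF = 1.  The low nibble 0xe exceeds 9 (and AF is set), so AL -= 6
     * gives 0xe8 and CF stays set from the incoming borrow; the original AL
     * 0xee exceeds 0x99, so 0x60 is also subtracted, giving AL = 0x88 with
     * CF set (35 - 47 = 88, borrow 1).
     */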

    case 0x37: /* aaa */
    case 0x3f: /* aas */
        generate_exception_if(mode_64bit(), EXC_UD);
        _regs.eflags &= ~EFLG_CF;
        if ( ((uint8_t)_regs.eax > 9) || (_regs.eflags & EFLG_AF) )
        {
            ((uint8_t *)&_regs.eax)[0] += (b == 0x37) ? 6 : -6;
            ((uint8_t *)&_regs.eax)[1] += (b == 0x37) ? 1 : -1;
            _regs.eflags |= EFLG_CF | EFLG_AF;
        }
        ((uint8_t *)&_regs.eax)[0] &= 0x0f;
        break;
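    /*
     * Worked example (aaa): with AX = 0x0009, ADD AL,3 leaves AL = 0x0c and
     * AF = 0.  Since AL exceeds 9, AL += 6 (0x12) and AH += 1, CF and AF
     * are set, and masking AL to its low nibble leaves AX = 0x0102, the
     * unpacked-BCD digits 1 and 2.
     */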

    case 0x40 ... 0x4f: /* inc/dec reg */
        dst.type  = OP_REG;
        dst.reg   = decode_register(b & 7, &_regs, 0);
        dst.bytes = op_bytes;
        dst.val   = *dst.reg;
        if ( b & 8 )
            emulate_1op("dec", dst, _regs.eflags);
        else
            emulate_1op("inc", dst, _regs.eflags);
        break;

    case 0x50 ... 0x57: /* push reg */
        src.val = *(unsigned long *)decode_register(
            (b & 7) | ((rex_prefix & 1) << 3), &_regs, 0);
        goto push;

    case 0x58 ... 0x5f: /* pop reg */
        dst.type  = OP_REG;
        dst.reg   = decode_register(
            (b & 7) | ((rex_prefix & 1) << 3), &_regs, 0);
        dst.bytes = op_bytes;
        if ( mode_64bit() && (dst.bytes == 4) )
            dst.bytes = 8;
        if ( (rc = ops->read(x86_seg_ss, sp_post_inc(dst.bytes),
                             &dst.val, dst.bytes, ctxt)) != 0 )
            goto done;
        break;

    case 0x60: /* pusha */ {
        int i;
        unsigned long regs[] = {
            _regs.eax, _regs.ecx, _regs.edx, _regs.ebx,
            _regs.esp, _regs.ebp, _regs.esi, _regs.edi };
        generate_exception_if(mode_64bit(), EXC_UD);
        for ( i = 0; i < 8; i++ )
            if ( (rc = ops->write(x86_seg_ss, sp_pre_dec(op_bytes),
                                  regs[i], op_bytes, ctxt)) != 0 )
                goto done;
        break;
    }

    case 0x61: /* popa */ {
        int i;
        unsigned long dummy_esp, *regs[] = {
            (unsigned long *)&_regs.edi, (unsigned long *)&_regs.esi,
            (unsigned long *)&_regs.ebp, (unsigned long *)&dummy_esp,
            (unsigned long *)&_regs.ebx, (unsigned long *)&_regs.edx,
            (unsigned long *)&_regs.ecx, (unsigned long *)&_regs.eax };
        generate_exception_if(mode_64bit(), EXC_UD);
        for ( i = 0; i < 8; i++ )
            if ( (rc = ops->read(x86_seg_ss, sp_post_inc(op_bytes),
                                 regs[i], op_bytes, ctxt)) != 0 )
                goto done;
        break;
    }

    case 0x68: /* push imm{16,32,64} */
        src.val = ((op_bytes == 2)
                   ? (int32_t)insn_fetch_type(int16_t)
                   : insn_fetch_type(int32_t));
        goto push;

    case 0x6a: /* push imm8 */
        src.val = insn_fetch_type(int8_t);
    push:
        d |= Mov; /* force writeback */
        dst.type  = OP_MEM;
        dst.bytes = op_bytes;
        if ( mode_64bit() && (dst.bytes == 4) )
            dst.bytes = 8;
        dst.val = src.val;
        dst.mem.seg = x86_seg_ss;
        dst.mem.off = sp_pre_dec(dst.bytes);
        break;
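    /*
     * Worked example (push in 64-bit mode): a 32-bit operand size is
     * promoted to 8 bytes above, so "push $-1" (6a ff) sign-extends the
     * immediate to 0xffffffffffffffff, sp_pre_dec() moves RSP down by 8,
     * and the forced writeback (d |= Mov) stores all eight bytes at the new
     * top of stack.
     */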

    case 0x6c ... 0x6d: /* ins %dx,%es:%edi */
        handle_rep_prefix();
        generate_exception_if(!mode_iopl(), EXC_GP);
        dst.type  = OP_MEM;
        dst.bytes = !(b & 1) ? 1 : (op_bytes == 8) ? 4 : op_bytes;
        dst.mem.seg = x86_seg_es;
        dst.mem.off = truncate_ea(_regs.edi);
        fail_if(ops->read_io == NULL);
        if ( (rc = ops->read_io((uint16_t)_regs.edx, dst.bytes,
                                &dst.val, ctxt)) != 0 )
            goto done;
        register_address_increment(
            _regs.edi, (_regs.eflags & EFLG_DF) ? -dst.bytes : dst.bytes);
        break;

    case 0x6e ... 0x6f: /* outs %esi,%dx */
        handle_rep_prefix();
        generate_exception_if(!mode_iopl(), EXC_GP);
        dst.bytes = !(b & 1) ? 1 : (op_bytes == 8) ? 4 : op_bytes;
        if ( (rc = ops->read(ea.mem.seg, truncate_ea(_regs.esi),
                             &dst.val, dst.bytes, ctxt)) != 0 )
            goto done;
        fail_if(ops->write_io == NULL);
        if ( (rc = ops->write_io((uint16_t)_regs.edx, dst.bytes,
                                 dst.val, ctxt)) != 0 )
            goto done;
        register_address_increment(
            _regs.esi, (_regs.eflags & EFLG_DF) ? -dst.bytes : dst.bytes);
        break;

    case 0x70 ... 0x7f: /* jcc (short) */ {
        int rel = insn_fetch_type(int8_t);
        if ( test_cc(b, _regs.eflags) )
            jmp_rel(rel);
        break;
    }
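    /*
     * Worked example (jcc short): the displacement is a signed byte, so
     * "74 fe" (je with rel = -2) branches back to its own first byte when
     * ZF is set, assuming jmp_rel() adds rel to an eip that already points
     * past the two-byte instruction.
     */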

    case 0x90: /* nop / xchg %%r8,%%rax */
        if ( !(rex_prefix & 1) )
            break; /* nop */

    case 0x91 ... 0x97: /* xchg reg,%%rax */
        src.type = dst.type = OP_REG;
        src.bytes = dst.bytes = op_bytes;
        src.reg  = (unsigned long *)&_regs.eax;
        src.val  = *src.reg;
        dst.reg  = decode_register(
            (b & 7) | ((rex_prefix & 1) << 3), &_regs, 0);
        dst.val  = *dst.reg;
        goto xchg;
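    /*
     * Example: a bare 0x90 (architecturally NOP) is handled above, while
     * REX.B + 0x90 (e.g. 41 90) encodes xchg %r8,%rax and falls through to
     * the register exchange, since (b & 7) | 8 selects r8.
     */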

    case 0x98: /* cbw/cwde/cdqe */
        switch ( op_bytes )
        {
        case 2: *(int16_t *)&_regs.eax = (int8_t)_regs.eax; break; /* cbw */
        case 4: _regs.eax = (uint32_t)(int16_t)_regs.eax; break; /* cwde */
        case 8: _regs.eax = (int32_t)_regs.eax; break; /* cdqe */
        }
        break;

    case 0x99: /* cwd/cdq/cqo */
        switch ( op_bytes )
        {
        case 2:
            *(int16_t *)&_regs.edx = ((int16_t)_regs.eax < 0) ? -1 : 0;
            break;
        case 4:
            _regs.edx = (uint32_t)(((int32_t)_regs.eax < 0) ? -1 : 0);
            break;
        case 8:
            _regs.edx = ((int64_t)_regs.eax < 0) ? -1 : 0;
            break;
        }
        break;
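    /*
     * Worked examples (sign extension): cbw with AL = 0x80 gives AX = 0xff80;
     * cwde with AX = 0x8000 gives EAX = 0xffff8000 (zero-extended to 64 bits
     * by the uint32_t cast); cwd with AX = 0x8000 sets DX = 0xffff, and with
     * AX = 0x7fff it clears DX.
     */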

    case 0x9e: /* sahf */
        *(uint8_t *)&_regs.eflags = (((uint8_t *)&_regs.eax)[1] & 0xd7) | 0x02;
        break;

    case 0x9f: /* lahf */
        ((uint8_t *)&_regs.eax)[1] = (_regs.eflags & 0xd7) | 0x02;
        break;
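    /*
     * The 0xd7 mask keeps SF, ZF, AF, PF and CF (bits 7, 6, 4, 2, 0) plus
     * bit 1, which is then forced to 1.  For example, with EFLAGS = 0x246
     * (ZF, PF, IF and reserved bit 1), lahf loads AH = 0x46; IF lies outside
     * the low byte and is not transferred.
     */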

    case 0xa0 ... 0xa1: /* mov mem.offs,{%al,%ax,%eax,%rax} */
        /* Source EA is not encoded via ModRM. */
        dst.type  = OP_REG;
        dst.reg   = (unsigned long *)&_regs.eax;
        dst.bytes = (d & ByteOp) ? 1 : op_bytes;
        if ( (rc = ops->read(ea.mem.seg, insn_fetch_bytes(ad_bytes),
                             &dst.val, dst.bytes, ctxt)) != 0 )
            goto done;
        break;

    case 0xa2 ... 0xa3: /* mov {%al,%ax,%eax,%rax},mem.offs */
        /* Destination EA is not encoded via ModRM. */
        dst.type  = OP_MEM;
        dst.mem.seg = ea.mem.seg;
        dst.mem.off = insn_fetch_bytes(ad_bytes);