/*
    ChibiOS/RT - Copyright (C) 2006,2007,2008,2009,2010 Giovanni Di Sirio.

    This file is part of ChibiOS/RT.

    ChibiOS/RT is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 3 of the License, or
    (at your option) any later version.

    ChibiOS/RT is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/

/**
 * @page article_mutual_exclusion Mutual Exclusion guide
 * The most common problem when writing multithreaded code is the
 * synchronization of access to shared resources/services.
 * ChibiOS/RT offers a rich variety of mechanisms that apparently solve the
 * same problem. I wrote apparently because each mechanism has its pros and
 * cons.<br>
 * This article introduces the various mechanisms and then explains the
 * right scenarios for each one.
 *
 * <h2>Basics</h2>
 * Some of the concepts mentioned in this article are described in the
 * following Wikipedia articles:
 * - <a href="http://en.wikipedia.org/wiki/Mutual_exclusion" target="_blank">
 *   Mutual Exclusion</a>
 * - <a href="http://en.wikipedia.org/wiki/Priority_inversion" target="_blank">
 *   Priority Inversion</a>
 * - <a href="http://en.wikipedia.org/wiki/Priority_inheritance" target="_blank">
 *   Priority Inheritance</a>
 * - <a href="http://en.wikipedia.org/wiki/Priority_ceiling_protocol" target="_blank">
 *   Priority Ceiling</a>
 * .
 *
 * <h2>Mutual exclusion by System Locks</h2>
 * This is the lowest level mechanism: the system is locked by invoking the
 * @p chSysLock() API and then unlocked by invoking @p chSysUnlock().<br>
 * The implementation is architecture dependent but it is guaranteed to, at
 * least, disable the interrupt sources with hardware priority below or
 * equal to the kernel level.
 *
 * <h3>Advantages</h3>
 * - It is the lightest in terms of execution time; a lock or unlock often
 *   becomes just a single inlined assembler instruction.
 * - It ensures mutual exclusion not only among threads but also with
 *   interrupt handling code.
 * - The implementation would ensure mutual exclusion even on multicore
 *   architectures where multiple hardware threads are present.
 * .
 *
 * <h3>Disadvantages</h3>
 * - Disabling interrupts for a long period of time can deteriorate the
 *   overall system response time and/or introduce jitter.
 * .
 *
 * <h3>When to use Locks</h3>
 * - When mutual exclusion with interrupt handlers is required.
 * - When the operation within the lock zone is very simple and executes in
 *   a short, bounded time.
 * .
 *
 * <h3>Example</h3>
 * @code
 * ...
 * chSysLock();
 * /* Protected code */
 * chSysUnlock();
 * ...
 * @endcode
 *
 * <h2>Mutual exclusion by Semaphores</h2>
 * In ChibiOS/RT the counting semaphores are mainly meant as a
 * synchronization mechanism between interrupt handlers and high level code
 * running at thread level. Usually a thread waits on a semaphore that is
 * signaled asynchronously by an interrupt handler.<br>
 * The semaphores can, however, be used as simple mutexes by initializing
 * the semaphore counter to one.
 *
 * <h3>Advantages</h3>
 * - The semaphores code is "already there" if you use the I/O queues or
 *   mailboxes and you don't want to also enable the mutexes in order to
 *   save space.
 * - Semaphores are lighter than mutexes because their queues are FIFO
 *   ordered and do not have any overhead caused by the priority inheritance
 *   algorithm.
 * - A semaphore takes less RAM than a mutex (12 vs 16 bytes on 32-bit
 *   architectures).
 * .
 *
 * <h3>Disadvantages</h3>
 * - Semaphore queues are FIFO ordered by default; an option exists to make
 *   them priority ordered, but this can impact I/O performance because
 *   semaphores are used in I/O queues.
 * - Semaphores do not implement the Priority Inheritance algorithm.
 * .
 *
 * <h3>When to use Semaphores</h3>
 * - When you need neither queuing by priority nor the Priority Inheritance
 *   algorithm.
 * - When RAM/ROM space is scarce.
 * .
 *
 * <h3>Example</h3>
 * @code
 * static Semaphore sem;   /* Semaphore declaration */
 * ...
 * chSemInit(&sem, 1);     /* Semaphore initialization before use */
 * ...
 * chSemWait(&sem);
 * /* Protected code */
 * chSemSignal(&sem);
 * ...
 * @endcode
 *
 * <h2>Mutual exclusion by Mutexes</h2>
 * Mutexes are the mechanism intended as the most general solution for
 * mutual exclusion.
 *
 * <h3>Advantages</h3>
 * - Mutexes implement the Priority Inheritance algorithm, an important
 *   tool for reducing jitter and improving overall system response time
 *   (it is not a magic solution, just another tool for the system
 *   designer).
 * .
 *
 * <h3>Disadvantages</h3>
 * - The heaviest among all the possible choices. The Priority Inheritance
 *   method is efficiently implemented, but nothing is more efficient than
 *   no code at all.
 * .
 *
 * <h3>When to use Mutexes</h3>
 * - When you are designing a very complex system with hard realtime
 *   requirements.
 * .
 *
 * <h3>Example</h3>
 * @code
 * static Mutex mtx;   /* Mutex declaration */
 * ...
 * chMtxInit(&mtx);    /* Mutex initialization before use */
 * ...
 * chMtxLock(&mtx);
 * /* Protected code */
 * chMtxUnlock();
 * ...
 * @endcode
 *
 * <h2>Mutual exclusion by priority boost</h2>
 * Another way to implement mutual exclusion is to boost the thread priority
 * to a level higher than that of all the threads competing for a certain
 * resource. This solution effectively implements an Immediate Priority
 * Ceiling algorithm.
 *
 * <h3>Advantages</h3>
 * - Almost free in terms of code size, you need neither semaphores nor
 *   mutexes.
 * - No RAM overhead.
 * - Fast execution, a priority change is a quick operation under
 *   ChibiOS/RT.
 * - The Priority Ceiling protocol can help mitigate potential Priority
 *   Inversion problems.
 * .
 *
 * <h3>Disadvantages</h3>
 * - Makes the design more complicated because priorities must be assigned
 *   not just to threads but also to the resources to be protected.
 * - Locking a resource affects all the threads with lower priority, even
 *   those not interested in the resource.
 * - All the threads that can access the resource must have a lower
 *   priority than the resource itself.
 * - The mechanism is not easy to understand in the code unless it is
 *   clearly documented.
 * - This method does not work on multicore architectures where multiple
 *   hardware threads are present.
 * - Only useful in very simple applications.
 * .
 *
 * <h3>Example</h3>
 * @code
 * /* Priority assigned to the resource, threads must have lower
 *    priority than this.*/
 * #define AAA_RESOURCE_PRIORITY   (NORMALPRIO + 10)
 * ...
 * /* Locks the resource AAA.*/
 * tprio_t aaa_old_prio = chThdSetPriority(AAA_RESOURCE_PRIORITY);
 * /* Accessing resource AAA */
 * chThdSetPriority(aaa_old_prio);   /* Unlocks AAA.*/
 * ...
 * @endcode
 *
 * <h2>Mutual exclusion by message passing</h2>
 * Another method is to make a single dedicated thread execute the critical
 * code and make it work as a message server. The other threads can request
 * the service from the server by sending a properly formatted message and
 * then waiting for the answer with the result.<br>
 * This method is very useful when integrating into the system components
 * not designed to be reentrant or to be executed in a multithreaded
 * environment, for example a 3rd party file system or a networking
 * protocol stack.
 *
 * <h3>Advantages</h3>
 * - It is possible to encapsulate very complex logic without worrying
 *   about concurrent accesses.
 * - If the encapsulated code uses a large stack area, only the server
 *   thread has to allocate enough RAM; the client threads save RAM by just
 *   requesting the service from the server.
 * - Clean system architecture.
 * - This method also implements a form of Priority Ceiling: the ceiling is
 *   the priority of the server thread itself.
 * .
 *
 * <h3>Disadvantages</h3>
 * - More complex implementation, a protocol must be created between the
 *   clients and the server.
 * - Two context switches are required for each request to the server (but
 *   ChibiOS/RT is very efficient at that).
 * - Requires a dedicated thread as server.
 * .
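 *
 * <h3>Example</h3>
 * A minimal sketch of a message server. The names used here are
 * illustrative, and the exact @p chMsg* prototypes vary between ChibiOS/RT
 * versions, so check the messages API reference of your release:
 * @code
 * static msg_t server_thread(void *arg) {
 *
 *   while (TRUE) {
 *     msg_t msg = chMsgWait();      /* Waits for a request.             */
 *     msg = process_request(msg);   /* Executes the critical code.      */
 *     chMsgRelease(msg);            /* Wakes up the client with the
 *                                      answer.                          */
 *   }
 *   return 0;
 * }
 * ...
 * /* Client side: blocks until the server returns the answer.*/
 * msg_t answer = chMsgSend(server_tp, (msg_t)&request);
 * ...
 * @endcode
 */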
/*
 *  yosys -- Yosys Open SYnthesis Suite
 *
 *  Copyright (C) 2019-2020  whitequark <whitequark@whitequark.org>
 *
 *  Permission to use, copy, modify, and/or distribute this software for any
 *  purpose with or without fee is hereby granted.
 *
 *  THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 *  WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 *  MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 *  ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 *  WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 *  ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 *  OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 *
 */

// This file is included by the designs generated with `write_cxxrtl`. It is not used in Yosys itself.

#ifndef CXXRTL_H
#define CXXRTL_H

#include <cstddef>
#include <cstdint>
#include <cassert>
#include <limits>
#include <type_traits>
#include <tuple>
#include <vector>
#include <map>
#include <algorithm>
#include <memory>
#include <sstream>

// The cxxrtl support library implements compile time specialized arbitrary width arithmetics, and also
// provides composite lvalues made out of bit slices and concatenations of lvalues. This allows the
// `write_cxxrtl` pass to perform a straightforward translation of RTLIL structures to readable C++, relying
// on the C++ compiler to unwrap the abstraction and generate efficient code.
namespace cxxrtl {

// All arbitrary-width values in cxxrtl are backed by arrays of unsigned integers called chunks. The chunk size
// is the same regardless of the value width to simplify manipulating values via FFI interfaces, e.g. driving
// and introspecting the simulation in Python.
//
// It is practical to use chunk sizes between 32 bits and platform register size because when arithmetics on
// narrower integer types is legalized by the C++ compiler, it inserts code to clear the high bits of the register.
// However, (a) most of our operations do not change those bits in the first place because of invariants that are
// invisible to the compiler, (b) we often operate on non-power-of-2 values and have to clear the high bits anyway.
// Therefore, using relatively wide chunks and clearing the high bits explicitly and only when we know they may be
// clobbered results in simpler generated code.
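//
// As a concrete illustration (assuming the 32-bit chunks used by value<> below), a value<40> occupies
// two chunks: data[0] holds bits 31..0, data[1] holds bits 39..32 in its low 8 bits, and the upper
// 24 bits of data[1] are kept zero by masking with msb_mask after any operation that could clobber them.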
template<typename T>
struct chunk_traits {
	static_assert(std::is_integral<T>::value && std::is_unsigned<T>::value,
	              "chunk type must be an unsigned integral type");
	using type = T;
	static constexpr size_t bits = std::numeric_limits<T>::digits;
	static constexpr T mask = std::numeric_limits<T>::max();
};

template<class T>
struct expr_base;

template<size_t Bits>
struct value : public expr_base<value<Bits>> {
	static constexpr size_t bits = Bits;

	using chunk = chunk_traits<uint32_t>;
	static constexpr chunk::type msb_mask = (Bits % chunk::bits == 0) ? chunk::mask
		: chunk::mask >> (chunk::bits - (Bits % chunk::bits));

	static constexpr size_t chunks = (Bits + chunk::bits - 1) / chunk::bits;
	chunk::type data[chunks] = {};

	value() = default;
	template<typename... Init>
	explicit constexpr value(Init ...init) : data{init...} {}

	value(const value<Bits> &) = default;
	value(value<Bits> &&) = default;
	value<Bits> &operator=(const value<Bits> &) = default;

	// A (no-op) helper that forces the cast to value<>.
	const value<Bits> &val() const {
		return *this;
	}

	std::string str() const {
		std::stringstream ss;
		ss << *this;
		return ss.str();
	}

	// Operations with compile-time parameters.
	//
	// These operations are used to implement slicing, concatenation, and blitting.
	// The trunc, zext and sext operations add or remove most significant bits (i.e. on the left);
	// the rtrunc and rzext operations add or remove least significant bits (i.e. on the right).
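	//
	// For example, given value<8> v { 0xa5u }:
	//   v.trunc<4>()  == value<4> { 0x5u }     (drops bits 7..4)
	//   v.zext<12>()  == value<12> { 0x0a5u }  (adds zero bits on the left)
	//   v.sext<12>()  == value<12> { 0xfa5u }  (replicates bit 7 on the left)
	//   v.rtrunc<4>() == value<4> { 0xau }     (drops bits 3..0)
	//   v.rzext<12>() == value<12> { 0xa50u }  (adds zero bits on the right)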
	template<size_t NewBits>
	value<NewBits> trunc() const {
		static_assert(NewBits <= Bits, "trunc() may not increase width");
		value<NewBits> result;
		for (size_t n = 0; n < result.chunks; n++)
			result.data[n] = data[n];
		result.data[result.chunks - 1] &= result.msb_mask;
		return result;
	}

	template<size_t NewBits>
	value<NewBits> zext() const {
		static_assert(NewBits >= Bits, "zext() may not decrease width");
		value<NewBits> result;
		for (size_t n = 0; n < chunks; n++)
			result.data[n] = data[n];
		return result;
	}

	template<size_t NewBits>
	value<NewBits> sext() const {
		static_assert(NewBits >= Bits, "sext() may not decrease width");
		value<NewBits> result;
		for (size_t n = 0; n < chunks; n++)
			result.data[n] = data[n];
		if (is_neg()) {
			result.data[chunks - 1] |= ~msb_mask;
			for (size_t n = chunks; n < result.chunks; n++)
				result.data[n] = chunk::mask;
			result.data[result.chunks - 1] &= result.msb_mask;
		}
		return result;
	}

	template<size_t NewBits>
	value<NewBits> rtrunc() const {
		static_assert(NewBits <= Bits, "rtrunc() may not increase width");
		value<NewBits> result;
		constexpr size_t shift_chunks = (Bits - NewBits) / chunk::bits;
		constexpr size_t shift_bits   = (Bits - NewBits) % chunk::bits;
		chunk::type carry = 0;
		if (shift_chunks + result.chunks < chunks) {
			carry = (shift_bits == 0) ? 0
				: data[shift_chunks + result.chunks] << (chunk::bits - shift_bits);
		}
		for (size_t n = result.chunks; n > 0; n--) {
			result.data[n - 1] = carry | (data[shift_chunks + n - 1] >> shift_bits);
			carry = (shift_bits == 0) ? 0
				: data[shift_chunks + n - 1] << (chunk::bits - shift_bits);
		}
		return result;
	}

	template<size_t NewBits>
	value<NewBits> rzext() const {
		static_assert(NewBits >= Bits, "rzext() may not decrease width");
		value<NewBits> result;
		constexpr size_t shift_chunks = (NewBits - Bits) / chunk::bits;
		constexpr size_t shift_bits   = (NewBits - Bits) % chunk::bits;
		chunk::type carry = 0;
		for (size_t n = 0; n < chunks; n++) {
			result.data[shift_chunks + n] = (data[n] << shift_bits) | carry;
			carry = (shift_bits == 0) ? 0
				: data[n] >> (chunk::bits - shift_bits);
		}
		if (carry != 0)
			result.data[result.chunks - 1] = carry;
		return result;
	}

	// Bit blit operation, i.e. a partial read-modify-write.
	template<size_t Stop, size_t Start>
	value<Bits> blit(const value<Stop - Start + 1> &source) const {
		static_assert(Stop >= Start, "blit() may not reverse bit order");
		constexpr chunk::type start_mask = ~(chunk::mask << (Start % chunk::bits));
		constexpr chunk::type stop_mask = (Stop % chunk::bits + 1 == chunk::bits) ? 0
			: (chunk::mask << (Stop % chunk::bits + 1));
		value<Bits> masked = *this;
		if (Start / chunk::bits == Stop / chunk::bits) {
			masked.data[Start / chunk::bits] &= stop_mask | start_mask;
		} else {
			masked.data[Start / chunk::bits] &= start_mask;
			for (size_t n = Start / chunk::bits + 1; n < Stop / chunk::bits; n++)
				masked.data[n] = 0;
			masked.data[Stop / chunk::bits] &= stop_mask;
		}
		value<Bits> shifted = source
			.template rzext<Stop + 1>()
			.template zext<Bits>();
		return masked.bit_or(shifted);
	}

	// Helpers for selecting extending or truncating operation depending on whether the result is wider or narrower
	// than the operand. In C++17 these can be replaced with `if constexpr`.
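	//
	// For illustration only, the C++17 form of zcast() would simply be:
	//
	//   template<size_t NewBits>
	//   value<NewBits> zcast() const {
	//     if constexpr (NewBits < Bits)
	//       return trunc<NewBits>();
	//     else
	//       return zext<NewBits>();
	//   }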
	template<size_t NewBits, typename = void>
	struct zext_cast {
		value<NewBits> operator()(const value<Bits> &val) {
			return val.template zext<NewBits>();
		}
	};

	template<size_t NewBits>
	struct zext_cast<NewBits, typename std::enable_if<(NewBits < Bits)>::type> {
		value<NewBits> operator()(const value<Bits> &val) {
			return val.template trunc<NewBits>();
		}
	};

	template<size_t NewBits, typename = void>
	struct sext_cast {
		value<NewBits> operator()(const value<Bits> &val) {
			return val.template sext<NewBits>();
		}
	};

	template<size_t NewBits>
	struct sext_cast<NewBits, typename std::enable_if<(NewBits < Bits)>::type> {
		value<NewBits> operator()(const value<Bits> &val) {
			return val.template trunc<NewBits>();
		}
	};

	template<size_t NewBits>
	value<NewBits> zcast() const {
		return zext_cast<NewBits>()(*this);
	}

	template<size_t NewBits>
	value<NewBits> scast() const {
		return sext_cast<NewBits>()(*this);
	}

	// Operations with run-time parameters (offsets, amounts, etc).
	//
	// These operations are used for computations.
	bool bit(size_t offset) const {
		return data[offset / chunk::bits] & (chunk::type(1) << (offset % chunk::bits));
	}

	void set_bit(size_t offset, bool value = true) {
		size_t offset_chunks = offset / chunk::bits;
		size_t offset_bits = offset % chunk::bits;
		data[offset_chunks] &= ~(chunk::type(1) << offset_bits);
		data[offset_chunks] |= value ? chunk::type(1) << offset_bits : 0;
	}

	bool is_zero() const {
		for (size_t n = 0; n < chunks; n++)
			if (data[n] != 0)
				return false;
		return true;
	}

	explicit operator bool() const {
		return !is_zero();
	}

	bool is_neg() const {
		return data[chunks - 1] & (chunk::type(1) << ((Bits - 1) % chunk::bits));
	}

	bool operator ==(const value<Bits> &other) const {
		for (size_t n = 0; n < chunks; n++)
			if (data[n] != other.data[n])
				return false;
		return true;
	}

	bool operator !=(const value<Bits> &other) const {
		return !(*this == other);
	}

	value<Bits> bit_not() const {
		value<Bits> result;
		for (size_t n = 0; n < chunks; n++)
			result.data[n] = ~data[n];
		result.data[chunks - 1] &= msb_mask;
		return result;
	}

	value<Bits> bit_and(const value<Bits> &other) const {
		value<Bits> result;
		for (size_t n = 0; n < chunks; n++)
			result.data[n] = data[n] & other.data[n];
		return result;
	}

	value<Bits> bit_or(const value<Bits> &other) const {
		value<Bits> result;
		for (size_t n = 0; n < chunks; n++)
			result.data[n] = data[n] | other.data[n];
		return result;
	}

	value<Bits> bit_xor(const value<Bits> &other) const {
		value<Bits> result;
		for (size_t n = 0; n < chunks; n++)
			result.data[n] = data[n] ^ other.data[n];
		return result;
	}

	value<Bits> update(const value<Bits> &val, const value<Bits> &mask) const {
		return bit_and(mask.bit_not()).bit_or(val.bit_and(mask));
	}

	template<size_t AmountBits>
	value<Bits> shl(const value<AmountBits> &amount) const {
		// Ensure our early return is correct by prohibiting values larger than 4 Gbit.
		static_assert(Bits <= chunk::mask, "shl() of unreasonably large values is not supported");
		// Detect shifts definitely larger than Bits early.
		for (size_t n = 1; n < amount.chunks; n++)
			if (amount.data[n] != 0)
				return {};
		// Past this point we can use the least significant chunk as the shift size.
		size_t shift_chunks = amount.data[0] / chunk::bits;
		size_t shift_bits   = amount.data[0] % chunk::bits;
		if (shift_chunks >= chunks)
			return {};
		value<Bits> result;
		chunk::type carry = 0;
		for (size_t n = 0; n < chunks - shift_chunks; n++) {
			result.data[shift_chunks + n] = (data[n] << shift_bits) | carry;
			carry = (shift_bits == 0) ? 0
				: data[n] >> (chunk::bits - shift_bits);
		}
		return result;
	}

	template<size_t AmountBits, bool Signed = false>
	value<Bits> shr(const value<AmountBits> &amount) const {
		// Ensure our early return is correct by prohibiting values larger than 4 Gbit.
		static_assert(Bits <= chunk::mask, "shr() of unreasonably large values is not supported");
		// Detect shifts definitely larger than Bits early.
		for (size_t n = 1; n < amount.chunks; n++)
			if (amount.data[n] != 0)
				return {};
		// Past this point we can use the least significant chunk as the shift size.
		size_t shift_chunks = amount.data[0] / chunk::bits;
		size_t shift_bits   = amount.data[0] % chunk::bits;
		if (shift_chunks >= chunks)
			return {};
		value<Bits> result;
		chunk::type carry = 0;
		for (size_t n = 0; n < chunks - shift_chunks; n++) {
			result.data[chunks - shift_chunks - 1 - n] = carry | (data[chunks - 1 - n] >> shift_bits);
			carry = (shift_bits == 0) ? 0
				: data[chunks - 1 - n] << (chunk::bits - shift_bits);
		}
		if (Signed && is_neg()) {
			for (size_t n = chunks - shift_chunks; n < chunks; n++)
				result.data[n] = chunk::mask;
			if (shift_bits != 0)
				result.data[chunks - shift_chunks] |= chunk::mask << (chunk::bits - shift_bits);
		}
		return result;
	}

	template<size_t AmountBits>
	value<Bits> sshr(const value<AmountBits> &amount) const {
		return shr<AmountBits, /*Signed=*/true>(amount);
	}

	size_t ctpop() const {
		size_t count = 0;
		for (size_t n = 0; n < chunks; n++) {
			// This loop implements the population count idiom as recognized by LLVM and GCC.
			for (chunk::type x = data[n]; x != 0; count++)
				x = x & (x - 1);
		}
		return count;
	}

	size_t ctlz() const {
		size_t count = 0;
		for (size_t n = 0; n < chunks; n++) {
			chunk::type x = data[chunks - 1 - n];
			// The most significant chunk contains Bits % chunk::bits valid bits, unless Bits is
			// a multiple of the chunk size, in which case it contains a full chunk.
			size_t chunk_bits = (n == 0 && Bits % chunk::bits != 0) ? Bits % chunk::bits : chunk::bits;
			if (x == 0) {
				count += chunk_bits;
			} else {
				// Count the bits above the most significant set bit, then stop. The inner loop
				// implements the find first set idiom as recognized by LLVM.
				size_t width = 0;
				for (; x != 0; width++)
					x >>= 1;
				count += chunk_bits - width;
				break;
			}
		}
		return count;
	}

	template<bool Invert, bool CarryIn>
	std::pair<value<Bits>, bool /*CarryOut*/> alu(const value<Bits> &other) const {
		value<Bits> result;
		bool carry = CarryIn;
		for (size_t n = 0; n < result.chunks; n++) {
			result.data[n] = data[n] + (Invert ? ~other.data[n] : other.data[n]) + carry;
			carry = (result.data[n] <  data[n]) ||
			        (result.data[n] == data[n] && carry);
		}
		result.data[result.chunks - 1] &= result.msb_mask;
		return {result, carry};
	}

	value<Bits> add(const value<Bits> &other) const {
		return alu</*Invert=*/false, /*CarryIn=*/false>(other).first;
	}

	value<Bits> sub(const value<Bits> &other) const {
		return alu</*Invert=*/true, /*CarryIn=*/true>(other).first;
	}

	value<Bits> neg() const {
		return value<Bits> { 0u }.sub(*this);
	}
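
	// Note that arithmetic wraps modulo 2**Bits: for example, value<4>{0xau}.add(value<4>{0x9u})
	// equals value<4>{0x3u}, and value<4>{0x1u}.neg() equals value<4>{0xfu}.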

	bool ucmp(const value<Bits> &other) const {
		bool carry;
		std::tie(std::ignore, carry) = alu</*Invert=*/true, /*CarryIn=*/true>(other);
		return !carry; // a.ucmp(b) ≡ a u< b
	}

	bool scmp(const value<Bits> &other) const {
		value<Bits> result;
		bool carry;
		std::tie(result, carry) = alu</*Invert=*/true, /*CarryIn=*/true>(other);
		bool overflow = (is_neg() == !other.is_neg()) && (is_neg() != result.is_neg());
		return result.is_neg() ^ overflow; // a.scmp(b) ≡ a s< b
	}
};

// Expression template for a slice, usable as lvalue or rvalue, and composable with other expression templates here.
template<class T, size_t Stop, size_t Start>
struct slice_expr : public expr_base<slice_expr<T, Stop, Start>> {
	static_assert(Stop >= Start, "slice_expr() may not reverse bit order");
	static_assert(Start < T::bits && Stop < T::bits, "slice_expr() must be within bounds");
	static constexpr size_t bits = Stop - Start + 1;

	T &expr;

	slice_expr(T &expr) : expr(expr) {}
	slice_expr(const slice_expr<T, Stop, Start> &) = delete;

	operator value<bits>() const {
		return static_cast<const value<T::bits> &>(expr)
			.template rtrunc<T::bits - Start>()
			.template trunc<bits>();
	}

	slice_expr<T, Stop, Start> &operator=(const value<bits> &rhs) {
		// Generic partial assignment implemented using a read-modify-write operation on the sliced expression.
		expr = static_cast<const value<T::bits> &>(expr)
			.template blit<Stop, Start>(rhs);
		return *this;
	}

	// A helper that forces the cast to value<>, which allows deduction to work.
	value<bits> val() const {
		return static_cast<const value<bits> &>(*this);
	}
};

// Expression template for a concatenation, usable as lvalue or rvalue, and composable with other expression templates here.
template<class T, class U>
struct concat_expr : public expr_base<concat_expr<T, U>> {
	static constexpr size_t bits = T::bits + U::bits;

	T &ms_expr;
	U &ls_expr;

	concat_expr(T &ms_expr, U &ls_expr) : ms_expr(ms_expr), ls_expr(ls_expr) {}
	concat_expr(const concat_expr<T, U> &) = delete;

	operator value<bits>() const {
		value<bits> ms_shifted = static_cast<const value<T::bits> &>(ms_expr)
			.template rzext<bits>();
		value<bits> ls_extended = static_cast<const value<U::bits> &>(ls_expr)
			.template zext<bits>();
		return ms_shifted.bit_or(ls_extended);
	}

	concat_expr<T, U> &operator=(const value<bits> &rhs) {
		ms_expr = rhs.template rtrunc<T::bits>();
		ls_expr = rhs.template trunc<U::bits>();
		return *this;
	}

	// A helper that forces the cast to value<>, which allows deduction to work.
	value<bits> val() const {
		return static_cast<const value<bits> &>(*this);
	}
};

// Base class for expression templates, providing helper methods for operations that are valid on both rvalues and lvalues.
//
// Note that expression objects (slices and concatenations) constructed in this way should NEVER be captured because
// they refer to temporaries that will, in general, only live until the end of the statement. For example, both of
// these snippets perform use-after-free:
//
//    const auto &a = val.slice<7,0>().slice<1>();
//    value<1> b = a;
//
//    auto &&c = val.slice<7,0>().slice<1>();
//    c = value<1>{1u};
//
// An easy way to write code using slices and concatenations safely is to follow two simple rules:
//   * Never explicitly name any type except `value<W>` or `const value<W> &`.
//   * Never use a `const auto &` or `auto &&` in any such expression.
// Then, any code that compiles will be well-defined.
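//
// For example, these uses are safe because no slice or concatenation object outlives its statement:
//
//    value<8> v;
//    v.slice<7,4>() = value<4>{0xau};     // partial write through an lvalue slice
//    value<4> lo = v.slice<3,0>().val();  // read, immediately forced to value<4>
//    value<16> w = v.concat(v).val();     // concatenation, immediately forced to value<16>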
template<class T>
struct expr_base {
	template<size_t Stop, size_t Start = Stop>
	slice_expr<const T, Stop, Start> slice() const {
		return {*static_cast<const T *>(this)};
	}

	template<size_t Stop, size_t Start = Stop>
	slice_expr<T, Stop, Start> slice() {
		return {*static_cast<T *>(this)};
	}

	template<class U>
	concat_expr<const T, typename std::remove_reference<const U>::type> concat(const U &other) const {
		return {*static_cast<const T *>(this), other};
	}

	template<class U>
	concat_expr<T, typename std::remove_reference<U>::type> concat(U &&other) {
		return {*static_cast<T *>(this), other};
	}
};

template<size_t Bits>
std::ostream &operator<<(std::ostream &os, const value<Bits> &val) {
	auto old_flags = os.flags(std::ios::right);
	auto old_width = os.width(0);
	auto old_fill  = os.fill('0');
	os << val.bits << '\'' << std::hex;
	for (size_t n = val.chunks - 1; n != (size_t)-1; n--) {
		if (n == val.chunks - 1 && Bits % value<Bits>::chunk::bits != 0)
			os.width((Bits % value<Bits>::chunk::bits + 3) / 4);
		else
			os.width((value<Bits>::chunk::bits + 3) / 4);
		os << val.data[n];
	}
	os.fill(old_fill);
	os.width(old_width);
	os.flags(old_flags);
	return os;
}
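
// For example, value<12>{0xabcu} is printed as "12'abc": the bit width, a quote, then the contents
// in hexadecimal, with each chunk zero-padded to its own width.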

template<size_t Bits>
struct wire {
	static constexpr size_t bits = Bits;

	value<Bits> curr;
	value<Bits> next;

	wire() = default;
	constexpr wire(const value<Bits> &init) : curr(init), next(init) {}
	template<typename... Init>
	explicit constexpr wire(Init ...init) : curr{init...}, next{init...} {}

	wire(const wire<Bits> &) = delete;
	wire(wire<Bits> &&) = default;
	wire<Bits> &operator=(const wire<Bits> &) = delete;

	bool commit() {
		if (curr != next) {
			curr = next;
			return true;
		}
		return false;
	}
};

template<size_t Bits>
std::ostream &operator<<(std::ostream &os, const wire<Bits> &val) {
	os << val.curr;
	return os;
}

template<size_t Width>
struct memory {
	std::vector<value<Width>> data;

	size_t depth() const {
		return data.size();
	}

	memory() = delete;
	explicit memory(size_t depth) : data(depth) {}

	memory(const memory<Width> &) = delete;
	memory<Width> &operator=(const memory<Width> &) = delete;

	// The only way to get the compiler to put the initializer in .rodata and not copy it onto the stack is
	// to stuff it into a plain array. You'd think an std::initializer_list would work here, but it doesn't,
	// because you can't construct an initializer_list in a constexpr (or something) and so if you try to do
	// that the whole thing is first copied on the stack (probably overflowing it) and then again into `data`.
	template<size_t Size>
	struct init {
		size_t offset;
		value<Width> data[Size];
	};

	template<size_t... InitSize>
	explicit memory(size_t depth, const init<InitSize> &...init) : data(depth) {
		// This utterly reprehensible construct is the most reasonable way to apply a function to every element
		// of a parameter pack, if the elements all have different types and so cannot be cast to an initializer list.
		auto _ = {std::move(std::begin(init.data), std::end(init.data), data.begin() + init.offset)...};
	}
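
	// For example, a 4-word memory with words 2 and 3 preinitialized:
	//
	//   memory<8> mem(4, memory<8>::init<2> { /*offset=*/2, {
	//     value<8>{0xaau}, value<8>{0x55u},
	//   }});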

	// An operator for direct memory reads. May be used at any time during the simulation.
	const value<Width> &operator [](size_t index) const {
		assert(index < data.size());
		return data[index];
	}

	// An operator for direct memory writes. May only be used before the simulation is started. If used
	// after the simulation is started, the design may malfunction.
	value<Width> &operator [](size_t index) {
		assert(index < data.size());
		return data[index];
	}

	// A simple way to make a writable memory would be to use an array of wires instead of an array of values.
	// However, there are two significant downsides to this approach: first, it has large overhead (2× space
	// overhead, and O(depth) time overhead during commit); second, it does not simplify handling write port
	// priorities. Although in principle write ports could be ordered or conditionally enabled in generated
	// code based on their priorities and selected addresses, the feedback arc set problem is computationally
	// expensive, and the heuristic based algorithms are not easily modified to guarantee (rather than prefer)
	// a particular write port evaluation order.
	//
	// The approach used here instead is to queue writes into a buffer during the eval phase, then perform
	// the writes during the commit phase in priority order. This approach has low overhead, with both space
	// and time proportional to the number of write ports. Because virtually every memory in a practical design
	// has at most two write ports, linear search is used on every write, being the fastest and simplest approach.
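	//
	// For example, two writes to the same address queued during the same eval are resolved by
	// priority at commit time:
	//
	//   mem.update(0, value<8>{0x11u}, value<8>{0xffu}, /*priority=*/1);
	//   mem.update(0, value<8>{0x22u}, value<8>{0xffu}, /*priority=*/0);
	//   mem.commit(); // mem[0] == value<8>{0x11u}; the higher priority write is applied last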
	struct write {
		size_t index;
		value<Width> val;
		value<Width> mask;
		int priority;
	};
	std::vector<write> write_queue;

	void update(size_t index, const value<Width> &val, const value<Width> &mask, int priority = 0) {
		assert(index < data.size());
		// Queue up the write while keeping the queue sorted by priority.
		write_queue.insert(
			std::upper_bound(write_queue.begin(), write_queue.end(), priority,
				[](const int a, const write& b) { return a < b.priority; }),
			write { index, val, mask, priority });
	}

	bool commit() {
		bool changed = false;
		for (const write &entry : write_queue) {
			value<Width> elem = data[entry.index];
			elem = elem.update(entry.val, entry.mask);
			changed |= (data[entry.index] != elem);
			data[entry.index] = elem;
		}
		write_queue.clear();
		return changed;
	}
};

struct metadata {
	const enum {
		MISSING = 0,
		UINT    = 1,
		SINT    = 2,
		STRING  = 3,
		DOUBLE  = 4,
	} value_type;

	// In debug mode, using the wrong .as_*() function will assert.
	// In release mode, using the wrong .as_*() function will safely return a default value.
	union {
		const unsigned  uint_value = 0;
		const signed    sint_value;
	};
	const std::string string_value = "";
	const double      double_value = 0.0;

	metadata() : value_type(MISSING) {}
	metadata(unsigned value) : value_type(UINT), uint_value(value) {}
	metadata(signed value) : value_type(SINT), sint_value(value) {}
	metadata(const std::string &value) : value_type(STRING), string_value(value) {}
	metadata(const char *value) : value_type(STRING), string_value(value) {}
	metadata(double value) : value_type(DOUBLE), double_value(value) {}

	metadata(const metadata &) = default;
	metadata &operator=(const metadata &) = delete;

	unsigned as_uint() const {
		assert(value_type == UINT);
		return uint_value;
	}

	signed as_sint() const {
		assert(value_type == SINT);
		return sint_value;
	}

	const std::string &as_string() const {
		assert(value_type == STRING);
		return string_value;
	}

	double as_double() const {
		assert(value_type == DOUBLE);
		return double_value;
	}
};

typedef std::map<std::string, metadata> metadata_map;

struct module {
	module() {}
	virtual ~module() {}

	module(const module &) = delete;
	module &operator=(const module &) = delete;

	virtual bool eval() = 0;
	virtual bool commit() = 0;

	size_t step() {
		size_t deltas = 0;
		bool converged = false;
		do {
			converged = eval();
			deltas++;
		} while (commit() && !converged);
		return deltas;
	}
};
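
// For a design generated with `write_cxxrtl`, a typical testbench toggles the clock and calls step()
// on every edge. The names below (`cxxrtl_design::p_top`, `p_clk`) are illustrative only; the actual
// class and member names depend on the RTLIL module being translated:
//
//   cxxrtl_design::p_top top;
//   for (int cycle = 0; cycle < 1000; cycle++) {
//     top.p_clk = value<1>{0u}; top.step();
//     top.p_clk = value<1>{1u}; top.step();
//   }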

} // namespace cxxrtl

// Definitions of internal Yosys cells. Other than the functions in this namespace, cxxrtl is fully generic
// and independent of Yosys implementation details.
//
// The `write_cxxrtl` pass translates internal cells (cells with names that start with `$`) to calls of these
// functions. All of Yosys arithmetic and logical cells perform sign or zero extension on their operands,
// whereas basic operations on arbitrary width values require operands to be of the same width. These functions
// bridge the gap by performing the necessary casts. They are named similarly to `cell_A[B]`, where A and B are `u`
// if the corresponding operand is unsigned, and `s` if it is signed.
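//
// For example, a $and cell with two unsigned operands and a 16-bit result is translated to a call
// like and_uu<16>(a, b), which zero-extends (or truncates) both operands to the result width first.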
namespace cxxrtl_yosys {

using namespace cxxrtl;

// std::max isn't constexpr until C++14 for no particular reason (it's an oversight), so we define our own.
template<class T>
constexpr T max(const T &a, const T &b) {
	return a > b ? a : b;
}

// Logic operations
template<size_t BitsY, size_t BitsA>
value<BitsY> not_u(const value<BitsA> &a) {
	return a.template zcast<BitsY>().bit_not();
}

template<size_t BitsY, size_t BitsA>
value<BitsY> not_s(const value<BitsA> &a) {
	return a.template scast<BitsY>().bit_not();
}

template<size_t BitsY, size_t BitsA>
value<BitsY> logic_not_u(const value<BitsA> &a) {
	return value<BitsY> { a ? 0u : 1u };
}

template<size_t BitsY, size_t BitsA>
value<BitsY> logic_not_s(const value<BitsA> &a) {
	return value<BitsY> { a ? 0u : 1u };
}

template<size_t BitsY, size_t BitsA>
value<BitsY> reduce_and_u(const value<BitsA> &a) {
	return value<BitsY> { a.bit_not().is_zero() ? 1u : 0u };
}

template<size_t BitsY, size_t BitsA>
value<BitsY> reduce_and_s(const value<BitsA> &a) {
	return value<BitsY> { a.bit_not().is_zero() ? 1u : 0u };
}

template<size_t BitsY, size_t BitsA>
value<BitsY> reduce_or_u(const value<BitsA> &a) {
	return value<BitsY> { a ? 1u : 0u };
}

template<size_t BitsY, size_t BitsA>
value<BitsY> reduce_or_s(const value<BitsA> &a) {
	return value<BitsY> { a ? 1u : 0u };
}

template<size_t BitsY, size_t BitsA>
value<BitsY> reduce_xor_u(const value<BitsA> &a) {
	return value<BitsY> { (a.ctpop() % 2) ? 1u : 0u };
}

template<size_t BitsY, size_t BitsA>
value<BitsY> reduce_xor_s(const value<BitsA> &a) {
	return value<BitsY> { (a.ctpop() % 2) ? 1u : 0u };
}

template<size_t BitsY, size_t BitsA>
value<BitsY> reduce_xnor_u(const value<BitsA> &a) {
	return value<BitsY> { (a.ctpop() % 2) ? 0u : 1u };
}

template<size_t BitsY, size_t BitsA>
value<BitsY> reduce_xnor_s(const value<BitsA> &a) {
	return value<BitsY> { (a.ctpop() % 2) ? 0u : 1u };
}

template<size_t BitsY, size_t BitsA>
value<BitsY> reduce_bool_u(const value<BitsA> &a) {
	return value<BitsY> { a ? 1u : 0u };
}

template<size_t BitsY, size_t BitsA>
value<BitsY> reduce_bool_s(const value<BitsA> &a) {
	return value<BitsY> { a ? 1u : 0u };
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> and_uu(const value<BitsA> &a, const value<BitsB> &b) {
	return a.template zcast<BitsY>().bit_and(b.template zcast<BitsY>());
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> and_ss(const value<BitsA> &a, const value<BitsB> &b) {
	return a.template scast<BitsY>().bit_and(b.template scast<BitsY>());
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> or_uu(const value<BitsA> &a, const value<BitsB> &b) {
	return a.template zcast<BitsY>().bit_or(b.template zcast<BitsY>());
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> or_ss(const value<BitsA> &a, const value<BitsB> &b) {
	return a.template scast<BitsY>().bit_or(b.template scast<BitsY>());
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> xor_uu(const value<BitsA> &a, const value<BitsB> &b) {
	return a.template zcast<BitsY>().bit_xor(b.template zcast<BitsY>());
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> xor_ss(const value<BitsA> &a, const value<BitsB> &b) {
	return a.template scast<BitsY>().bit_xor(b.template scast<BitsY>());
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> xnor_uu(const value<BitsA> &a, const value<BitsB> &b) {
	return a.template zcast<BitsY>().bit_xor(b.template zcast<BitsY>()).bit_not();
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> xnor_ss(const value<BitsA> &a, const value<BitsB> &b) {
	return a.template scast<BitsY>().bit_xor(b.template scast<BitsY>()).bit_not();
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> logic_and_uu(const value<BitsA> &a, const value<BitsB> &b) {
	return value<BitsY> { (bool(a) & bool(b)) ? 1u : 0u };
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> logic_and_ss(const value<BitsA> &a, const value<BitsB> &b) {
	return value<BitsY> { (bool(a) & bool(b)) ? 1u : 0u };
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> logic_or_uu(const value<BitsA> &a, const value<BitsB> &b) {
	return value<BitsY> { (bool(a) | bool(b)) ? 1u : 0u };
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> logic_or_ss(const value<BitsA> &a, const value<BitsB> &b) {
	return value<BitsY> { (bool(a) | bool(b)) ? 1u : 0u };
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> shl_uu(const value<BitsA> &a, const value<BitsB> &b) {
	return a.template zcast<BitsY>().template shl(b);
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> shl_su(const value<BitsA> &a, const value<BitsB> &b) {
	return a.template scast<BitsY>().template shl(b);
}

template<size_t BitsY, size_t BitsA, size_t BitsB>
value<BitsY> sshl_uu(const value<BitsA> &a, const value<BitsB> &b) {
	return a.template zcast<BitsY>().template shl(b);