author Alan Mishchenko <alanmi@berkeley.edu> 2012-01-13 20:58:28 -0800
committer Alan Mishchenko <alanmi@berkeley.edu> 2012-01-13 20:58:28 -0800
commit b7ba9aa8dcdfee0c5fa42aec0385b83d2371da39
tree 2b5f55f03b0c4f701b824679e54524e4675b4211 /src/python
parent 37b8a190baa91c69dcbd4300f03e209e19fb5b9b
New hierarchy manager.
/*
 * McKinley-optimized version of copy_page().
 *
 * Copyright (C) 2002 Hewlett-Packard Co
 *	David Mosberger <davidm@hpl.hp.com>
 *
 * Inputs:
 *	in0:	address of target page
 *	in1:	address of source page
 * Output:
 *	no return value
 *
 * General idea:
 *	- use regular loads and stores to prefetch data, to avoid consuming an M-slot just for
 *	  lfetches => good for in-cache performance
 *	- avoid L2 bank conflicts by not storing into the same 16-byte bank within a single
 *	  cycle
 *
 * Principle of operation:
 *	First, note that L1 has a line-size of 64 bytes and L2 a line-size of 128 bytes.
 *	To avoid secondary misses in L2, we prefetch both source and destination with a line-size
 *	of 128 bytes.  When both of these lines are in the L2 and the first half of the
 *	source line is in L1, we start copying the remaining words.  The second half of the
 *	source line is prefetched in an earlier iteration, so that by the time we start
 *	accessing it, it's also present in the L1.
 *
 *	We use a software-pipelined loop to control the overall operation.  The pipeline
 *	has 2*PREFETCH_DIST+K stages.  The first PREFETCH_DIST stages are used for prefetching
 *	source cache-lines.  The second PREFETCH_DIST stages are used for prefetching destination
 *	cache-lines; the last K stages are used to copy the cache-line words not copied by
 *	the prefetches.  The four relevant points in the pipeline are called A, B, C, and D:
 *	p[A] is TRUE if a source-line should be prefetched, p[B] is TRUE if a destination-line
 *	should be prefetched, p[C] is TRUE if the second half of an L2 line should be brought
 *	into L1D and p[D] is TRUE if a cacheline needs to be copied.
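 *
 *	As a rough sketch only (pseudocode, not the actual loop body), one pass through the
 *	loop can be pictured as below.  The stage numbers for A and B are an assumption
 *	inferred from the description above; the positions of C and D within the last K
 *	stages are not stated here and are left unspecified.  Because the predicates rotate,
 *	each step acts on the line that entered the pipeline that many stages earlier:
 *
 *		if (p[A]) prefetch a source line		// assumed stage 0
 *		if (p[B]) prefetch a destination line		// assumed stage PREFETCH_DIST
 *		if (p[C]) bring the second half of the source L2 line into L1D
 *		if (p[D]) copy the words of the line not already copied by the prefetches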
 *
 *	This all sounds very complicated, but thanks to the modulo-scheduled loop support,
 *	the resulting code is very regular and quite easy to follow (once you get the idea).