author: Aldo Cortesi <aldo@nullcube.com>, 2014-01-27 14:16:23 +1300
committer: Aldo Cortesi <aldo@nullcube.com>, 2014-01-27 14:16:23 +1300
commit: f47d89ff4e710f6d2df755fe526e91a5cf236cfa
tree: 61c35a397e569619ae7022fb006279d86a0d88eb /doc-src/howmitmproxy.html
parent: 47c7e37723b416605f232cd85818f41d865a9f85
Revert "Move the doc tree out into its own repo."
This reverts commit 8f88fcedd601c0033b4469b66626a83011879baf.
Diffstat (limited to 'doc-src/howmitmproxy.html')
-rw-r--r-- doc-src/howmitmproxy.html 360
1 files changed, 360 insertions, 0 deletions
diff --git a/doc-src/howmitmproxy.html b/doc-src/howmitmproxy.html
new file mode 100644
index 00000000..94afd522
--- /dev/null
+++ b/doc-src/howmitmproxy.html
@@ -0,0 +1,360 @@
+
+
+Mitmproxy is an enormously flexible tool. Knowing exactly how the proxying
+process works will help you deploy it creatively, and take into account its
+fundamental assumptions and how to work around them. This document explains
+mitmproxy's proxy mechanism in detail, starting with the simplest unencrypted
+explicit proxying, and working up to the most complicated interaction -
+transparent proxying of SSL-protected traffic[^ssl] in the presence of
+[SNI](http://en.wikipedia.org/wiki/Server_Name_Indication).
+
+
+<div class="page-header">
+ <h1>Explicit HTTP</h1>
+</div>
+
+Configuring the client to use mitmproxy as an explicit proxy is the simplest
+and most reliable way to intercept traffic. The proxy protocol is codified in
+the [HTTP RFC](http://www.ietf.org/rfc/rfc2068.txt), so the behaviour of both
+the client and the server is well defined, and usually reliable. In the
+simplest possible interaction with mitmproxy, a client connects directly to the
+proxy, and makes a request that looks like this:
+
+<pre>GET http://example.com/index.html HTTP/1.1</pre>
+
+This is a proxy GET request - an extended form of the vanilla HTTP GET request
+that adds a scheme and host specification, giving mitmproxy all the
+information it needs to proceed.
+
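As a rough illustration (not mitmproxy's own parser), the proxy-form request line above can be split with Python's standard `urllib.parse`. The function name `parse_proxy_request_line` and the layout of the returned dictionary are assumptions made for this sketch.

```python
from urllib.parse import urlsplit


def parse_proxy_request_line(line):
    """Split a proxy-form request line into the parts a proxy needs."""
    method, target, version = line.split()
    url = urlsplit(target)
    # A proxy request carries a full URL; a vanilla request carries only a path.
    if not url.scheme or not url.hostname:
        raise ValueError("not a proxy-form request: %r" % line)
    default_port = 443 if url.scheme == "https" else 80
    path = url.path or "/"
    if url.query:
        path += "?" + url.query
    return {
        "method": method,
        "scheme": url.scheme,
        "host": url.hostname,
        "port": url.port or default_port,
        "path": path,
        "version": version,
    }


info = parse_proxy_request_line("GET http://example.com/index.html HTTP/1.1")
print(info["host"], info["port"], info["path"])
```

A request line without a scheme and host, such as `GET /index.html HTTP/1.1`, is rejected by this sketch with a `ValueError`, since it does not say where the request should go.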
+<img src="explicit.png"/>
+
+<table class="table">
+ <tbody>
+ <tr>
+
+ <td><b>1</b></td>
+
+ <td>The client connects to the proxy and makes a request.</td>
+
+ </tr>
+
+ <tr>
+
+ <td><b>2</b></td>
+
+ <td>Mitmproxy connects to the upstream server and simply forwards
+ the request on.</td>
+
+ </tr>
+ </tbody>
+</table>
+
+
+<div class="page-header">
+ <h1>Explicit HTTPS</h1>
+</div>
+
+The process for an explicitly proxied HTTPS connection is quite different. The
+client connects to the proxy and makes a request that looks like this:
+
+<pre>CONNECT example.com:443 HTTP/1.1</pre>
+
+A conventional proxy can neither view nor manipulate an SSL-encrypted data
+stream, so a CONNECT request simply asks the proxy to open a pipe between the
+client and server. The proxy here is just a facilitator - it blindly forwards
+data in both directions without knowing anything about the contents. The
+negotiation of the SSL connection happens over this pipe, and the subsequent
+flow of requests and responses is completely opaque to the proxy.
+
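The only information a proxy gets from a CONNECT request is the target host and port. A minimal sketch of pulling those out of the request line follows; the name `parse_connect_line` is hypothetical and this is not mitmproxy's actual code.

```python
def parse_connect_line(line):
    """Return (host, port) from a CONNECT request line."""
    method, authority, version = line.split()
    if method != "CONNECT":
        raise ValueError("not a CONNECT request: %r" % line)
    # Split on the last colon so bracketed IPv6 literals keep their colons.
    host, sep, port = authority.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("CONNECT target must be host:port: %r" % authority)
    return host.strip("[]"), int(port)


print(parse_connect_line("CONNECT example.com:443 HTTP/1.1"))
```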
+## The MITM in mitmproxy
+
+This is where mitmproxy's fundamental trick comes into play. The MITM in its
+name stands for Man-In-The-Middle - a reference to the process we use to
+intercept and interfere with these theoretically opaque data streams. The basic
+idea is to pretend to be the server to the client, and pretend to be the client
+to the server, while we sit in the middle decoding traffic from both sides. The
+tricky part is that the [Certificate
+Authority](http://en.wikipedia.org/wiki/Certificate_authority) system is
+designed to prevent exactly this attack, by allowing a trusted third party to
+cryptographically sign a server's SSL certificates to verify that they are
+legitimate. If this signature doesn't match or is from a non-trusted party, a secure
+client will simply drop the connection and refuse to proceed. Despite the many
+shortcomings of the CA system as it exists today, this is usually fatal to
+attempts to MITM an SSL connection for analysis. Our answer to this conundrum
+is to become a trusted Certificate Authority ourselves. Mitmproxy includes a
+full CA implementation that generates interception certificates on the fly. To
+get the client to trust these certificates, we [register mitmproxy as a trusted
+CA with the device manually](@!urlTo("ssl.html")!@).
+
+## Complication 1: What's the remote hostname?
+
+To proceed with this plan, we need to know the domain name to use in the
+interception certificate - the client will verify that the certificate is for
+the domain it's connecting to, and abort if this is not the case. At first
+blush, it seems that the CONNECT request above gives us all we need - in this
+example, the CONNECT target and the domain the client expects in the
+certificate are both "example.com". But what if the client had initiated the
+connection as follows:
+
+<pre>CONNECT 10.1.1.1:443 HTTP/1.1</pre>
+
+Using the IP address is perfectly legitimate because it gives us enough
+information to initiate the pipe, even though it doesn't reveal the remote
+hostname.
+
+Mitmproxy has a cunning mechanism that smooths this over - [upstream
+certificate sniffing](@!urlTo("features/upstreamcerts.html")!@). As soon as we
+see the CONNECT request, we pause the client part of the conversation, and
+initiate a simultaneous connection to the server. We complete the SSL handshake
+with the server, and inspect the certificates it used. Now, we use the Common
+Name in the upstream SSL certificates to generate the dummy certificate for the
+client. Voila, we have the correct hostname to present to the client, even if
+it was never specified.
+
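To make the problem concrete, here is a small sketch that tells whether a CONNECT target is a bare IP address and therefore says nothing about the certificate name. It uses the standard `ipaddress` module; the function name `reveals_hostname` is an assumption for illustration only.

```python
import ipaddress


def reveals_hostname(connect_host):
    """Return True if the CONNECT host is a name rather than an IP literal."""
    try:
        ipaddress.ip_address(connect_host)
    except ValueError:
        # Not parseable as IPv4 or IPv6, so treat it as a hostname.
        return True
    return False


for host in ("example.com", "10.1.1.1"):
    print(host, reveals_hostname(host))
```

Even when the target is a name, the upstream certificate may be issued for a different one, which is why sniffing the upstream certificate is useful in both cases.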
+
+## Complication 2: Subject Alternative Name
+
+Enter the next complication. Sometimes, the certificate Common Name is not, in
+fact, the hostname that the client is connecting to. This is because of the
+optional [Subject Alternative
+Name](http://en.wikipedia.org/wiki/SubjectAltName) field in the SSL certificate
+that allows an arbitrary number of alternative domains to be specified. If the
+expected domain matches any of these, the client will proceed, even though the
+domain doesn't match the certificate Common Name. The answer here is simple:
+when we extract the CN from the upstream cert, we also extract the SANs, and add
+them to the generated dummy certificate.
+
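The extraction step can be sketched against the dictionary format that Python's `ssl.SSLSocket.getpeercert()` returns for a validated certificate. This is an illustration only: the function name `names_for_dummy_cert` and the sample certificate data are made up, and mitmproxy's own certificate handling is not shown here.

```python
def names_for_dummy_cert(peercert):
    """Return (common_name, dns_sans) from a getpeercert()-style dict."""
    common_name = None
    # "subject" is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in peercert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                common_name = value
    # "subjectAltName" is a tuple of (kind, value) pairs; keep DNS names only.
    sans = [value for kind, value in peercert.get("subjectAltName", ())
            if kind == "DNS"]
    return common_name, sans


# Hypothetical upstream certificate, trimmed to the fields used above.
upstream = {
    "subject": ((("countryName", "US"),), (("commonName", "example.com"),)),
    "subjectAltName": (("DNS", "example.com"), ("DNS", "www.example.com")),
}
print(names_for_dummy_cert(upstream))
```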
+
+## Complication 3: Server Name Indication
+
+One of the big limitations of vanilla SSL is that each certificate requires its
+own IP address. This means that you couldn't do virtual hosting where multiple
+domains with independent certificates share the same IP address. In a world
+with a rapidly shrinking IPv4 address pool this is a problem, and we have a
+solution in the form of the [Server Name
+Indication](http://en.wikipedia.org/wiki/Server_Name_Indication) extension to
+the SSL and TLS protocols. This lets the client specify the remote server name
+at the start of the SSL handshake, which then lets the server select the right
+certificate to complete the process.
+
+SNI breaks our upstream certificate sniffing process, because when we connect
+without using SNI, we get served a default certificate that may have nothing to
+do with the certificate expected by the client. The solution is another tricky
+complication to the client connection process. After the client connects, we
+allow the SSL handshake to continue until just _after_ the SNI value has been
+passed to us. Now we can pause the conversation, and initiate an upstream
+connection using the correct SNI value, which then serves us the correct
+upstream certificate, from which we can extract the expected CN and SANs.
+
+There's another wrinkle here. Due to a limitation of the SSL library mitmproxy
+uses, we can't detect that a connection _hasn't_ sent an SNI request until it's
+too late for upstream certificate sniffing. In practice, we therefore make a
+vanilla SSL connection upstream to sniff non-SNI certificates, and then discard
+the connection if the client sends an SNI notification. If you're watching your
+traffic with a packet sniffer, you'll see two connections to the server when an
+SNI request is made, the first of which is immediately closed after the SSL
+handshake. Luckily, this is almost never an issue in practice.
+
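For readers curious where the SNI value lives on the wire, the following sketch reads it out of a raw ClientHello. It is not how mitmproxy obtains the value (that goes through its SSL library); it assumes the whole ClientHello arrived in a single TLS record, and `build_client_hello` is a hypothetical helper that only exists to produce sample input.

```python
import struct


def extract_sni(record):
    """Return the SNI host name from a raw ClientHello record, or None."""
    if len(record) < 9 or record[0] != 0x16 or record[5] != 0x01:
        return None                      # not a handshake record / ClientHello
    pos = 5 + 4 + 2 + 32                 # record hdr, handshake hdr, version, random
    try:
        pos += 1 + record[pos]           # session id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len            # cipher suites
        pos += 1 + record[pos]           # compression methods
        if pos >= len(record):
            return None                  # no extensions at all, so no SNI
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:            # server_name extension
                # list length (2), name type (1), name length (2), name
                if record[pos + 2] != 0:
                    return None
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


def build_client_hello(server_name=None):
    """Hypothetical helper: build a minimal ClientHello for demonstration."""
    extensions = b""
    if server_name is not None:
        name = server_name.encode("ascii")
        entry = b"\x00" + struct.pack("!H", len(name)) + name
        sni = struct.pack("!H", len(entry)) + entry
        extensions = struct.pack("!HH", 0, len(sni)) + sni
    body = (b"\x03\x03" + b"\x00" * 32   # version, random
            + b"\x00"                    # empty session id
            + b"\x00\x02\x00\x2f"        # one cipher suite
            + b"\x01\x00")               # null compression
    if extensions:
        body += struct.pack("!H", len(extensions)) + extensions
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


print(extract_sni(build_client_hello("example.com")))
print(extract_sni(build_client_hello()))
```

A client that sends no SNI simply produces `None` here, which corresponds to the case where the vanilla upstream connection is the one that gets used.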
+## Putting it all together
+
+Let's put all of this together into the complete explicitly proxied HTTPS flow.
+
+<img src="explicit_https.png"/>
+
+<table class="table">
+ <tbody>
+ <tr>
+ <td><b>1</b></td>
+ <td>The client makes a connection to mitmproxy, and issues an HTTP
+ CONNECT request.</td>
+ </tr>
+ <tr>
+ <td><b>2</b></td>
+
+ <td>Mitmproxy responds with a 200 Connection Established, as if it
+ has set up the CONNECT pipe.</td>
+ </tr>
+ <tr>
+ <td><b>3</b></td>
+
+ <td>The client believes it's talking to the remote server, and
+ initiates the SSL connection. It uses SNI to indicate the hostname
+ it is connecting to.</td>
+ </tr>
+
+ <tr>
+ <td><b>4</b></td>
+
+ <td>Mitmproxy connects to the server, and establishes an SSL
+ connection using the SNI hostname indicated by the client.</td>
+
+ </tr>
+ <tr>
+ <td><b>5</b></td>
+
+ <td>The server responds with the matching SSL certificate, which
+ contains the CN and SAN values needed to generate the interception
+ certificate.</td>
+ </tr>
+ <tr>
+ <td><b>6</b></td>
+
+ <td>Mitmproxy generates the interception cert, and continues the
+ client SSL handshake paused in step 3.</td>
+ </tr>
+ <tr>
+ <td><b>7</b></td>
+
+ <td>The client sends the request over the established SSL
+ connection.</td>
+ </tr>
+ <tr>
+ <td><b>8</b></td>
+
+ <td>Mitmproxy passes the request on to the server over the SSL
+ connection initiated in step 4.</td>
+ </tr>
+ </tbody>
+</table>
+
+
+<div class="page-header">
+ <h1>Transparent HTTP</h1>
+</div>
+
+When a transparent proxy is used, the HTTP/S connection is redirected into a
+proxy at the network layer, without any client configuration being required.
+This makes transparent proxying ideal for those situations where you can't
+change client behaviour - proxy-oblivious Android applications being a common
+example.
+
+To achieve this, we need to introduce two extra components. The first is a
+redirection mechanism that transparently reroutes a TCP connection destined for
+a server on the Internet to a listening proxy server. This usually takes the
+form of a firewall on the same host as the proxy server -
+[iptables](http://www.netfilter.org/) on Linux or
+[pf](http://en.wikipedia.org/wiki/PF_\(firewall\)) on OS X. Once the client has
+initiated the connection, it makes a vanilla HTTP request, which might look
+something like this:
+
+<pre>GET /index.html HTTP/1.1</pre>
+
+Note that this request differs from the explicit proxy variation, in that it
+omits the scheme and hostname. How, then, do we know which upstream host to
+forward the request to? The routing mechanism that has performed the
+redirection keeps track of the original destination for us. Each routing
+mechanism has a different way of exposing this data, so this introduces the
+second component required for working transparent proxying: a host module that
+knows how to retrieve the original destination address from the router. In
+mitmproxy, this takes the form of a built-in set of
+[modules](https://github.com/mitmproxy/mitmproxy/tree/master/libmproxy/platform)
+that know how to talk to each platform's redirection mechanism. Once we have
+this information, the process is fairly straightforward.
+
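On Linux with iptables, the original destination of a redirected connection can be read back with the `SO_ORIGINAL_DST` socket option. The sketch below shows the general idea under that assumption; the function names are hypothetical, and other platforms (pf, for example) need a different lookup that is not shown.

```python
import socket
import struct

# Option number from linux/netfilter_ipv4.h; the socket module does not export it.
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(raw):
    """Decode a 16-byte struct sockaddr_in into (ip, port)."""
    # Layout: 2 bytes family (skipped), 2 bytes port, 4 bytes address,
    # 8 bytes padding. Port and address are in network byte order.
    port, packed_ip = struct.unpack("!2xH4s8x", raw)
    return socket.inet_ntoa(packed_ip), port


def original_destination(client_sock):
    """Ask netfilter where a redirected IPv4 connection was headed (Linux only)."""
    raw = client_sock.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)


# Demonstrate the decoding step on a hand-built address, no redirect needed.
sample = struct.pack("!HH4s8x", socket.AF_INET, 80, socket.inet_aton("192.0.2.10"))
print(parse_sockaddr_in(sample))
```

`original_destination` only returns something meaningful for a socket accepted from a connection that the firewall actually redirected.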
+<img src="transparent.png"/>
+
+
+<table class="table">
+ <tbody>
+ <tr>
+ <td><b>1</b></td>
+ <td>The client makes a connection to the server.</td>
+ </tr>
+ <tr>
+ <td><b>2</b></td>
+
+ <td>The router redirects the connection to mitmproxy, which is
+ typically listening on a local port of the same host. Mitmproxy
+ then consults the routing mechanism to establish what the original
+ destination was.</td>
+ </tr>
+ <tr>
+ <td><b>3</b></td>
+
+ <td>Now, we simply read the client's request...</td>
+ </tr>
+
+ <tr>
+ <td><b>4</b></td>
+
+ <td>... and forward it upstream.</td>
+
+ </tr>
+ </tbody>
+</table>
+
+<div class="page-header">
+ <h1>Transparent HTTPS</h1>
+</div>
+
+The first step is to determine whether we should treat an incoming connection
+as HTTPS. The mechanism for doing this is simple - we use the routing mechanism
+to find out what the original destination port is. By default, we treat all
+traffic destined for ports 443 and 8443 as SSL.
+
+From here, the process is a merger of the methods we've described for
+transparently proxying HTTP, and explicitly proxying HTTPS. We use the routing
+mechanism to establish the upstream server address, and then proceed as for
+explicit HTTPS connections to establish the CN and SANs, and cope with SNI.
+
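The port check that starts this process amounts to a set lookup. In this sketch the names `SSL_PORTS` and `treat_as_ssl` are assumptions for illustration; only the default ports 443 and 8443 come from the description above.

```python
# Default destination ports treated as SSL, per the text above.
SSL_PORTS = frozenset({443, 8443})


def treat_as_ssl(original_dst_port, ssl_ports=SSL_PORTS):
    """Decide from the original destination port whether to expect SSL."""
    return original_dst_port in ssl_ports


for port in (80, 443, 8443):
    print(port, treat_as_ssl(port))
```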
+<img src="transparent_https.png"/>
+
+
+<table class="table">
+ <tbody>
+ <tr>
+ <td><b>1</b></td>
+ <td>The client makes a connection to the server.</td>
+ </tr>
+ <tr>
+ <td><b>2</b></td>
+
+ <td>The router redirects the connection to mitmproxy, which is
+ typically listening on a local port of the same host. Mitmproxy
+ then consults the routing mechanism to establish what the original
+ destination was.</td>
+ </tr>
+ <tr>
+ <td><b>3</b></td>
+
+ <td>The client believes it's talking to the remote server, and
+ initiates the SSL connection. It uses SNI to indicate the hostname
+ it is connecting to.</td>
+ </tr>
+
+ <tr>
+ <td><b>4</b></td>
+
+ <td>Mitmproxy connects to the server, and establishes an SSL
+ connection using the SNI hostname indicated by the client.</td>
+
+ </tr>
+ <tr>
+ <td><b>5</b></td>
+
+ <td>The server responds with the matching SSL certificate, which
+ contains the CN and SAN values needed to generate the interception
+ certificate.</td>
+ </tr>
+ <tr>
+ <td><b>6</b></td>
+
+ <td>Mitmproxy generates the interception cert, and continues the
+ client SSL handshake paused in step 3.</td>
+ </tr>
+ <tr>
+ <td><b>7</b></td>
+
+ <td>The client sends the request over the established SSL
+ connection.</td>
+ </tr>
+ <tr>
+ <td><b>8</b></td>
+
+ <td>Mitmproxy passes the request on to the server over the SSL
+ connection initiated in step 4.</td>
+ </tr>
+ </tbody>
+</table>
+
+
+[^ssl]: I use "SSL" to refer to both SSL and TLS in the generic sense, unless otherwise specified.