From f5e49ef598f46257cc783e52ef4223a3461f1d84 Mon Sep 17 00:00:00 2001
From: Aldo Cortesi
Date: Thu, 3 Jan 2013 17:26:59 +1300
Subject: First draft of "How mitmproxy works", a complete guide to the mechanics of the proxy process

---
 doc-src/howmitmproxy.html | 341 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 341 insertions(+)
 create mode 100644 doc-src/howmitmproxy.html

diff --git a/doc-src/howmitmproxy.html b/doc-src/howmitmproxy.html
new file mode 100644
index 00000000..6ea723cd
--- /dev/null
+++ b/doc-src/howmitmproxy.html
@@ -0,0 +1,341 @@
+
+TODO:
+
+- Clarify terminology: SSL vs TLS
+
+Mitmproxy is an enormously flexible tool. Knowing exactly how the proxying
+process works will help you deploy it more creatively, and let you understand
+its fundamental assumptions and how to work around them. This document explains
+mitmproxy's proxy mechanism by example, starting with the simplest explicit
+proxy configuration, and working up to the most complicated interaction -
+transparent proxying of SSL-protected traffic in the presence of SNI.
+
+Configuring the client to use mitmproxy as an explicit proxy is the simplest
+and most reliable way to intercept traffic. The proxy protocol is codified in
+the [HTTP RFC](http://www.ietf.org/rfc/rfc2068.txt), so the behaviour of both
+the client and the server is well defined, and usually reliable. In the
+simplest possible interaction with mitmproxy, a client connects directly to the
+proxy, and makes a request that looks like this:
+
+GET http://example.com/index.html HTTP/1.1
+
+This is a proxy GET request - an extended form of the vanilla HTTP GET request
+that includes a scheme and host specification, and it includes all the
+information mitmproxy needs to proceed.
+
+1. The client connects to the proxy and makes a request.
+2. Mitmproxy connects to the upstream server and simply forwards
+   the request on.
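
The two steps above can be sketched in a few lines of Python. This is only an illustration of how a proxy might read a proxy GET request line, not mitmproxy's actual parser: the function name `split_proxy_request` is hypothetical, and the default-port rule (80 for http, 443 for https) is an assumption the text does not spell out.

```python
from urllib.parse import urlsplit


def split_proxy_request(line):
    """Split a proxy-style request line into the parts a proxy needs.

    Returns (method, host, port, path, version). The name and return
    shape are hypothetical, chosen for illustration only.
    """
    method, target, version = line.split()
    url = urlsplit(target)
    if not url.scheme or not url.hostname:
        # A vanilla (non-proxy) request carries only a path here.
        raise ValueError("not a proxy request line: %r" % line)
    # Assumption: fall back to the scheme's conventional port.
    port = url.port or (443 if url.scheme == "https" else 80)
    # The path (plus any query string) is what would be sent upstream.
    path = url.path or "/"
    if url.query:
        path += "?" + url.query
    return method, url.hostname, port, path, version


result = split_proxy_request("GET http://example.com/index.html HTTP/1.1")
print(result)
```

With the host and port in hand, the proxy knows where to connect, and the remaining path is all the upstream server needs to see.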
+
+The process for an explicitly proxied HTTPS connection is quite different. The
+client connects to the proxy and makes a request that looks like this:
+
+CONNECT example.com:443 HTTP/1.1
+
+A conventional proxy can neither view nor manipulate an SSL-encrypted data
+stream, so a CONNECT request simply asks the proxy to open a pipe between the
+client and server. The proxy here is just a facilitator - it blindly forwards
+data in both directions without knowing anything about the contents. The
+negotiation of the SSL connection happens over this pipe, and the subsequent
+flow of requests and responses is completely opaque to the proxy.
+
+## The MITM in mitmproxy
+
+This is where mitmproxy's fundamental trick comes into play. The MITM in its
+name stands for Man-In-The-Middle - a reference to the process we use to
+intercept and interfere with these theoretically opaque data streams. The basic
+idea is to pretend to be the server to the client, and pretend to be the client
+to the server. The tricky part is that the Certificate Authority system is
+designed to prevent exactly this attack, by allowing a trusted third party to
+cryptographically sign a server's SSL certificates to verify that the certs are
+legit. If this signature is from a non-trusted party, a secure client will
+simply drop the connection and refuse to proceed. Despite the many shortcomings
+of the CA system as it exists today, this is usually fatal to attempts to MITM
+an SSL connection for analysis.
+
+Our answer to this conundrum is to become a trusted Certificate Authority
+ourselves. Mitmproxy includes a full CA implementation that generates
+interception certificates on the fly. To get the client to trust these
+certificates, we register mitmproxy as a CA with the device manually.
+
+## Complication 1: What's the remote hostname?
+
+To proceed with this plan, we need to know the domain name to use in the
+interception certificate - the client will verify that the certificate is for
+the domain it's connecting to, and abort if this is not the case. At first
+blush, it seems that the CONNECT request above gives us all we need - in this
+example, both the CONNECT target and the domain the client expects in the
+certificate are "example.com". But what if the client had initiated the
+connection as follows:
+
+CONNECT 10.1.1.1:443 HTTP/1.1
+
+Using the IP address is perfectly legitimate because it gives us enough
+information to initiate the pipe, even though it doesn't reveal the remote
+hostname.
+
+Mitmproxy has a cunning mechanism that smooths this over - upstream certificate
+sniffing. As soon as we see the CONNECT request, we pause the client part of
+the conversation, and initiate a simultaneous connection to the server. We
+complete the SSL handshake with the server, and inspect the certificates it
+used. Now, we use the Common Name in the upstream SSL certificates to generate
+the dummy certificate for the client. Voila, we have the correct hostname to
+present to the client, even if it was never specified.
+
+## Complication 2: Subject Alternative Name
+
+Enter the next complication. Sometimes, the certificate Common Name is not, in
+fact, the hostname that the client is connecting to. This is because of the
+optional Subject Alternative Name field in the SSL certificate that allows an
+arbitrary number of alternate domains to be specified. If the expected domain
+matches any of these, the client will proceed, even though the domain doesn't
+match the certificate Common Name. The answer here is simple: when we extract
+the CN from the upstream cert, we also extract the SANs, and add them to the
+generated dummy certificate.
+
+## Complication 3: Server Name Indication
+
+One of the big limitations of conventional SSL is that each certificate
+requires its own IP address. This means that you can't do virtual hosting
+where multiple domains with independent certificates share the same IP address.
+In a world with a rapidly shrinking IPv4 address pool this is a problem, and we
+have a solution in the form of the Server Name Indication extension to the SSL
+and TLS protocols. This lets the client specify the remote server name at the
+start of the SSL handshake, which then lets the server select the right
+certificate to complete the process.
+
+SNI breaks our upstream certificate sniffing process, because when we connect
+without using SNI, we get served a default certificate that may have nothing to
+do with the certificate expected by the client. The solution is another tricky
+complication to the client connection process. After the client connects, we
+allow the SSL handshake to continue until just _after_ the SNI value has been
+passed to us. Now we can pause the conversation, and initiate an upstream
+connection using the correct SNI value, which then serves us the correct
+upstream certificate, from which we can extract the expected CN and SANs.
+
+## Putting it all together
+
+Let's put all of this together into the complete explicitly proxied HTTPS flow.
+
+1. The client makes a connection to mitmproxy, and issues an HTTP
+   CONNECT request.
+2. Mitmproxy responds with a 200 Connection Established, as if it
+   has set up the CONNECT pipe.
+3. The client believes it's talking to the remote server, and
+   initiates the SSL connection. It uses SNI to indicate the hostname
+   it is connecting to.
+4. Mitmproxy connects to the server, and establishes an SSL
+   connection using the SNI hostname indicated by the client.
+5. The server responds with the matching SSL certificate, which
+   contains the CN and SAN values needed to generate the interception
+   certificate.
+6. Mitmproxy generates the interception cert, and continues the
+   client SSL handshake paused in step 3.
+7. The client sends the request over the established SSL
+   connection.
+8. Mitmproxy passes the request on to the server over the SSL
+   connection initiated in step 4.
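
Steps 4 and 5 of this flow can be approximated with Python's standard `ssl` module. The sketch below is an illustration under stated assumptions rather than mitmproxy's own code: the function names `cn_and_sans` and `sniff_upstream_cert` are hypothetical, and the default context used here verifies the upstream certificate, which the text above does not require.

```python
import socket
import ssl


def cn_and_sans(peercert):
    """Extract the Common Name and DNS SANs from an ssl getpeercert() dict."""
    cn = None
    # "subject" is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in peercert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                cn = value
    sans = [value for kind, value in peercert.get("subjectAltName", ())
            if kind == "DNS"]
    return cn, sans


def sniff_upstream_cert(address, port, sni=None, timeout=10):
    """Connect upstream using the client's SNI value and return (cn, sans).

    Hypothetical helper: it needs network access and is not run here.
    """
    context = ssl.create_default_context()  # assumption: verify upstream
    with socket.create_connection((address, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=sni or address) as tls:
            return cn_and_sans(tls.getpeercert())


# A hand-written dict in the shape returned by SSLSocket.getpeercert().
sample = {
    "subject": ((("commonName", "example.com"),),),
    "subjectAltName": (("DNS", "example.com"), ("DNS", "www.example.com")),
}
cn, sans = cn_and_sans(sample)
print(cn, sans)
```

The CN and SAN values returned here are the inputs the interception certificate in step 6 would be built from.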
+
+When a transparent proxy is used, the HTTP/S connection is redirected into a
+proxy at the network layer, without any client configuration being required.
+This makes transparent proxying ideal for those situations where you can't
+change client behaviour - proxy-oblivious Android applications being a common
+example.
+
+To achieve this, we need to introduce two extra components. The first new
+component is a router that transparently redirects the TCP connection to the
+proxy. Once the client has initiated the connection, it makes a vanilla HTTP
+request, which might look something like this:
+
+GET /index.html HTTP/1.1
+
+Note that this request differs from the explicit proxy variation, in that it
+omits the scheme and hostname. How, then, do we know which upstream host to
+forward the request to? The routing mechanism that has performed the
+redirection keeps track of the original destination. Each different routing
+mechanism has its own idiosyncratic way of exposing this data, so this
+introduces the second component required for working transparent proxying: a
+host module that knows how to retrieve the original destination address from
+the router. Once we have this information, the process is fairly
+straightforward.
+
+1. The client makes a connection to the server.
+2. The router redirects the connection to mitmproxy, which is
+   typically listening on a local port of the same host. Mitmproxy
+   then consults the routing mechanism to establish what the original
+   destination was.
+3. Now, we simply read the client's request...
+4. ... and forward it upstream.
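
As one concrete example of step 2, the sketch below shows how the original destination can be read back on Linux when the redirection is done with an iptables REDIRECT rule. That setup is an assumption made for illustration, since the text above leaves the routing mechanism open and other platforms expose this data differently. The helper names are hypothetical; `SO_ORIGINAL_DST` is the socket option defined in `linux/netfilter_ipv4.h`.

```python
import socket
import struct

SO_ORIGINAL_DST = 80  # value from linux/netfilter_ipv4.h


def parse_sockaddr_in(raw):
    """Decode a 16-byte IPv4 sockaddr_in into (address, port).

    Layout: 2 bytes family (skipped), 2 bytes port in network order,
    4 bytes address, 8 bytes padding.
    """
    port, packed_ip = struct.unpack("!2xH4s8x", raw)
    return socket.inet_ntoa(packed_ip), port


def original_destination(client_socket):
    """Ask the kernel where a redirected connection was originally headed.

    Linux only, and only meaningful for connections redirected by
    netfilter; not run in this example.
    """
    raw = client_socket.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)


# Build a sockaddr_in by hand to show the decoding without a live socket.
sample = (struct.pack("=H", socket.AF_INET) + struct.pack("!H", 443)
          + socket.inet_aton("10.1.1.1") + bytes(8))
print(parse_sockaddr_in(sample))
```

Keeping the decoding separate from the `getsockopt` call means the platform-specific part stays small, which fits the idea of a per-platform host module.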
+
+The process for transparently proxying an HTTPS request is a merger of the
+methods we've outlined for transparently proxying HTTP, and explicitly proxying
+HTTPS. We use the routing mechanism to establish the upstream server address,
+and then proceed as for explicit HTTPS connections to establish the CN and SANs,
+and cope with SNI.
+
+1. The client makes a connection to the server.
+2. The router redirects the connection to mitmproxy, which is
+   typically listening on a local port of the same host. Mitmproxy
+   then consults the routing mechanism to establish what the original
+   destination was.
+3. The client believes it's talking to the remote server, and
+   initiates the SSL connection. It uses SNI to indicate the hostname
+   it is connecting to.
+4. Mitmproxy connects to the server, and establishes an SSL
+   connection using the SNI hostname indicated by the client.
+5. The server responds with the matching SSL certificate, which
+   contains the CN and SAN values needed to generate the interception
+   certificate.
+6. Mitmproxy generates the interception cert, and continues the
+   client SSL handshake paused in step 3.
+7. The client sends the request over the established SSL
+   connection.
+8. Mitmproxy passes the request on to the server over the SSL
+   connection initiated in step 4.