author Aldo Cortesi <aldo@nullcube.com> 2013-05-14 22:44:11 +1200
committer Aldo Cortesi <aldo@nullcube.com> 2013-05-14 22:44:11 +1200
commit 36b07264f0292660c8369a8416ee45a5f95f9b06
tree b6d446ff5ea967153dab62624446862cb694ee8f
parent b5cf3b4f743f1dd3e7d58c9d21155005466640ec
Mods to "How mitmproxy works"
Diffstat (limited to 'doc-src/howmitmproxy.html')
-rw-r--r-- doc-src/howmitmproxy.html 91
1 files changed, 74 insertions, 17 deletions
diff --git a/doc-src/howmitmproxy.html b/doc-src/howmitmproxy.html
index a95bdac6..09a69ec2 100644
--- a/doc-src/howmitmproxy.html
+++ b/doc-src/howmitmproxy.html
@@ -1,7 +1,8 @@
+
Mitmproxy is an enormously flexible tool. Knowing exactly how the proxying
-process works will help you deploy it creatively, and allow you to understand
-its fundamental assumptions and how to work around them. This document explains
+process works will help you deploy it creatively, and take into account its
+fundamental assumptions and how to work around them. This document explains
mitmproxy's proxy mechanism in detail, starting with the simplest unencrypted
explicit proxying, and working up to the most complicated interaction -
transparent proxying of SSL-protected traffic[^ssl] in the presence of
@@ -67,17 +68,47 @@ flow of requests and responses are completely opaque to the proxy.
## The MITM in mitmproxy
-This is where mitmproxy's fundamental trick comes into play. The MITM in its name stands for Man-In-The-Middle - a reference to the process we use to intercept and interfere with these theoretially opaque data streams. The basic idea is to pretend to be the server to the client, and pretend to be the client to the server, while we sit in the middle decoding traffic from both sides. The tricky part is that the [Certificate Authority](http://en.wikipedia.org/wiki/Certificate_authority) system is designed to prevent exactly this attack, by allowing a trusted third-party to cryptographically sign a server's SSL certificates to verify that they are legit. If this signature doesn't match or is from a non-trusted party, a secure client will simply drop the connection and refuse to proceed. Despite the many shortcomings of the CA system as it exists today, this is usually fatal to attempts to MITM an SSL connection for analysis. Our answer to this conundrum is to become a trusted Certificate Authority ourselves. Mitmproxy includes a full CA implementation that generates interception certificates on the fly. To get the client to trust these certificates, we [register mitmproxy as a trusted CA with the device manually](@!urlTo("ssl.html")!@).
+This is where mitmproxy's fundamental trick comes into play. The MITM in its
+name stands for Man-In-The-Middle - a reference to the process we use to
+intercept and interfere with these theoretically opaque data streams. The basic
+idea is to pretend to be the server to the client, and pretend to be the client
+to the server, while we sit in the middle decoding traffic from both sides. The
+tricky part is that the [Certificate
+Authority](http://en.wikipedia.org/wiki/Certificate_authority) system is
+designed to prevent exactly this attack, by allowing a trusted third-party to
+cryptographically sign a server's SSL certificates to verify that they are
+legit. If this signature doesn't match or is from a non-trusted party, a secure
+client will simply drop the connection and refuse to proceed. Despite the many
+shortcomings of the CA system as it exists today, this is usually fatal to
+attempts to MITM an SSL connection for analysis. Our answer to this conundrum
+is to become a trusted Certificate Authority ourselves. Mitmproxy includes a
+full CA implementation that generates interception certificates on the fly. To
+get the client to trust these certificates, we [register mitmproxy as a trusted
+CA with the device manually](@!urlTo("ssl.html")!@).
## Complication 1: What's the remote hostname?
-To proceed with this plan, we need to know the domain name to use in the interception certificate - the client will verify that the certificate is for the domain it's connecting to, and abort if this is not the case. At first blush, it seems that the CONNECT request above gives us all we need - in this example, both of these values are "example.com". But what if the client had initiated the connection as follows:
+To proceed with this plan, we need to know the domain name to use in the
+interception certificate - the client will verify that the certificate is for
+the domain it's connecting to, and abort if this is not the case. At first
+blush, it seems that the CONNECT request above gives us all we need - in this
+example, both of these values are "example.com". But what if the client had
+initiated the connection as follows:
<pre>CONNECT 10.1.1.1:443 HTTP/1.1</pre>
-Using the IP address is perfectly legitimate because it gives us enough information to initiate the pipe, even though it doesn't reveal the remote hostname.
+Using the IP address is perfectly legitimate because it gives us enough
+information to initiate the pipe, even though it doesn't reveal the remote
+hostname.
-Mitmproxy has a cunning mechanism that smooths this over - [upstream certificate sniffing](@!urlTo("features/upstreamcerts.html")!@). As soon as we see the CONNECT request, we pause the client part of the conversation, and initiate a simultaneous connection to the server. We complete the SSL handshake with the server, and inspect the certificates it used. Now, we use the Common Name in the upstream SSL certificates to generate the dummy certificate for the client. Voila, we have the correct hostname to present to the client, even if it was never specified.
+Mitmproxy has a cunning mechanism that smooths this over - [upstream
+certificate sniffing](@!urlTo("features/upstreamcerts.html")!@). As soon as we
+see the CONNECT request, we pause the client part of the conversation, and
+initiate a simultaneous connection to the server. We complete the SSL handshake
+with the server, and inspect the certificates it used. Now, we use the Common
+Name in the upstream SSL certificates to generate the dummy certificate for the
+client. Voila, we have the correct hostname to present to the client, even if
+it was never specified.
## Complication 2: Subject Alternative Name
@@ -87,7 +118,7 @@ fact, the hostname that the client is connecting to. This is because of the
optional [Subject Alternative
Name](http://en.wikipedia.org/wiki/SubjectAltName) field in the SSL certificate
that allows an arbitrary number of alternative domains to be specified. If the
-expected domain matches any of these, the client wil proceed, even though the
+expected domain matches any of these, the client will proceed, even though the
domain doesn't match the certificate Common Name. The answer here is simple:
when extract the CN from the upstream cert, we also extract the SANs, and add
them to the generated dummy certificate.
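
Reviewer note: the CN-plus-SAN copying described in this hunk can be sketched as below. This is a minimal illustration, not mitmproxy's implementation. The function names `dummy_cert_names` and `client_accepts` are hypothetical, and the client check is deliberately simplified to exact, case-insensitive matching (real clients also handle wildcards and may ignore the CN when SANs are present).

```python
def dummy_cert_names(upstream_cn, upstream_sans):
    """Return the (cn, sans) pair to place in the generated dummy certificate.

    Hypothetical helper: the CN is copied unchanged and every upstream SAN is
    carried over, with case-insensitive duplicates dropped and order preserved.
    """
    seen = set()
    sans = []
    for name in upstream_sans:
        key = name.lower()
        if key not in seen:
            seen.add(key)
            sans.append(name)
    return upstream_cn, sans


def client_accepts(hostname, cn, sans):
    """Simplified stand-in for a client's hostname check (no wildcard support)."""
    wanted = hostname.lower()
    return wanted == cn.lower() or any(wanted == s.lower() for s in sans)
```

With this in place, a client connecting to a host that appears only among the upstream SANs would still accept the dummy certificate, which is the behaviour the paragraph describes.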
@@ -95,11 +126,33 @@ them to the generated dummy certificate.
## Complication 3: Server Name Indication
-One of the big limitations of vanilla SSL is that each certificate requires its own IP address. This means that you couldn't do virtual hosting where multiple domains with independent certificates share the same IP address. In a world with a rapidly shrinking IPv4 address pool this is a problem, and we have a solution in the form of the [Server Name Indication](http://en.wikipedia.org/wiki/Server_Name_Indication) extension to the SSL and TLS protocols. This lets the client specify the remote server name at the start of the SSL handshake, which then lets the server select the right certificate to complete the process.
-
-SNI breaks our upstream certificate sniffing process, because when we connect without using SNI, we get served a default certificate that may have nothing to do with the certificate expected by the client. The solution is another tricky complication to the client connection process. After the client connects, we allow the SSL handshake to continue until just _after_ the SNI value has been passed to us. Now we can pause the conversation, and initiate an upstream connection using the correct SNI value, which then serves us the correct upstream certificate, from which we can extract the expected CN and SANs.
-
-There's another wrinkle here. Due to a limitation of the SSL library mitmproxy uses, we can't detect that a connection _hasn't_ sent an SNI request until it's too late for upstream certificate sniffing. In practice, we therefore make a vanilla SSL connection upstream to sniff non-SNI certificates, and then discard the connection if the client sends an SNI notification. If you're watching your traffic with a packet sniffer, you'll see two connections to the server when an SNI request is made, the first of which is immediately closed after the SSL handshake. Luckily, this is almost never an issue in practice.
+One of the big limitations of vanilla SSL is that each certificate requires its
+own IP address. This means that you couldn't do virtual hosting where multiple
+domains with independent certificates share the same IP address. In a world
+with a rapidly shrinking IPv4 address pool this is a problem, and we have a
+solution in the form of the [Server Name
+Indication](http://en.wikipedia.org/wiki/Server_Name_Indication) extension to
+the SSL and TLS protocols. This lets the client specify the remote server name
+at the start of the SSL handshake, which then lets the server select the right
+certificate to complete the process.
+
+SNI breaks our upstream certificate sniffing process, because when we connect
+without using SNI, we get served a default certificate that may have nothing to
+do with the certificate expected by the client. The solution is another tricky
+complication to the client connection process. After the client connects, we
+allow the SSL handshake to continue until just _after_ the SNI value has been
+passed to us. Now we can pause the conversation, and initiate an upstream
+connection using the correct SNI value, which then serves us the correct
+upstream certificate, from which we can extract the expected CN and SANs.
+
+There's another wrinkle here. Due to a limitation of the SSL library mitmproxy
+uses, we can't detect that a connection _hasn't_ sent an SNI request until it's
+too late for upstream certificate sniffing. In practice, we therefore make a
+vanilla SSL connection upstream to sniff non-SNI certificates, and then discard
+the connection if the client sends an SNI notification. If you're watching your
+traffic with a packet sniffer, you'll see two connections to the server when an
+SNI request is made, the first of which is immediately closed after the SSL
+handshake. Luckily, this is almost never an issue in practice.
## Putting it all together
@@ -233,11 +286,15 @@ this information, the process is fairly straight-forward.
<h1>Transparent HTTPS</h1>
</div>
-The process for transparently proxying an HTTPS request is a merger of the
-methods we've outlined for transparently proxying HTTP, and explicitly proxying
-HTTPS. We use the routing mechanism to establish the upstream server address,
-and then proceed as for explit HTTPS connections to establish the CN and SANs,
-and cope with SNI.
+The first step is to determine whether we should treat an incoming connection
+as HTTPS. The mechanism for doing this is simple - we use the routing mechanism
+to find out what the original destination port is. By default, we treat all
+traffic destined for ports 443 and 8443 as SSL.
+
+From here, the process is a merger of the methods we've described for
+transparently proxying HTTP, and explicitly proxying HTTPS. We use the routing
+mechanism to establish the upstream server address, and then proceed as for
+explicit HTTPS connections to establish the CN and SANs, and cope with SNI.
<img src="transparent_https.png"/>
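
Reviewer note: the port-based decision added in this hunk amounts to a set membership test. The sketch below assumes the original destination port has already been recovered from the routing mechanism. The names `SSL_PORTS` and `treat_as_ssl` are hypothetical and are not taken from the mitmproxy source.

```python
# Default ports treated as SSL, as stated in the new text (443 and 8443).
SSL_PORTS = frozenset({443, 8443})


def treat_as_ssl(original_dst_port, ssl_ports=SSL_PORTS):
    """Decide whether a transparently redirected connection should be handled as HTTPS.

    original_dst_port is the port the client was trying to reach before the
    connection was redirected to the proxy.
    """
    return original_dst_port in ssl_ports
```

Passing a different `ssl_ports` set models overriding the default, which the phrase "by default" in the text suggests is possible.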