From 2e325c38031bc88568dc065821722dd3e22259cb Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Fri, 10 Jan 2020 16:38:52 -0400 Subject: Initial commit of cloud_mdir_sync I have been using for a few months now with no ill effects. Signed-off-by: Jason Gunthorpe --- README.md | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 166 insertions(+) create mode 100644 README.md (limited to 'README.md') diff --git a/README.md b/README.md new file mode 100644 index 0000000..c160da2 --- /dev/null +++ b/README.md @@ -0,0 +1,166 @@ +# Cloud MailDir Sync + +This program will download a mailbox from the cloud into a local maildir, +monitor the local maildir for changes, then upload those changes back to the +cloud. + +It is intended to allow normal Linux MailDir based progams, such as mutt and +gnus, to work with modern cloud based email. + +There is much similarity to mbsync, but this program does not use IMAP for +manipulating email on the server. + +## Ideal Usage + +Although other use cases are possible, CMS was designed to support a 'Inbox +Zero' kind of workflow where email is read on a Linux laptop/desktop. It +supports multiple readers, including using the native cloud readers +concurrently. + +Although it will function, it has not been optimized for giant email boxes and +may not perform well. + +Currently it operates only in an 'online mode' where the daemon must be +running. Any local changes made to the mailboxes when the daemon is stopped +are discarded. + +# Microsoft Office365 Cloud Mailbox + +The motivating reason to create this program was to support email from +Office365 using modern OAUTH2 based authentication. Not only is the IMAP +service in Offic365 very poor, it currently does not support OAUTH2 and is +thus often blocked by IT departments. This often means there is no good way to +access email from a Linux systems. + +CMS's Office365 interface uses the [Microsoft Graph +REST](https://developer.microsoft.com/en-us/graph) interface over HTTP to +access the mailbox. Internally this uses a multi-connection/multi-threaded +approach that provides much better performance than the usual Office365 IMAP +service. + +There is limited support for push notifications for new email as the Graph +interface does not support any way for clients to get notifications. Instead +an old OWA REST interface is used to get notifications. + +Unlike IMAP, CMS is able to set the 'replied' flag in a way that shows up with +the other Outlook based clients. CMS is also able to set the 'ImmutableId' +flag which causes the server to provide long term stable IDs for the same +message. This avoids more cases where the messages have to be re-downloaded to +again match them to local messages. + +# Configuration + +A small configuration file, written in Python, is used to setup the mailboxes +to download. + +For instance, to synchronize a local MailDir from an Office 365 mail box use +the following `cms.cfg`: + +```Python +MailDir("~/mail/INBOX") +Office365("inbox", Office365_Account(user="user@domain.com")) +``` + +## Run from git + +CMS requires a fair number of Python modules from PyPI that are not commonly +available from distributions. It is thus recommended it run it from a Python +virtual environment. The included 'cloud-mdir-sync' script will automatically +create the required virtual environment with the needed packages downloaded +with pip and then run the program from within it. + +# OAUTH2 Authentication + +Most cloud providers are now using OAUTH2, and often also provide options to +disable simple password authentication. This is done in the name of security +as OAUTH is the standards based way to support various MFA schemes. However, +OAUTH requires an interactive Web Browser to authenticate. This is challanging +for a Linux environment. + +CMS implements this in what has become the common way for a command line +application. It provides an internal web server which interacts with the +browser to perform the OAUTH protocol. When interactive authentication is +required it automatically launches a browser window to handle it. As a public +application CMS uses the new OAUTH 2.0 Proof Key for Code Exchange (PKCE) +protocol with the Authorization Code Grant to avoid needing 'client secrets' +or special service configuration. + +The first time a user does this authentication they will be prompted to permit +the 'cloud-maildir-sync' application to access their mailbox, in the normal +way. + +Browsing to http://localhost:8080/ will trigger authentication redirects until +all required OAUTH tokens are authorized. Once completed the browser window +can be closed. + +## Interactive Authentication and Headless servers + +The simplest approach is to port foward localhost:8080 along with the ssh +session and point a browser window at the forwarded port. Note, for OAUTH to +work the URL cannot be changed, it must still be http://localhost:8080/ after +forwarding. + +At least Azure has a 'device authentication' approach that can be used for +command line applications, however it is not implemented in CMS. + +## Secrecy of OAUTH tokens + +The OAUTH exchange requests an 'offline_access' token which is a longer lived +token that can be refreshed. This token is sensitive information as it permits +access to the account until it expires. + +CMS can cache this token on disk, in encrypted format, to avoid +re-authentication challenges. However that is only done if a local keyring is +avaiable. The Python [keyring](https://pypi.org/project/keyring/) module is +used to store the encryption secret for OAUTH token storage. For Linux desktop +appications this will automatically use gnome-keyring. + +# General Operation + +CMS takes the approach that the cloud is the authoritative representation of +the mailbox. + +Upone startup it forces the local maildirs to match the cloud configration, +downloading any missing messages and deleting messages not present in the +cloud. + +Once completed it uses inotify to monitor changes in the MailDir and converts +them into REST operations for the cloud. + +After changes to the remote mailbox are completed the local maildirs are again +forced to match the cloud and take on any changes made on the server. + +## UID matching + +All mailbox schemes generate some kind of unique ID for each message. This is +not related to the Message-ID headers of the email. Matching two emails +together without having the contents of both is troublesome. + +Instead CMS uses the content hash of each message as the UID and maintains +caches for mapping each mailbox's unique UID scheme to the content hash. This +avoids having to re-download messages upon each startup. + +To eliminate races, and for general sanity, a directory containing hard links +to each message, organized by content hash, is maintained automatically. + +With this design the maildir files are never disturbed. Even if the cloud side +changes UIDs the content hash matching will keep the same filename for the +maildir after re-downloading the message. + +# Future Work/TODO +- Use delta queries on mailboxes with MS Graph. Delta queries allow + downloading only changed message meta-data and will accelerate polling of + large mailboxes. +- Implement an incremental JSON parser for GraphAPI.owa_get_notifications. + Currently push notifications only work for a single mailbox as there is no + way to determine which mailbox the notification was for unless the + incremental JSON generated by the long-lived connection is parsed. +- Support gmail. While gmail has a much better IMAP server than Offce365, it + is fairly straight forward to implement its version of a REST protocol to + give basically the same capability set. +- Provide some web-app on 'http://localhost:8080/'. CMS launches a web browser + using the Python webbrowser module to open a browser window on the URL, + however this is only functional for desktop cases. Ideally just having a + browser tab open to the URL would allow CMS to send some push notification + to trigger authentication cycles, avoiding the need to open a new browser. + This is probably essential for headless usage if token lifetimes are short. \ No newline at end of file -- cgit v1.2.3