aboutsummaryrefslogtreecommitdiffstats
path: root/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'README.md')
-rw-r--r--README.md166
1 files changed, 166 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..c160da2
--- /dev/null
+++ b/README.md
@@ -0,0 +1,166 @@
+# Cloud MailDir Sync
+
+This program will download a mailbox from the cloud into a local maildir,
+monitor the local maildir for changes, then upload those changes back to the
+cloud.
+
+It is intended to allow normal Linux MailDir based progams, such as mutt and
+gnus, to work with modern cloud based email.
+
+There is much similarity to mbsync, but this program does not use IMAP for
+manipulating email on the server.
+
+## Ideal Usage
+
+Although other use cases are possible, CMS was designed to support a 'Inbox
+Zero' kind of workflow where email is read on a Linux laptop/desktop. It
+supports multiple readers, including using the native cloud readers
+concurrently.
+
+Although it will function, it has not been optimized for giant email boxes and
+may not perform well.
+
+Currently it operates only in an 'online mode' where the daemon must be
+running. Any local changes made to the mailboxes when the daemon is stopped
+are discarded.
+
+# Microsoft Office365 Cloud Mailbox
+
+The motivating reason to create this program was to support email from
+Office365 using modern OAUTH2 based authentication. Not only is the IMAP
+service in Offic365 very poor, it currently does not support OAUTH2 and is
+thus often blocked by IT departments. This often means there is no good way to
+access email from a Linux systems.
+
+CMS's Office365 interface uses the [Microsoft Graph
+REST](https://developer.microsoft.com/en-us/graph) interface over HTTP to
+access the mailbox. Internally this uses a multi-connection/multi-threaded
+approach that provides much better performance than the usual Office365 IMAP
+service.
+
+There is limited support for push notifications for new email as the Graph
+interface does not support any way for clients to get notifications. Instead
+an old OWA REST interface is used to get notifications.
+
+Unlike IMAP, CMS is able to set the 'replied' flag in a way that shows up with
+the other Outlook based clients. CMS is also able to set the 'ImmutableId'
+flag which causes the server to provide long term stable IDs for the same
+message. This avoids more cases where the messages have to be re-downloaded to
+again match them to local messages.
+
+# Configuration
+
+A small configuration file, written in Python, is used to setup the mailboxes
+to download.
+
+For instance, to synchronize a local MailDir from an Office 365 mail box use
+the following `cms.cfg`:
+
+```Python
+MailDir("~/mail/INBOX")
+Office365("inbox", Office365_Account(user="user@domain.com"))
+```
+
+## Run from git
+
+CMS requires a fair number of Python modules from PyPI that are not commonly
+available from distributions. It is thus recommended it run it from a Python
+virtual environment. The included 'cloud-mdir-sync' script will automatically
+create the required virtual environment with the needed packages downloaded
+with pip and then run the program from within it.
+
+# OAUTH2 Authentication
+
+Most cloud providers are now using OAUTH2, and often also provide options to
+disable simple password authentication. This is done in the name of security
+as OAUTH is the standards based way to support various MFA schemes. However,
+OAUTH requires an interactive Web Browser to authenticate. This is challanging
+for a Linux environment.
+
+CMS implements this in what has become the common way for a command line
+application. It provides an internal web server which interacts with the
+browser to perform the OAUTH protocol. When interactive authentication is
+required it automatically launches a browser window to handle it. As a public
+application CMS uses the new OAUTH 2.0 Proof Key for Code Exchange (PKCE)
+protocol with the Authorization Code Grant to avoid needing 'client secrets'
+or special service configuration.
+
+The first time a user does this authentication they will be prompted to permit
+the 'cloud-maildir-sync' application to access their mailbox, in the normal
+way.
+
+Browsing to http://localhost:8080/ will trigger authentication redirects until
+all required OAUTH tokens are authorized. Once completed the browser window
+can be closed.
+
+## Interactive Authentication and Headless servers
+
+The simplest approach is to port foward localhost:8080 along with the ssh
+session and point a browser window at the forwarded port. Note, for OAUTH to
+work the URL cannot be changed, it must still be http://localhost:8080/ after
+forwarding.
+
+At least Azure has a 'device authentication' approach that can be used for
+command line applications, however it is not implemented in CMS.
+
+## Secrecy of OAUTH tokens
+
+The OAUTH exchange requests an 'offline_access' token which is a longer lived
+token that can be refreshed. This token is sensitive information as it permits
+access to the account until it expires.
+
+CMS can cache this token on disk, in encrypted format, to avoid
+re-authentication challenges. However that is only done if a local keyring is
+avaiable. The Python [keyring](https://pypi.org/project/keyring/) module is
+used to store the encryption secret for OAUTH token storage. For Linux desktop
+appications this will automatically use gnome-keyring.
+
+# General Operation
+
+CMS takes the approach that the cloud is the authoritative representation of
+the mailbox.
+
+Upone startup it forces the local maildirs to match the cloud configration,
+downloading any missing messages and deleting messages not present in the
+cloud.
+
+Once completed it uses inotify to monitor changes in the MailDir and converts
+them into REST operations for the cloud.
+
+After changes to the remote mailbox are completed the local maildirs are again
+forced to match the cloud and take on any changes made on the server.
+
+## UID matching
+
+All mailbox schemes generate some kind of unique ID for each message. This is
+not related to the Message-ID headers of the email. Matching two emails
+together without having the contents of both is troublesome.
+
+Instead CMS uses the content hash of each message as the UID and maintains
+caches for mapping each mailbox's unique UID scheme to the content hash. This
+avoids having to re-download messages upon each startup.
+
+To eliminate races, and for general sanity, a directory containing hard links
+to each message, organized by content hash, is maintained automatically.
+
+With this design the maildir files are never disturbed. Even if the cloud side
+changes UIDs the content hash matching will keep the same filename for the
+maildir after re-downloading the message.
+
+# Future Work/TODO
+- Use delta queries on mailboxes with MS Graph. Delta queries allow
+ downloading only changed message meta-data and will accelerate polling of
+ large mailboxes.
+- Implement an incremental JSON parser for GraphAPI.owa_get_notifications.
+ Currently push notifications only work for a single mailbox as there is no
+ way to determine which mailbox the notification was for unless the
+ incremental JSON generated by the long-lived connection is parsed.
+- Support gmail. While gmail has a much better IMAP server than Offce365, it
+ is fairly straight forward to implement its version of a REST protocol to
+ give basically the same capability set.
+- Provide some web-app on 'http://localhost:8080/'. CMS launches a web browser
+ using the Python webbrowser module to open a browser window on the URL,
+ however this is only functional for desktop cases. Ideally just having a
+ browser tab open to the URL would allow CMS to send some push notification
+ to trigger authentication cycles, avoiding the need to open a new browser.
+ This is probably essential for headless usage if token lifetimes are short. \ No newline at end of file