# Cloud MailDir Sync

This program will download a mailbox from the cloud into a local maildir,
monitor the local maildir for changes, then upload those changes back to the
cloud.

It is intended to allow normal Linux MailDir based programs, such as mutt and
gnus, to work with modern cloud based email.

There is much similarity to mbsync, but this program does not use IMAP for
manipulating email on the server.

## Ideal Usage

Although other use cases are possible, CMS was designed to support an 'Inbox
Zero' kind of workflow where email is read on a Linux laptop/desktop. It
supports multiple readers, including using the native cloud readers
concurrently.

It will function with very large mailboxes, but it has not been optimized for
them and may perform poorly.

Currently it operates only in an 'online mode' where the daemon must be
running. Any local changes made to the mailboxes when the daemon is stopped
are discarded.

# Microsoft Office365 Cloud Mailbox

The motivating reason to create this program was to support email from
Office365 using modern OAUTH2 based authentication. Not only is the IMAP
service in Office365 very poor, it currently does not support OAUTH2 and is
thus often blocked by IT departments. As a result there is frequently no good
way to access email from a Linux system.

CMS's Office365 interface uses the [Microsoft Graph
REST](https://developer.microsoft.com/en-us/graph) interface over HTTP to
access the mailbox. Internally this uses a multi-connection/multi-threaded
approach that provides much better performance than the usual Office365 IMAP
service.

There is limited support for push notifications for new email as the Graph
interface does not support any way for clients to get notifications. Instead
an old OWA REST interface is used to get notifications.

Unlike IMAP, CMS is able to set the 'replied' flag in a way that shows up with
the other Outlook based clients. CMS is also able to set the 'ImmutableId'
flag which causes the server to provide long term stable IDs for the same
message. This reduces the number of cases where messages have to be
re-downloaded in order to match them to local messages again.

# Configuration

A small configuration file, written in Python, is used to set up the mailboxes
to download.

For instance, to synchronize a local MailDir from an Office365 mailbox use
the following `cms.cfg`:

```Python
MailDir("~/mail/INBOX")
Office365("inbox", Office365_Account(user="user@domain.com"))
```

## Run from git

CMS requires a fair number of Python modules from PyPI that are not commonly
available from distributions. It is thus recommended to run it from a Python
virtual environment. The included 'cloud-mdir-sync' script will automatically
create the required virtual environment with the needed packages downloaded
with pip and then run the program from within it.

# OAUTH2 Authentication

Most cloud providers are now using OAUTH2, and often also provide options to
disable simple password authentication. This is done in the name of security
as OAUTH is the standards-based way to support various MFA schemes. However,
OAUTH requires an interactive Web Browser to authenticate. This is challenging
for a Linux environment.

CMS implements this in what has become the common way for a command line
application. It provides an internal web server which interacts with the
browser to perform the OAUTH protocol. When interactive authentication is
required it automatically launches a browser window to handle it. As a public
application CMS uses the new OAUTH 2.0 Proof Key for Code Exchange (PKCE)
protocol with the Authorization Code Grant to avoid needing 'client secrets'
or special service configuration.
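As an illustration of the PKCE step, the following is a minimal sketch of how a code verifier and its S256 challenge can be derived using only the Python standard library. The function names are hypothetical and are not taken from the CMS source; only the derivation itself follows the PKCE specification (RFC 7636).

```python
import base64
import hashlib
import secrets


def _b64url(data: bytes) -> str:
    """URL-safe base64 without '=' padding, as PKCE requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def new_code_verifier() -> str:
    """Hypothetical helper: create a fresh random code verifier.

    32 random bytes encode to 43 characters, the minimum verifier
    length permitted by RFC 7636.
    """
    return _b64url(secrets.token_bytes(32))


def code_challenge_s256(verifier: str) -> str:
    """Hypothetical helper: derive the S256 challenge for a verifier."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())
```

The challenge is sent with the authorization request and the verifier is sent later with the token request, so the server can check that both came from the same client without any 'client secret'.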

The first time a user does this authentication they will be prompted to permit
the 'cloud-maildir-sync' application to access their mailbox, in the normal
way.

Browsing to http://localhost:8080/ will trigger authentication redirects until
all required OAUTH tokens are authorized. Once completed the browser window
can be closed.
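When the provider redirects the browser back to the local web server, the authorization code arrives in the query string. The sketch below shows one way that redirect could be parsed. `extract_auth_code` is a hypothetical helper, not the CMS implementation; `code`, `state` and `error` are the standard OAUTH 2.0 redirect parameters.

```python
from urllib.parse import parse_qs, urlsplit


def extract_auth_code(request_path, expected_state):
    """Hypothetical helper: pull the authorization code out of a redirect.

    Returns None for a plain visit with no OAUTH parameters, and raises
    ValueError if the provider reported an error or the state differs.
    """
    query = parse_qs(urlsplit(request_path).query)
    if "error" in query:
        raise ValueError("authorization failed: " + query["error"][0])
    if "code" not in query:
        return None
    # The state value ties the redirect to the request that started it.
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch")
    return query["code"][0]
```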

## Interactive Authentication and Headless servers

The simplest approach is to port forward localhost:8080 along with the ssh
session and point a browser window at the forwarded port. Note that for OAUTH
to work the URL cannot be changed; it must still be http://localhost:8080/
after forwarding.

At least Azure has a 'device authentication' approach that can be used for
command line applications, however it is not implemented in CMS.

## Secrecy of OAUTH tokens

The OAUTH exchange requests an 'offline_access' token which is a longer lived
token that can be refreshed. This token is sensitive information as it permits
access to the account until it expires.

CMS can cache this token on disk, in encrypted format, to avoid
re-authentication challenges. However that is only done if a local keyring is
available. The Python [keyring](https://pypi.org/project/keyring/) module is
used to store the encryption secret for OAUTH token storage. For Linux desktop
applications this will automatically use gnome-keyring.

# General Operation

CMS takes the approach that the cloud is the authoritative representation of
the mailbox.

Upon startup it forces the local maildirs to match the cloud configuration,
downloading any missing messages and deleting messages not present in the
cloud.
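The startup reconciliation can be pictured as a pair of set differences. This is only a sketch, assuming each message is identified by a content hash string; `plan_startup_sync` is a hypothetical name and the real program has more to track than this.

```python
def plan_startup_sync(cloud_hashes, local_hashes):
    """Hypothetical helper: decide what to download and what to delete.

    The cloud is authoritative, so anything only in the cloud is
    downloaded and anything only in the local maildir is deleted.
    """
    cloud = set(cloud_hashes)
    local = set(local_hashes)
    to_download = sorted(cloud - local)
    to_delete = sorted(local - cloud)
    return to_download, to_delete
```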

Once completed it uses inotify to monitor changes in the MailDir and converts
them into REST operations for the cloud.
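Most changes a mail reader makes to a maildir are renames that alter the flag letters after `:2,` in the filename. The sketch below shows how such a rename could be turned into a list of flag changes, which a program like CMS would then translate into cloud operations. The function names are hypothetical, and the mapping to actual REST calls is deliberately left out since the source does not describe it.

```python
# Standard maildir flag letters and their meaning.
MAILDIR_FLAGS = {
    "D": "draft",
    "F": "flagged",
    "P": "passed",
    "R": "replied",
    "S": "seen",
    "T": "trashed",
}


def parse_flags(filename):
    """Return the set of flag names encoded in a maildir filename."""
    _, sep, info = filename.rpartition(":2,")
    if not sep:
        # No info section, e.g. a message still sitting in new/.
        return set()
    return {MAILDIR_FLAGS[c] for c in info if c in MAILDIR_FLAGS}


def flag_changes(old_name, new_name):
    """Hypothetical helper: describe what a rename did to the flags."""
    old, new = parse_flags(old_name), parse_flags(new_name)
    return {"set": sorted(new - old), "cleared": sorted(old - new)}
```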

After changes to the remote mailbox are completed the local maildirs are again
forced to match the cloud and take on any changes made on the server.

## UID matching

All mailbox schemes generate some kind of unique ID for each message. This is
not related to the Message-ID headers of the email. Matching two emails
together without having the contents of both is troublesome.

Instead CMS uses the content hash of each message as the UID and maintains
caches for mapping each mailbox's unique UID scheme to the content hash. This
avoids having to re-download messages upon each startup.
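A minimal sketch of this idea follows. It assumes SHA-256 as the hash, which the text above does not specify, and `UIDCache` is a hypothetical class rather than the structure CMS actually uses.

```python
import hashlib


def content_hash(raw_message: bytes) -> str:
    """Hash the full message bytes (SHA-256 is an assumption here)."""
    return hashlib.sha256(raw_message).hexdigest()


class UIDCache:
    """Hypothetical cache mapping one mailbox's own UIDs to content hashes."""

    def __init__(self):
        self._by_uid = {}

    def record(self, mailbox_uid, raw_message):
        """Remember the hash for a message that has been downloaded."""
        digest = content_hash(raw_message)
        self._by_uid[mailbox_uid] = digest
        return digest

    def lookup(self, mailbox_uid):
        """Return the known hash, or None if the message must be fetched."""
        return self._by_uid.get(mailbox_uid)
```

A cache hit means the message content is already known locally, so only a miss requires downloading the message body.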

To eliminate races, and for general sanity, a directory containing hard links
to each message, organized by content hash, is maintained automatically.
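The hard link directory could be maintained along these lines. This is a sketch under assumptions: the flat directory layout, the use of SHA-256 and the name `link_by_hash` are all illustrative and not taken from the CMS source.

```python
import hashlib
import os


def link_by_hash(message_path, hash_dir):
    """Hypothetical helper: hard link a message under its content hash.

    Returns the path of the link inside hash_dir. An existing link for
    the same hash is left alone.
    """
    with open(message_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    os.makedirs(hash_dir, exist_ok=True)
    target = os.path.join(hash_dir, digest)
    try:
        # Both names refer to the same inode, so no data is copied.
        os.link(message_path, target)
    except FileExistsError:
        pass
    return target
```

Because the link shares the inode with the maildir file, the content stays reachable by hash even if a mail reader renames or removes the maildir entry.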

With this design the maildir files are never disturbed. Even if the cloud side
changes UIDs the content hash matching will keep the same filename for the
maildir after re-downloading the message.

# Future Work/TODO
- Use delta queries on mailboxes with MS Graph. Delta queries allow
  downloading only changed message meta-data and will accelerate polling of
  large mailboxes.
- Implement an incremental JSON parser for GraphAPI.owa_get_notifications.
  Currently push notifications only work for a single mailbox as there is no
  way to determine which mailbox the notification was for unless the
  incremental JSON generated by the long-lived connection is parsed.
- Support gmail. While gmail has a much better IMAP server than Office365, it
  is fairly straightforward to implement its version of a REST protocol to
  give basically the same capability set.
- Provide some web-app on 'http://localhost:8080/'. CMS launches a web browser
  using the Python webbrowser module to open a browser window on the URL,
  however this is only functional for desktop cases. Ideally just having a
  browser tab open to the URL would allow CMS to send some push notification
  to trigger authentication cycles, avoiding the need to open a new browser.
  This is probably essential for headless usage if token lifetimes are short.
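
For the incremental JSON parser item above, one possible starting point is the `raw_decode` method of the standard library's `json.JSONDecoder`, which decodes one value from the front of a string and reports where it ended. The sketch below assumes the stream is a sequence of complete top-level JSON objects arriving in arbitrary chunks; the real format of the OWA notification stream is not described here and may differ. `IncrementalJSONParser` is a hypothetical name.

```python
import json


class IncrementalJSONParser:
    """Hypothetical parser for JSON objects arriving in arbitrary chunks."""

    def __init__(self):
        self._decoder = json.JSONDecoder()
        self._buffer = ""

    def feed(self, chunk):
        """Add text to the buffer and return every object now complete."""
        self._buffer += chunk
        objects = []
        while True:
            # raw_decode does not skip leading whitespace itself.
            pending = self._buffer.lstrip()
            if not pending:
                self._buffer = ""
                break
            try:
                obj, end = self._decoder.raw_decode(pending)
            except json.JSONDecodeError:
                # Incomplete so far: keep it and wait for more data.
                self._buffer = pending
                break
            objects.append(obj)
            self._buffer = pending[end:]
        return objects
```

Note that this simple approach cannot tell truncated input from malformed input, so a production version would need a limit on how much data it is willing to buffer.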