IMAP: replace non-UTF-8 characters rather than aborting

Received emails are not always valid UTF-8. The following error was observed
on a specific mail:

Traceback (most recent call last):
  File "/home/tdescham/repo/offlineimap3/offlineimap/threadutil.py", line 146, in run
    Thread.run(self)
  File "/usr/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/tdescham/repo/offlineimap3/offlineimap/folder/Base.py", line 850, in copymessageto
    message = self.getmessage(uid)
  File "/home/tdescham/repo/offlineimap3/offlineimap/folder/IMAP.py", line 327, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/home/tdescham/repo/offlineimap3/offlineimap/folder/IMAP.py", line 844, in _fetch_from_imap
    ndata1 = data[0][1].decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 10177: invalid start byte

This completely aborted offlineimap3, blocking further mail reception.

Instead, use the 'replace' error strategy in Python:

    Replace with a suitable replacement character; Python will use the
    official U+FFFD REPLACEMENT CHARACTER for the built-in Unicode codecs on
    decoding and ‘?’ on encoding.
    https://docs.python.org/2/library/codecs.html#codec-base-classes
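
A minimal sketch of the difference between the default strict decoding and
the 'replace' strategy. The payload below is a made-up example, not the mail
that triggered the traceback; it only reuses the offending byte 0xa0. Note
that 'replace' is lossy: the original byte is not recoverable from the
decoded string.

```python
# Hypothetical payload (not the original mail): 0xa0 is a continuation byte
# in UTF-8, so it is invalid when it appears where a start byte is expected.
raw = b"Hello\xa0world"

# Default (strict) decoding raises UnicodeDecodeError, as in the traceback.
try:
    raw.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With errors='replace', the invalid byte becomes U+FFFD and decoding
# continues with the rest of the data.
replaced = raw.decode('utf-8', errors='replace')

print(strict_failed)
print(repr(replaced))
```

Only the undecodable byte is substituted; the surrounding text is decoded
unchanged, so the rest of the message stays readable.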
Thomas De Schampheleire 2020-10-21 16:29:13 +02:00
parent 820e5c855f
commit 33e0efa163

@@ -839,7 +839,7 @@ class IMAPFolder(BaseFolder):
 # Convert bytes to str
 ndata0 = data[0][0].decode('utf-8')
-ndata1 = data[0][1].decode('utf-8')
+ndata1 = data[0][1].decode('utf-8', errors='replace')
 ndata = [ndata0, ndata1]
 return ndata