Bug getting i18n'ed attachment filenames (RFC2231)

Bug #1060951 reported by Aurélien Bompard
This bug affects 1 person
Affects       Status         Importance   Assigned to    Milestone
GNU Mailman   Fix Released   High         Barry Warsaw

Bug Description

RFC 2231 allows filenames to contain non-ASCII characters. The get_filename() method of Python's Message class handles this by calling email.utils.collapse_rfc2231_value() as its last step; that function returns the filename as a unicode string.
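For illustration, here is a minimal sketch of that decoding path using only the standard library. It is written for Python 3 (the report itself concerns Python 2.7), reuses the Content-Disposition header from the test message in this report, and the names RAW, part, raw_param and filename are just placeholders for this example.

```python
import email
from email import utils

# A single MIME part with an RFC 2231 encoded filename, taken from the
# test message quoted in this report.
RAW = (
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Disposition: attachment;\n"
    " filename*=UTF-8''%74%6F%64%6F%2D%64%C3%A9%6A%65%75%6E%65%72%2E%74%78%74\n"
    "\n"
    "body\n"
)

part = email.message_from_string(RAW)

# get_param() hands back the still-encoded parameter as a
# (charset, language, value) 3-tuple.
raw_param = part.get_param("filename", header="content-disposition")

# collapse_rfc2231_value() turns that tuple into a text string;
# get_filename() performs the same collapse internally.
collapsed = utils.collapse_rfc2231_value(raw_param)
filename = part.get_filename()

print(repr(raw_param))
print(repr(filename))
```

The point of the sketch is that collapse_rfc2231_value() expects the not-yet-decoded value that get_param() produces, which is why a wrapper that decodes headers earlier can interfere with it.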

This fails in Mailman because the mailman.email.message.Message class wraps get() and __getitem__() to return unicode headers. As a result, collapse_rfc2231_value() tries to decode a string that is already unicode, and I get the following exception:

  File "/usr/lib/python2.7/email/utils.py", line 319, in collapse_rfc2231_value
    return unicode(rawval, charset, errors)
TypeError: decoding Unicode is not supported
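The same class of error can be reproduced in isolation. The sketch below is a Python 3 analogue, not the Python 2.7 code from the traceback: str(value, charset, errors) plays the role of unicode(rawval, charset, errors), and decode_like_collapse is a hypothetical helper written only for this example.

```python
def decode_like_collapse(rawval, charset="utf-8", errors="replace"):
    # Hypothetical helper: same call shape as the failing line in the
    # traceback, using Python 3's str() where Python 2 used unicode().
    return str(rawval, charset, errors)


# Decoding raw bytes works as intended.
decoded = decode_like_collapse(b"todo-d\xc3\xa9jeuner.txt")

# Decoding a value that is already text raises TypeError, which is the
# situation created when a wrapper has decoded the header beforehand.
try:
    decode_like_collapse(decoded)
except TypeError as exc:
    error = str(exc)
else:
    error = None

print(repr(decoded))
print(error)
```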

A possible solution would be to make get_filename() in Mailman's Message class more than just an exception-catching wrapper: re-implement the original get_filename() logic and insert a conversion to str before calling collapse_rfc2231_value().

Does this make sense? Any other ideas for a possible solution?

Tags: mailman3


Barry Warsaw (barry)
tags: added: mailman3
Aurélien Bompard (abompard) wrote :

See the TestMessageSubclass testcase I've added to the attached testsuite for a way to reproduce it.
It's actually a little harder than I first thought: encoding the filename in the middle of the method is not enough.

Mark Sapiro (msapiro) wrote :

This works for me with Mailman 2.1.15 and email 4.0.1. Does it fail for you with Mailman 2.1.x? If so, what Mailman and email versions?

[msapiro@MSAPIRO ~]$ python
Python 2.6.5 (r265:79063, Jun 12 2010, 17:07:01)
[GCC 4.3.4 20090804 (release) 1] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> email.__version__
'4.0.1'
>>> import sys
>>> sys.path.insert(0, '/cygdrive/f/test-mailman/')
>>> from Mailman import Message
>>> msg = email.message_from_string("""Message-ID: <email address hidden>
... Content-Type: multipart/mixed; boundary="------------050607040206050605060208"
...
... This is a multi-part message in MIME format.
... --------------050607040206050605060208
... Content-Type: text/plain; charset=UTF-8
... Content-Transfer-Encoding: quoted-printable
...
... Test message containing an attachment with an accented filename
...
... --------------050607040206050605060208
... Content-Type: text/plain; charset=UTF-8;
... name="=?UTF-8?B?dG9kby1kw6lqZXVuZXIudHh0?="
... Content-Transfer-Encoding: base64
... Content-Disposition: attachment;
... filename*=UTF-8''%74%6F%64%6F%2D%64%C3%A9%6A%65%75%6E%65%72%2E%74%78%74
...
... VmlhbmRlCk1lbnRoZQpQYWluClZpbgoKQ3Vpc2luZTogcHLDqXBhcmVyIGwnYXDDqXJvLCBj
... b3VwZXIgZXQgZmFpcmUgcmlzc29sZXIgbGVzIHBhdGF0ZXMsIGV0IGZhaXJlIGxlcyBjb29r
... aWVzCg==
... --------------050607040206050605060208--
... """, Message.Message)
>>> msg
From nobody Wed Oct 3 08:43:13 2012
Message-ID: <email address hidden>
Content-Type: multipart/mixed; boundary="------------050607040206050605060208"

This is a multi-part message in MIME format.
--------------050607040206050605060208
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Test message containing an attachment with an accented filename

--------------050607040206050605060208
Content-Type: text/plain; charset=UTF-8;
        name="=?UTF-8?B?dG9kby1kw6lqZXVuZXIudHh0?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
        filename*=UTF-8''%74%6F%64%6F%2D%64%C3%A9%6A%65%75%6E%65%72%2E%74%78%74

VmlhbmRlCk1lbnRoZQpQYWluClZpbgoKQ3Vpc2luZTogcHLDqXBhcmVyIGwnYXDDqXJvLCBj
b3VwZXIgZXQgZmFpcmUgcmlzc29sZXIgbGVzIHBhdGF0ZXMsIGV0IGZhaXJlIGxlcyBjb29r
aWVzCg==
--------------050607040206050605060208--

>>> att = msg.get_payload()[1]
>>> att
From nobody Wed Oct 3 08:43:44 2012
Content-Type: text/plain; charset=UTF-8;
        name="=?UTF-8?B?dG9kby1kw6lqZXVuZXIudHh0?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
        filename*=UTF-8''%74%6F%64%6F%2D%64%C3%A9%6A%65%75%6E%65%72%2E%74%78%74

VmlhbmRlCk1lbnRoZQpQYWluClZpbgoKQ3Vpc2luZTogcHLDqXBhcmVyIGwnYXDDqXJvLCBj
b3VwZXIgZXQgZmFpcmUgcmlzc29sZXIgbGVzIHBhdGF0ZXMsIGV0IGZhaXJlIGxlcyBjb29r
aWVzCg==
>>> att.get_filename()
u'todo-d\xe9jeuner.txt'

Aurélien Bompard (abompard) wrote :

Sorry, I should have mentioned it: this is with Mailman 3 HEAD.

Barry Warsaw (barry)
no longer affects: mailman/2.1
no longer affects: mailman/3.0
Barry Warsaw (barry)
Changed in mailman:
milestone: none → 3.0.0b5
assignee: nobody → Barry Warsaw (barry)
importance: Undecided → High
status: New → Fix Committed
Barry Warsaw (barry)
Changed in mailman:
status: Fix Committed → Fix Released