Multipurpose Internet Mail Extensions

22.11.1998
Juha Räsänen
Department of Computer Science and Engineering
Helsinki University of Technology
Juha.Rasanen@hut.fi

Abstract

Ten years ago electronic mail consisted of ASCII text only. The message syntax for email was defined in RFC 822, published in August 1982 by David H. Crocker. A message consists of message headers and a body (the content), and the message body was left as flat ASCII text. It was possible, though, to include binary files in the message body after translating them into ASCII text with a scheme such as uuencode. Nowadays people want to send much more complex content via email. The MIME extensions were defined to meet these requirements for more complex and richer email.

History

The first steps towards extending email were taken in 1985 in RFC 934. This document proposed a standard for message encapsulation when replying to and forwarding messages [3]. A few years later, in 1988, RFC 1049 proposed a content-type header for email, which supported message contents such as PostScript and troff [4]. Two years later RFC 1154 proposed an encoding header field for email that permitted multi-structural messages [5]. This was a highly experimental document at the time.

RFCs 1341 and 1342, introduced in June 1992, can be considered the first version of MIME as we know it today. They introduced extensions for images, audio and general application data, the encoding schemes that are still in use, and a representation for non-ASCII data. These documents were refined and expanded a year later, in September 1993, in RFCs 1521, 1522 and 1523. The last of these dealt only with enriched text in MIME. Two years ago, in November 1996, the MIME-related RFCs were reworked once again, this time into a group of five documents, RFCs 2045 through 2049. Since then further extensions to MIME, such as security, have been proposed, but they have remained in separate documents.

MIME message structure

MIME was designed to be compliant with RFC 822. The newly introduced message headers are themselves consistent with the message header syntax defined in RFC 822. In fact, RFC 822 specifically states that unrecognized message headers should be ignored [2]. A Mail User Agent should therefore be able to receive MIME messages even if it does not understand some of the data.

MIME makes it possible to create composite messages with one or more subparts, each of which can itself contain subparts. There is no limit to the depth of nesting of message parts. The subparts are separated by a MIME boundary, and each has headers similar but not identical to the mail message headers. MIME defines a number of new header fields in compliance with the RFC 822 header syntax. They are used to describe the content of a MIME message. MIME-specific headers can occur in at least two contexts:

  1. top level message headers
  2. subpart headers in multipart messages.

Some of the MIME header fields are mandatory, some optional.
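
The two contexts can be illustrated with a small sketch that parses a nested message using Python's standard email package and lists the content type found at each level. The sample message, its boundary strings ("outer", "inner") and the helper name outline are made up for this illustration.

```python
import email

# A hypothetical nested MIME message: a multipart/mixed body whose second
# subpart is itself a multipart/alternative entity.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: text/plain; charset="us-ascii"

Hello.
--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain

plain
--inner
Content-Type: text/enriched

<bold>rich</bold>
--inner--
--outer--
"""

msg = email.message_from_string(RAW)


def outline(part, depth=0):
    """Return one indented line per entity, giving its content type."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for sub in part.get_payload():
            lines.extend(outline(sub, depth + 1))
    return lines


for line in outline(msg):
    print(line)
```

In this sample the MIME-Version header appears only among the top-level headers, while every subpart carries its own Content-Type header.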

MIME-Version header field

There has to be a way to recognize a message in MIME format. The MIME-Version header field has been defined for this purpose. It declares the version of the Internet message body standard in use. A formal Backus-Naur Form (BNF) definition of the MIME-Version header field is as follows:

version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT
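
As a rough illustration, the grammar above can be turned into a regular expression. This is only a sketch: the names MIME_VERSION_RE and parse_mime_version are invented here, whitespace around the colon and the dot is tolerated as an assumption, and the parenthesized comments that RFC 822 permits in headers are not handled.

```python
import re

# version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT
# Header names are case-insensitive, hence re.IGNORECASE.
MIME_VERSION_RE = re.compile(
    r"^MIME-Version\s*:\s*(\d+)\s*\.\s*(\d+)\s*$", re.IGNORECASE
)


def parse_mime_version(header_line):
    """Return (major, minor) as integers, or None if the line does not match."""
    match = MIME_VERSION_RE.match(header_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_mime_version("MIME-Version: 1.0"))
print(parse_mime_version("Subject: not a version header"))
```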

Content-Type header field

The Content-Type header field describes the content of the message body it refers to. The nature of the data is specified by giving media type and subtype identifiers and by providing auxiliary information that may be required for certain media types [7]. The media type and subtype identifiers may be followed by a set of parameters, specified in attribute=value notation. The order of the parameters is not significant.
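
The structure of the header value can be explored with Python's standard email package. The following is a minimal sketch; the helper name describe_content_type and the sample parameter values are assumptions made for this illustration.

```python
import email


def describe_content_type(header_value):
    """Split a Content-Type value into (media type, subtype, parameters)."""
    msg = email.message_from_string("Content-Type: " + header_value + "\n\n")
    # get_params() returns the type/subtype itself as the first pair,
    # so it is skipped when collecting the attribute=value parameters.
    params = dict(msg.get_params()[1:])
    return msg.get_content_maintype(), msg.get_content_subtype(), params


first = describe_content_type('text/plain; charset="iso-8859-1"; name="notes.txt"')
second = describe_content_type('text/plain; name="notes.txt"; charset="iso-8859-1"')
print(first)
print(first == second)
```

Because the parameters are collected into a dictionary, the two header values compare equal even though their parameters are written in a different order.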

The media type identifies the general type of the data, while the subtype specifies a particular format for that type of data [7]. For example, "image/jpeg" tells the user agent that the data is an image in JPEG format. The media type mechanism has been designed to be extensible, and today there is a large number of officially approved media types and subtypes. The list of official content types is maintained by the Internet Assigned Numbers Authority (IANA) [8]. Initially seven media types were defined, of which five refer to discrete message bodies and two to composite message bodies [7]. Discrete entities are handled by non-MIME mechanisms, whereas composite entities are handled by MIME mechanisms, usually directly by the MIME processor itself.

Text Media Type

The text media type is intended for textual information. The subtype "plain" indicates plain, unformatted text such as US-ASCII [7]. This is the most common text subtype for sending email. Other subtypes can cover any word processor format that can be more or less understood without special software. For example, a Microsoft Word document does not meet this criterion, but HTML-formatted or enriched text does. The subtype "plain" is usually accompanied by a parameter indicating the character set of the text. The officially supported character sets are US-ASCII and ISO-8859-1 through ISO-8859-9. Unicode is not supported directly because of its 16-bit characters, but there are schemes for encoding Unicode content into 7- or 8-bit characters. The Content-Type header field might look like this:

Content-type: text/plain; charset="us-ascii"
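
The remark about encoding Unicode into 7- or 8-bit characters can be made concrete with a short sketch. UTF-7 and UTF-8 are chosen here merely as examples of such schemes; the text above does not name any particular one, and the helper is_seven_bit is invented for this illustration.

```python
# "Juha Räsänen", written with escapes so the source file stays pure ASCII.
name = "Juha R\u00e4s\u00e4nen"

latin1 = name.encode("iso-8859-1")  # one octet per character, some above 127
utf8 = name.encode("utf-8")         # an 8-bit encoding of Unicode
utf7 = name.encode("utf-7")         # a 7-bit encoding of Unicode


def is_seven_bit(data):
    """True if every octet fits into 7 bits."""
    return all(octet < 128 for octet in data)


for label, data in (("iso-8859-1", latin1), ("utf-8", utf8), ("utf-7", utf7)):
    print(label, len(data), is_seven_bit(data))
```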

Image Media Type

The media type "image" indicates the presence of a still image. The subtype refers to a specific image format (e.g., JPEG or GIF). Due to the nature of image formats, a special application such as an image viewer is required to view the contents of an image. User agents usually do not include a viewer application. Table 1 shows the most commonly used image formats that have been registered with IANA.

Format    Description
------    ---------------------------
JPEG      JPEG using JFIF encoding
GIF       Graphics Interchange Format
IEF       Image Exchange Format
G3FAX     Group 3 facsimile
TIFF      Tagged Image File Format
CGM       Computer Graphics Metafile
PNG       Portable Network Graphics

Table 1. Seven image formats [1]

The content-type header field might look like this:

Content-type: image/gif; name="puffy.gif"

Audio Media Type

The audio media type is the content type for digitized audio (e.g., human speech or a soundtrack). Since there is no consensus on an ideal audio format for computers, the initial subtype "basic" is specified to provide a lowest common denominator audio format. This format uses single-channel 8-bit pulse code modulation (PCM) with an 8000 Hz sample rate [1]. It is hoped that future RFCs will standardize subtypes suitable for music up to CD quality.

Video Media Type

This media type indicates the presence of moving images, possibly with color and sound. Currently defined subtypes include MPEG and QuickTime [1]. The video media type is not meant to refer to any particular technology or format, nor to preclude other, compactly encoded subtypes.

Application Media Type

The application media type is used for data that does not fit into any other category, and particularly for data files used by some application program (e.g., Microsoft Word or Excel). The information must be processed by an application before it can be viewed. Initially two subtypes were defined: octet-stream and postscript [7]. The octet-stream subtype indicates a basic stream of raw binary data. All unrecognized subtypes must be treated as equivalent to the "application/octet-stream" subtype. This media type is usually accompanied by a parameter giving the name of the application file. The Content-Type header field might look like this:

Content-type: application/octet-stream; name="format.exe"
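
The fallback rule for unrecognized subtypes can be sketched as follows. The set of recognized subtypes and the function name effective_application_type are assumptions for this illustration; a real user agent would know many more subtypes, and the input is assumed to be a bare type/subtype string without parameters.

```python
# Hypothetical list of subtypes this imaginary user agent understands.
KNOWN_APPLICATION_SUBTYPES = {"octet-stream", "postscript"}


def effective_application_type(content_type):
    """Map an unrecognized application subtype to application/octet-stream."""
    normalized = content_type.strip().lower()
    maintype, _, subtype = normalized.partition("/")
    if maintype == "application" and subtype not in KNOWN_APPLICATION_SUBTYPES:
        return "application/octet-stream"
    return normalized


print(effective_application_type("application/x-unknown-format"))
print(effective_application_type("application/postscript"))
```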

Multipart Media Type

The "multipart" content types allow one or more subparts in a single message body. This media type usually appears in the main message headers, although a subpart may itself be of a multipart type, and it requires one parameter, boundary. The message subparts are delimited by a boundary delimiter line that must not occur in the message data. After its boundary delimiter line, each subpart has its own headers, which specify details about that subpart, followed by its body. A message subpart is thus similar to an RFC 822 message, but is not to be interpreted as actually being one. A typical "multipart" Content-Type header field might look like this:

Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08j34c0p

The associated boundary delimiter line is:

--gc0p4Jq0M2Yt08j34c0p

In a composite message the various subparts do not, in general, all have to be of the same content type.
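
A composite message of this kind can be assembled with Python's standard email package. The sketch below reuses the boundary string from the example above; the body text and the attachment name data.bin are made up for this illustration.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

BOUNDARY = "gc0p4Jq0M2Yt08j34c0p"

# Two subparts of different content types in one multipart/mixed message.
msg = MIMEMultipart("mixed", boundary=BOUNDARY)
msg.attach(MIMEText("See the attached file.", "plain", "us-ascii"))
msg.attach(MIMEApplication(b"\x00\x01\x02\x03", "octet-stream", name="data.bin"))

serialized = msg.as_string()
lines = serialized.splitlines()

# Each subpart is preceded by "--" + boundary; the last subpart is followed
# by a closing delimiter, which has two extra hyphens at the end.
print(lines.count("--" + BOUNDARY))
print(lines.count("--" + BOUNDARY + "--"))
```

Printing the variable serialized shows the complete message, including the headers that follow each boundary delimiter line.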

Message Media Type

This media type was defined to facilitate the encapsulation of another mail message. It has been suggested that "message" subtypes be defined for forwarded and rejected messages [7], although these can also be handled as multipart messages. Currently three subtypes are defined: rfc822, partial and external-body.

Content-Transfer-Encoding header field

Several of the media types presented here are represented in their natural format, as 8-bit character or binary data. Such data cannot be transmitted over some transfer protocols (e.g., SMTP), because they restrict messages to 7-bit data, possibly with a maximum line length. For this purpose RFC 2045 defines the Content-Transfer-Encoding header field, which specifies what encoding transformation has been applied to the body, and hence what decoding is needed to restore it to its original form, and what the domain of the result is. Three encoding transformations are currently defined: identity, "quoted-printable" encoding and "base64" encoding [6].

Identity

The identity transformation serves simply as an indicator of the domain of the body data and tells what sort of encoding might be needed for transmission. The domains are "7bit", "8bit" and "binary" [6].
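
How a body falls into one of the three domains can be sketched with a small classifier. The function name body_domain is invented here, and the rules it applies (lines of at most 998 octets terminated by CRLF, no NUL octets, no bare CR or LF) are this sketch's reading of the definitions in RFC 2045.

```python
MAX_LINE_OCTETS = 998  # line length limit, excluding the trailing CRLF


def body_domain(data):
    """Classify a body given as bytes as '7bit', '8bit' or 'binary'."""
    lines = data.split(b"\r\n")
    lines_ok = all(
        len(line) <= MAX_LINE_OCTETS and b"\r" not in line and b"\n" not in line
        for line in lines
    )
    if b"\x00" in data or not lines_ok:
        return "binary"      # no restrictions satisfied: arbitrary octets
    if any(octet > 127 for octet in data):
        return "8bit"        # line-oriented, but with octets above 127
    return "7bit"            # line-oriented US-ASCII range only


print(body_domain(b"Hello, world.\r\n"))
print(body_domain("R\u00e4s\u00e4nen\r\n".encode("iso-8859-1")))
print(body_domain(b"\x00\x01\x02"))
```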

Quoted-printable

This encoding transformation allows 8-bit characters to be represented by 7-bit characters using a "quoting" mechanism. Any 8-bit character can be represented by a group of three 7-bit characters: an equals sign followed by two hexadecimal digits, which together can represent any value between 0 and 255 decimal. If the data being encoded is mostly US-ASCII, the encoded form remains largely readable to humans and is in this case more efficient than base64 encoding.
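
The quoting mechanism can be tried out with the quopri module from Python's standard library. The sample text is an assumption for this illustration; it is first encoded as ISO-8859-1 so that each "ä" becomes the single octet E4 (hexadecimal).

```python
import quopri

# "Juha Räsänen" as ISO-8859-1 octets; the two ä characters are 8-bit octets.
original = "Juha R\u00e4s\u00e4nen".encode("iso-8859-1")

encoded = quopri.encodestring(original)
decoded = quopri.decodestring(encoded)

print(encoded)   # each 8-bit octet appears as "=" plus two hexadecimal digits
print(decoded == original)
```

Only the two 8-bit octets grow to three characters each, so the rest of the text stays readable.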

Base64

The base64 encoding scheme is designed to represent arbitrary octet sequences in a form that need not be readable by humans. The encoding transformation uses a 65-character subset of US-ASCII and represents each group of three 8-bit characters as four characters from the transformation table, each carrying 6 bits. The reverse algorithm is used when decoding the data. This encoding transformation is more efficient than "quoted-printable" encoding if the data is general binary data.
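
The three-octets-to-four-characters mapping, and the efficiency comparison with quoted-printable, can be checked with Python's standard base64 and quopri modules. This is only a sketch; the sample inputs and the helper name encoded_length are chosen for this illustration.

```python
import base64
import quopri


def encoded_length(octet_count):
    """Length of the base64 form, without line breaks: 4 characters per 3 octets."""
    return 4 * ((octet_count + 2) // 3)


# Three octets map to exactly four characters; shorter groups are padded
# with "=", the 65th character of the alphabet.
samples = {raw: base64.b64encode(raw) for raw in (b"Man", b"Ma", b"M")}
for raw, enc in samples.items():
    print(raw, enc)

# For general binary data base64 is the more compact of the two encodings.
binary = bytes(range(256))
b64_size = len(base64.b64encode(binary))
qp_size = len(quopri.encodestring(binary))
print(b64_size, qp_size)
```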

Content-ID header field

"Content-ID" header field is syntactically similar to RFC 822 "Message-ID" header field. In MIME message bodies may be labelled using "Content-ID" header field. Like "Message-ID" header field, the "Content-ID" header field value should be world-unique" This header field is required only in the "multipart/external-body" content type [6]. In "multipart/alternative" content type message subparts with same "Content-ID" header field have identical information (i.e. no information is lossed if translated).

Content-Description header field

This header provides a way to associate a description with a MIME subpart, for example a caption that might be displayed along with an image. The description can be arbitrary US-ASCII text [1].
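
Both labelling headers can be attached to a subpart with Python's standard email package. In this sketch the domain example.org, the body text and the description are placeholders; email.utils.make_msgid is used because the Content-ID value has the same syntax as a Message-ID.

```python
from email.mime.text import MIMEText
from email.utils import make_msgid

part = MIMEText("A small penguin.", "plain", "us-ascii")

# make_msgid() builds a value of the form <unique-string@domain>.
content_id = make_msgid(domain="example.org")
part["Content-ID"] = content_id
part["Content-Description"] = "Caption for the attached picture"

print(part["Content-ID"])
print(part["Content-Description"])
```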

Future of MIME

MIME is a very powerful and flexible way to transfer content other than ASCII text. The extensibility of MIME has been noticed around the world, and there is currently a large number of more or less official extensions to the MIME described here. Perhaps one of the more exotic extensions is the media type for chemical compounds. Security issues in MIME have also been raised, but RFCs 2045 through 2049 do not address them. RFC 2015 describes the use of Pretty Good Privacy (PGP) to provide privacy and authentication using MIME, also known as PGP/MIME. S/MIME has also been defined to provide a consistent way to send and receive secure MIME data.

Unfortunately only a few vendors have chosen to support MIME in all its glory. The processing power of home computers is increasing all the time, which makes it possible to handle more complex messages and to meet users' requirements.

References

[1] Lawrence Hughes, Internet e-mail protocols, standards and implementation, 1998 Artech House, Inc., ISBN 0-89006-939-5
[2] Crocker D. H., RFC 822: Standard for the Format of ARPA Internet Text Messages, August 1982
<ftp://ftp.funet.fi/rfc/rfc822.txt>
[3] Marshall T. Rose, Einar A. Stefferud, RFC 934: Proposed Standard for Message Encapsulation, January 1985
<ftp://ftp.funet.fi/rfc/rfc934.txt>
[4] M. Sirbu, RFC 1049: A Content-Type Header Field for Internet Messages, March 1988
<ftp://ftp.funet.fi/rfc/rfc1049.txt>
[5] D. Robinson, R. Ullmann, RFC 1154: Encoding Header Field for Internet Messages, April 1990
<ftp://ftp.funet.fi/rfc/rfc1154>
[6] Freed N., Borenstein N., RFC 2045: Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies, November 1996
<ftp://ftp.funet.fi/rfc/rfc2045.txt>
[7] Freed N., Borenstein N., RFC 2046: Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, November 1996
<ftp://ftp.funet.fi/rfc/rfc2046.txt>
[8] Freed N., Klensin J., Postel J., RFC 2048: Multipurpose Internet Mail Extensions (MIME) Part Four: Registration Procedures, November 1996
<ftp://ftp.funet.fi/rfc/rfc2048.txt>

Further Information

http://www.isi.edu/in-notes/iana/assignments/media-types/media-types
List of official media types registered by IANA
Internet Assigned Numbers Authority (IANA)
IANA Home Page
http://www.ch.ic.ac.uk/chemime/
The Chemical MIME Home Page
http://biotech.chem.indiana.edu/mime/mime.html
The Chemical MIME Connection
http://www.rsa.com/smime/sdw5/sld001.htm
S/MIME Overview (slideshow)
ftp://ftp.funet.fi/rfc/rfc1847.txt
Galvin J., Murphy S., Crocker S., Freed N., RFC 1847: Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted, October 1995
ftp://ftp.funet.fi/rfc/rfc2015.txt
Elkins M., RFC 2015: MIME Security with Pretty Good Privacy (PGP), October 1996
ftp://ftp.funet.fi/rfc/rfc2047.txt
Moore K., RFC 2047: MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text, November 1996
ftp://ftp.funet.fi/rfc/rfc2311.txt
Dusse S., Hoffman P., Ramsdell B., Lundblad L., Repka L., RFC 2311: S/MIME Version 2 Message Specification, March 1998
ftp://ftp.funet.fi/rfc/rfc2312.txt
Dusse S., Hoffman P., Ramsdell B., Weinstein J., RFC 2312: S/MIME Version 2 Certificate Handling, March 1998