Pineapple messages are stored in plain text files. They are modified only slightly from the original data received from the news server.
When it is first created, a message file will have the type code ‘PMSG’. If it was created by Pineapple News, its creator code will be ‘PNEW’. If it was created by Pineapple Mail, its creator code will be ‘PMAI’. Apart from type and creator codes, pineapple message files make no use of HFS+ metadata, such as resource forks. In other words, you can transfer raw pineapple messages to other platforms and back again without losing any data. It won’t matter much if the type and creator codes are lost, because pineapple messages can be properly identified by their extension alone.
When Pineapple News or Pineapple Mail creates a new pineapple message file, the leaf name will look like this:
PM022AE1E27C72E8C0D02DCE8077B124E0.pmsg
The format is: PM, for “pineapple message,” then 32 hexadecimal characters, and finally the extension, .pmsg. The hexadecimal part is an MD5 hash of the message’s message-ID, including the less-than and greater-than delimiters.
Using this naming convention has several benefits. For one thing, it’s impossible to have two messages with the same message-ID in the same directory, because they’d both have the same leaf name. It also makes quick searches possible. Say you want to know if a message with a given message-ID exists in a particular directory. Using the message-ID, it’s possible to construct the leaf name that such a message should have, and then check the directory to see if a message by that name exists.
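The lookup described above can be sketched in a few lines of Python. This is only an illustration, not the pineapple programs' own code: the function names leaf_name_for and message_exists are hypothetical, and the sketch assumes the message-ID is hashed as UTF-8 bytes and that the digest is written in uppercase hex, as in the example leaf name.

```python
import hashlib
from pathlib import Path


def leaf_name_for(message_id: str) -> str:
    """Build the conventional leaf name for a message-ID.

    The message-ID must include its '<' and '>' delimiters.
    """
    # Assumption: the ID is hashed as UTF-8 bytes and the digest is
    # rendered in uppercase hex, matching the example leaf name.
    digest = hashlib.md5(message_id.encode("utf-8")).hexdigest().upper()
    return f"PM{digest}.pmsg"


def message_exists(directory: str, message_id: str) -> bool:
    """Return True if a conventionally named message file is in directory."""
    return (Path(directory) / leaf_name_for(message_id)).is_file()
```

Because the naming scheme is only a convention, a negative result does not prove the message is absent; a renamed file with the same message-ID would be missed.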
However, this is just a convention. The pineapple programs will still be able to deal with pineapple messages if you change their leaf names. The only hard and fast rule is that message files must have the .pmsg extension. If you change a message’s extension, the program will almost certainly lose track of it.
The X-Pineapple-State header records state information for the message. The data portion of the header is always ten characters long. As of this writing, the first six character positions (0 through 5) are in use, and the last four are reserved for future expansion. Here’s an example:
X-Pineapple-State: UCNRCxxxxx
Position 0: Message type
E E-mail message
U USENET message
x Unknown
Position 1: Message state
H Headers-only, no body
C Complete, headers and body
x Unknown
Position 2: Attachment status
A First of multi-part, single-file attachment, or multiple attachments
S Second or subsequent part of a multi-part attachment
N No attachment
x Unknown
Position 3: Unread status
U Unread
R Read
x Unknown
Position 4: Download request
N Not downloaded
R Marked for download
C Download complete
F Download attempt failed
x Unknown
Position 5: Forced charset
x No forced charset
1 ISO-8859-1
2 ISO-8859-2
... and many, many more
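As a rough sketch of how the state header could be decoded, the Python below maps each position to the codes listed above. The names parse_state and FIELDS and the result labels are hypothetical, chosen for illustration. The sketch assumes the header name is spelled exactly as in the example, and it leaves the forced-charset code undecoded because the full table is not reproduced here.

```python
STATE_PREFIX = "X-Pineapple-State:"

# One entry per position 0-4: (result key, code -> meaning).
# Any code not listed, including 'x', is reported as "unknown".
FIELDS = [
    ("message_type", {"E": "e-mail", "U": "usenet"}),
    ("message_state", {"H": "headers-only", "C": "complete"}),
    ("attachment", {"A": "first-or-only", "S": "subsequent-part", "N": "none"}),
    ("unread", {"U": "unread", "R": "read"}),
    ("download", {"N": "not-downloaded", "R": "marked",
                  "C": "complete", "F": "failed"}),
]


def parse_state(line: str) -> dict:
    """Decode an X-Pineapple-State header line into a dictionary."""
    if not line.startswith(STATE_PREFIX):
        raise ValueError("not an X-Pineapple-State header")
    data = line[len(STATE_PREFIX):].strip()
    if len(data) != 10:
        raise ValueError("state data must be ten characters long")
    result = {
        name: codes.get(data[pos], "unknown")
        for pos, (name, codes) in enumerate(FIELDS)
    }
    # Position 5: 'x' means no forced charset. Other codes are returned
    # raw because only part of the charset table is documented.
    result["forced_charset"] = None if data[5] == "x" else data[5]
    return result
```

Positions 6 through 9 are reserved, so the sketch ignores them rather than rejecting values it does not recognize.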
The X-Pineapple-Section header is only found in messages that have attachments or MIME sections. (Describing a “MIME section” is beyond the scope of this help topic. If you’re curious, see RFC 822, RFC 2045, and related documents.) The header denotes the byte offset where the section is located in the file, the section type, the encoding type, attachment filename, and so on.
Here are some examples. Say you’ve got a MIME alternative message that includes the same content twice, one section in plain text, another in HTML, both using the quoted-printable encoding. Its section headers might look like this:
X-Pineapple-Section: 00000378 text quot-print
X-Pineapple-Section: 00000745 html quot-print
Here are the headers from a message that contains several uuencode attachments:
X-Pineapple-Section: 00000880 attachment uuencode 1/1 image279.jpg
X-Pineapple-Section: 0000A2E4 attachment uuencode 1/1 image280.jpg
X-Pineapple-Section: 00013D48 attachment uuencode 1/1 image281.jpg
Here are the headers from a message that contains two MIME human-readable sections and two attachments:
X-Pineapple-Section: 00000898 text quot-print
X-Pineapple-Section: 000052C5 html quot-print
X-Pineapple-Section: 0000D068 attachment base64 1/1 124.jpeg image/jpeg
X-Pineapple-Section: 0000F061 attachment base64 1/1 104.jpeg image/jpeg
Fields in the header are separated by whitespace that is guaranteed to contain exactly one tab character.
The first field in the header is the hexadecimal byte offset in the file where the section starts.
The byte offset for uuencode and yenc attachments is well-defined. The offset given will point to the ‘b’ in the begin 666 attach.txt line, or to the ‘=’ in the =ybegin part=1... line. If this is the second or subsequent part of a multi-part uuencode attachment, the offset points to the first byte of the first uuencoded line. For MIME sections and attachments, it’s more complicated.
If a MIME message contains one or more sections, the offset will point to the beginning of the boundary line that marks the beginning of the section. A boundary line will always begin with two hyphens, so there’s an easy sanity check that can be performed. If this is an attachment in a MIME message with no boundaries (the second part of a multi-part attachment, for example), then there won’t be a boundary string to point to.
If you’re looking at the first part of a multi-part MIME attachment, there will be a set of “phantom” headers right below the message’s main headers. If there’s no boundary string, then the section offset will point to the beginning of the phantom headers.
There may not be a boundary string or phantom headers. In that case, the section offset points to the first byte of the first base64-encoded line, or to the first line of a plain-text-encoded attachment.
Whew. That’s hard. Here’s the algorithm that the pineapple programs use to advance through the beginning of a MIME section.
The second field indicates the section type, which can be one of the following:
text Human-readable text
html Human-readable HTML
attachment Text or binary attachment
The third field indicates the encoding type, which can be one of the following:
plain Plain text
quot-print Quoted-printable text or attachment
base64 MIME base64 text or attachment
uuencode Uuencoded attachment
yenc Yenc-encoded attachment
The remaining fields are present only for attachments. They will not be included in headers that describe human-readable text or HTML sections.
The fourth field indicates the part number of this message and the total number of parts in x/y format. The third of four parts would be represented as 3/4. If this is the only part, the field contents will be 1/1.
The fifth field is the leaf name of the attachment. Leading and trailing whitespace should be stripped off and ignored. The filename will never include tab characters, which will have been stripped out, if they existed in the original text. The filename will always be represented in the UTF-8 charset, regardless of what charset the rest of the message uses.
The sixth field is the attachment’s MIME type. It is included only if this is a MIME attachment, because the MIME type of yenc and uuencode attachments is unknown.
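The field layout described above can be parsed with a short Python sketch. The names Section and parse_section are hypothetical, and the code is an illustration rather than the pineapple programs' own parser. It splits on tab characters, since each separator contains exactly one tab, and strips the surrounding whitespace from every field. It also assumes the header name is spelled exactly as in the examples and tolerates a tab directly after the colon.

```python
from dataclasses import dataclass
from typing import Optional

SECTION_PREFIX = "X-Pineapple-Section:"


@dataclass
class Section:
    offset: int                      # byte offset, parsed from hexadecimal
    section_type: str                # text, html, or attachment
    encoding: str                    # plain, quot-print, base64, uuencode, yenc
    part: Optional[int] = None       # attachments only
    total: Optional[int] = None      # attachments only
    filename: Optional[str] = None   # attachments only
    mime_type: Optional[str] = None  # MIME attachments only


def parse_section(line: str) -> Section:
    """Parse one X-Pineapple-Section header line."""
    if not line.startswith(SECTION_PREFIX):
        raise ValueError("not an X-Pineapple-Section header")
    # Each separator holds exactly one tab, so splitting on tabs is safe
    # even when the filename contains spaces.
    fields = [f.strip() for f in line[len(SECTION_PREFIX):].split("\t")]
    if fields and fields[0] == "":
        # Assumption: a tab may directly follow the colon.
        fields = fields[1:]
    if len(fields) < 3:
        raise ValueError("expected at least offset, type, and encoding")
    section = Section(int(fields[0], 16), fields[1], fields[2])
    if len(fields) > 3:
        part, total = fields[3].split("/")
        section.part, section.total = int(part), int(total)
    if len(fields) > 4:
        section.filename = fields[4]
    if len(fields) > 5:
        section.mime_type = fields[5]
    return section
```

Splitting on whitespace in general would break filenames that contain spaces, which is why the sketch relies on the tab guarantee instead.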