HTTP Header Content Type and Encodings

Sorry about this. It was thrown together quickly, more as notes to myself. Maybe you can use it?

I needed this info for one of my websites. I wanted to add a feature for downloading PDFs from my Google App Engine service, so I had to dig around on Google to find this material. First up are some references I found that helped me understand the complexity of the task. The goal is to set the right HTTP headers on the response going to a client such as a web browser. The Content-Type header tells the receiving client what the payload contains, so that the client can properly render or otherwise deal with it. If no content type is indicated, the browser will guess the type of content. For text content, the character encoding is given by the charset parameter of Content-Type. Another important header is Content-Disposition, which tells the client whether to display the payload inline or offer it as a download, and can suggest a file name. See the wiki reference below for more on MIME.

Servlet Tutorial on Session Tracking

Servlet-Tutorial-Session-Tracking

Servlet Response Headers Tutorial

http://www.apl.jhu.edu/~hall/java/Servlet-Tutorial/Servlet-Tutorial-Response-Headers.html

How to Serve A PDF From a Java Servlet

This Java sample works but runs as slow as water uphill, most likely because it copies the file one byte at a time. Typically you would run it on a server somewhere on the internet, say Amazon's cloud or perhaps Google's, or at least a service that supports running a Java JVM. how-do-i-serve-up-a-pdf-from-a-servlet

Tutorial on Servlet Content Types

Setting_content-type_utf-8
A must-read for understanding servlet content-type declarations. It's not as easy as you think!

Unicode and Character Sets

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Joel on Software http://www.joelonsoftware.com/articles/Unicode

RFC2045 (MIME) + Base64 Content-Transfer-Encoding

RFC 2045 (1996), the first part of the MIME specification. It superseded the original RFC 1341 from 1992:
http://www.ietf.org/rfc/rfc2045.txt

How to Create a Custom Jquery Plugin

http://www.ibm.com/developerworks/library/wa-jqplugin/wa-jqplugin-pdf.pdf
– courtesy IBM developerworks

Wiki MIME Content Disposition

http://en.wikipedia.org/wiki/MIME#Content-Disposition

Oracle's Servlet Specification Javadocs

http://docs.oracle.com/javaee/1.4/api/javax/servlet/ServletResponse.html#setContentType(java.lang.String)

http://docs.oracle.com/javaee/1.3/api/javax/servlet/ServletResponse.html

Streaming Large Files In A Java Servlet

http://stackoverflow.com/questions/55709/streaming-large-files-in-a-java-servlet


import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ReaderServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        doIt(request, response);
    }

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        doIt(request, response);
    }

    private void doIt(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // Fill in this bit with the PDF file to be read.
        String pdfFileName = "somefilename.pdf";
        // Resolve the file relative to the root of the web application.
        String contextPath = getServletContext().getRealPath("/");
        File pdfFile = new File(contextPath, pdfFileName);

        response.setContentType("application/pdf");
        // Quote the file name so names containing spaces survive intact.
        response.addHeader("Content-Disposition", "attachment; filename=\"" + pdfFileName + "\"");
        response.setContentLength((int) pdfFile.length());

        // Copy in 8 KB chunks instead of one byte at a time, and always close the file.
        byte[] buffer = new byte[8192];
        try (InputStream fileInputStream = new FileInputStream(pdfFile)) {
            OutputStream responseOutputStream = response.getOutputStream();
            int bytesRead;
            while ((bytesRead = fileInputStream.read(buffer)) != -1) {
                responseOutputStream.write(buffer, 0, bytesRead);
            }
        }
    }
}

It's possible to have a servlet serve up PDF content by setting the content type of the servlet response to the 'application/pdf' MIME type via response.setContentType("application/pdf"). The listing above demonstrates this.

The servlet (TestServlet in the original article, ReaderServlet in the listing above) is mapped to a URL such as /test in your web.xml file. When a browser request hits the servlet, it locates the PDF file (test.pdf in the original article, somefilename.pdf above) in the root of the web directory. It sets the response content type to 'application/pdf', specifies that the response is an attachment, and sets the response content length. Following that, it writes the contents of the PDF file to the response output stream.

If we hit the servlet, the browser may ask us if we'd like to open or save the PDF file. Some browsers ask, others do not.

This technique can be useful in a variety of ways. For example, PDF content can be generated dynamically and returned to a user via the response output stream without ever needing to create an actual file in the file system. In addition, having a servlet serve up PDF content can restrict access to a PDF file in the file system, since a servlet can determine who should have access to a particular PDF file.
The original article is courtesy of Deron Eriksson: http://www.avajava.com/tutorials/lessons/how-do-i-serve-up-a-pdf-from-a-servlet.html
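
The point about dynamically generated content can be tried out without a servlet container. What follows is a minimal sketch of my own, not code from the article above: StreamCopy and copy are hypothetical names, and a byte array stands in for PDF bytes produced in memory. In a real servlet the OutputStream would be response.getOutputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: copies any InputStream to any OutputStream in chunks.
final class StreamCopy {

    // Copies everything from in to out using an 8 KB buffer and
    // returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead);
            total += bytesRead;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for PDF bytes generated in memory (not a valid PDF).
        byte[] generated = "%PDF-1.4 placeholder content".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(generated), sink);
        System.out.println("copied " + copied + " bytes");
    }
}
```

Because the source is just an InputStream, the same loop works whether the bytes come from a file, a database, or a PDF library writing into memory.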

Servlet Javadocs

http://docs.oracle.com/javaee/1.4/api/javax/servlet/ServletResponse.html

Wiki for MIME

The Wikipedia article on MIME (Multipurpose Internet Mail Extensions) discusses these issues more fully: http://en.wikipedia.org/wiki/MIME. There we can read up on the different MIME headers, versions, content IDs, content dispositions and transfer encodings, along with the more complex topic of multi-part messages.

IANA manages the list of known MIME media types, see here: http://www.iana.org/assignments/media-types/index.html
Wikipedia on internet media types: http://en.wikipedia.org/wiki/Internet_media_type

The Java servlet response method setContentType takes a MIME type, optionally followed by a charset parameter:
response.setContentType("application/json");
response.setContentType("text/html;charset=UTF-8");
response.setContentType("text/plain");

or, for typical images, the response headers look something like this:
Content-Type: image/jpeg
Content-Disposition: attachment; filename=santa.jpeg
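
The charset parameter matters because the same text becomes different bytes under different encodings, and the client decodes the body with whatever charset the header names. Here is a minimal sketch using only standard JDK classes; CharsetDemo and contentType are hypothetical names made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDemo {

    // Builds a Content-Type value such as "text/html;charset=UTF-8".
    static String contentType(String mediaType, Charset charset) {
        return mediaType + ";charset=" + charset.name();
    }

    public static void main(String[] args) {
        // "cafe" with an accented final e, which is outside ASCII.
        String text = "caf\u00e9";

        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        // The accented letter takes two bytes in UTF-8 but one in ISO-8859-1.
        System.out.println(contentType("text/html", StandardCharsets.UTF_8)
                + " -> " + utf8.length + " bytes");
        System.out.println(contentType("text/html", StandardCharsets.ISO_8859_1)
                + " -> " + latin1.length + " bytes");

        // Decoding UTF-8 bytes as ISO-8859-1 garbles the accented letter.
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println("decoded correctly: " + text.equals(wrong));
    }
}
```

If the header says one charset and the bytes were written in another, the browser shows the garbled version, which is why the declaration and the actual encoding have to agree.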


Here is a short list of common media types.

Type Application

For Multipurpose Files

application/atom+xml: Atom feeds

application/ecmascript: ECMAScript/JavaScript; Defined in RFC 4329 (equivalent to application/javascript but with stricter processing rules)

application/EDI-X12: EDI X12 data; Defined in RFC 1767

application/EDIFACT: EDI EDIFACT data; Defined in RFC 1767

application/json: JavaScript Object Notation JSON; Defined in RFC 4627 (since replaced by RFC 8259)

application/javascript: ECMAScript/JavaScript; Defined in RFC 4329 (equivalent to application/ecmascript but with looser processing rules). It is not accepted in IE 8 or earlier; text/javascript is accepted there, although RFC 4329 defined it as obsolete. RFC 9239 later reversed this, making text/javascript the registered type and application/javascript obsolete. The "type" attribute of the <script> tag in HTML5 is optional, and in practice omitting the media type of JavaScript programs is the most interoperable solution, since all browsers have always assumed the correct default even before HTML5.

application/octet-stream: Arbitrary binary data. Generally speaking this type identifies files that are not associated with a specific application. Contrary to past assumptions by software packages such as Apache this is not a type that should be applied to unknown files. In such a case, a server or application should not indicate a content type, as it may be incorrect, but rather, should omit the type in order to allow the recipient to guess the type.

application/ogg: Ogg, a multimedia bitstream container format; Defined in RFC 5334

application/pdf: Portable Document Format, PDF has been in use for document exchange on the Internet since 1993; Defined in RFC 3778

application/postscript: PostScript; Defined in RFC 2046

application/rdf+xml: Resource Description Framework; Defined by RFC 3870

application/rss+xml: RSS feeds

application/soap+xml: SOAP; Defined by RFC 3902

application/font-woff: Web Open Font Format; since replaced by font/woff, defined in RFC 8081 (application/x-font-woff was used before the type was registered)

application/xhtml+xml: XHTML; Defined by RFC 3236

application/xml-dtd: DTD files; Defined by RFC 3023

application/xop+xml: XOP

application/zip: ZIP archive files

application/x-gzip: Gzip

Type Audio

For Audio

audio/basic: mulaw audio at 8 kHz, 1 channel; Defined in RFC 2046

audio/L24: 24bit Linear PCM audio at 8-48kHz, 1-N channels; Defined in RFC 3190

audio/mp4: MP4 audio

audio/mpeg: MP3 or other MPEG audio; Defined in RFC 3003

audio/ogg: Ogg Vorbis, Speex, Flac and other audio; Defined in RFC 5334

audio/vorbis: Vorbis encoded audio; Defined in RFC 5215

audio/x-ms-wma: Windows Media Audio; Documented in MS kb288102

audio/x-ms-wax: Windows Media Audio Redirector; Documented in MS kb288102

audio/vnd.rn-realaudio: RealAudio; Documented in RealPlayer Help

audio/vnd.wave: WAV audio; Defined in RFC 2361

audio/webm: WebM open media format

Type Image

image/gif: GIF image; Defined in RFC 2045 and RFC 2046

image/jpeg: JPEG JFIF image; Defined in RFC 2045 and RFC 2046

image/pjpeg: JPEG JFIF image; Internet Explorer; Listed in ms775147(v=vs.85) – Progressive JPEG, initiated before global browser support for progressive JPEGs (Microsoft and Firefox).

image/png: Portable Network Graphics; Defined in RFC 2083

image/svg+xml: SVG vector image; Defined in SVG Tiny 1.2 Specification Appendix M

image/tiff: Tag Image File Format (only for Baseline TIFF); Defined in RFC 3302

image/vnd.microsoft.icon: ICO image

Type Multipart

For archives and other objects with more than one part.

multipart/mixed: MIME Email; Defined in RFC 2045 and RFC 2046

multipart/alternative: MIME Email; Defined in RFC 2045 and RFC 2046

multipart/related: MIME Email; Defined in RFC 2387 and used by MHTML (HTML mail)

multipart/form-data: MIME Webform; Defined in RFC 2388

multipart/signed: Defined in RFC 1847

multipart/encrypted: Defined in RFC 1847

Type Text

For Human-Readable Text and Source Code

text/cmd: commands; subtype resident in Gecko browsers like Firefox 3.5

text/css: Cascading Style Sheets; Defined in RFC 2318

text/csv: Comma-separated values; Defined in RFC 4180

text/html: HTML; Defined in RFC 2854

text/javascript: JavaScript; RFC 4329 declared it obsolete in order to discourage its usage in favor of application/javascript, but RFC 9239 reversed that decision and text/javascript is now the registered type. It is allowed in HTML 4 and 5 and, unlike application/javascript, has cross-browser support. The "type" attribute of the <script> tag in HTML5 is optional and there is no need to use it at all, since all browsers have always assumed the correct default (even in HTML 4, where it was required by the specification).

text/plain: Textual data; Defined in RFC 2046 and RFC 3676

text/vcard: vCard (contact information); Defined in RFC 6350

text/xml: Extensible Markup Language; Defined in RFC 3023

Type Video

For video

video/mpeg: MPEG-1 video with multiplexed audio; Defined in RFC 2045 and RFC 2046

video/mp4: MP4 video; Defined in RFC 4337

video/ogg: Ogg Theora or other video (with audio); Defined in RFC 5334

video/quicktime: QuickTime video

video/webm: WebM Matroska-based open media format

video/x-matroska: Matroska open media format

video/x-ms-wmv: Windows Media Video; Documented in Microsoft KB 288102

video/x-flv: Flash video (FLV files)
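
One way to put the list above to work is a small lookup from file extension to media type. This is only a sketch: MediaTypes and forFileName are hypothetical names, and the extension-to-type pairings are my own assumption, since the list names the types but not the file extensions. Following the advice under application/octet-stream, an unknown extension yields no type at all rather than a guess.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class MediaTypes {

    // Assumed extension-to-type pairings; the types themselves come from the list above.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("pdf", "application/pdf");
        BY_EXTENSION.put("json", "application/json");
        BY_EXTENSION.put("zip", "application/zip");
        BY_EXTENSION.put("gif", "image/gif");
        BY_EXTENSION.put("jpg", "image/jpeg");
        BY_EXTENSION.put("jpeg", "image/jpeg");
        BY_EXTENSION.put("png", "image/png");
        BY_EXTENSION.put("css", "text/css");
        BY_EXTENSION.put("csv", "text/csv");
        BY_EXTENSION.put("html", "text/html");
        BY_EXTENSION.put("txt", "text/plain");
        BY_EXTENSION.put("mp4", "video/mp4");
    }

    // Returns the media type for a file name, or empty when the extension
    // is missing or unknown, so the caller can leave Content-Type unset.
    static Optional<String> forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(extension));
    }

    public static void main(String[] args) {
        System.out.println(forFileName("santa.jpeg").orElse("(no Content-Type header)"));
        System.out.println(forFileName("mystery.bin").orElse("(no Content-Type header)"));
    }
}
```

In a servlet, the result would feed response.setContentType only when a type is present.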

XML Use On The Internet

XML is described in more detail in this Wikipedia article: http://en.wikipedia.org/wiki/XML#Use_on_the_Internet
The design goals of XML emphasize simplicity, generality, portability and usability over the Internet.

It is a textual data format with Unicode support for several world languages. It is widely used for the representation of data structures and as a data transport layer between applications written in many programming languages.

XML dialects include RSS, Atom, SOAP and XHTML, plus the document formats of several office products such as Microsoft Office, OpenOffice and LibreOffice, plus XMPP, a communications protocol used for chat sessions.

Content-Disposition

The original MIME specifications only described the structure of mail messages. They did not address the issue of presentation styles. The content-disposition header field was added in RFC 2183 to specify the presentation style. A MIME part can have:

  • an inline content-disposition, which means that it should be automatically displayed when the message is displayed, or
  • an attachment content-disposition, in which case it is not displayed automatically and requires some form of action from the user to open it.
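
The two dispositions above map directly onto the header value a servlet would send. Below is a minimal sketch that builds that value as a plain string; ContentDisposition and headerValue are hypothetical names, and the sketch assumes an ASCII file name (non-ASCII names need the extended filename* form, which is not covered here).

```java
final class ContentDisposition {

    // Builds a header value such as: attachment; filename="santa.jpeg"
    static String headerValue(boolean attachment, String fileName) {
        String type = attachment ? "attachment" : "inline";
        // Escape backslashes and double quotes so the quoted file name stays valid.
        String escaped = fileName.replace("\\", "\\\\").replace("\"", "\\\"");
        return type + "; filename=\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // Offer the image as a download.
        System.out.println("Content-Disposition: " + headerValue(true, "santa.jpeg"));
        // Ask the client to display the PDF in place.
        System.out.println("Content-Disposition: " + headerValue(false, "my report.pdf"));
    }
}
```

In a servlet the result would be passed to response.setHeader("Content-Disposition", ...). Quoting the file name is what keeps a name with spaces from being cut off at the first space.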

Content-Transfer-Encoding

In June 1992, MIME (RFC 1341, since made obsolete by RFC 2045) defined a set of methods for representing binary data in ASCII text format. The Content-Transfer-Encoding MIME header has a two-sided significance:

It indicates whether or not a binary-to-text encoding scheme has been used on top of the original encoding as specified within the Content-Type header:

  • If such a binary-to-text encoding method has been used, it states which one.
  • If not, it provides a descriptive label for the format of content, with respect to the presence of 8 bit or binary content.

The RFC and the IANA’s list of transfer encodings define the values. See: http://www.iana.org/assignments/transfer-encodings/transfer-encodings.xml
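
Base64 is the most common of these binary-to-text encodings. The sketch below uses the JDK's java.util.Base64 MIME encoder, which wraps its output at 76 characters per line with CRLF line breaks; TransferEncodingDemo and its two helper methods are hypothetical names for illustration.

```java
import java.util.Arrays;
import java.util.Base64;

final class TransferEncodingDemo {

    // Encodes binary data as MIME-style base64 text.
    static String encodeBase64(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    // Decodes MIME-style base64 text back to the original bytes.
    static byte[] decodeBase64(String encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // Some arbitrary binary content, including bytes outside ASCII.
        byte[] binary = new byte[100];
        for (int i = 0; i < binary.length; i++) {
            binary[i] = (byte) (i * 7);
        }

        String encoded = encodeBase64(binary);
        System.out.println("Content-Transfer-Encoding: base64");
        System.out.println();
        System.out.println(encoded);
        System.out.println("round trip ok: " + Arrays.equals(binary, decodeBase64(encoded)));
    }
}
```

Note that this encoding belongs to MIME messages such as email; an HTTP response from a servlet sends the raw bytes and does not need it.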
