Summary
How form data can be corrupted
As I wrote in the previous article on some tips for JSF character encoding, JSF mostly depends on javax.faces.request.charset session attribute to determine a correct character encoding to decode form data. JSF 1.1 and 1.2 are designed this way so that you can use existing mechanism such as JSP page directive or servlet filter to change character encoding. I think people usually just set charset in JSP page directive. Then, JSF seems to take care of the rest.
My application uses client side state saving, and view-scoped beans (using t:saveState for request-scoped managed beans.) This is the best way because this way I can let users use multiple browser windows to access the same JSF form at once. If you use session-scoped beans, beans are shared between accesses from different browser windows, and it messes things up. I thought my application does not depend on sessions in any way, so I set short session-timeout. But there was a pitfall!
Users sometimes leave JSF page open, and later come back and do updates on the page. The session has been expired by that time, and JSF falls back to use ISO-8859-1 in decoding form data, when it should be using, for example, Windows-31J for Japanese characters. Corrupted data are then stored in a backing bean, leading to corrupted characters on the response page, or corrupted data can even be saved on a database.
So what to do?
- Set long session-timeout
I don't like this idea because I wanted to set it short for the first place for some reasons.
- Set up a servlet filter to set request encoding
I feel it waste because then JSF spec would look meaningless, and if you changed charset in JSP page directive, you would also have to modify the filter. If they don't match, it will messes things up. A servlet filter sets a request character encoding first, then JSF sets a request character encoding when found in a session, then actual encoding used to decode the form data may be the former or the latter. (Note: The servlet will use the last request character encoding set to parse form data, on an initial call to request.getParameter.)
- Add a logic to disallow form data if a session is null or new
This would make it worse in terms of application usability. What would the error message say? "You were away too long, so I forgot the character encoding?"