Encode/Decode HTML in Java

It is often necessary to escape special HTML characters in user input to prevent cross-site scripting (XSS) attacks.

Initially I thought the JDK would provide a method for this somewhere, like the htmlentities() function in PHP, but I failed to find one. All I found was a class called “URLEncoder”, which I don’t think can do this job.
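To see why URLEncoder is the wrong tool here, it helps to look at what it produces. The snippet below is only a quick sketch using the JDK's java.net.URLEncoder and URLDecoder; the class name UrlEncodingDemo and its helper methods are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Illustration only: UrlEncodingDemo and its helper names are hypothetical.
final class UrlEncodingDemo {

    // Form-encodes text for use in a URL query string.
    static String urlEncode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Reverses urlEncode.
    static String urlDecode(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String userInput = "<b>Tom & Jerry</b>";
        String encoded = urlEncode(userInput);
        System.out.println(encoded);            // percent codes, spaces become '+'
        System.out.println(urlDecode(encoded)); // back to the original text
    }
}
```

The encoded string is made of percent codes and plus signs, which is what a URL query string needs. Dropped into a page, it would show up as literal percent codes instead of the text the user typed, so it is not a substitute for HTML entities such as &amp;lt; and &amp;amp;.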

I don’t want to reinvent the wheel, as I believe there must be some Java packages available that do this job. Googling “java encode html” didn’t lead me straight to the right Java package (at least not the one I’d like to use).

After a while, I finally found a package I’d like to use. It’s from the Apache Commons project and is called “Commons Lang”. The method “StringEscapeUtils.escapeHtml(…)” does the encoding, while its counterpart, unescapeHtml, does the decoding. So I don’t have to write my own method… :)
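For anyone curious what this kind of escaping boils down to, here is a minimal sketch using only the JDK. It is not the Commons Lang implementation: the class HtmlEscapeSketch and its methods are made up for illustration, and it covers only five characters, whereas a library method like escapeHtml handles many more entities.

```java
// Minimal sketch of HTML escaping, NOT the Commons Lang implementation.
// HtmlEscapeSketch and its method names are hypothetical.
final class HtmlEscapeSketch {

    // Replaces the five characters that are significant in HTML markup.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Reverses escapeHtml. "&amp;" is replaced last so that
    // "&amp;lt;" decodes to "&lt;" and not all the way to "<".
    static String unescapeHtml(String input) {
        return input.replace("&lt;", "<")
                    .replace("&gt;", ">")
                    .replace("&quot;", "\"")
                    .replace("&#39;", "'")
                    .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String userInput = "<script>alert(\"xss\")</script>";
        String escaped = escapeHtml(userInput);
        System.out.println(escaped);               // safe to embed in HTML text
        System.out.println(unescapeHtml(escaped)); // original input again
    }
}
```

In real code I would still reach for the library call, since a hand-rolled version like this is easy to get subtly wrong and knows nothing about the rest of the named entities.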

7 Responses to “Encode/Decode HTML in Java”

  1. Jawed Ali says:

    What about this class: java.net.URLDecoder

    Check out the Javadocs; I believe it has been there for a long time.
    http://java.sun.com/j2se/1.4.2/docs/api/java/net/URLDecoder.html

  2. robin says:

    URLDecoder and URLEncoder only deal with URLs. For example, URLEncoder encodes a space as “+”. However, escaping HTML content is a different thing.

  3. Jesse says:

    FYI: I think JTidy is another way to do what you want.

  4. Meher says:

    thank you robin :) thank you alls lol

  5. jukty says:

    JTidy is evil,
    it also escapes spaces, numbers, and everything else that is not a letter, so the resulting string is totally unreadable.
    It also escapes to the numeric form like &#xxxx; rather than to standard HTML entities like &amp;.
    So use only Apache, it rules!

  6. jukty says:

    I mean “… to standard HTML entities like &amp; …”

  7. Nick says:

    You should also try using the JSTL standard c:out tag. The c:out tag has an attribute escapeXml, which can be set to true and will escape >, <, &, ", and '.

    Example:
