
Thursday, February 19, 2009

How to strip HTML Tags





The output is:

Honolulu Tue 5:05 AM
Washington DC Tue 10:05 AM
Oslo Tue 4:05 PM


The important parts:

String stripH = dataH.replaceAll("<.*?>"," ");

The call replaceAll("<.*?>"," ") strips out all HTML tags
by searching for anything enclosed in angle brackets ( < > ) and replacing
it with a blank space ( " " ).

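The regular expression <.*?> matches an opening angle bracket, then as
few characters as possible, up to the next closing angle bracket.
The dot ( . ) means any character (by default, any character except a
line terminator). The quantifier ( * ) means zero or more. The ( ? )
qualifies the quantifier ( * ) by making it match as few characters as
possible, so each match stops at the first > instead of running on to
the last one. The ( *? ) is also called a 'reluctant' quantifier.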
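
To see why the reluctant form matters, here is a small sketch that runs the same input through <.*?> and through the greedy <.*>. The class name, method names and sample string are made up for illustration and are not from the original program.

```java
public class ReluctantDemo {
    // Reluctant pattern from the post: each match ends at the first '>'.
    static String stripReluctant(String html) {
        return html.replaceAll("<.*?>", " ");
    }

    // Greedy variant, for comparison only: a match runs to the last '>' on the line.
    static String stripGreedy(String html) {
        return html.replaceAll("<.*>", " ");
    }

    public static void main(String[] args) {
        // Sample input is an assumption, loosely modelled on the output shown earlier.
        String sample = "<b>Oslo</b> Tue <i>4:05 PM</i>";
        // Brackets make the leading and trailing spaces visible.
        System.out.println("[" + stripReluctant(sample) + "]");
        System.out.println("[" + stripGreedy(sample) + "]");
    }
}
```

With the reluctant pattern the text between the tags survives, with a space where each tag used to be. The greedy pattern treats everything from the first < to the last > as one match, so the whole sample collapses to a single space.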

So, it will catch all the following:


and replaces each one with a blank space.


For J2ME, use this:




Assuming:




Call the method as follows:

String stripped = doStripHtml(data);

Note that you will still need to modify the code before you can use it
in your HTML Parser application. But I leave that to you to do.
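
For reference, here is a minimal sketch of how a regex-free doStripHtml could look on a platform without java.util.regex, which I believe is the case for CLDC-based J2ME. It is only an illustration under that assumption, not the original listing. The class name is hypothetical, and the method uses only String and StringBuffer.

```java
public class StripHtmlSketch {
    // Hypothetical stand-in for doStripHtml: replaces every <...> run
    // with a single space by scanning the string character by character.
    static String doStripHtml(String data) {
        StringBuffer out = new StringBuffer(data.length());
        boolean inTag = false; // true while we are between '<' and '>'
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (!inTag && c == '<') {
                inTag = true;          // tag starts, stop copying
            } else if (inTag && c == '>') {
                inTag = false;         // tag ends, leave one space behind
                out.append(' ');
            } else if (!inTag) {
                out.append(c);         // ordinary text, copy it through
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + doStripHtml("<b>Oslo</b> Tue") + "]");
    }
}
```

This scan differs from the regular expression in two ways: it also removes tags that span several lines, and a stray < with no closing > swallows the rest of the text. Whether either matters depends on the HTML you feed it.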

Below is yet another version: