Aug 28th, 2006, 11:27 AM   #6
alcdotcom
Programmer
 
Join Date: Jan 2006
Location: Dallas, TX
Posts: 49
Quote:
Originally Posted by reggaeton_king
I am in the planning process of my RSS Reader project. I am creating an RSS Reader because I want to exercise my Java skills for the end of the summer and create something that maybe people will actually use besides a stupid text editor. I know there are tons of open source RSS Readers with highly sophisticated options and features but I am not into all that. I want to develop something simple and easy to use.

The main reason I am posting is that I have a problem with parsing and formatting the RSS document. I am able to get the RSS document from websites such as PFO's, but I want to parse it and format it into an easy-on-the-eyes display. I have no experience in XML, and I think this project will enhance my Java skills by letting me do something I know while learning something new.

Any help or suggestions guys??

Side Note:
I came across this website, http://xerces.apache.org. I know it's an open source XML parser for Java but I have no clue how to use it.
I created an RSS/Atom reader last year (screen shot). You can use the DocumentBuilder class as someone suggested, or you can create your own SAX parsers by extending org.xml.sax.helpers.DefaultHandler. I went the SAX route so that I'd have more control over the parsing process and the memory that was used. You see, DocumentBuilder will build a DOM of everything in the XML document - even stuff you may not care about - which takes time and memory. If you create your own parser (which is not all that difficult), you can decide what gets stored. I basically created parsers for each type of feed (RSS, RDF, Atom, etc.) and had classes which represent pieces of the XML document (usually elements) to store the incoming data.

The first thing I did was look at the different specifications and their versions. RSS has a fairly complicated past and its version history is not linear. There was a branch where RSS 1.0 started using the RDF namespace. I recommend looking here and here for info on that. IBM developerWorks has some good Java SAX tutorials.

Something else to consider is whether you'll be rendering HTML that comes in via some feeds. If so, you'll find that Java's HTMLEditorKit is very inadequate, as it doesn't even completely render HTML 4.0 (at least it didn't the last time I looked). My solution was to try to find another Java-based HTML renderer. At the time JRex seemed like a good option, but I couldn't get it to work. So, I used the WebBrowser class from the JDIC project.

I noticed many caveats in building my reader. First, most feeds don't implement all (or sometimes any) of the optional elements, so you'll have to account for that. Second, there are numerous parsing issues dealing with entities (e.g. "&amp;") and making sure HTML content isn't parsed as XML elements.

Hopefully this will get you started in the right direction.
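To make the SAX route a bit more concrete, here is a minimal sketch of the idea, not the code from my reader. It assumes a plain RSS 2.0-style feed; the names RssSaxSketch, Item, RssHandler and parse are made up for illustration. The handler stores only the title, link and description of each item and ignores everything else. It leaves missing optional elements as null, and it appends in characters() because the parser may deliver text in several chunks (for example around an entity such as "&amp;").

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class RssSaxSketch {

    /** Hypothetical holder for the few fields we decide to keep. */
    public static class Item {
        public String title;        // any of these may stay null:
        public String link;         // many feeds omit optional elements
        public String description;
    }

    /** Stores only selected children of <item>; all other elements are skipped. */
    static class RssHandler extends DefaultHandler {
        final List<Item> items = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private Item current;       // non-null only while inside an <item>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("item".equals(qName)) {
                current = new Item();
            }
            text.setLength(0);      // start collecting text for the new element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Can be called more than once per element, so append rather than assign.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (current == null) {
                return;             // channel-level metadata: not stored in this sketch
            }
            String value = text.toString().trim();
            switch (qName) {
                case "title":       current.title = value; break;
                case "link":        current.link = value; break;
                case "description": current.description = value; break;
                case "item":        items.add(current); current = null; break;
                default:            break;  // element we do not care about
            }
        }
    }

    /** Parses feed XML held in a string and returns the items that were kept. */
    public static List<Item> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            RssHandler handler = new RssHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.items;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rss><channel><title>Feed</title>"
                + "<item><title>Tom &amp; Jerry</title><link>http://example.com/1</link>"
                + "<description><![CDATA[<p>Hello</p>]]></description></item>"
                + "<item><title>No optional elements</title></item>"
                + "</channel></rss>";
        for (Item item : parse(xml)) {
            System.out.println(item.title + " | " + item.link + " | " + item.description);
        }
    }
}
```

Note that HTML wrapped in CDATA (or escaped with entities) arrives through characters() as ordinary text, so it is not treated as XML elements. A real reader would need a separate handler per feed type (RDF and Atom use different element names and namespaces), which this sketch does not attempt.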

P.S. that online book posted earlier looks like it has pretty good info.
P.P.S. My news reader has editable and drag-droppable file folders on the left. Something else to think about, as it was a hurdle to develop.
__________________
Java Blog