Programming Forums


ReggaetonKing Aug 26th, 2006 2:34 AM

Starting an RSS Reader
 
I am in the planning stage of my RSS Reader project. I am creating an RSS Reader because I want to exercise my Java skills before the end of the summer and create something that people might actually use, rather than another stupid text editor. I know there are tons of open source RSS Readers with highly sophisticated options and features, but I am not really into that. I want to develop something simple and easy to use.

The main reason I am posting is that I have a problem with parsing and formatting the RSS document. I am able to get the RSS document from websites such as PFO's, but I want to parse it and format it into an easy-on-the-eyes display. I have no experience in XML, and I think this project will enhance my Java skills by letting me do something I know while learning something new.

Any help or suggestions guys??

Side Note:
I came across this website, http://xerces.apache.org. I know it's an open source XML parser for Java, but I have no clue how to use it.

Polyphemus_ Aug 26th, 2006 4:50 AM

I found this online book on the internet: http://www.cafeconleche.org/books/xmljava/

May be useful, especially chapter 5 :).

Game_Ender Aug 26th, 2006 10:41 AM

You can also check out this class in the Java 1.5 API.
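The link to the class did not survive, but a later reply in this thread refers to the suggestion as DocumentBuilder, so the sketch below assumes javax.xml.parsers.DocumentBuilder is what was meant. It loads a whole RSS 2.0 document into a DOM tree and pulls out the title of each item. The class name DomRssTitles, the method itemTitles, and the sample feed are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of any library
public class DomRssTitles
{
    // Returns the <title> text of every <item> in an RSS 2.0 document
    public static List<String> itemTitles(String rssXml)
    {
        try
        {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The whole document is read into memory as a tree here
            Document doc = builder.parse(new InputSource(new StringReader(rssXml)));

            List<String> titles = new ArrayList<String>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++)
            {
                Element item = (Element) items.item(i);
                NodeList titleNodes = item.getElementsByTagName("title");
                // An item may have no <title>, so check before reading it
                if (titleNodes.getLength() > 0)
                {
                    titles.add(titleNodes.item(0).getTextContent().trim());
                }
            }
            return titles;
        }
        catch (Exception e)
        {
            // Simplification for the sketch: report any failure as a bad argument
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    public static void main(String[] args)
    {
        // Made-up sample feed
        String sample = "<rss version=\"2.0\"><channel><title>Demo feed</title>"
            + "<item><title>First post</title><link>http://example.com/1</link></item>"
            + "<item><title>Tom &amp; Jerry</title></item>"
            + "</channel></rss>";
        for (String title : itemTitles(sample))
        {
            System.out.println(title);
        }
    }
}
```

Searching for title elements inside each item, rather than in the whole document, keeps the channel's own title out of the result. The parser also decodes entities such as &amp; before the text reaches your code.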

grimpirate Aug 26th, 2006 4:55 PM

Don't know if this helps exactly, but I used an XML document to hold the configuration info for a program of mine, and I created a class to parse that info. I couldn't quite understand how to use the Java XML classes, hehe. Here's the code anyway, but be warned: it only works with simple tags such as <this> </this>; it doesn't support attributes embedded in the tags.

Code:

import java.io.File;
import java.io.RandomAccessFile;

public class ConfigParser
{
    /////////////////////
    // Instance Fields //
    /////////////////////
   
    // The contents of the XML configuration file
    private String xmlContents = "";
   
    // Flag to denote successful creation of the parser
    public boolean SUCCESS = true;
   
    //****************************************************************************************************//
   
    /////////////////
    // Constructor //
    /////////////////
   
    // If the object is constructed this way it means the file that was passed is invalid
    public ConfigParser()
    {
        SUCCESS = false;
    }

    public ConfigParser(String fileName)
    {
        File xmlFile = new File(fileName);
        // try-with-resources closes the file even if reading fails
        try (RandomAccessFile temp = new RandomAccessFile(xmlFile, "r"))
        {
            byte[] charStream = new byte[(int)xmlFile.length()];
            // readFully keeps reading until the buffer is full; a single read() may stop early
            temp.readFully(charStream);
            // Decode all the bytes in one step (this assumes the file is UTF-8 encoded)
            xmlContents = new String(charStream, "UTF-8");
        }
        catch(Exception e)
        {
            System.err.println("ERROR: XML Parsing");
            SUCCESS = false;
        }
    }
   
    //****************************************************************************************************//
   
    /////////////
    // Methods //
    /////////////

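    // Returns the text between <tag> and the last </tag>, or null if the tag is not found
    public String getValue(String tag)
    {
        String lowerContents = xmlContents.toLowerCase();
        // Search for the complete tags so the tag name is not matched inside other text
        String openTag = "<" + tag.toLowerCase() + ">";
        String closeTag = "</" + tag.toLowerCase() + ">";

        int start = lowerContents.indexOf(openTag);
        int end = lowerContents.lastIndexOf(closeTag);
        if (start < 0 || end < start + openTag.length())
        {
            return null;
        }
        return xmlContents.substring(start + openTag.length(), end);
    }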
}

The constructor takes the name of an XML file as its input, and the getValue method takes a tag name such as "this" and returns whatever is between its tags.

ReggaetonKing Aug 26th, 2006 11:17 PM

Thanks for the help, grimpirate, but did you know that there is a similar API in Java for configuration and "Preferences"? It's the java.util.prefs package. Check it out; it's really good, and it can also save the preferences to an XML file, just as you do.

Thanks guys for the links. I am looking into these now.

alcdotcom Aug 28th, 2006 12:27 PM

I created an RSS/Atom reader last year (screen shot). You can use the DocumentBuilder class as someone suggested, or you can create your own SAX parsers using org.xml.sax.helpers.DefaultHandler. I went the SAX route so that I'd have more control over the parsing process and the memory that was used. You see, DocumentBuilder will build a DOM of everything in the XML document, even stuff you may not care about, which takes time and memory. If you create your own parser (which is not all that difficult), you can decide what gets stored. I basically created parsers for each type of feed (RSS, RDF, Atom, etc.) and had classes which represent pieces of the XML document (usually elements) to store the incoming data.

The first thing I did was look at the different specifications and their versions. RSS has a fairly complicated past and its version history is not linear. There was a branch where RSS 1.0 started using the RDF namespace. I recommend looking here and here for info on that. IBM developerWorks has some good Java SAX tutorials.

Something else to consider is whether you'll be rendering HTML that comes in via some feeds. If so, you'll find that Java's HTMLEditorKit is very inadequate, as it doesn't even completely render HTML 4.0 (at least it didn't the last time I looked). My solution was to try to find another Java-based HTML renderer. At the time JRex seemed like a good option, but I couldn't get it to work. So, I used the WebBrowser class from the JDIC project.

I noticed many caveats in building my reader. First, most feeds don't implement all (or sometimes any) of the optional elements, so you'll have to account for that. Second, there are numerous parsing issues dealing with entities (e.g. "&amp;") and making sure HTML content isn't parsed as XML elements. Hopefully this will get you started in the right direction.
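A minimal sketch of the SAX route described above, under stated assumptions: the class names SaxRssReader, FeedItem and RssHandler are invented for illustration and are not the classes from the reader mentioned in this post. The handler extends org.xml.sax.helpers.DefaultHandler, keeps only three fields per item, and treats every one of them as optional.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of any library
public class SaxRssReader
{
    // Holds only the pieces of an <item> we chose to keep
    public static class FeedItem
    {
        public String title = "";        // stays empty if the feed omits the element
        public String link = "";
        public String description = "";
    }

    static class RssHandler extends DefaultHandler
    {
        final List<FeedItem> items = new ArrayList<FeedItem>();
        private FeedItem current;        // non-null only while inside an <item>
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs)
        {
            text.setLength(0);
            if ("item".equals(qName))
            {
                current = new FeedItem();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length)
        {
            // The parser may deliver one element's text in several chunks
            // (for example around entities), so append instead of assigning
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
        {
            if (current == null)
            {
                return;                  // channel-level element, not stored
            }
            String value = text.toString().trim();
            if ("title".equals(qName)) current.title = value;
            else if ("link".equals(qName)) current.link = value;
            else if ("description".equals(qName)) current.description = value;
            else if ("item".equals(qName))
            {
                items.add(current);
                current = null;
            }
            text.setLength(0);
        }
    }

    public static List<FeedItem> parse(String rssXml)
    {
        RssHandler handler = new RssHandler();
        try
        {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(rssXml)), handler);
        }
        catch (Exception e)
        {
            // Simplification for the sketch: report any failure as a bad argument
            throw new IllegalArgumentException("Could not parse feed", e);
        }
        return handler.items;
    }

    public static void main(String[] args)
    {
        // Made-up sample feed; the description carries escaped HTML
        String sample = "<rss version=\"2.0\"><channel><title>Demo</title>"
            + "<item><title>Tom &amp; Jerry</title>"
            + "<description>&lt;b&gt;hi&lt;/b&gt;</description></item>"
            + "</channel></rss>";
        for (FeedItem item : parse(sample))
        {
            System.out.println(item.title + " | " + item.description);
        }
    }
}
```

Escaped HTML in a description arrives through characters() as plain text, so it can be handed to an HTML renderer later. HTML that a feed embeds unescaped would instead show up as extra startElement calls, which this sketch does not handle.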

P.S. that online book posted earlier looks like it has pretty good info.
P.P.S. My news reader has editable and drag-droppable file folders on the left. Something else to think about, as it was a hurdle to develop.

ReggaetonKing Aug 28th, 2006 2:41 PM

Helped me A LOT! Thanks alcdotcom!!

alcdotcom Aug 28th, 2006 5:24 PM

Here's another article you might find useful. This one addresses a way to stop the SAX parser in mid-parse. There is no built-in way to stop a SAX parser, so the gist is that you have to throw a SAXException to stop it. Why would you want to do this? Well, imagine this scenario. You load a feed, not knowing what type it is (RSS, Atom, etc.). You have a parser whose sole purpose is to determine what type of feed it is and then hand it off to an instance of the appropriate parser type. You could allow it to parse the entire document, reading only the first element to get the type, or you could abort the parsing after the type is obtained. Obviously, aborting the parse could save you loads of time, especially if you've got lots of documents to parse or a long document. Otherwise you're parsing it twice. What I did was create a subclass of SAXParseException called ParsingAbortException (doesn't matter what you call it). This way, you can easily differentiate between your exception and actual SAXParseExceptions.
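The abort technique can be sketched roughly as follows. The name ParsingAbortException comes from the post; everything else (FeedTypeSniffer, RootHandler, detect, and the returned labels) is invented for illustration. To keep the sketch short, the exception extends org.xml.sax.SAXException directly rather than a more specific subclass, and the RDF check assumes the feed uses the conventional rdf prefix.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of any library
public class FeedTypeSniffer
{
    // Thrown on purpose to stop the parser; it does not signal a real error
    static class ParsingAbortException extends SAXException
    {
        ParsingAbortException()
        {
            super("parse aborted on purpose");
        }
    }

    static class RootHandler extends DefaultHandler
    {
        String rootName;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs)
            throws SAXException
        {
            // The first element reported is the document root; nothing else is needed
            rootName = qName;
            throw new ParsingAbortException();
        }
    }

    // Returns "RSS", "RDF", "Atom" or "unknown" based on the root element
    public static String detect(String xml)
    {
        RootHandler handler = new RootHandler();
        try
        {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        }
        catch (ParsingAbortException expected)
        {
            // Our own signal: the root element was seen and parsing stopped early
        }
        catch (Exception e)
        {
            // A genuine parse or I/O problem
            throw new IllegalArgumentException("Could not parse feed", e);
        }

        if ("rss".equals(handler.rootName)) return "RSS";
        if ("rdf:RDF".equals(handler.rootName)) return "RDF";   // assumes the usual rdf prefix
        if ("feed".equals(handler.rootName)) return "Atom";
        return "unknown";
    }

    public static void main(String[] args)
    {
        System.out.println(detect("<rss version=\"2.0\"><channel/></rss>"));
    }
}
```

Catching the custom exception type before the general one is what separates the deliberate abort from a real parsing failure. Once the type is known, the document can be handed to the matching full parser.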

ReggaetonKing Aug 29th, 2006 1:35 AM

Your posts gave me a lot of direction, thanks man!

alcdotcom Aug 29th, 2006 8:54 AM

No problem. I've been there, and I thought I'd try to save you a little time. I haven't looked at the new XML handling in Java 1.6 (Mustang), but I've heard that it's improved.


All times are GMT -5.