Programming Forums

fahlyn Jan 11th, 2008 3:08 PM

Help Parsing 1.2 Gig XML File
 
Here is my situation: I currently get a very large XML file each month that contains data I need to parse and load into a database. I just finished a very complex system to handle the data in this file, but I've run into a problem.

The problem is that I'm forced to open the file twice. There is a date in the header of the file that I need to grab and look at before I know if I want to parse the file. Currently, I'm using Dom4j to parse the file. I'm setting a callback handler to trigger when it finds my date node. What I'd like to be able to do is stop the parser once I find the date node. Currently, I've got this code:

Code:

        public void handleEffectiveDate() {
                final String asOfDatePath = "/Package/PackageHeader/AsOfDate";
                reader.addHandler(asOfDatePath, new ElementHandler() {
                        public void onStart(ElementPath path) {}

                        public void onEnd(ElementPath path) {
                                Element asOfDate = path.getCurrent();
                                // AsOfDate is expected as three dash-separated integers
                                String[] date = asOfDate.getStringValue().split("-");
                                DataWarehouse32Assembler assembler =
                                        (DataWarehouse32Assembler) getBatchUploadProcesser().getAssembler();
                                assembler.setUploadEffectiveDate(
                                        Integer.parseInt(date[0]),
                                        Integer.parseInt(date[1]),
                                        Integer.parseInt(date[2]));
                                System.out.println("Data Effective Date saved in assembler!");
                                //there can be only one, like the highlander, so once you find it stop looking
                                reader.removeHandler(asOfDatePath);
                                // detach the element from the tree so it is not kept in memory
                                asOfDate.detach();
                        }
                });
        }


I was hoping that removing the only handler would stop the parser, but it very obviously does not.

Any recommendations?

null_ptr0 Jan 11th, 2008 4:12 PM

Re: Help Parsing 1.2 Gig XML File
 
XStream and Properties.load()

fahlyn Jan 11th, 2008 9:20 PM

Re: Help Parsing 1.2 Gig XML File
 
How in the world would that help me?

Dameon Jan 11th, 2008 9:46 PM

Re: Help Parsing 1.2 Gig XML File
 
I'm not sure that reading only part of a document is an intended use of Dom4j. You might try throwing an exception and catching it where you call the method to begin parsing. This is just a shot in the dark.
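The throw-and-catch idea can be sketched as follows. This is only an illustration under assumptions: it uses the JDK's built-in SAX parser rather than Dom4j so that it runs without extra libraries, and the names `AsOfDateProbe`, `DateFoundException`, and `readAsOfDate` are made up for the example. The element path is the one from the first post.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: reads /Package/PackageHeader/AsOfDate, then aborts the parse. */
final class AsOfDateProbe {

    /** Control-flow signal thrown once the date has been read; not a real error. */
    static final class DateFoundException extends SAXException {
        final String date;

        DateFoundException(String date) {
            super("AsOfDate found");
            this.date = date;
        }
    }

    /** Returns the AsOfDate text, or null if the document has no such element. */
    static String readAsOfDate(InputStream in) {
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder path = new StringBuilder(); // current element path
            private final StringBuilder text = new StringBuilder(); // text since last start tag

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                path.append('/').append(qName);
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if (path.toString().equals("/Package/PackageHeader/AsOfDate")) {
                    // Abort here: nothing after the date element is parsed.
                    throw new DateFoundException(text.toString().trim());
                }
                path.setLength(path.length() - qName.length() - 1);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
            return null; // reached the end of the document without seeing the element
        } catch (DateFoundException found) {
            return found.date; // expected path when the header contains the date
        } catch (SAXException | ParserConfigurationException | IOException e) {
            throw new IllegalStateException("could not read header", e);
        }
    }
}
```

Calling `AsOfDateProbe.readAsOfDate(new FileInputStream(file))` would return the date string after reading only the start of the file. With Dom4j the same pattern would presumably mean throwing an unchecked exception from the `ElementHandler` and catching it around the call that starts the read; how Dom4j wraps or propagates such an exception is not something this sketch verifies.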

null_ptr0 Jan 11th, 2008 9:59 PM

Re: Help Parsing 1.2 Gig XML File
 
Quote:

Originally Posted by Dameon (Post 139534)
I'm not sure that reading only part of a document is an intended use of Dom4j. You might try throwing an exception and catching it where you call the method to begin parsing. This is just a shot in the dark.

I don't think throwing exceptions as a kind of event-handling mechanism is a good thing for the JVM, or anything.

Dameon Jan 11th, 2008 11:10 PM

Re: Help Parsing 1.2 Gig XML File
 
It's ugly, but might work. I wouldn't think that one exception per undesirable file is a serious performance concern, either. There simply doesn't seem to be a clean or official way to go about it.
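For comparison, outside of Dom4j a pull parser leaves the caller in control of when to stop, so no exception is needed to bail out. The sketch below uses StAX from the JDK (`javax.xml.stream`) to read only the header date; it is a swapped-in technique, not something Dom4j offers, and `StaxHeaderProbe` and `readAsOfDate` are hypothetical names. The element path is the one from the first post.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: pulls events only until the header date has been read. */
final class StaxHeaderProbe {

    /** Returns the AsOfDate text, or null if the document has no such element. */
    static String readAsOfDate(InputStream in) {
        XMLStreamReader reader = null;
        try {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            StringBuilder path = new StringBuilder(); // current element path
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    path.append('/').append(reader.getLocalName());
                    if (path.toString().equals("/Package/PackageHeader/AsOfDate")) {
                        // Returning here simply stops pulling events; the rest
                        // of the stream is never parsed.
                        return reader.getElementText().trim();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    path.setLength(path.length() - reader.getLocalName().length() - 1);
                }
            }
            return null; // end of document, element not present
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read header", e);
        } finally {
            if (reader != null) {
                try {
                    reader.close(); // closes the reader, not the underlying stream
                } catch (XMLStreamException ignored) {
                    // nothing useful to do if close fails
                }
            }
        }
    }
}
```

The trade-off is that the date check and the full load would then use two different parsers, so the file is still opened twice when it does need processing; only the first pass becomes cheap.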

fahlyn Jan 12th, 2008 7:41 AM

Re: Help Parsing 1.2 Gig XML File
 
I was able to figure out a solution. The date event handler now validates that the file is one I want to process. If it is, I set the additional event handlers; if not, I just kill the process.

It's ugly, I know, but I'd much rather just kill the job than let it run and do nothing for another half hour.

Using Dom4j it takes only a few seconds to get the date and kill the job if I determine that the file shouldn't be processed. When I determine that the file should be processed it takes about 6 minutes to process the entire 1.2 gig file.

I'm still a huge fan of Dom4j.

fahlyn Jan 12th, 2008 7:43 AM

Re: Help Parsing 1.2 Gig XML File
 
I am still curious to know what null_ptr0 is talking about with XStream and Properties.load(). I don't see how that is relevant to this at all.

null_ptr0 Jan 16th, 2008 5:01 PM

Re: Help Parsing 1.2 Gig XML File
 
Quote:

Originally Posted by fahlyn (Post 139548)
I am still curious to know what null_ptr0 is talking about with XStream and Properties.load(). I don't see how that is relevant to this at all.

They load from XML.

fahlyn Jan 17th, 2008 6:09 PM

Re: Help Parsing 1.2 Gig XML File
 
That still doesn't make sense. A lot of things "load from XML"... but whatever, I've got my solution.


All times are GMT -5.