Programming Forums
Jul 24th, 2007, 11:04 AM   #1
hoffmandirt
Parsing Microsoft Word Documents

I am looking for a way to parse Word documents so that I can avoid manually entering archived technology profiles. Each document consists of a title, a date, and a table containing the information I need. Currently I am using the Java POI API to do this; however, it has only very basic support for MS Word. The only way I can parse the table is by relying on a special character that appears at the end of each cell and each row. However, this character also shows up elsewhere within a cell when hyperlinks are used, etc., so this isn't dependable. Is there a better API out there?
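
To make the marker-based approach concrete, here is a minimal sketch in plain Java. It does not call POI at all; it only assumes that the text already extracted from the table contains one U+0007 character after every cell and one extra U+0007 after the last cell of each row, and that every row has the same number of columns. The class name `CellMarkerSplit`, the method `parseRows`, and the `columns` parameter are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class CellMarkerSplit {
    // Assumption: the extractor emits U+0007 at the end of each cell
    // and once more at the end of each row.
    static final char CELL_MARK = '\u0007';

    // Splits already-extracted table text into rows of trimmed cell strings.
    // Counting marks against a known column count keeps an empty cell from
    // being mistaken for the end-of-row mark.
    static List<List<String>> parseRows(String tableText, int columns) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = new ArrayList<>();
        StringBuilder cell = new StringBuilder();
        for (int i = 0; i < tableText.length(); i++) {
            char c = tableText.charAt(i);
            if (c != CELL_MARK) {
                cell.append(c);
                continue;
            }
            if (row.size() < columns) {
                // One of the first `columns` marks: it closes a cell.
                row.add(cell.toString().trim());
            } else {
                // The (columns + 1)-th mark: it closes the row.
                rows.add(row);
                row = new ArrayList<>();
            }
            cell.setLength(0);
        }
        // A trailing row that never received its end-of-row mark is dropped.
        return rows;
    }

    public static void main(String[] args) {
        String text = "Name\u0007Value\u0007\u0007Java\u00076\u0007\u0007";
        System.out.println(parseRows(text, 2));
    }
}
```

This only tidies up the bookkeeping. If the marker really does appear inside a cell, as described above for hyperlinks, the count goes out of step and every later row is wrong, which is exactly why the approach is fragile.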

The other option I was thinking about is converting the document to clean HTML and scraping the output, but I'm not aware of any components that do this.
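
For the scraping half of that idea, a rough sketch using only the JDK's built-in XML parser might look like the following. It assumes some converter (not shown, and not one I can name) has already produced well-formed XHTML with no DOCTYPE and no HTML-only entities such as `&nbsp;`; real converter output may well need cleaning before it parses. The class name `XhtmlTableScrape` and the method `firstTable` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XhtmlTableScrape {
    // Returns the cell text of the first <table> in the document, row by row.
    // Assumption: the input is well-formed XHTML; anything else is rejected.
    static List<List<String>> firstTable(String xhtml) {
        Document doc;
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
        List<List<String>> rows = new ArrayList<>();
        NodeList tables = doc.getElementsByTagName("table");
        if (tables.getLength() == 0) {
            return rows;
        }
        // getElementsByTagName searches all descendants, so rows of nested
        // tables would be included too; this sketch assumes there are none.
        NodeList trs = ((Element) tables.item(0)).getElementsByTagName("tr");
        for (int i = 0; i < trs.getLength(); i++) {
            NodeList tds = ((Element) trs.item(i)).getElementsByTagName("td");
            List<String> row = new ArrayList<>();
            for (int j = 0; j < tds.getLength(); j++) {
                // getTextContent flattens child markup such as <a> to its text.
                row.add(tds.item(j).getTextContent().trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String page = "<html><body><table>"
                + "<tr><td>Vendor</td><td><a href=\"http://example.com\">Acme</a></td></tr>"
                + "</table></body></html>";
        System.out.println(firstTable(page));
    }
}
```

The attraction here is that a hyperlink inside a cell is just a nested element, so it no longer interferes with finding the cell boundaries.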

What are your thoughts on this?
DaniWeb IT Discussion Community
Copyright ©2007 DaniWeb® LLC