Parsing Microsoft Word Documents
I am looking for a way to parse Word documents so that I don't have to input archived technology profiles by hand. Each document consists of a title, a date, and a table containing the information I need. I am currently using the Apache POI Java API, but its support for MS Word is very basic. The only way I can parse the table is by relying on a special character that appears at the end of each cell and each row. However, the same character also appears elsewhere inside a cell, for example when the cell contains a hyperlink, so this approach isn't dependable. Is there a better API out there?
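One possible explanation, offered as an assumption rather than a diagnosis: in the raw text of a binary .doc file, Word ends every table cell and every row with character 0x07, and a hyperlink is stored as a field whose instruction text (e.g. `HYPERLINK "..."`) sits between 0x13 and 0x14 and whose displayed text sits between 0x14 and 0x15. If the stray characters are those field delimiters, stripping the field instructions before splitting on 0x07 may make the cell split reliable. This sketch is plain Java with no POI calls; it takes the table's raw text however you extract it. The class name and the `columns` parameter are hypothetical, and a fixed column count is assumed because an empty cell and a row-end mark look the same in the raw text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.regex.Pattern;

class WordTableText {
    // Assumed Word binary (.doc) control characters in the extracted raw text.
    static final char CELL_MARK = '\u0007';       // ends every cell and every row
    static final char FIELD_BEGIN = '\u0013';     // starts a field such as HYPERLINK
    static final char FIELD_SEPARATOR = '\u0014'; // instruction ends, displayed result begins
    static final char FIELD_END = '\u0015';       // field ends

    // Drops field instructions but keeps the text each field displays.
    static String stripFieldCodes(String raw) {
        StringBuilder out = new StringBuilder();
        // One entry per open field: TRUE while still inside that field's instruction part.
        Deque<Boolean> openFields = new ArrayDeque<>();
        for (char c : raw.toCharArray()) {
            if (c == FIELD_BEGIN) {
                openFields.push(Boolean.TRUE);
            } else if (c == FIELD_SEPARATOR) {
                if (!openFields.isEmpty()) {
                    openFields.pop();
                    openFields.push(Boolean.FALSE);
                }
            } else if (c == FIELD_END) {
                if (!openFields.isEmpty()) {
                    openFields.pop();
                }
            } else if (!openFields.contains(Boolean.TRUE)) {
                out.append(c); // not inside any instruction part, so this is visible text
            }
        }
        return out.toString();
    }

    // Splits table text into rows, assuming every row has exactly `columns` cells
    // followed by one extra row-end mark.
    static List<List<String>> toRows(String tableText, int columns) {
        String[] tokens = stripFieldCodes(tableText)
                .split(Pattern.quote(String.valueOf(CELL_MARK)), -1);
        List<List<String>> rows = new ArrayList<>();
        // tokens[i + columns] is the empty token produced by the row-end mark
        for (int i = 0; i + columns < tokens.length; i += columns + 1) {
            List<String> row = new ArrayList<>();
            for (String cell : Arrays.copyOfRange(tokens, i, i + columns)) {
                row.add(cell.trim()); // trim() also removes paragraph marks inside a cell
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String raw = "Name\u0007Link\u0007\u0007"
                + "POI\u0007see \u0013 HYPERLINK \"http://example.com\" \u0014site\u0015\u0007\u0007";
        System.out.println(toRows(raw, 2)); // prints [[Name, Link], [POI, see site]]
    }
}
```

A stack is used instead of a single flag because Word fields can be nested, and text is only visible when no enclosing field is still in its instruction part.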
The other option I was considering is converting the document to clean HTML and scraping the output, but I'm not aware of any components that do this. What are your thoughts on this?
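For the HTML route, a minimal sketch, assuming the document has already been saved as HTML (for example with Word's own Save as Web Page option). It uses only the JDK's built-in HTML parser (`javax.swing.text.html.parser.ParserDelegator` with an `HTMLEditorKit.ParserCallback`) to collect the text of each `<td>`/`<th>` cell, row by row. The class and method names are hypothetical, and nested tables are not handled.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class HtmlTableScraper {
    // Returns every table row as a list of cell strings.
    static List<List<String>> rows(String html) {
        List<List<String>> rows = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private List<String> row;   // row currently being read, or null
            private StringBuilder cell; // cell currently being read, or null

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TR) {
                    row = new ArrayList<>();
                    cell = null;
                } else if ((tag == HTML.Tag.TD || tag == HTML.Tag.TH) && row != null) {
                    cell = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (cell != null) {
                    // Text inside links or other inline tags arrives in separate chunks
                    cell.append(' ').append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if ((tag == HTML.Tag.TD || tag == HTML.Tag.TH) && cell != null && row != null) {
                    // Collapse the whitespace introduced when joining chunks
                    row.add(cell.toString().replaceAll("\\s+", " ").trim());
                    cell = null;
                } else if (tag == HTML.Tag.TR && row != null) {
                    rows.add(row);
                    row = null;
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

Because the callback only reacts to row and cell tags and to text inside cells, any styling markup around the table is simply ignored.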