Programming Forums
Jun 7th, 2005, 9:12 PM   #1
jonnymp316
Getting Info From an HTML File

I'm looking for a simple way to do this and just need to be pointed in the right direction. I'm trying to get the information that is inside the tables in an HTML file. What I want is to turn the data in the <TD> lines of the tables into an array, so that everything between <TD> and </TD> gets added to the array. Does anyone know a simple way of doing this? The approach I have started with is getting very complex. Any help is much appreciated.

-Jonny
Jun 10th, 2005, 9:39 AM   #2
Dizzutch
Read the file line by line and do something like:
print "$1\n" if /<td>(.*?)<\/td>/;
That's probably wrong, because I can't test it and I don't know it off the top of my head, but I hope you get the drift.
Dizz
Jun 12th, 2005, 9:42 AM   #3
Dizzutch
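Here it is in a little more detail:
# Collect the contents of each <td>...</td> pair into @tables.
# Note: this matches lowercase <td> only, with no attributes.
open(my $fh, '<', 'file.html') or die "Cannot open file.html: $!";
my @tables;
while (my $line = <$fh>)
{
   chomp $line;
   # Non-greedy match in a loop, so several cells on one line stay separate
   while ($line =~ /<td>(.*?)<\/td>/g) {
      push @tables, $1;
   }
}
close $fh;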
Dizz
Jul 3rd, 2005, 9:20 AM   #4
mackenga
Rooting through HTML is something I've done a painfully large amount of. Pitfalls to look out for (I notice the code examples above jump right into them, but it's easily done) are HTML's case insensitivity (use the i option on regexps; some people use caps in their HTML tags, like <TD> rather than <td>) and the dodgy nature of much production HTML code. For example, look out for spaces and other cruft in the tags, e.g.

<   Td D colsp6o4y2gh >

OK, it's not often THIS bad, but most browsers would accept the above as:

<TD>

so any code that crawls HTML should be equally generous. Check the HTML specification if in doubt about what to accept and where.
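
To make those pitfalls concrete, here is a minimal sketch of a more forgiving match. It is only an illustration under a few assumptions: the whole file has already been read into one string, tables are not nested, and no attribute value contains a ">" character. The helper name extract_cells and the sample markup are made up for this example and are not from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the contents of
# every <td> cell found in a string of HTML.
sub extract_cells {
    my ($html) = @_;
    my @cells;
    # <\s*td\b         opening tag, optional spaces after "<"
    # [^>]*>           attributes or other cruft up to the closing bracket
    # (.*?)            cell contents, non-greedy
    # <\s*/\s*td\s*>   closing tag, optional spaces allowed
    # Flags: g = find every cell, i = ignore case, s = let "." span newlines
    while ($html =~ m{<\s*td\b[^>]*>(.*?)<\s*/\s*td\s*>}gis) {
        my $cell = $1;
        $cell =~ s/^\s+|\s+$//g;    # trim whitespace around the contents
        push @cells, $cell;
    }
    return @cells;
}

# Sample input invented for this sketch: mixed case, an attribute,
# stray spaces in the tags, and a cell that spans two lines.
my $sample = qq{<table><tr><TD>one</TD><td class="x"> two </td>\n<   Td colspan=2 >three\nlines</ td ></tr></table>};

my @cells = extract_cells($sample);
print "$_\n" for @cells;
```

A regexp like this is still only an approximation of what a browser accepts. For nested tables or seriously broken markup, a real HTML parser is likely the safer route.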
DaniWeb IT Discussion Community
All times are GMT -5. The time now is 7:57 AM.

Powered by vBulletin® Version 3.7.0, Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Copyright ©2007 DaniWeb® LLC