Programming Forums


katas Nov 27th, 2006 2:26 PM

Script that checks web page content??
 
OK, here is the problem...

Let's say we have about 1000 pages and we don't know which of them have content (of course, the pages aren't on my hard drive)... What language should a script that checks for the content be written in, and what should it look like?

I'm mostly thinking of JavaScript for this task, although I have some understanding of PHP and Perl too.

I have no idea whether this is even possible, and since I can't do it myself, I hope you might be able to help.

Thanks in advance, everybody.

niteice Nov 27th, 2006 8:58 PM

What do you mean? Are you looking for certain content, or do you have lots of blank pages to examine for any sort of content?

mackenga Dec 1st, 2006 5:01 PM

I wouldn't actually write software to do this; I'd just use wget to pull in the files then grep my local copies. Two shell commands isn't enough hassle to convince me to eliminate the manual element.
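A rough sketch of the grep half, for illustration only. It assumes the pages have already been pulled into a local directory, that "has content" means "contains some known string", and that your grep supports -r and -L (GNU and BSD grep do; plain POSIX grep does not promise either). The temporary directory, the three sample pages and the search word `widget` are all made up so the snippet runs without a network.

```shell
#!/bin/sh
# Stand-in for the directory wget would have filled; the location and the
# sample pages below are invented for illustration.
mirror=$(mktemp -d)
printf '<html><body>Widget price list</body></html>\n' > "$mirror/page1.html"
printf '<html><body></body></html>\n' > "$mirror/page2.html"
printf '<html><body>Contact us about widgets</body></html>\n' > "$mirror/page3.html"

# -r: recurse into the directory, -i: ignore case,
# -l: print only the names of files that contain a match.
matches=$(grep -ril 'widget' "$mirror" | sort)
echo "Pages containing the string:"
echo "$matches"

# -L is the inverse of -l: names of files with no match at all.
misses=$(grep -riL 'widget' "$mirror" | sort)
echo "Pages without it:"
echo "$misses"
```

If the goal is instead "any content at all", the pattern would have to change to whatever distinguishes a blank page from a filled one on that particular site, which only the site's markup can tell you.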

If you do want to write a program to do this, you could still use wget to do most of the work if you don't mind calling an external program; if all the links to the (say) 1000 files are at http://host/some/path/index.html then you can do:

wget -r -l 1 http://host/some/path/index.html

If the situation is a little more complex you might need a fruitier collection of arguments to wget, but this is definitely the way I'd go.
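For what it's worth, here is one guess at what a fuller set of arguments might look like. The options themselves are standard wget ones, but the choice of them, the output directory name `mirror` and the one-second delay are assumptions for the sake of example, not something the situation above dictates. The command is only printed, not run, so you can check it before pointing it at a real host.

```shell
#!/bin/sh
# Placeholder URL from the post above; substitute the real index page.
base_url='http://host/some/path/index.html'
# Hypothetical local directory for the downloaded copies.
out_dir='mirror'

# -r -l 1      : follow links one level deep from the index page
# -np          : never climb to the parent directory
# -nd          : do not recreate the remote directory tree locally
# -A html,htm  : keep only files with these suffixes (an assumption about
#                how the pages are named; drop it for URLs like page.php?id=1)
# -w 1         : wait one second between requests
# -P out_dir   : save everything under this directory
cmd="wget -r -l 1 -np -nd -A html,htm -w 1 -P $out_dir $base_url"

# Print the command for inspection; run it with: eval "$cmd"
echo "$cmd"
```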

I'd tend to go for Perl for this problem if you really want to write the software to retrieve the files and skim through them for a substring yourself; Perl's regular expressions make it ideal for the searching part of the job. JavaScript - by which I assume you mean JScript running inside WSH rather than JavaScript embedded in a web page (which can't do much) - could manage the skimming part, but I'm not sure off the top of my head how I'd go about retrieving the files over HTTP. Heh, I'd probably call wget from the JScript and then use the FileSystemObject to open the files and read through them, actually.

Anyway, hope this helps.

