Programming Forums
Jun 6th, 2005, 10:34 PM   #1
Komodo
Site Crawler

http://"username":"password"@www.crazedmindz.com:2082/frontend/x/index.html
would log me in to the cPanel for crazedmindz.com, with my real username and password in place of the quoted placeholders.


Would a script be able to crawl a password-protected page like that
and use something like substr() and strpos() to grab certain
information from it? What I have in mind: first search the whole page
for links to other pages on the same website, then search it for links
to a certain type of file and add them all to a database, and finally
go through the pages that were found and do the same thing on each.


I ain't gonna lie, I plan on using this to crawl porn sites.

Please help...
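
The crawl described in this post is a breadth-first traversal with a "seen" set and a database insert for each matching file link. Below is a minimal sketch of that loop, written in Python rather than PHP purely for illustration. The name `crawl`, the `files` table, the `max_pages` limit, and the caller-supplied `get_links(url)` function (which would fetch a page and return its absolute links) are all assumptions, not anything from the thread.

```python
import sqlite3
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, get_links, wanted_ext, db, max_pages=100):
    """Breadth-first crawl of one site.

    get_links(url) is a caller-supplied (hypothetical) function that
    returns the absolute URLs linked from the page at url.
    """
    db.execute("CREATE TABLE IF NOT EXISTS files (url TEXT PRIMARY KEY)")
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}      # pages already queued, so none is visited twice
    visited = []

    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for link in get_links(page):
            parts = urlparse(link)
            if parts.path.lower().endswith(wanted_ext):
                # A link to the wanted file type: record it, do not crawl it.
                db.execute("INSERT OR IGNORE INTO files VALUES (?)", (link,))
            elif parts.netloc == site and link not in seen:
                # A page on the same site that has not been queued yet.
                seen.add(link)
                queue.append(link)
    db.commit()
    return visited
```

Something like `crawl("http://example.com/", get_links, ".jpg", sqlite3.connect("links.db"))` would then fill the table. The `max_pages` cap is there only so a sketch like this cannot run forever on a large site.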
Jun 10th, 2005, 9:25 AM   #2
skuinders
I wouldn't try doing that with PHP. Since you are really dealing with the site(s) from the client side, a server-side scripting language is not the natural fit. I would dump the page source to a local file, then write a bash/perl/octave/whatever script that parses it for href attributes, file extensions, etc. and adds the desired information to your db.
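
As a rough illustration of the "parse the saved source for hrefs and file extensions" step, here is a sketch in Python using only the standard library. The names `LinkParser` and `sort_links` are made up for this example, and it assumes the page source has already been read into a string (for instance from the local dump mentioned above).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def sort_links(html, page_url, wanted_ext):
    """Split a page's links into same-site pages and wanted files."""
    parser = LinkParser()
    parser.feed(html)
    site = urlparse(page_url).netloc
    pages, files = [], []
    for href in parser.hrefs:
        url = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(url)
        if parts.path.lower().endswith(wanted_ext):
            files.append(url)
        elif parts.netloc == site:
            pages.append(url)
    return pages, files
```

A real HTML parser tends to hold up better than substr()/strpos()-style string searching, since it copes with attribute order, quoting, and whitespace. Links that point to other sites and do not match the extension are simply dropped here; whether that is the right choice depends on what you want in the db.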
Jun 10th, 2005, 9:39 AM   #3
tempest
Well yes, if you have the username and password, of course this is possible. What you need to do is open a socket and send an HTTP request for the page that carries those credentials. It may take more than one request, in which case you will have to work out how the site's authentication works. Good luck, tempest.

Edit: RFC 2617 (HTTP Basic and Digest Access Authentication) should be of some help: http://www.faqs.org/rfcs/rfc2617 .
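
To make the "request that carries the credentials" idea concrete, here is a small Python sketch that builds the raw bytes of a GET request using the Basic scheme from RFC 2617. It assumes the server accepts Basic authentication; the thread does not say which scheme the site uses. The function name `basic_auth_request` is hypothetical.

```python
import base64


def basic_auth_request(host, path, username, password):
    """Build the raw bytes of an HTTP/1.1 GET with Basic credentials."""
    # Basic auth is just base64("username:password") in a header.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"Authorization: Basic {token}",
        "Connection: close",
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

The returned bytes could be written to a socket opened with `socket.create_connection((hostname, port))`. Digest authentication works differently: the server first answers with a challenge, so it needs at least two requests, which is the "more than one" case mentioned above.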
Last edited by tempest; Jun 10th, 2005 at 10:16 AM.
Jun 26th, 2005, 10:15 PM   #4
foxcity911
I've tried to do something similar to this, except I was trying to get my usage stats from my ISP. I made a form with hidden inputs and pre-written values, and it just redirects to the page I wanted to go to and submits the info. It's a quick way to view your stats. You can't view the stuff that's inside the secure page that way, though; you need a machine-based language, gnome maybe.
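
For comparison, a script can submit the same pre-written form values itself and then read whatever the server sends back. The sketch below, in Python, only builds the POST request; the URL and field names in the usage note are placeholders, and `build_login_post` is a made-up helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_post(action_url, fields):
    """Build the POST request a browser would send for the form."""
    # Form fields are sent URL-encoded in the request body.
    body = urlencode(fields).encode("ascii")
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the result of `build_login_post("https://isp.example/login", {"user": "me", "pass": "secret"})` to `urllib.request.urlopen` would perform the submission and return the response body for parsing. Sites that keep the login in a session cookie would probably also need a cookie-aware opener, which this sketch leaves out.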
DaniWeb IT Discussion Community
All times are GMT -5.