#1
Programmer
Join Date: Oct 2005
Posts: 54
Rep Power: 4
web crawling PHP
Hello,
I am supposed to build a page that searches specific websites and extracts information from them, for example car rental sites. The user fills in a form on my site (for instance pick-up and drop-off dates), and the data is submitted to another page that searches 2-3 rental sites and finds which cars are available on those dates. Are there ready-made scripts for this? If not, I would appreciate some hints on how to start. I am familiar with PHP forms and with pulling data out of MySQL databases, but I have no clue how to begin extracting data from other sites...
#2
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
You might search the forum for previous threads concerning Beautiful Soup. I've not used it, but Arevos is expert with it, and his name would serve as an adjunct to the search term.
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code. Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers
#3
Banned
Quote:
Sorry ktsirig, I don't know of any parsing libraries for PHP, though you could always go searching for one. Otherwise, look on php.net for the functions related to string searching and stripping. You'll also need the functions for retrieving pages from an external server. A Google search should turn up plenty on this, I'm sure.
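For what it's worth, the string searching and stripping approach might look roughly like the sketch below. `strpos`, `substr` and `strip_tags` are standard PHP functions; the helper name `textBetween` and the sample markup are made up for illustration, and a real results page will need its own markers.

```php
<?php
// Return the text between two markers in $html, or null if either marker is missing.
// textBetween is a hypothetical helper name, not part of PHP.
function textBetween(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);

    $to = strpos($html, $end, $from);
    if ($to === false) {
        return null;
    }

    // strip_tags removes any markup left inside the fragment
    return trim(strip_tags(substr($html, $from, $to - $from)));
}

// Hypothetical fragment standing in for a fetched results page
$html = '<div class="car"><b>Compact</b> from <span class="price">35.00</span> EUR</div>';

$price = textBetween($html, '<span class="price">', '</span>');
echo $price, "\n";
```

This breaks as soon as the site changes its markup, so it is probably only worth doing for very simple pages.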
#4
Resident Grouch
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10
The PHP went ziiiinggggggg right by my head, but then again, I'm old and blind. On a properly set up server, PHP can grab web pages much like files. You can, of course, parse them with string functions and regex. It would be nice if there were something like Beautiful Soup for PHP, though (and maybe there is; I haven't looked).
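Not Beautiful Soup, but PHP's bundled DOM extension (`DOMDocument` plus `DOMXPath`) can play a similar role, assuming the extension is enabled on your server. A minimal sketch follows; the markup, class names and XPath expressions are invented for the example and would have to be adapted to each real site.

```php
<?php
// Hypothetical markup standing in for a rental site's results page
$html = <<<HTML
<html><body>
  <ul id="results">
    <li class="car"><span class="model">Compact</span> <span class="price">35.00</span></li>
    <li class="car"><span class="model">Estate</span> <span class="price">52.50</span></li>
  </ul>
</body></html>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid, so collect parser warnings instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$cars = [];
foreach ($xpath->query('//li[@class="car"]') as $li) {
    // Look up the model and price spans relative to the current list item
    $model = $xpath->query('.//span[@class="model"]', $li)->item(0);
    $price = $xpath->query('.//span[@class="price"]', $li)->item(0);
    if ($model !== null && $price !== null) {
        $cars[trim($model->textContent)] = (float) $price->textContent;
    }
}

print_r($cars);
```

Compared with regex, this tends to survive small whitespace or attribute-order changes in the page, though it still depends on the site's structure staying put.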
#5
Hobbyist Programmer
If the form you are calling uses GET, you can build the search query yourself and request the results page directly through its URL, then copy the returned HTML into a variable. I don't know of any available libraries to help with the parsing either. To fetch the contents of a page you can use fopen or libcurl, and from there parse it with regex. Grabbing remote files with fopen only works if the server you are on is set to allow it; check allow_url_fopen in your php.ini file.
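Roughly, that could look like the sketch below. The site URL and the field names (`pickup`, `dropoff`) are placeholders, and `buildSearchUrl` and `fetchPage` are helper names invented here; the real names come from the target site's form. It assumes either allow_url_fopen is on or the cURL extension is installed.

```php
<?php
// Build a GET search URL from form fields; http_build_query handles the URL encoding.
function buildSearchUrl(string $baseUrl, array $fields): string
{
    return $baseUrl . '?' . http_build_query($fields, '', '&');
}

// Fetch a page into a string, using fopen-style access when allowed and cURL otherwise.
// Returns null when neither transport is available or the request fails.
function fetchPage(string $url): ?string
{
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $body = @file_get_contents($url);
        return $body === false ? null : $body;
    }

    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // give up after 15 seconds
        $body = curl_exec($ch);
        return $body === false ? null : $body;
    }

    return null;
}

// Hypothetical site and field names
$url = buildSearchUrl('http://rental.example/search', [
    'pickup'  => '2006-07-01',
    'dropoff' => '2006-07-08',
]);
echo $url, "\n";

// $html = fetchPage($url); // uncomment to actually request the page
```

If the target form uses POST instead, the same cURL handle can send the fields with `CURLOPT_POSTFIELDS`.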
__________________
#programmingforums relay - http://thegupstudio.com/cgi-bin/pforelay.cgi freelance scripts - http://ryanguthrie.com/index.html