Dec 13th, 2006, 3:27 AM   #3
headzoo
This might help you a little bit. It's a sloppy function I created a while back that grabs the contents of a document using HTTP via sockets. It follows 301 and 302 redirects. I've tweaked it a bit for your situation.

function get_http_document($url, &$headers = null, $port = 80, $timeout = 8) {
	global $paths;
	$paths[] = $url;
	$pURL = parse_url($url);
	
	if (empty($pURL['host'])) {
		return false;
	}

	$remotePath = (isset($pURL['path'])) ? $pURL['path'] : '/';
	$remoteDocument = (empty($pURL['query'])) ? $remotePath : $remotePath . '?' . $pURL['query'];
	
	if (!$fp = @fsockopen($pURL['host'], $port, $errno, $errstr, $timeout)) {
		return false;
	}
	
	$out = "GET $remoteDocument HTTP/1.0\r\n";
	$out .= "Host: {$pURL['host']}\r\n";
	$out .= "Connection: Close\r\n\r\n";
	fwrite($fp, $out);
	unset($out);
	
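	$received = '';
	while (!feof($fp)) {
		$chunk = fread($fp, 8192);
		// Stop on a read error or timeout instead of looping forever
		if ($chunk === false || $chunk === '') {
			break;
		}
		$received .= $chunk;
	}
	fclose($fp);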

	// Separate the headers from the content
	$parts = explode("\r\n\r\n", $received, 2);
	$headers = $parts[0];
	// The body may be missing if the response was cut short
	$content = isset($parts[1]) ? $parts[1] : '';
	unset($parts);
	

	$headerParts = explode("\r\n", $headers);
	if (!preg_match('~HTTP/1\.\d ([\d]+)~i', $headerParts[0], $matches)) {
		return false;
	}
	$statusCode = $matches[1];
	
	if ($statusCode == 200) {
		return $content;
	} else if ($statusCode != 301 && $statusCode != 302) {
		return false;
	}
	
	if (!preg_match('~^Location:(.*)$~im', $headers, $matches)) {
		return false;
	}
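	$newLocation = trim($matches[1]);
	// Some servers send a host-relative Location (e.g. "/new"); make it absolute
	if (substr($newLocation, 0, 1) == '/' && substr($newLocation, 0, 2) != '//') {
		$newLocation = 'http://' . $pURL['host'] . $newLocation;
	}
	return get_http_document($newLocation, $headers, $port, $timeout);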
}
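
If the header handling inside the function is hard to follow, here is the same idea pulled out into a small standalone sketch. The helper name split_http_response() and the sample response are made up for illustration; the two regular expressions are the ones used in the function. It runs without touching the network:

```php
<?php
// Hypothetical helper, not part of the original function: splits a raw
// HTTP response into its status code, Location header (if any), and body.
function split_http_response($raw) {
	// Headers and body are separated by the first blank line
	$parts = explode("\r\n\r\n", $raw, 2);
	$headers = $parts[0];
	$body = isset($parts[1]) ? $parts[1] : '';

	// The first header line is the status line, e.g. "HTTP/1.1 302 Found"
	$lines = explode("\r\n", $headers);
	if (!preg_match('~^HTTP/1\.\d (\d+)~i', $lines[0], $m)) {
		return false;
	}
	$status = (int) $m[1];

	// Location is only present on redirects; trim() drops the space and "\r"
	$location = null;
	if (preg_match('~^Location:(.*)$~im', $headers, $m)) {
		$location = trim($m[1]);
	}

	return array('status' => $status, 'location' => $location, 'body' => $body);
}

// A made-up 302 response, just to show the shape of the result
$raw = "HTTP/1.1 302 Found\r\n"
     . "Location: http://www.example.com/new\r\n"
     . "Content-Length: 0\r\n\r\n";
print_r(split_http_response($raw));
```

For that sample the status is 302, the location is http://www.example.com/new, and the body is empty.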

Be aware that the function relies on the global variable $paths, which is bad practice. Still, you would use the function like this:

$paths = array();
$content = get_http_document('http://www.webhostdir.com/banners/banman.asp');
print_r($paths);

The contents of the remote document will be in $content. Every URL that was requested on the way to the final destination will be in the $paths array. That array will look like this:

Array
(
    [0] => http://www.webhostdir.com/banners/banman.asp?ZoneID=9&Task=Click&Mode=HTML&SiteID=1&PageID=
    [1] => http://www.webintellects.com
)

That shows you that the first site visited was webhostdir.com, which then redirected to webintellects.com, and so on for any further redirects.
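
One possible way to avoid the global is to hand the function an array by reference and let it record each URL there. The sketch below is not the function above rewritten: follow_redirects(), the $fetch callback, the $maxRedirects limit and the example URLs are all assumptions made for illustration, and it assumes PHP 5.3 or later for the closure. The callback stands in for the socket code so the example runs offline:

```php
<?php
// Hypothetical sketch: follows 301/302 redirects while recording each URL in
// a caller-supplied array instead of a global. $fetch is an assumed callable
// returning array('status' => int, 'location' => string|null, 'body' => string)
// or false on failure.
function follow_redirects($url, $fetch, array &$paths, $maxRedirects = 5) {
	for ($i = 0; $i <= $maxRedirects; $i++) {
		$paths[] = $url;
		$response = call_user_func($fetch, $url);
		if ($response === false) {
			return false;
		}
		if ($response['status'] == 200) {
			return $response['body'];
		}
		if ($response['status'] != 301 && $response['status'] != 302) {
			return false;
		}
		if (empty($response['location'])) {
			return false;
		}
		$url = $response['location'];
	}
	// Gave up: more redirects than $maxRedirects allows
	return false;
}

// Canned responses standing in for real servers (made-up URLs)
$fake = array(
	'http://a.example/' => array('status' => 302, 'location' => 'http://b.example/', 'body' => ''),
	'http://b.example/' => array('status' => 200, 'location' => null, 'body' => 'hello'),
);
$fetch = function ($url) use ($fake) {
	return isset($fake[$url]) ? $fake[$url] : false;
};

$paths = array();
$content = follow_redirects('http://a.example/', $fetch, $paths);
print_r($paths);
```

With the canned responses, $paths ends up holding http://a.example/ followed by http://b.example/, and $content is 'hello'. In real use, $fetch would wrap the socket request, and the redirect limit keeps two servers that point at each other from looping forever.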

- Sean