Programming Forums


hbe02 Sep 29th, 2006 10:54 AM

Web Client
 
I'm trying to extract information from a website.
I've tested my client on my university's email login page.
The IP is:

"193.188.129.122"

I don't know when to stop reading. I never get the:

Connection: close\r\n

In what case do I exit this loop?

while(flag == true)
        {
                char * reply = new char[1024];
                recv(com->sd,reply,1024,0);
               
               
                delete [] reply;       
        }

This is the output my program gives:

Communication With Server on Socket 1992 Initiated
Sending: GET / HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-
excel, application/vnd.ms-powerpoint, application/msword, application/x-shockwav
e-flash, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322
; .NET CLR 2.0.50727)
Host: 192.168.141.107
Connection: Keep-Alive


RECIEVING
HTTP/1.1 200 OK
Date: Fri, 29 Sep 2006 14:23:36 GMT
Server: Apache/1.3.33 Ben-SSL/1.55 (Unix) PHP/4.3.4
X-Powered-By: PHP/4.3.4
Set-Cookie: Horde=5bf5563986bdcf4b05cc5e03ea9c8e82; path=/; domain=192.168.141.1
07
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=ISO-8859-1

e16
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-trans
itional.dtd">
<!-- IMP: Copyright 2001, The Horde Project. IMP is under the GPL. -->
<!-- Horde Project: http://horde.org/ | IMP: http://horde.org/imp/ -->
<!--    GNU Public License: http://www.fsf.org/copyleft/gpl.html  -->
<html lang="en-US"><head>
<title>Imail :: Welcome to Imail</title>
<link rel="SHORTCUT ICON" href="/imp/graphics/favicon.ico" type="image/x-icon" /
>
<link href="/css.php?app=imp" rel="stylesheet" type="text/css" />
</head>

<bo
.
.
.
more html stuff
.
.
.
<BR>
<table>
</tr></table>

</body>
</html>
<!--
<div align=center>
<br><br>
<table border="0" width="55%" style="border-style: dotted; border-width:
1px" bgcolor="#444466">
        <tr>
                <td>
                <p align="center" dir="ltr"><font color="#FFFFFF"><b>IMPORTANT
NOTICE:</b></font><br>
                <br>
                <font color="#FFFFFF">-</font> <a href="http://www.aub.edu.lb/we
bmail/"><font
color="#FF0000">WEBMAIL</font></a> <font color="#FFFFFF">is
                being phased out but can still be accessed at:</font><br>
        <a href="http://www.aub.edu.lb/webmail/">
                <font
color="#FF0000">http://www.aub.edu.lb/webmail/</font></a><br>
                <br><font color="#FFFFFF">- Support to Webmail will end
soon! So, please You are urged to export all your messages from Webmail to Imail
 by sending a request to imail@aub.edu.lb.</a><br>
              <br>  (This notice is for AUB faculty members only)</
4c
font>
</p>
                <p>&nbsp;</td>
        </tr>
</table>
</div>
-->

0

════════════════════════════════════════════════════════════════════════════════

Then the program hangs until the website's timeout, then bursts out "=======" forever. I guess this is because the website cut the TCP connection after the timeout and my program continued to receive and output the result.
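
A note on the output above: the response headers include Transfer-Encoding: chunked, so the lines e16, 4c and 0 are chunk sizes in hexadecimal, and the zero-size chunk marks the end of the body. On a keep-alive connection that chunk, rather than the connection closing, is the signal to stop reading. Below is a rough sketch of a decoder, assuming everything after the blank line that ends the headers has been accumulated in a std::string. The name decode_chunked is hypothetical, and trailers and malformed input are not handled.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: decodes a chunked HTTP body held in `raw`.
// Returns true (and fills `body`) once the zero-size chunk has been seen,
// false if the data received so far is incomplete.
// Assumption: input is well formed; no validation is attempted.
bool decode_chunked(const std::string& raw, std::string& body)
{
    body.clear();
    std::size_t pos = 0;
    for (;;) {
        std::size_t eol = raw.find("\r\n", pos);
        if (eol == std::string::npos)
            return false;                       // size line not complete yet

        // strtoul stops at the first non-hex character, so any
        // chunk extension after ';' is ignored.
        unsigned long size =
            std::strtoul(raw.substr(pos, eol - pos).c_str(), nullptr, 16);
        pos = eol + 2;

        if (size == 0)
            return true;                        // last chunk: body is complete

        if (raw.size() < pos + size + 2)
            return false;                       // chunk data not complete yet

        body.append(raw, pos, size);
        pos += size + 2;                        // skip data and its trailing CRLF
    }
}
```

The receive loop could call this after each recv() and exit as soon as it returns true.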

sarumont Sep 29th, 2006 2:06 PM

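You are opening a persistent connection:

Connection: Keep-Alive

Your request asks for this explicitly, and HTTP/1.1 keeps connections open by default anyway. Try using HTTP/1.0 without the Keep-Alive header, or send Connection: close.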
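
As an illustration, here is a minimal sketch of a request that uses HTTP/1.0 and asks the server to close the connection once the response is sent, so that recv() eventually returns 0. The function name build_request and its parameters are hypothetical; the header set is the bare minimum rather than what the OP's client sends.

```cpp
#include <string>

// Hypothetical helper: builds a minimal GET request that does not ask
// for a persistent connection.
std::string build_request(const std::string& host, const std::string& path)
{
    std::string req;
    req += "GET " + path + " HTTP/1.0\r\n";   // HTTP/1.0: no chunked encoding
    req += "Host: " + host + "\r\n";
    req += "Connection: close\r\n";           // server closes after the response
    req += "\r\n";                            // blank line ends the headers
    return req;
}
```

For the host in the output above, build_request("192.168.141.107", "/") yields the request text to pass to send().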

climbnorth Oct 1st, 2006 1:21 AM

Look at this page for using recv, should help:

http://www.opengroup.org/onlinepubs/...ions/recv.html

Realize that recv returns a value. That value is how many bytes were read.
So read until it stops receiving bytes. Remember, it returns negative if there was an error, so handle that too.
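
A minimal sketch of such a loop, assuming a POSIX socket API (Winsock's recv() behaves the same way here, but the headers and return type differ). The name read_until_close is hypothetical. Note that recv() returns 0 only when the peer closes the connection, so on a keep-alive connection this loop blocks until the server times out.

```cpp
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: appends everything received on `sd` to `out`.
// Returns true if the peer closed the connection cleanly (recv returned 0),
// false if recv reported an error (returned -1).
bool read_until_close(int sd, std::string& out)
{
    char buf[1024];                              // allocated once, outside the loop
    for (;;) {
        ssize_t n = recv(sd, buf, sizeof buf, 0);
        if (n > 0)
            out.append(buf, static_cast<std::size_t>(n));  // use only the n bytes read
        else if (n == 0)
            return true;                         // connection closed by peer
        else
            return false;                        // error; inspect errno
    }
}
```

Only the first n bytes of the buffer are valid after each call; printing the whole buffer regardless of the return value produces garbage output.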

This operation will block, which means it will hang your application until data arrives. That is not so great if your application wants to do other things while waiting for this data. To solve this, put the call into a separate thread (unless your app does nothing until it gets the data, in which case it's OK; but if this is a GUI app, use a thread or the GUI will hang).

Also, you shouldn't put the declaration of your pointer inside the loop, and you should avoid calling new/delete on every iteration. Instead, allocate the buffer once and grow it as needed. There is a function called realloc that can help, though it only works on memory obtained from malloc, not new[]; in C++ a std::vector<char> or std::string is the simpler choice.



sarumont Oct 3rd, 2006 11:18 AM

Quote:

Originally Posted by climbnorth (Post 115405)
Look at this page for using recv, should help:

http://www.opengroup.org/onlinepubs/...ions/recv.html

Realize that recv returns a value. That value is how many bytes were read.
So read until it stops receiving bytes. Remember, it returns negative if there was an error, so handle that too.

OP is only reading 1024 bytes at a time, but that can still be a problem for this reason. I concur with climbnorth's suggestion of threading the recv(); alternatively, OP could call select() before recv() to put a timeout on the read.
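
A rough sketch of the select() approach, assuming a POSIX socket API; on Winsock the first argument to select() is ignored and the headers differ. The name recv_with_timeout and the -2 return code for a timeout are hypothetical conventions chosen for this example.

```cpp
#include <cstddef>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

// Hypothetical helper: waits up to timeout_sec seconds for data on `sd`,
// then reads it. Returns the number of bytes read (> 0), 0 if the peer
// closed the connection, -1 on error, or -2 if the wait timed out.
long recv_with_timeout(int sd, char* buf, std::size_t len, int timeout_sec)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(sd, &readfds);

    timeval tv;
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    // select() returns how many descriptors are ready, 0 on timeout, -1 on error.
    int ready = select(sd + 1, &readfds, nullptr, nullptr, &tv);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return -2;                               // nothing arrived in time

    return static_cast<long>(recv(sd, buf, len, 0));  // will not block now
}
```

A receive loop built on this can give up after a few idle seconds instead of waiting for the server's keep-alive timeout.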


All times are GMT -5.