Programming Forums
Old Nov 11th, 2007, 9:37 AM   #1
equinox
Newbie
 
Join Date: Nov 2007
Posts: 8
Rep Power: 0 equinox is on a distinguished road
Network Programming Help.

Hi all.

I'm a second-year software engineering student, and we've been given an assignment in network programming. The specification is to take a URL as a command-line argument, download its source, examine all its links, and finally print a summary of how many broken links were found.

I've only just started and I already have a problem: I'm unsure how to find out whether a piece of HTML is actually a link or not. Our lecturer gave us a link to this example.

http://www.exampledepot.com/egs/java.../GetLinks.html

I've never done network programming before, and this is very hard for me to understand because the writer didn't comment the code or list the packages he used. Could anyone give me a brief explanation of how I would test a page to find links?

I've been able to figure out how to read in a URL and print out its source without any problem, and I've included that code below.

Thanks.

//THIS IS JUST A TEST!

import java.io.*;
import java.net.*;
import java.util.ArrayList;
import java.util.List;

public class WebProject
{

	public static void main(String[] args)
	{
		if (args.length != 1)
		{
			System.out.println("Usage: java WebProject <url>");
			return;
		}

		List<String> links = new ArrayList<String>();	//will be used to store any links (not filled in yet)

		try
		{
			URL address = new URL(args[0]);  //take in the URL to be searched from args

			BufferedReader in = new BufferedReader(new InputStreamReader(address.openStream()));    //open a stream to read the page from the URL
			PrintWriter myFile = new PrintWriter("output.HTML");   //create a PrintWriter to write the contents to a file, output.HTML

			String input;

			while((input = in.readLine()) != null) //continue while there is still data to read
			{
				myFile.println(input);  //write out the line that has just been read
			}

			in.close();  //close the stream
			myFile.close();  //close and save output
		}
		catch(MalformedURLException e)
		{
			System.out.println("Error: the supplied URL is not well formed.");  //thrown for an invalid URL string, not for a missing page
			System.out.println(args[0]);
		}
		catch(IOException e)
		{
			System.out.println("Error: problem reading the page or writing the file.");  //covers both network and file errors
		}
	}

}
Old Nov 11th, 2007, 6:12 PM   #2
Jabo
Not a user?
 
Join Date: Sep 2007
Posts: 245
Rep Power: 1 Jabo is on a distinguished road
Re: Network Programming Help.

I would think a normal ping would work, as in if you ping the url, it either is tied to the server or it's not. But then, I don't know that much about networking.
Old Nov 11th, 2007, 6:24 PM   #3
DaWei
Resident Grouch
 
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10 DaWei is on a distinguished road
Re: Network Programming Help.

You could connect to the server and still have a broken link. That is, the server might return a 404 page, for instance.

Your first step is to parse the page for all links. That's the emphasis of the link you posted. The next step is to follow all the links and see if you get a valid page returned (200 OK, for instance).
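A rough sketch of that first step, assuming a regular expression is good enough for the assignment (a real HTML parser copes better with unusual markup). The names `LinkExtractor`, `extractLinks` and `baseUrl` are made up for illustration and do not come from the example linked above. Relative links are resolved against the page's own address so that they can be fetched in the second step.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the href targets out of <a> tags in a page.
final class LinkExtractor {
    // Matches href="..." or href='...' inside an <a ...> tag; group 1 is the raw target.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns every link found in html as an absolute URL string.
    // baseUrl is the address the page was downloaded from (an assumed parameter).
    static List<String> extractLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            // Skip targets that are not fetchable pages.
            if (href.isEmpty() || href.startsWith("#")
                    || href.startsWith("mailto:") || href.startsWith("javascript:")) {
                continue;
            }
            try {
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                // The href is not a syntactically valid URI; ignore it in this sketch.
            }
        }
        return links;
    }
}
```

Something like `LinkExtractor.extractLinks(pageSource, args[0])` would then give the list of addresses to check in the second step.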
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code.
Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers
Old Nov 11th, 2007, 10:34 PM   #4
ReggaetonKing
Sexy Programmer
 
Join Date: Nov 2005
Location: New Jersey
Posts: 891
Rep Power: 3 ReggaetonKing is on a distinguished road
Re: Network Programming Help.

HttpURLConnection has a method called getResponseCode(), which returns 200 if it's a valid page. Once you parse the links from the HTML source, you can put them into a list and iterate through the list of URLs to see if they are valid. Here is a method that you could use. My Java skills are a bit rusty, so bear with me.
public boolean isValidUrl(String urlStr)
{
	try
	{
		java.net.URL url = new java.net.URL(urlStr);
		java.net.HttpURLConnection httpConn = (java.net.HttpURLConnection)url.openConnection();
		httpConn.connect();
		return httpConn.getResponseCode() == 200; //200 OK means it is a valid link
	}
	catch(Exception e)
	{
		e.printStackTrace();
		return false; //bad URL or no connection, so treat the link as broken
	}
}
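For the "iterate through the list" part, one possible sketch is below. The names `BrokenLinkCounter`, `respondsOk` and `countBroken` are invented for illustration. It makes a few assumptions of its own: it sends a HEAD request so the page body is not downloaded (some servers reject HEAD, in which case a GET would be needed), it uses 5-second timeouts, and it accepts any 2xx or 3xx status rather than only 200.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper: counts how many URLs in a list fail a validity check.
final class BrokenLinkCounter {
    // Returns true if the server answers with a 2xx or 3xx status.
    static boolean respondsOk(String urlStr) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(urlStr).openConnection();
            conn.setRequestMethod("HEAD");   // assumption: the server supports HEAD
            conn.setConnectTimeout(5000);    // assumed timeouts, in milliseconds
            conn.setReadTimeout(5000);
            int code = conn.getResponseCode();
            return code >= 200 && code < 400;
        } catch (IOException | ClassCastException e) {
            // Malformed URL, network failure, or a non-HTTP scheme such as ftp:
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    // Counts the URLs for which the supplied check returns false.
    static int countBroken(List<String> urls, Predicate<String> isValid) {
        int broken = 0;
        for (String url : urls) {
            if (!isValid.test(url)) {
                broken++;
            }
        }
        return broken;
    }
}
```

Calling `BrokenLinkCounter.countBroken(links, BrokenLinkCounter::respondsOk)` would give the number to print in the summary. Passing the check in as a parameter also lets the counting logic be tried out without a network connection.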
__________________
I would love to change the world, but they won't give me the source code!
Old Nov 15th, 2007, 1:25 PM   #5
equinox
Newbie
 
Join Date: Nov 2007
Posts: 8
Rep Power: 0 equinox is on a distinguished road
Re: Network Programming Help.

Thanks guys, I got the code in that link to work (after a few hours and about 5 pints of coffee!). I have all my methods worked out, so putting the program together should be a breeze. Thanks.
Old Nov 21st, 2007, 9:48 PM   #6
null_ptr0
11 years old
 
Join Date: Nov 2007
Posts: 79
Rep Power: 1 null_ptr0 is on a distinguished road
Re: Network Programming Help.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class URLCrawler {
    public static void main(String[] argv) {
        if (argv.length != 1)
            System.out.println("Arguments: <address (String)>");
        else
            checkAddresses(parseAddresses(downloadSource(argv[0])));
    }

    // Download the whole page and return it as a string.
    private static String downloadSource(String address) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            URL url = new URL(address);
            InputStream is = url.openConnection().getInputStream();
            // Read in a loop: getContentLength() can be -1, and one read() call may stop early.
            byte[] buffer = new byte[4096];
            int n;
            while ((n = is.read(buffer)) != -1)
                out.write(buffer, 0, n);
            is.close();
        } catch (IOException ioex) {
            ioex.printStackTrace();
            System.exit(1);
        }
        return new String(out.toByteArray()); // decoded with the platform default charset
    }

    // Collect the href value of every <a> tag in the page.
    private static String[] parseAddresses(String html) {
        // The first version of this pattern also required class=l on the tag; that is dropped so all links match.
        String regex = "<a\\s+[^>]*?href\\s*=\\s*['\"]([^'\"]*)['\"]";
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        Vector<String> addresses = new Vector<String>();
        while (m.find())
            addresses.addElement(m.group(1)); // group 1 is the URL; group() would be the whole match
        addresses.trimToSize();
        return addresses.toArray(new String[0]);
    }

    // Check every URL and print the summary.
    private static void checkAddresses(String[] urls) {
        int i = 0;
        for (String url : urls)
            i = (isBroken(url) ? i + 1 : i);
        // System.out is used because System.console() returns null when output is redirected.
        System.out.format("%s urls extracted were broken and %s were intact, out of %s urls%n", i, urls.length - i, urls.length);
    }

    // A link counts as broken unless the server answers 200 OK.
    // Relative links are not resolved here, so new URL() rejects them and they count as broken.
    private static boolean isBroken(String url) {
        try {
            URL u = new URL(url);
            HttpURLConnection huc = (HttpURLConnection) u.openConnection();
            return huc.getResponseCode() != 200;
        } catch (IOException ioex) {
            ioex.printStackTrace();
        } catch (ClassCastException ccex) {
            // Not an http(s) link (for example mailto:); counted as broken below.
        }
        return true;
    }
}
That's what I programmed in 5 minutes.
DaniWeb IT Discussion Community
All times are GMT -5.
Copyright ©2007 DaniWeb® LLC