#241
Professional Programmer
Join Date: Apr 2005
Location: London, England
Posts: 459
Rep Power: 4
As for things slowing to a crawl: why not call time.sleep periodically during the intensive loops, to free up resources for other processes so your computer doesn't become unusable?
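
To make that concrete, here is a minimal sketch, written for Python 3 (the code elsewhere in this thread is Python 2). The function `crunch` and its `pause_every` and `pause_seconds` parameters are made up for illustration and stand in for whatever heavy loop you have; the only point is that a short `time.sleep` every so often hands the CPU back to other processes.

```python
import time

def crunch(items, pause_every=1000, pause_seconds=0.01):
    """Sum the squares of items, sleeping briefly every `pause_every` items.

    Hypothetical stand-in for any CPU-heavy loop.
    """
    total = 0
    for count, item in enumerate(items, start=1):
        total += item * item  # the "intensive" work
        if count % pause_every == 0:
            time.sleep(pause_seconds)  # give other processes a turn
    return total

if __name__ == "__main__":
    print(crunch(range(10000)))
```

The trade-off is that the loop takes a little longer overall, so the pause length and frequency are worth tuning to taste.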

#242
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
I'm unfamiliar with that bit of code, Cerulean. Could you elaborate a bit on what it does?
Also, is there a way to have the program check the cache and skip any company links already found there, so that it only adds new data to the existing HTML file instead of building a new data.html from scratch, starting with the data I already have?

#243
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4
Quote:
import time
time.sleep(10) # sleep for 10 seconds

from time import sleep
sleep(10) # sleep for 10 seconds

The same holds true for any module:

import some_module
some_module.some_function()

from some_module import some_function
some_function()

Quote:
for company_url in get_company_urls(company_index):
    # Only fetch URLs that are not already in the cache
    if not cache.has_key(company_url):
        data = get_company_data(company_url)
        file.write(data)
        print data
        # And remember to pause so the server isn't overloaded:
        sleep(1)
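
If you also want the "only add new data" behaviour to survive between runs, one possible approach is sketched below in Python 3 (not the Python 2 used above). Everything named here is an assumption for illustration: `load_seen`, `append_new`, the `seen_path` file of already-processed URLs, and the `fetch` argument, which stands in for `get_company_data`. It is not how the `cache` in this thread works, just one way to get a similar effect.

```python
import os
import time

def load_seen(path):
    """Return the set of URLs recorded in the file at `path` (empty if none)."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(line.strip() for line in f if line.strip())

def append_new(urls, fetch, out_path, seen_path, delay=1.0):
    """Fetch only URLs not seen before and append their data to out_path.

    Returns the number of URLs that were newly fetched.
    """
    seen = load_seen(seen_path)
    added = 0
    # Mode "a" appends, so rows written on earlier runs are kept.
    with open(out_path, "a") as out, open(seen_path, "a") as seen_file:
        for url in urls:
            if url in seen:
                continue  # already handled on a previous run
            out.write(fetch(url))
            seen_file.write(url + "\n")
            seen.add(url)
            added += 1
            time.sleep(delay)  # pause so the server isn't overloaded
    return added
```

Note that this sketch only appends rows; the surrounding `<table>` and `</table>` lines would need to be written separately, otherwise new rows end up after the closing tag.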

#244
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
So I ran the program again, and got an error on the American College of Beirut, again. I was curious how I would add the debugging code to the following:
file = open("data.html", "w")
file.write("<table>\n") # \n means add a newline
for industry_url in get_industry_urls(industry_page):
    company_index = get_company_index(industry_url)
    for company_urls in get_company_urls(company_index):
        if cache.has_key(company_urls):
            file.write(get_company_data(company_urls))
            print get_company_data(company_urls)
file.write("</table>\n")
file.close()
And rather than sys.exit, can I just have it skip over any errors?

#245
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
Update:
I made a little workaround that involves a bit of manual work, but at the same time, I don't have to wait for the program to redo all the URLs that it's already done. I edited the code and have been running the program once for each industry. It's a little bit of work, but it gets the job done (and the data files aren't 800 MB).

#246
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4
Quote:
print company_urls

Quote:
from urllib2 import urlopen, HTTPError
...
try:
    file.write(get_company_data(company_urls))
except HTTPError:
    print "HTTP error occurred for '%s'" % company_urls
    print """Without this block, Python would just quit.
Instead, we've overridden the default behaviour, and we print
this message instead of quitting. You could even put in some
code to let Python try the URL again, to see if it works if we
do it a second time."""
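
As for trying the URL again, a rough sketch of what that could look like follows, in Python 3 (where `HTTPError` lives in `urllib.error` rather than `urllib2`). The helper name `fetch_with_retry` and its `attempts` and `delay` parameters are invented for this example, and `fetch` is a stand-in for `get_company_data`; treat it as one way to do it rather than the way.

```python
import time
from urllib.error import HTTPError

def fetch_with_retry(fetch, url, attempts=3, delay=1.0):
    """Call fetch(url); on HTTPError wait and retry, up to `attempts` times.

    Returns the fetched data, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except HTTPError as err:
            print("HTTP error %s for '%s' (attempt %d of %d)"
                  % (err.code, url, attempt, attempts))
            if attempt < attempts:
                time.sleep(delay)  # wait a moment before trying again
    return None  # give up on this URL so the caller can skip it
```

In the loop you would then write the data only when the result is not `None`, and simply carry on to the next URL otherwise.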