#61
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
One step at a time. There are a couple of lines in your code that suggest you don't understand the meaning behind the code.

I think the most important concept you're missing is scope. In Python, some variables are local. This means that they don't exist outside of where they are created; they exist only in a small part of the program. Take the function below:

    def get_industry_urls(industry_page):
        soup = BeautifulSoup(urlopen(industry_page))
        links = soup.fetch("table")[7].fetch("a")
        return [a['href'] for a in links]

The variables "soup" and "links" are local: they exist only inside get_industry_urls. The same thing happens with "industry_page". When this function is called, there is not one, but two variables called "industry_page". One exists outside the function, and is global. One exists inside the function, and is local. If this sounds confusing, that's because it is. That's why it makes sense to give the local industry_page a different name:

    def get_industry_urls(url):
        soup = BeautifulSoup(urlopen(url))
        links = soup.fetch("table")[7].fetch("a")
        return [a['href'] for a in links]

Because of this, the line below does nothing in your program. You can remove it without altering your program's flow:

    industry_url = get_industry_urls(industry_page)

Here is the function by itself:

    def get_company_index(industry_url):
        soup = BeautifulSoup(urlopen(industry_url))
        company_links = soup.fetch("table")[11].fetch("a")[3]
        return [a['href'] for a in company_links]

    print get_company_index("http://biz.yahoo.com/ic/112.html")
    raw_input("Press enter to continue...")
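To make the scoping point concrete, here is a minimal sketch that needs no network access. The function name show and the two strings are made up for illustration, and print() is written with parentheses so the snippet should run on either Python 2 or Python 3.

```python
industry_page = "global value"   # a global variable

def show(industry_page):
    # This parameter is a separate, local variable that merely
    # shares its name with the global defined above.
    industry_page = industry_page.upper()   # rebinds the local only
    return industry_page

print(show("local value"))   # LOCAL VALUE
print(industry_page)         # global value (untouched by the call)
```

Renaming the parameter (to url, say) changes nothing about how this behaves; it only makes it obvious to the reader that two different variables are involved.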
#62
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
so that piece of code will work in isolation?
#63
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Also, think about what I said before. Do you really need a list comprehension, if company_links contains a single value?
And think about what the function is doing. "get_company_index" gets the company index from one page. The for-loop it is in handles the repetition. However, you're almost there, and you're making progress. Programming is a hard skill to get, but once you have the basics, once you get a feel for the "zen", if you like, then things suddenly become a lot clearer. The problem is getting that initial understanding, which is terribly hard.
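A rough sketch of that division of labour, with the function handling one page and the loop handling the repetition. The FAKE_SITE dictionary and its example.com URLs are invented stand-ins so the snippet runs without fetching anything; the real get_company_index would parse the page instead of looking it up.

```python
# Hypothetical stand-in for the web: industry page -> company index url.
FAKE_SITE = {
    "http://example.com/ind/110.html": "http://example.com/ind/110_index.html",
    "http://example.com/ind/112.html": "http://example.com/ind/112_index.html",
}

def get_company_index(industry_url):
    # Handles exactly ONE industry page and returns ONE url,
    # so a plain string comes back, not a list.
    return FAKE_SITE[industry_url]

# The loop, not the function, takes care of the repetition.
indexes = []
for industry_url in sorted(FAKE_SITE):
    indexes.append(get_company_index(industry_url))

print(indexes)
```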
#64
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Quote:
Think of functions like tools. A chainsaw can be used for many purposes, many times over. A bottle-opener is more specific, but it too can be used in many locations.
#65
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
okay, I see where you're going... I'm still a bit confused about what the purpose of the return statement is. I guess I was trying to replicate your initial function, because that's the only one that has worked thus far, and it included a list comprehension. I'm going to look at the code some more.
going back to the function: if we replace the actual hyperlink with a name such as "industry_page", then it won't work, correct? We need to include that definition in order for it to understand what we're referring to... (basic, but I'm just checking)
#66
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Quote:

    def foo():
        return 1 + 2
        2 + 3

    def bar():
        1 + 2
        return 2 + 3

    x = foo() # x will be 3
    y = bar() # y will be 5

Functions take in arguments, and return a value. The arguments are the input, the return value is the output. (Incidentally, functions can have no arguments, and they can have no return value. However, most functions usually have either a return value or arguments - often both)

Quote:

Quote:

However, the exception to this is in function arguments. If I have the function:

    def double(x):
        return x * 2

    print double(6)
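As a small illustration of what return buys you, the sketch below puts double next to a variant that forgets to return. The name double_no_return is made up for this example, and print() is written with parentheses so it should run on either Python 2 or Python 3.

```python
def double(x):
    return x * 2          # the value handed back to the caller

def double_no_return(x):
    x * 2                 # computed, then thrown away

a = double(6)             # inside the call, x is 6
b = double_no_return(6)   # no return statement, so the call gives None

print(a)   # 12
print(b)   # None
```

In both calls the argument 6 is bound to the local name x for the duration of the call, which is why x never needs to be defined anywhere else in the program.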
#67
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
okay, so I tried to get the basic framework of how this works (I haven't figured out exactly which table etc each link is contained in) but how does this look:

    from urllib2 import urlopen
    from BeautifulSoup import BeautifulSoup

    industry_page = "http://biz.yahoo.com/ic/ind_index.html"

    def get_industry_urls(industry_page):
        soup = BeautifulSoup(urlopen(industry_page))
        links = soup.fetch("table")[7].fetch("a")
        return [a['href'] for a in links]

    def get_company_index(industry_url):
        soup = BeautifulSoup(urlopen(industry_url))
        index_link = soup.fetch("table")[10].fetch("a")[3] #not sure which table, looks like the 4th link though
        return [a for a in index_link]

    #def get_company_urls(company_index):
    #    soup = BeautifulSoup(urlopen(company_index))
    #    urls = soup.fetch("table")[12].fetch("a") #not sure of this table either
    #    return[a for a in urls]

    #def get_company_data(company_url)
    #    soup = BeautifulSoup(urlopen(company_url))
    #    data[0] = soup.fetch("table")[?] #figure out these tables
    #    data[1] = soup.fetch("table")[?]
    #    return data

    #for industry_url in get_industry_urls(industry_page):
    #    company_index = get_company_index(industry_url)
    #    for company_url in get_company_urls(company_index):
    #        print get_company_data(company_url) #(well output but you know...)

edit: i'm not sure if this makes sense - does the first for loop create the variable industry_url and store a value into it to be passed to get_company_index?

Last edited by zem52887; May 19th, 2006 at 4:13 PM.
#68
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Quote:

    numbers = [1, 2, 3]
    for n in numbers:
        print 2 * n

is equivalent to:

    numbers = [1, 2, 3]
    n = numbers[0]
    print 2 * n
    n = numbers[1]
    print 2 * n
    n = numbers[2]
    print 2 * n

Let's say you have a number, say 5, and you want to double it. You could do something like this:

    x = 5 * 2

A list comprehension does the same kind of thing to every item in a list:

    list = [1, 2, 3, 4, 5]
    doubled = [x * 2 for x in list] # doubled equals [2, 4, 6, 8, 10]

which is shorthand for:

    list = [1, 2, 3, 4, 5]
    doubled = [] # start with a blank list
    for x in list:
        doubled.append(x * 2) # add x * 2 onto the new "doubled" list

The key thing is that list comprehensions only apply to lists. With get_industry_urls, a list of urls is returned: one for each industry. With get_company_index, only one url is returned: the url of the company index for a particular industry.
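If you want to check those equivalences for yourself, a short self-test along these lines should do it. It is only a sketch: the names doubled_loop and doubled_comp are made up here, and print() is written with parentheses so it should run on either Python 2 or Python 3.

```python
numbers = [1, 2, 3]

# The for statement itself creates n and rebinds it on every pass.
for n in numbers:
    pass
print(n)   # 3: n still exists after the loop and holds the last item

# Building a list with an explicit loop...
doubled_loop = []
for x in numbers:
    doubled_loop.append(x * 2)

# ...and with a list comprehension.
doubled_comp = [x * 2 for x in numbers]

print(doubled_loop == doubled_comp)   # True
```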
#69
Hobbyist Programmer
Join Date: May 2006
Posts: 127
Rep Power: 3
    from urllib2 import urlopen
    from BeautifulSoup import BeautifulSoup

    industry_url = "http://biz.yahoo.com/ic/110.html"

    def get_company_index(industry_url):
        soup = BeautifulSoup(urlopen(industry_url))
        index_link = soup.fetch("table")[11].fetch("a")[2]
        return [a for a in index_link]

    print get_company_index(industry_url)

it works! it works!

edit: kinda works... it's returning the name of the link, and not the link itself... when I add in 'href' then it thinks that it's an index and gives me errors... I clearly am missing something but at least I know where company index is located, I just have to figure out how to get it to return the link itself, as opposed to the name of the link.
#70
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
You're using a list comprehension when you don't need one, because you're dealing with a single item and not a list!
Think about what the list comprehension in my get_industry_urls function did, and how you'd apply that to a single item.
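As a hint, here is the same idea with plain dictionaries standing in for link tags. That is an assumption made purely so the snippet runs without BeautifulSoup; what matters is that both support the link['href'] lookup. The file names are invented.

```python
# Dicts used as stand-ins for link tags (illustration only).
links = [{"href": "a.html"}, {"href": "b.html"}, {"href": "c.html"}]

# Many links: a list comprehension applies the expression to each one.
urls = [link["href"] for link in links]
print(urls)   # ['a.html', 'b.html', 'c.html']

# One link: apply the same expression directly, with no loop at all.
single = links[2]
url = single["href"]
print(url)    # c.html
```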