Newbie to Python: trying to submit forms
I'm trying to automate some form submission on a MediaWiki site. I've never done anything like this before, and I don't know anything about internet-related programming except what I've learned trying to do this.
I am aware of the existence of the Python Wikipedia robot framework, but I can't figure out how to use it, even for simple tasks. I'd prefer to understand what I'm doing anyway.
From the wikipedia.py file in that framework, I found this code:
def putPage(self, text, comment = None, watchArticle = False, minorEdit = True, newPage = False, token = None, gettoken = False, sysop = False):
    """
    Upload 'text' as new contents for this Page by filling out the edit
    page.
    Don't use this directly, use put() instead.
    """
    safetuple = () # safetuple keeps the old value, but only if we did not get a token yet could
    # TODO: get rid of safetuple
    if self.site().version() >= "1.4":
        if gettoken or not token:
            token = self.site().getToken(getagain = gettoken, sysop = sysop)
        else:
            safetuple = (text, comment, watchArticle, minorEdit, newPage, sysop)
    # Check whether we are not too quickly after the previous putPage, and
    # wait a bit until the interval is acceptable
    put_throttle()
    # Which web-site host are we submitting to?
    host = self.site().hostname()
    # Get the address of the page on that host.
    address = self.site().put_address(self.urlname())
    # If no comment is given for the change, use the default
    if comment is None:
        comment=action
    # Use the proper encoding for the comment
    comment = comment.encode(self.site().encoding())
    # Encode the text into the right encoding for the wiki
    text = text.encode(self.site().encoding())
    predata = [
        ('wpSave', '1'),
        ('wpSummary', comment),
        ('wpTextbox1', text)]
    # Except if the page is new, we need to supply the time of the
    # previous version to the wiki to prevent edit collisions
    if newPage:
        predata.append(('wpEdittime', ''))
    else:
        predata.append(('wpEdittime', self._editTime))
        predata.append(('wpStarttime', self._startTime))
    # Pass the minorEdit and watchArticle arguments to the Wiki.
    if minorEdit:
        predata.append(('wpMinoredit', '1'))
    if watchArticle:
        predata.append(('wpWatchthis', '1'))
    # Give the token, but only if one is supplied.
    if token:
        predata.append(('wpEditToken', token))
    # Encode all of this into a HTTP request
    data = urlencode(tuple(predata))
    if newPage:
        output('Creating page %s' % self.aslink())
    else:
        output('Changing page %s' % self.aslink())
    # Submit the prepared information
    conn = httplib.HTTPConnection(host)
    conn.putrequest("POST", address)
    conn.putheader('Content-Length', str(len(data)))
    conn.putheader("Content-type", "application/x-www-form-urlencoded")
    conn.putheader("User-agent", "PythonWikipediaBot/1.0")
    if self.site().cookies():
        conn.putheader('Cookie', self.site().cookies(sysop = sysop))
    conn.endheaders()
    conn.send(data)
It appears to submit a page to Wikipedia, but I don't understand how, and I can't do a simple similar operation myself.
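Stripped of the wiki-specific parts, the quoted putPage seems to do three things: collect the form fields as (name, value) pairs, urlencode them into a request body, and POST that body with a Content-type of application/x-www-form-urlencoded. Below is a minimal sketch of that idea, written for Python 3 (where httplib and urllib.urlencode live in http.client and urllib.parse) rather than the Python 2 of the quoted code. The helper names build_form_body and post_form are made up for illustration, and the field values are placeholders.

```python
import http.client
from urllib.parse import urlencode


def build_form_body(text, summary, minor=False, token=None):
    """Encode edit-form fields into a POST body (hypothetical helper)."""
    fields = [
        ("wpSave", "1"),
        ("wpSummary", summary),
        ("wpTextbox1", text),
    ]
    if minor:
        fields.append(("wpMinoredit", "1"))
    if token:
        fields.append(("wpEditToken", token))
    # urlencode escapes the values; the body is sent as bytes
    return urlencode(fields).encode("utf-8")


def post_form(host, path, body, user_agent):
    """POST an already-encoded form body; return (status, response bytes)."""
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": user_agent,
    }
    conn = http.client.HTTPConnection(host)
    try:
        # Content-Length is filled in automatically for a bytes body
        conn.request("POST", path, body=body, headers=headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


body = build_form_body("test1", "first test")
print(body)  # b'wpSave=1&wpSummary=first+test&wpTextbox1=test1'
```

Only build_form_body runs here; post_form is the part that would actually contact a server, and a real wiki may also expect the cookie, edit time and token that putPage sends.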
I think that if I can understand how to use the POST method with Python, I can figure out the rest. The thread here seems to show how to do this, but I can't make it work. I googled and found this, which again seems to tell me exactly what to do, but I can't make that work either. Here is my attempt using the interpreter:
Python 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> params = urllib.urlencode({'wpTextbox1': 'test1', 'wpCommment': 'This is the first test', 'wpSave':1})
>>> f = urllib.urlopen("http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=edit", params)
>>> print f.read()
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The requested URL could not be retrieved</TITLE>
<STYLE type="text/css"><!--BODY{background-color:#ffffff;font-family:verdana,sans-serif}PRE{font-family:sans-serif}--></STYLE>
</HEAD><BODY>
<H1>ERROR</H1>
<H2>The requested URL could not be retrieved</H2>
<HR noshade size="1px">
<P>
While trying to retrieve the URL:
<A HREF="http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=edit">http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=edit</A>
<P>
The following error was encountered:
<UL>
<LI>
<STRONG>
Access Denied.
</STRONG>
<P>
Access control configuration prevents your request from
being allowed at this time. Please contact your service provider if
you feel this is incorrect.
</UL>
<P>Your cache administrator is <A HREF="mailto:wikidown@bomis.com">wikidown@bomis.com</A>.
<BR clear="all">
<HR noshade size="1px">
<ADDRESS>
Generated Mon, 02 Jan 2006 16:12:47 GMT by mayflower.knams.wikimedia.org (squid/2.5.STABLE12)
I expected it to replace the Sandbox content with the text "test1", with "This is the first test" in the summary box. It didn't work, and I don't know why. Should there be some reference to the name of the form ("editform")?
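Compared with the quoted putPage, the attempt above differs in at least two visible ways: putPage names the summary field wpSummary where the attempt uses wpCommment, and putPage sends its own User-agent header, which the urllib.urlopen call does not set. Whether either difference explains the Access Denied response is not confirmed here, and putPage also sends wpEdittime and, when it has one, wpEditToken, which this sketch leaves out. The following Python 3 sketch only builds the request so it can be inspected before anything is sent; build_edit_request and the agent string MyFormBot/0.1 are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# URL taken from the attempt above
EDIT_URL = "http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=edit"


def build_edit_request(url, text, summary, user_agent):
    """Build (but do not send) a form POST with an explicit User-Agent."""
    fields = {"wpTextbox1": text, "wpSummary": summary, "wpSave": "1"}
    data = urlencode(fields).encode("utf-8")
    # Supplying data makes urllib issue a POST instead of a GET
    return Request(url, data=data, headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": user_agent,  # placeholder agent string
    })


req = build_edit_request(EDIT_URL, "test1", "This is the first test", "MyFormBot/0.1")
print(req.get_method())              # POST
print(req.get_header("User-agent"))  # MyFormBot/0.1
```

Passing req to urllib.request.urlopen would send it. Nothing is sent here, so the server's reaction remains untested.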
I imagine that once this works I'll be able to log in and append text to multiple pages without having to do it manually in Firefox. Say, add [[Category:users]] to each page in a list.
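The text-editing half of that task needs no networking at all and can be sketched on its own. This assumes the current wikitext of each page has already been fetched somehow; add_category and the page titles and contents below are invented for illustration.

```python
def add_category(text, category="users"):
    """Append a category tag to wikitext unless it is already present."""
    tag = "[[Category:%s]]" % category
    if tag in text:
        return text  # already categorised, leave the page alone
    return text.rstrip("\n") + "\n" + tag + "\n"


# Hypothetical page contents keyed by title
pages = {
    "User:Alice": "Hello.\n",
    "User:Bob": "Hi.\n[[Category:users]]\n",
}

updated = {title: add_category(body) for title, body in pages.items()}
changed = [title for title in pages if updated[title] != pages[title]]
print(changed)  # ['User:Alice']
```

Only the titles in changed would need to be submitted back to the wiki, which keeps the number of edits small.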
Any and all pointers are welcome, even if you think using Python is the wrong way to go about doing this.
Thanks.