Jan 2nd, 2006, 10:39 AM   #1
Steveire
Newbie to python: trying to submit forms

I'm trying to automate some form submissions on a MediaWiki site. I've never done anything like this before, and I don't know anything about internet-related programming beyond what I've learned while trying to do this.

I am aware of the existence of the Python Wikipedia robot framework, but I can't figure out how to use it, even for simple tasks. I'd prefer to understand what I'm doing anyway.

From the wikipedia.py file in that framework, I found this code:
def putPage(self, text, comment = None, watchArticle = False, minorEdit = True, newPage = False, token = None, gettoken = False, sysop = False):
        """
        Upload 'text' as new contents for this Page by filling out the edit
        page.

        Don't use this directly, use put() instead.
        """
        safetuple = () # safetuple keeps the old value, but only if we did not get a token yet could
        # TODO: get rid of safetuple
        if self.site().version() >= "1.4":
            if gettoken or not token:
                token = self.site().getToken(getagain = gettoken, sysop = sysop)
            else:
                safetuple = (text, comment, watchArticle, minorEdit, newPage, sysop)
        # Check whether we are not too quickly after the previous putPage, and
        # wait a bit until the interval is acceptable
        put_throttle()
        # Which web-site host are we submitting to?
        host = self.site().hostname()
        # Get the address of the page on that host.
        address = self.site().put_address(self.urlname())
        # If no comment is given for the change, use the default
        if comment is None:
            comment=action
        # Use the proper encoding for the comment
        comment = comment.encode(self.site().encoding())
        # Encode the text into the right encoding for the wiki
        text = text.encode(self.site().encoding())
        predata = [
            ('wpSave', '1'),
            ('wpSummary', comment),
            ('wpTextbox1', text)]
        # Except if the page is new, we need to supply the time of the
        # previous version to the wiki to prevent edit collisions
        if newPage:
            predata.append(('wpEdittime', ''))
        else:
            predata.append(('wpEdittime', self._editTime))
        predata.append(('wpStarttime', self._startTime))
        # Pass the minorEdit and watchArticle arguments to the Wiki.
        if minorEdit:
            predata.append(('wpMinoredit', '1'))
        if watchArticle:
            predata.append(('wpWatchthis', '1'))
        # Give the token, but only if one is supplied.
        if token:
            predata.append(('wpEditToken', token))
        # Encode all of this into a HTTP request
        data = urlencode(tuple(predata))

        if newPage:
            output('Creating page %s' % self.aslink())
        else:
            output('Changing page %s' % self.aslink())
        # Submit the prepared information
        conn = httplib.HTTPConnection(host)

        conn.putrequest("POST", address)
        conn.putheader('Content-Length', str(len(data)))
        conn.putheader("Content-type", "application/x-www-form-urlencoded")
        conn.putheader("User-agent", "PythonWikipediaBot/1.0")
        if self.site().cookies():
            conn.putheader('Cookie', self.site().cookies(sysop = sysop))
        conn.endheaders()
        conn.send(data)

It appears to submit a page to Wikipedia, but I don't understand how, and I can't do a similar simple operation myself.
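Stripped of the framework's bookkeeping, the quoted method seems to do three things: collect the form fields as (name, value) pairs, URL-encode them into a request body, and send that body in an HTTP POST with a few headers. Here is a minimal sketch of those steps, assuming a current Python 3 `urllib` instead of the `httplib` calls quoted above. The helper name `build_edit_request`, the `wiki.example` URL, and the User-Agent string are made up for illustration, and the request is only built, never sent:

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_edit_request(url, text, summary, token=None, minor=False):
    """Build (but do not send) a POST request carrying putPage-style form fields."""
    # Same field names as the predata list in the quoted putPage code.
    fields = [
        ("wpSave", "1"),
        ("wpSummary", summary),
        ("wpTextbox1", text),
    ]
    if minor:
        fields.append(("wpMinoredit", "1"))
    if token:
        fields.append(("wpEditToken", token))

    # urlencode turns the pairs into "name=value&name=value"; POST bodies are bytes.
    body = urlencode(fields).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": "ExampleWikiScript/0.1",  # hypothetical identifier
    }
    return Request(url, data=body, headers=headers, method="POST")


# Placeholder URL: substitute the address your wiki's edit form actually posts to.
req = build_edit_request(
    "http://wiki.example/index.php?title=Sandbox&action=submit",
    "test1",
    "This is the first test",
)
print(req.get_method())
print(req.data)
```

Passing the result to `urllib.request.urlopen(req)` would send it. A real edit would probably also need the cookie, edit time, and token fields that the quoted code adds.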

I think that if I can understand how to use the POST method with Python, I can figure it out. The thread here seems to show how to do this, but I can't make it work. I googled and found this, which again seems to tell me exactly what to do, but I can't make that work either. Here is my attempt using the interpreter:
Python 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> params = urllib.urlencode({'wpTextbox1': 'test1', 'wpCommment': 'This is the first test', 'wpSave':1})
>>> f = urllib.urlopen("http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=edit", params)
>>> print f.read()
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The requested URL could not be retrieved</TITLE>
<STYLE type="text/css"><!--BODY{background-color:#ffffff;font-family:verdana,sans-serif}PRE{font-family:sans-serif}--></STYLE>
</HEAD><BODY>
<H1>ERROR</H1>
<H2>The requested URL could not be retrieved</H2>
<HR noshade size="1px">
<P>
While trying to retrieve the URL:
<A HREF="http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&amp;action=edit">http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&amp;action=edit</A>
<P>
The following error was encountered:
<UL>
<LI>
<STRONG>
Access Denied.
</STRONG>
<P>
Access control configuration prevents your request from
being allowed at this time.  Please contact your service provider if
you feel this is incorrect.
</UL>
<P>Your cache administrator is <A HREF="mailto:wikidown@bomis.com">wikidown@bomis.com</A>.

<BR clear="all">
<HR noshade size="1px">
<ADDRESS>
Generated Mon, 02 Jan 2006 16:12:47 GMT by mayflower.knams.wikimedia.org (squid/2.5.STABLE12)

I expected it to replace the Sandbox content with the text "test1" and put "This is the first test" in the summary box. It didn't work, and I don't know why. Should there be some reference to the name of the form ("editform")?
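On the form-name question: a browser submitting a form sends only the name/value pairs of the controls inside it, so the form's own name ("editform") is not part of the POST body. Comparing the attempt with the quoted putPage code points to other differences that may matter. The summary field there is 'wpSummary' rather than 'wpCommment'. The framework sets its own User-agent header, and it is plausible, though only an assumption, that the proxy refuses urllib's default one. It also sends hidden values such as 'wpEdittime' and 'wpEditToken', which have to come from the edit page itself. Below is a rough sketch of pulling such hidden fields out of a page, assuming Python 3's standard `html.parser`. The class and function names are invented for illustration, and the sample HTML with its values is made up, not copied from a real wiki:

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of the hidden form fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Made-up stand-in for an edit page; the token and timestamp are placeholders.
sample = """
<form id="editform" name="editform" method="post"
      action="/w/index.php?title=Sandbox&amp;action=submit">
<textarea name="wpTextbox1">old text</textarea>
<input type="hidden" value="20060102161247" name="wpEdittime" />
<input type="hidden" value="abc123" name="wpEditToken" />
<input type="text" name="wpSummary" value="" />
</form>
"""
print(hidden_fields(sample))
```

The resulting dictionary could be merged with the text and summary fields before URL-encoding, so the POST carries the same hidden values the browser would send.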

I imagine that once I get this working, I'll be able to log in and append text to multiple pages without having to do it manually in Firefox: say, adding [[Category:users]] to each page in a list.
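The text-editing half of that batch job can be tried without touching the network at all. Here is a small sketch, assuming each page's wikitext is already available as a string. The function name `add_category` and the page titles are hypothetical, and fetching and saving the pages is left out:

```python
def add_category(text, category="users"):
    """Append a [[Category:...]] tag to wikitext unless it is already present."""
    tag = "[[Category:%s]]" % category
    if tag in text:
        return text  # already categorized, leave the page untouched
    return text.rstrip("\n") + "\n\n" + tag + "\n"


# Hypothetical pages: title -> current wikitext.
pages = {
    "User:Alice": "Hello.\n",
    "User:Bob": "Hi.\n\n[[Category:users]]\n",
}

updated = {title: add_category(body) for title, body in pages.items()}
for title, body in updated.items():
    changed = body != pages[title]
    print(title, "changed" if changed else "unchanged")
```

Skipping pages that already carry the tag means the loop can be rerun without adding the category twice.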

Any and all pointers are welcome, even if you think using Python is the wrong way to go about doing this.

Thanks.