Newbie to python: trying to submit forms
I'm trying to automate some form submission on a mediawiki site. I've never done anything like this before, and I don't know anything about internet-related programming except what I've learned trying to do this.
I am aware of the existence of the python wikipedia robot framework, but I can't figure out how to use it for simple tasks (or at all). I'd prefer to understand what I'm doing anyway. From the wikipedia.py file in that framework, I found this code: :
def putPage(self, text, comment = None, watchArticle = False, minorEdit = True, newPage = False, token = None, gettoken = False, sysop = False):
It appears to submit a page to wikipedia, but I don't understand how, and can't do a simple similar operation myself. I think that if I can understand how to use the POST method with python I can figure it out. The thread here seems to show how to do this, but I can't make it work. I googled and found this, which, again seems to tell me exactly what to do, but I can't make it work. Here is my attempt using the interpreter: :
Python 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on win32
I expected it to replace the SandBox content with the text "test1", with "This is the first test" in the summary box. So, it didn't work, but I don't know why. Should there be some reference to the name of the form ("editform")? I imagine if I do this I'll be able to log in, and append code to multiple pages without having to do it manually in Firefox. Say, add [[Category:users]] to each page in a list. Any and all pointers are welcome, even if you think using Python is the wrong way to go about doing this. Thanks. |
For a self-proclaimed newbie to Python, you've gone about trying to solve the problem in a very intelligent and sensible way. For the most part, you appear to be doing everything correctly.
The only thing I can spot that might be wrong with it is the URL you send the POST data to. I took a look at the source for the Sandbox edit page, and noticed this line: :
<form id="editform" name="editform" method="post" action="/w/index.php?title=
Try changing your urllib call to: :
f = urllib.urlopen("http://en.wikipedia.org/w/index.php?title=Wikipedia:Sand:
{{Please leave this line alone (sandbox heading)}}:
urllib.urlencode({ |
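For anyone reading this on a current Python: the idea above (POST to the form's action URL rather than the page URL) could be sketched roughly as below. This uses Python 3's urllib.request and urllib.parse, which replace the urllib.urlopen and urllib.urlencode calls from the Python 2 sessions in this thread. The field names wpTextbox1 and wpSummary are my assumption about the edit form (the code in the posts is cut off), and build_edit_request is just a name made up for the example. It only builds the request and does not send anything.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Host plus the action attribute of the edit form quoted in this thread.
EDIT_URL = ("http://en.wikipedia.org/w/index.php"
            "?title=Wikipedia:Sandbox&action=submit")


def build_edit_request(text, summary, url=EDIT_URL):
    """Build (but do not send) a POST request for the edit form."""
    fields = {
        "wpTextbox1": text,      # assumed name of the edit box
        "wpSummary": summary,    # assumed name of the summary box
        "wpSave": "Save page",   # value of the submit button
    }
    # Passing data= is what turns the request into a POST.
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body)


request = build_edit_request("test1", "This is the first test")
print(request.get_method())
print(request.data.decode("ascii"))
```

Sending it would be urllib.request.urlopen(request), but a real edit form will most likely want its hidden fields echoed back as well, so treat this as the shape of the call rather than a working bot.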
I tested the script on another form here. And it worked: :
Python 2.2.3 (#42, May 30 2003, 18:12:08) [MSC 32 bit (Intel)] on win32
So I knew I was doing everything right. I went back to try again, but this time simply opening the URL, and not trying to submit any form: :
>>> f = urllib.urlopen("http://en.wikipedia.org")
...followed by: :
import urllib2
So I'm getting a 403, but I can't tell why. Next I tried the code on this page: :
>>> params = urllib.urlencode({:
>>> params = urllib.urlencode({
I have no idea what to do next. I'll have another look at the code in the pyWikipedia files later. |
I'll take a look at this tonight, if I have time. If it helps, here's the edit form with all non-form tags removed. It might show you something you're missing.
:
<form id="editform" name="editform" method="post" action="/w/index.php?title=Wikipedia:Sandbox&action=submit" enctype="multipart/form-data"> |
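One way to avoid missing a field is to let the script read them out of the edit page itself. Below is a minimal sketch using Python 3's html.parser that collects every input name and value inside the form with id "editform". SAMPLE_PAGE is a made-up stand-in for the real page (the timestamp values are invented), and FormFieldCollector and collect_fields are hypothetical names for this example only.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> tags inside one form."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
        elif tag == "input" and self.in_form and "name" in attrs:
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


# Invented sample markup; the real page has more fields than this.
SAMPLE_PAGE = """
<form id="editform" name="editform" method="post"
      action="/w/index.php?title=Wikipedia:Sandbox&amp;action=submit">
<input type="hidden" name="wpStarttime" value="20070101000000" />
<input type="hidden" name="wpEdittime" value="20070101000000" />
<input type="submit" name="wpSave" value="Save page" />
</form>
<form id="other"><input name="ignored" value="x" /></form>
"""


def collect_fields(html, form_id="editform"):
    """Return a dict of input fields found in the form with this id."""
    parser = FormFieldCollector(form_id)
    parser.feed(html)
    return parser.fields


fields = collect_fields(SAMPLE_PAGE)
print(fields)
```

Note that this only looks at input tags, so the textarea holding the page text is not picked up and would have to be added to the dict by hand before encoding.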
30 minutes isn't a long time to be able to edit posts for, but when I wrote this:
Quote:
Thanks for that form info. I tried putting wpStarttime and wpEdittime in the params as well, but still with no luck. :
>>> params = urllib.urlencode({ |
Maybe try changing wpSave from 1 to "Save page"?
|
If you use Firefox, a good tip is to install the Live HTTP Headers extension. It shows exactly what POST data and headers are sent to each page, so you can copy the full string and change the parts you need.
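As a rough illustration of "copy the full string and manipulate the bits", here is a small Python 3 sketch. The captured string is invented for the example (a real capture would be longer), and the field names wpTextbox1 and wpSummary are assumptions about the edit form.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical POST body as copied from a header capture.
captured = "wpTextbox1=old+text&wpSummary=&wpSave=Save+page"

# Decode into a dict, keeping fields that were submitted empty.
fields = dict(parse_qsl(captured, keep_blank_values=True))

# Change only the parts we care about.
fields["wpTextbox1"] = "test1"
fields["wpSummary"] = "This is the first test"

# Re-encode into a body that can be sent with a POST request.
body = urlencode(fields)
print(body)
```

Starting from what the browser actually sent means every field the server expects is already present, and only the values you change differ.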
|
I might have to go back to square one on this. I have some links that I'll look at when I get the chance: http://effbot.org/librarybook/httplib.htm might offer a new way to approach this, by putting different headers in etc. http://comments.gmane.org/gmane.scie...echnical/21150
Incidentally, I tried submitting the Login form with python, but with no luck. The following code, saved in an HTML file and opened in FF, allows me to log in. However, similar code to submit the edit page form (which I can't seem to recreate) didn't work. :
<form name="userlogin" method="post" action="http://en.wikipedia.org/w/index.php?title=Special:Userlogin&action=submitlogin&type=login">
Once again, I'm stumped. Have you ever tried to do this in python or any other language? I never imagined a few batch operations would prove so difficult if you try to involve the internet... :/ |
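On logging in from a script: a login normally only sticks if the cookies the server hands back are sent again with later requests, which a browser does for you and a bare urlopen call does not. A minimal Python 3 sketch of keeping cookies between requests follows. The field names wpName and wpPassword are my assumption (the login form above is cut off before its inputs), and make_session and build_login_request are hypothetical helper names. Nothing is sent when this runs.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

# Action URL of the login form quoted in this thread.
LOGIN_URL = ("http://en.wikipedia.org/w/index.php"
             "?title=Special:Userlogin&action=submitlogin&type=login")


def make_session():
    """Return an opener that stores and resends cookies, plus its jar."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(username, password, url=LOGIN_URL):
    """Build (but do not send) the login POST."""
    # wpName / wpPassword are assumed field names, not taken from the thread.
    body = urlencode({"wpName": username, "wpPassword": password})
    return Request(url, data=body.encode("ascii"))


opener, jar = make_session()
login = build_login_request("ExampleUser", "not-a-real-password")

# Not executed here:
#   opener.open(login)           # cookies from the reply land in `jar`
#   opener.open(edit_request)    # same opener, so the cookies go back out
```

The point of the design is that every request goes through the same opener, so the login and the later edit share one cookie jar.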
I regularly use Python to automate form submissions. I'm sorry to say I've never really run into major problems like this. Just compare what you're sending, make sure the server isn't doing anything funky because of your user agent (can't imagine Wikipedia browser sniffing though, to be honest), and it normally Just Works.
Live HTTP Headers doesn't show you anything you couldn't work out from the page source, provided you can be bothered to trace through it and you understand which headers the browser sends to the server. But tracing it by hand is much more time-consuming than submitting the page and reading off the headers and the GET or POST data that were actually sent. |
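On the user agent point: a 403 for a script on a page that loads fine in a browser can be a sign that the server dislikes the library's default User-Agent header. I can't tell from this thread whether that is what is happening here, but it is cheap to rule out. A minimal Python 3 sketch of setting your own header (the agent string and the helper name with_user_agent are made up for the example):

```python
from urllib.request import Request


def with_user_agent(url, user_agent, data=None):
    """Build a request that announces a custom User-Agent."""
    request = Request(url, data=data)
    request.add_header("User-Agent", user_agent)
    return request


req = with_user_agent(
    "http://en.wikipedia.org/",
    "MyFormBot/0.1 (hypothetical contact info)",
)
print(req.header_items())
```

Passing the resulting request to urllib.request.urlopen sends it with that header instead of the library default.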
You could also create a quick Python program that echoes all TCP information it receives to STDOUT. Point your form submission program to localhost, and get the output. Then open your browser to the Wikipedia edit page. Open your hosts file and alias en.wikipedia.org to 127.0.0.1, and then try pressing the save button. The browser's output should be caught by your TCP logger. You can then compare the HTTP request from the browser to the HTTP request you're sending from Python.
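A minimal sketch of such a logger in Python 3, assuming plain HTTP (it will not make sense of an encrypted connection). The function names open_listener and capture_one_request are made up for this example. It accepts one connection, reads until the client closes or goes quiet, and returns the raw bytes so you can print or diff them.

```python
import socket


def open_listener(host="127.0.0.1", port=0):
    """Bind a listening TCP socket; port 0 lets the OS pick a free port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def capture_one_request(server, timeout=2.0):
    """Accept one connection and return everything the client sent."""
    conn, _addr = server.accept()
    conn.settimeout(timeout)
    chunks = []
    try:
        while True:
            data = conn.recv(4096)
            if not data:          # client closed the connection
                break
            chunks.append(data)
    except socket.timeout:
        pass                      # client went quiet but kept the socket open
    finally:
        conn.close()
    return b"".join(chunks)


# Typical use (not run here, since accept() blocks until a client connects):
#   listener = open_listener(port=8080)
#   print(capture_one_request(listener).decode("latin-1"))
```

For the hosts-file trick the listener has to sit on port 80, which usually needs administrator rights. The logger never replies, so the browser or script will eventually report an error, but by then the request has been captured.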
|
Copyright ©2007 DaniWeb® LLC