Programming Forums
Old Jan 5th, 2006, 12:11 PM   #11
Steveire
Newbie
 
Join Date: Jan 2006
Posts: 13
Rep Power: 0 Steveire is on a distinguished road
Quote:
Originally Posted by Cerulean
make sure the server isn't doing anything funky because of your user agent
Yeah, that's a good tip. I was thinking of messing around with the user agent info, and some other stuff, but that might have to wait until the weekend.

Quote:
Originally Posted by Arevos
You could also create a quick Python program that echoes all the TCP data it receives to STDOUT. Point your form submission program at localhost, and capture the output. Then open your browser to the Wikipedia edit page, open your hosts file and alias en.wikipedia.org to 127.0.0.1, and press the save button. The browser's request should be caught by your TCP logger. You can then compare the HTTP request from the browser to the HTTP request you're sending from Python.
This looks like an ingenious way to solve the problem. I didn't know that was possible, and I would never have thought of it. I have no idea where I'd start writing it, but I'm pretty sure I see what you mean me to do. Submit the form, but send a copy to "home" as well. I'll see if Google has the magic.
Old Jan 5th, 2006, 1:14 PM   #12
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4 Arevos is on a distinguished road
Here's a simple TCP listener:
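import socket
import sys

# Port to listen on, taken from the first command-line argument.
port = int(sys.argv[1])

sock = socket.socket()
sock.bind(('', port))  # '' binds to all local interfaces
sock.listen(1)

while True:
    conn, addr = sock.accept()
    # Echo everything the client sends until it closes the connection.
    while True:
        data = conn.recv(1024)
        if not data:
            break
        # Note: print appends a newline after every chunk it receives.
        print data
    conn.close()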
It takes the port as its argument. So to listen on port 80 (the default HTTP port):
python tcplistener.py 80
If you run it, and then try accessing http://localhost/, it should print out what your browser is sending to it.
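For anyone on Python 3, here is a minimal sketch of the same idea. The names `read_all` and `serve`, and the two-second idle timeout, are assumptions made for illustration and are not part of the script above. It joins the received chunks into one buffer before printing, so chunk boundaries do not show up as extra line breaks in the output.

```python
import socket


def read_all(conn, idle_timeout=2.0):
    """Return every byte received on a connected socket.

    Stops when the peer closes the connection or when nothing arrives for
    idle_timeout seconds (a browser using keep-alive may never close it).
    """
    conn.settimeout(idle_timeout)
    chunks = []
    while True:
        try:
            data = conn.recv(1024)
        except socket.timeout:
            break
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def serve(port):
    """Listen on the given port and print each incoming request in one piece."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.listen(1)
    while True:
        conn, _addr = sock.accept()
        try:
            # latin-1 maps every byte to a character, so decoding never fails.
            print(read_all(conn).decode("latin-1"))
        finally:
            conn.close()
```

Calling `serve(80)` would behave much like `python tcplistener.py 80`, except that each request is printed only once the client goes quiet.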
Old Jan 6th, 2006, 10:18 AM   #13
Steveire
Newbie
 
Join Date: Jan 2006
Posts: 13
Rep Power: 0 Steveire is on a distinguished road
Thanks very much for that. I've been playing around with it for a while. No progress yet, but I've tried a few things.

First off, turning on the listener, and connecting to localhost with Firefox:
C:\Python22>python tcplistener.py 80
GET / HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/200511
11 Firefox/1.5
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plai
n;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

Next, turn on the listener and attempt to submit the page with Firefox:
C:\Python22>python tcplistener.py 80
POST /w/index.php?title=Wikipedia:Sandbox&action=submit HTTP/1.1
Host: en.wikipedia.org
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/200511
11 Firefox/1.5
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plai
n;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=edit

Cookie: enwikiUserName=Steveire; enwikiUserID=411483; enwiki_session=f173420c3f5
db04746c52be228d5ec8e

Content-Type: multipart/form-data; boundary=---------------------------114782935
826962
Content-Length: 1045

-----------------------------114782935826962
Content-Disposition: form-data; name="wpSection"


-----------------------------114782935826962
Content-Disposition: form-data; name="wpStarttime"

20060106152025
-----------------------------114782935826962
Content-Disposition: form-data; name="wpEdittime"

20060106152007
-----------------------------114782935826962
Content-Disposition: form-data; name="wpScrolltop"

0
-----------------------------114782935826962
Content-Disposition: form-data; name="wpTextbox1"

{{Please leave this line alone (sandbox heading)}}
<!-- Hello! Feel free to try your formatting and editing skills below this line.
 As this page is for editing experiments, this page will automatically be cleane
d every 12 hours. -->
test2

-----------------------------114782935826962
Content-Disposition: form-data; name="wpSummary"

this is test 2
---------------------
--------114782935826962
Content-Disposition: form-data; name="wpSave"

Save page
-----------------------------114782935826962--

Using this python code:
import urllib
import httplib


params = urllib.urlencode({
    'wpTextbox1': '{{Please leave this line alone (sandbox heading)}}\nThis test is the best',
    'wpSummary': 'This is a clear test', 'wpSave': 1})
   
headers = {"Content-type": "application/x-www-form-urlencoded",
    "User-agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5",
    "Referer": "http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=edit",
    "Accept": "text/plain"}

conn = httplib.HTTPConnection("en.wikipedia.org:80")
conn.request("POST", "/w/index.php?title=Wikipedia:Sandbox&action=submit", params, headers)
response = conn.getresponse()
print response.status, response.reason

data = response.read()
conn.close()
print response
print data

Next, turn on the listener, and attempt to submit the code with python:
C:\Python22>python tcplistener.py 80
POST /w/index.php?title=Wikipedia:Sandbox&action=submit HTTP/1.1
Host: en.wikipedia.org
Accept-Encoding: identity
Content-Length: 137
Referer: http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=edit

Content-type: application/x-www-form-urlencoded
Accept: text/plain
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/200511
11 Firefox/1.5


wpSave=1&wpTextbox1=%7B%7BPlease+leave+this+line+alone+%28sandbox+heading%29%7D%
7D%0AThis+test+is+the+best&wpSummary=This+is+a+clear+test

I don't know whether it matters if "User-agent" or "User-Agent" is used, but I've tried both in the Python submission headers. I also added the Referer header to see what happened. I find it interesting that it was grouped with the headers above, while the "User-agent" header was left below, as if the program didn't know what to do with them.
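One way to make the comparison less dependent on how the output happens to be laid out is to parse both captures and compare them field by field. The sketch below is in Python 3, the helper names `parse_request` and `diff_headers` are made up for illustration, and the two sample requests are heavily shortened stand-ins for the captures above. It assumes each capture is available as one clean string; the stray blank lines in the listener output are probably just the newline that `print` adds after each received chunk, and they would need stripping first. HTTP header names are case-insensitive, so the parser lower-cases them, which makes "User-agent" and "User-Agent" compare equal.

```python
def parse_request(raw):
    """Split raw HTTP request text into (request_line, headers, body).

    Header names are lower-cased because HTTP treats them as case-insensitive.
    """
    text = raw.replace("\r\n", "\n")
    head, _, body = text.partition("\n\n")
    lines = head.split("\n")
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def diff_headers(a, b):
    """Return {name: (value_in_a, value_in_b)} for headers that differ."""
    report = {}
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            report[name] = (a.get(name), b.get(name))
    return report


# Shortened sample requests (not the full captures).
browser_raw = (
    "POST /w/index.php?title=Wikipedia:Sandbox&action=submit HTTP/1.1\r\n"
    "Host: en.wikipedia.org\r\n"
    "User-Agent: Firefox/1.5\r\n"
    "Content-Type: multipart/form-data; boundary=XYZ\r\n"
    "\r\n"
    "--XYZ--\r\n"
)
script_raw = (
    "POST /w/index.php?title=Wikipedia:Sandbox&action=submit HTTP/1.1\r\n"
    "Host: en.wikipedia.org\r\n"
    "User-agent: Firefox/1.5\r\n"
    "Content-type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    "wpSave=1"
)

_, browser_headers, _ = parse_request(browser_raw)
_, script_headers, _ = parse_request(script_raw)
print(diff_headers(browser_headers, script_headers))
```

With these samples only `content-type` is reported as different; the differently capitalised user-agent header is treated as the same header with the same value.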

I tried to listen on the login page, but even though I have en.wikipedia.org aliased to 127.0.0.1, it still logs in successfully. I imagined it was logging in through a different server, so I tried aliasing "*.wikipedia.org" and "wikipedia.org", but to no effect.

I thought this TCP listener would sort all of this out, but it seems like I'm losing sight of the original issue here, and getting bogged down a bit.

So, any other ideas on how to submit the edit page form and login form? Or can you see anything immediately and basically wrong with how I am attempting to send the HTTP headers or any other part of the code?
Old Jan 6th, 2006, 12:08 PM   #14
Cerulean
Professional Programmer
 
Join Date: Apr 2005
Location: London, England
Posts: 459
Rep Power: 4 Cerulean is on a distinguished road
Haven't followed exactly what's up, but I just gave editing the sandbox a go now. I don't get a 403 (it turns out that was caused by the Python urllib User-Agent header; just sending one for IE fixes it), but I do get a strange error saying that "The sandbox has been changed while you were editing". This is stumping me because, as far as I can see, I've set the field for the editing start time to be just one second before the submitted time, and because if I take a look at the sandbox I can see that the page hasn't been changed since I started editing.
Haven't got the time to take it further, but I'll throw up my code in case it's of use to anyone. To write it I simply did what I suggested: used Firefox's Live HTTP Headers extension to quickly see what POST data is being submitted, and set the same fields in my script.
import urllib, urllib2, time

def wikiTimeStamp(t):
    """ Return a time stamp given a time tuple (e.g from time.gmtime()) """
    return "%s%s%s%s%s%s" % tuple([str(x).zfill(2) for x in t[:6]])

sandboxHeader = """{{Please leave this line alone (sandbox heading)}}
<!-- Hello! Feel free to try your formatting and editing skills below this line. As this page is for editing experiments, this page will automatically be cleaned every 12 hours. -->
"""

data = {}
data["wpSection"] = ""
data["wpStarttime"] = wikiTimeStamp(time.gmtime())
time.sleep(1)
data["wpEdittime"] = wikiTimeStamp(time.gmtime())
data["wpScrolltop"] = "0"
data["wpTextbox1"] = sandboxHeader + "Some kind of test edit....\n"
data["wpSummary"] = ""
data["wpSave"] = "Save page"

headers = {"User-Agent" : "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}

req = urllib2.Request("http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=submit", urllib.urlencode(data), headers)
result = urllib2.urlopen(req)
print result.read()
Run that and redirect the output into an HTML file, then open the HTML file in a web browser to get an idea of what kind of response is being sent.
Good luck
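As a side note on the timestamp fields, the 14-digit format seen in the browser capture (for example 20060106152025) can also be produced with `time.strftime`. The sketch below is Python 3 and the name `wiki_timestamp` is only illustrative. One thing visible in the capture is that the browser's wpEdittime (20060106152007) is earlier than its wpStarttime (20060106152025), whereas the script above sets wpEdittime one second later than wpStarttime. That may be relevant to the edit-conflict message, on the assumption that wpEdittime is meant to carry the timestamp of the page revision the edit form was loaded from rather than the current time.

```python
import time


def wiki_timestamp(t):
    """Format a time tuple as a 14-digit YYYYMMDDHHMMSS string."""
    return time.strftime("%Y%m%d%H%M%S", t)


# Reproduce the wpStarttime value from the browser capture.
parsed = time.strptime("2006-01-06 15:20:25", "%Y-%m-%d %H:%M:%S")
stamp = wiki_timestamp(parsed)
print(stamp)  # 20060106152025

# Current UTC time in the same format.
print(wiki_timestamp(time.gmtime()))
```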
Old Jan 6th, 2006, 1:21 PM   #15
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4 Arevos is on a distinguished road
I wonder if it has something to do with the session cookie that Wikimedia sets. Maybe certain session variables are required for Wikipedia to accept a page edit. Maybe what you need to do is connect to the edit page, fetch the cookie data it returns, then connect to the submit page and pass the cookie data back to Wikimedia.
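A rough sketch of that fetch-then-submit idea, written for Python 3 with the standard library's cookie jar support. The function names `make_session` and `fetch_then_submit` are invented for illustration, the URLs and User-Agent string are the ones used earlier in the thread, and the code has not been run against Wikipedia. Whether the cookie is actually what the server checks remains a guess.

```python
import http.cookiejar
import urllib.parse
import urllib.request

EDIT_URL = "http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=edit"
SUBMIT_URL = "http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=submit"
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"


def make_session():
    """Build an opener that stores cookies and sends them back on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", USER_AGENT)]
    return opener, jar


def fetch_then_submit(opener, fields):
    """Load the edit page first (collecting any cookies), then POST the form."""
    opener.open(EDIT_URL).read()
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return opener.open(SUBMIT_URL, body).read()
```

Because both requests go through the same opener, any cookie set by the edit page is returned automatically with the submission.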

Or perhaps it has something to do with the way the form data is passed. Maybe it needs to be in multipart/form-data format, rather than application/x-www-form-urlencoded format.
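To try the multipart idea, the body can be built by hand in the same shape as the browser capture. This is a minimal Python 3 sketch; `encode_multipart` is a hypothetical helper name, it handles plain text fields only (no file uploads), and it is not claimed to be what the server requires.

```python
import uuid


def encode_multipart(fields, boundary=None):
    """Encode a dict of text fields as a multipart/form-data body.

    Returns (content_type, body_bytes).
    """
    if boundary is None:
        boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        # Each part starts with "--" plus the boundary, then its headers,
        # a blank line, and the field value.
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")
        lines.append(value)
    # The closing delimiter has "--" on both sides of the boundary.
    lines.append("--" + boundary + "--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return "multipart/form-data; boundary=" + boundary, body


content_type, body = encode_multipart(
    {"wpSummary": "this is test 2", "wpSave": "Save page"}, boundary="XYZ"
)
print(content_type)
print(body.decode("utf-8"))
```

The returned content type would go in the Content-Type header of the POST, with the returned bytes as the request body.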
Old Jan 9th, 2006, 8:08 AM   #16
Steveire
Newbie
 
Join Date: Jan 2006
Posts: 13
Rep Power: 0 Steveire is on a distinguished road
Success

Many thanks to both of you for all of your help. My intention was not to operate a bot on Wikipedia, but on other MediaWiki-powered projects. The code posted above worked for a wiki edit which you can see here.
I just substituted
http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=submit
with
http://www.mwusers.com/wiki/index.php?title=Sandbox&action=submit
I'm certain I tried something very similar on that wiki before and it didn't work, but I'm too sick of not being able to do it to go back and figure out why my way didn't work before. I wrote something similar to submit the login page (this involved a cookie issue, which I think I've sorted with the help of the TCP listener).

So, again, thank you.

Of course, the issue of this not working on en.wikipedia.org remains unresolved, but if anything comes up I'll post it here. I'll also be hanging around to post my newbie python questions.
Old Jan 15th, 2006, 9:24 AM   #17
Steveire
Newbie
 
Join Date: Jan 2006
Posts: 13
Rep Power: 0 Steveire is on a distinguished road
Looks like I spoke too soon. The above code works only for new pages. I tried submitting the page again, but with "Another test edit....\n" in the wpTextbox1 field, and got that same edit conflict message. I could only submit the page when I changed the page title to http://www.mwusers.com/wiki/index.php?title=SandBox&action=submit (i.e., capitalised the 'b'; MediaWiki is very case sensitive). Could there be something in the way the urllib2.Request function works that's causing it to get an edit conflict? Incidentally, using the code without any of the timestamp information brings up the preview page, not the edit conflict page. Could that be relevant? Do you think it might be possible to submit the resulting preview page after retrieving it?

Steve.
Back at square one :(
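On the question of retrieving a page and submitting it back, one possible approach is to read the form fields out of the fetched HTML and reuse the values the server supplied instead of generating timestamps locally. The sketch below is Python 3; `InputFieldParser` and `extract_fields` are made-up names, and the sample markup is hypothetical, modelled only on the field names seen in the browser capture, so the real edit form may look different.

```python
from html.parser import HTMLParser


class InputFieldParser(HTMLParser):
    """Collect the name and value of every <input> element in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if "name" in attr:
            # A missing or empty value attribute becomes an empty string.
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_fields(html):
    """Return a dict of input names to values found in the given HTML."""
    parser = InputFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Hypothetical sample markup, not copied from a real edit page.
sample = """
<form method="post" action="/wiki/index.php?title=Sandbox&amp;action=submit">
<input type="hidden" name="wpStarttime" value="20060106152025" />
<input type="hidden" name="wpEdittime" value="20060106152007" />
<input type="text" name="wpSummary" value="" />
</form>
"""

fields = extract_fields(sample)
print(fields)
```

The extracted dictionary could then be updated with new wpTextbox1 and wpSummary values and posted back, so that the timestamp fields match what the server sent.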
All times are GMT -5.