#11
Newbie
Join Date: Jan 2006
Posts: 13
Rep Power: 0
#12
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4
Quote:
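import socket
import sys
# Port to listen on, given on the command line
port = int(sys.argv[1])
sock = socket.socket()
sock.bind(('', port))
sock.listen(1)
while True:
    # Wait for the next client (e.g. a browser) to connect
    conn, addr = sock.accept()
    while True:
        data = conn.recv(1024)
        if not data:
            # An empty read means the client closed the connection
            break
        print data
    conn.close()
Run it with: python tcplistener.py 80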
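For anyone reading this on a newer interpreter, here is a rough Python 3 take on the same listener. It is only a sketch: the names `iter_chunks` and `listen` are made up for illustration, and decoding as Latin-1 is an assumption chosen so that arbitrary bytes never raise an error. It writes each chunk without appending a newline, so a long request is not broken up at every 1024-byte read.

```python
import socket
import sys


def iter_chunks(conn, bufsize=1024):
    """Yield data from a connected socket until the peer closes it."""
    while True:
        data = conn.recv(bufsize)
        if not data:  # empty bytes means the client closed the connection
            return
        yield data


def listen(port):
    """Accept one connection at a time and echo what it sends to stdout."""
    with socket.socket() as sock:
        sock.bind(("", port))
        sock.listen(1)
        while True:
            conn, addr = sock.accept()
            with conn:
                for chunk in iter_chunks(conn):
                    # Assumption: Latin-1 maps every byte, so this never fails
                    sys.stdout.write(chunk.decode("latin-1"))
                    sys.stdout.flush()
```

Calling `listen(80)` behaves like `python tcplistener.py 80` above; on most systems a port below 1024 needs elevated privileges.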
#13
Newbie
Join Date: Jan 2006
Posts: 13
Rep Power: 0
Thanks very much for that. I've been playing around with it for a while. No progress yet, but I've tried a few things.
First off, turning on the listener and connecting to localhost with Firefox:
C:\Python22>python tcplistener.py 80
GET / HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

Next, turn on the listener and attempt to submit the page with Firefox:
C:\Python22>python tcplistener.py 80
POST /w/index.php?title=Wikipedia:Sandbox&action=submit HTTP/1.1
Host: en.wikipedia.org
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=edit
Cookie: enwikiUserName=Steveire; enwikiUserID=411483; enwiki_session=f173420c3f5db04746c52be228d5ec8e
Content-Type: multipart/form-data; boundary=---------------------------114782935826962
Content-Length: 1045

-----------------------------114782935826962
Content-Disposition: form-data; name="wpSection"
-----------------------------114782935826962
Content-Disposition: form-data; name="wpStarttime"
20060106152025
-----------------------------114782935826962
Content-Disposition: form-data; name="wpEdittime"
20060106152007
-----------------------------114782935826962
Content-Disposition: form-data; name="wpScrolltop"
0
-----------------------------114782935826962
Content-Disposition: form-data; name="wpTextbox1"
{{Please leave this line alone (sandbox heading)}}
<!-- Hello! Feel free to try your formatting and editing skills below this line. As this page is for editing experiments, this page will automatically be cleaned every 12 hours. -->
test2
-----------------------------114782935826962
Content-Disposition: form-data; name="wpSummary"
this is test 2
-----------------------------114782935826962
Content-Disposition: form-data; name="wpSave"
Save page
-----------------------------114782935826962--

Using this Python code:

import urllib
import httplib
# Form fields, sent URL-encoded rather than as multipart/form-data
params = urllib.urlencode({
    'wpTextbox1': '{{Please leave this line alone (sandbox heading)}}\nThis test is the best',
    'wpSummary': 'This is a clear test', 'wpSave': 1})
headers = {"Content-type": "application/x-www-form-urlencoded",
           "User-agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5",
           "Referer": "http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=edit",
           "Accept": "text/plain"}
conn = httplib.HTTPConnection("en.wikipedia.org:80")
conn.request("POST", "/w/index.php?title=Wikipedia:Sandbox&action=submit", params, headers)
response = conn.getresponse()
print response.status, response.reason
# Read the body before closing the connection
data = response.read()
conn.close()
print response
print data

Next, turn on the listener and attempt to submit the code with Python:
C:\Python22>python tcplistener.py 80
POST /w/index.php?title=Wikipedia:Sandbox&action=submit HTTP/1.1
Host: en.wikipedia.org
Accept-Encoding: identity
Content-Length: 137
Referer: http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=edit
Content-type: application/x-www-form-urlencoded
Accept: text/plain
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

wpSave=1&wpTextbox1=%7B%7BPlease+leave+this+line+alone+%28sandbox+heading%29%7D%7D%0AThis+test+is+the+best&wpSummary=This+is+a+clear+test

I don't know whether it matters if "User-agent" or "User-Agent" is used, but I've tried both in the Python submission headers. I also added the Referer header to see what happened. I find it interesting that it was grouped above, while the "User-agent" header was left below, as if the program didn't know what to do with them.

I tried to listen on the login page but, even though I have en.wikipedia.org aliased to 127.0.0.1, it still logs in successfully. I imagined it was logging in through a different server, so I tried aliasing "*.wikipedia.org" and "wikipedia.org", but to no effect.

I thought this TCP listener would sort all of this out, but it seems I'm losing sight of the original issue here and getting bogged down a bit. So, any other ideas on how to submit the edit page form and login form? Or can you see anything immediately and basically wrong with how I am attempting to send the HTTP headers, or any other part of the code?
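One way to narrow this down is to diff the two captures mechanically rather than by eye. The sketch below (Python 3, not the Python 2 used in the thread) hard-codes the header names and form fields copied from the two captures above and reports what the browser sent that the script did not. Header names are compared in lower case because HTTP header field names are case-insensitive, so "User-agent" versus "User-Agent" should not matter by itself. The helper names are invented for illustration.

```python
from urllib.parse import parse_qs

# Copied from the Firefox capture above
BROWSER_HEADERS = ["Host", "User-Agent", "Accept", "Accept-Language",
                   "Accept-Encoding", "Accept-Charset", "Keep-Alive",
                   "Connection", "Referer", "Cookie", "Content-Type",
                   "Content-Length"]
BROWSER_FIELDS = {"wpSection", "wpStarttime", "wpEdittime", "wpScrolltop",
                  "wpTextbox1", "wpSummary", "wpSave"}

# Copied from the Python capture above
SCRIPT_HEADERS = ["Host", "Accept-Encoding", "Content-Length", "Referer",
                  "Content-type", "Accept", "User-agent"]
SCRIPT_BODY = ("wpSave=1&wpTextbox1=%7B%7BPlease+leave+this+line+alone+"
               "%28sandbox+heading%29%7D%7D%0AThis+test+is+the+best"
               "&wpSummary=This+is+a+clear+test")


def missing_headers(expected, sent):
    """Header names in `expected` but not in `sent`, ignoring case."""
    return sorted({h.lower() for h in expected} - {h.lower() for h in sent})


def missing_fields(expected, body):
    """Form field names in `expected` that a URL-encoded body lacks."""
    sent = parse_qs(body, keep_blank_values=True)
    return sorted(expected - set(sent))


print(missing_headers(BROWSER_HEADERS, SCRIPT_HEADERS))
print(missing_fields(BROWSER_FIELDS, SCRIPT_BODY))
```

On these captures the script turns out to be missing the Cookie header and the wpSection, wpStarttime, wpEdittime and wpScrolltop fields, among others. Whether any of those is what the server objects to is not established here.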
#14
Professional Programmer
Join Date: Apr 2005
Location: London, England
Posts: 459
Rep Power: 4
Haven't followed exactly what's up, but I just gave editing the sandbox a go now. I don't get a 403 (it turns out that was to do with the Python urllib User-Agent header; just sending one for IE fixes it), but I do get a strange error saying "The sandbox has been changed while you were editing". This is stumping me because, as far as I can see, I've set the field for the edit start time to be just one second before the submitted time, and because if I take a look at the sandbox I can see that the page hasn't been changed since I started editing.

Haven't got the time to take it further, but I'll throw up my code in case it's of use to anyone. To write it I simply did what I suggested: used Firefox's Live HTTP Headers extension to quickly see what POST data is being submitted, and just set the fields appropriately in my script.

import urllib, urllib2, time

def wikiTimeStamp(t):
    """ Return a time stamp given a time tuple (e.g from time.gmtime()) """
    return "%s%s%s%s%s%s" % tuple([str(x).zfill(2) for x in t[:6]])

sandboxHeader = """{{Please leave this line alone (sandbox heading)}}
<!-- Hello! Feel free to try your formatting and editing skills below this line. As this page is for editing experiments, this page will automatically be cleaned every 12 hours. -->
"""

# Same form fields that Firefox submitted
data = {}
data["wpSection"] = ""
data["wpStarttime"] = wikiTimeStamp(time.gmtime())
time.sleep(1)
data["wpEdittime"] = wikiTimeStamp(time.gmtime())
data["wpScrolltop"] = "0"
data["wpTextbox1"] = sandboxHeader + "Some kind of test edit....\n"
data["wpSummary"] = ""
data["wpSave"] = "Save page"

# Replace the default Python User-Agent, which gets a 403
headers = {"User-Agent" : "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}
req = urllib2.Request("http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=submit", urllib.urlencode(data), headers)
result = urllib2.urlopen(req)
print result.read()

Good luck
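As a side note, the timestamp helper can be written with `time.strftime`, shown here as a Python 3 sketch with hypothetical function names. It also makes it easy to compare the two values in the Firefox capture earlier in the thread: there wpEdittime (20060106152007) is earlier than wpStarttime (20060106152025), whereas the script above sets wpEdittime one second later than wpStarttime. One reading, which is an assumption and not something confirmed in this thread, is that wpEdittime is not the submission time at all but a value the server put into the edit form.

```python
import time

WIKI_TIME_FORMAT = "%Y%m%d%H%M%S"  # e.g. 20060106152025


def wiki_timestamp(t):
    """Format a time tuple (e.g. from time.gmtime()) as YYYYMMDDHHMMSS."""
    return time.strftime(WIKI_TIME_FORMAT, t)


def parse_wiki_timestamp(s):
    """Parse a YYYYMMDDHHMMSS string back into a time tuple."""
    return time.strptime(s, WIKI_TIME_FORMAT)


# Values copied from the Firefox capture earlier in the thread
captured_start = parse_wiki_timestamp("20060106152025")
captured_edit = parse_wiki_timestamp("20060106152007")
print("edit time precedes start time:", captured_edit < captured_start)
```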
#15
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4
I wonder if it has something to do with the session cookie that Wikimedia sets. Maybe certain session variables are required for Wikipedia to accept a page edit. Maybe what you need to do is connect to the edit page, fetch the cookie data it returns, then connect to the submit page and pass the cookie data back to Wikimedia.
Or perhaps it has something to do with the way the form data is passed. Maybe it needs to be in multipart/form-data format, rather than application/x-www-form-urlencoded format.
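Both ideas can be tried with the standard library. The following is a Python 3 sketch, not tested against any wiki: `make_cookie_opener` and `encode_multipart` are hypothetical helper names, and the multipart encoder only handles plain text fields, which is all the captured form contains.

```python
import http.cookiejar
import urllib.request
import uuid


def make_cookie_opener():
    """Build an opener that stores cookies and sends them back on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_multipart(fields, boundary=None):
    """Encode text form fields as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")  # blank line separates part headers from the value
        lines.append(value)
    lines.append("--" + boundary + "--")  # closing delimiter
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, "multipart/form-data; boundary=" + boundary


# Usage outline (performs network requests, so it is left commented out):
# opener, jar = make_cookie_opener()
# opener.open(edit_url).read()           # the server may set a session cookie here
# body, ctype = encode_multipart(fields)
# req = urllib.request.Request(submit_url, data=body,
#                              headers={"Content-Type": ctype})
# print(opener.open(req).read())         # cookies from the first request are sent back
```

Here `edit_url`, `submit_url` and `fields` stand for the edit page address, the submit address and the form values used elsewhere in the thread.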
#16
Newbie
Join Date: Jan 2006
Posts: 13
Rep Power: 0
Success
Many thanks to both of you for all of your help. My intention was not to operate a bot on Wikipedia, but on other MediaWiki-powered projects. The code posted above worked for a wiki edit which you can see here.
I just substituted http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=submit with http://www.mwusers.com/wiki/index.php?title=Sandbox&action=submit
I'm certain I tried something very similar on that wiki before and it didn't work, but I'm too sick of not being able to do it to go back and figure out why my way didn't work before. I wrote something similar to submit the login page (this involved a cookie issue, which I think I've sorted with the help of the TCP listener).
So, again, thank you. Of course, the issue of this not working on en.wikipedia.org remains unresolved, but if anything comes up I'll post it here. I'll also be hanging around to post my newbie Python questions.
#17
Newbie
Join Date: Jan 2006
Posts: 13
Rep Power: 0
Looks like I spoke too soon. The above code works only for new pages. I tried submitting the page again, but with "Another test edit....\n" in the wpTextbox1 field, and got that same edit conflict message. I could only submit the page when I changed the page title to http://www.mwusers.com/wiki/index.php?title=SandBox&action=submit (i.e. capitalised the 'b'; MediaWiki is very case sensitive). Could there be something in the way the urllib2.Request function works that's causing it to get an edit conflict? Incidentally, using the code without any of the timestamp information brings up the preview page, not the edit conflict page. Could that be relevant? Do you think it might be possible to submit the resulting preview page after retrieving it?
Steve. Back at square one :(
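On the last question: if the retrieved edit or preview page carries its state in hidden form inputs, those could be read out and posted back unchanged along with the new text. Below is a Python 3 sketch of just the extraction step. The sample HTML is made up for illustration, using the field names and values from the Firefox capture earlier in the thread; the real form markup may differ and may contain further fields.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Hypothetical markup; only the names and values come from the capture
SAMPLE_FORM = """
<form method="post" action="/w/index.php?title=Sandbox&amp;action=submit">
<input type="hidden" value="" name="wpSection" />
<input type="hidden" value="20060106152025" name="wpStarttime" />
<input type="hidden" value="20060106152007" name="wpEdittime" />
<input type="hidden" value="0" name="wpScrolltop" />
<textarea name="wpTextbox1">test2</textarea>
</form>
"""

fields = hidden_fields(SAMPLE_FORM)
fields["wpTextbox1"] = "Another test edit....\n"  # the new page text
print(fields)
```

The resulting dict could then be URL-encoded and posted as in the code earlier in the thread. Whether the server accepts it is something only a real attempt would show.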