#1 |
Programming Guru
When is it faster to cache?
I'm wondering where the line should be drawn between producing the output from scratch and recalling cached data.
I have a function that will parse limited BBCode to HTML...

    def parse_post(self, text):
        # Replace smiley codes with inline images.
        for s in self.smilies:
            text = text.replace(s[0], '<img class="inlineimg" border="0" alt="%s" title="%s" src="%s" />'%(s[0], s[0], s[1]))

        def check_quote(text):
            # Returns the list of quoted names, or True/False for balanced/unbalanced tags.
            n = 0
            qs = []
            o = False
            for i in range(len(text)):
                if text[i:i+7] == 'quote="':
                    o = i
                elif text[i] == '"' and o and i-o > 7:
                    n += 1
                    qs.append(text[o+7:i])
                    o = False
                elif text[i:i+8] == "[/quote]":
                    n -= 1
                if n < 0:
                    return False
            if qs:
                return qs
            else:
                return n == 0

        def check_tag(x, y, bal=False):
            # True if every [y] has a matching [/y]; bal=True also forbids nesting.
            n = 0
            leny = len(y)
            ts = [ '[%s]'%y, '[/%s]'%y ]
            for i in range(len(x)-leny-2):
                if x[i:i+leny+2] == ts[0]:
                    n += 1
                elif x[i:i+leny+3] == ts[1]:
                    n -= 1
                if n < 0:
                    return False
                if bal and n > 1:
                    return False
            return n == 0

        def get_between(x, s, e):
            try:
                return x.split(s)[1].split(e)[0]
            except IndexError:
                return False

        warning = ''
        g = ['b', 'u', 'i']
        for gg in g:
            if check_tag(text, gg):
                text = text.replace('[%s]'%gg, '<%s>'%gg)
                text = text.replace('[/%s]'%gg, '</%s>'%gg)
            else:
                warning += "<b>Warning</b> Your '<i>%s</i>' tag(s) were not formatted correctly.<br />"%gg
        if check_tag(text, 'img', True):
            text = text.replace('[img]', '<img src="')
            text = text.replace('[/img]', '" />')
        else:
            warning += "<b>Warning</b> Your '<i>IMG</i>' tag(s) were not formatted correctly.<br />"
        if check_tag(text, 'url', True):
            # The forum's BBCode renderer swallowed the tag literals in the next
            # few lines; '[url]' and '[/url]' are restored here from context.
            while 1:
                lnk = get_between(text, '[url]', '[/url]')
                if not lnk:
                    break
                else:
                    text = text.replace('[url]%s[/url]'%lnk, '<a href="%s">%s</a>'%(lnk, lnk), 1)
        else:
            warning += "<b>Warning</b> Your '<i>URL</i>' tag(s) were not formatted correctly.<br />"
        names = check_quote(text)
        if type(names) == list:
            for name in names:
                text = text.replace( '[INSERT WORD QUOTE HERE="%s"]'%name, '<br /><br /> Quote by <b>%s</b><br /><div class="quote">'%name )
            text = text.replace( '[/quote]', '</div><br /><br />' )
        else:
            if not names:
                warning += "<b>Warning</b> Your '<i>quote</i>' tag(s) were not formatted correctly.<br />"
        return text+['','<br /><br /><hr />'+warning][len(warning)>0]

Will it be faster to cache the results and recall them, or just calculate the output again each time? I have 512 MB of RAM and it's only running one Python process.

Edit: The source contains INSERT WORD QUOTE HERE, since having the word 'quote' plainly there made quote tags appear. Gah, the BBCode is driving my source nuts in some places. Just quote my post to see the unparsed text.
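One way to settle the question for a particular workload is to time both paths. The sketch below is only an illustration: `slow_parse` is a hypothetical stand-in for `parse_post` (it handles just the `b`, `u` and `i` tags), and `cached_parse` and `measure` are names made up for this example.

```python
import timeit

def slow_parse(text):
    """Hypothetical stand-in for parse_post: a few replace passes over the text."""
    for tag in ("b", "u", "i"):
        text = text.replace("[%s]" % tag, "<%s>" % tag)
        text = text.replace("[/%s]" % tag, "</%s>" % tag)
    return text

_cache = {}  # original text -> converted text

def cached_parse(text):
    """Return the cached conversion, computing it on the first request only."""
    try:
        return _cache[text]
    except KeyError:
        result = _cache[text] = slow_parse(text)
        return result

def measure(post, number=1000):
    """Time `number` reparses against `number` cache lookups of the same post."""
    cached_parse(post)  # warm the cache so only lookups are timed
    t_parse = timeit.timeit(lambda: slow_parse(post), number=number)
    t_cached = timeit.timeit(lambda: cached_parse(post), number=number)
    return t_parse, t_cached

if __name__ == "__main__":
    sample = "[b]hello[/b] [i]world[/i] " * 200
    t_parse, t_cached = measure(sample)
    print("reparse: %.6fs  cache lookup: %.6fs" % (t_parse, t_cached))
```

The numbers only mean something when the sample resembles real posts in length and tag density, so it is worth running this against the actual parser and a few stored posts.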
#2 |
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
It's usually faster to cache, especially since Python's dictionaries are generally pretty efficient.
Usually when dealing with text, the original text isn't used as the key, however. An MD5 or SHA1 hash is used instead to save memory and key-finding time. So instead of having a caching dictionary like:

    { original_text : converted_text }

you'd have:

    { sha.new(original_text).hexdigest() : converted_text }

Also, if you're parsing, you might want to check out PyParsing. It's a lot easier than doing things by hand.
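For anyone reading this on a newer Python: the `sha` module is gone in Python 3, and `hashlib.sha1` is the equivalent call. A minimal sketch of the digest-keyed dictionary, where `digest_key`, `remember` and `recall` are hypothetical helper names chosen for illustration:

```python
import hashlib

def digest_key(text):
    """Return a fixed-length SHA-1 hex digest to use as the cache key."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

converted = {}  # digest of original text -> converted text

def remember(original_text, converted_text):
    """Store a conversion under the digest of its source text."""
    converted[digest_key(original_text)] = converted_text

def recall(original_text):
    """Return the stored conversion, or None if this text was never stored."""
    return converted.get(digest_key(original_text))
```

The key is always 40 characters no matter how long the post is, which is where the memory saving comes from. The cost is one full pass over the text to compute the digest on every lookup.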
#3 |
Programming Guru
{ sha.new(original_text).hexdigest() : converted_text } |
#4 | |
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5
Quote:
#5 |
Troll
Join Date: Apr 2005
Location: Texas
Posts: 732
Rep Power: 4
Some more info regarding usage of this program would help in that decision. Is it running on a webserver? If so, is the cache even available across multiple requests? How long is the typical block of BBCode tags? And most importantly: How often does original text repeat? Why cache what is never called up?
__________________
MD5(sig) = bcef75433db02e9ad9bf81d6f7c5c270 |
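The "how often does original text repeat?" question can be answered by counting hits and misses for a while before committing to a cache. A rough sketch, assuming a hypothetical `CountingCache` wrapper around whatever function does the parsing:

```python
class CountingCache:
    """Dictionary cache that records how often a lookup was already stored."""

    def __init__(self, compute):
        self.compute = compute  # the expensive function, e.g. the BBCode parser
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, text):
        if text in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[text] = self.compute(text)
        return self.store[text]

    def hit_rate(self):
        """Fraction of lookups served from the cache (0.0 if nothing was looked up)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A hit rate near zero means the cache only costs memory. A forum thread that is viewed many more times than it is posted to should land much closer to one.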
#6 |
Programming Guru
Well, before, it was reparsing every post every time someone visited a thread. That's because I want to parse everything after it's saved to the data files (in case somebody finds a glitch in the parser, or I want to change the effects of the tags).
But now it only needs to parse each post once, until the webserver is rebooted. It's working fine. Instead of calling self.parse_post(), I just changed that to self.gateway_parse():

    def gateway_parse(self, post):
        # Assumes "from sha import sha" and a self.cache_parse dict created elsewhere.
        checksum = sha(post).hexdigest()
        try:
            res = self.cache_parse[checksum]
        except KeyError:
            # First time this post is seen: parse it and remember the result.
            res = self.parse_post(post)
            self.cache_parse[checksum] = res
        return res
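Since the dictionary above grows until the server restarts, a bounded variant may be worth considering on a small machine. On current Python versions `functools.lru_cache` gives roughly the same behaviour with a size limit. This is a sketch only: the `parse_post` below is a hypothetical stand-in, the `maxsize` value is arbitrary, and `lru_cache` keys on the post text itself rather than on a digest.

```python
from functools import lru_cache

def parse_post(post):
    """Hypothetical stand-in for the real BBCode parser."""
    return post.replace("[b]", "<b>").replace("[/b]", "</b>")

@lru_cache(maxsize=2048)  # least recently used entries are evicted past this size
def gateway_parse(post):
    """Parse a post once, then serve repeats from the bounded cache."""
    return parse_post(post)
```

It is written as a module-level function here because decorating a method with `lru_cache` also keeps the instance alive as part of the cache key. `gateway_parse.cache_info()` reports hits, misses and the current size.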