Programming Forums
Old Apr 1st, 2006, 8:19 PM   #1
Sane
Programming Guru
 
Join Date: Apr 2005
Location: Waterloo, Ontario
Posts: 2,086
Rep Power: 6 Sane will become famous soon enough
When is it faster to cache?

I'm wondering where the line should be drawn between regenerating the output every time and recalling cached data.

I have a function that will parse limited BBCode to HTML...
    def parse_post(self, text):
        for s in self.smilies:
            text = text.replace(s[0], '<img class="inlineimg" border="0" alt="%s" title="%s" src="%s" />'%(s[0], s[0], s[1]))

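        def check_quote(text):
            n = 0       # named quotes currently open
            qs = []     # names collected from quote="..." openers
            o = None    # index where the current quote=" opener starts
            for i in range(len(text)):
                if text[i:i+7] == 'quote="':
                    o = i

                # Compare against None so an opener at index 0 still counts.
                elif text[i] == '"' and o is not None and i-o > 7:
                    n += 1
                    qs.append(text[o+7:i])
                    o = None
                    
                elif text[i:i+8] == "[/quote]":
                    n -= 1

                if n < 0:
                    return False

            # Unbalanced quotes are an error even when names were collected.
            if n != 0:
                return False
            if qs:
                return qs
            return True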

        def check_tag(x, y, bal=False):
            n = 0
            leny = len(y)
            ts = [ '[%s]'%y, '[/%s]'%y ]
            
            # Scan every index; the old bound skipped an opening tag at the very end.
            for i in range(len(x)):
                if x[i:i+leny+2] == ts[0]:
                    n += 1
                elif x[i:i+leny+3] == ts[1]:
                    n -= 1

                if n < 0:
                    return False
                if bal and n > 1:
                    return False

            return n == 0
        
        def get_between(x, s, e):
            try:
                return x.split(s)[1].split(e)[0]
            except IndexError:
                return False
            
        warning = ''
        g = ['b', 'u', 'i']

        for gg in g:
            if check_tag(text, gg):
                text = text.replace('[%s]'%gg, '<%s>'%gg)
                text = text.replace('[/%s]'%gg, '</%s>'%gg)
            else:
                warning += "<b>Warning</b> Your '<i>%s</i>' tag(s) were not formatted correctly.<br />"%gg

        if check_tag(text, 'img', True):
            text = text.replace('[img]', '<img src="')
            text = text.replace('[/img]', '" />')
        else:
            warning += "<b>Warning</b> Your '<i>IMG</i>' tag(s) were not formatted correctly.<br />"
            
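        if check_tag(text, 'url', True):
            # The forum swallowed the literal tags in the posted source;
            # '[url]' and '[/url]' are assumed here from the check_tag call above.
            while 1:
                lnk = get_between(text, '[url]', '[/url]')
                if not lnk:
                    break
                else:
                    text = text.replace('[url]%s[/url]'%lnk, '<a href="%s">%s</a>'%(lnk, lnk), 1)
        else:
            warning += "<b>Warning</b> Your '<i>URL</i>' tag(s) were not formatted correctly.<br />"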

        names = check_quote(text)
        if isinstance(names, list):
            for name in names:
                # 'INSERT WORD QUOTE HERE' stands in for the word 'quote' (see the edit note at the end of this post).
                text = text.replace( '[INSERT WORD QUOTE HERE="%s"]'%name, '<br /><br /> &nbsp; &nbsp; Quote by <b>%s</b><br /><div class="quote">'%name )
            text = text.replace( '[/quote]', '</div><br /><br />' )
        elif not names:
            warning += "<b>Warning</b> Your '<i>quote</i>' tag(s) were not formatted correctly.<br />"
        
        # Append any warnings below a horizontal rule.
        if warning:
            text += '<br /><br /><hr />' + warning
        return text

Will it be faster to cache the results and recall them, or just calculate them again each time? I have 512 MB of RAM and the machine is only running one Python process.


Edit: The source contains INSERT WORD QUOTE HERE, since having the word 'quote' plainly there made quote tags appear. Gah, the BBCode is driving my source nuts in some places. Just quote my post to see the unparsed text.
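
One way to answer the question for a particular machine is to time both paths. The sketch below is only an illustration: slow_render is a hypothetical stand-in for parse_post (a few str.replace passes), and the sample post and the 1,000 repetitions are arbitrary assumptions.

```python
import timeit

def slow_render(text):
    # Hypothetical stand-in for parse_post: a few str.replace passes.
    for tag in ("b", "u", "i"):
        text = text.replace("[%s]" % tag, "<%s>" % tag)
        text = text.replace("[/%s]" % tag, "</%s>" % tag)
    return text

# An arbitrary, fairly long sample post.
post = "[b]hello[/b] [i]world[/i] " * 200

# A cache holding the already rendered result for that post.
cache = {post: slow_render(post)}

# Time 1,000 fresh renders against 1,000 dictionary lookups.
render_seconds = timeit.timeit(lambda: slow_render(post), number=1000)
lookup_seconds = timeit.timeit(lambda: cache[post], number=1000)

print("render: %.4fs  lookup: %.4fs" % (render_seconds, lookup_seconds))
```

The absolute numbers depend on the hardware and on how long real posts are, so it is worth running this with representative text before deciding.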
Old Apr 2nd, 2006, 5:56 AM   #2
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
It's usually faster to cache, especially since Python's dictionaries are generally pretty efficient.

Usually when dealing with text, however, the original text isn't used as the key. An MD5 or SHA1 hash is used instead to save memory and key lookup time. So instead of having a caching dictionary like:
{ original_text : converted_text }
Use a dictionary like:
{ sha.new(original_text).hexdigest() : converted_text }
Since you're potentially dealing with large chunks of text, you might also want to put a limit on your cache of some kind. Maybe cached entries are only around for a fixed amount of time, or maybe the dictionary is assigned a fixed amount of memory, and the oldest entry is deleted to make way for the newest.
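
A minimal sketch of that idea, assuming Python 3 and the standard hashlib module in place of the older sha module. The class name BoundedCache and the max_entries parameter are hypothetical, and the eviction rule shown is the "oldest entry out" variant, not the time-based one.

```python
import hashlib
from collections import OrderedDict

class BoundedCache:
    """Hypothetical helper: maps SHA-1(text) -> converted text, capped in size."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._data = OrderedDict()

    @staticmethod
    def _key(text):
        # The digest is a fixed 40 hex characters no matter how long the text is.
        return hashlib.sha1(text.encode("utf-8")).hexdigest()

    def get(self, text):
        # Returns None when the text has not been cached.
        return self._data.get(self._key(text))

    def put(self, text, converted):
        key = self._key(text)
        self._data[key] = converted
        self._data.move_to_end(key)         # treat a re-insert as the newest entry
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the oldest entry

cache = BoundedCache(max_entries=2)
cache.put("[b]a[/b]", "<b>a</b>")
cache.put("[b]b[/b]", "<b>b</b>")
cache.put("[b]c[/b]", "<b>c</b>")  # pushes out the entry for "[b]a[/b]"
```

Python dictionaries already hash string keys internally, so the main saving from digest keys is that the cache does not have to hold on to the full original text.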

Also, if you're parsing, you might want to check out PyParsing. It's a lot easier than doing things by hand.
Old Apr 2nd, 2006, 12:19 PM   #3
Sane
Quote:
Originally Posted by Arevos
{ sha.new(original_text).hexdigest() : converted_text }
That's brilliant! Wow, that's a really good idea.
Old Apr 2nd, 2006, 12:25 PM   #4
Dameon
Troll
 
Join Date: Apr 2005
Location: Texas
Posts: 732
Rep Power: 4 Dameon is on a distinguished road
Some more info regarding usage of this program would help in that decision. Is it running on a webserver? If so, is the cache even available across multiple requests? How long is the typical block of BBCode tags? And most importantly: How often does original text repeat? Why cache what is never called up?
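
The "how often does the text repeat" question can be estimated from a request log before building anything. This is a rough sketch under assumptions: estimated_hit_rate is a hypothetical helper, the log is made-up sample data, and the cache is assumed to be unbounded and shared across requests.

```python
def estimated_hit_rate(requested_texts):
    """Fraction of requests an unbounded cache would serve without reparsing."""
    total = len(requested_texts)
    if total == 0:
        return 0.0
    # Only the first request for each distinct text has to be parsed.
    unique = len(set(requested_texts))
    return (total - unique) / total

# Made-up request log: three distinct posts viewed six times in total.
log = ["post A", "post B", "post A", "post C", "post A", "post B"]
print(estimated_hit_rate(log))  # prints 0.5
```

A rate near zero means the cache would mostly store results that are never read again.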
Old Apr 2nd, 2006, 3:10 PM   #5
Sane
Well, before, it was reparsing every post every time someone visited a thread. That's because I want the posts stored unparsed in the data files and only parsed afterwards (in case somebody finds a glitch in the parser, or I want to change the effects of the tags).

But now it only needs to parse each post once, until the webserver is rebooted. It's working fine. I just replaced the calls to self.parse_post() with self.gateway_parse():
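    def gateway_parse(self, post):
        # Needs "import hashlib" at module level; hashlib.sha1 replaces the old
        # sha module on current Python. Assumes post is a str.
        checksum = hashlib.sha1(post.encode('utf-8')).hexdigest()
        try:
            # Cache hit: reuse the HTML produced earlier for identical text.
            res = self.cache_parse[checksum]
        except KeyError:
            # Cache miss: parse once and remember the result.
            res = self.parse_post(post)
            self.cache_parse[checksum] = res
        return res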
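
On current Python, the same parse-once behaviour is also available from the standard library. The sketch below uses functools.lru_cache as an alternative to the hand-written dictionary; parse_post here is a hypothetical stand-in for the real parser, and maxsize=1024 is an arbitrary assumption. Note that lru_cache keys on the text itself, not on a digest.

```python
from functools import lru_cache

def parse_post(text):
    # Hypothetical stand-in for the real BBCode parser.
    return text.replace("[b]", "<b>").replace("[/b]", "</b>")

@lru_cache(maxsize=1024)
def cached_parse(text):
    # Identical text is parsed once; later calls return the stored result.
    return parse_post(text)

cached_parse("[b]hi[/b]")  # miss: runs parse_post
cached_parse("[b]hi[/b]")  # hit: served from the cache
info = cached_parse.cache_info()
print(info.hits, info.misses)  # prints 1 1
```

Calling cached_parse.cache_clear() empties the cache, which would be one way to pick up a parser fix without restarting the server.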
Old Apr 2nd, 2006, 4:11 PM   #6
Arevos
Quote:
Originally Posted by Sane
That's brilliant! Wow, that's a really good idea.
That's more or less what I said when I was introduced to that method by a programmer on the Drupal mailing list some years ago.