Programming Forums
Old Aug 29th, 2005, 5:19 PM   #1
Eryk
Programmer
 
Join Date: Jul 2005
Posts: 62
Rep Power: 4 Eryk is on a distinguished road
Data Compression

Well, I'm trying to pack down some information, measured in characters rather than bytes. I need to turn an estimated 600 (possibly only 500) characters into 400 characters, so roughly two-thirds of the original length. I can't just run it through an external utility, either; I want to be able to convert it back and forth in the language I'm using (which is JS).

I've tried programming several schemes myself, but the output ends up longer than the input. I have already searched Google for things like "digital compression" and "data compression", but I've found nothing that helped me figure it out.

I appreciate the help, thanks.
Old Aug 29th, 2005, 5:27 PM   #2
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4 Arevos is on a distinguished road
Data compression is not an exact science; how far a string shrinks depends on how much redundancy it contains, so there's no way to guarantee that a string of 600 characters can be compressed to 400. JavaScript compression is also likely to be pretty slow.
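To illustrate the "no guarantee" point, here is a minimal sketch (in Python rather than JS, purely for illustration). It compresses 600 pseudo-random bytes and 600 bytes of repetitive text with zlib. The seed and the sample text are arbitrary choices, not anything from the original question.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)
random_data = bytes(rng.getrandbits(8) for _ in range(600))
repetitive_data = b"abcabcabc " * 60  # 600 bytes with obvious redundancy

random_size = len(zlib.compress(random_data))
repetitive_size = len(zlib.compress(repetitive_data))

# Data without redundancy does not shrink; redundant data shrinks a lot.
print("random:    ", len(random_data), "->", random_size)
print("repetitive:", len(repetitive_data), "->", repetitive_size)
```

Whether a real 600-character string lands under 400 depends on where it sits between these two extremes.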

That said, have you looked up Huffman Coding? That's a pretty simple compression algorithm.
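For reference, a minimal sketch of Huffman coding, written in Python for illustration. The function names (`huffman_code`, `encode`, `decode`) and the sample message are made up for this example, and the output is a string of "0"/"1" characters rather than packed bits, to keep it short.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return a dict mapping each character of text to its bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {char: code_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def encode(text, code):
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    reverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:  # codes are prefix-free, so the first hit is right
            out.append(reverse[current])
            current = ""
    return "".join(out)

message = "this is an example of a huffman tree"
code = huffman_code(message)
bits = encode(message, code)
print(len(message) * 8, "bits raw ->", len(bits), "bits encoded")
assert decode(bits, code) == message
```

Note that the decoder needs the code table as well, so for an input of only a few hundred characters the table can eat much of the saving unless both sides agree on a fixed table in advance.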
Old Aug 29th, 2005, 5:28 PM   #3
Ooble
I eat cake for breakfast.
 
Join Date: Jul 2004
Location: In my box.
Posts: 4,434
Rep Power: 9 Ooble is on a distinguished road
You could check out the gzip source.
Old Aug 29th, 2005, 5:51 PM   #4
Silvanus
Hobbyist Programmer
 
Join Date: Aug 2005
Location: Hiding from... them...
Posts: 110
Rep Power: 3 Silvanus is on a distinguished road
You can compress/decompress strings using the bz2 module in Python. I don't know if there's a similar thing for JavaScript, though.
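A minimal sketch of what that looks like with the standard-library `bz2` module. The sample text is an arbitrary placeholder, not the poster's data.

```python
import bz2

# 600 characters of placeholder text (arbitrary sample)
text = ("the quick brown fox jumps over the lazy dog. " * 14)[:600]

raw = text.encode("utf-8")            # bz2 works on bytes, not str
packed = bz2.compress(raw)
restored = bz2.decompress(packed).decode("utf-8")

print(len(raw), "bytes ->", len(packed), "bytes")
assert restored == text
```

The result is arbitrary binary data, not printable characters. Storing it in a text column would need something like Base64, which grows the data by about a third again.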
Old Aug 29th, 2005, 7:17 PM   #5
Eryk
I'll take a look at the Huffman algorithm.

I'm not sure if this is possible, but another idea I had was to combine several numbers into a single number in a way that still lets me recover the original numbers from it. I guess that's still compression, just with numbers.
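Combining numbers reversibly is possible when each number has a known range. One way to do it is mixed-radix packing, sketched below in Python. The `pack`/`unpack` names and the hour/minute/second fields are hypothetical, chosen only for illustration.

```python
def pack(values, bases):
    """Combine values[i] (each in range(bases[i])) into a single integer."""
    n = 0
    for value, base in zip(values, bases):
        if not 0 <= value < base:
            raise ValueError("value out of range for its base")
        n = n * base + value
    return n

def unpack(n, bases):
    """Recover the original values from n, given the same bases."""
    values = []
    for base in reversed(bases):
        n, value = divmod(n, base)  # peel off the last-packed value first
        values.append(value)
    return values[::-1]

bases = [24, 60, 60]               # hypothetical fields: hour, minute, second
packed = pack([13, 37, 5], bases)
print(packed)                      # prints 49025
assert unpack(packed, bases) == [13, 37, 5]
```

This only saves space when the ranges are small: the packed number needs as many digits as the product of the bases requires, no fewer.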
Old Aug 30th, 2005, 4:13 PM   #6
iignotus
Professional Programmer
 
Join Date: Apr 2005
Location: Nowhere Special
Posts: 466
Rep Power: 4 iignotus is on a distinguished road
Characters are bytes, unless I'm misunderstanding what you're saying. Compression is a difficult arena. What exactly are you trying to do?
__________________
% rc4 hexkey < input > output
#define S ,t=s[i],s[i]=s[j],s[j]=t /* rc4 hexkey <file */
unsigned char k[256],s[256],i,j,t;main(c,v,e)char**v;{++v;while(++i)s[ 
i]=i;for(c=0;*(*v)++;k[c++]=e)sscanf((*v)++-1,"%2x",&e);while(j+=s[i]
+k[i%c]S,++i);for(j=0;c=~getchar();putchar(~c^s[t+=s[i]]))j+=s[++i]S;}
Old Aug 30th, 2005, 5:11 PM   #7
Eryk
I'm just saying that it's not like a file size that I'm dealing with.
Old Aug 30th, 2005, 5:19 PM   #8
iignotus
Everything on a computer is a file. What exactly is this 'series of characters' that you're trying to compress?
Old Aug 30th, 2005, 5:22 PM   #9
Eryk
This information will be stored in a database, not a separate file. The file size is not what matters, which is my point.

This string will vary and could be any number of possibilities.

Perhaps I'm going at this from the wrong angle. Maybe I should be looking for a way to change the encoding to one that has more characters, enough that the same information fits in fewer characters.
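The "bigger alphabet" idea can be sketched as a base conversion: treat the text as one large number in base 64 and rewrite it in base 4096. Everything specific here is an assumption for illustration: the 64-symbol input alphabet, the block of 4096 Unicode code points starting at U+4E00, and the `widen`/`narrow` names.

```python
import string

SMALL = string.ascii_letters + string.digits + " ."  # 64 input symbols (assumed)
BIG_START, BIG_SIZE = 0x4E00, 4096                   # 4096 output symbols (assumed)

def widen(text):
    """Re-encode text over SMALL as a shorter string over the big alphabet."""
    n = 1  # leading sentinel digit, so leading SMALL[0] characters are not lost
    for ch in text:
        n = n * len(SMALL) + SMALL.index(ch)  # raises ValueError on unknown chars
    out = []
    while n:
        n, digit = divmod(n, BIG_SIZE)
        out.append(chr(BIG_START + digit))
    return "".join(reversed(out))

def narrow(packed):
    """Invert widen()."""
    n = 0
    for ch in packed:
        n = n * BIG_SIZE + (ord(ch) - BIG_START)
    out = []
    while n > 1:  # stop when only the sentinel is left
        n, digit = divmod(n, len(SMALL))
        out.append(SMALL[digit])
    return "".join(reversed(out))

text = ("Hello World. " * 47)[:600]
packed = widen(text)
print(len(text), "characters ->", len(packed), "characters")
assert narrow(packed) == text
```

Each output character carries 12 bits instead of 6, so the character count roughly halves for any input over the small alphabet. The byte count does not shrink, though: in UTF-8 each of these output characters takes 3 bytes, so this only helps if the limit really is counted in characters.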
Old Aug 30th, 2005, 5:56 PM   #10
Cerulean
Professional Programmer
 
Join Date: Apr 2005
Location: London, England
Posts: 459
Rep Power: 4 Cerulean is on a distinguished road
Gzip does a good job of getting text files way down. You can then pad the data up to 400. Should be pretty easy.
Here's a template you can use when you code it in C++ - it's written in Python (solely because I got curious).
import zlib

PAD = b"#"  # filler byte appended after the compressed stream

def make400Bytes(s):
    """Compress s (bytes) and pad the result to exactly 400 bytes."""
    assert len(s) > 400
    compressed = zlib.compress(s)
    if len(compressed) > 400:
        print("Ouch, data size compression problem. Dataset too big, or bad data")
        return None
    return compressed + PAD * (400 - len(compressed))

def make600Bytes(s):
    # Don't rstrip the padding: the compressed stream may itself end in "#".
    # A decompressobj stops at the end of the zlib stream and ignores the rest.
    return zlib.decompressobj().decompress(s)


before = b"""Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Integer at purus. Aliquam posuere nibh. Vestibulum vitae turpis non arcu venenatis luctus. Sed at nisi. Aliquam erat volutpat. Sed urna. Quisque sit amet arcu eu tellus luctus mollis. Phasellus rhoncus vulputate sem. Nunc lacinia nibh. Ut fermentum augue nec odio. Duis fringilla tincidunt elit. 
Mauris urna metus, placerat vitae, porttitor non, scelerisque et, libero. Donec nibh quam, mollis eget, placerat quis, sollicitudin in, magna. Nullam non urna. Vestibulum metus arcu, condimentum pellentesque, porttitor in, tempus ac, erat. Pelle"""

print("len(before) =", len(before)) # 600
after = make400Bytes(before)
print("len(after) =", len(after)) # down to 400
afterAfter = make600Bytes(after)
print("len(afterAfter) =", len(afterAfter)) # back up to 600
I'm not sure, but I don't think the C zlib library is as simple as the standard-library Python one, with its "compress" and "decompress" function pair. If it is, it should be quite easy to translate.
DaniWeb IT Discussion Community
Copyright ©2007 DaniWeb® LLC