#1
Programmer
Join Date: Jul 2005
Posts: 62
Rep Power: 4
Data Compression
Well, I'm trying to pack down some information, based not on bytes but on characters. I need to turn an estimated 600 (possibly only 500) characters into 400 characters, so about 2/3 of the original length, I guess. I can't just run it through a utility and paste the result in, either; I want to be able to convert it back and forth in the language I'm using (which is JS).
I've tried programming several things myself, but they end up longer instead, due to bad ideas. I have already searched Google for things like "digital compression" and "data compression", but I've found nothing that helped me figure it out. I appreciate the help, thanks.

#2
Programming Guru
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 4
Data compression is not an exact science: no lossless scheme can shrink every possible input, so there's no way to guarantee that a string of 600 characters can be compressed to 400. JavaScript compression is also likely to be pretty slow.
That said, have you looked up Huffman coding? That's a pretty simple compression algorithm.
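For reference, here is a minimal Huffman coding sketch in Python (the same language as the zlib template later in this thread). It builds a prefix code from character frequencies with `heapq`, encodes a string to a string of '0'/'1' bits, and decodes it back. The function names are hypothetical, and a real scheme would also have to store the code table (or the character frequencies) next to the bits, which this sketch leaves out.

```python
import heapq
from collections import Counter

def build_huffman_codes(text):
    """Return a dict mapping each character to its Huffman bit string."""
    freqs = Counter(text)
    if len(freqs) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freqs)): "0"}
    # Heap entries: (weight, unique tie-breaker, {char: code built so far})
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing 0 to one side, 1 to the other.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def huffman_encode(text, codes):
    return "".join(codes[ch] for ch in text)

def huffman_decode(bits, codes):
    # No code is a prefix of another, so greedy matching is unambiguous.
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)

text = "this is an example of a huffman tree"
codes = build_huffman_codes(text)
bits = huffman_encode(text, codes)
assert huffman_decode(bits, codes) == text
print(len(text) * 8, "bits as 8-bit characters ->", len(bits), "bits")
```

Frequent characters get short codes and rare ones get long codes, which is where the saving comes from; text with an even spread of characters gains much less.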

#3
I eat cake for breakfast.
Join Date: Jul 2004
Location: In my box.
Posts: 4,434
Rep Power: 9
You could check out the gzip source.
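If reading the gzip source is more than you need: gzip output is DEFLATE data wrapped in an 18-byte header and trailer, and Python's `zlib` module can produce the same DEFLATE data with a gzip wrapper, a zlib wrapper (6 bytes), or no wrapper at all, selected by the `wbits` argument. When every character counts, that framing matters. A rough comparison sketch; the `deflate` helper and the sample text are made up for illustration.

```python
import zlib

def deflate(data, wbits):
    # wbits: 31 = gzip wrapper, 15 = zlib wrapper, -15 = raw DEFLATE (no wrapper)
    comp = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()

text = ("Some sample text to squeeze. " * 20).encode("utf-8")
for name, wbits in [("gzip", 31), ("zlib", 15), ("raw", -15)]:
    packed = deflate(text, wbits)
    # zlib.decompress takes the same wbits value to undo each format.
    assert zlib.decompress(packed, wbits) == text
    print(name, len(text), "->", len(packed), "bytes")
```

Raw DEFLATE is the smallest, but then you have to know out of band that the data is raw DEFLATE and that it decompresses correctly, since there is no checksum.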

#4
Hobbyist Programmer
Join Date: Aug 2005
Location: Hiding from... them...
Posts: 110
Rep Power: 3
You can compress/decompress strings using the bz2 module in Python. I don't know if there's a similar thing for JavaScript, though.
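In Python that looks roughly like the sketch below; `bz2.compress`/`bz2.decompress` and `zlib.compress`/`zlib.decompress` are standard-library calls, and the sample text is made up. On inputs only a few hundred characters long, bzip2's per-stream overhead can make its output larger than zlib's, so it is worth measuring both on real data before choosing.

```python
import bz2
import zlib

text = ("The quick brown fox jumps over the lazy dog. " * 13).encode("utf-8")

packed_bz2 = bz2.compress(text)
packed_zlib = zlib.compress(text, 9)

# Both are lossless: decompressing returns exactly the original bytes.
assert bz2.decompress(packed_bz2) == text
assert zlib.decompress(packed_zlib) == text

print("original:", len(text), "bytes")
print("bz2:     ", len(packed_bz2), "bytes")
print("zlib:    ", len(packed_zlib), "bytes")
```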

#5
Programmer
Join Date: Jul 2005
Posts: 62
Rep Power: 4
I'll take a look at the Huffman algorithm.
I'm not sure if this is possible, but another thing I was thinking of was a way of combining several numbers into one new number, while still being able to get the original numbers back from it. I guess that's still compression, just with numbers.
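Combining numbers reversibly is possible when each number has a known upper bound: treat them as the digits of a mixed-radix number. A minimal sketch, assuming you know the range of every field in advance; the `pack`/`unpack` names and the example fields in `bases` are hypothetical.

```python
def pack(values, bases):
    """Combine values into one integer; values[i] must be in range(bases[i])."""
    n = 0
    for value, base in zip(values, bases):
        if not 0 <= value < base:
            raise ValueError(f"{value} does not fit in base {base}")
        n = n * base + value
    return n

def unpack(n, bases):
    """Recover the original values from the packed integer."""
    values = []
    for base in reversed(bases):
        n, value = divmod(n, base)
        values.append(value)
    return values[::-1]

# Hypothetical fields: day (0-30), hour (0-23), minute (0-59), rating (0-9)
bases = [31, 24, 60, 10]
values = [17, 9, 45, 7]
packed = pack(values, bases)
assert unpack(packed, bases) == values
print(packed)  # prints 250657
```

The packed value is always below the product of the bases (here 31 * 24 * 60 * 10 = 446400), so this only saves space when that product needs fewer characters than storing the fields separately.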

#6
Professional Programmer
Characters are bytes, unless I'm misunderstanding what you're saying. Compression is a difficult arena. What exactly are you trying to do?
__________________
% rc4 hexkey < input > output
#define S ,t=s[i],s[i]=s[j],s[j]=t /* rc4 hexkey <file */
unsigned char k[256],s[256],i,j,t;main(c,v,e)char**v;{++v;while(++i)s[
i]=i;for(c=0;*(*v)++;k[c++]=e)sscanf((*v)++-1,"%2x",&e);while(j+=s[i]
+k[i%c]S,++i);for(j=0;c=~getchar();putchar(~c^s[t+=s[i]]))j+=s[++i]S;}

#7
Programmer
Join Date: Jul 2005
Posts: 62
Rep Power: 4
I'm just saying that it's not file size that I'm dealing with.

#8
Professional Programmer
Everything on a computer is a file. What exactly is this 'series of characters' that you're trying to compress?

#9
Programmer
Join Date: Jul 2005
Posts: 62
Rep Power: 4
This information will be stored in a database, not a separate file. The file size is not what matters, which is my point.
This string will vary and could be any number of possibilities. Perhaps I'm going at this from the wrong angle; maybe I should be looking for a way to change the encoding to one that has more characters, enough that the same information fits into fewer characters.
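Re-encoding into a bigger alphabet does work when the input only uses a small set of characters, because each output character then carries more information than each input character. A sketch, assuming (hypothetically) that the data consists only of decimal digits and that the database column can safely store the 64 characters in `TARGET`; both alphabets and the `encode`/`decode` names are placeholders to adapt.

```python
import string

SOURCE = string.digits                                  # hypothetical input alphabet (10 symbols)
TARGET = string.ascii_letters + string.digits + "-_"    # hypothetical output alphabet (64 symbols)

def encode(text, src=SOURCE, dst=TARGET):
    # Read text as a base-len(src) number. Starting from 1 (a sentinel)
    # keeps leading src[0] characters from disappearing.
    n = 1
    for ch in text:
        n = n * len(src) + src.index(ch)
    # Write that number out in base len(dst).
    out = []
    while n:
        n, r = divmod(n, len(dst))
        out.append(dst[r])
    return "".join(reversed(out))

def decode(packed, src=SOURCE, dst=TARGET):
    n = 0
    for ch in packed:
        n = n * len(dst) + dst.index(ch)
    out = []
    while n > 1:  # stop when only the sentinel 1 is left
        n, r = divmod(n, len(src))
        out.append(src[r])
    return "".join(reversed(out))

digits = "0123456789" * 60  # hypothetical 600-character input
packed = encode(digits)
assert decode(packed) == digits
print(len(digits), "->", len(packed))
```

The saving comes entirely from the alphabets: a decimal digit carries log2(10), about 3.32 bits, and each output character carries log2(64) = 6 bits, so 600 digits shrink to roughly 600 * 3.32 / 6, about 333 characters. If the input can contain arbitrary characters, this trick gains little and a real compressor is the better tool.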

#10
Professional Programmer
Join Date: Apr 2005
Location: London, England
Posts: 459
Rep Power: 4
Gzip does a good job of getting text files way down. You can then pad the compressed data out to exactly 400. Should be pretty easy.
Here's a template you can use when you code it in C++; it's written in Python (solely because I got curious):

```python
import zlib

def make400Bytes(s):
    # Compress the text and pad the result out to exactly 400 bytes.
    assert len(s) > 400
    compressed = zlib.compress(s.encode("utf-8"))
    if len(compressed) > 400:
        print("Ouch, data size compression problem. Dataset too big, or bad data")
        return None
    # The '#' padding goes after the end of the zlib stream.
    return compressed + b"#" * (400 - len(compressed))

def make600Bytes(s):
    # A decompressobj stops at the end of the zlib stream and leaves the
    # padding in .unused_data. (s.rstrip("#") would also strip a final
    # compressed byte that happens to be 0x23, corrupting the stream.)
    return zlib.decompressobj().decompress(s).decode("utf-8")

before = """Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Integer at purus. Aliquam posuere nibh. Vestibulum vitae turpis non arcu venenatis luctus. Sed at nisi. Aliquam erat volutpat. Sed urna. Quisque sit amet arcu eu tellus luctus mollis. Phasellus rhoncus vulputate sem. Nunc lacinia nibh. Ut fermentum augue nec odio. Duis fringilla tincidunt elit.
Mauris urna metus, placerat vitae, porttitor non, scelerisque et, libero. Donec nibh quam, mollis eget, placerat quis, sollicitudin in, magna. Nullam non urna. Vestibulum metus arcu, condimentum pellentesque, porttitor in, tempus ac, erat. Pelle"""
print("len(before) =", len(before))  # 600
after = make400Bytes(before)
print("len(after) =", len(after))  # down to 400
afterAfter = make600Bytes(after)
print("len(afterAfter) =", len(afterAfter))  # back up to 600
```
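One catch for the database case: zlib output is arbitrary binary, and a text column may not store it safely. A common workaround is Base64, which turns every 3 bytes into 4 characters, so the compressed data has to come in at 300 bytes or less to fit in 400 characters. A sketch using the standard `base64` module; the `pack_text`/`unpack_text` names and the sample text are made up.

```python
import base64
import zlib

def pack_text(s):
    # Compress, then turn the bytes into plain ASCII that a text column can hold.
    return base64.b64encode(zlib.compress(s.encode("utf-8"), 9)).decode("ascii")

def unpack_text(packed):
    # Undo the Base64 step, then the compression step.
    return zlib.decompress(base64.b64decode(packed)).decode("utf-8")

sample = "Lorem ipsum dolor sit amet, consectetuer adipiscing elit. " * 10
packed = pack_text(sample)
assert unpack_text(packed) == sample
print(len(sample), "characters ->", len(packed), "Base64 characters")
```

Base64 costs a third on top of the compressed size, which eats into the savings; it is the price of storing the result as ordinary characters.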