Programming Forums

titaniumdecoy Jul 24th, 2006 7:55 PM

Reading a binary file
 
I need to read a binary log file. After extensive searching, I found out that the struct module can convert a given number of bytes to another type (e.g., int or float). However, I know that the file I am reading from has numerous packets in this format:

:

0x10 0x?? ... data ... 0x10 0x3
(Disregard spaces. 0x?? is this packet's hexadecimal ID number.)

How do I read a hexadecimal number from a binary file? Also, how do you know whether the file is big-endian or little-endian? I'm not too familiar with binary/hex stuff; any help is appreciated. Thanks.

DaWei Jul 24th, 2006 10:29 PM

This has been discussed a number of times before. All files are binary. The difference is that some files have numeric values encoded in textual (or other) formats. This could be a different binary value expressing the value in decimal, hex, or any other base. I realize this sounds confusing, but it's something you really need to take time to get your head around.

The value 10 (decimal) would be expressed in binary (as a byte, anyway) as 00001010. Expressing it as a textual value, in ASCII, would mean you'd choose the characters for a 1 and a 0 and record them (it's binary, still, on the disk surface, but an encoding) as 00110001 and 00110000. A hex value representing 10 decimal would be recorded in ASCII as 00110000 01000001 (0A). Numbers might also be encoded as one of the varieties of EBCDIC, or something else.
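To see those encodings side by side, here is a small sketch. It is written in Python 3 syntax, and the helper name bits is made up for illustration. It prints the bit patterns for the number ten stored as one raw byte, as the ASCII text "10", and as the ASCII text "0A":

```python
def bits(data):
    """Return the bit pattern of each byte in data, separated by spaces."""
    return " ".join(format(b, "08b") for b in data)

raw_value = bits(bytes([10]))                # ten stored as a single raw byte
decimal_text = bits("10".encode("ascii"))    # the characters '1' and '0'
hex_text = bits("0A".encode("ascii"))        # the characters '0' and 'A'

print(raw_value)      # 00001010
print(decimal_text)   # 00110001 00110000
print(hex_text)       # 00110000 01000001
```

The same magnitude ends up as three different byte sequences, which is why you have to know which encoding the file uses before you can read it.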

If you're not deciding the formatting method, then you just have to find out what your correspondent has specified and is expecting. As for whether the other end (or your end, for that matter) is expecting big- or little-endian, you just have to find that out, too, and accommodate it.
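As a rough illustration of why byte order matters, the sketch below (Python 3 syntax; the two sample bytes are invented, not taken from the log file) unpacks the same pair of bytes both ways with the struct module:

```python
import struct

raw = bytes([0x01, 0x02])   # two bytes as they might sit in a file

little = struct.unpack("<H", raw)[0]   # "<" = little-endian: low byte first
big = struct.unpack(">H", raw)[0]      # ">" = big-endian: high byte first

print(little)   # 513, i.e. 0x0201
print(big)      # 258, i.e. 0x0102
```

Nothing in the bytes themselves says which reading is right; that has to come from the format's documentation or from checking decoded values against known-good ones.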

titaniumdecoy Jul 25th, 2006 12:50 PM

Thanks DaWei. That clears things up somewhat.

Now my question is, how do I read a single byte and convert it to a hexadecimal number? I can read a single byte as a char (i.e., a str), as follows:

:

print struct.unpack('c', file.read(1))
This prints ('\x10',).

Is there a way to read the byte as a hexadecimal number, rather than a char, in the first place? Or how can I easily convert this to a hexadecimal number? Thanks.

DaWei Jul 25th, 2006 1:18 PM

That is a hexadecimal number represented in ASCII (or another character set). For portability, you'd want to use a built-in converter that would accommodate the locale and type of character representation (Unicode or whatever). I can't tell you what that would be in Python (novice here), but someone probably can. If you know the encoding scheme, you can roll your own. The problem comes if your app is moved to a system with a different locale or character-encoding method.

To answer your question in one form: IF that were ASCII, you could toss the characters "\x" and leave yourself with the 1 and the 0. Subtract the offset (30 hex, decimal 48) that makes it a character, leaving yourself with two values, 1 hex and 0 hex. Multiply the first one by 16 (16^1 -- it's the 'tens' digit for base 16), multiply the second by 1 (16^0, it's the 'units' digit), add them, and Bob's your aunt (no, that's Gertie, isn't it?). I'm not recommending you do it that way; portability suffers. I tell you only so you can see the process of encoding and decoding and conversion from one base to another.
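The hand conversion described above can be sketched roughly as follows (Python 3 syntax, ASCII assumed; the function name hex_text_to_int is made up for illustration). The letters A to F need a different offset than the 0x30 used for the digits 0 to 9, so the sketch handles them separately, and the built-in int(text, 16) is shown for comparison:

```python
def hex_text_to_int(text):
    """Convert ASCII hex digits such as '10' or '0A' to an integer by hand."""
    value = 0
    for ch in text.upper():
        code = ord(ch)
        if ord("0") <= code <= ord("9"):
            digit = code - 0x30              # '0'..'9' are stored as 0x30..0x39
        elif ord("A") <= code <= ord("F"):
            digit = code - ord("A") + 10     # letters need their own offset
        else:
            raise ValueError("not a hex digit: %r" % ch)
        value = value * 16 + digit           # earlier digits move up one place
    return value

print(hex_text_to_int("10"))   # 16
print(int("10", 16))           # 16, using the built-in converter
```

In practice the built-in is the one to reach for; the manual loop is only there to show the arithmetic.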

Arevos Jul 26th, 2006 6:25 PM

file.read(n) reads in n bytes and returns them as a string. Thus, file.read(1) will return a string of exactly one byte, which also means the string will be exactly one character. One can convert a string of one character into its corresponding ASCII integer value via ord:
:

byte = ord(file.read(1))

struct.unpack('B', file.read(1))[0] gives the same integer in a more roundabout fashion; the 'c' format used above returns a one-character string rather than a number.
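For anyone following along on Python 3, here is a hedged sketch of the same idea. There, a file opened in binary mode returns bytes rather than str, and indexing a bytes object already yields an integer. The io.BytesIO object and its two sample bytes are stand-ins for a real file opened with mode "rb":

```python
import io
import struct

log = io.BytesIO(bytes([0x10, 0x41]))   # stand-in for open(path, "rb")

first = log.read(1)     # a bytes object of length 1
as_int = first[0]       # indexing bytes gives an int in Python 3
same_int = ord(first)   # ord() also accepts a length-1 bytes object

# The struct format 'B' (unsigned char) unpacks one byte as an integer.
via_struct = struct.unpack("B", log.read(1))[0]

print(as_int, same_int, via_struct)   # 16 16 65
```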

As for turning it into hexadecimal: integers in Python do not have a base. Only when they are printed, or turned into a string, is a base applied (by default, base 10). One can quite happily add a decimal number to a hexadecimal number to an octal number:
:

>>> 12 + 0xf + 010
35
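The session above uses the Python 2 octal spelling. In Python 3 a literal with a leading zero such as 010 is a syntax error, and octal is written 0o10 instead. A short sketch of the same sum in Python 3 syntax:

```python
total = 12 + 0xf + 0o10   # decimal 12 + hex f (15) + octal 10 (8)
print(total)              # 35

# The same magnitude written in four different bases compares equal.
same_value = (16 == 0x10 == 0o20 == 0b10000)
print(same_value)         # True
```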

Creating a hexadecimal string representation of an integer can be most easily achieved through string formatting:
:

print "%x" % byte
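A few variations on that formatting, sketched in Python 3 syntax (the variable names are only for illustration). The %x conversion, hex() and format() all produce strings, and int(text, 16) goes back the other way:

```python
byte = 0x10

plain = "%x" % byte               # '10'
padded = "%02X" % 0x3             # '03', two digits, upper case
prefixed = hex(byte)              # '0x10'
also_prefixed = format(byte, "#04x")   # '0x10', width counts the '0x' prefix
round_trip = int(plain, 16)       # back to the integer 16

print(plain, padded, prefixed, also_prefixed, round_trip)
```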


DaWei Jul 26th, 2006 7:27 PM

Quote:

12 + 0xf + 010
This is true in many languages, including C. It's the point I try to make periodically when the question comes up. Each of those values has an absolute magnitude. The system will deal with it, under the hood, as binary. What you see is merely a representation that makes you feel comfortable -- nothing real is changed by visually expressing a real magnitude in any number of formatted forms.
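Applied to the packet layout from the original question (0x10, an ID byte, data, then 0x10 0x3), the pieces of this thread could be combined roughly as below. This is only a sketch in Python 3 syntax under loud assumptions: the thread never says whether a 0x10 byte inside the data is escaped, so the code simply assumes the payload never contains the sequence 0x10 0x03. The function name split_packets and the sample bytes are invented for illustration.

```python
START = bytes([0x10])
END = bytes([0x10, 0x03])

def split_packets(data):
    """Return (packet_id, payload) pairs found in data.

    Assumption: the payload never contains the bytes 0x10 0x03.
    """
    packets = []
    pos = 0
    while True:
        start = data.find(START, pos)
        if start == -1 or start + 1 >= len(data):
            break                        # no further packet start
        end = data.find(END, start + 2)  # search after the ID byte
        if end == -1:
            break                        # trailing packet is incomplete
        packet_id = data[start + 1]      # an int, 0..255
        payload = data[start + 2:end]
        packets.append((packet_id, payload))
        pos = end + len(END)
    return packets

# Invented sample: one packet with ID 0x41 and two data bytes,
# followed by one packet with ID 0x8f and no data.
sample = bytes([0x10, 0x41, 0x01, 0x02, 0x10, 0x03, 0x10, 0x8F, 0x10, 0x03])
for packet_id, payload in split_packets(sample):
    print("%02x" % packet_id, payload.hex())
```

If the real format does escape 0x10 within the data, the terminator search would need to take that into account before this could be trusted on a real log.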

