Some time ago, a dear friend of mine came up to me and asked about the
Python module binascii, particularly about the functions hexlify()
and unhexlify(). Since he asked for it, I’m going to share my answer
publicly with you.
First of all, here is the nomenclature I use:
- ASCII characters are written in single quotes
- decimal numbers are Python 2 longs, written with an L suffix
- hex values have a 0x prefix
Next, let me quote the Python 2 documentation:
binascii.b2a_hex(data)
binascii.hexlify(data)
Return the hexadecimal representation of the binary data. Every byte of data is converted into the corresponding 2-digit hex representation. The resulting string is therefore twice as long as the length of data.
binascii.a2b_hex(hexstr)
binascii.unhexlify(hexstr)
Return the binary data represented by the hexadecimal string hexstr. This function is the inverse of b2a_hex(). hexstr must contain an even number of hexadecimal digits (which can be upper or lower case), otherwise a TypeError is raised.
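The quote above comes from the Python 2 documentation. In Python 3, which I assume for all the sketches in this post, both functions work on bytes, and an odd number of hex digits raises binascii.Error (a subclass of ValueError) instead of TypeError. A minimal sketch that checks the documented contract:

```python
import binascii

data = b"binascii"
encoded = binascii.hexlify(data)

# Every input byte becomes exactly two hex digits.
assert len(encoded) == 2 * len(data)

# unhexlify() is the inverse of hexlify(); upper- and lowercase digits both work.
assert binascii.unhexlify(encoded) == data
assert binascii.unhexlify(encoded.upper()) == data

# An odd number of hex digits is rejected (binascii.Error in Python 3).
try:
    binascii.unhexlify(b"414")
except binascii.Error as exc:
    print("rejected:", exc)
```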
I’ll begin with hexlify(). As the documentation states, this function
takes binary data and converts every byte into two hex digits, so the
result is twice as long as the input.
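To make that description concrete, here is a pure-Python equivalent of hexlify(), assuming Python 3 bytes as input. The name manual_hexlify is hypothetical and only serves as an illustration; it is not how the C implementation of binascii works internally.

```python
import binascii

def manual_hexlify(data: bytes) -> bytes:
    """Hypothetical re-implementation of hexlify(): two lowercase hex digits per byte."""
    # Iterating over bytes yields ints; '%02x' zero-pads, so 0x07 becomes '07'.
    return "".join("%02x" % byte for byte in data).encode("ascii")

sample = b"A\x07\xff"
assert manual_hexlify(sample) == binascii.hexlify(sample)
print(manual_hexlify(sample))
```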
The ASCII character ‘A’ has the numerical representation 65L. To verify this in Python:
long(ord('A'))
65L
You might ask: “Why is this even relevant to understanding binascii?” Well, we don’t know anything about how ord() does its job, but with binascii we can recalculate the value manually and verify it.
binascii.hexlify('A')
'41'
Now we know that an ‘A’, interpreted as binary data and shown in hex,
corresponds to ‘41’. But wait, ‘41’ is a string, not a hex value! That’s
no big deal: hexlify() simply returns its result as a string of hex
digits.
To stay with the example, let’s convert the hex string ‘41’ into a decimal number and check whether it equals 65L.
long('41', 16)
65L
Tada! It seems that ‘A’ = 0x41 = 65L. You might have known that already, but please stay with me a minute longer.
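The same check can be run over a whole range of characters instead of just ‘A’. A small sketch, assuming Python 3, where there is no separate long type and no L suffix, so plain int takes its place:

```python
import binascii
import string

# For every uppercase letter, three views of the same byte must agree:
# the code point from ord(), the hex digits from hexlify(), and
# int(..., 16) parsing those hex digits back into an integer.
for letter in string.ascii_uppercase:
    code = ord(letter)
    hex_digits = binascii.hexlify(letter.encode("ascii")).decode("ascii")
    assert int(hex_digits, 16) == code
    print(letter, hex_digits, code)
```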
To make it look a little more complex:
binascii.hexlify('A') == '%x' % long('41', 16)
True
Be aware that '%x' % n converts an integer n into its hexadecimal
representation: a string of lowercase hex digits without a 0x prefix.
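One caveat: '%x' % n does not zero-pad its result, so the comparison above only holds because 0x41 already has two digits. For bytes below 0x10, such as the Bell character that shows up later, hexlify() yields two digits while '%x' yields one; '%02x' matches hexlify(). A short Python 3 sketch:

```python
import binascii

for byte in (0x41, 0x07):
    char = bytes([byte])
    plain = "%x" % byte      # no padding: 0x07 -> '7'
    padded = "%02x" % byte   # zero-padded to two digits: 0x07 -> '07'
    from_binascii = binascii.hexlify(char).decode("ascii")
    print(char, plain, padded, from_binascii, plain == from_binascii)
```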
binascii.unhexlify() naturally does the same thing as hexlify(), but in
reverse: it takes a string of hex digits, reads it two digits at a time
and returns the binary data, one byte per pair.
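The pairwise reading can be spelled out in a few lines. This is a sketch assuming Python 3; manual_unhexlify is a hypothetical name, and because it relies on int(), it does not validate the digits as strictly as the real function does (int() also accepts a sign, for instance).

```python
import binascii

def manual_unhexlify(hexstr: str) -> bytes:
    """Hypothetical pure-Python counterpart of unhexlify()."""
    if len(hexstr) % 2:
        raise ValueError("hexstr must contain an even number of hex digits")
    # Take the string two characters at a time and parse each pair as base 16.
    return bytes(int(hexstr[i:i + 2], 16) for i in range(0, len(hexstr), 2))

assert manual_unhexlify("41") == binascii.unhexlify("41") == b"A"
# Python 3 also ships bytes.fromhex(), which does the same job.
assert bytes.fromhex("4107") == manual_unhexlify("4107")
```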
I’ll start off with an example:
binascii.unhexlify('41')
'A'
binascii.unhexlify('%x' % ord('A'))
'A'
In the second example, ord() first takes the numerical representation
65 of the ASCII character ‘A’:
ord('A')
65
'%x' then converts it into the hex string ‘41’:
'%x' % ord('A')
'41'
Finally, unhexlify() turns that single pair of hex digits back into the
one byte it represents, ‘A’.
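With more than one character, the pairwise nature becomes visible: every two hex digits turn into one byte. A Python 3 sketch of the same ord(), hex, unhexlify() chain for a two-letter string chosen for illustration:

```python
import binascii

text = "Hi"
# Step 1: numerical representation of each character ('H' -> 72, 'i' -> 105).
codes = [ord(c) for c in text]
# Step 2: two hex digits per code, joined into one string.
hexstr = "".join("%02x" % code for code in codes)
# Step 3: unhexlify() reads the string pair by pair and returns one byte per pair.
raw = binascii.unhexlify(hexstr)
print(codes, hexstr, raw)
assert raw == text.encode("ascii")
```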
And now the conclusion: why might all of this be useful? Right now, I can think of at least four use cases:
- cryptography
- data transformation (e.g. Base64 for MIME/e-mail attachments)
- security (inspecting binary data read off a network, pattern matching, …)
- textual representation of escape sequences
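For the data-transformation case, binascii offers more than hex. The following Python 3 sketch contrasts hexlify() with binascii.b2a_base64(), the Base64 encoder in the same module, on an arbitrary payload made up for illustration:

```python
import binascii

payload = b"\x07\x00\xffA"

# Hex: two ASCII characters per byte, so the output is exactly twice as long.
as_hex = binascii.hexlify(payload)
# Base64: four ASCII characters per three bytes (plus padding and, by default, a trailing newline).
as_b64 = binascii.b2a_base64(payload)

print(as_hex, as_b64)
# Both encodings are reversible.
assert binascii.unhexlify(as_hex) == payload
assert binascii.a2b_base64(as_b64) == payload
```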
Taking up the last use case, I’ll show you how to visualize the Bell
character (you know, that thing that keeps beeping in your terminal).
According to the ASCII table, the numerical representation of the Bell
is 7. Programmers might know it better by its escape sequence \a.
ord('\a') == 7
True
Suppose you read such a character in some binary data, for example
from a socket, and want to visualize this data with print. You will not
get any result, at least none that is visible; you might hear the Bell
sound if your terminal is not silenced.
Now, finally – binascii to the rescue:
binascii.hexlify('\a')
'07'
Voilà, the invisible character is now visible as ‘07’.
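Extending the idea from one character to a whole buffer, for example bytes read from a socket, gives a simple hex dump. The following is a sketch assuming Python 3.8 or later (for hexlify()'s optional separator argument); the helper name hexdump_lines and the sample payload are hypothetical. Non-printable bytes such as the Bell are shown as '.'.

```python
import binascii
from typing import List

def hexdump_lines(data: bytes, width: int = 8) -> List[str]:
    """Hypothetical helper: offset, hex bytes and printable ASCII, one line per chunk."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # The optional separator argument (Python 3.8+) puts a space between bytes.
        hex_part = binascii.hexlify(chunk, " ").decode("ascii")
        # Show printable ASCII as-is and everything else (e.g. the Bell, 0x07) as '.'.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}}  {text_part}")
    return lines

for line in hexdump_lines(b"ping\a\r\nsome data"):
    print(line)
```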