Data types

When reading the Bitcoin developer reference, it quickly becomes clear that the Bitcoin protocol requires us to work with specific Bitcoin data types. You can’t just insert numbers as a Python int and expect it to work. Every field of every packet that is sent or received by our node needs to be properly formatted.

Let’s have a look at the version message documentation in the Bitcoin developer reference.
We can see that the first field should contain the protocol version number (currently 70012). But we can’t just send the number as-is: the documentation specifically states that it should be 4 bytes long, of type int32_t. The same goes for every other piece of information that is either sent or received: it has to be formatted in the manner specified in the Bitcoin protocol documentation. Luckily, Python has the struct module, which allows us to easily pack a value into such a data type. In our example we want to pack the number “70012” into a 4-byte integer (remember, 32 bits is 4 bytes) and store it in a variable named “version”.
Using the struct module, our code should look like this:

import struct

version = struct.pack("<i", 70012)

The < in the code asks for little-endian byte order, which is what the Bitcoin protocol uses, and the i represents a 4-byte signed integer. For a complete list of format characters and their meaning, have a look at the following table in the struct module documentation.
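To see what the packed value actually looks like, here is a minimal sketch that prints the raw bytes for both byte orders. The Bitcoin protocol serializes integers in little-endian order, which the "<" prefix makes explicit; a bare "i" uses the byte order of the machine the code runs on. The variable names are only for illustration, and bytes.hex() assumes Python 3.5 or later.

```python
import struct

# "<" = little-endian, ">" = big-endian; "i" = 4-byte signed integer
little = struct.pack("<i", 70012)
big = struct.pack(">i", 70012)

print(little.hex())  # 7c110100 (this is the order Bitcoin expects)
print(big.hex())     # 0001117c

# Both results are exactly 4 bytes long, as the version field requires
print(len(little), len(big))  # 4 4
```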

This is quite a simple process: just look at the Bitcoin documentation to find out how each variable should be formatted, and then head to the struct module documentation to find the corresponding character. But repeating this again and again for each and every variable will surely cause our code to get out of control, and errors are bound to creep in. So Alexis suggested that we predefine all of the required data types in one file. Now, instead of using the previous code for our version variable, we can just use the predefined function to_int32(v):

import struct

def to_int32(v):
    # "<" forces little-endian byte order, as used by the Bitcoin protocol
    return struct.pack("<i", v)

version = to_int32(70012)

We’ve also added a read_int32 function, which allows us to easily unpack the value again:

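import struct

def to_int32(v):
    # "<" forces little-endian byte order, as used by the Bitcoin protocol
    return struct.pack("<i", v)

version = to_int32(70012)  # The number 70012 is now packed.

print(version)  # Raw bytes, not human-readable


def read_int32(v):
    # unpack always returns a tuple, so take its first element
    return struct.unpack("<i", v)[0]


print(read_int32(version))  # The number 70012 is readable again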
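Following the same pattern, a file of predefined data types might look roughly like the sketch below. The read_uint16, read_uint32 and read_uint64 names are the ones used later in this post; the matching to_* functions and the exact format strings are assumptions for illustration, not a copy of our actual file.

```python
import struct

# Unsigned integers of 2, 4 and 8 bytes, all little-endian ("<")
def to_uint16(v):
    return struct.pack("<H", v)

def read_uint16(v):
    return struct.unpack("<H", v)[0]

def to_uint32(v):
    return struct.pack("<I", v)

def read_uint32(v):
    return struct.unpack("<I", v)[0]

def to_uint64(v):
    return struct.pack("<Q", v)

def read_uint64(v):
    return struct.unpack("<Q", v)[0]

# Round trip: every value should come back unchanged
for to_fn, read_fn, value in (
    (to_uint16, read_uint16, 8333),
    (to_uint32, read_uint32, 70012),
    (to_uint64, read_uint64, 2 ** 40),
):
    packed = to_fn(value)
    print(len(packed), read_fn(packed) == value)
```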
 

Most of the data types were easy to define, but the Bitcoin protocol has one special data type, called compactSize_uint.
In this data type, every number higher than 252 gets a one-byte prefix (0xFD, 0xFE or 0xFF) that indicates how many bytes the number itself occupies (2, 4 or 8). Smaller numbers are stored in a single byte without a prefix. This data type is mostly used to state the size of variable-length data.

import struct

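def to_compactSize_uint(v):
    # 0 to 252: a single byte, no prefix
    if v < 0xfd:
        return struct.pack("<B", v)
    # 253 to 0xffff: prefix 0xFD followed by a 2-byte integer
    elif v <= 0xffff:
        return b"\xfd" + struct.pack("<H", v)
    # up to 0xffffffff: prefix 0xFE followed by a 4-byte integer
    elif v <= 0xffffffff:
        return b"\xfe" + struct.pack("<I", v)
    # anything larger: prefix 0xFF followed by an 8-byte integer
    else:
        return b"\xff" + struct.pack("<Q", v)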



def read_compactSize_uint(s):  # s is a file-like stream of bytes

    # Read an unsigned char to get the format
    size = ord(s.read(1))

    # Return the value
    if size < 0xFD:
        return size
    if size == 0xFD:
        return read_uint16(s.read(2))
    if size == 0xFE:
        return read_uint32(s.read(4))
    if size == 0xFF:
        return read_uint64(s.read(8))
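To check that encoding and decoding agree, here is a small self-contained round trip. It is only a sketch: to_compact_size and read_compact_size are hypothetical stand-ins for the two functions above, rewritten so the block runs on its own under Python 3, and io.BytesIO plays the role of the byte stream.

```python
import io
import struct

def to_compact_size(v):
    # Values below 0xFD fit in one byte; larger ones get a size prefix
    if v < 0xfd:
        return struct.pack("<B", v)
    elif v <= 0xffff:
        return b"\xfd" + struct.pack("<H", v)
    elif v <= 0xffffffff:
        return b"\xfe" + struct.pack("<I", v)
    else:
        return b"\xff" + struct.pack("<Q", v)

def read_compact_size(stream):
    # The first byte is either the value itself or a size prefix
    prefix = ord(stream.read(1))
    if prefix < 0xfd:
        return prefix
    fmt = {0xfd: "<H", 0xfe: "<I", 0xff: "<Q"}[prefix]
    return struct.unpack(fmt, stream.read(struct.calcsize(fmt)))[0]

# Test values on both sides of every boundary
for value in (0, 252, 253, 0xffff, 0x10000, 0xffffffff, 0x100000000):
    encoded = to_compact_size(value)
    decoded = read_compact_size(io.BytesIO(encoded))
    print(value, encoded.hex(), decoded == value)
```

Note how 252 still takes one byte while 253 already takes three: the prefix byte plus a 2-byte integer.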

The parse_ip bug

We’ve also tried to build a parse_ip function to properly display IP addresses. But unfortunately we came across a bug when running it on Windows. You can read more about our attempts to deal with the bug on our Trello board.

Edit (4-Jul-2016): Python 2.5 to 3.5 migration

Please read the general notes about the transition from Python 2.5 to 3.5 over here, and the complete GitHub change log for the migration over here.

Most of the data type functions have remained unchanged, with the following exceptions:

The functions that dealt with reading and writing characters were replaced by two functions: to_chars and read_chars.

def to_chars(v, length=-1):
    # With no explicit length, use the length of the value itself
    if length == -1:
        length = len(v)
    return struct.pack(">%ss" % length, v)

def read_chars(v, length=-1):
    if length == -1:
        length = len(v)
    return struct.unpack(">%ss" % length, v)[0]

These new functions accept an explicit size (length). If no length is given, the size is calculated automatically from the string. This allows us to deal with strings of various sizes.
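As a usage sketch, the explicit length is handy for fixed-width fields such as the 12-byte command name in the Bitcoin message header, which is padded with null bytes. The example repeats the two functions so that it runs on its own, and assumes Python 3, where the value passed in must be a bytes object.

```python
import struct

def to_chars(v, length=-1):
    if length == -1:
        length = len(v)
    return struct.pack(">%ss" % length, v)

def read_chars(v, length=-1):
    if length == -1:
        length = len(v)
    return struct.unpack(">%ss" % length, v)[0]

# Fixed length: struct pads the value with null bytes up to 12 bytes
command = to_chars(b"version", 12)
print(command)       # b'version\x00\x00\x00\x00\x00'
print(len(command))  # 12

# Reading it back keeps the padding, so strip it if it is not wanted
print(read_chars(command).rstrip(b"\x00"))  # b'version'

# No length given: the size follows the value
print(to_chars(b"verack"))  # b'verack'
```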

The parse_ip function was fixed and replaced by the following code:

def parse_ip(ip):
    IPV4_COMPAT = b"\x00" * 10 + b"\xff" * 2

    # IPv4
    if ip[0:12] == IPV4_COMPAT:
        ip = read_hexa(ip[12:])  # drop the 10 "\x00" and 2 "\xff" prefix bytes, then convert the remaining 4 bytes to hex
        ip = "%i.%i.%i.%i" % (int(ip[0:2], 16), int(ip[2:4], 16), int(ip[4:6], 16), int(ip[6:8], 16))

    # IPv6
    else:
        # TODO
        pass

    return ip
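For the IPv6 branch that is still marked TODO, one possible direction is Python's standard ipaddress module, which accepts the 16 raw bytes directly. This is only a sketch of an alternative approach, not the code we actually use; parse_ip_sketch and the sample addresses are made up for illustration.

```python
import ipaddress

def parse_ip_sketch(raw):
    # raw is the 16-byte address field as received from the network
    addr = ipaddress.IPv6Address(raw)
    # ipv4_mapped is an IPv4Address for ::ffff:a.b.c.d addresses, else None
    mapped = addr.ipv4_mapped
    return str(mapped) if mapped is not None else str(addr)

# An IPv4 address wrapped in the 10 x "\x00" + 2 x "\xff" prefix
sample_v4 = b"\x00" * 10 + b"\xff" * 2 + bytes([10, 0, 0, 1])
print(parse_ip_sketch(sample_v4))  # 10.0.0.1

# A plain IPv6 address
sample_v6 = b"\x20\x01\x0d\xb8" + b"\x00" * 11 + b"\x01"
print(parse_ip_sketch(sample_v6))  # 2001:db8::1
```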

We’ve also added two more functions for encoding and decoding hexadecimals:

def to_hexa(v):
    return bytes.fromhex(v)

def read_hexa(v):
    return v.hex()
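A quick round trip shows how the two functions fit together. The sample value f9beb4d9 is the start string (magic bytes) that opens every message on the Bitcoin main network; the functions are repeated here so the snippet runs on its own under Python 3.5 or later.

```python
def to_hexa(v):
    return bytes.fromhex(v)

def read_hexa(v):
    return v.hex()

magic = to_hexa("f9beb4d9")  # hex string -> raw bytes
print(magic)                 # b'\xf9\xbe\xb4\xd9'
print(read_hexa(magic))      # f9beb4d9
```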

 
