Data types
When reading the Bitcoin developer reference, it becomes immediately clear that the Bitcoin protocol requires us to work with specific Bitcoin data types. You can’t just insert numbers as int and expect it to work. Each field of every packet that is sent or received by our node needs to be properly formatted.
Let’s have a look at the version message documentation in the Bitcoin developer reference.
We can see that the first field should contain the protocol version number (currently 70012). But we can’t just send the number as-is: the documentation specifically states that it should be a 4-byte int32_t. The same goes for any other piece of information that is either received or sent: it has to be formatted in the manner specified by the Bitcoin protocol documentation. Luckily, Python has the struct module, which allows us to easily pack values into these data types. In our example we want to pack the number 70012 into a 4-byte int (remember, 32 bits is 4 bytes) with the variable name version.
So using the struct module in our code should look like this:
import struct

version = struct.pack("i", 70012)
The i in the code represents a 4-byte integer. For a complete list of format characters and their meanings, have a look at the format characters table in the struct module documentation.
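One detail worth checking here: without a prefix, struct uses the machine's native byte order and alignment, while the Bitcoin protocol serializes its integers as little-endian. The following is a minimal sketch, not part of the project's code, showing how the `<` prefix pins the byte order regardless of the host. The variable names are made up for the illustration.

```python
import struct

# "<" forces little-endian with standard sizes, independent of the host CPU.
version_le = struct.pack("<i", 70012)
# ">" forces big-endian; shown only for contrast.
version_be = struct.pack(">i", 70012)

print(version_le.hex())       # 7c110100
print(version_be.hex())       # 0001117c
print(struct.calcsize("<i"))  # 4
```

On a typical x86 machine the plain "i" format gives the same bytes as "<i", which is why the unprefixed version works there.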
This is quite a simple process: look at the Bitcoin documentation to find out how each variable should be formatted, then head to the struct module documentation to find the corresponding character. But repeating this again and again for each and every variable would quickly get our code out of control and make errors a sure thing. So Alexis suggested that we predefine all of the required data types in one file. Now, instead of using the previous code for our version variable, we can just use the predefined function to_int32(v):
import struct

def to_int32(v):
    return struct.pack("i", v)

version = to_int32(70012)
We’ve also added read_int32, which allows us to easily get our variable back.
import struct

def to_int32(v):
    return struct.pack("i", v)

version = to_int32(70012)  # The number 70012 is now packed.
print(version)  # Unreadable

def read_int32(v):
    return struct.unpack("i", v)[0]

print(read_int32(version))  # The number 70012 is readable again
Most of the data types were easy to define, but the Bitcoin protocol has one special data type called compactSize_uint.
In this data type, every number higher than 252 gets a one-byte prefix (0xfd, 0xfe or 0xff) that indicates how many bytes the number occupies (2, 4 or 8). It is mostly used to announce the size of variable-length data.
import struct

def to_compactSize_uint(v):
    # Values up to 0xfc fit in a single byte; larger ones get a prefix byte.
    if v < 0xfd:
        return struct.pack("<B", v)
    elif v <= 0xffff:
        return b"\xfd" + struct.pack("<H", v)
    elif v <= 0xffffffff:
        return b"\xfe" + struct.pack("<I", v)
    else:
        return b"\xff" + struct.pack("<Q", v)

def read_compactSize_uint(s):
    # s is a stream of bytes
    # read_uint16/32/64 are assumed to be the unpack helpers from the same data types file
    # Read an unsigned char to get the format
    size = ord(s.read(1))
    # Return the value
    if size < 0xFD:
        return size
    if size == 0xFD:
        return read_uint16(s.read(2))
    if size == 0xFE:
        return read_uint32(s.read(4))
    if size == 0xFF:
        return read_uint64(s.read(8))
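To check the boundaries between the four encodings, a round trip over the edge values is handy. The sketch below is a self-contained Python 3 illustration and not the project's file: the names to_compact_size and read_compact_size are hypothetical, and the reader unpacks inline instead of calling the read_uint helpers.

```python
import io
import struct

def to_compact_size(v):
    """Encode a non-negative integer as a compactSize_uint."""
    if v < 0xfd:
        return struct.pack("<B", v)
    if v <= 0xffff:
        return b"\xfd" + struct.pack("<H", v)
    if v <= 0xffffffff:
        return b"\xfe" + struct.pack("<I", v)
    return b"\xff" + struct.pack("<Q", v)

def read_compact_size(stream):
    """Decode one compactSize_uint from a binary stream."""
    prefix = stream.read(1)[0]
    if prefix < 0xfd:
        return prefix
    # Map each prefix byte to the struct format and width of the value after it.
    fmt, width = {0xfd: ("<H", 2), 0xfe: ("<I", 4), 0xff: ("<Q", 8)}[prefix]
    return struct.unpack(fmt, stream.read(width))[0]

# Edge values on both sides of each size boundary.
for n in (0, 252, 253, 0xffff, 0x10000, 0xffffffff, 0x100000000):
    encoded = to_compact_size(n)
    assert read_compact_size(io.BytesIO(encoded)) == n
    print(n, encoded.hex())
```

The loop covers the value just below and just above each prefix threshold, which is where off-by-one mistakes in the comparisons tend to hide.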
The parse_ip bug
We’ve also tried to build a parse_ip function to properly display IP addresses. But unfortunately we came across a bug when using Windows. You can read more about our attempts to deal with the bug on our Trello board.
Edit (4-Jul-2016): Python 2.5 to 3.5 migration
Please read the general notes about the transition from Python 2.5 to 3.5 over here, and the complete GitHub change log for the migration over here.
Most of the data type functions have remained unchanged, with the following exceptions:
The functions that dealt with reading and writing characters were replaced by two functions: to_chars and read_chars.
def to_chars(v, length=-1):
    if length == -1:
        length = len(v)
    return struct.pack(">%ss" % length, v)

def read_chars(v, length=-1):
    if length == -1:
        length = len(v)
    return struct.unpack(">%ss" % length, v)[0]
These new functions accept a specific size (length). If no length is given, the size of the string is calculated automatically. This allows us to deal with strings of various sizes.
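A fixed length is useful for fields such as the 12-byte command name in the Bitcoin message header, which is padded with null bytes. Here is a small sketch of that case, repeating the two functions so it runs on its own. It assumes Python 3, where the value passed to the s format has to be bytes; the command variable is only an illustration.

```python
import struct

def to_chars(v, length=-1):
    if length == -1:
        length = len(v)
    return struct.pack(">%ss" % length, v)

def read_chars(v, length=-1):
    if length == -1:
        length = len(v)
    return struct.unpack(">%ss" % length, v)[0]

# struct pads with null bytes when the value is shorter than the field.
command = to_chars(b"version", 12)
print(command)  # b'version\x00\x00\x00\x00\x00'

# Reading returns the full field; strip the padding to recover the name.
print(read_chars(command).rstrip(b"\x00"))  # b'version'
```

Note that struct also truncates silently when the value is longer than the requested length, so a too-long string will not raise an error.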
The parse_ip function was fixed and replaced by the following code:
def parse_ip(ip):
    IPV4_COMPAT = b"\x00" * 10 + b"\xff" * 2
    # IPv4
    if ip[0:12] == IPV4_COMPAT:
        # Remove the first 10 "\x00" and 2 "\xff" bytes, and convert the remaining bytes to hexa
        ip = read_hexa(ip[12:])
        ip = "%i.%i.%i.%i" % (int(ip[0:2], 16), int(ip[2:4], 16), int(ip[4:6], 16), int(ip[6:8], 16))
    # IPv6
    else:
        # TODO
        pass
    return ip
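For the IPv6 branch that is still marked TODO, one possible direction is the ipaddress module from the standard library (Python 3.3 and later). The sketch below is only a suggestion under that assumption, not the project's implementation; format_ip and the sample addresses are hypothetical.

```python
import ipaddress

def format_ip(raw):
    """Format a 16-byte address field as text (hypothetical helper)."""
    addr = ipaddress.IPv6Address(raw)  # accepts exactly 16 packed bytes
    mapped = addr.ipv4_mapped          # None unless the address is ::ffff:a.b.c.d
    return str(mapped) if mapped is not None else str(addr)

# An IPv4 address wrapped in the 10 x "\x00" + 2 x "\xff" prefix.
ipv4_field = b"\x00" * 10 + b"\xff" * 2 + bytes([10, 0, 0, 1])
# A plain IPv6 address from the documentation range 2001:db8::/32.
ipv6_field = bytes.fromhex("20010db8") + b"\x00" * 11 + b"\x01"

print(format_ip(ipv4_field))  # 10.0.0.1
print(format_ip(ipv6_field))  # 2001:db8::1
```

This handles both cases with one code path, at the cost of depending on a module that did not exist in Python 2.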
We’ve also added two more functions for encoding and decoding hexadecimals:
def to_hexa(v):
    return bytes.fromhex(v)

def read_hexa(v):
    return v.hex()
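As a quick usage example, here is a round trip with the four magic bytes that start every mainnet message. The functions are repeated so the snippet runs on its own; the magic variable is just for illustration, and bytes.hex() requires Python 3.5 or later.

```python
def to_hexa(v):
    return bytes.fromhex(v)

def read_hexa(v):
    return v.hex()

magic = to_hexa("f9beb4d9")
print(magic)             # b'\xf9\xbe\xb4\xd9'
print(read_hexa(magic))  # f9beb4d9
```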