Latinendian vs Arabendian

16/02/2020

Let's say we have a string like cat which we want to write to a file. How do we actually write this down in terms of binary data?

Well, assuming we have an ASCII string, each character is represented by a single byte - an integer between 0 and 255 - which can be converted to binary and written to the file one by one.

                               c        a        t
                              99       97      116
                        01100011 01100001 01110100
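
If you want to check this yourself, here's a minimal Python sketch of the idea (the file name is just for illustration):

    # Encode the string as ASCII and look at the raw bytes we'd write out.
    data = "cat".encode("ascii")

    print(list(data))                          # [99, 97, 116]
    print(" ".join(f"{b:08b}" for b in data))  # 01100011 01100001 01110100

    # Writing them to a file is then just:
    with open("cat.bin", "wb") as f:
        f.write(data)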

So far so good, but what if instead we have a larger number, say 134480385? This number doesn't fit into a single byte. One thing we could do is split it into multiple parts, each one byte long.

                                              134480385
                    00001000 00000100 00000010 00000001
                      part 4   part 3   part 2   part 1
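
In Python, one way to sketch that split is to mask off one byte at a time with shifts (the names here are just for illustration):

    n = 134480385

    # Take one byte at a time, starting from the least significant end.
    parts = [(n >> (8 * i)) & 0xFF for i in range(4)]

    print(parts)  # [1, 2, 4, 8] -> part 1, part 2, part 3, part 4
    print(" ".join(f"{p:08b}" for p in reversed(parts)))
    # 00001000 00000100 00000010 00000001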

Then we can write it down part by part:

               c        a        t                           134480385
              99       97      116
        01100011 01100001 01110100 00000001 00000010 00000100 00001000  
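
In Python terms, this "part 1 first" order is what int.to_bytes gives you when you ask for 'little'; a rough sketch of the write above:

    n = 134480385

    # "cat" followed by the number, written least significant byte first.
    payload = "cat".encode("ascii") + n.to_bytes(4, "little")

    print(" ".join(f"{b:08b}" for b in payload))
    # 01100011 01100001 01110100 00000001 00000010 00000100 00001000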

But wait a second, what happened? It seems that after we wrote it down, the order of the bytes was reversed. The problem is that we wrote the number starting with part 1, on the right (the so-called "least significant byte"), and moved to the left, writing each part in turn and reversing the order. But now we can't read it like a normal binary number any more because the bytes are all out of order...
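
You can see the damage with a quick Python check: treating the four bytes we just wrote as one long binary number and reading it left to right gives a completely different value (the variable names here are made up):

    # The byte stream as it appears in the file, left to right.
    as_written = "00000001 00000010 00000100 00001000".replace(" ", "")
    # The order we'd use if we wrote the number down as one binary numeral.
    as_numeral = "00001000 00000100 00000010 00000001".replace(" ", "")

    print(int(as_written, 2))  # 16909320  - not the number we meant
    print(int(as_numeral, 2))  # 134480385 - the value we actually wanted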

Well, we could reverse the individual bits in the number too and tell people to read it all from left to right - then it would read correctly as a binary number, in some sense, but we'd have to reverse the bits every time we read or wrote a number. Also, it would be hard for programmers or anyone else reading the data to know that the bits are in the opposite order to what they expected. That seems like a good recipe for confusion.

Alternatively, we could write the parts in the opposite order, starting with part 4 - the most significant byte. In this case the number would read correctly and the order of bits would be as expected, but we would need to know ahead of time how large a number was before reading or writing it. In addition, converting between different sizes of integers such as 8-bit, 16-bit, and 32-bit would require shifting bytes around and padding with zeros to put all the bits in the right place, rather than just adjusting how many bytes we read or write.
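
To make that width point concrete, here's a small Python sketch with an arbitrary 16-bit value: written least significant byte first, the 16-bit and 32-bit encodings start with the same bytes, so changing width just means reading or writing extra zero bytes at the end; written most significant byte first, the same value ends up at different offsets:

    n = 516  # 0x0204 - fits in 16 bits

    # Least significant byte first: the two encodings share a common prefix.
    print(n.to_bytes(2, "little"))  # b'\x04\x02'
    print(n.to_bytes(4, "little"))  # b'\x04\x02\x00\x00'

    # Most significant byte first: the value's bytes move when the width changes.
    print(n.to_bytes(2, "big"))     # b'\x02\x04'
    print(n.to_bytes(4, "big"))     # b'\x00\x00\x02\x04'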

This debate, over which order to read or write the bytes in, is the so-called big-endian vs little-endian debate you may have heard of. Here, big-endian means starting with part 4 and little-endian means starting with part 1. And, if you're curious, your machine is most likely little-endian, while the data you send over the network, for example, is probably big-endian.

                        big-endian                134480385
                        00001000 00000100 00000010 00000001

                        little-endian             134480385
                        00000001 00000010 00000100 00001000
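
If you want to see which side your machine is on, Python's standard library will tell you, and the struct module lets you pick either byte order explicitly (the "!" format means network order, i.e. big-endian):

    import struct
    import sys

    n = 134480385

    print(sys.byteorder)         # most likely 'little'

    print(struct.pack("<I", n))  # little-endian: b'\x01\x02\x04\x08'
    print(struct.pack(">I", n))  # big-endian:    b'\x08\x04\x02\x01'
    print(struct.pack("!I", n))  # network order (big-endian), same as ">I"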

But before you pick a side, perhaps it's worth finding out where this confusion comes from in the first place. Shouldn't writing down numbers be a relatively simple and solved problem? Well - interestingly, it has nothing to do with binary, bytes, or even computers - the problem comes from the fact that while our letters are Latin, our numerals are Arabic.

More specifically, while we inherited our text from Latin, and so read characters and words from left to right, early European translators of the Arabic books from which we get our numerals (and where the text goes from right to left) reversed the ordering of words but kept the ordering of the symbols in numerals the same. While a European and someone who reads Arabic both write the number 521 with the same order of numerals, a European reads it as "five hundred and twenty-one" - while (in formal Arabic at least) the same number is read as "one, twenty, and five hundred"[0][1].

Binary numbers are no different - just as decimal numbers are written with the smallest unit on the right and the largest on the left, so are binary numbers. When we try to read or write them we run into the same ordering confusion: mindlessly writing down what you encounter from left to right doesn't work if you want to write the least significant thing first.

If we reversed the order in which we wrote numerals (or letters, for that matter) there wouldn't be any more endian wars - everything would be consistent - we'd always write the least significant thing first, and then the only war left to fight would be over the choice of left-to-right or right-to-left for everything - Latinendian vs Arabendian?