As we all know, a computer only knows two things: 1s and 0s. Every letter in this sentence, every color, every second of a video or of a piece of music, every web page, every program is nothing other than a long string of 1s and 0s. This is binary, and if we hope to communicate efficiently with these machines as programmers, we must understand how this base 2 numbering system works.
Why do Computers Use the Binary System?
We humans have 10 fingers (or digits). It is perfectly logical for us to count in base 10, that is, with a decimal system. It’s totally arbitrary: had we been octopuses, with eight legs and no fingers, a base 8 (or octal) would seem natural to us. If we were dolphins, with only two flippers, we would never consider anything other than base 2, or binary.
A computer is not a dolphin and has no arms or fingers for that matter. What it does have is electrical current. It can distinguish between a powered electronic component (on) and an inactive one (off). We represent these two states with the symbols 1 and 0 respectively.
Understanding Decimal Positional Notation
To understand a computer’s binary system, we must first break down our human decimal system.
In base 10, we have a unique symbol for each digit from 0 to 9.
Number Decimal
Zero 0
One 1
Two 2
Three 3
Four 4
Five 5
Six 6
Seven 7
Eight 8
Nine 9
Ten No symbol!
Eleven No symbol!
Code language: plaintext (plaintext)
We don’t have any unique symbols for the numbers that follow like ten, eleven, twelve, etc. We’d have to spend our entire lives at school if there were as many symbols as there are numbers! Thankfully, we’ve devised a clever system that allows us to reuse these ten base digits over and over, to express larger numbers: positional notation.
With this notation, when we get to the number ten, we only need to add a digit to the left to indicate tens, and then when we get to a hundred, we add another one to the left to denote hundreds, etc. Each digit to the right of the unit represents a power of ten.
Let’s take the number 54,627, for example. In this number there are 7 units, 2 tens, 6 hundreds, 4 thousands and 5 tens of thousands. We could write this number this way:
(5 x 10 000) + (4 x 1 000) + (6 x 100) + (2 x 10) + (7 x 1)
Code language: plaintext (plaintext)
To summarize this in a convenient table:
5 | 4 | 6 | 2 | 7 |
x 10,000 | x 1,000 | x 100 | x 10 | x 1 |
10^{4} | 10^{3} | 10^{2} | 10^{1} | 10^{0} |
Tens of thousands | Thousands | Hundreds | Tens | Units |
Counting in Binary
First and foremost, let’s note that each binary digit is called a bit. A bit can have one of two values: 0 or 1.
Just like the decimal system, binary also uses positional notation, only there are fewer symbols to work with. We can count 0, then 1, and we’ve already run out of symbols. So, we reuse these symbols and add a bit to the left: 2 in decimal is written 10 in binary, 3 is written 11, 4 is 100, etc. In base 2, each bit to the left of the unit represents a power of 2.
Having so few symbols means that binary numbers become pretty long pretty quick:
Decimal Binary
0 0
1 1
2 10 <-- 2 power of 1
3 11
4 100 <-- 2 power of 2
5 101
6 110
7 111
8 1000 <-- 2 power of 3
9 1001
10 1010
11 1011
12 1100
13 1101
14 1110
15 1111
16 10000 <-- 2 power of 4
17 10001
18 10010
19 10011
20 10100
... ...
100 1100100
... ...
1000 1111101000
... ...
9000 10001100101000
Code language: plaintext (plaintext)
Take the binary number 101010. We can deduce its decimal equivalent this way:
1 | 0 | 1 | 0 | 1 | 0 |
x 32 | x 16 | x 8 | x 4 | x 2 | x 1 |
2^{5} | 2^{4} | 2^{3} | 2^{2} | 2^{1} | 2^{0} |
Thirty-twos | Sixteens | Eights | Fours | Twos | Units |
2^1 + 2^3 + 2^5
= 2 + (2 x 2 x 2) + (2 x 2 x 2 x 2 x 2)
= 2 + 8 + 32
= 42
Code language: plaintext (plaintext)
So 101010 in binary is worth 42 in decimal.
Negative Numbers in Binary
In our decimal system, all we need to do to indicate that a number is negative is add a sign (-) in front. It would make sense to do the same in binary, and for a human usage, this would be perfect. The problem is that computers really don’t understand anything at all besides 0s and 1s. It’s impossible to add another symbol to mark the sign of a number. It’s either on or off, that’s it.
Yet computers do know about negative numbers, and thank goodness for that! So, how do they do it, without a sign of some sort? To understand this, we need to first take a look at how a computer stores binary numbers.
How Does a Computer Store a Binary Number?
A binary number, just like a decimal number, can be written with as many 0s as we’d like without altering its value.
Binary Decimal
111 = 7
0111 = 7
00111 = 7
Code language: plaintext (plaintext)
As a matter of fact, computers store numbers over several bits, regardless of the number’s actual value. Typically, an integer is stored within 4 bytes, which is 32 bits. This is slightly overkill, for our little number 7:
0000 0000 0000 0000 0000 0000 0000 0111 = 7
Code language: plaintext (plaintext)
Note: the spaces are only there for clarity, in reality, there are no spaces between bits.
It is no coincidence that the largest number that can be stored in an unsigned integer is 4,294,967,295, which is written this way over 32 bits:
1111 1111 1111 1111 1111 1111 1111 1111
Code language: plaintext (plaintext)
Segfault after segfault, we may have come to learn that the maximum number a signed integer can hold is 2,147,483,647, or, in binary:
0111 1111 1111 1111 1111 1111 1111 1111
Code language: plaintext (plaintext)
Let’s note that here, the leftmost bit doesn’t seem to be used…
So let’s see what the minimum number a signed integer can hold, -2,147,483,648, looks like in binary:
1000 0000 0000 0000 0000 0000 0000 0000
Code language: plaintext (plaintext)
Surprise! All the bits are inverted, including the leftmost bit, which takes on the function of a sign. This is how computers turn a positive number into a negative one, by inverting its bits.
As an aside, let’s highlight the fact that, to a computer, there is no intrinsic difference between positive and negative numbers.
1111 1111 1111 1111 1111 1111 1111 1010
Code language: plaintext (plaintext)
This number could be either +4,294,967,290 or -6. This is why, in most low-level programming languages like C for instance, we must specify whether our variables are of type int
or unsigned int
. It is solely a question of interpretation.
Why Invert the Bits?
Doesn’t inverting all the bits of a number seem like a lot of work? Why not simply indicate a negative binary number by changing the leftmost bit and be done with it?
The first problem we run into with that approach is that we’d have two representations of the number 0:
0000 = 0
1000 = 0
Code language: plaintext (plaintext)
But worst of all, to use this method, we’d have to rewire the entire binary addition algorithm (described below). If one of the numbers in our addition is negative, we get a totally wrong result:
Unsigned Decimal | Binary Addition | Signed Decimal |
3 |
| 3 |
+ 132 | + | + (-4) |
= 135 | = | = -7 (Wrong!) |
If the number is interpreted as an unsigned integer, the addition works fine. However, if the number is a signed integer, the addition is completely wrong.
This double representation of 0 as well as these arithmetic problems led to the conception of a better system: inverting every bit.
One’s Complement
One’s complement is the value resulting from the operation to invert all the bits in a number.
0111 = 7
1000 = -7 (one's complement)
Code language: plaintext (plaintext)
This representation of a negative number works perfectly well with binary mathematical operations:
Unsigned Decimal | Binary Addition | Signed Decimal |
3 |
| 3 |
+ 251 | + | + (-4) |
= 254 | = | = -1 |
But this alone does not fix the double zero problem:
0000 = 0
1111 = 0
Code language: plaintext (plaintext)
Having two possible values to represent a zero means having to create two distinct tests to measure the null value of a result, which is not very convenient. To counter this shortcoming, a new method was devised, one which almost every modern computer uses: two’s complement.
Two’s Complement
To get a two’s complement, we only need add a simple step after we invert the bits as for the one’s complement: add 1.
- Take a positive number
- Invert the bits
- Add 1
0111 = 7
1000 -> invert the bits (one's complement)
+ 0001 -> add 1 (two's complement)
1001 = -7
Code language: plaintext (plaintext)
There it is, the answer to both our problems. Zero is now a unique and distinct number. Even if we try to convert 0 into its negative counterpart, the result will always be 0:
0000 = 0
1111 -> invert the bits (one's complement)
+ 0001 -> add 1 (two's complement)
0000 = 0
Code language: plaintext (plaintext)
We can also check that the usual binary arithmetic is error-free:
Unsigned Decimal | Binary Addition | Signed Decimal |
3 |
| 3 |
+ 252 | + | + (-4) |
= 255 | = | = -1 |
Calculating in Binary Like a Computer
We will never be able to calculate as fast as a computer, but we can try! Or, at the very least, we can understand the peculiar methods computers use to add, subtract, multiply and divide in binary.
Adding
The binary addition table is very simple:
+ | 0 | 1 |
0 | 0 | 1 |
1 | 1 | 10 |
Let’s go back to 2nd grade but this time, let’s take the binary numbers 101010 (42 decimal) and 1111 (15 decimal). As with any decimal number, we can add them together from right to left and carrying any digits over to the next position if need be. In the following example, we will add two 0s before the number 1111 to make it easier to read, which doesn’t affect its value any more than writing 0015 in decimal.
111 <-- carries
101010 <-- 42
+ 001111 <-- 15
--------
111001 <-- 57
Code language: plaintext (plaintext)
What we’ve done here can be explained another way:
From right to left:
0 + 1 = 1
1 + 1 = 0, carry the 1
0 + 1 + 1 = 0, carry the 1
1 + 1 + 1 = 1, carry the 1
0 + 0 + 1 = 1
1 + 0 = 1
For the final result: 111001
Code language: plaintext (plaintext)
Subtracting
Whether in decimal or binary, subtracting is a more complex operation than addition.
We might have vague memories of the borrowing method, which consists of stealing one digit from the left to subtract a bigger digit from a smaller one. The bridging technique might ring a bell, or the compensation method. We humans can use any of these techniques with our fingers and pencils and paper. Electronic components don’t find any of these very convenient.
A computer just doesn’t do subtraction.
Instead of subtracting, it’s much more efficient to simply add negative numbers. So instead of doing 42 – 3 for instance, a computer would rather do 42 + (-3). It’s a subtle distinction, but a very important one to a computer that can add at lightning speed and doesn’t want to have superfluous circuitry for other operations.
To perform the operation 42 – 3 as a computer would, we first have to turn the positive number, 3, into a negative one. As we’ve seen previously, a computer uses the two’s complement method.
0000011 = 3
1111100 -> invert the bits (one's complement)
+ 0000001 -> add 1 (two's complement)
1111101 = -3
Code language: plaintext (plaintext)
Then, we can add 42 and -3:
1111 -> carries
0101010 = 42
+ 1111101 = -3
(1)0100111 = 39 (ignoring the last carry on the left which is an overflow)
Code language: plaintext (plaintext)
Multiplying
The binary multiplication table is predictably much easier to memorize than the decimal one:
x | 0 | 1 |
0 | 0 | 0 |
1 | 0 | 1 |
Despite this simplicity, a computer would still much rather ignore it and add instead. For a computer, 42 x 3 is simply 42 + 42 + 42. What could be simpler?
1 1 1 -> carry
00101010 -> 42
+ 00101010 -> + 42
+ 00101010 -> + 42
---------
01111110 -> = 126
Dividing
Once again, the computer refuses to do divisions the human way and stubbornly sticks to addition.
For a computer, the operation 42/7 means subtracting 7 from 42 (well, adding -7 to 42) until the operation results in 0 or a smaller number than 7 and counting how many times it performed the operation before having to stop.
42 + (-7) + (-7) + (-7) + (-7) + (-7) + (-7) = 0
1 + 1 + 1 + 1 + 1 + 1 = 6
Code language: plaintext (plaintext)
Or, for a number that’s not divisible by 7:
41 + (-7) + (-7) + (-7) + (-7) + (-7) = 6
1 + 1 + 1 + 1 + 1 = 5
Code language: plaintext (plaintext)
As we can see, a division is clearly more complicated for a computer than any other arithmetic operation. This process is indeed a significantly slower one, and we must try to avoid it by using multiplication whenever possible if we wish to optimize our programs for speed.
Conclusion
Because of its electronic nature, a computer only knows binary and can only add. However simple its operation at the smallest scale, a computer can do wonders: edit text, display an image, simulate entire virtual worlds. All it needs is millions of tiny electronic components to do billions of operations every second.
It’s a little like neurons in living organisms: there is a bio-electric signal or there is not. Enough interconnected neurons can give rise to emotion, imagination, calculation, logic. And maybe even a brain that can create machines that can count and calculate in binary…
Sources and Further Reading
- Charles Petzold, 1999, Code: The Hidden Language of Computer Hardware and Software
- Greg Perry, 2002, Absolute Beginner’s Guide to Programming, 3rd Edition, Binary Arithmetic [InformIT]
- Wikipedia, Postional Notation [Wikipedia]
- Wikipedia, Binary Number [Wikipedia]
- The Organic Chemistry Tutor, Binary Addition and Subtraction With Negative Numbers, 2’s Complements & Signed Magnitude [YouTube]
- In One Lesson, How a CPU Works [YouTube]