# Getting rid of binary strings, part 1

Every programmer is used to manipulating data at the byte level, but most rarely, if ever, have to directly manipulate bits. When such a situation arises, the “obvious” way to do it (at least for beginners) is to use binary strings, that is, strings of ones and zeroes. The manipulations are done with the usual string concatenation, substring extraction and other basic string tools, and the string is finally parsed 8 characters at a time into bytes (using the language’s equivalent of parseInt()). In many languages, including Java and JavaScript, strings are immutable, so every concatenation copies the whole string, every time.

Any half-decent programmer would feel “dirty” writing such code, but my feeling is that most would do it anyway, justifying themselves with “it’s fast enough”, “it’s more readable” or “I’ll optimize it later if it’s too slow, but that’ll do for now”. Until relatively recently, I was one of them; hopefully you will know better than to use binary strings after reading this. I’ll be using JavaScript, but this article is language-agnostic.

Partial bytes are often used in compression and stream algorithms.

## Appending bits

By far the most common operation performed on binary strings is appending more binary data, for example:

```javascript
var arr = [],
    bin = "";

bin += "101110";
bin += "0101";
bin += "0000";
bin += "1100110011";

//Pad the last byte if needed
if (bin.length % 8 !== 0){
    bin += Array(8 - bin.length % 8 + 1).join("0");
}

//Parse the string
for (var i = 0, len = Math.ceil(bin.length / 8); i < len; i++){
    arr.push(parseInt(bin.substring(i * 8, (i + 1) * 8), 2));
}

console.log(arr); //Prints [185, 67, 51]
```

Let’s replicate that example using bit manipulations. I’ll need some kind of buffer to hold incomplete bytes. A 64-bit long would be ideal in most languages, but I’ll be using JavaScript’s 32-bit integers.
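A side note on those “32-bit integers”: JavaScript numbers are all doubles, but the bitwise operators coerce their operands to signed 32-bit integers before operating, which is why the buffer behaves like an int. A quick demonstration:

```javascript
//JavaScript numbers are doubles, but the bitwise operators coerce
//their operands to signed 32-bit integers first
console.log(0xFFFFFFFF | 0);  //-1: the 32nd bit becomes the sign bit
console.log(1 << 31);         //-2147483648
//>>> is the exception: it reinterprets the result as unsigned
console.log((1 << 31) >>> 0); //2147483648
```

This is also why the article uses `>>>` rather than `>>` when emptying the buffer: an unsigned shift never drags the sign bit in.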

```javascript
var arr = [],   //List to store the full bytes
    buffer = 0; //Buffer to store incomplete bytes

//Make space for the new bits
buffer = buffer << 6;
buffer = buffer | 46; //101110 = 46

//0101 = 5
buffer = (buffer << 4) | 5;
//There's now 10 bits in the buffer
//Put the 8 left-most bits in the list
arr.push(buffer >>> 2);
//Only keep the last 2 bits in the buffer
//This builds a mask to drop everything except the last 2 bits
//    1011100101
//    BINARY AND
//    0000000011
// => 0000000001
buffer &= (1 << 2) - 1;

//0000 = 0
buffer = (buffer << 4) | 0;

//1100110011 = 819
buffer = (buffer << 10) | 819;
//There's now 16 bits in the buffer
arr.push(buffer >>> 8);
//Remove them
buffer &= (1 << 8) - 1;
arr.push(buffer);

console.log(arr);
//Prints [185, 67, 51]
```

Here’s how the buffer int and the arr list look throughout the above example:

```
[]
00000000 00000000 00000000 00000000

[]
00000000 00000000 00000000 00101110

[]
00000000 00000000 00000010 11100101

[10111001]
00000000 00000000 00000000 00000001

[10111001]
00000000 00000000 00000000 00010000

[10111001]
00000000 00000000 01000011 00110011

[10111001, 01000011, 00110011]
=> [185, 67, 51]
```
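As a quick sanity check, you can print the final bytes back in binary. A throwaway snippet (the zero-padding is done by hand since String.prototype.padStart is relatively recent):

```javascript
var arr = [185, 67, 51]; //Result from the example above

var bits = arr.map(function (b){
    //Convert each byte to binary and left-pad to 8 characters
    var s = b.toString(2);
    while (s.length < 8){
        s = "0" + s;
    }
    return s;
});

console.log(bits.join(" ")); //10111001 01000011 00110011
```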

As soon as the buffer contains more than 7 bits, I empty as many full bytes as possible from it, starting from the left. An obvious limitation of this method is that I can’t append more than 25 bits at a time (32 - 7, or 57 with 64-bit integers). However, that is more than enough for the vast majority of applications, and the function can simply be called multiple times anyway.

Let’s abstract that algorithm into a function.

```javascript
var arr = [],
    buffer = 0,
    nbBits = 0;

function appendBits(val, nb){
    buffer = (buffer << nb) | val;
    nbBits += nb;
    while (nbBits >= 8){
        arr.push(buffer >>> (nbBits - 8));
        buffer &= (1 << (nbBits - 8)) - 1;
        nbBits -= 8;
    }
}

appendBits(46, 6);   //101110
appendBits(5, 4);    //0101
appendBits(0, 4);    //0000
appendBits(819, 10); //1100110011

console.log(arr); //[185, 67, 51]
```
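Two details are worth noting: unlike the string version, appendBits never pads the final partial byte, and it is still limited to 25 bits per call. Both are easy to paper over with small helpers. A sketch (flushBits and appendWide are my names, not part of the original code):

```javascript
var arr = [],
    buffer = 0,
    nbBits = 0;

function appendBits(val, nb){
    buffer = (buffer << nb) | val;
    nbBits += nb;
    while (nbBits >= 8){
        arr.push(buffer >>> (nbBits - 8));
        buffer &= (1 << (nbBits - 8)) - 1;
        nbBits -= 8;
    }
}

//Right-pad the leftover bits with zeroes and emit them as a final byte,
//like the string version did
function flushBits(){
    if (nbBits > 0){
        arr.push(buffer << (8 - nbBits));
        buffer = 0;
        nbBits = 0;
    }
}

//Append values wider than 25 bits by feeding them in 16-bit chunks;
//plain division avoids the 32-bit truncation of the bitwise operators
function appendWide(val, nb){
    while (nb > 16){
        nb -= 16;
        appendBits(Math.floor(val / Math.pow(2, nb)) & 0xFFFF, 16);
        val = val % Math.pow(2, nb);
    }
    appendBits(val, nb);
}

appendWide(819, 10); //1100110011: same as appendBits for small values
flushBits();         //11 left over, padded to 11000000
console.log(arr);    //[204, 192]
```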

How much faster is it?

```javascript
var arr = [],
    bin = "";

for (var i = 0; i < 1000000; i++){
    bin += "101110";
    bin += "0101";
    bin += "0000";
    bin += "1100110011";
}
for (var i = 0, len = Math.ceil(bin.length / 8); i < len; i++){
    arr.push(parseInt(bin.substring(i * 8, (i + 1) * 8), 2));
}
```

VS

```javascript
for (var i = 0; i < 1000000; i++){
    appendBits(46, 6);   //101110
    appendBits(5, 4);    //0101
    appendBits(0, 4);    //0000
    appendBits(819, 10); //1100110011
}
```

On my i7-3770K with Node v0.10.4, binary strings take 600 ms to concatenate and parse 4 million short binary strings (for a total of 3,000,000 bytes). The same test takes 135 ms with the function. Binary strings run out of memory if I try 40 million concatenations; the function simply takes 10 times longer, as expected.
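If you want to reproduce the comparison, a minimal harness looks something like this (exact numbers will of course vary by machine and engine):

```javascript
var arr = [],
    buffer = 0,
    nbBits = 0;

function appendBits(val, nb){
    buffer = (buffer << nb) | val;
    nbBits += nb;
    while (nbBits >= 8){
        arr.push(buffer >>> (nbBits - 8));
        buffer &= (1 << (nbBits - 8)) - 1;
        nbBits -= 8;
    }
}

console.time("appendBits");
for (var i = 0; i < 1000000; i++){
    appendBits(46, 6);
    appendBits(5, 4);
    appendBits(0, 4);
    appendBits(819, 10);
}
console.timeEnd("appendBits");

console.log(arr.length); //3000000 bytes, 24 bits per iteration
```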

In part 2, I’ll be prepending bits; it’s quite a bit (pun intended) trickier!

This entry was posted in Algorithms, Binary, JavaScript.