Getting rid of binary strings, part 2

Click here for part 1

Prepending bits
Another common operation is to add bits to the beginning of some data.

var bin = "10011101011110100"; //80628
bin = "0110011" + bin; //51
console.log(bin); //Prints 011001110011101011110100
console.log(parseInt(bin, 2)); //Prints 6765300

It’s trivial to do with bit manipulations, as long as everything fits into the buffer.

var buffer = 80628,
    nbBits = 17;
buffer |= 51 << nbBits;
nbBits += 7; //I'm adding 7 relevant bits to the buffer
console.log(buffer); //6765300

//Explanation
// 51 << 17 = 11001100000000000000000 (51 in binary followed by 17 zeroes)
//Thus
//          10011101011110100
//            BINARY OR
//    11001100000000000000000
// => 11001110011101011110100 (6765300)
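
Wrapped up as a function, it looks like the sketch below (prependToBuffer is my name for it; it uses the buffer/nbBits variables above). One caveat worth spelling out: JavaScript’s bitwise operators work on 32-bit signed integers, so this only works while everything fits in 31 bits.

function prependToBuffer(val, nb){
    //Only valid while nbBits + nb stays at or under 31, since
    //JavaScript's bitwise operators work on 32-bit signed integers
    buffer |= val << nbBits;
    nbBits += nb;
}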


But what if it doesn’t all fit into the buffer?
Then it gets more complicated than appending, but still faster than just using binary strings (I’m guessing, I haven’t benchmarked it yet!).
Let’s suppose that I already have 45 bits of data and want to prepend 2 bits and then 9 bits, to form 7 full bytes. Note that 45 bits isn’t a multiple of 8; it’s 5 full bytes followed by 5 bits in a buffer.

var arr = [11, 225, 76, 0, 98], //40 bits here
    headBuffer = 0,
    tailBuffer = 9, //and another 5 here: 01001
    nbBitsHB = 0,
    nbBitsTB = 5;
function prependBits(val, nb){
    headBuffer |= (val << nbBitsHB);
    nbBitsHB += nb;
    //Drain the tail buffer, if possible
    while (nbBitsHB >= 8){
        //Add the 8 right-most bits to the beginning of the list
        arr.unshift(headBuffer & 255);
        headBuffer >>>= 8;
        nbBitsHB -= 8;
    }
}

prependBits(2, 2);
prependBits(255, 9);

console.log(arr); //Prints [254, 11, 225, 76, 0, 98]
console.log(nbBitsHB); //Prints 3
console.log(nbBitsTB); //Prints 5

This part is easy and works exactly like appending bits, except that instead of taking the 8 left-most bits of the buffer and adding them to the end of the list, I take the 8 right-most bits and add them to the beginning of the list. Also note that I’m using 2 buffers: one for the head of the list and one for the tail. I added 11 bits to the “bitstream”, which means that 8 of them were taken and prepended to the list. The other 3 are still in the head buffer, and the tail buffer still has 5 bits in it.
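
For reference, here’s what the appending counterpart from part 1 might look like (a sketch, assuming it drains into the same tailBuffer/nbBitsTB variables):

function appendBits(val, nb){
    tailBuffer = (tailBuffer << nb) | val;
    nbBitsTB += nb;
    //Drain the tail buffer, if possible
    while (nbBitsTB >= 8){
        //Take the 8 left-most bits and add them to the end of the list
        arr.push((tailBuffer >>> (nbBitsTB - 8)) & 255);
        nbBitsTB -= 8;
        tailBuffer &= (1 << nbBitsTB) - 1;
    }
}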

The next logical step is to create a flushHeadBuffer function to shift the entire bitstream 3 bits to the right and end up with 7 full bytes and empty buffers.

function flushHeadBuffer(){
    var buffer = 0; //Used to hold the bits that will become the new headBuffer
    //Shift each byte with the "dangling" bits of the previous byte
    for (var i=0,len=arr.length;i<len;i++){
        buffer = arr[i] & ((1 << nbBitsHB) -1);
        arr[i] = (headBuffer << (8 - nbBitsHB)) | (arr[i] >>> nbBitsHB);
        headBuffer = buffer;
    }
    //After shifting the whole array, I need to merge the "dangling" bits
    //into the tailBuffer and empty it as much as possible
    tailBuffer |= headBuffer << nbBitsTB;
    nbBitsTB += nbBitsHB;
    headBuffer = 0;
    nbBitsHB = 0;
    if (nbBitsTB >= 8){
        arr.push(tailBuffer >>> (nbBitsTB - 8));
        tailBuffer &= (1 << (nbBitsTB - 8)) - 1;
        nbBitsTB -= 8;
    }
}
flushHeadBuffer();
console.log(arr); //Prints [127, 193, 124, 41, 128, 12, 73]
console.log(nbBitsHB); //Prints 0
console.log(headBuffer); //Prints 0
console.log(nbBitsTB); //Prints 0
console.log(tailBuffer); //Prints 0
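
As a sanity check, you can rebuild the binary string from those bytes (a quick sketch):

console.log(arr.map(function(b){
    return ("0000000" + b.toString(2)).slice(-8);
}).join(""));
//Prints 01111111110000010111110000101001100000000000110001001001

The first 11 bits are the prepended 011111111 and 10, followed by the original 45 bits.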

Let’s compare!
The flushHeadBuffer() function runs in O(n), which isn’t ideal, but it only needs to be called once, at the end before reading the binary data, so the binary manipulations should be way faster than copying strings thousands of times, right?

var arr = [],
    bin = "000010111110000101001100000000000110001001001";
for (var i=0;i<35000;i++){
    bin = "10" + bin;
    bin = "011111111" + bin;
    bin = "0101000111" + bin;
}
if (bin.length % 8 !== 0){
    bin += Array(8 - bin.length%8 +1).join("0");
}
for (var i=0, len=Math.ceil(bin.length/8);i<len;i++){
    arr.push(parseInt(bin.substring(i*8, (i+1)*8), 2));
}

VS

var arr = [11, 225, 76, 0, 98], //40 bits here
    headBuffer = 0,
    tailBuffer = 9, //and another 5 here: 01001
    nbBitsHB = 0,
    nbBitsTB = 5;
for (var i=0;i<35000;i++){
    prependBits(2, 2); //10
    prependBits(255, 9); //011111111
    prependBits(327, 10); //0101000111
}
flushHeadBuffer();
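
Each version can be timed with something like console.time (a minimal sketch; the label is arbitrary):

console.time("prepend");
//...one of the two versions above...
console.timeEnd("prepend"); //Prints something like "prepend: 155ms"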

155 ms for the strings… and 3.6 SECONDS for the binary manipulations. My first reaction was to comment out the flushHeadBuffer() call, but it had no significant impact. I found the culprit pretty fast: it’s the unshift() function. push() appends an element to a list and unshift() prepends. The catch is that JavaScript’s “arrays” are backed by contiguous storage* and optimized for appending, so unshift() has to move every single element, every time, and as the list grows it takes more and more time. It takes over 32 seconds with 50,000 iterations!

This is the culprit line:

arr.unshift(headBuffer & 255);

Let’s try list concatenations.
I replaced it with:

arr = [headBuffer & 255].concat(arr);

4.0 seconds.

What about splicing then?

arr.splice(0, 0, headBuffer & 255);

3.6 seconds, unsurprisingly.
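
You can see the asymmetry directly with a quick micro-benchmark (a sketch; the count is arbitrary):

var a = [];
console.time("unshift");
for (var i=0;i<100000;i++){
    a.unshift(0);
}
console.timeEnd("unshift");

var b = [];
console.time("push");
for (var i=0;i<100000;i++){
    b.push(0);
}
console.timeEnd("push");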

Wait, so appending to a list is really fast and O(1), right?

var head = [];
//In prependBits, instead of arr.unshift(headBuffer & 255):
head.push(headBuffer & 255);

//And in flushHeadBuffer:
arr = head.reverse().concat(arr);

4 milliseconds. That’s a 38x improvement over binary strings.
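
Putting it together, here’s a sketch of how the final version reads: prependBits drains into head with push(), and flushHeadBuffer merges everything back in one pass before shifting.

var head = [];
function prependBits(val, nb){
    headBuffer |= (val << nbBitsHB);
    nbBitsHB += nb;
    while (nbBitsHB >= 8){
        head.push(headBuffer & 255); //O(1), unlike arr.unshift()
        headBuffer >>>= 8;
        nbBitsHB -= 8;
    }
}
function flushHeadBuffer(){
    //head holds its bytes in reverse stream order, so flip it
    //and merge it into the array before shifting
    arr = head.reverse().concat(arr);
    head = [];
    var buffer = 0;
    //Shift each byte with the "dangling" bits of the previous byte
    for (var i=0,len=arr.length;i<len;i++){
        buffer = arr[i] & ((1 << nbBitsHB) - 1);
        arr[i] = (headBuffer << (8 - nbBitsHB)) | (arr[i] >>> nbBitsHB);
        headBuffer = buffer;
    }
    //Merge the leftover bits into the tailBuffer and drain it
    tailBuffer |= headBuffer << nbBitsTB;
    nbBitsTB += nbBitsHB;
    headBuffer = 0;
    nbBitsHB = 0;
    if (nbBitsTB >= 8){
        arr.push(tailBuffer >>> (nbBitsTB - 8));
        tailBuffer &= (1 << (nbBitsTB - 8)) - 1;
        nbBitsTB -= 8;
    }
}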

The flushHeadBuffer function can be adapted to allow inserting anywhere in the stream.
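
For the byte-aligned special case, that adaptation is just a splice (a sketch; the general, bit-level case needs the same shifting loop as flushHeadBuffer):

function insertBytes(index, bytes){
    //Sketch: insert whole bytes at a byte boundary
    Array.prototype.splice.apply(arr, [index, 0].concat(bytes));
}
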
Finally, here’s the library containing all the features shown here: bitconcatJS.


* Pretty much but not exactly, but that’s for another post.