Logarithmic scaling with the fastest JSON validator

If there’s something any backend engineer loves, it’s when they manage to solve a problem in an O(log(n)) way. I did that this week.

As the number of requests per second goes up exponentially, the number of servers needed to handle them goes up linearly (until we reach a bandwidth bottleneck). Amazing!

First, let me recap. We were having performance issues in an application. We needed a lot of big and expensive EC2 compute-optimized instances to handle fairly low traffic by our standards (250mb/s). Those instances were burning CPU like crazy. The codebase is built with performance in mind and we couldn’t find a single inefficiency in it. That means it’s time for flame graphs!

They look like this:
flame

Continue reading

A more functional split function

The split function is a major part of modern languages and string processing. It transforms a string (or an array or a list) into a list of sub-elements by splitting on every occurrence of the supplied separator. The best thing about split is that it fits right into modern programming paradigms: a split string can be mapped over and then joined, folded, filtered, etc.

Strictly speaking, split is a looping operation, but unlike a for/while loop, a map or a fold, split operates at a higher abstraction level. It’s an implicit loop. It’s about the WHAT, not the HOW. It’s about the result, not the process. There’s room for improvement, though. It can be made more generic and much more powerful.
Continue reading

My new favorite sort algorithm

I haven’t laughed that much in a while.

I just learned about 4chan’s brilliant Sleep Sort algorithm (2011): start a thread for each integer in the array and make it sleep X seconds where X is the value of the integer. When the thread resumes, add that integer to the end of the sorted array. Time complexity is left as an exercise for the reader.

My asynchronous JS implementation:


Array.prototype.sleepsort = function(callback){
    var self = this,
        sorted = [];
    for (var i=0,len=self.length;i<len;i++){
        (function(i){
            setTimeout(function(){
                sorted.push(self[i]);
                if (sorted.length === self.length){
                    callback(sorted);
                }
            }, self[i]);
        })(i);
    }
};

[85, 91, 25, 72, 37, 27, 24, 42, 38, 18].sleepsort(function(arr){
    console.log(arr); // [18, 24, 25, 27, 37, 38, 42, 72, 85, 91]
});

The original bash version:


#!/bin/bash
function f() {
    sleep "$1"
    echo "$1"
}
while [ -n "$1" ]
do
    f "$1" &
    shift
done
wait

Original 4chan thread
Code golf on Stack Overflow
Mandatory XKCD reference

Getting rid of binary strings, part 2

Click here for part 1

Prepending bits
Another common operation is to add bits to the beginning of some data.

var bin = "10011101011110100"; //80628
bin = "0110011" + bin; //51
console.log(bin); //Prints 011001110011101011110100
console.log(parseInt(bin, 2)); //Prints 6765300

It’s trivial to do it using bit manipulations as long as everything fits into the buffer.

var buffer = 80628,
    nbBits = 17;
buffer |= 51 < < nbBits;
nbBits += 7; //I'm adding 7 relevant bits to the buffer
console.log(buffer); //6765300

//Explanation
// 51 << 17 = 11001100000000000000000 (51 in binary followed by 17 zeroes)
//Thus
//          10011101011110100
//            BINARY OR
//    11001100000000000000000
// => 11001110011101011110100 (6765300)

Continue reading

Getting rid of binary strings, part 1

Every programmer is used to manipulating data at the byte level, but most rarely, if ever, have to directly manipulate bits. When such a situation arises, the “obvious” way to do it (when it comes to beginners) is to use binary strings, that is, strings of ones and zeroes. The manipulations are done using the usual string concatenation, substr and other basic string manipulation tools. The string is then finally parsed 8 characters at a time into bytes (using the language’s equivalent of parseInt()). In many languages, including Java and Javascript, strings are immutable, so any concatenation causes the string to be copied all over, everytime.

Any half decent programmer would feel “dirty” writing such code, but my feeling is that most would do it anyway and say something like “it’s fast enough”, “it’s more readable”, “I’ll optimize it later if it’s too slow, but that’ll do for now” to justify themselves. Until relatively recently, I used to be one of them and hopefully you will know better than using binary strings after reading this. I’ll be using Javascript, but this article is language-agnostic.

Partial bytes are often used in compression and stream algorithms.

Appending bits
By far the most common operation performed on them is to append more binary data, such as:

var arr = [],
    bin = "";

bin += "101110";
bin += "0101";
bin += "0000";
bin += "1100110011";

//Pad the last byte if needed
if (bin.length % 8 !== 0){
    bin += Array(8 - bin.length%8 +1).join("0");
}

//Parse the string
for (var i=0, len=Math.ceil(bin.length/8);i<len;i++){
    arr.push(parseInt(bin.substring(i*8, (i+1)*8), 2));
}

console.log(arr); //Prints [185, 67, 51]

Continue reading