A more functional split function

thoughts about software and science

Simon Grondin

5 min read

The split function is a major part of modern languages and string processing. It transforms a string (or an array or a list) into a list of sub-elements by splitting on every occurrence of the supplied separator. The best thing about split is that it fits right into modern programming paradigms: a split string can be mapped over and then joined, folded, filtered, etc.

Strictly speaking, split is a looping operation, but unlike a for/while loop, a map or a fold, split operates at a higher abstraction level. It’s an implicit loop. It’s about the WHAT, not the HOW. It’s about the result, not the process. There’s room for improvement, though. It can be made more generic and much more powerful.

Introducing fsplit()

Array::fsplit = (pred) ->
    buf = []
    ret = []
    @.forEach (c) ->
        buf.push c
        if pred buf
           # Add all elements except the last one
            ret.push buf[..-2]
            buf = if not pred [c] then [c] else []
    ret.push buf
    ret.filter (a) -> a.length > 0

# Split on 3
console.log [1,2,3,4,5,1,2,3,3,4,5].fsplit (a) ->
    3 in a
# [[1, 2], [4, 5, 1, 2], [4, 5]]

# Get all consecutive intervals with a sum of 4 or less
console.log [1,2,3,4,5,1,2,3,3,4,5].fsplit (a) ->
    (a.reduce (a,b) -> a+b) > 4
# [[1, 2], [3], [4], [1, 2], [3], [3], [4]]

String::fsplit = (pred) ->
    buf = ""
    ret = []
    for c in @
        buf += c
        if pred buf
            # Add all elements except the last one
            ret.push buf[..-2]
            buf = if not pred c then c else ""
    ret.push buf
    ret.filter (a) -> a.length > 0

# Each substring must contain no more than two vowels
console.log "lorem ipsum dolor sit amet".fsplit (a) ->
    a.match(/[aeiouy]{1}/g)?.length > 2
# ['lorem ', 'ipsum d', 'olor s', 'it am', 'et']

Fsplit takes a single ‘predicate’ function as argument. It loops over the array and adds one element at a time to a ‘buffer’ array. After adding an element, it calls the predicate on the buffer. If the predicate returns true, the buffer without the last element is added to the ‘ret’ array. At the end, that ‘ret’ array is returned, filtered from any 0-length elements.

Split can only operate on a fixed pattern. Fsplit can search for complex and variable patterns, it can process the entire buffer looking for very specific properties beyond what regular expressions can do. That generalization makes it much more powerful.

Fsplit abstracts away implementation details by removing a common case of explicit looping. Explicit loops are the source of innumerable errors: they are too low level and pollute the code with irrelevant details and complexity. They should be avoided whenever possible. Fsplit is a higher order function and is pure in the strictest sense. It’s the very embodiment of what functional programming is about.

Paddy3118’s Python version


  • The filter on the last line of the algorithm should an if be inside the loop, but I decided to go for the more readable version.
  • It can all be done as a fold, but in my opinion it’s a lot of boilerplate code and a lot harder at a first glance to figure out what’s going on.

This entry was posted in Algorithms, Functional Programming.