String#split and the principle of least surprise

Designing APIs is a tricky business. On the one hand, you want to support common use cases via smart defaults. On the other hand, the API should generalize from those defaults in a sensible way. When it doesn’t, it violates the principle of least surprise and confuses people. Gary Bernhardt uses the word Wat to describe extreme cases of this. When the developers and users of such APIs defend this unintuitive, confusing behaviour, Zed Shaw draws an analogy to brain damage.

Consider Ruby’s String#split. Given a regular expression, it splits a string on matches of the expression. Given a string, it splits on occurrences of that string.

str = "Ruby
on Rails"
# => "Ruby\non Rails"

str.split(/ /)
# => ["Ruby\non", "Rails"]

str.split(/\n/)
# => ["Ruby", "on Rails"]

str.split("on")
# => ["Ruby\n", " Rails"]

Except when it doesn’t.

str.split(" ")
# => ["Ruby", "on", "Rails"]

Wat?

The Ruby docs explain:

If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.
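To see what that sentence means in practice, here is a small sketch comparing the single-space string against two regular expressions that look like they ought to be equivalent. The input string and the variable names are made up for illustration.

```ruby
# Hypothetical input with leading spaces, a run of spaces, a tab and a trailing newline.
padded = "  Ruby   on\tRails\n"

# The single-space string triggers the special case: leading whitespace is
# skipped and any run of whitespace counts as one separator.
awk_style = padded.split(" ")
# => ["Ruby", "on", "Rails"]

# The regexp / / splits on each individual space character, so empty
# strings appear and the tab and newline are left alone.
literal = padded.split(/ /)
# => ["", "", "Ruby", "", "", "on\tRails\n"]

# Even /\s+/ is not quite the same: it keeps a leading empty string.
by_regexp = padded.split(/\s+/)
# => ["", "Ruby", "on", "Rails"]
```

So the single-space string is not shorthand for any obvious regular expression; it is its own mode.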

Splitting a string on whitespace (i.e. extracting its words) is a common use case, and it is already what String#split does by default when called with no arguments; perhaps it should have been limited to that form. But treating one particular non-empty string as a special case is confusing, surprising behaviour that is pretty much impossible to guess from the behaviour of the same method on any other input.
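For comparison, here is a rough sketch of how the method treats a few other inputs. It assumes the global field separator `$;` is left at its default of nil, and the sample strings are invented for the example.

```ruby
str = "Ruby\non Rails"

# With no arguments (and $; left at its default, nil), split already
# breaks the string into words.
no_args = str.split
# => ["Ruby", "on", "Rails"]

# Any other string separator is taken literally. Two spaces match only
# an exact pair of spaces...
two_spaces = "a  b c".split("  ")
# => ["a", "b c"]

# ...and repeated separators are not collapsed.
commas = "a,,b".split(",")
# => ["a", "", "b"]
```

Nothing in the comma or two-space results hints that a one-space string would suddenly start collapsing runs and matching newlines.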

In other words,

Wat.
