simple, type-safe string formatting in Haskell

topics: haskell, programming

Years ago, on a distant website, Lennart Augustsson responded to a question about how printf can work in Haskell, and whether it is type-safe:

You can only get a type safe printf using dependent types.

— augustss

augustss ranks among the most elite of Haskell legends, so if he says so, then… hm.

Challenge accepted.

the FmtSpecifier type

Instead of a naïve String, use a more sophisticated data type to encode the format. Each conversion specifier (those things beginning with %, like %i and %s) becomes a data constructor, with its argument being the value to print.

data FmtSpecifier = FmtStr String
                  | FmtChar Char
                  | FmtInt Int
                  | FmtFloat Double

We will use a function that can convert a FmtSpecifier to a String:

convert :: FmtSpecifier -> String
convert = \case
    FmtStr s -> s
    FmtChar c -> [c]
    FmtInt i -> show i
    FmtFloat n -> show n

the sprintf and printf functions

I rarely want to convert only one format specifier into a string — normally I want to combine multiple. printf therefore takes a list of FmtSpecifiers!

sprintf :: [FmtSpecifier] -> String
sprintf = (>>= convert)

printf :: [FmtSpecifier] -> IO ()
printf = sprintf <&> putStr

Et voila:

report :: String -> Int -> IO ()
report name number =
    printf [ FmtStr name
           , FmtStr " is player "
           , FmtInt number
           , FmtChar '\n' ]

An invocation like report "Gi-hun" 456 will happily output Gi-hun is player 456.

It’s a little wordier than "%s is player %i\n", but it’s guaranteed not to ever segfault, which is nice.1

how to left-pad (without breaking the entire web)

One of the features of printf is that the caller can adjust how the values are printed, such as by specifying a maximum or minimum width (i.e. number of characters).

Not to appear incomplete, we demonstrate how to left-pad a string to a minimum width. Add another data constructor to our FmtSpecifier type:

-- data FmtSpecifier = ... 
                  | FmtPaddedFmt Int Char FmtSpecifier

and tell the convert function how to handle this case:

-- convert = \case ...
    FmtPaddedFmt min_len char fmt ->
        if min_len > len
            then replicate (min_len - len) char <> str
            else str
            str = convert fmt
            len = length str

If the formatted thing isn’t as long as the required width (min_len), then we prepend2 as many of the character char as we need until it is!

ghci> printf [FmtPaddedFmt 8 '0' (FmtInt 1729)]

the many looks of floating-point numbers

Somebody who wanted to add all the formatting features of decimal numbers (showing the + sign, using scientific notation, and so on) might begin with a record type encapsulating all our needs3:

data FmtFloatQualifiers = FmtFloatQualifiers {
      show_sign :: Bool
    , show_decimal_point :: Bool
    , scientific_notation :: Bool
    , precision :: Int

And then, just as above:

-- data FmtSpecifier = ... 
                  | FmtQualFloat FmtFloatQualifiers Double
-- convert = \case ...
    FmtQualFloat quals n -> fmt_float quals n
            fmt_float :: FmtFloatQualifiers -> Double -> String
            fmt_float = undefined

Adding all these features is orthogonal to the purpose of this post, and so the definition of fmt_float is left as an exercise for the reader.

There you are! In about 30 lines we were able to do the (supposedly) impossible.

Okay, okay, I know that I haven’t outsmarted augustss with this post. There is nothing here that would surprise him in any way. That opener was a bit cheeky of me.

A later comment even clarifies that it can be done this way “if you choose a more informative type than String for the format.”; augustss apparently found this so obvious that he didn’t even need to reply.

Nonetheless, I thought it was a fun demonstration. The full code is available on GitLab.

  1. And printing one percent sign doesn’t require two, what a bonus!↩︎

  2. The replicate function produces the empty list when given a number of zero or less, so checking that the length we need is more than zero is not entirely necessary. I always feel that trying to make a list with negative length is something that a strong, static type system should prevent.↩︎

  3. Optionally, using custom types instead of (particularly) Bool values would make their meanings more explicit.↩︎

more from the friends of danso:

rustc_codegen_gcc: Progress Report #21

March 13

What is rustc_codegen_gcc? rustc_codegen_gcc is a GCC ahead-of-time codegen for rustc, meaning that it can be loaded by the existing rustc frontend, but benefits from GCC by having more architectures supported and having access to GCC’s optimizations. …

via Antoyo's Blog

The Tyranny of Existing Code

March 12

Most programmers have experienced that feeling of worthlessness and powerlessness when we are trying to implement what should be a simple change, but the existing code base resists our attempts. Worse, often we can see what is causing us to be blocked, yet…

via Reasonable Performance

The Sustainable Travel Zone

January 5

There's some exciting stuff happening here in Cambridge. The county council (the tier of government between municipal and national that handles Cambridge and the surrounding smaller towns) has declared that they want to build what they're calling…

via Searching For Tao

generated by openring