simple, type-safe string formatting in Haskell

posted on November 8, 12021 HE
topics: haskell, programming

Years ago, on a distant website, Lennart Augustsson responded to a question about how printf can work in Haskell, and whether it is type-safe:

You can only get a type safe printf using dependent types.

— augustss

augustss ranks among the most elite of Haskell legends, so if he says so, then… hm.

Challenge accepted.

the FmtSpecifier type

Instead of a naïve String, use a more sophisticated data type to encode the format. Each conversion specifier (those things beginning with %, like %i and %s) becomes a data constructor, with its argument being the value to print.

data FmtSpecifier = FmtStr String
                  | FmtChar Char
                  | FmtInt Int
                  | FmtFloat Double

We will use a function that can convert a FmtSpecifier to a String:

convert :: FmtSpecifier -> String
convert = \case
    FmtStr s -> s
    FmtChar c -> [c]
    FmtInt i -> show i
    FmtFloat n -> show n

the sprintf and printf functions

I rarely want to convert only one format specifier into a string — normally I want to combine multiple. printf therefore takes a list of FmtSpecifiers!

sprintf :: [FmtSpecifier] -> String
sprintf = (>>= convert)

printf :: [FmtSpecifier] -> IO ()
printf = sprintf <&> putStr

Et voila:

report :: String -> Int -> IO ()
report name number =
    printf [ FmtStr name
           , FmtStr " is player "
           , FmtInt number
           , FmtChar '\n' ]

An invocation like report "Gi-hun" 456 will happily output Gi-hun is player 456.

It’s a little wordier than "%s is player %i\n", but it’s guaranteed not to ever segfault, which is nice.1

how to left-pad (without breaking the entire web)

One of the features of printf is that the caller can adjust how the values are printed, such as by specifying a maximum or minimum width (i.e. number of characters).

Not to appear incomplete, we demonstrate how to left-pad a string to a minimum width. Add another data constructor to our FmtSpecifier type:

-- data FmtSpecifier = ... 
                  | FmtPaddedFmt Int Char FmtSpecifier

and tell the convert function how to handle this case:

-- convert = \case ...
    FmtPaddedFmt min_len char fmt ->
        if min_len > len
            then replicate (min_len - len) char <> str
            else str
            str = convert fmt
            len = length str

If the formatted thing isn’t as long as the required width (min_len), then we prepend2 as many of the character char as we need until it is!

ghci> printf [FmtPaddedFmt 8 '0' (FmtInt 1729)]

the many looks of floating-point numbers

Somebody who wanted to add all the formatting features of decimal numbers (showing the + sign, using scientific notation, and so on) might begin with a record type encapsulating all our needs3:

data FmtFloatQualifiers = FmtFloatQualifiers {
      show_sign :: Bool
    , show_decimal_point :: Bool
    , scientific_notation :: Bool
    , precision :: Int

And then, just as above:

-- data FmtSpecifier = ... 
                  | FmtQualFloat FmtFloatQualifiers Double
-- convert = \case ...
    FmtQualFloat quals n -> fmt_float quals n
            fmt_float :: FmtFloatQualifiers -> Double -> String
            fmt_float = undefined

Adding all these features is orthogonal to the purpose of this post, and so the definition of fmt_float is left as an exercise for the reader.

There you are! In about 30 lines we were able to do the (supposedly) impossible.

Okay, okay, I know that I haven’t outsmarted augustss with this post. There is nothing here that would surprise him in any way. That opener was a bit cheeky of me.

A later comment even clarifies that it can be done this way “if you choose a more informative type than String for the format.”; augustss apparently found this so obvious that he didn’t even need to reply.

Nonetheless, I thought it was a fun demonstration. The full code is available on GitLab.

  1. And printing one percent sign doesn’t require two, what a bonus!↩︎

  2. The replicate function produces the empty list when given a number of zero or less, so checking that the length we need is more than zero is not entirely necessary. I always feel that trying to make a list with negative length is something that a strong, static type system should prevent.↩︎

  3. Optionally, using custom types instead of (particularly) Bool values would make their meanings more explicit.↩︎

more from the friends of danso:

rustc_codegen_gcc: Progress Report #13

July 7

What is rustc_codegen_gcc? rustc_codegen_gcc is a GCC ahead-of-time codegen for rustc, meaning that it can be loaded by the existing rustc frontend, but benefits from GCC by having more architectures supported and having access to GCC’s optimizations. …

via Antoyo's Blog

Learning SSH etc

June 17

I am job searching! Fun times. I also have a weirdly high proportion of friends simultaneously job searching, which is cool because we can discuss interview stuff. This is exactly what led me down the road labelled "learn SSH" - my friend and co-j…

via Heather B-Log

Just: the most reductive word in software development

May 1

(Intentionally provocative title.) Do you use the word “just”? Have you ever said something like “oh, we just need to implement that interface for this new type” or “it’s just a small bash script” or something to that effect? I have, many times, and still …

via Occasionally sane

generated by openring