simple, type-safe string formatting in Haskell

topics: ,

Years ago, on a distant website, Lennart Augustsson responded to a question about how printf can work in Haskell, and whether it is type-safe:

You can only get a type safe printf using dependent types.

— augustss

augustss ranks among the most elite of Haskell legends, so if he says so, then… hm.

Challenge accepted.

the FmtSpecifier type

Instead of a naïve String, use a more sophisticated data type to encode the format. Each conversion specifier (those things beginning with %, like %i and %s) becomes a data constructor, with its argument being the value to print.

data FmtSpecifier = FmtStr String
                  | FmtChar Char
                  | FmtInt Int
                  | FmtFloat Double

We will use a function that can convert a FmtSpecifier to a String:

convert :: FmtSpecifier -> String
convert = \case
    FmtStr s -> s
    FmtChar c -> [c]
    FmtInt i -> show i
    FmtFloat n -> show n

the sprintf and printf functions

I rarely want to convert only one format specifier into a string — normally I want to combine multiple. printf therefore takes a list of FmtSpecifiers!

sprintf :: [FmtSpecifier] -> String
sprintf = (>>= convert)

printf :: [FmtSpecifier] -> IO ()
printf = sprintf <&> putStr

Et voila:

report :: String -> Int -> IO ()
report name number =
    printf [ FmtStr name
           , FmtStr " is player "
           , FmtInt number
           , FmtChar '\n' ]

An invocation like report "Gi-hun" 456 will happily output Gi-hun is player 456.

It’s a little wordier than "%s is player %i\n", but it’s guaranteed not to ever segfault, which is nice.1

how to left-pad (without breaking the entire web)

One of the features of printf is that the caller can adjust how the values are printed, such as by specifying a maximum or minimum width (i.e. number of characters).

Not to appear incomplete, we demonstrate how to left-pad a string to a minimum width. Add another data constructor to our FmtSpecifier type:

-- data FmtSpecifier = ... 
                  | FmtPaddedFmt Int Char FmtSpecifier

and tell the convert function how to handle this case:

-- convert = \case ...
    FmtPaddedFmt min_len char fmt ->
        if min_len > len
            then replicate (min_len - len) char <> str
            else str
            str = convert fmt
            len = length str

If the formatted thing isn’t as long as the required width (min_len), then we prepend2 as many of the character char as we need until it is!

ghci> printf [FmtPaddedFmt 8 '0' (FmtInt 1729)]

the many looks of floating-point numbers

Somebody who wanted to add all the formatting features of decimal numbers (showing the + sign, using scientific notation, and so on) might begin with a record type encapsulating all our needs3:

data FmtFloatQualifiers = FmtFloatQualifiers {
      show_sign :: Bool
    , show_decimal_point :: Bool
    , scientific_notation :: Bool
    , precision :: Int

And then, just as above:

-- data FmtSpecifier = ... 
                  | FmtQualFloat FmtFloatQualifiers Double
-- convert = \case ...
    FmtQualFloat quals n -> fmt_float quals n
            fmt_float :: FmtFloatQualifiers -> Double -> String
            fmt_float = undefined

Adding all these features is orthogonal to the purpose of this post, and so the definition of fmt_float is left as an exercise for the reader.

There you are! In about 30 lines we were able to do the (supposedly) impossible.

Okay, okay, I know that I haven’t outsmarted augustss with this post. There is nothing here that would surprise him in any way. That opener was a bit cheeky of me.

A later comment even clarifies that it can be done this way “if you choose a more informative type than String for the format.”; augustss apparently found this so obvious that he didn’t even need to reply.

Nonetheless, I thought it was a fun demonstration. The full code is available on GitLab.

more from the friends of danso:

rustc_codegen_gcc: Progress Report #27

November 8

What is rustc_codegen_gcc? rustc_codegen_gcc is a GCC ahead-of-time codegen for rustc, meaning that it can be loaded by the existing rustc frontend, but benefits from GCC by having more architectures supported and having access to GCC’s optimizations. …

via Antoyo's Blog

The Blood-Dimmed Tide

October 29

I've been trying to write something about Israel's attacks on Gaza for a couple hours now and while I have a lot to say about it, my thoughts are too angry to be able to write anything worth reading. For what it's worth, here's what I hav…

via Searching For Tao

So what were you up to last night?

October 6

Oh you know, the usual…

via Hey Heather, it’s me again.

generated by openring