simple, type-safe string formatting in Haskell

topics: ,

Years ago, on a distant website, Lennart Augustsson responded to a question about how printf can work in Haskell, and whether it is type-safe:

You can only get a type safe printf using dependent types.

— augustss

augustss ranks among the most elite of Haskell legends, so if he says so, then… hm.

Challenge accepted.

the FmtSpecifier type

Instead of a naïve String, use a more sophisticated data type to encode the format. Each conversion specifier (those things beginning with %, like %i and %s) becomes a data constructor, with its argument being the value to print.

data FmtSpecifier = FmtStr String
                  | FmtChar Char
                  | FmtInt Int
                  | FmtFloat Double

We will use a function that can convert a FmtSpecifier to a String:

convert :: FmtSpecifier -> String
convert = \case
    FmtStr s -> s
    FmtChar c -> [c]
    FmtInt i -> show i
    FmtFloat n -> show n

the sprintf and printf functions

I rarely want to convert only one format specifier into a string — normally I want to combine multiple. printf therefore takes a list of FmtSpecifiers!

sprintf :: [FmtSpecifier] -> String
sprintf = (>>= convert)

printf :: [FmtSpecifier] -> IO ()
printf = sprintf <&> putStr

Et voila:

report :: String -> Int -> IO ()
report name number =
    printf [ FmtStr name
           , FmtStr " is player "
           , FmtInt number
           , FmtChar '\n' ]

An invocation like report "Gi-hun" 456 will happily output Gi-hun is player 456.

It’s a little wordier than "%s is player %i\n", but it’s guaranteed not to ever segfault, which is nice.1

how to left-pad (without breaking the entire web)

One of the features of printf is that the caller can adjust how the values are printed, such as by specifying a maximum or minimum width (i.e. number of characters).

Not to appear incomplete, we demonstrate how to left-pad a string to a minimum width. Add another data constructor to our FmtSpecifier type:

-- data FmtSpecifier = ...
                  | FmtPaddedFmt Int Char FmtSpecifier

and tell the convert function how to handle this case:

-- convert = \case ...
    FmtPaddedFmt min_len char fmt ->
        if min_len > len
            then replicate (min_len - len) char <> str
            else str
          where
            str = convert fmt
            len = length str

If the formatted thing isn’t as long as the required width (min_len), then we prepend2 as many of the character char as we need until it is!

ghci> printf [FmtPaddedFmt 8 '0' (FmtInt 1729)]
00001729

the many looks of floating-point numbers

Somebody who wanted to add all the formatting features of decimal numbers (showing the + sign, using scientific notation, and so on) might begin with a record type encapsulating all our needs3:

data FmtFloatQualifiers = FmtFloatQualifiers {
      show_sign :: Bool
    , show_decimal_point :: Bool
    , scientific_notation :: Bool
    , precision :: Int
    }

And then, just as above:

-- data FmtSpecifier = ...
                  | FmtQualFloat FmtFloatQualifiers Double
-- convert = \case ...
    FmtQualFloat quals n -> fmt_float quals n
        where
            fmt_float :: FmtFloatQualifiers -> Double -> String
            fmt_float = undefined

Adding all these features is orthogonal to the purpose of this post, and so the definition of fmt_float is left as an exercise for the reader.


There you are! In about 30 lines we were able to do the (supposedly) impossible.

Okay, okay, I know that I haven’t outsmarted augustss with this post. There is nothing here that would surprise him in any way. That opener was a bit cheeky of me.

A later comment even clarifies that it can be done this way “if you choose a more informative type than String for the format.”; augustss apparently found this so obvious that he didn’t even need to reply.

Nonetheless, I thought it was a fun demonstration. The full code is available on GitLab.

more from the friends of danso:

Ask Microsoft to Stop Helping Israelis kill Palestinians

March 6

Microsoft (and Google, and many other tech firms) are pouring resources into tools that help the Israeli military wage a genocide. UTAW (my union) sent me an email today asking me to join in a letter-writing campaign with ActionNetwork to try to convince…

via Searching For Tao

ETS Match Specification Pitfall: Maps

February 9

During my work, I had a problem where I needed to efficiently write tree-shaped data. I also needed the ability to query ranges of the data. I reached for ETS tables, using the ordered_set table setting, and relied on match specifications to read ranges from the…

via Richard Kallos

Development of rustc_codegen_gcc #2

January 30

A couple months ago, I wrote an article that described the challenges we had to do the previous sync from the Rust repo to rustc_codegen_gcc. (if you don’t know what a sync is, please read that blog article). We also had many...

via Antoyo's Blog

generated by openring