simple, type-safe string formatting in Haskell
Years ago, on a distant website, Lennart Augustsson responded to a question about how printf
can work in Haskell, and whether it is type-safe:
You can only get a type safe printf using dependent types.
— augustss
augustss ranks among the most elite of Haskell legends, so if he says so, then… hm.
Challenge accepted.
the FmtSpecifier
type
Instead of a naïve String
, use a more sophisticated data type to encode the format. Each conversion specifier (those things beginning with %
, like %i
and %s
) becomes a data constructor, with its argument being the value to print.
data FmtSpecifier = FmtStr String
| FmtChar Char
| FmtInt Int
| FmtFloat Double
We will use a function that can convert a FmtSpecifier
to a String
:
convert :: FmtSpecifier -> String
= \case
convert FmtStr s -> s
FmtChar c -> [c]
FmtInt i -> show i
FmtFloat n -> show n
the sprintf
and printf
functions
I rarely want to convert only one format specifier into a string — normally I want to combine multiple. printf
therefore takes a list of FmtSpecifier
s!
sprintf :: [FmtSpecifier] -> String
= (>>= convert)
sprintf
printf :: [FmtSpecifier] -> IO ()
= sprintf <&> putStr printf
Et voila:
report :: String -> Int -> IO ()
=
report name number FmtStr name
printf [ FmtStr " is player "
, FmtInt number
, FmtChar '\n' ] ,
An invocation like report "Gi-hun" 456
will happily output Gi-hun is player 456
.
It’s a little wordier than "%s is player %i\n"
, but it’s guaranteed not to ever segfault, which is nice.1
how to left-pad (without breaking the entire web)
One of the features of printf
is that the caller can adjust how the values are printed, such as by specifying a maximum or minimum width (i.e. number of characters).
Not to appear incomplete, we demonstrate how to left-pad a string to a minimum width. Add another data constructor to our FmtSpecifier
type:
-- data FmtSpecifier = ...
| FmtPaddedFmt Int Char FmtSpecifier
and tell the convert
function how to handle this case:
-- convert = \case ...
FmtPaddedFmt min_len char fmt ->
if min_len > len
then replicate (min_len - len) char <> str
else str
where
= convert fmt
str = length str len
If the formatted thing isn’t as long as the required width (min_len
), then we prepend2 as many of the character char
as we need until it is!
ghci> printf [FmtPaddedFmt 8 '0' (FmtInt 1729)]
00001729
the many looks of floating-point numbers
Somebody who wanted to add all the formatting features of decimal numbers (showing the +
sign, using scientific notation, and so on) might begin with a record type encapsulating all our needs3:
data FmtFloatQualifiers = FmtFloatQualifiers {
show_sign :: Bool
show_decimal_point :: Bool
, scientific_notation :: Bool
, precision :: Int
, }
And then, just as above:
-- data FmtSpecifier = ...
| FmtQualFloat FmtFloatQualifiers Double
-- convert = \case ...
FmtQualFloat quals n -> fmt_float quals n
where
fmt_float :: FmtFloatQualifiers -> Double -> String
= undefined fmt_float
Adding all these features is orthogonal to the purpose of this post, and so the definition of fmt_float
is left as an exercise for the reader.
There you are! In about 30 lines we were able to do the (supposedly) impossible.
Okay, okay, I know that I haven’t outsmarted augustss with this post. There is nothing here that would surprise him in any way. That opener was a bit cheeky of me.
A later comment even clarifies that it can be done this way “if you choose a more informative type than String for the format.”; augustss apparently found this so obvious that he didn’t even need to reply.
Nonetheless, I thought it was a fun demonstration. The full code is available on GitLab.
more from the friends of danso:
Ask Microsoft to Stop Helping Israelis kill Palestinians
March 6Microsoft (and Google, and many other tech firms) are pouring resources into tools that help the Israeli military wage a genocide. UTAW (my union) sent me an email today asking me to join in a letter-writing campaign with ActionNetwork to try to convince…
via Searching For TaoETS Match Specification Pitfall: Maps
February 9During my work, I had a problem where I needed to efficiently write tree-shaped data. I also needed the ability to query ranges of the data. I reached for ETS tables, using the ordered_set table setting, and relied on match specifications to read ranges from the…
via Richard KallosDevelopment of rustc_codegen_gcc #2
January 30A couple months ago, I wrote an article that described the challenges we had to do the previous sync from the Rust repo to rustc_codegen_gcc. (if you don’t know what a sync is, please read that blog article). We also had many...
via Antoyo's Bloggenerated by openring