I’ve been struggling to generate PDF files from simple HTML code that used non-ascii characters. The tools that I was using were wkhtmltopdf (version 0.12.2 which doesn’t need an X server) and python-pdfkit. Unfortunately, I spent quite some time working on the wrong solutions. Namely, these are the things that I took a look at:
- Python. An usual suspect, but this time it was not a case of “Unicode madness”.
- An open issue regarding a problem with certain characters in python-pdfkit.
- Using different fonts. Apparently there’s another project also called pdfkit (for node) that has nothing to do with wkhtmltopdf and pdfkit for Python/Ruby.
After testing different websites (some working and some failing) I realised that I just needed to include a meta tag specifying that UTF-8 was the correct charset. It’s not surprising that some websites (and my own HTML code) did not include this attribute because this is what w3schools says about it:
If an HTML5 web page uses a different character set than UTF-8, it should be specified in the tag.
Sometimes easy solutions are difficult to find! Below you can find a working example.