As a professional science writer, I don’t get to play around with code very much at $DAYJOB.
But, for a while now, I have been writing nearly all of my articles in Markdown using pandoc to produce PDF, HTML and <sigh> DOCX files as outputs.
I attach all three formats to my e-mails, allowing the recipients to open the format they need (as explained below).
I thought I should document my workflow publicly in case it is useful for anyone else.
The formats
- The PDF file I build has a wide outer margin for notes and is useful if my editor prefers to print the article and hand me a marked-up version. 1
- The HTML file contains the content I eventually publish on our website, because copying and pasting from a DOCX file inevitably adds a lot of unnecessary markup to the HTML. The format is also particularly good if the reader of the draft is on a phone, as the responsive layout is much easier to read than PDF or DOCX.
- The DOCX output exists because nearly all my colleagues prefer that format (rather than the ODF standard) for writing, and also for sending comments and edits with changes tracked. Further, the tool that our in-house translation team uses requires DOCX files as input.
My workflow
I author the article in a Markdown file (here YYYYMMDD_FILENAME.md), with each sentence residing on a new line.
At the top of this file, you need to add some YAML “frontmatter”.
Most of the YAML below is for the PDF output; you can get rid of some of it (like urlcolor, geometry, documentclass etc.) if you are not interested in generating PDFs.
For your PDF output, you can optionally insert margin notes using LaTeX: just add \marginpar{Lorem ipsum} in the appropriate place in your Markdown file and it will insert the text “Lorem ipsum” in the outer margin as defined by geometry:
---
title: "This is the title of my article"
subtitle: "This is the subtitle (or strap) of the article"
author: "Byline: Achintya Rao"
date: "Scheduled for 1 October 2020"
# I don’t like having to manually change the language for DOCX files
lang: en-GB
urlcolor: blue
# A wide margin for the PDF allowing you to insert margin notes using:
# \marginpar{Lorem ipsum}
geometry: "twoside, top=3cm, bottom=3cm, inner=3cm, outer=6.5cm, marginparwidth=3cm, marginparsep=0.5cm"
documentclass: article
# Use the exact name of the font you wish to have the PDF formatted in.
# In my case, I’m using Alegreya (the same font as the main text in this blog):
# https://www.huertatipografica.com/en/fonts/alegreya-ht-pro
mainfont: "Alegreya"
linestretch: 1.2
# You can also include LaTeX packages using `header-includes`.
# The package here disables hyphenation of long words in the PDF output.
header-includes:
- \usepackage[none]{hyphenat}
---
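To try the frontmatter out, a minimal sketch in shell can write a small test article to disk. The file name `20201001_example.md`, the title and the body text are placeholders made up for illustration; only a subset of the keys above is used, and the quoted heredoc keeps the shell from touching the backslash in `\marginpar`.

```shell
# Hypothetical file name, following the YYYYMMDD_FILENAME.md pattern.
article="20201001_example.md"

# A quoted heredoc ('EOF') writes the text verbatim, backslashes included.
cat > "$article" <<'EOF'
---
title: "A test article"
lang: en-GB
geometry: "twoside, top=3cm, bottom=3cm, inner=3cm, outer=6.5cm, marginparwidth=3cm, marginparsep=0.5cm"
documentclass: article
---

The first sentence sits on its own line.
The second sentence carries a margin note.\marginpar{Lorem ipsum}
EOF
```

The resulting file can then be fed to the pandoc commands described in the rest of this post.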
Initially, I used to pop open a terminal and invoke the appropriate pandoc command every time I saved the Markdown file:
# PDF
$ pandoc -V papersize:a4 --pdf-engine xelatex -o YYYYMMDD_FILENAME.pdf YYYYMMDD_FILENAME.md
# HTML (with the CSS file in ~/Documents/CSS/pandoc.css)
$ pandoc -s --css ~/Documents/CSS/pandoc.css --self-contained -o YYYYMMDD_FILENAME.html YYYYMMDD_FILENAME.md
# DOCX
$ pandoc -o YYYYMMDD_FILENAME.docx YYYYMMDD_FILENAME.md
This quickly got tiresome since I had to remember all the pandoc spells (er, commands) for every new article and change the name of the input and output files.
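One way to avoid retyping the file names, sketched here under my own assumptions rather than taken from the workflow in this post, is to derive the output names from the source name inside a small shell function. The names `output_name` and `build_all` are hypothetical; the pandoc options are the ones from the commands above.

```shell
# Hypothetical helper: swap the .md extension for another one.
# ${1%.md} strips a trailing ".md" from the first argument.
output_name() {
    printf '%s.%s\n' "${1%.md}" "$2"
}

# Hypothetical wrapper around the three pandoc commands shown above.
build_all() {
    src="$1"
    pandoc -V papersize:a4 --pdf-engine xelatex -o "$(output_name "$src" pdf)" "$src"
    pandoc -s --css ~/Documents/CSS/pandoc.css --self-contained -o "$(output_name "$src" html)" "$src"
    pandoc -o "$(output_name "$src" docx)" "$src"
}

# Usage: build_all YYYYMMDD_FILENAME.md
```

This removes the renaming step but still rebuilds all three files every time, whether or not anything changed.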
Enter automation
I decided to look into GNU Make and let it generate all the files for me.
I simply had to change the name of the input and output files in each article’s makefile.
But I soon got annoyed at having to run :%s/OLDFILENAME/NEWFILENAME/g every time I copied the makefile for a new article (mostly because I kept forgetting to use the % and had to look up how to replace a string in all lines of the file in vi mode).
After reading about variables in Make, and after lots of trial-and-error and poking around on the internet, I managed to get my makefile into shape so that I only had to change the input file name once for each new article: 2
# Make sure you change the filename from YYYYMMDD_FILENAME.md to something meaningful.
SOURCE := YYYYMMDD_FILENAME.md
HTML := $(patsubst %.md,%.html, $(SOURCE))
PDF := $(patsubst %.md,%.pdf, $(SOURCE))
DOCX := $(patsubst %.md,%.docx, $(SOURCE))
STYLE := ~/Documents/CSS/pandoc.css
# Source: https://gist.github.com/killercup/5917178
# Make sure you save this in the same directory as shown or change the path.
.PHONY : all
all : $(HTML) $(PDF) $(DOCX)
%.html : %.md $(STYLE)
	@echo --- Generating HTML ---
	@pandoc -s --css $(STYLE) --self-contained -o $@ $<
# You will need to have the appropriate pdf-engine installed.
%.pdf : %.md
	@echo --- Generating PDF ---
	@pandoc -V papersize:a4 --pdf-engine xelatex -o $@ $<
%.docx : %.md
	@echo --- Generating DOCX ---
	@pandoc -o $@ $<
.PHONY : clean
clean :
	@echo --- Deleting generated files ---
	@-rm $(HTML) $(PDF) $(DOCX)
Now, for each article, with the Markdown file and the makefile in the same directory, running the make command generates all three outputs in one go:
$ tree
├── YYYYMMDD_FILENAME.md
└── makefile
$ make
--- Generating HTML ---
--- Generating PDF ---
--- Generating DOCX ---
$ tree
├── YYYYMMDD_FILENAME.docx
├── YYYYMMDD_FILENAME.html
├── YYYYMMDD_FILENAME.md
├── YYYYMMDD_FILENAME.pdf
└── makefile
If I make any changes to the source Markdown file, running make will overwrite previous versions of the files with new ones.
However, I still need to remember to run make each time and I wanted to skip this.
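Make decides whether a target is out of date by comparing file modification times: a target is rebuilt when it does not exist or when one of its prerequisites is newer than it. That check can be sketched in shell with the `-nt` ("newer than") test. The function name `needs_rebuild` is hypothetical, and `-nt` is a widely supported extension rather than strict POSIX.

```shell
# Hypothetical helper mimicking Make's timestamp check:
# succeeds if the target is missing or older than the source.
needs_rebuild() {
    src="$1"
    target="$2"
    [ ! -e "$target" ] || [ "$src" -nt "$target" ]
}

# Example: report whether the PDF is stale relative to the Markdown source.
if needs_rebuild YYYYMMDD_FILENAME.md YYYYMMDD_FILENAME.pdf; then
    echo "PDF is out of date"
else
    echo "PDF is up to date"
fi
```

This is why running make twice in a row without touching the Markdown file does no work the second time.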
Rebuilding on save
Happily, I came across the entr program recently, which I use to automatically re-run the make command every time I save the source file: 3
$ ls *.md | entr make
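entr reads the list of files to watch from standard input, one per line, so anything that prints file names can feed it. If relying on the shell to expand `*.md` is a concern, a sketch using `find` might look like the following. The function name `list_sources` is hypothetical, and `-maxdepth` is a common GNU/BSD extension rather than strict POSIX.

```shell
# Hypothetical helper: print the Markdown files in the current directory,
# one per line, which is the input format entr reads on standard input.
list_sources() {
    find . -maxdepth 1 -name '*.md' -type f
}

# Usage (requires entr): rebuild whenever a listed file changes.
#   list_sources | entr make
```
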
Here is my workflow in action: 4
I hope you like it! Please let me know on Scholar.Social or on Twitter if you have any suggestions to improve the code here. You can also leave comments on this page itself (thanks to hypothes.is) by selecting any text and annotating it.
1. This has yet to happen, heh. I mainly do it because I prefer the PDF output myself.
2. Please don’t ask me to explain what the various lines are; as far as I’m concerned, it’s all magic!
3. If you use the Emacs shell (eshell) for example, the *.md bit will actually expand to list the name of the Markdown file(s) in the directory. This may or may not be desirable for you.
4. I’m using Emacs (with its built-in terminal) to author the text, with the olivetti package helping to make the writing environment pretty and clean.