Jekyll: YAML Lists vs Dictionaries

YAML is a simple, human-readable data format that is widely used across projects and has implementations in many programming languages. I have been using YAML with Jekyll, where it is a standard way of representing data. Jekyll reads YAML data files and makes their content available to web pages through the Liquid templating engine.

I realised something interesting while reviewing a pull request in the PyConf Hyderabad website repo: the precise difference between YAML lists and dictionaries, and how easy it is to mix the two up in Jekyll.

A YAML list is a run of lines at the same indentation level, each beginning with a - (dash) followed by a space. Each such line is one element, and together they form an indexed collection of objects, the familiar list data structure.

Dictionaries (also called hashes or mappings) are lines that each hold a key: value pair.

- Apple
- Banana
- Guava

Apple: red
Banana: yellow
Guava: green
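For readers more at home in Python, the two shapes map onto familiar structures. This is only a sketch: the literals are written by hand to mirror what a YAML parser would be expected to produce for the list and the dictionary, and the variable names are made up for illustration.

```python
# Hand-written equivalents of the two YAML shapes (nothing is parsed from a file).
fruits_list = ["Apple", "Banana", "Guava"]  # lines starting with "- "
fruits_dict = {"Apple": "red", "Banana": "yellow", "Guava": "green"}  # key: value lines

# A list is addressed by position...
first_fruit = fruits_list[0]

# ...while a dictionary is addressed by key.
banana_color = fruits_dict["Banana"]

print(first_fruit)   # Apple
print(banana_color)  # yellow
```

The practical difference is in how an element is reached: by index for the list, by key for the dictionary.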

In Jekyll we can use a for loop to iterate over the elements of a list or dictionary. If we save either of these in the file _data/fruits.yml (the standard location for data files in Jekyll), we can use a for loop to print each item in HTML as shown below.

{% for fruit in site.data.fruits %}
  <p>{{ fruit }}</p>
{% endfor %}

Here is the output for each of the examples, first the list and then the dictionary.

Apple
Banana
Guava

Applered
Bananayellow
Guavagreen

You can see how Jekyll concatenates the key and value of each pair for dictionaries. This happens because Jekyll stores each key-value pair as an array of two elements. If we want to display just the key or just the value, we can use array notation, where key = fruit[0] and value = fruit[1]. There is also dot notation, which is more commonly used when the keys are decided beforehand and we use them to access the values.
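The pair behaviour can be imitated in Python. In this rough sketch, render is a hypothetical helper that stands in for the way an array is printed with its items joined together. It is an approximation for illustration, not Liquid's actual code.

```python
fruits = {"Apple": "red", "Banana": "yellow", "Guava": "green"}


def render(value):
    """Rough stand-in for template output: arrays are printed with items joined."""
    if isinstance(value, (list, tuple)):
        return "".join(str(item) for item in value)
    return str(value)


# Walking a mapping pair by pair: each item is a two-element [key, value] array.
pairs = [[key, value] for key, value in fruits.items()]

whole = [render(pair) for pair in pairs]      # like {{ fruit }}
keys = [render(pair[0]) for pair in pairs]    # like {{ fruit[0] }}
values = [render(pair[1]) for pair in pairs]  # like {{ fruit[1] }}

print(whole)   # ['Applered', 'Bananayellow', 'Guavagreen']
print(keys)    # ['Apple', 'Banana', 'Guava']
print(values)  # ['red', 'yellow', 'green']
```

Printing the whole pair runs key and value together, while indexing into the pair picks out one side.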

- name: Apple
  color: red
- name: Banana
  color: yellow
- name: Guava
  color: green

- {name: Apple, color: red}
- {name: Banana, color: yellow}
- {name: Guava, color: green}

Both of the formats above represent a list of dictionaries. The second is the shorthand dictionary notation, which is very similar to Python’s dictionary notation. The first is the long format, which is more readable and more commonly used to represent complex data. We use the code below to iterate over the elements.

{% for fruit in site.data.fruits %}
  <p>{{ fruit.name }} is {{ fruit.color }} in color</p>
{% endfor %}
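The same list-of-dictionaries shape can be sketched in Python, where the resemblance to the shorthand notation is easy to see. The data is typed in by hand as an assumed equivalent of the YAML, and key lookup takes the place of dot notation.

```python
# Hand-written equivalent of the list of dictionaries (not parsed from YAML).
fruits = [
    {"name": "Apple", "color": "red"},
    {"name": "Banana", "color": "yellow"},
    {"name": "Guava", "color": "green"},
]

# Every element is a dictionary, so fields are read by key rather than by index.
sentences = [f"{fruit['name']} is {fruit['color']} in color" for fruit in fruits]

for sentence in sentences:
    print(sentence)
```

Because the keys name and color are fixed in advance, every element can be handled by the same line of code.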

Ambiguity of Lists and Dictionaries

The confusion arises when we use data with unique keys, as in the following example.

# "007" is quoted so that it stays text; an unquoted 007 may be read as a number
"007":
  name: James Bond
  work: Secret Agent
221B:
  name: Sherlock Holmes
  work: Detective

- "007":
    name: James Bond
    work: Secret Agent
- 221B:
    name: Sherlock Holmes
    work: Detective

Here we represent some people by unique codes. We save the data in the file _data/persons.yml and access it with the following code.

{% for person in site.data.persons %}
    <p><span {% if person[0] == "007" %} style="color: red" {% endif %}>{{ person[1].name }}</span> is a {{ person[1].work }}</p>
{% endfor %}

For the first case the code works fine and produces the following result

James Bond is a Secret Agent
Sherlock Holmes is a Detective

But for the second case we get an undesirable output. What’s the reason? If you look carefully, there is a dash at the beginning of each entry, which means the data is a list of dictionaries. Iterating over the list therefore gives you a hash with a single key (the code) whose value is the person object, not a key-value pair. The correct code to get the desired result would be

{% for person_hash in site.data.persons %}
  {% for person in person_hash %}
      <p><span {% if person[0] == "007" %} style="color: red" {% endif %}>{{ person[1].name }}</span> is a {{ person[1].work }}</p>
  {% endfor %}
{% endfor %}
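The difference between the two layouts is perhaps easier to see when both are written out as Python structures. This is a sketch under assumptions: the data is typed in by hand with the codes as strings, and describe is a hypothetical helper used only for printing.

```python
# Dictionary form: codes are keys of one mapping.
persons_dict = {
    "007": {"name": "James Bond", "work": "Secret Agent"},
    "221B": {"name": "Sherlock Holmes", "work": "Detective"},
}

# List form: every list item is its own single-key mapping.
persons_list = [
    {"007": {"name": "James Bond", "work": "Secret Agent"}},
    {"221B": {"name": "Sherlock Holmes", "work": "Detective"}},
]


def describe(code, person):
    """Hypothetical helper: format one person for display."""
    return f"{code}: {person['name']} is a {person['work']}"


# One loop is enough for the dictionary form.
from_dict = [describe(code, person) for code, person in persons_dict.items()]

# The list form needs an outer loop over the items and an inner loop over
# the single key-value pair inside each item.
from_list = []
for person_hash in persons_list:
    for code, person in person_hash.items():
        from_list.append(describe(code, person))

print(from_dict)
print(from_list)
```

Both loops produce the same lines, but only because the list form is unwrapped one extra level.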

This can be more clearly understood by revisiting the first example.

- Apple
- Banana
- Guava

Apple
Banana
Guava

If the code below is run against both versions of the data, the results are as follows.

{% for fruit in site.data.fruits %}
  {{ fruit }}
{% endfor %}

Apple
Banana
Guava

Apple Banana Guava

The reason is that in the first case Jekyll interprets the data as a list because of the dash at the start of each line. In the second case there are no dashes and no key: value pairs, so the data is neither a list nor a dictionary. YAML folds the three lines into a single plain string, Apple Banana Guava, and the loop runs once over that whole string, which is the output shown.

To summarise: whenever lines in the YAML data start with a dash, they represent a list of objects, while lines of key: value pairs without a dash form a hash. There can be a list of dictionaries or a dictionary with a list of values, so you need to be careful when dealing with this kind of data.
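As a quick sanity check before writing a template, the rule of thumb above can be turned into a few lines of Python. The function top_level_shape is a hypothetical helper and deliberately naive: it looks only at the first meaningful line, it is not a YAML parser, and it does not handle flow style such as [a, b], quoted strings or multi-document files.

```python
def top_level_shape(yaml_text):
    """Guess the top-level shape of YAML data from its first meaningful line.

    Heuristic only: a leading dash means a list, a key: value line means a
    dictionary, and anything else is treated as plain text.
    """
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if stripped == "-" or stripped.startswith("- "):
            return "list"
        if stripped.endswith(":") or ": " in stripped:
            return "dictionary"
        return "plain text"
    return "empty"


print(top_level_shape("- Apple\n- Banana\n- Guava"))        # list
print(top_level_shape("Apple: red\nBanana: yellow"))        # dictionary
print(top_level_shape("- 007:\n    name: James Bond"))      # list
print(top_level_shape("Apple\nBanana\nGuava"))              # plain text
```

Knowing the top-level shape tells you whether one for loop is enough or whether each item has to be unwrapped with a second loop.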