Testing the Gutenberg User Test

On 13th April 2017, I ran a user testing session at the Brisbane WordPress meetup to test out the Gutenberg editor. In the process, I also took the opportunity to reflect on the user test itself. Here are my notes on testing the test.

About the User Test

The user test, created during the WordCamp London contributor day (a big shout out to @martinlugton, @j-falk, @lucijanblagonic, and @karmatosed for the hard work), consisted of 18 tasks. In my session, the set of tasks took around 21 minutes to complete on average.

The test was set up as an observational user test. The script gave the participants a task on each page and asked them to rate the experience on a scale of 1 (bad) to 5 (good). There was also an area where comments could be typed in. A sample question is shown below.

Image of the Gutenberg Test Script

In my session, I ended up with 3.5 hours of video footage, which took about 20 hours to process.


Wording of tasks

The tasks were very clear and unambiguous, perhaps with the exception of Task 2: "You want to change the heading to a subheading." Some participants got hung up trying to make a heading and then insert a sub-heading below it, but overall this was a really minor issue.

Another minor suggestion is to remove the leading phrase “You want to…” from each task description. This is purely to reduce the number of words in the instructions, so that it takes a little less time for participants to process what to do.

Ordering of tasks

After running a few participants through the test, it became clear that the first few tasks were the hardest. I'd suggest re-ordering the tasks to give the participants a few easy wins before throwing them into more challenging waters.

Suggested task order: Task 1, Tasks 5-9, remaining tasks in order.

Processing video footage

The test was quite long; however, I don't think this was an issue in administering the user tests, only in the video post-processing phase.

To make the video processing a little easier, it would help to give each question a different colour. This would make it easier to find the right video segment when looking at each participant's reel:

Image of participant reel

Survey questions

While it is always tempting to include survey-style questions to get immediate feedback and nice, easy-to-graph results, I am not a big fan of including this type of question in observational user tests.

The reason for this is that there is often a big disconnect between how a user performs a task, and the rating that they assign to the task.

Looking at this particular testing session, here are some interesting discrepancies:

  • Task 2. One participant failed to complete the task and gave it a rating of 2. Another completed the task successfully, but also rated it a 2 as they got a little confused at the start.
  • Task 3. One participant rated a 5, but was evidently confused about the arrow interactions and the movement of blocks. Another participant rated a 3, while offering all kinds of excuses for why they failed to complete the task.
  • Task 4. A participant rated a 4 while articulating why they did not like the way the feature worked.
  • Task 11. A participant rated a 2 even though they completed the task successfully.
  • Task 14. A participant struggled, did not complete the task, and rated it a 4.
  • Task 15. One participant rated a 5 even though they did not perform the task correctly. Another participant rated a 2, even though they completed the task, because they found it fiddly.
  • Task 16. Two separate participants rated a 4 even though they failed to complete the task.

Participants really hate getting things wrong, or disappointing the tester, so they will tend to rate things higher. Although it is more time-consuming, I personally find that you get a lot more insight from observing, and really listening:

“I would like to see the [heading] options go to H6 to be totally honest… in my experience, I generally probably only use H1 to H2… H3… very rarely will you go down to H6.”

It’s not unusual for participants to claim that they must have a feature, or can’t live without something, only to tell you a moment later that actually, they never use it!


Congratulations to the WordPress folks who put this test together. While I have some nerdy, fine-tuning suggestions, it is a really well-thought-out test and made for a great and enjoyable testing session.

Happy Testing, everyone! If you are new to user testing, or just getting started, check out my tips here.