
How to analyze open-ended questions in 5 steps [template included]

User research

Last updated

9 Sep 2021

Open-ended questions are great for getting authentic feedback because they give people a chance to describe what they’re experiencing in their own voice. Analyzing such survey questions yourself is an excellent opportunity to empathize with your audience, gather essential insights, and make the right decisions.

But you may be wondering...

How do you efficiently analyze more than 100 replies? Or even 1,000?

Here’s a system we use at Hotjar to categorize and visually represent large volumes of qualitative data—and it’s easier than you might think! You’ll have to work with the technique a bit before you become comfortable with it, but once you get it, you’ll be sorting through mountains of qualitative data in no time.

What you’ll need:

Hotjar's open-ended question analysis template

To help you learn this technique, we created a data sample that you can download and use to follow along.

Now let’s begin…

Step 1: get your data into the template

1) Export the data from your survey or poll into a .CSV or .XLS file.

Example of survey export from Hotjar as .CSV

2) Copy the data from your .CSV or .XLS file and paste it into the template's ‘CSV Export’ sheet.

Copy the data you want to analyze from your .CSV or .XLS file

🏆  Pro tip: use 'Paste special' to paste 'Values Only' in the Hotjar analysis template, so no formulas or formatting are copied over.

This is what your data should look like after being pasted into the ‘CSV Export’ sheet

3) Copy the column from the ‘CSV Export’ sheet containing the open-ended question you want to analyze first and paste it into the ‘Question 1’ sheet, in the cell marked with < Paste answers to first open-ended question here >.

Your open-ended answers should look like the above after being pasted into the ‘Question 1’ sheet

4) Apply text wrapping to the entire column so the data fits the column width and is easier to read later on.

The 'Wrap' option is available from the main menu in Google Sheets
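If you'd like to sanity-check an export before pasting it, the same "pull out one open-ended column" step can be sketched in Python. This is a minimal sketch, assuming a UTF-8 CSV with a single header row; the column name and answers are made up for illustration and are not Hotjar's actual export format.

```python
import csv
import io


def load_answers(csv_text: str, question: str) -> list[str]:
    """Return the non-empty answers to one open-ended question.

    csv_text is the full text of a survey export (header row first);
    question is the header of the column to analyze.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    answers = []
    for row in reader:
        # A short row yields None for missing cells; an empty cell yields "".
        value = (row.get(question) or "").strip()
        if value:  # skip respondents who left the question blank
            answers.append(value)
    return answers


# Hypothetical export: column names and answers are illustrative only.
export = """Date,How is your performance measured?
2021-09-01,"Revenue, then conversion rate, then traffic."
2021-09-01,Huh?
2021-09-02,
"""
answers = load_answers(export, "How is your performance measured?")
print(answers)
```

To read a file from disk instead, pass it the result of `open(path, newline="", encoding="utf-8").read()`.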

Step 2: identify response categories

A response category is a set of replies that can be grouped because they are part of the same theme, even if they’re worded differently.

In the sample dataset we use for this tutorial, we asked Hotjar customers to explain how their employer measures their performance (e.g., revenue, conversions, traffic). In theory, you could go through every answer to identify your response categories one by one, but that wouldn’t be very efficient. Instead, we’re going to use a series of techniques that help you identify the broad categories.

A) Use a text analyzer: text analyzers take your data and analyze it for the most commonly used words in your text, which helps you identify broad categories of responses.

🏆 Pro tip: Textalyser is a simple, free resource that does this well.

Copy and paste your data into textalyser and click ‘analyze the text’

If you do this with the sample data we’ve provided above, you’ll find that ‘sales,’ ‘conversion,’ and ‘traffic’ are some of the most commonly used words in the data set:

‘sales,’ ‘conversion,’ and ‘traffic’ are some of the most commonly used words in the data set and could be used as response categories

As such, they represent some of the most popular replies to the question we asked. They don’t represent all the answers, of course, but they’re a good place to start when building the list of response categories.
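The core of what a text analyzer does here is a word-frequency count. The sketch below is a generic version of that idea, not a reimplementation of Textalyser (whose exact rules we don't assume); the short stop-word list and the answers are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; real tools use longer ones.
STOP_WORDS = {"the", "and", "then", "our", "by", "of", "a", "to", "is", "we", "on"}


def top_words(answers: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Count how often each word appears across all answers, most common first."""
    counts: Counter[str] = Counter()
    for answer in answers:
        # Lowercase first so 'Sales' and 'sales' are counted as the same word.
        words = re.findall(r"[a-z']+", answer.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical answers for illustration.
answers = [
    "Revenue, then conversion rate, then traffic.",
    "Sales and conversion",
    "Traffic to our site",
    "Sales",
]
print(top_words(answers, 3))
```

As with any word counter, the output is a list of candidate categories, not the categories themselves; you still decide which words are meaningful.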

Add each category to the top of a separate column (replacing the text that reads 'Response Category 01,' 'Response Category 02,' etc.):

Add each response category to the top of the sheet in row 2

Note: some of the popular words in our text analyzer mean the same thing (e.g., 'sales' and 'revenue'), so you’ll want to create a single category for those responses called 'Sales/Revenue.' Other popular words will NOT become categories because, as stand-alone words, they tell us nothing useful (e.g., 'our,' 'rate').
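Folding synonyms into one category and dropping uninformative words is essentially a lookup table. Here is a minimal sketch of that idea; the `SYNONYMS` mapping is hypothetical and would be filled in by hand from your own word list.

```python
# Hypothetical synonym map: each keyword points to the one category it belongs
# to, so 'sales' and 'revenue' both land in 'Sales/Revenue'.
SYNONYMS = {
    "sales": "Sales/Revenue",
    "revenue": "Sales/Revenue",
    "conversion": "Conversions",
    "conversions": "Conversions",
    "traffic": "Traffic",
}


def categories_from_keywords(keywords: list[str]) -> list[str]:
    """Collapse keywords into a de-duplicated list of category names.

    Keywords with no entry in SYNONYMS (e.g. 'our', 'rate') are dropped,
    because on their own they say nothing useful.
    """
    categories: list[str] = []
    for word in keywords:
        category = SYNONYMS.get(word)
        if category and category not in categories:
            categories.append(category)
    return categories


print(categories_from_keywords(["sales", "conversion", "revenue", "our", "traffic", "rate"]))
# ['Sales/Revenue', 'Conversions', 'Traffic']
```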

B) Sort your responses alphabetically: when you sort alphabetically, you’ll notice that specific patterns emerge, and you can create more categories based on the trends you spot.

In Google Sheets, select the range, right-click, and sort the range alphabetically

In our sample data, every sentence beginning with the word 'Revenue' gets grouped when you sort alphabetically. Of course, we already have a category for 'Sales/Revenue,' so there’s no need to add that category in this case—but grouping the data alphabetically will allow groups to stand out.

Sorting responses alphabetically helps you to uncover themes easily
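Sorting so that answers starting with the same word sit together can be approximated in Python as below. This is a rough stand-in, assuming case-insensitive ordering; a spreadsheet's own sort order may differ slightly for punctuation and accented characters. The answers are hypothetical.

```python
def sort_answers(answers: list[str]) -> list[str]:
    """Sort answers A to Z, ignoring case and leading spaces or quote marks,
    so answers that start with the same word end up next to each other."""
    return sorted(answers, key=lambda a: a.strip().strip("'\"").casefold())


# Hypothetical answers for illustration.
answers = [
    "Traffic",
    "revenue growth",
    "Huh?",
    "Revenue, then conversion rate, then traffic.",
]
for answer in sort_answers(answers):
    print(answer)
```

Both 'Revenue…' answers end up adjacent even though one is lowercase, which is exactly the grouping effect described above.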

Alphabetical sorting will also draw your attention to certain stand-alone responses. For example, someone replied 'Huh?' and another person told us they didn’t understand the question. This information allows us to add a new category called 'Didn’t understand the question.'

Alphabetical sorting will also draw your attention to certain stand-alone responses

Scan the alphabetically sorted responses for other categories, such as 'It’s not measured,' 'Traffic,' 'Conversions,' etc. Be on the lookout for synonyms, but don’t worry if you create a few redundant categories for now. You will combine the categories that mean the same thing at the end.

Step 3: record the individual responses

1) Place a '1' in each cell where a response (the row) matches a category (the column) to identify a positive response in each category. Add categories as you go.

For example, if you sorted our sample data alphabetically, you’ll find that the response in Row 36 reads 'Huh?' If you added 'Didn’t understand the question' to Column E (as we did in the screenshot), you’ll place a '1' in E36.

A '1' is placed in Column E, Row 36

Note: in our example, many respondents indicated that their performance is measured by multiple factors (e.g., lead gen + sales + customer satisfaction). Be sure to place a '1' in each matching category. For instance, the row for the single answer 'Revenue, then conversion rate, then traffic.' will record three different positive responses.

Some answers might fall into multiple response categories
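The grid of 1s is a simple multi-label table: one row per answer, one column per category. A minimal Python sketch of building it, assuming you've already decided by hand which categories each answer gets (the `assignments` dict below is hypothetical):

```python
def code_responses(
    answers: list[str],
    assignments: dict[int, set[str]],
    categories: list[str],
) -> list[list[int]]:
    """Build the grid of 1s: one row per answer, one column per category.

    assignments maps an answer's row index to the categories a person chose
    for it. An answer can have several categories, or none yet (all 0s).
    """
    grid = []
    for i in range(len(answers)):
        chosen = assignments.get(i, set())
        unknown = chosen - set(categories)
        if unknown:  # catch typos in category names early
            raise ValueError(f"row {i}: unknown categories {sorted(unknown)}")
        grid.append([1 if c in chosen else 0 for c in categories])
    return grid


categories = ["Sales/Revenue", "Conversions", "Traffic", "Didn't understand the question"]
answers = ["Huh?", "Revenue, then conversion rate, then traffic."]
grid = code_responses(
    answers,
    {0: {"Didn't understand the question"}, 1: {"Sales/Revenue", "Conversions", "Traffic"}},
    categories,
)
print(grid)  # [[0, 0, 0, 1], [1, 1, 1, 0]]
```

The second answer's row holds three 1s, matching the three positive responses described in the note above.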

When you input your first '1,' the cell in Row 3 (below the category) will change to indicate the number of positive responses in that category. Row 4 will change from a '#DIV/0!' error to the percentage of responses that fall into each category.
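Rows 3 and 4 boil down to a column sum and a ratio. We don't reproduce the template's exact formula here; this sketch ASSUMES the denominator is the number of answers with at least one '1', which would explain the #DIV/0! error before anything is coded. Because one answer can fall into several categories, the percentages can add up to more than 100%. The grid is hypothetical.

```python
from typing import Optional


def summarize(
    grid: list[list[int]], categories: list[str]
) -> list[tuple[str, int, Optional[float]]]:
    """Return (category, count, share) for each column of the grid.

    count mirrors Row 3. share mirrors Row 4 under the ASSUMPTION that the
    denominator is the number of coded answers (rows with at least one 1);
    with nothing coded yet, share is None instead of a #DIV/0! error.
    """
    coded = sum(1 for row in grid if any(row))
    summary = []
    for j, category in enumerate(categories):
        count = sum(row[j] for row in grid)
        share = count / coded if coded else None
        summary.append((category, count, share))
    return summary


categories = ["Sales/Revenue", "Traffic", "Didn't understand the question"]
grid = [
    [1, 1, 0],  # a hypothetical answer that counts in two columns
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],  # not coded yet, so it is left out of the denominator
]
for category, count, share in summarize(grid, categories):
    print(f"{category}: {count} ({share:.0%})")
```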

2) Use the 'Find' feature to search for words related to each category: begin with the first category (in our example, that’s 'sales') and search the data column for any response that mentions 'sales.' Read the entire response to ensure it fits the category you searched for, then place a '1' in the appropriate column for that response.

Searching for the term 'sales' leads to finding 11 responses
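Searching for a category term works like a case-insensitive text search. The sketch below mimics a spreadsheet's default Find (substring, ignoring case) and adds an optional whole-word mode as our own variation; the answers are hypothetical. It only lists candidates, since each hit still needs to be read before it gets a '1'.

```python
import re


def find_matches(
    answers: list[str], term: str, whole_word: bool = False
) -> list[tuple[int, str]]:
    """Return (row_index, answer) for each answer that mentions term.

    Matching ignores case. With whole_word=True, 'sales' no longer matches
    inside a longer word such as 'presales'.
    """
    word = re.escape(term)  # treat the term literally, not as a regex
    pattern = re.compile(rf"\b{word}\b" if whole_word else word, re.IGNORECASE)
    return [(i, a) for i, a in enumerate(answers) if pattern.search(a)]


# Hypothetical answers for illustration.
answers = ["Sales targets", "Revenue", "Presales calls booked", "Number of SALES"]
print(find_matches(answers, "sales"))
print(find_matches(answers, "sales", whole_word=True))
```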

3) Fill in the gaps: read each row that hasn’t been categorized and place a '1' under the appropriate category, creating new categories as necessary. As you create new categories, search your data for those terms to quickly find similar responses.

⚠️ Important: when adding a new category as you go through the responses, make sure to retroactively check previous answers that might fit in this new category.

As you create new categories and fill in the gaps, some interesting trends will start to appear
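Filling the gaps means finding rows with no '1' yet, and each new category starts as an empty column that earlier rows must be re-checked against. A minimal sketch of both operations, using the same hypothetical grid layout (one row per answer, one column per category):

```python
def uncategorized(grid: list[list[int]]) -> list[int]:
    """Row indexes of answers that have no '1' in any category yet."""
    return [i for i, row in enumerate(grid) if not any(row)]


def add_category(grid: list[list[int]], categories: list[str], name: str) -> int:
    """Append an empty (all-0) column for a new category, in place.

    Returns the new column's index. Earlier answers start at 0 here, so they
    still have to be re-checked for the new category.
    """
    if name in categories:
        raise ValueError(f"category already exists: {name}")
    categories.append(name)
    for row in grid:
        row.append(0)
    return len(categories) - 1


categories = ["Sales/Revenue", "Traffic"]
grid = [[1, 0], [0, 0], [0, 1]]
print(uncategorized(grid))  # [1]
col = add_category(grid, categories, "It's not measured")
grid[1][col] = 1  # after reading row 1, it fits the new category
print(uncategorized(grid))  # []
```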

Step 4: organize your categories

1) Group your data: you will almost certainly find responses that belong together but ended up in different categories because respondents used different words to describe the same concept. In our sample data, we found the terms 'Lead Gen' and 'Form Submissions,' which belong in the same category.

Drag these columns next to each other, and apply a color (any color) to the group of columns you plan to merge—this marks them as a group so you can return to them in a bit when it’s time to combine them. Repeat this step for each set of categories you plan to join.

Column K is dragged next to Column H because both response categories are related

Add a new column to the left-hand side of each group. For example, with 'Lead Gen' and 'Form Submissions,' you’ll create a new category called 'Lead Gen / Form Submissions,' add up the Row 3 totals for the two old categories, and enter the new total under the new group. Copy and paste the percentage formula from any Row 4 cell, then delete the old categories.

A new category called 'Lead Gen / Form Submissions' is added

⚠️ Important: when merging multiple categories, make sure to re-add the '1s' under the newly merged category, or you run the risk of losing your data.

Repeat this step for every group you plan to merge.
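Re-adding the 1s under a merged category amounts to a logical OR across the old columns: an answer gets a 1 if it had a 1 in any of them. One consequence worth knowing: adding up the old Row 3 totals matches the merged total only when no single answer was tagged with both old categories. A minimal sketch, with hypothetical data:

```python
def merge_categories(
    grid: list[list[int]],
    categories: list[str],
    to_merge: list[str],
    new_name: str,
) -> tuple[list[list[int]], list[str]]:
    """Replace several category columns with one merged column.

    An answer gets a 1 in the merged column if it had a 1 in ANY of the old
    columns, so an answer tagged with both is counted once. The merged column
    takes the place of the left-most old column. Inputs are not modified.
    """
    cols = {categories.index(c) for c in to_merge}  # ValueError if a name is missing
    first = min(cols)
    new_categories = [
        new_name if j == first else c
        for j, c in enumerate(categories)
        if j == first or j not in cols
    ]
    new_grid = []
    for row in grid:
        merged = 1 if any(row[j] for j in cols) else 0
        new_grid.append(
            [merged if j == first else v for j, v in enumerate(row) if j == first or j not in cols]
        )
    return new_grid, new_categories


categories = ["Lead Gen", "Sales/Revenue", "Form Submissions"]
grid = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]  # row 0 mentions both lead gen and forms
new_grid, new_categories = merge_categories(
    grid, categories, ["Lead Gen", "Form Submissions"], "Lead Gen / Form Submissions"
)
print(new_categories)
print(new_grid)  # merged column total is 2, not 1 + 2 = 3
```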

2) Arrange your categories from large to small: arrange your categories in descending order from left to right. For those that only contribute to a small percentage of the total (2% or less), use the grouping method above to merge them into one category called 'Others,' which you’ll leave on the far right.

A new category called 'Others' is added to merge miscellaneous categories together
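Ordering categories from large to small and folding the tiny ones into 'Others' can be sketched as below. We ASSUME the 2% cut-off is measured against the same denominator as the percentage row (coded answers), and we count 'Others' per answer so nobody is counted twice. The categories and counts are hypothetical.

```python
def order_and_fold(
    grid: list[list[int]],
    categories: list[str],
    threshold: float = 0.02,
    other_name: str = "Others",
) -> list[tuple[str, int]]:
    """Return (category, count) pairs, largest first, with small categories
    folded into one 'Others' entry at the far right.

    A category is small when count / coded answers <= threshold
    (ASSUMPTION: same denominator as the percentage row).
    """
    coded = sum(1 for row in grid if any(row))
    counts = [sum(row[j] for row in grid) for j in range(len(categories))]
    small = {j for j, n in enumerate(counts) if coded and n / coded <= threshold}
    big = sorted(
        (j for j in range(len(categories)) if j not in small),
        key=lambda j: counts[j],
        reverse=True,
    )
    result = [(categories[j], counts[j]) for j in big]
    if small:
        # An answer that hit several small categories still counts once.
        others = sum(1 for row in grid if any(row[j] for j in small))
        result.append((other_name, others))
    return result


# Hypothetical: 100 coded answers, one category each.
categories = ["Traffic", "Sales/Revenue", "Customer satisfaction", "It's not measured"]
grid = (
    [[1, 0, 0, 0] for _ in range(37)]
    + [[0, 1, 0, 0] for _ in range(60)]
    + [[0, 0, 1, 0] for _ in range(2)]
    + [[0, 0, 0, 1] for _ in range(1)]
)
print(order_and_fold(grid, categories))
```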

Step 5: represent your data visually

1) Prep your data to create a bar chart. First, select and copy the top three rows of your spreadsheet (those that make up the 'Response Categories,' 'Total respondents who answered X,' and '% respondents who answered X').

Select and copy the top three rows of your spreadsheet

Paste them into the ‘Graph Question 1’ sheet using the 'Paste special' feature to paste only the values (so the formulas don’t copy over).

Paste your selection as values in ‘Graph Question 1,’ cell A3

Select and copy the table you just pasted, and choose 'Paste special' again—this time using 'Paste transposed' to invert the rows and columns (this makes your data more chart-friendly).

Select and copy the table you just pasted, and choose 'Paste special' again, this time using 'Paste transposed' in cell A9

This is what you should see:

Your table containing categories, the volume of responses, and percentages should look like the above
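'Paste transposed' simply swaps rows and columns. For anyone preparing chart data outside the spreadsheet, here is a minimal Python equivalent; the header labels follow the template's row names, while the counts and percentages are hypothetical.

```python
def transpose(rows: list[list]) -> list[list]:
    """Swap rows and columns, like Paste special > Paste transposed."""
    if len({len(r) for r in rows}) > 1:
        # zip() would silently drop cells from the longer rows
        raise ValueError("all rows must have the same length")
    return [list(col) for col in zip(*rows)]


# Hypothetical copy of the top three rows (header cell + one cell per category).
rows = [
    ["Response Categories", "Sales/Revenue", "Traffic"],
    ["Total respondents who answered X", 2, 1],
    ["% respondents who answered X", 0.67, 0.33],
]
for line in transpose(rows):
    print(line)
```

After transposing, each category sits on its own row, which is the one-series-per-column layout most chart tools expect.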

2) Create your chart: insert your chart, selecting the percentage column as your 'Series' and the categories as your 'X-axis.' Resize the chart however you see fit.

Your open-ended answers are now visualized in a graph

And there you have it—a visual representation of your data! Feel free to experiment with different formats if you’re putting the chart into a formal presentation.

Analyzing open-ended questions efficiently and empathizing with your audience take some practice, but the more you do it, the easier it becomes. Your mind will begin to recognize patterns the more you practice this technique, so don’t be afraid to dive into it.

Hotjar's open-ended question analysis template

Want to efficiently analyze a large volume of qualitative data? Download our Google Sheets/Excel template to get started.