Run a Word Frequency query
You can use Word Frequency queries to list the most frequently occurring words in your sources.
In this topic
- Understand Word Frequency queries
- Create a Word Frequency query
- Understand the results
- What words are counted in a Word Frequency query?
- Exclude particular words when running Word Frequency queries
- Run a Text Search query for a word shown in the query results
Understand Word Frequency queries
Use Word Frequency queries to list the most frequently occurring words in your sources. You choose the content to search by selecting sources, nodes, sets or folders.
You could use a Word Frequency query to:
- Identify possible themes, particularly in the early stages of a project.
- Analyze the most frequently used words in a particular demographic. For example, analyze the most common words used by farmers when discussing climate change. You could run a Coding query to gather all content coded at climate change and at 'case' nodes with the attribute farmer, then select the result node as the criteria for the Word Frequency query.
You can look for exact words or include words with the same stem. For example, if you look for the most frequent words in a set of interviews, you might find that water, health, and harmful are the most frequently occurring words. However, if you include words with the same stem, you might find that pollution (including pollutants, pollution, polluted, and pollutes) occurs most frequently.
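The difference between exact and stemmed matching can be sketched in a few lines of Python. This is only an illustration, not how NVivo implements stemming: `crude_stem` and its suffix list are hypothetical and tuned to the pollution example above, and the word list is made-up sample data.

```python
from collections import Counter, defaultdict

# Hypothetical suffix list for illustration; a real stemmer is far more careful.
SUFFIXES = ("ants", "ion", "ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def exact_counts(words):
    """Count each distinct word separately (exact match only)."""
    return Counter(words)


def stemmed_counts(words):
    """Group words by stem and return {representative: (total, members)}.

    The representative is the most frequent member of the group,
    with ties broken alphabetically.
    """
    exact = Counter(words)
    groups = defaultdict(list)
    for word in exact:
        groups[crude_stem(word)].append(word)
    result = {}
    for members in groups.values():
        representative = max(sorted(members), key=lambda w: exact[w])
        total = sum(exact[w] for w in members)
        result[representative] = (total, sorted(members))
    return result


# Made-up sample data.
words = (["water"] * 3 + ["health"] * 2 + ["harmful"] * 2
         + ["pollution"] * 2 + ["pollutants", "polluted", "pollutes"])
print(exact_counts(words).most_common(3))
print(stemmed_counts(words))
```

With exact matching, water leads this sample. Once the four pollution variants are grouped under one stem, their combined count is higher.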
Before you run a Word Frequency query, make sure the text content language is set to the language of your source materials—refer to Set the text content language and stop words for more information.
Create a Word Frequency query
- On the Query tab, in the Create group, click Word Frequency.
- Choose where you want to search for matching text:
  - All sources: search for content in all the sources in your project, including externals and memos
  - Selected Items: restrict your search to selected items (for example, a set containing interview transcripts)
  - Items in Selected Folders: restrict your search to content in selected folders (for example, a folder of interview transcripts)
- (Optional) For Finding matches, select Include stemmed words if you want to include words with the same stem (for example, look for 'talk' and also find 'talking'). By default, Exact match only is selected.
- (Optional) You can choose to display:
  - All to include all words found in the selected project items.
  - <number> most frequent to include a specific number of words. For example, you could display the 100 most frequently occurring words.
- (Optional) In With minimum length, enter a value to exclude short words from the results. For example, enter 7 to display only words with seven or more letters.
- Click the Run Query button at the top of Detail View.
When the query has finished running, the results are displayed in Detail View.
NOTE
- Refer to Selecting project items for information about how to select the sources, nodes or other project items that you want to search in.
- In this release, you cannot find matches for words with similar meanings (synonyms, specializations and generalizations). If you are working with a project that was created on NVivo 10 for Windows, you cannot run queries which find matches for synonyms, specializations and generalizations.
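To see how the display and minimum-length options shape the results, here is a minimal Python sketch. The function name `word_frequency` and its parameters `top_n` and `min_length` are hypothetical stand-ins for the query options described above, not part of NVivo.

```python
from collections import Counter


def word_frequency(words, top_n=None, min_length=1):
    """Count words of at least min_length letters.

    top_n=None mirrors the "All" display option; an integer mirrors
    "<number> most frequent".
    """
    counts = Counter(w for w in words if len(w) >= min_length)
    return counts.most_common(top_n)


# Made-up sample text, already split into words.
words = "the climate is changing and farmers say the climate affects farming".split()
print(word_frequency(words, top_n=3, min_length=7))
```

Setting `min_length=7` here drops short words such as "the" and "say" before anything is counted, so they cannot take up places in the top results.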
Understand the results
When you run a Word Frequency query, the results are displayed in Detail View. You can view the results as a list on the Summary pane or as a visualization on the Word Cloud pane.
Summary pane
1 The query criteria remain visible at the top of Detail View—if you want more space to view the results of the query, click the disclosure triangle to hide the criteria.
2 The most frequently occurring words, excluding any stop words. If you chose to include stemmed words, the most frequently occurring word from the group is displayed in this column.
3 Length: the number of letters or characters in the word.
4 Count: the number of times that the word occurs within the project items searched. If you chose to include stemmed words, this count is the total for all the words with the same stem.
5 Weighted Percentage: the frequency of the word relative to the total words counted. The weighted percentage assigns a portion of the word's frequency to each group so that the overall total does not exceed 100%.
6 Similar Words: other words that have been included as a result of selecting to include stemmed words. For example, pollutants, pollution, and polluted would be grouped together. This column is not available if you use Exact match only.
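The exact formula behind the weighted percentage is not documented here, so the following Python sketch rests on a stated assumption: when a word belongs to more than one group, its count is split equally among those groups. The function name and sample data are hypothetical. When no word is shared between groups, the result reduces to the group's count divided by the total words counted.

```python
def weighted_percentages(counts, groups):
    """Return {group: percentage} for word groups.

    counts: {word: number of occurrences}
    groups: {group name: list of member words}
    Assumption: a word shared by several groups contributes an equal
    share of its count to each, so the percentages never sum above 100.
    """
    total = sum(counts.values())
    # How many groups each word belongs to.
    membership = {}
    for members in groups.values():
        for word in members:
            membership[word] = membership.get(word, 0) + 1
    result = {}
    for name, members in groups.items():
        share = sum(counts[word] / membership[word] for word in members)
        result[name] = 100.0 * share / total
    return result


# Made-up data: "talk" is shared by both groups, so its 6 occurrences are split 3 and 3.
counts = {"talk": 6, "talking": 2, "speak": 2}
groups = {"talk": ["talk", "talking"], "speak": ["speak", "talk"]}
print(weighted_percentages(counts, groups))
```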
Word Cloud pane
1 The query criteria remain visible at the top of Detail View—if you want more space to view the results of the query, click the disclosure triangle to hide the criteria.
2 The word cloud visualization displays up to 100 words in varying font sizes, where frequently occurring words are in larger fonts.
3 Click here to choose from a gallery of styles.
NOTE You can export the word cloud as an image file which you can include in reports and presentations—refer to Export query results (Export a query visualization as an image file) for more information.
What words are counted in a Word Frequency query?
When determining the frequency of words, NVivo applies the following rules:
- Words containing punctuation (such as hyphens, periods and other symbols) are divided into separate words. For example, part-time will be counted as part and time.
- Words containing apostrophes (such as o'clock and d'accord) are treated as one word, but if the apostrophe is followed by an s, the 's is not included (for example, Tom's is counted as Tom).
- In audio and video transcripts, only words in the Transcript field (column) are counted.
- In datasets, only words in codable fields (columns) are counted; any words in classifying fields are ignored.
- When searching text in selected nodes, if a word is coded against multiple nodes, it is counted once for each node. Similarly, if a word has been coded by multiple users to the same node, it is counted once for each user.
- Word Frequency queries do not include 'stop words'. Refer to Exclude particular words when running Word Frequency queries for more information.
- Word Frequency queries do not search text within images. PDFs created by scanning paper documents may contain only images, with each page a single image. If you want to use Word Frequency queries to explore the text in these PDFs, consider using optical character recognition (OCR) to convert the scanned images to text before you import the PDF files into NVivo.
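The punctuation, apostrophe and per-node rules above can be approximated in Python as follows. This is a rough sketch rather than NVivo's tokenizer: the regular expression, the function names, and the choice to leave letter case untouched are all assumptions made for illustration.

```python
import re
from collections import Counter

# A word is a run of letters or digits, optionally joined by apostrophes.
# Any other punctuation (hyphens, periods, symbols) ends the word.
TOKEN = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def tokenize(text):
    """Split text into words following the rules listed above."""
    text = text.replace("\u2019", "'")  # treat curly apostrophes like straight ones
    tokens = []
    for match in TOKEN.finditer(text):
        word = match.group()
        if word.lower().endswith("'s"):
            word = word[:-2]  # Tom's is counted as Tom
        tokens.append(word)
    return tokens


def count_across_nodes(node_texts):
    """Sum counts per node, so text coded at two nodes is counted twice."""
    total = Counter()
    for text in node_texts.values():
        total.update(tokenize(text))
    return total


print(tokenize("Tom's part-time shift ends at five o'clock."))
print(count_across_nodes({"Node A": "clean water", "Node B": "clean water supply"}))
```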
Exclude particular words when running Word Frequency queries
Word Frequency queries do not include 'stop words'. By default, these are less significant words, such as conjunctions or prepositions, that may not be meaningful to your analysis. You can view and edit the list of stop words; refer to Set the text content language and stop words for more information.
You can add a word displayed in your query results to the stop words list—select the word you want to exclude from the query results, then click Add to Stop Words List, in the Actions group on the Query tab. The words you add to the stop word list will be excluded the next time you run a Word Frequency or Text Search query.
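The effect of adding a word to the stop words list can be sketched in Python. The function name and the sample stop list are hypothetical, and matching stop words without regard to case is an assumption of this sketch.

```python
from collections import Counter


def count_without_stop_words(words, stop_words):
    """Count words, skipping any that appear in the stop words list."""
    stops = {w.lower() for w in stop_words}  # assumption: case-insensitive match
    return Counter(w for w in words if w.lower() not in stops)


# Made-up sample data and stop list.
words = "the farmers said the drought and the heat said it all".split()
stop_words = {"the", "and"}

print(count_without_stop_words(words, stop_words).most_common(1))

# After adding a word to the list, the next run no longer reports it.
stop_words.add("said")
print(count_without_stop_words(words, stop_words).most_common(1))
```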
Run a Text Search query for a word shown in the query results
You can run a Text Search query for a selected word in the Word Frequency query results.
- On the Query tab, in the Actions group, click Other Actions, and then click Run Text Search Query.
- (Optional) Change the Text Search Criteria. Refer to Run a Text Search query for more information.
- Click Run Query.
NOTE You can also double-click a word in the Word Cloud to run a Text Search query.