Top 3 Most Frequent Words - FreeCodeCamp #35 Daily Challenge
The Challenge
The goal is to write a function that takes a string (a paragraph) and returns an array of the top 3 most frequent words, ordered from highest to lowest frequency.
Requirements
- Words are case-insensitive (e.g., “The” and “the” are the same).
- Punctuation like periods (.), commas (,), and exclamation marks (!) should be ignored.
- If there is a tie in frequency, words should be sorted alphabetically.
- If the paragraph has fewer than 3 unique words, return all of them.
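To make the tie-breaking rule concrete, here is a tiny sketch (the word counts are hypothetical, not from the challenge statement):

```javascript
// Hypothetical frequency map: "bee" and "ant" tie at 2 occurrences.
const freq = { the: 3, bee: 2, ant: 2 }

const ranked = Object.keys(freq)
  .sort((a, b) => freq[b] - freq[a] || a.localeCompare(b))
// → ['the', 'ant', 'bee']: 'the' wins on count; 'ant' beats 'bee' alphabetically
```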
Initial Analysis
To solve this problem, we need a pipeline that transforms raw text into a sorted list of frequencies.
Logic Pipeline
```mermaid
flowchart TD
    A["Input: Paragraph"] --> B["Normalization: Lowercase & Remove Punctuation"]
    B --> C["Tokenization: Split into words"]
    C --> D["Frequency Count: Hash Map (word -> count)"]
    D --> E["Sorting: By frequency (desc) and alphabet (asc)"]
    E --> F["Output: Top 3 words"]
```
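The stages above can be sketched as standalone steps before wiring them into the final function (the sample sentence is illustrative):

```javascript
// Each pipeline stage as a small, composable function.
const normalize = text => text.toLowerCase().replace(/[.,!]/g, '')
const tokenize = text => text.split(/\s+/).filter(Boolean)
const count = words =>
  words.reduce((map, w) => map.set(w, (map.get(w) || 0) + 1), new Map())

const freq = count(tokenize(normalize('The cat saw the dog. The dog ran!')))
// freq: Map { 'the' => 3, 'cat' => 1, 'saw' => 1, 'dog' => 2, 'ran' => 1 }
```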
Implementation
We will use a Hash Map (a plain JavaScript object) to store frequencies and the sort() method with a custom comparator for the final ranking.
```javascript
/**
 * Finds the top 3 most frequent words in a paragraph.
 * @param {string} paragraph - The text to analyze.
 * @returns {string[]} Array with the top 3 words.
 */
function getWords(paragraph) {
  if (!paragraph || typeof paragraph !== 'string') {
    return []
  }

  // 1. Normalize to lowercase and remove specific punctuation (.,!)
  const words = paragraph
    .toLowerCase()
    .replace(/[.,!]/g, '')
    .split(/\s+/)
    .filter(word => word.length > 0)

  // 2. Count frequencies using an object as a Hash Map
  const frequencyMap = {}
  for (const word of words) {
    frequencyMap[word] = (frequencyMap[word] || 0) + 1
  }

  // 3. Sort by descending frequency
  // In case of a tie, use alphabetical order for consistency
  return Object.keys(frequencyMap)
    .sort((a, b) => {
      const freqDiff = frequencyMap[b] - frequencyMap[a]
      if (freqDiff !== 0) {
        return freqDiff
      }
      return a.localeCompare(b)
    })
    .slice(0, 3)
}
```
Complexity Analysis
Temporal:
- String Processing: O(n), where n is the length of the paragraph. `toLowerCase`, `replace`, and `split` traverse the string linearly.
- Frequency Counting: O(w), where w is the total number of words.
- Sorting: O(u log u), where u is the number of unique words.
- Total: O(n + u log u). In practice, u is usually much smaller than n.
Spatial:
- Frequency Map: O(u), where u is the number of unique words.
- Word Array: O(w), to store the extracted tokens.
- Total: O(w), proportional to the number of words in the input.
Edge Cases
✅ Handled Correctly
| Case | Result | Note |
|---|---|---|
| Empty paragraph | [] | Safe return |
| Fewer than 3 words | ["word1", "word2"] | slice handles boundaries |
| Multiple spaces | ["a", "b"] | Regex \s+ avoids empty tokens |
| Only punctuation | [] | Length filter cleans the result |
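These cases can be spot-checked directly. The sketch below restates the function compactly (renamed `topWords` only to keep the snippet self-contained):

```javascript
// Compact restatement of getWords from the article, same logic.
function topWords(paragraph) {
  if (!paragraph || typeof paragraph !== 'string') return []
  const freq = {}
  for (const w of paragraph.toLowerCase().replace(/[.,!]/g, '').split(/\s+/)) {
    if (w) freq[w] = (freq[w] || 0) + 1
  }
  return Object.keys(freq)
    .sort((a, b) => freq[b] - freq[a] || a.localeCompare(b))
    .slice(0, 3)
}

// Edge cases from the table:
topWords('')          // → []
topWords('hi there')  // → ['hi', 'there']
topWords('a    b')    // → ['a', 'b']
topWords('... !!! ,') // → []
```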
Reflections
Key Concepts
- Data Normalization: Converting to a base format (lowercase, no punctuation) simplifies counting.
- Hash Maps: Ideal structure for frequency counting due to O(1) average access.
- Sorting Stability: Defining a secondary criterion (alphabetical) ensures deterministic results.
Possible Optimizations
If the paragraph were massive and we only needed the top k words, we could use a Min-Heap of size k to reduce the sorting step from O(u log u) to O(u log k), where u is the number of unique words.
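A minimal sketch of that idea, assuming a hand-rolled `MinHeap` (JavaScript has no built-in heap). The comparator preserves the tie rule by treating the alphabetically later word as "smaller", so it gets evicted first:

```javascript
// Sketch only: MinHeap is a hand-rolled helper, not a built-in.
class MinHeap {
  constructor(cmp) {
    this.items = []
    this.cmp = cmp // cmp(a, b) < 0 means a is "smaller" (evicted first)
  }
  get size() { return this.items.length }
  push(item) {
    const a = this.items
    a.push(item)
    let i = a.length - 1
    while (i > 0) {
      const p = (i - 1) >> 1
      if (this.cmp(a[p], a[i]) <= 0) break
      ;[a[p], a[i]] = [a[i], a[p]]
      i = p
    }
  }
  pop() {
    const a = this.items
    const top = a[0]
    const last = a.pop()
    if (a.length > 0) {
      a[0] = last
      let i = 0
      while (true) {
        const l = 2 * i + 1, r = 2 * i + 2
        let s = i
        if (l < a.length && this.cmp(a[l], a[s]) < 0) s = l
        if (r < a.length && this.cmp(a[r], a[s]) < 0) s = r
        if (s === i) break
        ;[a[s], a[i]] = [a[i], a[s]]
        i = s
      }
    }
    return top
  }
}

// Keep only the k best entries in the heap: O(u log k) instead of O(u log u).
function topK(frequencyMap, k) {
  // Lower count is worse; on equal counts, the alphabetically later word is worse.
  const heap = new MinHeap((a, b) => a[0] - b[0] || b[1].localeCompare(a[1]))
  for (const [word, count] of Object.entries(frequencyMap)) {
    heap.push([count, word])
    if (heap.size > k) heap.pop() // evict the worst candidate seen so far
  }
  const result = []
  while (heap.size > 0) result.push(heap.pop()[1])
  return result.reverse() // pops come out worst-first, so reverse
}

// topK({ the: 3, cat: 2, dog: 2, ran: 1, saw: 1 }, 3) → ['the', 'cat', 'dog']
```

For the fixed k = 3 of this challenge the heap buys little, but for large vocabularies with a small k it avoids sorting every unique word.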