
Mini Project: Text Statistics

Build a text analyzer that counts words and sentences and finds the most common words


What you will build

A text analyzer that reads an article file and reports word count, sentence count, and the three most common words.

Example output

Words: 26
Sentences: 5
Top 3 words: [('data', 5), ('python', 5), ('is', 2)]

Design

Function                Purpose
count_words(text)       Return the total word count
count_sentences(text)   Return the number of sentences (the count of "." characters)
top_words(text, n)      Return the n most frequent words as a sorted list of (word, count) tuples
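The first two helpers are each a one-liner. A minimal sketch:

```python
def count_words(text):
    # Whitespace-separated tokens count as words.
    return len(text.split())

def count_sentences(text):
    # Simple heuristic: every "." ends a sentence.
    return text.count(".")

sample = "Python is fun. It is also useful."
print(count_words(sample))      # 7
print(count_sentences(sample))  # 2
```

Note that count_sentences is only a heuristic: abbreviations like "e.g." would inflate the count, which is fine for this project.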

Note on top_words

Strip punctuation from each word before counting so "Python." and "Python" count as the same word. Sort results by count descending, then alphabetically to break ties.
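These two rules can be sketched as follows (the punctuation set ".,!?;:" matches the one used in the instructions):

```python
from collections import Counter

def top_words(text, n):
    words = text.lower().split()
    # Strip surrounding punctuation so "Python." and "python" count as one word.
    clean = [word.strip(".,!?;:") for word in words]
    counter = Counter(clean)
    # Sort by count descending, then alphabetically to break ties.
    sorted_items = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
    return sorted_items[:n]

print(top_words("Data is great. Python is great.", 2))  # [('great', 2), ('is', 2)]
```

Negating the count in the sort key (`-item[1]`) gives a descending sort on counts while the word itself (`item[0]`) still sorts ascending, which is exactly the tie-break rule above.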

Instructions

Build the text statistics analyzer.

  1. Add from collections import Counter at the top.
  2. Define a function named count_words that takes text. Inside, return len(text.split()).
  3. Define a function named count_sentences that takes text. Inside, return text.count(".").
  4. Define a function named top_words that takes text and n. Inside, create words by calling text.lower().split(). Create clean as a list comprehension that strips ".,!?;:" from each word in words. Create counter = Counter(clean). Create sorted_items by calling sorted(counter.items(), key=lambda item: (-item[1], item[0])). Return sorted_items[:n].
  5. Open article.txt in read mode and assign file.read() to text.
  6. Call print(f"Words: {count_words(text)}").
  7. Call print(f"Sentences: {count_sentences(text)}").
  8. Call print(f"Top 3 words: {top_words(text, 3)}").
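Assembled, the eight steps produce a script like the one below. The sample article.txt written at the top is only a stand-in so the sketch runs anywhere; in the project, the file already exists in the working directory.

```python
from collections import Counter

# Stand-in content so this sketch runs anywhere; in the project,
# article.txt is already provided.
with open("article.txt", "w") as file:
    file.write("Data is everywhere. Python makes data analysis simple. "
               "Data and Python work well together.")

def count_words(text):
    return len(text.split())

def count_sentences(text):
    return text.count(".")

def top_words(text, n):
    words = text.lower().split()
    clean = [word.strip(".,!?;:") for word in words]
    counter = Counter(clean)
    sorted_items = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
    return sorted_items[:n]

with open("article.txt") as file:
    text = file.read()

print(f"Words: {count_words(text)}")          # Words: 14
print(f"Sentences: {count_sentences(text)}")  # Sentences: 3
print(f"Top 3 words: {top_words(text, 3)}")   # [('data', 3), ('python', 2), ('analysis', 1)]
```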