
Mini Project: Text Statistics

Build a text analyzer that counts words and sentences and finds the most common words


What you will build

A text analyzer that reads an article file and reports word count, sentence count, and the three most common words.

Example output

Words: 26
Sentences: 5
Top 3 words: [('data', 5), ('python', 5), ('is', 2)]

Design

Function                Purpose
count_words(text)       Return the total word count
count_sentences(text)   Return the number of sentences (the count of "." characters)
top_words(text, n)      Return the n most frequent words as a sorted list of (word, count) tuples
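The first two helpers are each a one-liner. A minimal sketch:

```python
def count_words(text):
    # Whitespace-separated tokens count as words.
    return len(text.split())

def count_sentences(text):
    # Simple heuristic: every "." ends a sentence.
    return text.count(".")

sample = "Python is fun. It is also useful."
print(count_words(sample))      # 7
print(count_sentences(sample))  # 2
```

Note that count_sentences is only a heuristic: abbreviations like "e.g." would inflate the count, which is fine for this project.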

Note on top_words

Strip punctuation from each word before counting so "Python." and "Python" count as the same word. Sort results by count descending, then alphabetically to break ties.
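These two rules can be sketched as follows (the punctuation set ".,!?;:" matches the one used in the instructions):

```python
from collections import Counter

def top_words(text, n):
    words = text.lower().split()
    # Strip surrounding punctuation so "Python." and "python" count as one word.
    clean = [word.strip(".,!?;:") for word in words]
    counter = Counter(clean)
    # Sort by count descending, then alphabetically to break ties.
    sorted_items = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
    return sorted_items[:n]

print(top_words("Data is great. Python is great.", 2))  # [('great', 2), ('is', 2)]
```

Negating the count in the sort key (`-item[1]`) gives a descending sort on counts while the word itself (`item[0]`) still sorts ascending, which is exactly the tie-break rule above.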

Instructions

Build the text statistics analyzer.

  1. Add from collections import Counter at the top.
  2. Define a function named count_words that takes text. Inside, return len(text.split()).
  3. Define a function named count_sentences that takes text. Inside, return text.count(".").
  4. Define a function named top_words that takes text and n. Inside, create words by calling text.lower().split(). Create clean as a list comprehension that strips ".,!?;:" from each word in words. Create counter = Counter(clean). Create sorted_items by calling sorted(counter.items(), key=lambda item: (-item[1], item[0])). Return sorted_items[:n].
  5. Open article.txt in read mode and assign file.read() to text.
  6. Call print(f"Words: {count_words(text)}").
  7. Call print(f"Sentences: {count_sentences(text)}").
  8. Call print(f"Top 3 words: {top_words(text, 3)}").
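Assembled, the eight steps produce a script like the one below. The sample article.txt written at the top is only a stand-in so the sketch runs anywhere; in the project, the file already exists in the working directory.

```python
from collections import Counter

# Stand-in content so this sketch runs anywhere; in the project,
# article.txt is already provided.
with open("article.txt", "w") as file:
    file.write("Data is everywhere. Python makes data analysis simple. "
               "Data and Python work well together.")

def count_words(text):
    return len(text.split())

def count_sentences(text):
    return text.count(".")

def top_words(text, n):
    words = text.lower().split()
    clean = [word.strip(".,!?;:") for word in words]
    counter = Counter(clean)
    sorted_items = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
    return sorted_items[:n]

with open("article.txt") as file:
    text = file.read()

print(f"Words: {count_words(text)}")          # Words: 14
print(f"Sentences: {count_sentences(text)}")  # Sentences: 3
print(f"Top 3 words: {top_words(text, 3)}")   # [('data', 3), ('python', 2), ('analysis', 1)]
```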