Code Self Study Forum

A shell script to spellcheck multiple webpages

Here’s a quick script to check the spelling on multiple web pages at once. To run it on Linux or Mac (and possibly also on Bash for Windows):

  1. The script requires that lynx and aspell are installed on the computer.
  2. Save the code in a file called spellcheck_urls (no file extension necessary)
  3. Make the file executable: chmod u+x spellcheck_urls
  4. Create a file that contains a list of URLs with one URL per line. Save it as urls.txt in the same directory as the script.
  5. Read the script carefully to be sure that you understand exactly what it does. :)
  6. If it looks like it does what you want, type ./spellcheck_urls to run it.
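
Steps 1 and 4 can be checked before running anything. Here is a minimal sketch of such a check, assuming Bash; `missing_tools` is a hypothetical helper name, not something from the script:

```shell
#!/bin/bash

# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing="$(missing_tools lynx aspell)"
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on one line
    echo 'please install:' $missing
else
    echo 'lynx and aspell are both installed'
fi

# The script reads urls.txt from the current directory
[ -f urls.txt ] || echo 'urls.txt not found in this directory'
```

This is a separate helper and not part of spellcheck_urls, whose code follows.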
#!/bin/bash

URLS_FILE='urls.txt'
OUTPUT_DIR='output'
REPORT_FILE="$OUTPUT_DIR/report.txt"

# Check if the output directory exists
if [ ! -d "$OUTPUT_DIR" ]; then
    echo 'creating the output directory'
    mkdir "$OUTPUT_DIR"
else
    echo 'output directory exists - skipping...'
fi

# Read in the URLs from the text file
while IFS='' read -r l || [ -n "$l" ]; do
    echo "processing $l"

    # Create the header
    echo "URL: $l" >> "$REPORT_FILE"
    echo "========================================" >> "$REPORT_FILE"

    # Download the text content from the current URL, spellcheck it,
    # and count how many times each misspelled word appears
    lynx --dump "$l" | aspell --list | sort | uniq -c >> "$REPORT_FILE"
    echo "" >> "$REPORT_FILE"
    echo "" >> "$REPORT_FILE"

    # Throttle the requests here, if you want
    sleep 3
done < "$URLS_FILE"

After the script finishes, check the file output/report.txt for the results. The script appends to that file, so delete it before re-running if you want a fresh report.
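
The counts in the report come from `sort | uniq -c`, which lists the words alphabetically. If you would rather see the most frequent misspellings first, one option is a second sort on the count. This is a rough sketch: `rank_words` is a hypothetical helper name, and the sample words stand in for aspell's output.

```shell
#!/bin/bash

# rank_words: read one word per line on stdin and print "count word",
# most frequent first (ties broken alphabetically).
rank_words() {
    sort | uniq -c | sort -k1,1nr -k2,2 | awk '{print $1, $2}'
}

printf 'teh\nrecieve\nteh\n' | rank_words
# prints:
# 2 teh
# 1 recieve
```

To try it in the script, the `sort | uniq -c` part of the pipeline could be swapped for `rank_words`, with the function defined near the top of the file.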

If anyone has suggestions for improvement, please leave a comment below. :)

If you are on a Mac and don’t have lynx or aspell, you can install them with Homebrew:

brew install lynx
brew install aspell