Crawl your test site for errors

It is important to test for any broken pages or error HTTP responses returned for each page that can be crawled on the test site. You may do this by using either the wget or wget2 command line utility. Both can be installed with Homebrew on Mac, e.g. `brew install wget2`. On Debian- or Ubuntu-based Linux, you may install with `sudo apt install wget2`.

You will first need your test site ready, preferably loaded with some dummy content. You may use the WordPress theme unit test data to fill up the site, with these commands:

wp plugin install wordpress-importer --activate

wget https://raw.githubusercontent.com/WordPress/theme-test-data/master/themeunittestdata.wordpress.xml

wp import --authors=create themeunittestdata.wordpress.xml

Using wget

The command to do this is:

wget -r -o /path-to/crawl-log.txt https://my-test-website.url

Let’s take a closer look at what this command does:

  • The wget command makes an offline copy of the site available at the URL
  • The -r flag recursively crawls each available page within the URL specified
  • The -o flag writes wget's log messages to a file instead of the terminal
  • Make sure you replace /path-to/crawl-log.txt with the path and filename of a text file where you want the log to appear
  • Replace the demo URL in the command with that of your test site

Run this command in your terminal app, and open the log file it generates. It will have several entries, one for each link within the site, similar to the following:

--2024-03-07 11:43:07--  http://localhost:10018/

Resolving localhost (localhost)... ::1, 127.0.0.1

Connecting to localhost (localhost)|::1|:10018... connected.

HTTP request sent, awaiting response... 200 OK

Length: unspecified

Saving to: ‘localhost:10018/index.html’

Scroll through the report and make sure that all pages have a `200 OK` status, and no error statuses. 
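Scrolling through a long log by hand is error-prone, so the check can also be scripted. The following is a minimal sketch, assuming the default wget log layout in which each request's URL line contains `--  http` and each response line ends in `awaiting response... <status>`. The function name `list_non_200` is hypothetical. Note that it also lists redirects (such as `301 Moved Permanently`), which are not necessarily errors.

```shell
# List every response in a wget log whose status is not "200 OK".
# Assumes the default wget log layout; list_non_200 is a hypothetical helper name.
list_non_200() {
  log_file="$1"
  awk '
    # Remember the most recent URL line, e.g. "--2024-03-07 11:43:07--  http://..."
    /--  https?:/ { url = $NF }
    # Report any response line whose status is something other than 200 OK
    /awaiting response\.\.\. / && !/awaiting response\.\.\. 200 OK/ {
      sub(/.*awaiting response\.\.\. /, "")
      print url " -> " $0
    }
  ' "$log_file"
}
```

After defining the function, run `list_non_200 /path-to/crawl-log.txt`. If it prints nothing, every logged response was `200 OK`.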

Lastly, within the same folder where you ran the command, delete the folder where the offline copy of the site is saved. The folder is typically named after the site's host, for example localhost:10018 for the log shown above.
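To avoid deleting the wrong folder, the name can be derived from the site URL first. This is a rough sketch, assuming wget names the folder after the host and port, as the sample log's `localhost:10018/index.html` suggests. The function name `crawl_dir_for_url` is hypothetical.

```shell
# Print the folder name wget is expected to create for a given site URL.
# Assumption: the folder is named after the host (and port), as in the
# sample log above ("localhost:10018/index.html").
crawl_dir_for_url() {
  url="$1"
  host="${url#*://}"   # drop the scheme, e.g. "http://"
  host="${host%%/*}"   # drop any path after the host
  printf '%s\n' "$host"
}
```

For example, `ls -d "./$(crawl_dir_for_url http://localhost:10018)"` confirms the folder exists. Once you have checked it is the right one, `rm -rf -- "./$(crawl_dir_for_url http://localhost:10018)"` removes it.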


Using wget2

The wget method described above produces a verbose log. It also downloads a static copy of the website locally, which requires manual cleanup after the test. Both drawbacks can be avoided by using wget2 instead of wget.

Create a test site and install the theme unit test data with these commands:

wp plugin install wordpress-importer --activate

wget2 https://raw.githubusercontent.com/WordPress/theme-test-data/master/themeunittestdata.wordpress.xml

wp import --authors=create themeunittestdata.wordpress.xml

To crawl the test site and check for errors, run the following command:

wget2 --spider --recursive http://wp-65.test 2>&1 | grep -i "error"

The command prints every line of the crawl output that contains the word "error". If it prints nothing, the crawl found no errors.
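Because grep prints nothing when there are no matches, an empty result can be hard to tell apart from a command that never ran. One option is to save the crawl output to a file and summarise it explicitly. The sketch below assumes the output was saved to a file first. The function name `report_crawl_errors` and the file name `crawl-output.txt` are hypothetical.

```shell
# Summarise saved crawl output: print matching lines with their line numbers,
# or a clear message when nothing matches.
# report_crawl_errors is a hypothetical helper name.
report_crawl_errors() {
  output_file="$1"
  # grep exits with status 0 when at least one line matches, 1 when none do
  if grep -i -n "error" "$output_file"; then
    echo "Crawl reported errors (see lines above)." >&2
    return 1
  fi
  echo "No lines containing 'error' were found."
}
```

To use it, save the output with `wget2 --spider --recursive http://wp-65.test > crawl-output.txt 2>&1`, then run `report_crawl_errors crawl-output.txt`. The non-zero return status on errors also makes the check usable in a script.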
