Site Verify Overview

The Problem

Even small to medium-sized websites may reference hundreds of pages, scripts, images, music files, and other web ("mime") documents. Finding broken or out-of-date links is difficult and time-consuming. SiteVerify reads all the pages below a given URL, parses them, and locates the components they reference. Any errors are listed clearly.
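
The "parse and locate" step can be illustrated with a short Python sketch. This is not SiteVerify's own code; the class name LinkCollector, the function extract_links, and the choice of the href and src attributes are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes assumed to carry a reference to another document.
# SiteVerify's actual list is not documented here.
LINK_ATTRS = {"href", "src"}


class LinkCollector(HTMLParser):
    """Collect the absolute URLs referenced by one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in LINK_ATTRS and value:
                # Resolve relative references against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html_text):
    """Return every referenced URL found in html_text, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


page = '<a href="about.html">About</a><img src="/img/logo.png">'
for link in extract_links("http://example.com/docs/index.html", page):
    print(link)
```

Each collected URL would then be fetched to confirm that it exists, which is the check this guide describes.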

SiteVerify will not, of course, exercise data-driven web sites. It is intended for verification of the static components of general web sites.

Running Site Verify

After starting the program, enter the URL you want to check in the "URL to verify" field. Then choose "Run" from the "Actions" menu.

SiteVerify is multi-threaded, which means that it uses several worker threads to analyze your web site as quickly as possible.

By default, SiteVerify only parses and analyzes HTML pages that are at or beneath the URL given. External web pages are checked only for existence. This can be overridden; see the Options described below.
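
One plausible reading of "at or beneath the URL given" is a comparison of host and path prefix. The Python sketch below shows that reading; the function name is_at_or_below is hypothetical, and SiteVerify's real rule may differ.

```python
from urllib.parse import urlsplit


def is_at_or_below(root_url, candidate_url):
    """Return True if candidate_url lies at or beneath root_url.

    Assumption: "beneath" means same scheme, same host, and a path that
    starts with the root URL's directory.
    """
    root = urlsplit(root_url)
    cand = urlsplit(candidate_url)
    if (root.scheme, root.netloc.lower()) != (cand.scheme, cand.netloc.lower()):
        return False
    # Treat the root as a directory: drop a trailing file name such as
    # index.html. A root without a trailing slash ("/docs") is ambiguous
    # and is treated here as a file in the parent directory.
    root_dir = root.path.rsplit("/", 1)[0] + "/"
    cand_path = cand.path or "/"
    return cand_path.startswith(root_dir)
```

Under this reading, a page for which the function returns False would only be checked for existence, not parsed.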

Main Window Parts

Link List

When SiteVerify has completed its analysis, the left-hand list will be filled with every link mentioned in any page.

These results are organized in one of two ways:

Each referencing page (source) is listed and all its references (targets) are grouped with it.
Each referent (target) is listed and all referring pages (sources) are grouped with it.

The organization can be changed using the "Display Links Forward" option (see below).
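
The two organizations are the same set of (source, target) pairs grouped by different keys. A minimal Python sketch, with the hypothetical function name group_links:

```python
from collections import defaultdict


def group_links(pairs, forward=True):
    """Group (source, target) pairs for display.

    forward=True  groups targets under each source page.
    forward=False groups source pages under each target.
    """
    grouped = defaultdict(list)
    for source, target in pairs:
        key, item = (source, target) if forward else (target, source)
        grouped[key].append(item)
    return dict(grouped)


pairs = [
    ("index.html", "about.html"),
    ("index.html", "logo.png"),
    ("about.html", "logo.png"),
]
print(group_links(pairs, forward=True))
print(group_links(pairs, forward=False))
```

The reverse grouping is the useful one when a target is broken and you need every page that refers to it.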

Each entry has an icon indicating its media ("mime") type, or an error icon if the item was not found or failed to parse.

The status bar indicates the direction of the link display.

Status Bar

This text bar at the bottom of the window indicates the link display direction and the current status of the scan.

Right-Hand Tab

The right-hand tabbed section contains the configuration information and the details of the currently selected link in the link list.

Link Details

When you select a link in the link list, its details are shown in this panel. If the "Preserve Parsed Text" option is checked, the text of the HTML page is also displayed.

Configuration

The Configuration tab contains the URL you want to search along with other configuration controls.

URL to Verify

This is where you enter the URL of the web site you want to scan.

Treat as site, not page

If you've entered the name of a specific page, clear this check box.

Exclusions...

Clicking this button brings up a dialog that allows you to exclude certain items from analysis.

Pages and items fetched

The items in the list below are updated dynamically as SiteVerify opens and parses the HTML pages you've selected.

Menus

File

Save as CSV...

This menu item saves the results of the last scan as a "comma-separated values" (CSV) file that can be imported into Excel.
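
The exact columns SiteVerify writes are not listed in this guide. As an illustration only, the Python sketch below writes one row per link with three assumed columns (source, target, status); the function name write_results_csv is hypothetical.

```python
import csv
import io


def write_results_csv(rows, stream):
    """Write (source, target, status) rows as CSV.

    The column names are assumptions for this example, not SiteVerify's
    documented layout.
    """
    writer = csv.writer(stream)
    writer.writerow(["source", "target", "status"])
    writer.writerows(rows)


buffer = io.StringIO()
write_results_csv(
    [("index.html", "about.html", "ok"), ("index.html", "old, page.html", "not found")],
    buffer,
)
print(buffer.getvalue())
```

The csv module quotes any field containing a comma, which is what lets a spreadsheet import the file without splitting such a URL across columns.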

Exit

This menu item terminates the program.

Actions

Reset

This menu item discards any prior scan results.

Run

This menu item starts the analysis process for the currently specified URL.

Stop

This menu item stops the analysis process as quickly as possible. Unfinished results are termed "incomplete".

Options

Display Links Forward

When checked, the links are displayed by originating page, with each immediate referenced link as a sub-item.

When unchecked, the links are displayed by terminating target, with every referring page treated as a sub-item.

Display Excluded Entries

Items (pages) that are excluded are usually not displayed, either as source or target links. This option shows excluded links.

Clear Incomplete Results

This option removes all incomplete results. Incomplete results are those which were queued for processing when the user stopped the scan.

Show Incomplete Items

This option shows incomplete items.

Preserve Parsed Text

This option causes all the parsed HTML to be preserved as large blocks of text. These text blocks are displayed in the detailed link information when a page link is selected.

Fold (Ignore) Address Case

This option is usually checked, which causes links to be compared case-insensitively. That is, differences between upper- and lower-case letters are ignored.

You may uncheck this option if your website uses a file system that is case-sensitive. Some OS X and most Linux file systems are case-sensitive.
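
The effect of case folding can be shown in a few lines of Python. The helper names link_key and same_link are hypothetical, and folding the whole URL is an assumption; SiteVerify may fold only part of the address.

```python
def link_key(url, fold_case=True):
    """Return the key used to decide whether two links are the same."""
    return url.casefold() if fold_case else url


def same_link(a, b, fold_case=True):
    """Compare two links, optionally ignoring letter case."""
    return link_key(a, fold_case) == link_key(b, fold_case)


print(same_link("http://example.com/About.html", "http://example.com/about.html"))
print(same_link("http://example.com/About.html", "http://example.com/about.html", fold_case=False))
```

With folding on, the two addresses count as one link; with folding off, they are tracked as two separate targets, which matches how a case-sensitive server would treat them.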

Use 'relaxed' Parsing

SiteVerify tries to enforce nominal correctness in HTML pages. Unclosed tags, for example, are treated as parsing errors. If you want to ignore such errors in parsed pages, check this option.
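
To make the difference between strict and relaxed parsing concrete, here is a small Python sketch that reports unclosed tags unless a relaxed flag is set. It is not SiteVerify's parser; the class name TagBalanceChecker, the function check_page, and the simple stack rule are assumptions for the example.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Record mismatched and unclosed tags unless relaxed is True."""

    def __init__(self, relaxed=False):
        super().__init__()
        self.relaxed = relaxed
        self.open_tags = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # Self-closing form such as <br/> opens nothing.

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        elif not self.relaxed:
            self.errors.append(f"unexpected </{tag}>")

    def close(self):
        super().close()
        if not self.relaxed:
            self.errors.extend(f"unclosed <{t}>" for t in self.open_tags)


def check_page(html_text, relaxed=False):
    """Return the list of tag errors found in html_text."""
    checker = TagBalanceChecker(relaxed=relaxed)
    checker.feed(html_text)
    checker.close()
    return checker.errors
```

For the input "<p>text", check_page reports one unclosed tag in strict mode and nothing in relaxed mode. Note that this strict rule also flags end tags that HTML allows authors to omit.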

Limit Parsing to URL and Below

This option prevents SiteVerify from parsing pages that do not appear to be sub-elements of the current URL.

Clear History

This option clears the history of URLs you have analyzed.

Threads

This option allows you to set the number of worker threads used to fetch and parse web pages. The default setting of 5 is sufficient for most dual-processor machines with normal bandwidth connections. If you have a very fast machine or connection, you may want to increase this number.
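
The worker-thread setting corresponds to the size of a thread pool. The Python sketch below shows the general pattern with a pool of 5; the function name check_all and the check_one callback are hypothetical, and the stand-in checker does no network access.

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(urls, check_one, threads=5):
    """Run check_one(url) for every URL using a pool of worker threads.

    Returns a dict mapping each URL to its result. The default of 5
    mirrors the default mentioned in this guide.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so zip pairs results correctly.
        return dict(zip(urls, pool.map(check_one, urls)))


def fake_check(url):
    """Stand-in for a real fetch: treats .html URLs as found."""
    return url.endswith(".html")


print(check_all(["a.html", "b.png", "c.html"], fake_check))
```

Because fetching is dominated by waiting on the network, more threads mainly help when the connection, rather than the processor, has spare capacity.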

Target Limit

This option controls the maximum number of links traversed by SiteVerify during analysis. If this number is exceeded, the analysis is terminated.

The goal of this option is to prevent SiteVerify from unintentionally running for a long time.
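
A traversal limit of this kind can be sketched as a breadth-first walk that stops after a fixed number of links. The Python below is illustrative only: the function name crawl and the get_links callback are hypothetical, and the exact boundary (stopping at the limit rather than one past it) is an assumption.

```python
from collections import deque


def crawl(start, get_links, target_limit):
    """Walk links breadth-first, stopping once target_limit is reached.

    Returns (visited, pending), where pending holds the links that were
    still queued when the walk stopped.
    """
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        if len(visited) >= target_limit:
            return visited, list(queue)
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, []


site = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
print(crawl("a", lambda u: site.get(u, []), target_limit=2))
```

The seen set is what keeps pages that link to each other from being queued twice; the limit guards against sites that are simply very large.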

Help

User's Guide

This option shows the help pages.

About Site Verify...

This option displays a dialog box giving details about your version of SiteVerify.

Exclusions

The term 'exclusion' refers to a string value that is matched against every target link processed. If the string matches, the link is ignored.

Normal Exclusions

A normal (non-RegEx) exclusion is a string that is matched, with or without regard to case, against any target that is about to be fetched.

Regular Expression Exclusions (RegEx)

A regular expression exclusion is a pattern-matching string that is matched to any link before the link is processed. If the match is successful the link is excluded.

For more information about Microsoft .NET regular expressions, see the .NET documentation at Microsoft.com and other sources.
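
The two kinds of exclusion can be compared side by side in a short Python sketch. Two assumptions apply: a normal exclusion is treated as a substring match, which this guide does not confirm, and Python's re module stands in for .NET regular expressions, whose syntax differs in some details. The function name is_excluded is hypothetical.

```python
import re


def is_excluded(target, exclusions, ignore_case=True):
    """Return True if any exclusion matches the target link.

    exclusions is a list of (pattern, is_regex) pairs.
    """
    for pattern, is_regex in exclusions:
        if is_regex:
            flags = re.IGNORECASE if ignore_case else 0
            if re.search(pattern, target, flags):
                return True
        else:
            # Assumed behavior: a normal exclusion matches as a substring.
            if ignore_case:
                if pattern.lower() in target.lower():
                    return True
            elif pattern in target:
                return True
    return False


rules = [("/private/", False), (r"\.(zip|exe)$", True)]
print(is_excluded("http://example.com/Private/notes.html", rules))
print(is_excluded("http://example.com/files/setup.EXE", rules))
print(is_excluded("http://example.com/index.html", rules))
```

A normal exclusion suits a fixed folder or file name; a regular expression is worth the extra effort when the excluded links share only a pattern, such as a file extension.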