The following sections describe the Verity Spider V 5.0 command-line options. Option names are case-sensitive.
Specifies a starting point for an indexing job. You can specify multiple instances, or use multiple values in a single instance.
When you execute an indexing job from a command line, and you do not use a command file (with the -cmdfile option), you must URL-escape any special characters in the starting point. To URL-escape a special character, use "%hex-ASCII-character-number" in place of the character. For example, use /time%26/ instead of /time&/. This allows the operating system to properly process the command string.
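The escaping rule above can be sketched in Python. This is only an illustration and not part of Verity Spider: the helper name `escape_start_point` is hypothetical, and the sketch assumes the starting point is a path whose `/` separators should be left untouched.

```python
from urllib.parse import quote


def escape_start_point(path: str) -> str:
    """Percent-encode special characters in a path, leaving '/' as-is.

    Hypothetical helper for illustration; not a Verity Spider API.
    """
    # quote() replaces each special character with %XX, where XX is the
    # character's hexadecimal ASCII code; '&' (0x26) becomes '%26'.
    return quote(path, safe="/")


print(escape_start_point("/time&/"))  # prints /time%26/
```

Because only `/` is marked as safe here, other reserved characters such as `:` would also be encoded, so a full URL would need a different `safe` set than a bare path.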
If an indexing task halts, you can rerun the task as-is. Verity Spider reads the persistent store for the specified collection and parses only those candidate URLs that are in the queue but not yet processed. Candidate URLs are those with one of the following statuses, as reported by vsdb:
cand, used, inse, upda, dele, fail
Repository type | Starting point |
---|---|
Web | The URL or URLs from which Verity Spider is to begin indexing. Use other options, such as the |
File | The starting directory or directories in which Verity Spider will start indexing. All subdirectories beneath the starting point will be indexed, unless you use the |
Note: By using the -start option with the -refresh option, you provide a starting point for Verity Spider and therefore do not need to use at least one of the following options: -host, -domain, -nofollow, or -unlimited.
Used for updating a collection, specifies that Verity Spider process only those documents that qualify, as follows:
-nooptimize option with the -refresh option. In this case, any document deleted from the repository is marked for deletion in the collection. It will be removed from the collection and the persistent store when the next indexing task is run for the collection.
When you rerun an existing indexing job, Verity Spider automatically refreshes the collection. If you add or remove any of the starting points, however, you must manually specify the -refresh option to refresh existing documents.
Note: You can also use the -start option to provide a starting point for Verity Spider. If you do not use the -start option, use at least one of the following options: -host, -domain, or -nofollow. For further control, also see the -refreshtime option. If you do not use any constraint criteria, Verity Spider operates without limits and will likely index far more than you intended.
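The rule in this note can be expressed as a small pre-flight check. The following Python sketch is an assumption-laden illustration, not a Verity Spider feature: the function name `refresh_is_constrained` is hypothetical, the argument list is assumed to be already split into tokens, and the host name used in the usage lines is a placeholder.

```python
from typing import Sequence

# Constraint options named in the note above.
CONSTRAINT_OPTIONS = ("-host", "-domain", "-nofollow")


def refresh_is_constrained(args: Sequence[str]) -> bool:
    """Return True if a -refresh job has a starting point or a constraint.

    Hypothetical helper for illustration; not a Verity Spider API.
    """
    if "-refresh" not in args:
        # The rule only concerns refresh jobs.
        return True
    if "-start" in args:
        # A starting point is enough on its own.
        return True
    # Otherwise at least one constraint option must be present.
    return any(opt in args for opt in CONSTRAINT_OPTIONS)


# Placeholder values, for illustration only.
print(refresh_is_constrained(["-refresh"]))
print(refresh_is_constrained(["-refresh", "-host", "www.example.com"]))
```

A check of this kind could run before a job is launched, so that an unconstrained refresh is caught before it indexes more than intended.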