Chapter 12. Command Line Tools

Table of Contents

extract
Syntax
Description

extract

Syntax

extract -input [url|urlFile] [OPTIONS]

Description

Extracts features from URLs and writes them to data files.

Options:

-output [outputFile]

Specifies a file to write the features to. Defaults to writing to the console.

-features [featureFile|featureList]

Filters the list of features gathered from the input URLs. Defaults to using all features.

-format [comma|weka|line]

Changes the feature output format. Defaults to line when writing to the console and to comma when writing to a file.

-crawl [depth d] [filter f]

Uses the input URLs as the starting points of a breadth-first crawl to collect statistics. depth is the number of levels the crawl descends and defaults to 1. filter is a regular expression that constrains the crawled URLs beyond the default crawler settings.
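
The behaviour described for -crawl can be illustrated with a minimal Python sketch of a depth-limited, regex-filtered breadth-first traversal. This is not the tool's implementation. The function crawl, the get_links callback, and the in-memory LINKS graph are hypothetical names introduced for illustration. The sketch assumes that a depth of 1 means "the input URLs plus the pages they link to directly" and that the filter is applied as a search against each discovered URL; the actual tool may differ on both points.

```python
import re
from collections import deque

# Hypothetical link graph standing in for real pages (illustration only).
LINKS = {
    "http://a.example/": ["http://a.example/x", "http://b.example/"],
    "http://a.example/x": ["http://a.example/y"],
    "http://b.example/": [],
    "http://a.example/y": [],
}


def crawl(seeds, get_links, depth=1, url_filter=None):
    """Return URLs in breadth-first order, following links up to `depth` levels.

    Assumption: depth=0 visits only the seeds, depth=1 also visits the pages
    they link to, and so on. `url_filter` is an optional regular expression;
    discovered links that do not match it are skipped.
    """
    pattern = re.compile(url_filter) if url_filter else None
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicates, keep order
    seen = set(unique_seeds)
    queue = deque((url, 0) for url in unique_seeds)
    order = []
    while queue:
        url, level = queue.popleft()
        order.append(url)
        if level >= depth:
            continue  # depth limit reached: do not expand this page
        for link in get_links(url):
            if link in seen:
                continue  # already queued or visited
            if pattern and not pattern.search(link):
                continue  # rejected by the filter
            seen.add(link)
            queue.append((link, level + 1))
    return order


def links_of(url):
    """Look up outgoing links in the hypothetical graph."""
    return LINKS.get(url, [])
```

Under these assumptions, crawl(["http://a.example/"], links_of) visits the seed and its two direct links, while raising depth to 2 and passing a filter such as r"^http://a\.example/" follows the chain on a.example and skips b.example.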