DerbyCrawlURLDatabase instance.
IHttpDocumentFetcher.
IHttpClientInitializer.
IHttpDocumentChecksummer which returns an MD5 checksum value of the extracted document content unless a given field is specified.
IHttpHeadersChecksummer which simply returns the exact value of the "Last-Modified" HTTP header if no alternate header is specified.
IRobotsTxtProvider.
IURLExtractor.
null means the headers could not be fetched and the associated document will be skipped (treated as rejected).
IURLNormalizer that should satisfy most URL normalization needs.
IHttpCrawlerEventListener.
IHttpHeadersFetcher.
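
The two checksum behaviors described above can be illustrated with a minimal sketch. It is not the library's implementation: the class ChecksumSketch and the helpers documentChecksum and headersChecksum are hypothetical names introduced here, the hex encoding of the digest is an assumption, and the "given field" variant of the document checksum is not shown. The headers are modeled as a plain case-sensitive Map, which is also an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the documented API.
class ChecksumSketch {

    // Returns the hex-encoded MD5 digest of the extracted document content.
    static String documentChecksum(String content) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Returns the header value verbatim: "Last-Modified" by default,
    // or the alternate header when one is specified. A null result
    // means the header was absent (assumed lookup is case-sensitive).
    static String headersChecksum(Map<String, String> headers, String alternateHeader) {
        String name = (alternateHeader != null) ? alternateHeader : "Last-Modified";
        return headers.get(name);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("Last-Modified", "Wed, 21 Oct 2015 07:28:00 GMT");
        headers.put("ETag", "\"abc123\"");

        System.out.println(documentChecksum("hello"));
        System.out.println(headersChecksum(headers, null));
        System.out.println(headersChecksum(headers, "ETag"));
    }
}
```

Comparing the value from a previous crawl with the current one is enough to decide whether a document changed; the header-based variant avoids downloading the body at all, at the cost of trusting the server's header.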