public interface ICrawlURLDatabase
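Database implementation holding the necessary information about all URL crawling activities, including which crawling stage each URL is in. The few stages a URL can have, as reflected by the methods below, are:
- Queued: waiting to be processed.
- Active: currently being processed.
- Processed: processing has completed.
- Cached: kept from a previous crawler run.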
Method Summary

Return type | Method | Description |
---|---|---|
int | getActiveCount() | Gets the number of active URLs (currently being processed). |
CrawlURL | getCached(String cacheURL) | Gets the cached URL from the previous time the crawler was run (e.g. …). |
int | getProcessedCount() | Gets the number of URLs processed. |
int | getQueueSize() | Gets the size of the URL queue (number of URLs left to process). |
boolean | isActive(String url) | Whether the given URL is currently being processed (i.e. …). |
boolean | isCacheEmpty() | Whether the cache from a previous crawler run is empty (contains no URLs). |
boolean | isProcessed(String url) | Whether the given URL has been processed. |
boolean | isQueued(String url) | Whether the given URL is in the queue (waiting to be processed). |
boolean | isQueueEmpty() | Whether the queue is empty (no URLs left to process). |
boolean | isVanished(CrawlURL crawlURL) | Whether a URL has been deleted. |
CrawlURL | next() | Returns the next URL to be processed and marks it as being "active" (i.e. …). |
void | processed(CrawlURL crawlURL) | Marks this URL as processed. |
void | queue(String url, int depth) | Queues a URL for future processing. |
void | queueCache() | Queues URLs cached from a previous run so they can be processed again. |
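
The stage transitions implied by the summary above (queued, then active, then processed) can be illustrated with a minimal in-memory sketch. This is not the library's implementation: `SketchCrawlURL` and `InMemoryCrawlDb` are hypothetical stand-ins, the cache-related methods are left out, and two behaviors are assumptions rather than documented facts: a URL already known in any stage is not queued again, and `next()` returns `null` on an empty queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the real CrawlURL class: only a URL and a depth.
class SketchCrawlURL {
    final String url;
    final int depth;

    SketchCrawlURL(String url, int depth) {
        this.url = url;
        this.depth = depth;
    }
}

// In-memory sketch of the queue/active/processed part of the interface.
class InMemoryCrawlDb {
    private final Deque<SketchCrawlURL> queue = new ArrayDeque<>();
    private final Set<String> queuedUrls = new HashSet<>();
    private final Map<String, SketchCrawlURL> active = new HashMap<>();
    private final Set<String> processed = new HashSet<>();

    void queue(String url, int depth) {
        // Assumption: a URL already known in any stage is not queued twice.
        if (queuedUrls.contains(url) || active.containsKey(url)
                || processed.contains(url)) {
            return;
        }
        queue.addLast(new SketchCrawlURL(url, depth));
        queuedUrls.add(url);
    }

    boolean isQueueEmpty() { return queue.isEmpty(); }
    int getQueueSize() { return queue.size(); }
    boolean isQueued(String url) { return queuedUrls.contains(url); }

    SketchCrawlURL next() {
        SketchCrawlURL u = queue.pollFirst();
        if (u == null) {
            return null; // Assumption: null signals an empty queue.
        }
        queuedUrls.remove(u.url);
        active.put(u.url, u); // queued -> active
        return u;
    }

    boolean isActive(String url) { return active.containsKey(url); }
    int getActiveCount() { return active.size(); }

    void processed(SketchCrawlURL u) {
        active.remove(u.url); // active -> processed
        processed.add(u.url);
    }

    boolean isProcessed(String url) { return processed.contains(url); }
    int getProcessedCount() { return processed.size(); }
}

class CrawlLifecycleDemo {
    public static void main(String[] args) {
        InMemoryCrawlDb db = new InMemoryCrawlDb();
        db.queue("http://example.com/", 0);
        SketchCrawlURL u = db.next();
        System.out.println(db.isActive(u.url));    // prints true
        db.processed(u);
        System.out.println(db.isProcessed(u.url)); // prints true
    }
}
```

In this sketch a URL is in at most one stage at a time, which keeps `isQueued`, `isActive`, and `isProcessed` mutually exclusive.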
Method Detail |
---|
void queue(String url, int depth)
Parameters:
url - the URL to eventually be processed
depth - how many clicks away from the starting URL(s)

boolean isQueueEmpty()
Returns:
true if the queue is empty

int getQueueSize()
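
The `depth` parameter of `queue(String url, int depth)` counts clicks from the starting URL(s). One plausible use, sketched below, is to queue each discovered link at its parent's depth plus one and to stop following links past a limit. The link graph, `DepthEntry`, `DepthLimitedCrawl`, and `maxDepth` are all hypothetical names for illustration; the local `Deque` merely plays the role of the database's `queue`, `isQueueEmpty`, and `next` methods.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical pairing of a URL with its click distance from the start URL.
class DepthEntry {
    final String url;
    final int depth;

    DepthEntry(String url, int depth) {
        this.url = url;
        this.depth = depth;
    }
}

class DepthLimitedCrawl {
    // Walks a hypothetical in-memory link graph breadth-first and returns
    // the visited URLs in order, never following links past maxDepth.
    static List<String> crawl(Map<String, List<String>> links,
                              String start, int maxDepth) {
        Deque<DepthEntry> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> visited = new ArrayList<>();

        queue.addLast(new DepthEntry(start, 0));    // like queue(start, 0)
        seen.add(start);
        while (!queue.isEmpty()) {                  // like !isQueueEmpty()
            DepthEntry current = queue.pollFirst(); // like next()
            visited.add(current.url);
            if (current.depth >= maxDepth) {
                continue; // links found here would exceed the limit
            }
            for (String link : links.getOrDefault(current.url, List.of())) {
                if (seen.add(link)) {
                    // like queue(link, current.depth + 1)
                    queue.addLast(new DepthEntry(link, current.depth + 1));
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "/", List.of("/a", "/b"),
                "/a", List.of("/a/deep"));
        System.out.println(crawl(links, "/", 1)); // prints [/, /a, /b]
    }
}
```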

boolean isQueued(String url)
Parameters:
url - the URL to check
Returns:
true if the URL is in the queue

CrawlURL next()

boolean isActive(String url)
Parameters:
url - the URL to check
Returns:
true if the URL is active

int getActiveCount()
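
Because `next()` moves a URL from the queue to the active stage, an empty queue alone does not necessarily mean a crawl is over: URLs still being processed may queue further links. A caller could combine `isQueueEmpty()` with `getActiveCount()` as sketched below. `CrawlProgress`, `CompletionCheck`, and `snapshot` are hypothetical names, and the interface is only a narrow stand-in for the two methods this check needs.

```java
// Hypothetical narrow view of the database: only the two methods used here.
interface CrawlProgress {
    boolean isQueueEmpty();
    int getActiveCount();
}

class CompletionCheck {
    // Finished only when nothing is waiting and nothing is in flight.
    static boolean isFinished(CrawlProgress db) {
        return db.isQueueEmpty() && db.getActiveCount() == 0;
    }

    // Builds a fixed snapshot of the two values, for demonstration only.
    static CrawlProgress snapshot(boolean queueEmpty, int activeCount) {
        return new CrawlProgress() {
            public boolean isQueueEmpty() { return queueEmpty; }
            public int getActiveCount() { return activeCount; }
        };
    }

    public static void main(String[] args) {
        // Queue drained, but two URLs are still being processed.
        System.out.println(isFinished(snapshot(true, 2))); // prints false
        // Queue drained and nothing active.
        System.out.println(isFinished(snapshot(true, 0))); // prints true
    }
}
```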

CrawlURL getCached(String cacheURL)
Parameters:
cacheURL - URL cached from previous run
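
boolean isCacheEmpty()
Returns:
true if the cache is empty

void processed(CrawlURL crawlURL)
Parameters:
crawlURL - the URL to mark as processed

boolean isProcessed(String url)
Parameters:
url - the URL to check
Returns:
true if the URL has been processed

int getProcessedCount()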

void queueCache()

boolean isVanished(CrawlURL crawlURL)
Parameters:
crawlURL - the URL
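
The cache-related methods (`getCached`, `isCacheEmpty`, `queueCache`, `isVanished`) are only briefly described on this page, so the sketch below rests on stated assumptions rather than documented behavior. It assumes that a URL counts as "vanished" when it was fetched successfully in the previous run but not in the current one, and that `queueCache()` moves every cached entry to the queue. `SketchStatus`, `CachedCrawlURL`, `CacheSketchDb`, and `seedCache` are hypothetical names; the real `CrawlURL` may track state differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical status values, introduced only for this sketch.
enum SketchStatus { OK, NOT_FOUND }

// Hypothetical stand-in for CrawlURL with an added status field.
class CachedCrawlURL {
    final String url;
    final int depth;
    final SketchStatus status;

    CachedCrawlURL(String url, int depth, SketchStatus status) {
        this.url = url;
        this.depth = depth;
        this.status = status;
    }
}

class CacheSketchDb {
    private final Map<String, CachedCrawlURL> cache = new LinkedHashMap<>();
    private final Deque<CachedCrawlURL> queue = new ArrayDeque<>();

    // Demo helper, not part of the interface: stores an entry as if a
    // previous crawler run had left it in the cache.
    void seedCache(CachedCrawlURL previous) {
        cache.put(previous.url, previous);
    }

    CachedCrawlURL getCached(String cacheURL) { return cache.get(cacheURL); }
    boolean isCacheEmpty() { return cache.isEmpty(); }
    int getQueueSize() { return queue.size(); }

    // Assumption: every cached entry is moved to the queue, emptying the cache.
    void queueCache() {
        queue.addAll(cache.values());
        cache.clear();
    }

    // Assumption: "vanished" means OK in the previous run, not OK now.
    boolean isVanished(CachedCrawlURL current) {
        CachedCrawlURL previous = getCached(current.url);
        return previous != null
                && previous.status == SketchStatus.OK
                && current.status != SketchStatus.OK;
    }
}

class CacheSketchDemo {
    public static void main(String[] args) {
        CacheSketchDb db = new CacheSketchDb();
        db.seedCache(new CachedCrawlURL("http://example.com/old", 1, SketchStatus.OK));
        CachedCrawlURL now =
                new CachedCrawlURL("http://example.com/old", 1, SketchStatus.NOT_FOUND);
        System.out.println(db.isVanished(now)); // prints true
        db.queueCache();
        System.out.println(db.isCacheEmpty());  // prints true
    }
}
```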