Change limiter logic

Now download tasks are a subset of all tasks.
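
As a rough illustration of "download tasks are a subset of all tasks", the sketch below uses two `asyncio` semaphores so that every download also occupies a general task slot. The `Limiter` class, its method names, and `demo` are hypothetical and are not taken from the changed files; only the option names `max_concurrent_tasks` and `max_concurrent_downloads` come from CONFIG.md. The `delay_between_tasks` option is left out to keep the example short.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Optional, Tuple


class Limiter:
    """Hypothetical limiter in which every download also counts as a task."""

    def __init__(self, max_concurrent_tasks: int = 1,
                 max_concurrent_downloads: Optional[int] = None) -> None:
        # When unset, downloads may use every task slot.
        if max_concurrent_downloads is None:
            max_concurrent_downloads = max_concurrent_tasks
        if max_concurrent_downloads > max_concurrent_tasks:
            raise ValueError(
                "max_concurrent_downloads must not be greater than "
                "max_concurrent_tasks")
        self._tasks = asyncio.Semaphore(max_concurrent_tasks)
        self._downloads = asyncio.Semaphore(max_concurrent_downloads)

    @asynccontextmanager
    async def limit_crawl(self) -> AsyncIterator[None]:
        # A crawl only needs a general task slot.
        async with self._tasks:
            yield

    @asynccontextmanager
    async def limit_download(self) -> AsyncIterator[None]:
        # A download needs a download slot and a general task slot.
        # Taking the download slot first keeps queued downloads from
        # holding task slots that crawls could use.
        async with self._downloads:
            async with self._tasks:
                yield


async def demo() -> Tuple[int, int]:
    """Run 4 downloads and 4 crawls; return the peak concurrency seen."""
    limiter = Limiter(max_concurrent_tasks=3, max_concurrent_downloads=1)
    running = {"tasks": 0, "downloads": 0}
    peak = {"tasks": 0, "downloads": 0}

    async def job(is_download: bool) -> None:
        limit = limiter.limit_download if is_download else limiter.limit_crawl
        async with limit():
            running["tasks"] += 1
            running["downloads"] += int(is_download)
            for key in peak:
                peak[key] = max(peak[key], running[key])
            await asyncio.sleep(0.01)  # stand-in for real work
            running["tasks"] -= 1
            running["downloads"] -= int(is_download)

    await asyncio.gather(*(job(i % 2 == 0) for i in range(8)))
    return peak["tasks"], peak["downloads"]
```

Calling `asyncio.run(demo())` returns the peak number of concurrent tasks and downloads; under these settings the download peak cannot exceed 1 and the task peak cannot exceed 3.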
2023-12-21 10:23:01 +01:00 · 2021-05-15 13:21:38 +02:00
parent b0f9e1e8b4
commit acd674f0a0
3 changed files with 85 additions and 54 deletions
--- a/CONFIG.md
+++ b/CONFIG.md
@@ -64,17 +64,14 @@ crawlers:
      remote file is different.
 - `transform`: Rules for renaming and excluding certain files and directories.
  For more details, see [this section](#transformation-rules). (Default: empty)
- `max_concurrent_crawls`: The maximum number of concurrent crawl actions. What
-  constitutes a crawl action might vary from crawler to crawler, but it usually
-  means an HTTP request of a page to analyze. (Default: 1)
- `max_concurrent_downloads`: The maximum number of concurrent download actions.
-  What constitutes a download action might vary from crawler to crawler, but it
-  usually means an HTTP request for a single file. (Default: 1)
- `request_delay`: Time (in seconds) that the crawler should wait between
-  subsequent requests. Can be used to avoid unnecessary strain for the crawl
-  target. Crawl and download actions are handled separately, meaning that a
-  download action might immediately follow a crawl action even if this is set to
-  a nonzero value. (Default: 0)
+- `max_concurrent_tasks`: The maximum number of concurrent tasks (such as
+  crawling or downloading). (Default: 1)
+- `max_concurrent_downloads`: How many of those tasks can be download tasks at
+  the same time. Must not be greater than `max_concurrent_tasks`. When not set,
+  this is the same as `max_concurrent_tasks`. (Optional)
+- `delay_between_tasks`: Time (in seconds) that the crawler should wait between
+  subsequent tasks. Can be used as a sort of rate limit to avoid unnecessary
+  load for the crawl target. (Default: 0.0)

 Some crawlers may also require credentials for authentication. To configure how
 the crawler obtains its credentials, the `auth` option is used. It is set to the