Joscha
b70b62cef5
Make crawler sections start with "crawl:"
Also, use only the part of the section name after "crawl:" as the crawler's
output directory. Now the implementation matches the documentation again.
2021-05-15 17:24:37 +02:00
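The section-name rule from this commit can be illustrated with Python's standard configparser. This is a minimal sketch, not the project's code. The helper `crawler_sections`, the `CRAWL_PREFIX` constant and the sample config keys are hypothetical. The sketch skips sections that do not start with "crawl:" and uses the remainder of the name as the output directory.

```python
from __future__ import annotations

import configparser
from pathlib import Path

# Hypothetical constant; the commit only states that crawler sections start with "crawl:".
CRAWL_PREFIX = "crawl:"


def crawler_sections(parser: configparser.ConfigParser) -> dict[str, Path]:
    """Map each crawler section name to its output directory."""
    result: dict[str, Path] = {}
    for name in parser.sections():
        # Sections without the prefix are not crawlers and are skipped.
        if name.startswith(CRAWL_PREFIX):
            # Only the part after "crawl:" becomes the output directory.
            result[name] = Path(name[len(CRAWL_PREFIX):])
    return result


config = configparser.ConfigParser()
# Illustrative config; the keys inside the sections are placeholders.
config.read_string("""
[crawl:Foo]
type = local

[other]
key = value
""")

for section, directory in crawler_sections(config).items():
    print(f"{section} -> {directory}")  # prints "crawl:Foo -> Foo"
```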
Joscha
868f486922
Rename local crawler path to target
2021-05-15 17:12:25 +02:00
Joscha
a6fdf05ee9
Allow variable whitespace in arrow rules
2021-05-15 15:25:05 +02:00
Joscha
f897d7c2e1
Add name variants for all arrows
2021-05-15 15:25:05 +02:00
Joscha
302b8c0c34
Fix errors loading local crawler config
Apparently getint and getfloat may return None even though this is not
mentioned in their type annotations.
2021-05-15 15:25:05 +02:00
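This sketch assumes the options are read through a `configparser.SectionProxy` (e.g. `parser["crawl:test"]`). For that class, `getint` and `getfloat` return their `fallback` argument when the key is missing, and `fallback` defaults to None. They do not raise `NoOptionError` in that case. The option names and the `read_delay` helper below are hypothetical.

```python
import configparser
from typing import Optional


def read_delay(section: configparser.SectionProxy, key: str, default: float) -> float:
    """Read an optional float option, treating a missing key as `default`."""
    # SectionProxy.getfloat returns `fallback` (None unless given) for a missing key.
    value: Optional[float] = section.getfloat(key)
    if value is None:
        return default
    return value


parser = configparser.ConfigParser()
# Hypothetical option names, used only for illustration.
parser.read_string("[crawl:test]\ncrawl_delay = 0.5\n")
section = parser["crawl:test"]

print(section.getfloat("download_delay"))          # prints None
print(read_delay(section, "crawl_delay", 0.0))     # prints 0.5
print(read_delay(section, "download_delay", 0.0))  # prints 0.0
```

Passing `fallback=default` directly to `getfloat` is an equivalent, shorter alternative.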
Joscha
acd674f0a0
Change limiter logic
Now download tasks are a subset of all tasks.
2021-05-15 15:25:05 +02:00
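One way to model "download tasks are a subset of all tasks" uses two asyncio semaphores. Every task holds a general slot, and downloads also hold one of a separate pool of download slots. This is a hypothetical sketch, not the project's actual limiter. The class name, the parameters and the "keep the slot for `task_delay` seconds" delay rule are assumptions.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator


class Limiter:
    """Hypothetical limiter in which downloads are a subset of all tasks."""

    def __init__(self, task_limit: int, download_limit: int, task_delay: float = 0.0) -> None:
        self._task_slots = asyncio.Semaphore(task_limit)
        self._download_slots = asyncio.Semaphore(download_limit)
        # Seconds a slot stays occupied after its task finishes (simple delay logic).
        self._task_delay = task_delay

    @asynccontextmanager
    async def crawl(self) -> AsyncIterator[None]:
        # A crawl task only needs a general task slot.
        async with self._task_slots:
            yield
            await asyncio.sleep(self._task_delay)

    @asynccontextmanager
    async def download(self) -> AsyncIterator[None]:
        # Take the download slot first, so a download waiting for one does not
        # hold a general slot that a crawl task could be using meanwhile.
        async with self._download_slots, self._task_slots:
            yield
            await asyncio.sleep(self._task_delay)
```

Usage: wrap work in `async with limiter.crawl():` or `async with limiter.download():`. Because downloads also hold a general slot, the total number of running tasks never exceeds `task_limit`.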
Joscha
296a169dd3
Make limiter logic more complex
The limiter can now distinguish between crawl and download actions and has a
fancy slot system and delay logic.
2021-05-15 15:25:05 +02:00
Joscha
1591cb9197
Add options to slow down local crawler
These options are meant to make the local crawler behave more like a
network-based crawler for purposes of testing and debugging other parts of the
code base.
2021-05-15 15:25:01 +02:00
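A common way to make a local copy behave like a network download is to copy in chunks and sleep after each chunk. The sleep is sized so that the average throughput matches a target speed. The sketch below shows that technique under assumptions: `throttled_copy` and its `bytes_per_second` parameter are hypothetical and are not the option names this commit adds.

```python
import asyncio
import io
from typing import BinaryIO


async def throttled_copy(src: BinaryIO, dst: BinaryIO, bytes_per_second: float,
                         chunk_size: int = 4096) -> int:
    """Copy src to dst in chunks, pausing so the average rate stays near bytes_per_second.

    Returns the number of bytes copied. bytes_per_second must be positive.
    """
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return copied
        dst.write(chunk)
        copied += len(chunk)
        # Wait as long as this chunk would take at the simulated network speed.
        await asyncio.sleep(len(chunk) / bytes_per_second)


async def main() -> None:
    src = io.BytesIO(b"x" * 10_000)
    dst = io.BytesIO()
    copied = await throttled_copy(src, dst, bytes_per_second=50_000)  # takes about 0.2 s
    print(copied)  # prints 10000


asyncio.run(main())
```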
Joscha
961f40f9a1
Document simple authenticator
2021-05-13 19:55:04 +02:00
Joscha
f9b2fd60e2
Document local crawler and auth
2021-05-09 01:33:47 +02:00
Joscha
fde811ae5a
Document on_conflict option
2021-05-05 12:24:35 +02:00
Joscha
a8dcf941b9
Document possible redownload settings
2021-04-30 15:32:56 +02:00
Joscha
e7a51decb0
Elaborate on transforms and implement changes
2021-04-29 20:24:18 +02:00
Joscha
9ec19be113
Document config file format
2021-04-29 20:24:18 +02:00