Joscha
635caa765d
Fix typo
...
Thanks, burg113
2022-11-15 17:17:57 +01:00
Joscha
07200bbde5
Document ilias web crawler's forums option
2022-10-31 14:12:27 +01:00
Joscha
ed24366aba
Add pass authenticator
2022-06-05 10:04:42 +02:00
Joscha
616b0480f7
Simplify IPD crawler link regex
2022-05-08 18:18:05 +02:00
Joscha
afbd03f777
Fix docs
2022-05-05 14:35:42 +02:00
I-Al-Istannen
13b8c3d9c6
Add regex option to config and CLI parser
2021-11-02 09:30:46 +01:00
Toorero
d6f38a61e1
Fixed minor spelling mistakes
2021-11-02 01:54:00 +01:00
lukasprobst
55ea304ff3
Disable interpolation of ConfigParser
2021-10-25 23:37:42 +02:00
I-Al-Istannen
6673077397
Add kit-ipd crawler
2021-10-21 13:20:21 +02:00
Joscha
70b33ecfd9
Add migration notes to changelog
...
Also clean up some other formatting for consistency
2021-06-13 15:06:50 +02:00
Joscha
a292c4c437
Add example for ">>" arrow heads
2021-06-12 14:57:29 +02:00
Joscha
f28bbe6b0c
Update transform rule documentation
...
It's still missing an example that uses rules with ">>" arrows.
2021-06-09 22:45:52 +02:00
Joscha
df3ad3d890
Add 'skip' option to crawlers
2021-06-04 18:47:13 +02:00
Joscha
1fc8e9eb7a
Document credential file authenticator config options
2021-06-01 10:01:14 +00:00
Joscha
9d5ec84b91
Add credential file authenticator
2021-05-31 18:33:34 +02:00
Joscha
6fa9cfd4c3
Fix error when capturing group is None
2021-05-27 15:41:00 +02:00
Joscha
2c72a9112c
Reword -name-> and -name-re-> docs and remove -name-exact->
2021-05-27 13:20:37 +02:00
Joscha
17207546e9
Document --debug-transforms
2021-05-26 11:47:51 +02:00
Joscha
c665c36d88
Update README, CHANGELOG
2021-05-25 17:18:31 +02:00
Joscha
61430c8739
Overhaul config and CLI option names
2021-05-25 14:23:38 +02:00
Joscha
bce3dc384d
Deduplicate path names in crawler
...
Also rename files so they follow the restrictions for Windows file names if
we're on Windows.
2021-05-25 12:11:15 +02:00
Joscha
c687d4a51a
Implement cookie sharing
2021-05-24 13:10:44 +02:00
I-Al-Istannen
3ab3581f84
Add timeout for HTTP connection
2021-05-23 23:41:05 +02:00
Joscha
be4b1040f8
Document status and report options
2021-05-23 22:51:42 +02:00
I-Al-Istannen
6e9f8fd391
Add a keyring authenticator
2021-05-23 19:44:12 +02:00
I-Al-Istannen
ecdedfa1cf
Add no-videos flag to ILIAS crawler
2021-05-23 12:37:01 +02:00
Joscha
0d10752b5a
Configure explain log level via cli and config file
2021-05-19 17:50:10 +02:00
I-Al-Istannen
db1219d4a9
Create a link file in ILIAS crawler
...
This allows us to crawl links and represent them in the file system.
Users can choose between an ILIAS-imitation (that optionally
auto-redirects) and a plain text variant.
2021-05-17 21:44:54 +02:00
I-Al-Istannen
467ea3a37e
Document ILIAS-Crawler arguments in CONFIG.md
2021-05-16 13:26:58 +02:00
Joscha
e1104f888d
Add tfa authenticator
2021-05-15 18:27:16 +02:00
Joscha
8c32da7f19
Let authenticators provide username and password separately
2021-05-15 18:27:03 +02:00
Joscha
b70b62cef5
Make crawler sections start with "crawl:"
...
Also, use only the part of the section name after the "crawl:" as the crawler's
output directory. Now, the implementation matches the documentation again.
2021-05-15 17:24:37 +02:00
Joscha
868f486922
Rename local crawler path to target
2021-05-15 17:12:25 +02:00
Joscha
a6fdf05ee9
Allow variable whitespace in arrow rules
2021-05-15 15:25:05 +02:00
Joscha
f897d7c2e1
Add name variants for all arrows
2021-05-15 15:25:05 +02:00
Joscha
302b8c0c34
Fix errors loading local crawler config
...
Apparently getint and getfloat may return a None even though this is not
mentioned in their type annotations.
2021-05-15 15:25:05 +02:00
Joscha
acd674f0a0
Change limiter logic
...
Now download tasks are a subset of all tasks.
2021-05-15 15:25:05 +02:00
Joscha
296a169dd3
Make limiter logic more complex
...
The limiter can now distinguish between crawl and download actions and has a
fancy slot system and delay logic.
2021-05-15 15:25:05 +02:00
Joscha
1591cb9197
Add options to slow down local crawler
...
These options are meant to make the local crawler behave more like a
network-based crawler for purposes of testing and debugging other parts of the
code base.
2021-05-15 15:25:01 +02:00
Joscha
961f40f9a1
Document simple authenticator
2021-05-13 19:55:04 +02:00
Joscha
f9b2fd60e2
Document local crawler and auth
2021-05-09 01:33:47 +02:00
Joscha
fde811ae5a
Document on_conflict option
2021-05-05 12:24:35 +02:00
Joscha
a8dcf941b9
Document possible redownload settings
2021-04-30 15:32:56 +02:00
Joscha
e7a51decb0
Elaborate on transforms and implement changes
2021-04-29 20:24:18 +02:00
Joscha
9ec19be113
Document config file format
2021-04-29 20:24:18 +02:00