47 Commits

Author SHA1 Message Date
Mr. Pine dbc2553b11 Add default `show-not-deleted` option
If set to `no`, PFERD won't print status or report messages for not deleted files
2023-08-26 18:43:01 +02:00
Mr. Pine 443f7fe839 Add `no-delete-prompt-overwrite` crawler conflict resolution option (#75) 2023-07-29 18:36:33 +02:00
Joscha 635caa765d Fix typo
Thanks, burg113
2022-11-15 17:17:57 +01:00
Joscha 07200bbde5 Document ilias web crawler's forums option 2022-10-31 14:12:27 +01:00
Joscha ed24366aba Add pass authenticator 2022-06-05 10:04:42 +02:00
Joscha 616b0480f7 Simplify IPD crawler link regex 2022-05-08 18:18:05 +02:00
Joscha afbd03f777 Fix docs 2022-05-05 14:35:42 +02:00
I-Al-Istannen 13b8c3d9c6 Add regex option to config and CLI parser 2021-11-02 09:30:46 +01:00
Toorero d6f38a61e1 Fixed minor spelling mistakes 2021-11-02 01:54:00 +01:00
lukasprobst 55ea304ff3 Disable interpolation of ConfigParser 2021-10-25 23:37:42 +02:00
I-Al-Istannen 6673077397 Add kit-ipd crawler 2021-10-21 13:20:21 +02:00
Joscha 70b33ecfd9 Add migration notes to changelog
Also clean up some other formatting for consistency
2021-06-13 15:06:50 +02:00
Joscha a292c4c437 Add example for ">>" arrow heads 2021-06-12 14:57:29 +02:00
Joscha f28bbe6b0c Update transform rule documentation
It's still missing an example that uses rules with ">>" arrows.
2021-06-09 22:45:52 +02:00
Joscha df3ad3d890 Add 'skip' option to crawlers 2021-06-04 18:47:13 +02:00
Joscha 1fc8e9eb7a Document credential file authenticator config options 2021-06-01 10:01:14 +00:00
Joscha 9d5ec84b91 Add credential file authenticator 2021-05-31 18:33:34 +02:00
Joscha 6fa9cfd4c3 Fix error when capturing group is None 2021-05-27 15:41:00 +02:00
Joscha 2c72a9112c Reword `-name->` and `-name-re->` docs and remove `-name-exact->` 2021-05-27 13:20:37 +02:00
Joscha 17207546e9 Document --debug-transforms 2021-05-26 11:47:51 +02:00
Joscha c665c36d88 Update README, CHANGELOG 2021-05-25 17:18:31 +02:00
Joscha 61430c8739 Overhaul config and CLI option names 2021-05-25 14:23:38 +02:00
Joscha bce3dc384d Deduplicate path names in crawler
Also rename files so they follow the restrictions for windows file names if
we're on windows.
2021-05-25 12:11:15 +02:00
Joscha c687d4a51a Implement cookie sharing 2021-05-24 13:10:44 +02:00
I-Al-Istannen 3ab3581f84 Add timeout for HTTP connection 2021-05-23 23:41:05 +02:00
Joscha be4b1040f8 Document status and report options 2021-05-23 22:51:42 +02:00
I-Al-Istannen 6e9f8fd391 Add a keyring authenticator 2021-05-23 19:44:12 +02:00
I-Al-Istannen ecdedfa1cf Add no-videos flag to ILIAS crawler 2021-05-23 12:37:01 +02:00
Joscha 0d10752b5a Configure explain log level via cli and config file 2021-05-19 17:50:10 +02:00
I-Al-Istannen db1219d4a9 Create a link file in ILIAS crawler
This allows us to crawl links and represent them in the file system.
Users can choose between an ILIAS-imitation (that optionally
auto-redirects) and a plain text variant.
2021-05-17 21:44:54 +02:00
I-Al-Istannen 467ea3a37e Document ILIAS-Crawler arguments in CONFIG.md 2021-05-16 13:26:58 +02:00
Joscha e1104f888d Add tfa authenticator 2021-05-15 18:27:16 +02:00
Joscha 8c32da7f19 Let authenticators provide username and password separately 2021-05-15 18:27:03 +02:00
Joscha b70b62cef5 Make crawler sections start with "crawl:"
Also, use only the part of the section name after the "crawl:" as the crawler's
output directory. Now, the implementation matches the documentation again
2021-05-15 17:24:37 +02:00
Joscha 868f486922 Rename local crawler path to target 2021-05-15 17:12:25 +02:00
Joscha a6fdf05ee9 Allow variable whitespace in arrow rules 2021-05-15 15:25:05 +02:00
Joscha f897d7c2e1 Add name variants for all arrows 2021-05-15 15:25:05 +02:00
Joscha 302b8c0c34 Fix errors loading local crawler config
Apparently getint and getfloat may return a None even though this is not
mentioned in their type annotations.
2021-05-15 15:25:05 +02:00
Joscha acd674f0a0 Change limiter logic
Now download tasks are a subset of all tasks.
2021-05-15 15:25:05 +02:00
Joscha 296a169dd3 Make limiter logic more complex
The limiter can now distinguish between crawl and download actions and has a
fancy slot system and delay logic.
2021-05-15 15:25:05 +02:00
Joscha 1591cb9197 Add options to slow down local crawler
These options are meant to make the local crawler behave more like a
network-based crawler for purposes of testing and debugging other parts of the
code base.
2021-05-15 15:25:01 +02:00
Joscha 961f40f9a1 Document simple authenticator 2021-05-13 19:55:04 +02:00
Joscha f9b2fd60e2 Document local crawler and auth 2021-05-09 01:33:47 +02:00
Joscha fde811ae5a Document on_conflict option 2021-05-05 12:24:35 +02:00
Joscha a8dcf941b9 Document possible redownload settings 2021-04-30 15:32:56 +02:00
Joscha e7a51decb0 Elaborate on transforms and implement changes 2021-04-29 20:24:18 +02:00
Joscha 9ec19be113 Document config file format 2021-04-29 20:24:18 +02:00