Commit Graph

673 Commits

Author SHA1 Message Date
Joscha 5c76193045 Set up pyinstaller 2021-05-24 15:21:25 +02:00
Joscha 1c1f781be4 Reword some log messages 2021-05-24 13:17:28 +02:00
Joscha c687d4a51a Implement cookie sharing 2021-05-24 13:10:44 +02:00
I-Al-Istannen fca62541ca De-duplicate element names in ILIAS crawler 2021-05-24 00:24:31 +02:00
    This prevents any conflicts caused by multiple files with the same name.
    Conflicts may still arise due to transforms, but that is out of our
    control and a user error.
I-Al-Istannen 3ab3581f84 Add timeout for HTTP connection 2021-05-23 23:41:05 +02:00
I-Al-Istannen 8dd0689420 Add keyring authentication to ILIAS CLI 2021-05-23 23:04:18 +02:00
Joscha be4b1040f8 Document status and report options 2021-05-23 22:51:42 +02:00
Joscha 79be6e1dc5 Switch some other options to BooleanOptionalAction 2021-05-23 22:49:09 +02:00
Joscha edbd92dbbf Add --status and --report flags 2021-05-23 22:41:59 +02:00
Joscha 27b5a8e490 Rename log.action to log.status 2021-05-23 22:40:33 +02:00
Joscha 1f400d5964 Implement BooleanOptionalAction 2021-05-23 22:26:59 +02:00
Joscha 0ca0680165 Simplify --version 2021-05-23 21:40:48 +02:00
Joscha ce1dbda5b4 Overhaul colours 2021-05-23 21:33:04 +02:00
    "Crawled" and "Downloaded" are now printed less bright than "Crawling" and
    "Downloading" as they're not as important. Explain topics are printed in yellow
    to stand out a bit more from the cyan action messages.
Joscha 9cce78669f Print report after all crawlers have finished 2021-05-23 21:17:13 +02:00
Joscha 6ca0ecdf05 Load and store reports 2021-05-23 20:46:29 +02:00
I-Al-Istannen 6e9f8fd391 Add a keyring authenticator 2021-05-23 19:44:12 +02:00
Joscha 2fdf24495b Restructure crawling and auth related modules 2021-05-23 19:16:42 +02:00
Joscha bbf9f8f130 Add -C as alias for --crawler 2021-05-23 19:06:09 +02:00
I-Al-Istannen 37f8d84a9c Output total amount of http requests in HTTP Crawler 2021-05-23 19:00:01 +02:00
Joscha 5edd868d5b Fix always-smart redownloading the wrong files 2021-05-23 18:49:34 +02:00
Joscha e4e5e83be6 Fix downloader using crawl bar 2021-05-23 18:39:43 +02:00
    Looks like I made a dumb copy-paste error. Now the download bar shows the proper
    progress and speed again.
Joscha 74c7b39dc8 Clean up files in alphabetical order 2021-05-23 18:39:25 +02:00
Joscha 445dffc987 Reword some explanations 2021-05-23 18:35:32 +02:00
I-Al-Istannen d97d6bf147 Fix handling nested ILIAS folders 2021-05-23 18:29:28 +02:00
I-Al-Istannen 79efdb56f7 Adjust ILIAS html explain messages 2021-05-23 18:24:25 +02:00
Joscha a9af56a5e9 Improve specifying crawlers via CLI 2021-05-23 18:18:50 +02:00
    Instead of removing the sections of unselected crawlers from the config file,
    crawler selection now happens in the Pferd after loading the crawlers and is
    more sophisticated. It also has better error messages.
I-Al-Istannen 59f13bb8d6 Explain ILIAS HTML parsing and add some warnings 2021-05-23 18:14:54 +02:00
I-Al-Istannen 463f8830d7 Add warn_contd 2021-05-23 18:14:54 +02:00
I-Al-Istannen 05ad06fbc1 Only enclose get_page in iorepeat in ILIAS crawler 2021-05-23 18:14:51 +02:00
    We previously also gathered in there, which could lead to some more
    surprises when the method was retried.
Joscha 29d5a40c57 Replace asyncio.gather with custom Crawler function 2021-05-23 17:25:16 +02:00
Joscha c0cecf8363 Log crawl and download actions more extensively 2021-05-23 16:25:44 +02:00
Joscha b998339002 Fix cleanup logging of paths 2021-05-23 16:25:44 +02:00
Joscha 245c9c3dcc Explain output dir decisions and steps 2021-05-23 16:25:44 +02:00
I-Al-Istannen d8f26a789e Implement CLI Command for ilias crawler 2021-05-23 13:30:42 +02:00
I-Al-Istannen e1d18708b3 Rename "no_videos" to videos 2021-05-23 13:30:42 +02:00
Joscha b44b49476d Fix noncritical and anoncritical decorators 2021-05-23 13:24:53 +02:00
    I must have forgotten to update the anoncritical decorator when I last changed the
    noncritical decorator. Also, every exception should make the crawler not
    error_free, not just CrawlErrors.
Joscha 7e0bb06259 Clean up TODOs 2021-05-23 12:47:30 +02:00
I-Al-Istannen ecdedfa1cf Add no-videos flag to ILIAS crawler 2021-05-23 12:37:01 +02:00
I-Al-Istannen 3d4b997d4a Retry crawl_url and work around Python's closure handling 2021-05-23 12:28:15 +02:00
    Closures capture variables by reference, not their values at the time the
    closure is defined. Therefore, any type narrowing performed by mypy on
    captured variables is lost inside the closure.
Joscha e81005ae4b Fix CLI arguments 2021-05-23 12:24:21 +02:00
I-Al-Istannen 33a81a5f5c Document authentication in HTTP crawler and rename prepare_request 2021-05-23 11:55:34 +02:00
Joscha 25e2abdb03 Improve transformer explain wording 2021-05-23 11:45:14 +02:00
Joscha 803e5628a2 Clean up logging 2021-05-23 11:37:19 +02:00
    Paths are now (hopefully) logged consistently across all crawlers.
Joscha c88f20859a Explain config file dumping 2021-05-23 11:04:50 +02:00
Joscha ec3767c545 Create crawler base dir at start of crawl 2021-05-23 10:52:02 +02:00
Joscha 729ff0a4c7 Fix simple authenticator output 2021-05-23 10:45:37 +02:00
Joscha 6fe51e258f Number rules starting at 1 2021-05-23 10:45:37 +02:00
Joscha 44ecb2fbe7 Fix cleanup deleting crawler's base directory 2021-05-23 10:45:37 +02:00
I-Al-Istannen 53e031d9f6 Reuse dl/cl for I/O retries in ILIAS crawler 2021-05-23 00:28:27 +02:00
I-Al-Istannen 8ac85ea0bd Fix a few typos in HttpCrawler 2021-05-22 23:37:34 +02:00
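
Commit 3d4b997d4a works around mypy losing type narrowing inside closures. The following is a minimal sketch of the underlying behaviour and one common workaround. The function names are hypothetical and not taken from this repository, and it is an assumption that the commit used this exact technique. Newer mypy releases keep the narrowing when the variable is not reassigned after the nested function is defined.

```python
from typing import Callable, Optional


def late_binding_demo() -> int:
    x = 1

    def read() -> int:
        # `x` is looked up when read() runs, not when read() is defined.
        return x

    x = 2  # rebinding after the closure exists changes what it sees
    return read()


def make_greeter(name: Optional[str]) -> Callable[[], str]:
    """Hypothetical example: narrow an Optional, then use it in a closure."""
    if name is None:
        raise ValueError("name is required")
    # Here mypy has narrowed `name` to `str`. Inside a nested function, older
    # mypy releases fall back to `Optional[str]`, because `name` could be
    # reassigned before the closure is called.
    narrowed: str = name  # a fresh, never-reassigned local declared as `str`

    def greet() -> str:
        return "Hello, " + narrowed  # type-checks without an extra None check

    return greet
```

With this sketch, `late_binding_demo()` returns `2`, and `make_greeter("Ada")()` returns `"Hello, Ada"`. Copying the narrowed value into a new local with an explicit type keeps the check in one place instead of repeating `assert name is not None` inside every closure.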