Commit Graph

642 Commits

Author SHA1 Message Date
Joscha
eb8b915813 Fix path prefix on windows
Previously, the path prefix was only set if "windows_paths" was true, regardless
of OS. Now the path prefix is always set on windows and never set on other OSes.
2021-05-25 14:23:38 +02:00
Joscha
22c2259adb Clean up authenticator exceptions
- Renamed to *Error for consistency
- Treating AuthError like CrawlError
2021-05-25 14:23:38 +02:00
Joscha
c15a1aecdf Rename keyring authenticator file for consistency 2021-05-25 14:20:26 +02:00
Joscha
16d50b6626 Document why /pferd.py exists 2021-05-25 13:31:29 +02:00
I-Al-Istannen
651b087932 Use cl/dl deduplication mechanism for ILIAS crawler 2021-05-25 12:15:38 +02:00
Joscha
bce3dc384d Deduplicate path names in crawler
Also rename files so they follow the restrictions for windows file names if
we're on windows.
2021-05-25 12:11:15 +02:00
I-Al-Istannen
c21ddf225b Add a CLI option to configure ILIAS links behaviour 2021-05-25 11:58:41 +02:00
I-Al-Istannen
4fefb98d71 Add a wrapper to pretty-print ValueErrors in argparse parsers 2021-05-25 11:57:59 +02:00
I-Al-Istannen
ffda4e43df Add extension to link files 2021-05-25 11:41:57 +02:00
I-Al-Istannen
69cb2a7734 Add Links option to ilias crawler
This allows you to configure what type the link files should have and
whether to create them at all.
2021-05-25 11:41:57 +02:00
Joscha
c33de233dc Add script for releasing new versions 2021-05-24 20:23:14 +02:00
I-Al-Istannen
85f89a7ff3 Interpret accordions and expandable headers as virtual folders
This allows us to find a file named "Test" in an accordion "Acc" as "Acc/Test".
2021-05-24 18:54:26 +02:00
I-Al-Istannen
9ce20216b5 Do not set a timeout for whole HTTP request
Downloads might take longer!
2021-05-24 18:54:26 +02:00
Joscha
1739c54091 Add checklist for releasing new versions 2021-05-24 17:50:17 +02:00
Joscha
d8bd1f518a Set up build and release workflow 2021-05-24 17:27:39 +02:00
Joscha
86ba47541b Fix cookie loading and saving 2021-05-24 16:55:11 +02:00
I-Al-Istannen
492ec6a932 Detect and skip ILIAS tests 2021-05-24 16:36:15 +02:00
I-Al-Istannen
342076ee0e Handle exercise detail containers in ILIAS html parser 2021-05-24 16:22:51 +02:00
I-Al-Istannen
d44f6966c2 Log authentication attempts in HTTP crawler 2021-05-24 16:22:11 +02:00
Joscha
5c76193045 Set up pyinstaller 2021-05-24 15:21:25 +02:00
Joscha
1c1f781be4 Reword some log messages 2021-05-24 13:17:28 +02:00
Joscha
c687d4a51a Implement cookie sharing 2021-05-24 13:10:44 +02:00
I-Al-Istannen
fca62541ca De-duplicate element names in ILIAS crawler
This prevents any conflicts caused by multiple files with the same name.
Conflicts may still arise due to transforms, but that is out of our
control and a user error.
2021-05-24 00:24:31 +02:00
I-Al-Istannen
3ab3581f84 Add timeout for HTTP connection 2021-05-23 23:41:05 +02:00
I-Al-Istannen
8dd0689420 Add keyring authentication to ILIAS CLI 2021-05-23 23:04:18 +02:00
Joscha
be4b1040f8 Document status and report options 2021-05-23 22:51:42 +02:00
Joscha
79be6e1dc5 Switch some other options to BooleanOptionalAction 2021-05-23 22:49:09 +02:00
Joscha
edbd92dbbf Add --status and --report flags 2021-05-23 22:41:59 +02:00
Joscha
27b5a8e490 Rename log.action to log.status 2021-05-23 22:40:33 +02:00
Joscha
1f400d5964 Implement BooleanOptionalAction 2021-05-23 22:26:59 +02:00
Joscha
0ca0680165 Simplify --version 2021-05-23 21:40:48 +02:00
Joscha
ce1dbda5b4 Overhaul colours
"Crawled" and "Downloaded" are now printed less bright than "Crawling" and
"Downloading" as they're not as important. Explain topics are printed in yellow
to stand out a bit more from the cyan action messages.
2021-05-23 21:33:04 +02:00
Joscha
9cce78669f Print report after all crawlers have finished 2021-05-23 21:17:13 +02:00
Joscha
6ca0ecdf05 Load and store reports 2021-05-23 20:46:29 +02:00
I-Al-Istannen
6e9f8fd391 Add a keyring authenticator 2021-05-23 19:44:12 +02:00
Joscha
2fdf24495b Restructure crawling and auth related modules 2021-05-23 19:16:42 +02:00
Joscha
bbf9f8f130 Add -C as alias for --crawler 2021-05-23 19:06:09 +02:00
I-Al-Istannen
37f8d84a9c Output total amount of http requests in HTTP Crawler 2021-05-23 19:00:01 +02:00
Joscha
5edd868d5b Fix always-smart redownloading the wrong files 2021-05-23 18:49:34 +02:00
Joscha
e4e5e83be6 Fix downloader using crawl bar
Looks like I made a dumb copy-paste error. Now the download bar shows the proper
progress and speed again.
2021-05-23 18:39:43 +02:00
Joscha
74c7b39dc8 Clean up files in alphabetical order 2021-05-23 18:39:25 +02:00
Joscha
445dffc987 Reword some explanations 2021-05-23 18:35:32 +02:00
I-Al-Istannen
d97d6bf147 Fix handling nested ILIAS folders 2021-05-23 18:29:28 +02:00
I-Al-Istannen
79efdb56f7 Adjust ILIAS html explain messages 2021-05-23 18:24:25 +02:00
Joscha
a9af56a5e9 Improve specifying crawlers via CLI
Instead of removing the sections of unselected crawlers from the config file,
crawler selection now happens in the Pferd after loading the crawlers and is
more sophisticated. It also has better error messages.
2021-05-23 18:18:50 +02:00
I-Al-Istannen
59f13bb8d6 Explain ILIAS HTML parsing and add some warnings 2021-05-23 18:14:54 +02:00
I-Al-Istannen
463f8830d7 Add warn_contd 2021-05-23 18:14:54 +02:00
I-Al-Istannen
05ad06fbc1 Only enclose get_page in iorepeat in ILIAS crawler
We previously also gathered in there, which could lead to some more
surprises when the method was retried.
2021-05-23 18:14:51 +02:00
Joscha
29d5a40c57 Replace asyncio.gather with custom Crawler function 2021-05-23 17:25:16 +02:00
Joscha
c0cecf8363 Log crawl and download actions more extensively 2021-05-23 16:25:44 +02:00