Commit Graph

61 Commits

Author SHA1 Message Date
I-Al-Istannen
10d9d74528 Bail out when crawling recursive courses 2022-01-08 20:28:30 +01:00
I-Al-Istannen
43c5453e10 Correctly crawl files on desktop
The files on the desktop do not include a download link, so we need to
rewrite it.
2022-01-08 20:00:53 +01:00
I-Al-Istannen
e32c1f000f Fix mtime for single streams 2022-01-08 18:05:48 +01:00
I-Al-Istannen
5f527bc697 Remove Python 3.9 Pattern typehints 2022-01-08 17:14:40 +01:00
I-Al-Istannen
ced8b9a2d0 Fix some accordions 2022-01-08 16:58:30 +01:00
I-Al-Istannen
6f3cfd4396 Fix personal desktop crawling 2022-01-08 16:58:15 +01:00
I-Al-Istannen
462d993fbc Fix local video path cache (hopefully) 2022-01-08 00:27:48 +01:00
I-Al-Istannen
a99356f2a2 Fix video stream extraction 2022-01-08 00:27:34 +01:00
I-Al-Istannen
eac2e34161 Fix is_logged_in for ILIAS 7 2022-01-07 23:32:31 +01:00
I-Al-Istannen
a82a0b19c2 Collect crawler warnings/errors and include them in the report 2021-11-07 21:48:55 +01:00
I-Al-Istannen
90cb6e989b Do not download single videos if cache does not exist 2021-11-06 23:21:15 +01:00
I-Al-Istannen
6289938d7c Do not stop crawling files when encountering a CrawlWarning 2021-11-06 12:09:51 +01:00
I-Al-Istannen
88afe64a92 Refactor IPD crawler a bit 2021-11-02 01:25:01 +00:00
Julius Rüberg
6b2a657573 Fix IPD crawler for different subpages (#42)
This patch reworks the IPD crawler to support subpages which do not use
"/intern" for links and fetches the folder names from table headings.
2021-11-02 01:25:01 +00:00
I-Al-Istannen
e42ab83d32 Add support for ILIAS cards 2021-10-30 18:13:44 +02:00
I-Al-Istannen
f9a3f9b9f2 Handle multi-stream videos 2021-10-30 18:12:29 +02:00
I-Al-Istannen
6673077397 Add kit-ipd crawler 2021-10-21 13:20:21 +02:00
Joscha
544d45cbc5 Catch non-critical exceptions at crawler top level 2021-07-13 15:42:11 +02:00
I-Al-Istannen
ee67f9f472 Sort elements by ILIAS id to ensure deterministic ordering 2021-07-06 17:45:48 +02:00
I-Al-Istannen
8ec3f41251 Crawl ilias booking objects as links 2021-07-06 16:15:25 +02:00
I-Al-Istannen
89be07d4d3 Use final crawl path in HTML parsing message 2021-07-03 17:05:48 +02:00
I-Al-Istannen
91200f3684 Fix nondeterministic name deduplication 2021-07-03 12:09:55 +02:00
I-Al-Istannen
6e4d423c81 Crawl all video stages in one crawl bar
This ensures folders are not renamed, as they are crawled twice
2021-06-13 17:18:45 +02:00
I-Al-Istannen
70ec64a48b Fix wrong base URL for multi-stage pages 2021-06-13 15:44:47 +02:00
I-Al-Istannen
8ab462fb87 Use the exercise label instead of the button name as path 2021-06-04 19:24:23 +02:00
Joscha
df3ad3d890 Add 'skip' option to crawlers 2021-06-04 18:47:13 +02:00
Joscha
722970a255 Store cookies in text-based format
Using the stdlib's http.cookie module, cookies are now stored as one
"Set-Cookie" header per line. Previously, the aiohttp.CookieJar's save() and
load() methods were used (which use pickling).
2021-05-31 20:18:20 +00:00
Joscha
f40820c41f Warn if using concurrent tasks with kit-ilias-web 2021-05-31 20:18:20 +00:00
I-Al-Istannen
1fba96abcb Fix exercise date parsing for non-group submissions
ILIAS apparently changes the order of the fields as it sees fit, so we
now try to parse *every* column, starting at from the right, as a date.
The first column that parses successfully is then used.
2021-05-31 18:15:12 +02:00
Joscha
7b062883f6 Use raw paths for --debug-transforms
Previously, the already-transformed paths were used, which meant that
--debug-transforms was cumbersome to use (as you had to remove all transforms
and crawl once before getting useful results).
2021-05-31 12:33:37 +02:00
Joscha
64a2960751 Align paths in status messages and progress bars
Also print "Ignored" when paths are ignored due to transforms
2021-05-31 12:32:42 +02:00
I-Al-Istannen
1ca6740e05 Improve log messages when parsing ILIAS HTML
Previously some logs were split around an "await", which isn't a great
idea.
2021-05-27 17:59:22 +02:00
Joscha
474aa7e1cc Use sorted path order when debugging transforms 2021-05-27 15:41:00 +00:00
I-Al-Istannen
5beb4d9a2d Fix renaming conflict with multi-stage video elements 2021-05-27 15:41:00 +02:00
I-Al-Istannen
19eed5bdff Fix authentication logic conflicts with videos 2021-05-27 15:41:00 +02:00
Joscha
533f75ea71 Add --debug-transforms flag 2021-05-26 11:37:32 +02:00
I-Al-Istannen
2d8dcc87ff Send CSRF token in TFA request 2021-05-25 22:50:40 +02:00
I-Al-Istannen
66f0e398a1 Await result in tfa authenticate path 2021-05-25 19:19:51 +02:00
I-Al-Istannen
263780e6a3 Use certifi to ensure CA certificates are bundled in pyinstaller 2021-05-25 18:24:06 +02:00
I-Al-Istannen
a848194601 Rename plaintext link option to "plaintext" 2021-05-25 17:15:13 +02:00
Joscha
aabce764ac Clean up TODOs 2021-05-25 15:54:01 +02:00
I-Al-Istannen
486699cef3 Create anonymous TFA authenticator in ilias crawler
This ensures that *some* TFA authenticator is always present when
authenticating, even if none is specified in the config.

The TfaAuthenticator does not depend on any configured values, so it can
be created on-demand.
2021-05-25 15:11:52 +02:00
Joscha
61430c8739 Overhaul config and CLI option names 2021-05-25 14:23:38 +02:00
Joscha
eb8b915813 Fix path prefix on windows
Previously, the path prefix was only set if "windows_paths" was true, regardless
of OS. Now the path prefix is always set on windows and never set on other OSes.
2021-05-25 14:23:38 +02:00
I-Al-Istannen
651b087932 Use cl/dl deduplication mechanism for ILIAS crawler 2021-05-25 12:15:38 +02:00
Joscha
bce3dc384d Deduplicate path names in crawler
Also rename files so they follow the restrictions for windows file names if
we're on windows.
2021-05-25 12:11:15 +02:00
I-Al-Istannen
ffda4e43df Add extension to link files 2021-05-25 11:41:57 +02:00
I-Al-Istannen
69cb2a7734 Add Links option to ilias crawler
This allows you to configure what type the link files should have and
whether to create them at all.
2021-05-25 11:41:57 +02:00
I-Al-Istannen
85f89a7ff3 Interpret accordions and expandable headers as virtual folders
This allows us to find a file named "Test" in an accordion "Acc" as "Acc/Test".
2021-05-24 18:54:26 +02:00
I-Al-Istannen
9ce20216b5 Do not set a timeout for whole HTTP request
Downloads might take longer!
2021-05-24 18:54:26 +02:00