90cb6e989b
Do not download single videos if cache does not exist
2021-11-06 23:21:15 +01:00
6289938d7c
Do not stop crawling files when encountering a CrawlWarning
2021-11-06 12:09:51 +01:00
88afe64a92
Refactor IPD crawler a bit
2021-11-02 01:25:01 +00:00
6b2a657573
Fix IPD crawler for different subpages (#42)
This patch reworks the IPD crawler to support subpages which do not use
"/intern" for links and fetches the folder names from table headings.
2021-11-02 01:25:01 +00:00
e42ab83d32
Add support for ILIAS cards
2021-10-30 18:13:44 +02:00
f9a3f9b9f2
Handle multi-stream videos
2021-10-30 18:12:29 +02:00
6673077397
Add kit-ipd crawler
2021-10-21 13:20:21 +02:00
544d45cbc5
Catch non-critical exceptions at crawler top level
2021-07-13 15:42:11 +02:00
ee67f9f472
Sort elements by ILIAS id to ensure deterministic ordering
2021-07-06 17:45:48 +02:00
8ec3f41251
Crawl ilias booking objects as links
2021-07-06 16:15:25 +02:00
89be07d4d3
Use final crawl path in HTML parsing message
2021-07-03 17:05:48 +02:00
91200f3684
Fix nondeterministic name deduplication
2021-07-03 12:09:55 +02:00
6e4d423c81
Crawl all video stages in one crawl bar
This ensures folders are not renamed as a result of being crawled twice.
2021-06-13 17:18:45 +02:00
70ec64a48b
Fix wrong base URL for multi-stage pages
2021-06-13 15:44:47 +02:00
8ab462fb87
Use the exercise label instead of the button name as path
2021-06-04 19:24:23 +02:00
df3ad3d890
Add 'skip' option to crawlers
2021-06-04 18:47:13 +02:00
722970a255
Store cookies in text-based format
Using the stdlib's http.cookies module, cookies are now stored as one
"Set-Cookie" header per line. Previously, aiohttp.CookieJar's save() and
load() methods were used, which rely on pickling.
2021-05-31 20:18:20 +00:00
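A minimal sketch of the text-based cookie format, using only `http.cookies.SimpleCookie` from the standard library. The helper names `dump_cookies` and `parse_cookies` are hypothetical, and the sketch works on a plain `SimpleCookie` rather than on an `aiohttp.CookieJar`:

```python
from http.cookies import SimpleCookie


def dump_cookies(jar: SimpleCookie) -> str:
    # output() renders each morsel as a "Set-Cookie: ..." header;
    # sep="\n" puts one header on each line
    return jar.output(sep="\n") + "\n"


def parse_cookies(text: str) -> SimpleCookie:
    jar: SimpleCookie = SimpleCookie()
    for line in text.splitlines():
        # Header names are case-insensitive, so compare the prefix in lower case;
        # lines that are not Set-Cookie headers are skipped
        if line[:11].lower() == "set-cookie:":
            jar.load(line[11:].strip())
    return jar


if __name__ == "__main__":
    jar: SimpleCookie = SimpleCookie()
    jar["session"] = "abc123"
    jar["session"]["path"] = "/"
    text = dump_cookies(jar)
    print(text, end="")  # Set-Cookie: session=abc123; Path=/
    print(parse_cookies(text)["session"].value)  # abc123
```

Saving to disk is then just `path.write_text(dump_cookies(jar), encoding="utf-8")`. Unlike a pickle, the resulting file can be read and edited by hand.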
f40820c41f
Warn if using concurrent tasks with kit-ilias-web
2021-05-31 20:18:20 +00:00
1fba96abcb
Fix exercise date parsing for non-group submissions
ILIAS apparently changes the order of the fields as it sees fit, so we
now try to parse *every* column as a date, starting from the right.
The first column that parses successfully is then used.
2021-05-31 18:15:12 +02:00
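A sketch of the right-to-left search described above. The `%d.%m.%Y %H:%M` format and the `parse_date_from_row` helper are assumptions for illustration; the date format ILIAS actually renders may differ:

```python
from datetime import datetime
from typing import Optional, Sequence

# Hypothetical date format; the format ILIAS actually renders may differ.
DATE_FORMAT = "%d.%m.%Y %H:%M"


def parse_date_from_row(columns: Sequence[str]) -> Optional[datetime]:
    # Column order is not stable, so try every column, starting from the right.
    for column in reversed(columns):
        try:
            return datetime.strptime(column.strip(), DATE_FORMAT)
        except ValueError:
            continue  # not a date in this format; try the next column to the left
    return None  # no column parsed as a date
```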
7b062883f6
Use raw paths for --debug-transforms
Previously, the already-transformed paths were used, which meant that
--debug-transforms was cumbersome to use (as you had to remove all transforms
and crawl once before getting useful results).
2021-05-31 12:33:37 +02:00
64a2960751
Align paths in status messages and progress bars
Also print "Ignored" when paths are ignored due to transforms
2021-05-31 12:32:42 +02:00
1ca6740e05
Improve log messages when parsing ILIAS HTML
Previously some logs were split around an "await", which isn't a great
idea.
2021-05-27 17:59:22 +02:00
474aa7e1cc
Use sorted path order when debugging transforms
2021-05-27 15:41:00 +00:00
5beb4d9a2d
Fix renaming conflict with multi-stage video elements
2021-05-27 15:41:00 +02:00
19eed5bdff
Fix authentication logic conflicts with videos
2021-05-27 15:41:00 +02:00
533f75ea71
Add --debug-transforms flag
2021-05-26 11:37:32 +02:00
2d8dcc87ff
Send CSRF token in TFA request
2021-05-25 22:50:40 +02:00
66f0e398a1
Await result in tfa authenticate path
2021-05-25 19:19:51 +02:00
263780e6a3
Use certifi to ensure CA certificates are bundled in pyinstaller
2021-05-25 18:24:06 +02:00
a848194601
Rename plaintext link option to "plaintext"
2021-05-25 17:15:13 +02:00
aabce764ac
Clean up TODOs
2021-05-25 15:54:01 +02:00
486699cef3
Create anonymous TFA authenticator in ilias crawler
This ensures that *some* TFA authenticator is always present when
authenticating, even if none is specified in the config.
The TfaAuthenticator does not depend on any configured values, so it can
be created on-demand.
2021-05-25 15:11:52 +02:00
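A sketch of the on-demand fallback. `IliasCrawler` and `TfaAuthenticator` here are simplified hypothetical stand-ins, not the project's real classes:

```python
from typing import Optional


class TfaAuthenticator:
    """Hypothetical stand-in: asks for a one-time code and needs no config."""

    def get_token(self) -> str:
        return input("TFA token: ")


class IliasCrawler:
    def __init__(self, tfa_auth: Optional[TfaAuthenticator] = None) -> None:
        self._tfa_auth = tfa_auth  # None if nothing was configured

    def tfa_authenticator(self) -> TfaAuthenticator:
        # Fall back to an anonymous authenticator created on demand. This is
        # safe because the authenticator has no configured values.
        if self._tfa_auth is None:
            self._tfa_auth = TfaAuthenticator()
        return self._tfa_auth
```

The fallback is created once and then reused, so repeated logins during one crawl share the same authenticator.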
61430c8739
Overhaul config and CLI option names
2021-05-25 14:23:38 +02:00
eb8b915813
Fix path prefix on windows
Previously, the path prefix was only set if "windows_paths" was true, regardless
of OS. Now the path prefix is always set on Windows and never set on other OSes.
2021-05-25 14:23:38 +02:00
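A sketch of the OS-based decision, assuming the prefix in question is the Windows extended-length path prefix `\\?\` (which lifts the 260-character `MAX_PATH` limit and is only valid in front of absolute paths). `output_root` and its `os_name` parameter are hypothetical; the parameter only exists so the logic can be exercised on any OS:

```python
import os
from typing import Optional

# Assumption: the prefix is the Windows extended-length path prefix "\\?\",
# which lifts the MAX_PATH limit. It is only valid in front of absolute paths.
EXTENDED_PREFIX = "\\\\?\\"


def output_root(absolute_root: str, os_name: Optional[str] = None) -> str:
    # Decide by the OS we run on, not by a "windows_paths" style option.
    if os_name is None:
        os_name = os.name
    if os_name == "nt":
        return EXTENDED_PREFIX + absolute_root
    return absolute_root
```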
651b087932
Use cl/dl deduplication mechanism for ILIAS crawler
2021-05-25 12:15:38 +02:00
bce3dc384d
Deduplicate path names in crawler
Also rename files so they follow the restrictions for Windows file names if
we're on Windows.
2021-05-25 12:11:15 +02:00
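A sketch of Windows-safe renaming based on the restrictions Microsoft documents (forbidden characters, trailing dots and spaces, reserved device names). Which of these rules the crawler enforces, and the `_` replacement character, are assumptions:

```python
import re

# Characters Windows forbids in file names (see Microsoft's "Naming Files,
# Paths, and Namespaces"); control characters 0-31 are forbidden as well.
FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names Windows reserves regardless of extension.
RESERVED = {"CON", "PRN", "AUX", "NUL",
            *(f"COM{i}" for i in range(1, 10)),
            *(f"LPT{i}" for i in range(1, 10))}


def windows_safe_name(name: str, replacement: str = "_") -> str:
    # Replace forbidden characters (the replacement character is an assumption).
    name = FORBIDDEN.sub(replacement, name)
    # Windows drops trailing dots and spaces, so strip them explicitly.
    name = name.rstrip(". ")
    # Reserved device names are forbidden even with an extension, e.g. "NUL.txt".
    if name.split(".")[0].upper() in RESERVED:
        name = replacement + name
    # Never return an empty name.
    return name or replacement
```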
ffda4e43df
Add extension to link files
2021-05-25 11:41:57 +02:00
69cb2a7734
Add Links option to ilias crawler
This allows you to configure what type the link files should have and
whether to create them at all.
2021-05-25 11:41:57 +02:00
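A sketch of such an option as a Python enum. Only the `plaintext` value is named in this log (see commit a848194601); `ignore`, `internet-shortcut`, the file extensions and the helper methods are hypothetical:

```python
from enum import Enum
from typing import Optional


class Links(Enum):
    # "plaintext" is named in the log; the other members are hypothetical.
    IGNORE = "ignore"
    PLAINTEXT = "plaintext"
    INTERNET_SHORTCUT = "internet-shortcut"

    def extension(self) -> Optional[str]:
        # File extension for the link file, or None if no file is created.
        return {Links.PLAINTEXT: ".txt", Links.INTERNET_SHORTCUT: ".url"}.get(self)

    def render(self, url: str) -> Optional[str]:
        # Returns the file content, or None if no link file should be created.
        if self is Links.PLAINTEXT:
            return url + "\n"
        if self is Links.INTERNET_SHORTCUT:
            # Windows ".url" files are INI-style: an [InternetShortcut] section with a URL key
            return f"[InternetShortcut]\nURL={url}\n"
        return None
```

Parsing a config value is then `Links("plaintext")`, which raises `ValueError` for unknown values.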
85f89a7ff3
Interpret accordions and expandable headers as virtual folders
This allows us to find a file named "Test" in an accordion "Acc" as "Acc/Test".
2021-05-24 18:54:26 +02:00
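A sketch of the virtual-folder idea: each accordion contributes its title as a path segment for the elements inside it. The `Element` model and `file_paths` helper are hypothetical simplifications of parsed ILIAS HTML:

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Iterator, List


@dataclass
class Element:
    # Hypothetical model of a parsed page element.
    name: str
    children: List["Element"] = field(default_factory=list)
    virtual_folder: bool = False  # True for accordions and expandable headers


def file_paths(elements: List[Element],
               parent: PurePosixPath = PurePosixPath()) -> Iterator[PurePosixPath]:
    for element in elements:
        path = parent / element.name
        if element.virtual_folder:
            # The accordion's title becomes a folder for its contents.
            yield from file_paths(element.children, path)
        else:
            yield path
```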
9ce20216b5
Do not set a timeout for whole HTTP request
Downloads might take longer!
2021-05-24 18:54:26 +02:00
86ba47541b
Fix cookie loading and saving
2021-05-24 16:55:11 +02:00
492ec6a932
Detect and skip ILIAS tests
2021-05-24 16:36:15 +02:00
342076ee0e
Handle exercise detail containers in ILIAS html parser
2021-05-24 16:22:51 +02:00
d44f6966c2
Log authentication attempts in HTTP crawler
2021-05-24 16:22:11 +02:00
c687d4a51a
Implement cookie sharing
2021-05-24 13:10:44 +02:00
fca62541ca
De-duplicate element names in ILIAS crawler
This prevents any conflicts caused by multiple files with the same name.
Conflicts may still arise due to transforms, but that is out of our
control and a user error.
2021-05-24 00:24:31 +02:00
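A sketch of name de-duplication that appends a counter before the file extension. The `name (1).ext` suffix scheme and the `deduplicate` helper are assumptions. Note that the result depends on input order, which is why deterministic ordering matters (compare commits ee67f9f472 and 91200f3684):

```python
from pathlib import PurePosixPath
from typing import List, Set


def deduplicate(names: List[str]) -> List[str]:
    # Keep the first occurrence of each name; rename later duplicates to
    # "stem (1).ext", "stem (2).ext", ... (suffix scheme is an assumption).
    used: Set[str] = set()
    result: List[str] = []
    for name in names:
        path = PurePosixPath(name)
        candidate = name
        counter = 1
        # Skip counters whose names are already taken, including by earlier inputs.
        while candidate in used:
            candidate = f"{path.stem} ({counter}){path.suffix}"
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result
```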
3ab3581f84
Add timeout for HTTP connection
2021-05-23 23:41:05 +02:00
27b5a8e490
Rename log.action to log.status
2021-05-23 22:40:33 +02:00
ce1dbda5b4
Overhaul colours
"Crawled" and "Downloaded" are now printed less bright than "Crawling" and
"Downloading" as they're not as important. Explain topics are printed in yellow
to stand out a bit more from the cyan action messages.
2021-05-23 21:33:04 +02:00
6ca0ecdf05
Load and store reports
2021-05-23 20:46:29 +02:00