I-Al-Istannen
a99ddaa0cc
Read and write config in UTF-8
2022-04-27 21:47:51 +02:00
I-Al-Istannen
a2831fbea2
Fix shib authentication
...
Authentication failed previously if the shib session was still valid.
If Shibboleth gets a request and the session is still valid, it directly
responds without a second redirect.
2022-04-27 13:55:24 +02:00
I-Al-Istannen
da72863b47
Placate newer mypy
2022-04-03 13:19:08 +02:00
I-Al-Istannen
86e2e226dc
Notify user when shibboleth presents new entitlements
2022-04-03 11:37:08 +02:00
I-Al-Istannen
7872fe5221
Fix tables with more columns than expected
2022-01-18 22:38:48 +01:00
Joscha
86947e4874
Bump version to 3.3.1
2022-01-15 15:11:22 +01:00
Joscha
4f022e2d19
Reword changelog
2022-01-15 15:06:02 +01:00
I-Al-Istannen
f47e7374d2
Use fixed windows path for video cache
2022-01-15 12:00:30 +01:00
I-Al-Istannen
57ec51e95a
Fix login after shib url parser change
2022-01-14 20:17:27 +01:00
Joscha
0045124a4e
Bump version to 3.3.0
2022-01-09 21:09:09 +01:00
I-Al-Istannen
e467b38d73
Only reject 1970 timestamps on windows
2022-01-09 18:23:00 +01:00
I-Al-Istannen
4bf0c972e6
Update types for rich 11
2022-01-09 11:48:26 +01:00
I-Al-Istannen
4ee919625d
Add rudimentary support for content pages
2022-01-08 20:47:35 +01:00
I-Al-Istannen
d30f25ee97
Detect shib login page as login page
...
And do not assume we are logged in...
2022-01-08 20:28:45 +01:00
I-Al-Istannen
10d9d74528
Bail out when crawling recursive courses
2022-01-08 20:28:30 +01:00
I-Al-Istannen
43c5453e10
Correctly crawl files on desktop
...
The files on the desktop do not include a download link, so we need to
rewrite it.
2022-01-08 20:00:53 +01:00
I-Al-Istannen
eb4de8ae0c
Ignore 1970 dates as windows crashes when calling .timestamp()
2022-01-08 18:14:43 +01:00
I-Al-Istannen
e32c1f000f
Fix mtime for single streams
2022-01-08 18:05:48 +01:00
I-Al-Istannen
5f527bc697
Remove Python 3.9 Pattern typehints
2022-01-08 17:14:40 +01:00
I-Al-Istannen
ced8b9a2d0
Fix some accordions
2022-01-08 16:58:30 +01:00
I-Al-Istannen
6f3cfd4396
Fix personal desktop crawling
2022-01-08 16:58:15 +01:00
I-Al-Istannen
462d993fbc
Fix local video path cache (hopefully)
2022-01-08 00:27:48 +01:00
I-Al-Istannen
a99356f2a2
Fix video stream extraction
2022-01-08 00:27:34 +01:00
I-Al-Istannen
eac2e34161
Fix is_logged_in for ILIAS 7
2022-01-07 23:32:31 +01:00
I-Al-Istannen
a82a0b19c2
Collect crawler warnings/errors and include them in the report
2021-11-07 21:48:55 +01:00
I-Al-Istannen
90cb6e989b
Do not download single videos if cache does not exist
2021-11-06 23:21:15 +01:00
I-Al-Istannen
6289938d7c
Do not stop crawling files when encountering a CrawlWarning
2021-11-06 12:09:51 +01:00
I-Al-Istannen
13b8c3d9c6
Add regex option to config and CLI parser
2021-11-02 09:30:46 +01:00
I-Al-Istannen
88afe64a92
Refactor IPD crawler a bit
2021-11-02 01:25:01 +00:00
Julius Rüberg
6b2a657573
Fix IPD crawler for different subpages (#42)
...
This patch reworks the IPD crawler to support subpages which do not use
"/intern" for links and fetches the folder names from table headings.
2021-11-02 01:25:01 +00:00
I-Al-Istannen
e42ab83d32
Add support for ILIAS cards
2021-10-30 18:13:44 +02:00
I-Al-Istannen
f9a3f9b9f2
Handle multi-stream videos
2021-10-30 18:12:29 +02:00
I-Al-Istannen
ef7d5ea2d3
Allow storing crawler-specific data in reports
2021-10-30 18:09:05 +02:00
lukasprobst
55ea304ff3
Disable interpolation of ConfigParser
2021-10-25 23:37:42 +02:00
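Illustration for the commit above: a minimal sketch of what disabling interpolation changes in the stdlib ConfigParser. The section name, key and value are made up for this example, and the motivation (literal '%' characters in config values) is an assumption, since the commit has no body.

```python
from configparser import ConfigParser

# Hypothetical config text; "auth" and "password" are illustrative names only.
CONFIG_TEXT = "[auth]\npassword = 100%secret\n"


def read_password(interpolation_enabled: bool) -> str:
    # With the default BasicInterpolation, a lone '%' in a value makes
    # lookups fail. Passing interpolation=None returns values verbatim.
    if interpolation_enabled:
        parser = ConfigParser()
    else:
        parser = ConfigParser(interpolation=None)
    parser.read_string(CONFIG_TEXT)
    return parser["auth"]["password"]
```

With interpolation disabled the value is returned unchanged. With the default settings the same lookup raises a configparser.InterpolationError subclass.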
I-Al-Istannen
6673077397
Add kit-ipd crawler
2021-10-21 13:20:21 +02:00
Joscha
742632ed8d
Bump version to 3.2.0
2021-08-04 18:27:26 +00:00
Joscha
544d45cbc5
Catch non-critical exceptions at crawler top level
2021-07-13 15:42:11 +02:00
I-Al-Istannen
ee67f9f472
Sort elements by ILIAS id to ensure deterministic ordering
2021-07-06 17:45:48 +02:00
I-Al-Istannen
8ec3f41251
Crawl ILIAS booking objects as links
2021-07-06 16:15:25 +02:00
I-Al-Istannen
89be07d4d3
Use final crawl path in HTML parsing message
2021-07-03 17:05:48 +02:00
I-Al-Istannen
91200f3684
Fix nondeterministic name deduplication
2021-07-03 12:09:55 +02:00
Joscha
9ffd603357
Error when using multiple segments with -name->
...
Previously, PFERD just silently never matched the -name-> arrow. Now, it errors
when loading the config file.
2021-07-01 11:14:50 +02:00
Joscha
80eeb8fe97
Add --skip option
2021-07-01 11:02:21 +02:00
Joscha
75fde870c2
Bump version to 3.1.0
2021-06-13 17:23:18 +02:00
I-Al-Istannen
6e4d423c81
Crawl all video stages in one crawl bar
...
This ensures folders are not renamed as a result of being crawled twice
2021-06-13 17:18:45 +02:00
Joscha
57aef26217
Fix name arrows
...
I seem to have (re-)implemented them incorrectly and never tested them.
2021-06-13 16:33:29 +02:00
I-Al-Istannen
70ec64a48b
Fix wrong base URL for multi-stage pages
2021-06-13 15:44:47 +02:00
Joscha
61d902d715
Overhaul transform logic
...
-re-> arrows now rename their parent directories (like -->) and don't require a
full match (like -exact->). Their old behaviour is available as -exact-re->.
Also, this change adds the ">>" arrow head, which modifies the current path and
continues to the next rule when it matches.
2021-06-09 22:45:52 +02:00
I-Al-Istannen
8ab462fb87
Use the exercise label instead of the button name as path
2021-06-04 19:24:23 +02:00
Joscha
df3ad3d890
Add 'skip' option to crawlers
2021-06-04 18:47:13 +02:00
Joscha
fc31100a0f
Always use '/' as path separator for regex rules
...
Previously, regex-matching paths on Windows would, in some cases, require four
backslashes ('\\\\') to escape a single path separator. That's just too much.
With this commit, regex transforms now use '/' instead of '\' as path separator,
meaning rules can more easily be shared between platforms (although they are not
guaranteed to be 100% compatible since on Windows, '\' is still recognized as a
path separator).
To make rules more intuitive to write, local relative paths are now also printed
with '/' as path separator on Windows. Since Windows also accepts '/' as path
separator, this change doesn't really affect other rules that parse their sides
as paths.
2021-06-04 18:12:45 +02:00
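Illustration for the commit above: a minimal sketch of matching a regex rule against a path rendered with '/' separators. The helper name to_rule_string, the pattern and the example path are hypothetical and not taken from PFERD. The sketch only relies on pathlib's as_posix().

```python
import re
from pathlib import PurePath, PureWindowsPath

# Hypothetical rule pattern; with '/' as separator no backslash escaping is needed.
PATTERN = re.compile(r"lectures/(\d+)/slides\.pdf")


def to_rule_string(path: PurePath) -> str:
    # as_posix() renders the path with '/' separators on every platform.
    return path.as_posix()


windows_path = PureWindowsPath(r"lectures\03\slides.pdf")
match = PATTERN.fullmatch(to_rule_string(windows_path))
```

Here the Windows-style path is matched by the same pattern that would be used on other platforms.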
Joscha
31b6311e99
Remove incorrect tmp file explain message
2021-06-01 19:03:06 +02:00
Joscha
85b9f45085
Bump version to 3.0.1
2021-06-01 09:49:30 +00:00
Joscha
f656e3ff34
Fix credential parsing
2021-06-01 09:18:17 +00:00
Joscha
e1bda94329
Load credential file from correct path
2021-06-01 09:18:08 +00:00
Joscha
f6b26f4ead
Fix unexpected exception when credential file not found
2021-06-01 09:10:58 +00:00
Joscha
722970a255
Store cookies in text-based format
...
Using the stdlib's http.cookies module, cookies are now stored as one
"Set-Cookie" header per line. Previously, the aiohttp.CookieJar's save() and
load() methods were used (which use pickling).
2021-05-31 20:18:20 +00:00
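Illustration for the commit above: a minimal sketch of a text-based cookie file with one "Set-Cookie" header per line, built on the stdlib http.cookies module. The function names save_cookies and load_cookies are hypothetical, and this is not PFERD's actual code, which also has to move cookies in and out of an aiohttp cookie jar.

```python
from http.cookies import SimpleCookie
from pathlib import Path

PREFIX = "Set-Cookie: "


def save_cookies(cookies: SimpleCookie, path: Path) -> None:
    # Morsel.OutputString() renders "name=value; attr=..." without a header name.
    lines = [PREFIX + morsel.OutputString() for morsel in cookies.values()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def load_cookies(path: Path) -> SimpleCookie:
    cookies = SimpleCookie()
    for line in path.read_text(encoding="utf-8").splitlines():
        # Skip blank or unrelated lines; parse only "Set-Cookie" headers.
        if line.startswith(PREFIX):
            cookies.load(line[len(PREFIX):])
    return cookies
```

Unlike a pickled cookie jar, such a file can be inspected and edited by hand, and loading it does not execute arbitrary code.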
Joscha
f40820c41f
Warn if using concurrent tasks with kit-ilias-web
2021-05-31 20:18:20 +00:00
Joscha
49ad1b6e46
Clean up authenticator code formatting
2021-05-31 18:45:06 +02:00
Joscha
1ce32d2f18
Add CLI option for credential file auth to kit-ilias-web
2021-05-31 18:45:06 +02:00
Joscha
9d5ec84b91
Add credential file authenticator
2021-05-31 18:33:34 +02:00
I-Al-Istannen
1fba96abcb
Fix exercise date parsing for non-group submissions
...
ILIAS apparently changes the order of the fields as it sees fit, so we
now try to parse *every* column, starting from the right, as a date.
The first column that parses successfully is then used.
2021-05-31 18:15:12 +02:00
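Illustration for the commit above: a minimal sketch of the "try every column from the right" strategy. The function name find_date and the date format are assumptions for this example. The format ILIAS actually uses and PFERD's real parser may differ.

```python
from datetime import datetime
from typing import Optional, Sequence

# Assumed date format, for illustration only.
DATE_FORMAT = "%d.%m.%Y %H:%M"


def find_date(columns: Sequence[str]) -> Optional[datetime]:
    # Walk the columns from right to left and return the first one that parses.
    for text in reversed(columns):
        try:
            return datetime.strptime(text.strip(), DATE_FORMAT)
        except ValueError:
            continue
    return None
```

Because no column index is hard-coded, the lookup keeps working when the date moves to another column.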
Joscha
7b062883f6
Use raw paths for --debug-transforms
...
Previously, the already-transformed paths were used, which meant that
--debug-transforms was cumbersome to use (as you had to remove all transforms
and crawl once before getting useful results).
2021-05-31 12:33:37 +02:00
Joscha
64a2960751
Align paths in status messages and progress bars
...
Also print "Ignored" when paths are ignored due to transforms
2021-05-31 12:32:42 +02:00
Joscha
17879a7f69
Print box around message for unexpected exceptions
2021-05-31 12:05:49 +02:00
Joscha
1dd24551a5
Add link to repo in --version output
2021-05-31 11:44:17 +02:00
Joscha
84f775013f
Use event loop workaround only on Windows
...
This avoids an unnecessary one-second sleep on other platforms. However, a
better "fix" for this sleep would be a less ugly workaround on Windows.
2021-05-31 11:41:52 +02:00
I-Al-Istannen
1ca6740e05
Improve log messages when parsing ILIAS HTML
...
Previously some logs were split around an "await", which isn't a great
idea.
2021-05-27 17:59:22 +02:00
Joscha
474aa7e1cc
Use sorted path order when debugging transforms
2021-05-27 15:41:00 +00:00
I-Al-Istannen
5beb4d9a2d
Fix renaming conflict with multi-stage video elements
2021-05-27 15:41:00 +02:00
I-Al-Istannen
19eed5bdff
Fix authentication logic conflicts with videos
2021-05-27 15:41:00 +02:00
Joscha
6fa9cfd4c3
Fix error when capturing group is None
2021-05-27 15:41:00 +02:00
Joscha
80acc4b50d
Implement new name arrows
2021-05-27 13:43:02 +02:00
Joscha
533f75ea71
Add --debug-transforms flag
2021-05-26 11:37:32 +02:00
Joscha
adb5d4ade3
Print files that are *not* deleted by cleanup
...
These are files that are not present on the remote source any more, but still
present locally. They also show up in the report.
2021-05-26 10:58:19 +02:00
Joscha
a879c6ab6e
Fix function being printed
2021-05-26 10:54:01 +02:00
Joscha
915e42fd07
Fix report not being printed if pferd exits normally
2021-05-26 10:53:54 +02:00
I-Al-Istannen
2d8dcc87ff
Send CSRF token in TFA request
2021-05-25 22:50:40 +02:00
I-Al-Istannen
66f0e398a1
Await result in tfa authenticate path
2021-05-25 19:19:51 +02:00
Joscha
30be4e29fa
Add workaround for RuntimeError after program finishes on Windows
2021-05-25 16:34:22 +00:00
I-Al-Istannen
263780e6a3
Use certifi to ensure CA certificates are bundled in pyinstaller
2021-05-25 18:24:06 +02:00
Joscha
07a75a37c3
Fix FileNotFoundError on Windows
2021-05-25 15:57:03 +00:00
Joscha
f85b75df8c
Switch from exit() to sys.exit()
...
Pyinstaller doesn't recognize exit().
2021-05-25 17:33:38 +02:00
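Illustration for the commit above: the bare exit() helper is installed by Python's site module for interactive use, so it is not guaranteed to exist in every environment, whereas sys.exit() is always available after importing sys. A minimal sketch with a hypothetical main function and exit code:

```python
import sys
from typing import Sequence


def main(argv: Sequence[str]) -> int:
    if not argv:
        print("no command given", file=sys.stderr)
        # Raises SystemExit(2); works the same in frozen executables.
        sys.exit(2)
    return 0
```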
Joscha
519a7ef435
Split --dump-config into two options
...
--dump-config with its optional argument tended to consume the command name, so
it had to be split up.
2021-05-25 17:17:35 +02:00
I-Al-Istannen
a848194601
Rename plaintext link option to "plaintext"
2021-05-25 17:15:13 +02:00
Joscha
aabce764ac
Clean up TODOs
2021-05-25 15:54:01 +02:00
Joscha
5a331663e4
Rename functions for consistency
2021-05-25 15:49:06 +02:00
Joscha
40144f8bd8
Fix rule error messages
2021-05-25 15:47:09 +02:00
Joscha
f68849c65f
Fix rules not being parsed entirely
2021-05-25 15:42:46 +02:00
Joscha
edb52a989e
Print report even if exiting due to Ctrl+C
2021-05-25 15:35:36 +02:00
Joscha
980578d05a
Avoid downloading in some cases
...
Depending on how on_conflict is set, we can determine a few situations where
downloading is never necessary.
2021-05-25 15:20:30 +02:00
I-Al-Istannen
486699cef3
Create anonymous TFA authenticator in ilias crawler
...
This ensures that *some* TFA authenticator is always present when
authenticating, even if none is specified in the config.
The TfaAuthenticator does not depend on any configured values, so it can
be created on-demand.
2021-05-25 15:11:52 +02:00
I-Al-Istannen
0096a0c077
Remove section and config parameter from Authenticator
2021-05-25 15:11:33 +02:00
I-Al-Istannen
d905e95dbb
Allow invalidation of keyring authenticator
2021-05-25 15:02:35 +02:00
Joscha
61430c8739
Overhaul config and CLI option names
2021-05-25 14:23:38 +02:00
Joscha
eb8b915813
Fix path prefix on Windows
...
Previously, the path prefix was only set if "windows_paths" was true, regardless
of OS. Now the path prefix is always set on Windows and never set on other OSes.
2021-05-25 14:23:38 +02:00
Joscha
22c2259adb
Clean up authenticator exceptions
...
- Renamed to *Error for consistency
- Treating AuthError like CrawlError
2021-05-25 14:23:38 +02:00
Joscha
c15a1aecdf
Rename keyring authenticator file for consistency
2021-05-25 14:20:26 +02:00
I-Al-Istannen
651b087932
Use cl/dl deduplication mechanism for ILIAS crawler
2021-05-25 12:15:38 +02:00
Joscha
bce3dc384d
Deduplicate path names in crawler
...
Also rename files so they follow the restrictions for Windows file names if
we're on Windows.
2021-05-25 12:11:15 +02:00