Commit Graph

578 Commits

Author SHA1 Message Date
I-Al-Istannen
6289938d7c Do not stop crawling files when encountering a CrawlWarning 2021-11-06 12:09:51 +01:00
I-Al-Istannen
13b8c3d9c6 Add regex option to config and CLI parser 2021-11-02 09:30:46 +01:00
I-Al-Istannen
88afe64a92 Refactor IPD crawler a bit 2021-11-02 01:25:01 +00:00
Julius Rüberg
6b2a657573 Fix IPD crawler for different subpages (#42)
This patch reworks the IPD crawler to support subpages which do not use
"/intern" for links and fetches the folder names from table headings.
2021-11-02 01:25:01 +00:00
Toorero
d6f38a61e1 Fixed minor spelling mistakes 2021-11-02 01:54:00 +01:00
I-Al-Istannen
ad3f4955f7 Update changelog 2021-10-30 18:14:39 +02:00
I-Al-Istannen
e42ab83d32 Add support for ILIAS cards 2021-10-30 18:13:44 +02:00
I-Al-Istannen
f9a3f9b9f2 Handle multi-stream videos 2021-10-30 18:12:29 +02:00
I-Al-Istannen
ef7d5ea2d3 Allow storing crawler-specific data in reports 2021-10-30 18:09:05 +02:00
lukasprobst
55ea304ff3 Disable interpolation of ConfigParser 2021-10-25 23:37:42 +02:00
Joscha
fee12b3d9e Fix changelog 2021-10-25 17:44:12 +00:00
I-Al-Istannen
6673077397 Add kit-ipd crawler 2021-10-21 13:20:21 +02:00
Joscha
742632ed8d Bump version to 3.2.0 2021-08-04 18:27:26 +00:00
Joscha
544d45cbc5 Catch non-critical exceptions at crawler top level 2021-07-13 15:42:11 +02:00
Joscha
86f79ff1f1 Update changelog 2021-07-07 15:23:58 +02:00
I-Al-Istannen
ee67f9f472 Sort elements by ILIAS id to ensure deterministic ordering 2021-07-06 17:45:48 +02:00
I-Al-Istannen
8ec3f41251 Crawl ilias booking objects as links 2021-07-06 16:15:25 +02:00
I-Al-Istannen
89be07d4d3 Use final crawl path in HTML parsing message 2021-07-03 17:05:48 +02:00
I-Al-Istannen
91200f3684 Fix nondeterministic name deduplication 2021-07-03 12:09:55 +02:00
Joscha
9ffd603357 Error when using multiple segments with -name->
Previously, PFERD just silently never matched the -name-> arrow. Now, it errors
when loading the config file.
2021-07-01 11:14:50 +02:00
Joscha
80eeb8fe97 Add --skip option 2021-07-01 11:02:21 +02:00
Joscha
75fde870c2 Bump version to 3.1.0 2021-06-13 17:23:18 +02:00
I-Al-Istannen
6e4d423c81 Crawl all video stages in one crawl bar
This ensures folders are not renamed, as they are crawled twice
2021-06-13 17:18:45 +02:00
Joscha
57aef26217 Fix name arrows
I seem to have (re-)implemented them incorrectly and never tested them.
2021-06-13 16:33:29 +02:00
I-Al-Istannen
70ec64a48b Fix wrong base URL for multi-stage pages 2021-06-13 15:44:47 +02:00
Joscha
70b33ecfd9 Add migration notes to changelog
Also clean up some other formatting for consistency
2021-06-13 15:06:50 +02:00
Joscha
601e4b936b Use new arrow logic in README example config 2021-06-12 15:00:52 +02:00
Joscha
a292c4c437 Add example for ">>" arrow heads 2021-06-12 14:57:29 +02:00
Joscha
bc65ea7ab6 Fix mypy complaining about missing type hints 2021-06-09 22:45:52 +02:00
Joscha
f28bbe6b0c Update transform rule documentation
It's still missing an example that uses rules with ">>" arrows.
2021-06-09 22:45:52 +02:00
Joscha
61d902d715 Overhaul transform logic
-re-> arrows now rename their parent directories (like -->) and don't require a
full match (like -exact->). Their old behaviour is available as -exact-re->.

Also, this change adds the ">>" arrow head, which modifies the current path and
continues to the next rule when it matches.
2021-06-09 22:45:52 +02:00
I-Al-Istannen
8ab462fb87 Use the exercise label instead of the button name as path 2021-06-04 19:24:23 +02:00
Joscha
df3ad3d890 Add 'skip' option to crawlers 2021-06-04 18:47:13 +02:00
Joscha
fc31100a0f Always use '/' as path separator for regex rules
Previously, regex-matching paths on windows would, in some cases, require four
backslashes ('\\\\') to escape a single path separator. That's just too much.

With this commit, regex transforms now use '/' instead of '\' as path separator,
meaning rules can more easily be shared between platforms (although they are not
guaranteed to be 100% compatible since on Windows, '\' is still recognized as a
path separator).

To make rules more intuitive to write, local relative paths are now also printed
with '/' as path separator on Windows. Since Windows also accepts '/' as path
separator, this change doesn't really affect other rules that parse their sides
as paths.
2021-06-04 18:12:45 +02:00
Joscha
31b6311e99 Remove incorrect tmp file explain message 2021-06-01 19:03:06 +02:00
Joscha
1fc8e9eb7a Document credential file authenticator config options 2021-06-01 10:01:14 +00:00
Joscha
85b9f45085 Bump version to 3.0.1 2021-06-01 09:49:30 +00:00
Joscha
f656e3ff34 Fix credential parsing 2021-06-01 09:18:17 +00:00
Joscha
e1bda94329 Load credential file from correct path 2021-06-01 09:18:08 +00:00
Joscha
f6b26f4ead Fix unexpected exception when credential file not found 2021-06-01 09:10:58 +00:00
Joscha
722970a255 Store cookies in text-based format
Using the stdlib's http.cookie module, cookies are now stored as one
"Set-Cookie" header per line. Previously, the aiohttp.CookieJar's save() and
load() methods were used (which use pickling).
2021-05-31 20:18:20 +00:00
Joscha
f40820c41f Warn if using concurrent tasks with kit-ilias-web 2021-05-31 20:18:20 +00:00
Joscha
49ad1b6e46 Clean up authenticator code formatting 2021-05-31 18:45:06 +02:00
Joscha
1ce32d2f18 Add CLI option for credential file auth to kit-ilias-web 2021-05-31 18:45:06 +02:00
Joscha
9d5ec84b91 Add credential file authenticator 2021-05-31 18:33:34 +02:00
I-Al-Istannen
1fba96abcb Fix exercise date parsing for non-group submissions
ILIAS apparently changes the order of the fields as it sees fit, so we
now try to parse *every* column, starting at from the right, as a date.
The first column that parses successfully is then used.
2021-05-31 18:15:12 +02:00
Joscha
921cec7ddc Bump version to 3.0.0 2021-05-31 12:49:04 +02:00
Joscha
7b062883f6 Use raw paths for --debug-transforms
Previously, the already-transformed paths were used, which meant that
--debug-transforms was cumbersome to use (as you had to remove all transforms
and crawl once before getting useful results).
2021-05-31 12:33:37 +02:00
Joscha
64a2960751 Align paths in status messages and progress bars
Also print "Ignored" when paths are ignored due to transforms
2021-05-31 12:32:42 +02:00
Joscha
17879a7f69 Print box around message for unexpected exceptions 2021-05-31 12:05:49 +02:00