Philipp Fruck
eeebb65405
add client_id and base_url to crawler section
those two arguments are required for generic ilias instances
2023-03-23 17:04:09 +01:00
Philipp Fruck
34a96da869
add dedicated ilias-web crawler command
2023-03-22 23:43:00 +01:00
Philipp Fruck
256cd14637
add common ilias config parser
common options should be shared between KIT and generic Ilias instances
2023-03-22 23:41:46 +01:00
Joscha
6d44aac278
Bump version to 3.4.3
2022-11-29 18:22:19 +01:00
c0derMo
55a2de6b88
Fix crawling English opencast
2022-11-29 18:13:56 +01:00
Joscha
c0d6d8b229
Use url after redirect for relative links
2022-11-21 18:10:45 +01:00
Joscha
635caa765d
Fix typo
Thanks, burg113
2022-11-15 17:17:57 +01:00
Pavel Zwerschke
e69b55b349
Add more unofficial package managers (#66)
2022-11-04 12:18:26 +01:00
Joscha
07200bbde5
Document ilias web crawler's forums option
2022-10-31 14:12:27 +01:00
I-Al-Istannen
c020cccc64
Include found paths in "second path found" warning
2022-10-29 14:08:29 +02:00
Joscha
259cfc20cc
Bump version to 3.4.2
2022-10-26 18:26:17 +02:00
Joscha
37b51a66d8
Update changelog
2022-10-26 18:22:37 +02:00
I-Al-Istannen
f47d2f11d8
Append trailing slash to kit-ipd links to ensure urljoin works as expected
2022-10-25 20:28:22 +02:00
I-Al-Istannen
1b6be6bd79
Handle content pages in cards
2022-10-24 18:37:26 +02:00
I-Al-Istannen
e1430e6298
Handle (and ignore) surveys
2022-10-24 18:37:26 +02:00
I-Al-Istannen
5fdd40204b
Unwrap future meetings when ILIAS hides them behind a pagination
2022-10-24 14:33:58 +02:00
I-Al-Istannen
fb4631ba18
Fix ilias background login
2022-10-24 13:13:36 +02:00
I-Al-Istannen
d72fc2760b
Handle empty forums
2022-10-24 13:12:17 +02:00
I-Al-Istannen
4a51aaa4f5
Fix forum crawling crashing for empty threads
2022-10-19 22:59:33 +02:00
Joscha
66a5b1ba02
Bump version to 3.4.1
2022-08-17 13:24:01 +02:00
I-Al-Istannen
aa5a3a10bc
Adjust changelog
2022-08-14 21:48:59 +02:00
I-Al-Istannen
d9b111cec2
Correctly nest description entries
2022-08-14 21:45:33 +02:00
I-Al-Istannen
345f52a1f6
Detect new login button
2022-08-14 21:41:29 +02:00
Joscha
ed24366aba
Add pass authenticator
2022-06-05 10:04:42 +02:00
I-Al-Istannen
46fb782798
Add forum crawling
This downloads all forum posts when needed and saves each thread in its
own html file, named after the thread title.
2022-05-24 23:43:53 +02:00
I-Al-Istannen
846c29aee1
Download page descriptions
2022-05-11 21:16:56 +02:00
I-Al-Istannen
a5015fe9b1
Correctly parse day-only meeting dates
I failed to recognize the correct format in the previous adjustment, so
this (hopefully) fixes it for good.
Meetings apparently don't always have a time portion.
2022-05-08 23:22:26 +02:00
Joscha
616b0480f7
Simplify IPD crawler link regex
2022-05-08 18:18:05 +02:00
I-Al-Istannen
2f0e04ce13
Adjust changelog
2022-05-05 22:57:55 +02:00
I-Al-Istannen
bcc537468c
Fix crawling of expanded meetings
The last meeting on every page is expanded by default.
Its content is then shown inline *and* in the meeting page itself.
We should skip the inline content.
2022-05-05 22:53:37 +02:00
I-Al-Istannen
694ffb4d77
Fix meeting date parsing
Apparently the new pattern "<relative time qualifier>: <date>," was
added. This patch adds support for it.
2022-05-05 22:28:30 +02:00
Joscha
af2cc1169a
Mention href for users of link_regex option
2022-05-05 14:36:03 +02:00
Joscha
bc3fa36637
Fix IPD crawler crashing on weird HTML comments
2022-05-05 14:35:42 +02:00
Joscha
afbd03f777
Fix docs
2022-05-05 14:35:42 +02:00
I-Al-Istannen
b8fe25c580
Add .cpp to ipd link regex
2022-05-04 14:19:26 +02:00
Joscha
a241672726
Bump version to 3.4.0
2022-05-01 22:29:06 +02:00
Joscha
a8f76e9be7
Use utf-8 for credential file
2022-04-29 23:15:12 +02:00
Joscha
b56475450d
Use utf-8 for cookies
2022-04-29 23:12:41 +02:00
Joscha
aa74604d29
Use utf-8 for report
2022-04-29 23:11:27 +02:00
Joscha
d2e6d91880
Make PFERD executable via python -m
2022-04-27 22:52:50 +02:00
Joscha
602044ff1b
Fix mypy errors and add missing await
2022-04-27 22:52:50 +02:00
Joscha
31631fb409
Increase minimum python version to 3.9
2022-04-27 22:52:50 +02:00
I-Al-Istannen
00db348218
Update changelog
2022-04-27 22:03:52 +02:00
I-Al-Istannen
a709280cbf
Try to detect unsupported config file encoding
The encoding detection is quite rudimentary, but should detect the
default windows encoding in many cases.
2022-04-27 22:03:47 +02:00
I-Al-Istannen
a99ddaa0cc
Read and write config in UTF-8
2022-04-27 21:47:51 +02:00
Joscha
ba3d299c05
Fix changelog
2022-04-27 21:26:24 +02:00
Joscha
07a21f80a6
Link to unofficial packages
2022-04-27 21:15:33 +02:00
I-Al-Istannen
f17b9b68f4
Add shibboleth authentication fix to changelog
2022-04-27 14:01:40 +02:00
I-Al-Istannen
a2831fbea2
Fix shib authentication
Authentication failed previously if the shib session was still valid.
If Shibboleth gets a request and the session is still valid, it directly
responds without a second redirect.
2022-04-27 13:55:24 +02:00
I-Al-Istannen
da72863b47
Placate newer mypy
2022-04-03 13:19:08 +02:00