I-Al-Istannen
df3514cd03
Crawl paginated past meetings
2023-08-29 12:41:21 +02:00
I-Al-Istannen
87b67e9271
Crawl files in the info tab
2023-08-29 12:41:15 +02:00
I-Al-Istannen
b54b3b979c
Remove size suffix for content pages
2023-08-27 11:43:05 +02:00
I-Al-Istannen
2184ac8040
Add support for ILIAS mediacast listings
2023-08-27 11:43:05 +02:00
I-Al-Istannen
68c398f1fe
Add support for ILIAS learning modules
2023-08-02 13:34:54 +02:00
I-Al-Istannen
d204dac8ce
Detect unexpected root page redirects and abort operation
2023-07-29 18:36:33 +02:00
I-Al-Istannen
6f30c6583d
Fix crawling of cards without descriptions
2023-03-21 23:52:33 +01:00
I-Al-Istannen
467fc526e8
Fix crawling of file/video cards
2023-03-21 23:52:24 +01:00
I-Al-Istannen
722d2eb393
Fix crawling of courses with preselected timeline tab
2023-03-21 23:36:47 +01:00
c0derMo
55a2de6b88
Fix crawling English opencast
2022-11-29 18:13:56 +01:00
I-Al-Istannen
c020cccc64
Include found paths in "second path found" warning
2022-10-29 14:08:29 +02:00
I-Al-Istannen
1b6be6bd79
Handle content pages in cards
2022-10-24 18:37:26 +02:00
I-Al-Istannen
e1430e6298
Handle (and ignore) surveys
2022-10-24 18:37:26 +02:00
I-Al-Istannen
5fdd40204b
Unwrap future meetings when ILIAS hides them behind pagination
2022-10-24 14:33:58 +02:00
I-Al-Istannen
d72fc2760b
Handle empty forums
2022-10-24 13:12:17 +02:00
I-Al-Istannen
4a51aaa4f5
Fix forum crawling crashing for empty threads
2022-10-19 22:59:33 +02:00
I-Al-Istannen
46fb782798
Add forum crawling
This downloads all forum posts when needed and saves each thread in its
own html file, named after the thread title.
2022-05-24 23:43:53 +02:00
I-Al-Istannen
846c29aee1
Download page descriptions
2022-05-11 21:16:56 +02:00
I-Al-Istannen
a5015fe9b1
Correctly parse day-only meeting dates
I failed to recognize the correct format in the previous adjustment, so
this (hopefully) fixes it for good.
Meetings apparently don't always have a time portion.
2022-05-08 23:22:26 +02:00
I-Al-Istannen
bcc537468c
Fix crawling of expanded meetings
The last meeting on every page is expanded by default.
Its content is then shown inline *and* in the meeting page itself.
We should skip the inline content.
2022-05-05 22:53:37 +02:00
I-Al-Istannen
694ffb4d77
Fix meeting date parsing
Apparently the new pattern "<relative time qualifier>: <date>," was
added. This patch adds support for it.
2022-05-05 22:28:30 +02:00
I-Al-Istannen
7872fe5221
Fix tables with more columns than expected
2022-01-18 22:38:48 +01:00
I-Al-Istannen
4ee919625d
Add rudimentary support for content pages
2022-01-08 20:47:35 +01:00
I-Al-Istannen
43c5453e10
Correctly crawl files on desktop
The files on the desktop do not include a download link, so we need to
rewrite their link into one.
2022-01-08 20:00:53 +01:00
I-Al-Istannen
5f527bc697
Remove Python 3.9 Pattern typehints
2022-01-08 17:14:40 +01:00
I-Al-Istannen
ced8b9a2d0
Fix some accordions
2022-01-08 16:58:30 +01:00
I-Al-Istannen
6f3cfd4396
Fix personal desktop crawling
2022-01-08 16:58:15 +01:00
I-Al-Istannen
a99356f2a2
Fix video stream extraction
2022-01-08 00:27:34 +01:00
I-Al-Istannen
e42ab83d32
Add support for ILIAS cards
2021-10-30 18:13:44 +02:00
I-Al-Istannen
f9a3f9b9f2
Handle multi-stream videos
2021-10-30 18:12:29 +02:00
I-Al-Istannen
ee67f9f472
Sort elements by ILIAS id to ensure deterministic ordering
2021-07-06 17:45:48 +02:00
I-Al-Istannen
8ec3f41251
Crawl ILIAS booking objects as links
2021-07-06 16:15:25 +02:00
I-Al-Istannen
6e4d423c81
Crawl all video stages in one crawl bar
This ensures folders are not renamed as a result of being crawled twice.
2021-06-13 17:18:45 +02:00
I-Al-Istannen
70ec64a48b
Fix wrong base URL for multi-stage pages
2021-06-13 15:44:47 +02:00
I-Al-Istannen
8ab462fb87
Use the exercise label instead of the button name as path
2021-06-04 19:24:23 +02:00
I-Al-Istannen
1fba96abcb
Fix exercise date parsing for non-group submissions
ILIAS apparently changes the order of the fields as it sees fit, so we
now try to parse *every* column, starting from the right, as a date.
The first column that parses successfully is then used.
2021-05-31 18:15:12 +02:00
I-Al-Istannen
1ca6740e05
Improve log messages when parsing ILIAS HTML
Previously some logs were split around an "await", which isn't a great
idea.
2021-05-27 17:59:22 +02:00
I-Al-Istannen
5beb4d9a2d
Fix renaming conflict with multi-stage video elements
2021-05-27 15:41:00 +02:00
Joscha
aabce764ac
Clean up TODOs
2021-05-25 15:54:01 +02:00
I-Al-Istannen
651b087932
Use cl/dl deduplication mechanism for ILIAS crawler
2021-05-25 12:15:38 +02:00
I-Al-Istannen
85f89a7ff3
Interpret accordions and expandable headers as virtual folders
This allows us to find a file named "Test" in an accordion "Acc" as "Acc/Test".
2021-05-24 18:54:26 +02:00
I-Al-Istannen
492ec6a932
Detect and skip ILIAS tests
2021-05-24 16:36:15 +02:00
I-Al-Istannen
342076ee0e
Handle exercise detail containers in ILIAS html parser
2021-05-24 16:22:51 +02:00
I-Al-Istannen
fca62541ca
De-duplicate element names in ILIAS crawler
This prevents any conflicts caused by multiple files with the same name.
Conflicts may still arise due to transforms, but that is out of our
control and a user error.
2021-05-24 00:24:31 +02:00
Joscha
2fdf24495b
Restructure crawling and auth related modules
2021-05-23 19:16:42 +02:00