I-Al-Istannen
81301f3a76
Rename the ilias crawler to ilias web crawler
2021-05-19 21:41:17 +02:00
I-Al-Istannen
2976b4d352
Move ILIAS file templates to own file
2021-05-19 21:37:10 +02:00
I-Al-Istannen
9f03702e69
Split up ilias crawler in multiple files
...
The ilias crawler contained a crawler and an HTML parser, now they are
split in two.
2021-05-19 21:34:36 +02:00
Joscha
3300886120
Explain config file loading
2021-05-19 18:11:43 +02:00
Joscha
0d10752b5a
Configure explain log level via cli and config file
2021-05-19 17:50:10 +02:00
Joscha
92886fb8d8
Implement --version flag
2021-05-19 17:33:36 +02:00
Joscha
5916626399
Make noqua comment more specific
2021-05-19 17:16:59 +02:00
Joscha
a7c025fd86
Implement reusable FileSinkToken for OutputDirectory
2021-05-19 17:16:23 +02:00
Joscha
b7a999bc2e
Clean up crawler exceptions and (a)noncritical
2021-05-19 13:25:57 +02:00
Joscha
3851065500
Fix local crawler's download bars
...
Display the pure path instead of the local path.
2021-05-18 23:23:40 +02:00
Joscha
4b68fa771f
Move logging logic to singleton
...
- Renamed module and class because "conductor" didn't make a lot of sense
- Used singleton approach (there's only one stdout after all)
- Redesigned progress bars (now with download speed!)
2021-05-18 22:45:19 +02:00
I-Al-Istannen
1525aa15a6
Fix link template error and use indeterminate progress bar
2021-05-18 22:40:28 +02:00
I-Al-Istannen
db1219d4a9
Create a link file in ILIAS crawler
...
This allows us to crawl links and represent them in the file system.
Users can choose between an ILIAS-imitation (that optionally
auto-redirects) and a plain text variant.
2021-05-17 21:44:54 +02:00
I-Al-Istannen
b8efcc2ca5
Respect filters in ILIAS crawler
2021-05-17 21:30:26 +02:00
Joscha
0bae009189
Run formatting tools
2021-05-16 14:32:53 +02:00
I-Al-Istannen
8b76ebb3ef
Rename IliasCrawler to KitIliasCrawler
2021-05-16 13:28:06 +02:00
I-Al-Istannen
2b6235dc78
Fix pylint warnings (and 2 found bugs) in ILIAS crawler
2021-05-16 13:17:12 +02:00
I-Al-Istannen
1c226c31aa
Add some repeat annotations to the ILIAS crawler
2021-05-16 13:01:56 +02:00
I-Al-Istannen
9ec0d3e16a
Implement date-demangling in ILIAS crawler
2021-05-16 13:01:56 +02:00
I-Al-Istannen
cf6903d109
Retry crawling on I/O failure
2021-05-16 13:01:56 +02:00
Joscha
9fd356d290
Ensure tmp files are deleted
...
This doesn't seem to fix the case where an exception bubbles up to the top of
the event loop. It also doesn't seem to fix the case when a KeyboardInterrupt is
thrown, since that never makes its way into the event loop in the first place.
Both of these cases lead to the event loop stopping, which means that the tmp
file cleanup doesn't get executed even though it's inside a "with" or "finally".
2021-05-15 23:00:40 +02:00
Joscha
989032fe0c
Fix cookies getting deleted
2021-05-15 22:25:48 +02:00
Joscha
05573ccc53
Add fancy CLI options
2021-05-15 22:22:01 +02:00
I-Al-Istannen
c454fabc9d
Add support for exercises in ILIAS crawler
2021-05-15 21:40:17 +02:00
I-Al-Istannen
7d323ec62b
Implement video downloads in ilias crawler
2021-05-15 21:32:32 +02:00
I-Al-Istannen
c7494e32ce
Start implementing crawling in ILIAS crawler
...
The ilias crawler can now crawl quite a few filetypes, splits off
folders and crawls them concurrently.
2021-05-15 20:42:18 +02:00
I-Al-Istannen
1123c8884d
Implement an IliasPage
...
This allows PFERD to semantically understand ILIAS HTML and is the
foundation for the ILIAS crawler. This patch extends the ILIAS crawler
to crawl the personal desktop and print the elements on it.
2021-05-15 18:59:23 +02:00
Joscha
e1104f888d
Add tfa authenticator
2021-05-15 18:27:16 +02:00
Joscha
8c32da7f19
Let authenticators provide username and password separately
2021-05-15 18:27:03 +02:00
Joscha
d63494908d
Properly invalidate exceptions
...
The simple authenticator now properly invalidates its credentials. Also, the
invalidation functions have been given better names and documentation.
2021-05-15 17:37:05 +02:00
Joscha
b70b62cef5
Make crawler sections start with "crawl:"
...
Also, use only the part of the section name after the "crawl:" as the crawler's
output directory. Now, the implementation matches the documentation again
2021-05-15 17:24:37 +02:00
Joscha
868f486922
Rename local crawler path to target
2021-05-15 17:12:25 +02:00
I-Al-Istannen
b2a2b5999b
Implement ILIAS auth and crawl home page
...
This commit introduces the necessary machinery to authenticate with
ILIAS and crawl the home page.
It can't do much yet and just silently fetches the homepage.
2021-05-15 15:25:05 +02:00
Joscha
595de88d96
Fix authenticator and crawler names
...
Now, the "auth:" and "crawl:" parts are considered part of the name. This fixes
crawlers not being able to find their authenticators.
2021-05-15 15:25:05 +02:00
Joscha
a6fdf05ee9
Allow variable whitespace in arrow rules
2021-05-15 15:25:05 +02:00
Joscha
f897d7c2e1
Add name variants for all arrows
2021-05-15 15:25:05 +02:00
Joscha
b0f731bf84
Make crawlers use transformers
2021-05-15 15:25:05 +02:00
Joscha
302b8c0c34
Fix errors loading local crawler config
...
Apparently getint and getfloat may return a None even though this is not
mentioned in their type annotations.
2021-05-15 15:25:05 +02:00
Joscha
acd674f0a0
Change limiter logic
...
Now download tasks are a subset of all tasks.
2021-05-15 15:25:05 +02:00
Joscha
ed2e19a150
Add reasons for invalid values
2021-05-15 15:25:05 +02:00
Joscha
296a169dd3
Make limiter logic more complex
...
The limiter can now distinguish between crawl and download actions and has a
fancy slot system and delay logic.
2021-05-15 15:25:05 +02:00
Joscha
1591cb9197
Add options to slow down local crawler
...
These options are meant to make the local crawler behave more like a
network-based crawler for purposes of testing and debugging other parts of the
code base.
2021-05-15 15:25:01 +02:00
Joscha
0c9167512c
Fix output dir
...
I missed these while renaming the resolve function. Shame on me for not running
mypy earlier.
2021-05-14 21:28:38 +02:00
Joscha
a673ab0fae
Delete old files
...
I should've done this earlier
2021-05-14 21:27:44 +02:00
Joscha
6e5fdf4e9e
Set user agent to "pferd/<version>"
2021-05-14 21:27:44 +02:00
Joscha
93a5a94dab
Single-source version number
2021-05-14 21:27:44 +02:00
Joscha
d565df27b3
Add HttpCrawler
2021-05-13 22:28:14 +02:00
Joscha
e3ee4e515d
Disable highlighting of primitives
...
This commit prevents rich from highlighting python-looking syntax like numbers,
arrays, 'None' etc.
2021-05-13 19:47:44 +02:00
Joscha
94d6a01cca
Use file mtime in local crawler
2021-05-13 19:42:40 +02:00
Joscha
38bb66a776
Update file metadata in more cases
...
PFERD now not only updates file metadata when a file is successfully added or
changed, but also when a file is downloaded and then detected to be unchanged.
This could occur for example if a remote file's modification time was bumped,
possibly because somebody touched the file without changing it.
2021-05-13 19:40:10 +02:00