Joscha
f9b2fd60e2
Document local crawler and auth
2021-05-09 01:33:47 +02:00
Joscha
60cd9873bc
Add local file crawler
2021-05-06 01:02:40 +02:00
Joscha
273d56c39a
Properly load crawler config
2021-05-05 23:45:10 +02:00
Joscha
5497dd2827
Add @noncritical and @repeat decorators
2021-05-05 23:36:54 +02:00
Joscha
bbfdadc463
Implement output directory
2021-05-05 18:08:34 +02:00
Joscha
07e831218e
Add sync report
2021-05-02 00:56:10 +02:00
Joscha
91c33596da
Load crawlers from config file
2021-04-30 16:22:14 +02:00
Joscha
e7a51decb0
Elaborate on transforms and implement changes
2021-04-29 20:24:18 +02:00
Joscha
f776186480
Use PurePath instead of Path
Path should only be used when we need to access the file system. For all other
purposes (mainly crawling), we use PurePath instead since the paths don't
correspond to paths in the local file system.
2021-04-29 20:20:25 +02:00
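The PurePath/Path split described in f776186480 can be illustrated with a minimal sketch. This is not PFERD's actual code: `local_target` and the example file names are hypothetical, chosen only to show that crawled paths are handled lexically and that a real `Path` appears only where the file system is touched.

```python
from pathlib import Path, PurePath
import tempfile


def local_target(output_dir: Path, crawled: PurePath) -> Path:
    """Map a crawled (purely lexical) path to a real path under output_dir.

    Hypothetical helper for illustration only.
    """
    if crawled.is_absolute() or ".." in crawled.parts:
        raise ValueError(f"refusing to leave output dir: {crawled}")
    return output_dir / crawled


# A path as a crawler might report it. It names a remote resource, so it is
# only manipulated lexically and never used to touch the disk.
crawled = PurePath("Course", "Week 1", "slides.pdf")
renamed = crawled.with_name("week1-slides.pdf")

# Only at the point of writing does a concrete Path come into play.
with tempfile.TemporaryDirectory() as tmp:
    target = local_target(Path(tmp), renamed)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(b"dummy")
    written = target.exists()
```

Since `PurePath` has no methods such as `exists()` or `open()`, the type checker flags any accidental file system access on a crawled path.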
Joscha
0096d83387
Simplify Limiter implementation
2021-04-29 20:20:25 +02:00
Joscha
502654d853
Fix mypy errors
2021-04-29 15:47:52 +02:00
Joscha
d2103d7c44
Document crawler
2021-04-29 15:43:20 +02:00
Joscha
d96a361325
Test and fix exclusive output
2021-04-29 15:27:16 +02:00
Joscha
2e85d26b6b
Use conductor via context manager
2021-04-29 14:23:28 +02:00
Joscha
6431a3fb3d
Fix some mypy errors
2021-04-29 14:23:09 +02:00
Joscha
ac3bfd7388
Make progress bars easier to use
The crawler now supports two types of progress bars
2021-04-29 13:53:16 +02:00
Joscha
3ea86d18a0
Jerry-rig DummyCrawler to run
2021-04-29 13:45:04 +02:00
Joscha
bbc792f9fb
Implement Crawler and DummyCrawler
2021-04-29 13:44:29 +02:00
Joscha
7e127cd5cc
Clean up and fix conductor and limiter
Turns out you have to await an async lock, who knew...
2021-04-29 13:44:04 +02:00
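The remark in 7e127cd5cc about awaiting an async lock refers to a general `asyncio` pitfall, sketched below. This is a standalone illustration, not the conductor or limiter code from the repository; `worker` and `main` are hypothetical names. Calling `lock.acquire()` without `await` only creates a coroutine object and leaves the lock free, whereas `async with lock:` awaits the acquisition and releases the lock on exit.

```python
import asyncio


async def worker(lock: asyncio.Lock, log: list, name: str) -> None:
    # "async with" awaits lock.acquire() on entry and releases on exit.
    async with lock:
        log.append(f"{name} start")
        await asyncio.sleep(0)  # yield to other tasks while holding the lock
        log.append(f"{name} end")


async def main() -> tuple:
    lock = asyncio.Lock()

    # The pitfall: without "await", acquire() is just an unstarted coroutine.
    pending = lock.acquire()
    held_without_await = lock.locked()
    pending.close()  # discard it to avoid a "never awaited" warning

    log: list = []
    await asyncio.gather(worker(lock, log, "a"), worker(lock, log, "b"))
    return held_without_await, log


held_without_await, log = asyncio.run(main())
```

Here `held_without_await` is `False`, and `log` shows that the two workers never interleave inside the locked section.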
Joscha
c4fb92c658
Make type hints compatible with Python 3.8
2021-04-29 13:11:58 +02:00
Joscha
a18db57e6f
Implement terminal conductor
2021-04-29 11:44:47 +02:00
Joscha
b915e393dd
Implement limiter
2021-04-29 10:24:28 +02:00
Joscha
3a74c23d09
Implement transformer
2021-04-29 09:51:50 +02:00
Joscha
fbebc46c58
Load and dump config
2021-04-29 09:51:50 +02:00
Joscha
5595a908d8
Configure entry point
2021-04-27 00:32:21 +02:00
I-Al-Istannen
29cd5d1a3c
Reflect totality of sanitize_windows_path in return type
2021-04-19 11:10:02 +02:00
I-Al-Istannen
1f2af3a290
Retry on more I/O Errors
2021-04-13 11:43:22 +02:00
I-Al-Istannen
14cdfb6a69
Fix typo in date demangler doc
2021-04-13 11:19:51 +02:00
I-Al-Istannen
946b7a7931
Also crawl .c/.java/.zip from IPD page
2021-02-09 12:30:59 +01:00
I-Al-Istannen
fb78a6e98e
Retry ILIAS downloads a few times and only fail that file
2021-01-06 13:08:10 +01:00
I-Al-Istannen
f0562049b6
Remove Python 3.9 method in crawler
2020-12-30 17:18:04 +01:00
I-Al-Istannen
c978e9edf4
Resolve a few pylint warnings
2020-12-30 14:45:46 +01:00
I-Al-Istannen
2714ac6be6
Send CSRF token to Shibboleth
2020-12-30 14:34:11 +01:00
I-Al-Istannen
9b048a9cfc
Canonize meeting names to a properly formatted date
2020-12-30 14:32:59 +01:00
I-Al-Istannen
f47b137b59
Fix ILIAS init.py and Pferd.py authenticators
2020-12-06 13:15:32 +01:00
Scriptim
83ea15ee83
Use system keyring service for password auth
2020-12-06 13:15:30 +01:00
I-Al-Istannen
0f5e55648b
Tell user when the conflict resolver kept existing files
2020-12-05 14:12:45 +01:00
I-Al-Istannen
4ce385b262
Treat file overwrite and marked file overwrite differently
2020-12-05 14:03:43 +01:00
I-Al-Istannen
fcb3884a8f
Add --remote-first, --local-first and --no-delete flags
2020-12-05 13:49:05 +01:00
I-Al-Istannen
9f6dc56a7b
Use a strategy to decide conflict resolution
2020-12-02 19:32:57 +01:00
Christophe
f3a4663491
Add passive/no_prompt flag
2020-12-02 18:24:07 +01:00
I-Al-Istannen
ba3c7f85fa
Replace "\" in ILIAS paths as well
I am not sure whether anybody really uses a backslash in their names,
but I guess it can't hurt to do this for windows users.
2020-11-19 19:37:28 +01:00
I-Al-Istannen
8ebf0eab16
Sort download summary
2020-11-17 21:36:04 +01:00
I-Al-Istannen
cd90a60dee
Move "sanitize_windows_path" to PFERD.transform
2020-11-12 20:52:46 +01:00
I-Al-Istannen
55e9e719ad
Sanitize "/" in ilias path names
2020-11-12 20:21:24 +01:00
I-Al-Istannen
316b9d7bf4
Prevent too many retries when fetching an ILIAS page
2020-11-04 22:23:56 +01:00
I-Al-Istannen
f830b42a36
Fix duplicate files in download summary
2020-11-04 21:49:35 +01:00
I-Al-Istannen
ef343dec7c
Merge organizer download summaries
2020-11-04 15:06:58 +01:00
I-Al-Istannen
0da2fafcd8
Fix links outside tables
2020-11-04 14:46:15 +01:00
I-Al-Istannen
f4abe3197c
Add ipd crawler
2020-11-03 21:15:40 +01:00