Joscha
cec0a8e1fc
Fix mypy errors
2021-05-09 01:45:01 +02:00
Joscha
f9b2fd60e2
Document local crawler and auth
2021-05-09 01:33:47 +02:00
Joscha
60cd9873bc
Add local file crawler
2021-05-06 01:02:40 +02:00
Joscha
273d56c39a
Properly load crawler config
2021-05-05 23:45:10 +02:00
Joscha
5497dd2827
Add @noncritical and @repeat decorators
2021-05-05 23:36:54 +02:00
Joscha
bbfdadc463
Implement output directory
2021-05-05 18:08:34 +02:00
Joscha
07e831218e
Add sync report
2021-05-02 00:56:10 +02:00
Joscha
91c33596da
Load crawlers from config file
2021-04-30 16:22:14 +02:00
Joscha
e7a51decb0
Elaborate on transforms and implement changes
2021-04-29 20:24:18 +02:00
Joscha
f776186480
Use PurePath instead of Path
Path should only be used when we need to access the file system. For all other
purposes (mainly crawling), we use PurePath instead since the paths don't
correspond to paths in the local file system.
2021-04-29 20:20:25 +02:00
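A minimal sketch of the PurePath/Path split described in this commit. It assumes crawled paths are relative paths that end up below a local output directory. `local_target` is a hypothetical helper, not PFERD's actual code.

```python
from pathlib import Path, PurePath


def local_target(output_dir: Path, crawled: PurePath) -> Path:
    """Map a crawled (remote) path to a concrete file system path.

    Hypothetical helper: crawlers only build and compare PurePaths, because
    those paths describe the remote structure and never touch the disk.
    A real Path is needed only once a file is actually written.
    """
    return output_dir.joinpath(*crawled.parts)


# Building a crawl path is pure string manipulation; no file system access.
crawled = PurePath("Course") / "Lectures" / "slides.pdf"
target = local_target(Path("output"), crawled)
```

A PurePath has no I/O methods such as `exists()` or `open()`, so accidental file system access in crawling code fails loudly instead of silently checking the wrong location.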
Joscha
0096d83387
Simplify Limiter implementation
2021-04-29 20:20:25 +02:00
Joscha
502654d853
Fix mypy errors
2021-04-29 15:47:52 +02:00
Joscha
d2103d7c44
Document crawler
2021-04-29 15:43:20 +02:00
Joscha
d96a361325
Test and fix exclusive output
2021-04-29 15:27:16 +02:00
Joscha
2e85d26b6b
Use conductor via context manager
2021-04-29 14:23:28 +02:00
Joscha
6431a3fb3d
Fix some mypy errors
2021-04-29 14:23:09 +02:00
Joscha
ac3bfd7388
Make progress bars easier to use
The crawler now supports two types of progress bars
2021-04-29 13:53:16 +02:00
Joscha
3ea86d18a0
Jerry-rig DummyCrawler to run
2021-04-29 13:45:04 +02:00
Joscha
bbc792f9fb
Implement Crawler and DummyCrawler
2021-04-29 13:44:29 +02:00
Joscha
7e127cd5cc
Clean up and fix conductor and limiter
Turns out you have to await an async lock, who knew...
2021-04-29 13:44:04 +02:00
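A minimal sketch of the pitfall behind "you have to await an async lock", using Python's `asyncio.Lock`. The functions `broken_vs_fixed`, `increment` and `count_with_lock` are hypothetical illustrations, not PFERD's conductor or limiter.

```python
import asyncio
from typing import List, Tuple


async def broken_vs_fixed() -> Tuple[bool, bool]:
    lock = asyncio.Lock()
    # Bug: acquire() is a coroutine function. Calling it without await only
    # creates a coroutine object, so the lock is never actually taken.
    pending = lock.acquire()
    locked_without_await = lock.locked()
    pending.close()  # discard the coroutine so Python does not warn about it
    # Fix: await it (or, more idiomatically, use "async with lock:").
    await lock.acquire()
    locked_with_await = lock.locked()
    lock.release()
    return locked_without_await, locked_with_await


async def increment(lock: asyncio.Lock, counter: List[int]) -> None:
    async with lock:  # acquires on entry, releases on exit
        value = counter[0]
        await asyncio.sleep(0)  # yield to other tasks inside the critical section
        counter[0] = value + 1


async def count_with_lock(n: int) -> int:
    lock = asyncio.Lock()
    counter = [0]
    # Without the lock, tasks would read the same value and lose updates.
    await asyncio.gather(*(increment(lock, counter) for _ in range(n)))
    return counter[0]
```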
Joscha
c4fb92c658
Make type hints compatible with Python 3.8
2021-04-29 13:11:58 +02:00
Joscha
a18db57e6f
Implement terminal conductor
2021-04-29 11:44:47 +02:00
Joscha
b915e393dd
Implement limiter
2021-04-29 10:24:28 +02:00
Joscha
3a74c23d09
Implement transformer
2021-04-29 09:51:50 +02:00
Joscha
fbebc46c58
Load and dump config
2021-04-29 09:51:50 +02:00
Joscha
5595a908d8
Configure entry point
2021-04-27 00:32:21 +02:00
I-Al-Istannen
29cd5d1a3c
Reflect totality of sanitize_windows_path in return type
2021-04-19 11:10:02 +02:00
I-Al-Istannen
1f2af3a290
Retry on more I/O Errors
2021-04-13 11:43:22 +02:00
I-Al-Istannen
14cdfb6a69
Fix typo in date demangler doc
2021-04-13 11:19:51 +02:00
I-Al-Istannen
946b7a7931
Also crawl .c/.java/.zip from IPD page
2021-02-09 12:30:59 +01:00
I-Al-Istannen
fb78a6e98e
Retry ILIAS downloads a few times and only fail that file
2021-01-06 13:08:10 +01:00
I-Al-Istannen
f0562049b6
Remove Python 3.9 method in crawler
2020-12-30 17:18:04 +01:00
I-Al-Istannen
c978e9edf4
Resolve a few pylint warnings
2020-12-30 14:45:46 +01:00
I-Al-Istannen
2714ac6be6
Send CSRF token to Shibboleth
2020-12-30 14:34:11 +01:00
I-Al-Istannen
9b048a9cfc
Canonize meeting names to a properly formatted date
2020-12-30 14:32:59 +01:00
I-Al-Istannen
f47b137b59
Fix ILIAS __init__.py and Pferd.py authenticators
2020-12-06 13:15:32 +01:00
Scriptim
83ea15ee83
Use system keyring service for password auth
2020-12-06 13:15:30 +01:00
I-Al-Istannen
0f5e55648b
Tell user when the conflict resolver kept existing files
2020-12-05 14:12:45 +01:00
I-Al-Istannen
4ce385b262
Treat file overwrite and marked file overwrite differently
2020-12-05 14:03:43 +01:00
I-Al-Istannen
fcb3884a8f
Add --remote-first, --local-first and --no-delete flags
2020-12-05 13:49:05 +01:00
I-Al-Istannen
9f6dc56a7b
Use a strategy to decide conflict resolution
2020-12-02 19:32:57 +01:00
Christophe
f3a4663491
Add passive/no_prompt flag
2020-12-02 18:24:07 +01:00
I-Al-Istannen
ba3c7f85fa
Replace "\" in ILIAS paths as well
I am not sure whether anybody really uses a backslash in their names,
but I guess it can't hurt to do this for Windows users.
2020-11-19 19:37:28 +01:00
I-Al-Istannen
8ebf0eab16
Sort download summary
2020-11-17 21:36:04 +01:00
I-Al-Istannen
cd90a60dee
Move "sanitize_windows_path" to PFERD.transform
2020-11-12 20:52:46 +01:00
I-Al-Istannen
55e9e719ad
Sanitize "/" in ilias path names
2020-11-12 20:21:24 +01:00
I-Al-Istannen
316b9d7bf4
Prevent too many retries when fetching an ILIAS page
2020-11-04 22:23:56 +01:00
I-Al-Istannen
f830b42a36
Fix duplicate files in download summary
2020-11-04 21:49:35 +01:00
I-Al-Istannen
ef343dec7c
Merge organizer download summaries
2020-11-04 15:06:58 +01:00
I-Al-Istannen
0da2fafcd8
Fix links outside tables
2020-11-04 14:46:15 +01:00
I-Al-Istannen
f4abe3197c
Add ipd crawler
2020-11-03 21:15:40 +01:00
I-Al-Istannen
38d4f5b4c9
Do not fail only empty courses
2020-11-03 20:09:54 +01:00
I-Al-Istannen
73c3eb0984
Add option to skip videos in sync_url
2020-10-06 17:20:47 +02:00
I-Al-Istannen
c1ccb6c53e
Allow crawling videos with sync_url
2020-10-06 10:46:06 +02:00
I-Al-Istannen
51a713fa04
Allow crawling courses or folders with sync_url
Video folders do not work if they are passed directly. Their containing
folder must be specified instead.
2020-09-28 20:00:01 +02:00
I-Al-Istannen
e32a49480b
Expose methods to look up course/element names by id / url
2020-09-28 19:16:52 +02:00
I-Al-Istannen
3f0ae729d6
Expand "is course" check to not download magazines or other weird things
2020-09-28 16:43:58 +02:00
I-Al-Istannen
55678d7fee
Pass string down to FileCookieJar
Some Python versions just can't handle it *despite the documentation
stating they should*.
2020-08-12 09:09:14 +02:00
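A minimal sketch of this workaround, assuming cookies are stored in an `LWPCookieJar` (a `FileCookieJar` subclass) at a `pathlib.Path`. The helper `load_cookie_jar` and the choice of cookie jar class are assumptions for illustration.

```python
import http.cookiejar
from pathlib import Path


def load_cookie_jar(path: Path) -> http.cookiejar.LWPCookieJar:
    # Hypothetical helper. Pass a plain str: the documentation says
    # path-like objects are accepted, but converting explicitly sidesteps
    # Python versions that mishandle Path objects here.
    jar = http.cookiejar.LWPCookieJar(str(path))
    if path.exists():
        jar.load(ignore_discard=True)  # keep session cookies as well
    return jar
```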
I-Al-Istannen
a57ee8b96b
Add timeout to video downloads to work around requests IPv6 bug
2020-08-11 14:40:30 +02:00
Joscha
77a109bb7e
Fix ilias shibboleth authenticator
The Shibboleth site got a visual overhaul that slightly changed the classes of a
form we need.
2020-07-28 19:13:51 +00:00
I-Al-Istannen
a3e1864a26
Allow long paths on windows
If you start PFERD a few folders deep in your home directory, it is
quite easy to reach the maximum path length limit on Windows (260
chars). This patch opts in to long paths ("\\?\" prefix) which lift that
restriction at the cost of ugly path names.
2020-07-25 13:44:49 +02:00
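A minimal sketch of opting in to long paths with the "\\?\" prefix. `to_long_path` is a hypothetical helper; how PFERD applies the prefix internally is not shown here. Because the prefix disables path normalisation, the sketch only accepts absolute paths and lets `PureWindowsPath` turn forward slashes into backslashes. UNC paths need the `\\?\UNC\` form instead.

```python
from pathlib import PureWindowsPath

LONG_PATH_PREFIX = "\\\\?\\"  # the four characters \\?\


def to_long_path(path: str) -> str:
    # Already opted in: leave the path unchanged.
    if path.startswith(LONG_PATH_PREFIX):
        return path
    win_path = PureWindowsPath(path)
    # The prefix is only valid for absolute paths.
    if not win_path.is_absolute():
        raise ValueError(f"not an absolute path: {path}")
    text = str(win_path)
    # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
    if text.startswith("\\\\"):
        return LONG_PATH_PREFIX + "UNC\\" + text[2:]
    return LONG_PATH_PREFIX + text
```

On Windows, the resulting string can be passed to `open()` as usual. That is what lifts the 260-character limit, at the cost of the less readable names mentioned above.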
I-Al-Istannen
77874b432b
Also add personal_desktop to download summary
2020-07-15 22:47:44 +02:00
I-Al-Istannen
5c4c785e60
Fix HTML file downloading
Previously PFERD thought any HTML file was an "Error, no access" page
when downloading. Now it checks whether ILIAS sends a
content-disposition header, telling the browser to download the file. If
that is the case, it was just an HTML file uploaded to ILIAS. If the
header is missing, it is probably an error message.
2020-07-15 15:12:14 +02:00
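A minimal sketch of the header check described in this commit. `looks_like_error_page` is a hypothetical helper. The preliminary Content-Type test is an assumption added for illustration, not something the commit states.

```python
from typing import Mapping


def looks_like_error_page(headers: Mapping[str, str]) -> bool:
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumption: only HTML responses can be error pages.
    if not lowered.get("content-type", "").startswith("text/html"):
        return False
    # A Content-Disposition header tells the browser to download the body,
    # which means the HTML is a file a user uploaded to ILIAS. Without it,
    # the HTML is most likely an "Error, no access" page.
    return "content-disposition" not in lowered
```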
I-Al-Istannen
2aed4f6d1f
Only query the dir_filter for directories
2020-07-13 13:36:12 +02:00
I-Al-Istannen
34152fbe54
Set mtime and atime to ILIAS dates where possible
2020-07-13 13:29:18 +02:00
I-Al-Istannen
c26c9352f1
Make DownloadSummary private, provide property accessors
2020-06-26 17:30:45 +02:00
I-Al-Istannen
d9ea688145
Use pretty logger for summaries
2020-06-26 17:24:36 +02:00
I-Al-Istannen
e4b1fac045
Satisfy pylint
2020-06-26 15:38:22 +02:00
Joscha
402ae81335
Fix type hints
2020-06-26 13:17:44 +00:00
Daniel Augustin
52f31e2783
Add type hints to DownloadSummary
2020-06-26 13:02:37 +02:00
Daniel Augustin
739522a151
Move download summary into a separate class
2020-06-25 23:07:11 +02:00
Daniel Augustin
6c034209b6
Add deleted files to summary
2020-06-25 22:00:28 +02:00
Daniel Augustin
f6fbd5e4bb
Add download summary
2020-06-25 19:19:34 +02:00
I-Al-Istannen
7024db1f13
Use transient progressbar
This will ensure no pesky newline ends up in the output, even on
Windows.
2020-06-25 18:03:12 +02:00
I-Al-Istannen
23bfa42a0d
Never use the direct download button, as it is currently broken
2020-06-11 13:31:01 +02:00
I-Al-Istannen
fdb57884ed
Touch files with same content to update timestamps
2020-05-31 20:27:15 +02:00
I-Al-Istannen
8198c9ecaa
Reorder methods a bit
2020-05-30 19:06:36 +02:00
I-Al-Istannen
086b15d10f
Crawl a bit more iteratively
2020-05-30 15:47:15 +02:00
I-Al-Istannen
9d6ce331a5
Use IliasCrawlerEntry entries in the ilias scraper
2020-05-30 15:20:51 +02:00
I-Al-Istannen
821c7ade26
Move video url extraction logic to crawler
2020-05-30 00:22:31 +02:00
I-Al-Istannen
b969a1854a
Remove unneeded whitespace
2020-05-30 00:22:31 +02:00
I-Al-Istannen
62535b4452
Unpack videos in ILIAS downloader
2020-05-21 22:12:52 +02:00
I-Al-Istannen
c0056e5669
Correctly crawl video pages with multiple pages
2020-05-21 21:38:07 +02:00
I-Al-Istannen
03a801eecc
Correctly type hint swallow_and_print_errors decorator
2020-05-12 21:03:53 +02:00
Joscha
072c6630bf
Avoid logging import in config
2020-05-12 18:19:23 +00:00
I-Al-Istannen
4f56c8f192
Pass element type to ilias directory filter
2020-05-12 14:41:13 +02:00
I-Al-Istannen
4fdb67128d
Fetch correct diva playlist id
2020-05-11 00:25:34 +02:00
I-Al-Istannen
a0f9d31d94
Use PrettyLogger warning everywhere
2020-05-10 21:56:12 +02:00
I-Al-Istannen
e7b08420ba
Warn when a marked file is added again
2020-05-10 21:42:30 +02:00
I-Al-Istannen
c1b21f7772
Only remove a progress task when we added it
2020-05-10 12:28:30 +02:00
I-Al-Istannen
9850ab1d73
Allow crawling the ILIAS Personal Desktop
2020-05-10 12:16:42 +02:00
I-Al-Istannen
9950144e97
Allow passing a playlist URL to diva instead of an id
2020-05-10 11:17:13 +02:00
I-Al-Istannen
f6faacabb0
Move FatalException to errors.py
2020-05-09 00:11:21 +02:00
I-Al-Istannen
19c1e3ac6f
Fail on invalid ILIAS course ids
2020-05-09 00:11:20 +02:00
I-Al-Istannen
afa48c2d2d
Swallow and print errors instead of crashing
2020-05-09 00:10:54 +02:00
I-Al-Istannen
a4c518bf4c
Update date find regex
2020-05-08 22:17:58 +02:00
I-Al-Istannen
057135022f
Try to accept that life sometimes is in English
2020-05-08 22:10:43 +02:00
I-Al-Istannen
755e9aa0d3
Try to add support for Shibboleth TFA token
2020-05-08 21:52:51 +02:00
I-Al-Istannen
c9deca19ca
Remove walrus to lower needed python version
2020-05-08 21:21:33 +02:00
I-Al-Istannen
a0c5572b59
Fix progress bars swallowing a line when they shouldn't
2020-05-08 19:55:53 +02:00