Joscha
9889ce6b57
Improve PFERD error handling
2021-05-22 21:13:25 +02:00
Joscha
b4d97cd545
Improve output dir and report error handling
2021-05-22 20:54:42 +02:00
Joscha
afac22c562
Handle abort in exclusive output state correctly
...
If the event loop is stopped while something holds the exclusive output, the
"log" singleton is now reset so the main thread can print a few more messages
before exiting.
2021-05-22 18:58:19 +02:00
Joscha
552cd82802
Run async input and password getters in daemon thread
...
Previously, they ran in the event loop's default executor, which would block
until all its workers were done working.
If Ctrl+C was pressed while input or a password was being read, the
asyncio.run() call in the main thread would be interrupted, but not the
input thread. This meant that multiple key presses (either enter or a second
Ctrl+C) were necessary to stop a running PFERD in some circumstances.
This change instead runs the input functions in daemon threads so they exit as
soon as the main thread exits.
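A minimal sketch of the idea, not the actual PFERD code: a blocking function is run in a daemon thread and its result is handed back to the event loop through a future. The helper name `run_in_daemon_thread` is hypothetical.

```python
import asyncio
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


async def run_in_daemon_thread(func: Callable[[], T]) -> T:
    """Run a blocking function in a daemon thread and await its result.

    Hypothetical helper for illustration. Because the thread is a daemon,
    it does not keep the interpreter alive once the main thread exits.
    """
    loop = asyncio.get_running_loop()
    future = loop.create_future()

    def set_result(result: T) -> None:
        # The future may already be cancelled if the awaiting task was aborted.
        if not future.done():
            future.set_result(result)

    def set_exception(error: Exception) -> None:
        if not future.done():
            future.set_exception(error)

    def worker() -> None:
        try:
            result = func()
        except Exception as error:
            # Futures are not thread-safe, so hand over via the loop.
            loop.call_soon_threadsafe(set_exception, error)
        else:
            loop.call_soon_threadsafe(set_result, result)

    threading.Thread(target=worker, daemon=True).start()
    return await future
```

Inside a coroutine, this could be used as `name = await run_in_daemon_thread(lambda: input("Username: "))`.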
2021-05-22 18:37:53 +02:00
Joscha
dfde0e2310
Improve reporting of unexpected exceptions
2021-05-22 18:36:25 +02:00
Joscha
54dd2f8337
Clean up main and improve error handling
2021-05-22 16:47:24 +02:00
Joscha
b5785f260e
Extract CLI argument parsing to separate module
2021-05-22 15:03:45 +02:00
Joscha
98b8ca31fa
Add some todos
2021-05-22 14:45:46 +02:00
I-Al-Istannen
4b104b6252
Try out some HTTP authentication handling
...
This is by no means final yet and will change a bit once the dl and cl
are changed, but it might serve as a first try. It is also wholly
untested.
2021-05-21 12:02:51 +02:00
I-Al-Istannen
83d12fcf2d
Add some explains to ilias crawler and use crawler exceptions
2021-05-20 14:58:54 +02:00
I-Al-Istannen
e4f9560655
Only retry on aiohttp errors in ILIAS crawler
...
This patch removes quite a few retries and now only retries the ilias
element method. Every other HTTP-interacting method (except for the root
requests) is called from there and should be covered.
In the future we also want to retry the root a few times, but that
will be done after the download sink API is adjusted.
2021-05-19 22:01:09 +02:00
I-Al-Istannen
8cfa818f04
Only call should_crawl once
2021-05-19 21:57:55 +02:00
I-Al-Istannen
81301f3a76
Rename the ilias crawler to ilias web crawler
2021-05-19 21:41:17 +02:00
I-Al-Istannen
2976b4d352
Move ILIAS file templates to own file
2021-05-19 21:37:10 +02:00
I-Al-Istannen
9f03702e69
Split up ilias crawler in multiple files
...
The ilias crawler contained a crawler and an HTML parser, now they are
split in two.
2021-05-19 21:34:36 +02:00
Joscha
3300886120
Explain config file loading
2021-05-19 18:11:43 +02:00
Joscha
0d10752b5a
Configure explain log level via cli and config file
2021-05-19 17:50:10 +02:00
Joscha
92886fb8d8
Implement --version flag
2021-05-19 17:33:36 +02:00
Joscha
5916626399
Make noqa comment more specific
2021-05-19 17:16:59 +02:00
Joscha
a7c025fd86
Implement reusable FileSinkToken for OutputDirectory
2021-05-19 17:16:23 +02:00
Joscha
b7a999bc2e
Clean up crawler exceptions and (a)noncritical
2021-05-19 13:25:57 +02:00
Joscha
3851065500
Fix local crawler's download bars
...
Display the pure path instead of the local path.
2021-05-18 23:23:40 +02:00
Joscha
4b68fa771f
Move logging logic to singleton
...
- Renamed module and class because "conductor" didn't make a lot of sense
- Used singleton approach (there's only one stdout after all)
- Redesigned progress bars (now with download speed!)
2021-05-18 22:45:19 +02:00
I-Al-Istannen
1525aa15a6
Fix link template error and use indeterminate progress bar
2021-05-18 22:40:28 +02:00
I-Al-Istannen
db1219d4a9
Create a link file in ILIAS crawler
...
This allows us to crawl links and represent them in the file system.
Users can choose between an ILIAS-imitation (that optionally
auto-redirects) and a plain text variant.
2021-05-17 21:44:54 +02:00
I-Al-Istannen
b8efcc2ca5
Respect filters in ILIAS crawler
2021-05-17 21:30:26 +02:00
Joscha
0bae009189
Run formatting tools
2021-05-16 14:32:53 +02:00
I-Al-Istannen
8b76ebb3ef
Rename IliasCrawler to KitIliasCrawler
2021-05-16 13:28:06 +02:00
I-Al-Istannen
2b6235dc78
Fix pylint warnings (and 2 found bugs) in ILIAS crawler
2021-05-16 13:17:12 +02:00
I-Al-Istannen
1c226c31aa
Add some repeat annotations to the ILIAS crawler
2021-05-16 13:01:56 +02:00
I-Al-Istannen
9ec0d3e16a
Implement date-demangling in ILIAS crawler
2021-05-16 13:01:56 +02:00
I-Al-Istannen
cf6903d109
Retry crawling on I/O failure
2021-05-16 13:01:56 +02:00
Joscha
9fd356d290
Ensure tmp files are deleted
...
This doesn't seem to fix the case where an exception bubbles up to the top of
the event loop. It also doesn't seem to fix the case when a KeyboardInterrupt is
thrown, since that never makes its way into the event loop in the first place.
Both of these cases lead to the event loop stopping, which means that the tmp
file cleanup doesn't get executed even though it's inside a "with" or "finally".
2021-05-15 23:00:40 +02:00
Joscha
989032fe0c
Fix cookies getting deleted
2021-05-15 22:25:48 +02:00
Joscha
05573ccc53
Add fancy CLI options
2021-05-15 22:22:01 +02:00
I-Al-Istannen
c454fabc9d
Add support for exercises in ILIAS crawler
2021-05-15 21:40:17 +02:00
I-Al-Istannen
7d323ec62b
Implement video downloads in ilias crawler
2021-05-15 21:32:32 +02:00
I-Al-Istannen
c7494e32ce
Start implementing crawling in ILIAS crawler
...
The ilias crawler can now crawl quite a few filetypes, splits off
folders and crawls them concurrently.
2021-05-15 20:42:18 +02:00
I-Al-Istannen
1123c8884d
Implement an IliasPage
...
This allows PFERD to semantically understand ILIAS HTML and is the
foundation for the ILIAS crawler. This patch extends the ILIAS crawler
to crawl the personal desktop and print the elements on it.
2021-05-15 18:59:23 +02:00
Joscha
e1104f888d
Add tfa authenticator
2021-05-15 18:27:16 +02:00
Joscha
8c32da7f19
Let authenticators provide username and password separately
2021-05-15 18:27:03 +02:00
Joscha
d63494908d
Properly invalidate exceptions
...
The simple authenticator now properly invalidates its credentials. Also, the
invalidation functions have been given better names and documentation.
2021-05-15 17:37:05 +02:00
Joscha
b70b62cef5
Make crawler sections start with "crawl:"
...
Also, use only the part of the section name after the "crawl:" as the crawler's
output directory. Now, the implementation matches the documentation again.
2021-05-15 17:24:37 +02:00
Joscha
868f486922
Rename local crawler path to target
2021-05-15 17:12:25 +02:00
I-Al-Istannen
b2a2b5999b
Implement ILIAS auth and crawl home page
...
This commit introduces the necessary machinery to authenticate with
ILIAS and crawl the home page.
It can't do much yet and just silently fetches the homepage.
2021-05-15 15:25:05 +02:00
Joscha
595de88d96
Fix authenticator and crawler names
...
Now, the "auth:" and "crawl:" parts are considered part of the name. This fixes
crawlers not being able to find their authenticators.
2021-05-15 15:25:05 +02:00
Joscha
a6fdf05ee9
Allow variable whitespace in arrow rules
2021-05-15 15:25:05 +02:00
Joscha
f897d7c2e1
Add name variants for all arrows
2021-05-15 15:25:05 +02:00
Joscha
b0f731bf84
Make crawlers use transformers
2021-05-15 15:25:05 +02:00
Joscha
302b8c0c34
Fix errors loading local crawler config
...
Apparently getint and getfloat may return None even though this is not
mentioned in their type annotations.
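One case where this shows up, as far as I can tell, is calling `getint` on a section proxy for an option that is missing: the proxy's default fallback is `None`, so `None` comes back instead of an exception. The section contents, the option name `delay`, and the helper `get_int` below are made up for illustration.

```python
import configparser

parser = configparser.ConfigParser()
parser.read_string(
    """
[crawl:local]
target = ./files
"""
)

section = parser["crawl:local"]

# "delay" is not set in the section. The section proxy falls back to None
# instead of raising, which an "-> int" annotation would not suggest.
value = section.getint("delay")


def get_int(section: configparser.SectionProxy, key: str, default: int) -> int:
    """Hypothetical helper that makes the None case explicit."""
    result = section.getint(key)
    return default if result is None else result
```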
2021-05-15 15:25:05 +02:00
Joscha
acd674f0a0
Change limiter logic
...
Now download tasks are a subset of all tasks.
2021-05-15 15:25:05 +02:00
Joscha
ed2e19a150
Add reasons for invalid values
2021-05-15 15:25:05 +02:00
Joscha
296a169dd3
Make limiter logic more complex
...
The limiter can now distinguish between crawl and download actions and has a
fancy slot system and delay logic.
2021-05-15 15:25:05 +02:00
Joscha
1591cb9197
Add options to slow down local crawler
...
These options are meant to make the local crawler behave more like a
network-based crawler for purposes of testing and debugging other parts of the
code base.
2021-05-15 15:25:01 +02:00
Joscha
0c9167512c
Fix output dir
...
I missed these while renaming the resolve function. Shame on me for not running
mypy earlier.
2021-05-14 21:28:38 +02:00
Joscha
a673ab0fae
Delete old files
...
I should've done this earlier
2021-05-14 21:27:44 +02:00
Joscha
6e5fdf4e9e
Set user agent to "pferd/<version>"
2021-05-14 21:27:44 +02:00
Joscha
93a5a94dab
Single-source version number
2021-05-14 21:27:44 +02:00
Joscha
d565df27b3
Add HttpCrawler
2021-05-13 22:28:14 +02:00
Joscha
e3ee4e515d
Disable highlighting of primitives
...
This commit prevents rich from highlighting python-looking syntax like numbers,
arrays, 'None' etc.
2021-05-13 19:47:44 +02:00
Joscha
94d6a01cca
Use file mtime in local crawler
2021-05-13 19:42:40 +02:00
Joscha
38bb66a776
Update file metadata in more cases
...
PFERD now not only updates file metadata when a file is successfully added or
changed, but also when a file is downloaded and then detected to be unchanged.
This could occur for example if a remote file's modification time was bumped,
possibly because somebody touched the file without changing it.
2021-05-13 19:40:10 +02:00
Joscha
68781a88ab
Fix asynchronous methods not being awaited
2021-05-13 19:39:49 +02:00
Joscha
910462bb72
Log stuff happening to files
2021-05-13 19:37:27 +02:00
Joscha
6bd6adb977
Fix tmp file names
2021-05-13 19:36:46 +02:00
Joscha
0acdee15a0
Let crawlers obtain authenticators
2021-05-13 18:57:20 +02:00
Joscha
c3ce6bb31c
Fix crawler cleanup not being awaited
2021-05-11 00:28:45 +02:00
Joscha
0459ed093e
Add simple authenticator
...
... including some required authenticator infrastructure
2021-05-11 00:28:03 +02:00
Joscha
d5f29f01c5
Use global conductor instance
...
The switch from crawler-local conductors to a single pferd-global conductor was
made to prepare for auth section credential providers.
2021-05-11 00:05:04 +02:00
Joscha
595ba8b7ab
Remove dummy crawler
2021-05-10 23:47:46 +02:00
Joscha
cec0a8e1fc
Fix mypy errors
2021-05-09 01:45:01 +02:00
Joscha
f9b2fd60e2
Document local crawler and auth
2021-05-09 01:33:47 +02:00
Joscha
60cd9873bc
Add local file crawler
2021-05-06 01:02:40 +02:00
Joscha
273d56c39a
Properly load crawler config
2021-05-05 23:45:10 +02:00
Joscha
5497dd2827
Add @noncritical and @repeat decorators
2021-05-05 23:36:54 +02:00
Joscha
bbfdadc463
Implement output directory
2021-05-05 18:08:34 +02:00
Joscha
07e831218e
Add sync report
2021-05-02 00:56:10 +02:00
Joscha
91c33596da
Load crawlers from config file
2021-04-30 16:22:14 +02:00
Joscha
e7a51decb0
Elaborate on transforms and implement changes
2021-04-29 20:24:18 +02:00
Joscha
f776186480
Use PurePath instead of Path
...
Path should only be used when we need to access the file system. For all other
purposes (mainly crawling), we use PurePath instead since the paths don't
correspond to paths in the local file system.
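A small sketch of that split, using made-up names: crawled items are described with `PurePath`, which only does path arithmetic, and a `Path` is built only at the point where the file system is touched. The function `to_local` and the example path are hypothetical.

```python
from pathlib import Path, PurePath

# A path as seen by a crawler. It names a remote item and does not have to
# exist anywhere on the local disk.
remote = PurePath("Course") / "Sheets" / "sheet01.pdf"


def to_local(output_dir: Path, pure: PurePath) -> Path:
    """Hypothetical helper: map a crawled path into the output directory."""
    return output_dir / pure


local = to_local(Path("out"), remote)
```

Only `local` offers methods such as `exists()` or `open()`. `remote` deliberately does not, so accidental file system access on crawled paths fails early.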
2021-04-29 20:20:25 +02:00
Joscha
0096d83387
Simplify Limiter implementation
2021-04-29 20:20:25 +02:00
Joscha
502654d853
Fix mypy errors
2021-04-29 15:47:52 +02:00
Joscha
d2103d7c44
Document crawler
2021-04-29 15:43:20 +02:00
Joscha
d96a361325
Test and fix exclusive output
2021-04-29 15:27:16 +02:00
Joscha
2e85d26b6b
Use conductor via context manager
2021-04-29 14:23:28 +02:00
Joscha
6431a3fb3d
Fix some mypy errors
2021-04-29 14:23:09 +02:00
Joscha
ac3bfd7388
Make progress bars easier to use
...
The crawler now supports two types of progress bars
2021-04-29 13:53:16 +02:00
Joscha
3ea86d18a0
Jerry-rig DummyCrawler to run
2021-04-29 13:45:04 +02:00
Joscha
bbc792f9fb
Implement Crawler and DummyCrawler
2021-04-29 13:44:29 +02:00
Joscha
7e127cd5cc
Clean up and fix conductor and limiter
...
Turns out you have to await an async lock, who knew...
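For illustration, a minimal example of the point (not the conductor or limiter code itself): an `asyncio.Lock` has to be acquired with `async with` or `await lock.acquire()`. Calling `lock.acquire()` without `await` only creates a coroutine object and locks nothing.

```python
import asyncio


async def increment(lock: asyncio.Lock, state: dict) -> None:
    # "async with" awaits lock.acquire() and releases the lock afterwards.
    async with lock:
        current = state["count"]
        await asyncio.sleep(0)  # give other tasks a chance to run
        state["count"] = current + 1


async def main() -> int:
    lock = asyncio.Lock()
    state = {"count": 0}
    await asyncio.gather(*(increment(lock, state) for _ in range(10)))
    return state["count"]
```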
2021-04-29 13:44:04 +02:00
Joscha
c4fb92c658
Make type hints compatible with Python 3.8
2021-04-29 13:11:58 +02:00
Joscha
a18db57e6f
Implement terminal conductor
2021-04-29 11:44:47 +02:00
Joscha
b915e393dd
Implement limiter
2021-04-29 10:24:28 +02:00
Joscha
3a74c23d09
Implement transformer
2021-04-29 09:51:50 +02:00
Joscha
fbebc46c58
Load and dump config
2021-04-29 09:51:50 +02:00
Joscha
5595a908d8
Configure entry point
2021-04-27 00:32:21 +02:00
I-Al-Istannen
29cd5d1a3c
Reflect totality of sanitize_windows_path in return type
2021-04-19 11:10:02 +02:00
I-Al-Istannen
1f2af3a290
Retry on more I/O Errors
2021-04-13 11:43:22 +02:00
I-Al-Istannen
14cdfb6a69
Fix typo in date demangler doc
2021-04-13 11:19:51 +02:00
I-Al-Istannen
946b7a7931
Also crawl .c/.java/.zip from IPD page
2021-02-09 12:30:59 +01:00
I-Al-Istannen
fb78a6e98e
Retry ILIAS downloads a few times and only fail that file
2021-01-06 13:08:10 +01:00
I-Al-Istannen
f0562049b6
Remove Python 3.9 method in crawler
2020-12-30 17:18:04 +01:00
I-Al-Istannen
c978e9edf4
Resolve a few pylint warnings
2020-12-30 14:45:46 +01:00
I-Al-Istannen
2714ac6be6
Send CSRF token to Shibboleth
2020-12-30 14:34:11 +01:00
I-Al-Istannen
9b048a9cfc
Canonize meeting names to a properly formatted date
2020-12-30 14:32:59 +01:00
I-Al-Istannen
f47b137b59
Fix ILIAS init.py and Pferd.py authenticators
2020-12-06 13:15:32 +01:00
Scriptim
83ea15ee83
Use system keyring service for password auth
2020-12-06 13:15:30 +01:00
I-Al-Istannen
0f5e55648b
Tell user when the conflict resolver kept existing files
2020-12-05 14:12:45 +01:00
I-Al-Istannen
4ce385b262
Treat file overwrite and marked file overwrite differently
2020-12-05 14:03:43 +01:00
I-Al-Istannen
fcb3884a8f
Add --remote-first, --local-first and --no-delete flags
2020-12-05 13:49:05 +01:00
I-Al-Istannen
9f6dc56a7b
Use a strategy to decide conflict resolution
2020-12-02 19:32:57 +01:00
Christophe
f3a4663491
Add passive/no_prompt flag
2020-12-02 18:24:07 +01:00
I-Al-Istannen
ba3c7f85fa
Replace "\" in ILIAS paths as well
...
I am not sure whether anybody really uses a backslash in their names,
but I guess it can't hurt to do this for windows users.
2020-11-19 19:37:28 +01:00
I-Al-Istannen
8ebf0eab16
Sort download summary
2020-11-17 21:36:04 +01:00
I-Al-Istannen
cd90a60dee
Move "sanitize_windows_path" to PFERD.transform
2020-11-12 20:52:46 +01:00
I-Al-Istannen
55e9e719ad
Sanitize "/" in ilias path names
2020-11-12 20:21:24 +01:00
I-Al-Istannen
316b9d7bf4
Prevent too many retries when fetching an ILIAS page
2020-11-04 22:23:56 +01:00
I-Al-Istannen
f830b42a36
Fix duplicate files in download summary
2020-11-04 21:49:35 +01:00
I-Al-Istannen
ef343dec7c
Merge organizer download summaries
2020-11-04 15:06:58 +01:00
I-Al-Istannen
0da2fafcd8
Fix links outside tables
2020-11-04 14:46:15 +01:00
I-Al-Istannen
f4abe3197c
Add ipd crawler
2020-11-03 21:15:40 +01:00
I-Al-Istannen
38d4f5b4c9
Do not fail on empty courses
2020-11-03 20:09:54 +01:00
I-Al-Istannen
73c3eb0984
Add option to skip videos in sync_url
2020-10-06 17:20:47 +02:00
I-Al-Istannen
c1ccb6c53e
Allow crawling videos with sync_url
2020-10-06 10:46:06 +02:00
I-Al-Istannen
51a713fa04
Allow crawling courses or folders with sync_url
...
Video folders do not work if they are passed directly. Their containing
folder must be specified instead.
2020-09-28 20:00:01 +02:00
I-Al-Istannen
e32a49480b
Expose methods to look up course/element names by id / url
2020-09-28 19:16:52 +02:00
I-Al-Istannen
3f0ae729d6
Expand "is course" check to not download magazines or other weird things
2020-09-28 16:43:58 +02:00
I-Al-Istannen
55678d7fee
Pass string down to FileCookieJar
...
Some python versions just can't handle it *despite the documentation
stating they should*.
2020-08-12 09:09:14 +02:00
I-Al-Istannen
a57ee8b96b
Add timeout to video downloads to work around requests IPv6 bug
2020-08-11 14:40:30 +02:00
Joscha
77a109bb7e
Fix ilias shibboleth authenticator
...
The shibboleth site got a visual overhaul that slightly changed the classes of a
form we need.
2020-07-28 19:13:51 +00:00
I-Al-Istannen
a3e1864a26
Allow long paths on windows
...
If you start PFERD a few folders deep in your home directory, it is
quite easy to reach the maximum path length limit on Windows (260
chars). This patch opts in to long paths ("\\?\" prefix) which lifts that
restriction at the cost of ugly path names.
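A rough sketch of how such a prefix can be applied, working on plain strings; the function name `to_long_path` is hypothetical, and UNC paths (which need a different prefix form) are not handled here.

```python
from pathlib import PureWindowsPath

# The extended-length prefix, written as a Python string literal: \\?\
LONG_PATH_PREFIX = "\\\\?\\"


def to_long_path(absolute_path: str) -> str:
    """Hypothetical helper: add the extended-length prefix to a Windows path."""
    if absolute_path.startswith(LONG_PATH_PREFIX):
        return absolute_path
    # The prefix only makes sense for absolute paths such as C:\Users\...
    if not PureWindowsPath(absolute_path).is_absolute():
        raise ValueError(f"not an absolute Windows path: {absolute_path!r}")
    return LONG_PATH_PREFIX + absolute_path
```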
2020-07-25 13:44:49 +02:00
I-Al-Istannen
77874b432b
Also add personal_desktop to download summary
2020-07-15 22:47:44 +02:00
I-Al-Istannen
5c4c785e60
Fix HTML file downloading
...
Previously PFERD thought any HTML file was an "Error, no access" page
when downloading. Now it checks whether ILIAS sends a
content-disposition header, telling the browser to download the file. If
that is the case, it was just an HTML file uploaded to ILIAS. If it has
no header, it is probably an error message.
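The check described here could look roughly like this. It is a sketch over a plain header mapping, not the actual downloader code, and the function name `is_probably_error_page` is made up.

```python
from typing import Mapping


def is_probably_error_page(headers: Mapping[str, str]) -> bool:
    """Hypothetical helper: guess whether an HTML response is an error page.

    An HTML response without a content-disposition header is treated as an
    error page. With the header, it is treated as a regular uploaded file.
    """
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    is_html = lowered.get("content-type", "").lower().startswith("text/html")
    has_disposition = "content-disposition" in lowered
    return is_html and not has_disposition
```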
2020-07-15 15:12:14 +02:00
I-Al-Istannen
2aed4f6d1f
Only query the dir_filter for directories
2020-07-13 13:36:12 +02:00
I-Al-Istannen
34152fbe54
Set mtime and atime to ILIAS dates where possible
2020-07-13 13:29:18 +02:00
I-Al-Istannen
c26c9352f1
Make DownloadSummary private, provide property accessors
2020-06-26 17:30:45 +02:00
I-Al-Istannen
d9ea688145
Use pretty logger for summaries
2020-06-26 17:24:36 +02:00
I-Al-Istannen
e4b1fac045
Satisfy pylint
2020-06-26 15:38:22 +02:00
Joscha
402ae81335
Fix type hints
2020-06-26 13:17:44 +00:00
Daniel Augustin
52f31e2783
Add type hints to DownloadSummary
2020-06-26 13:02:37 +02:00
Daniel Augustin
739522a151
Move download summary into a separate class
2020-06-25 23:07:11 +02:00
Daniel Augustin
6c034209b6
Add deleted files to summary
2020-06-25 22:00:28 +02:00
Daniel Augustin
f6fbd5e4bb
Add download summary
2020-06-25 19:19:34 +02:00
I-Al-Istannen
7024db1f13
Use transient progress bar
...
This will ensure no pesky newline ends up in the output, even on
windows.
2020-06-25 18:03:12 +02:00
I-Al-Istannen
23bfa42a0d
Never use the direct download button, as it is currently broken
2020-06-11 13:31:01 +02:00
I-Al-Istannen
fdb57884ed
Touch files with same content to update timestamps
2020-05-31 20:27:15 +02:00
I-Al-Istannen
8198c9ecaa
Reorder methods a bit
2020-05-30 19:06:36 +02:00
I-Al-Istannen
086b15d10f
Crawl a bit more iteratively
2020-05-30 15:47:15 +02:00
I-Al-Istannen
9d6ce331a5
Use IliasCrawlerEntry entries in the ilias scraper
2020-05-30 15:20:51 +02:00
I-Al-Istannen
821c7ade26
Move video url extraction logic to crawler
2020-05-30 00:22:31 +02:00
I-Al-Istannen
b969a1854a
Remove unneeded whitespace
2020-05-30 00:22:31 +02:00
I-Al-Istannen
62535b4452
Unpack videos in ILIAS downloader
2020-05-21 22:12:52 +02:00
I-Al-Istannen
c0056e5669
Correctly crawl video pages with multiple pages
2020-05-21 21:38:07 +02:00
I-Al-Istannen
03a801eecc
Correctly type hint swallow_and_print_errors decorator
2020-05-12 21:03:53 +02:00
Joscha
072c6630bf
Avoid logging import in config
2020-05-12 18:19:23 +00:00
I-Al-Istannen
4f56c8f192
Pass element type to ilias directory filter
2020-05-12 14:41:13 +02:00
I-Al-Istannen
4fdb67128d
Fetch correct diva playlist id
2020-05-11 00:25:34 +02:00
I-Al-Istannen
a0f9d31d94
Use PrettyLogger warning everywhere
2020-05-10 21:56:12 +02:00
I-Al-Istannen
e7b08420ba
Warn when a marked file is added again
2020-05-10 21:42:30 +02:00
I-Al-Istannen
c1b21f7772
Only remove a progress task when we added it
2020-05-10 12:28:30 +02:00
I-Al-Istannen
9850ab1d73
Allow crawling the ILIAS Personal Desktop
2020-05-10 12:16:42 +02:00
I-Al-Istannen
9950144e97
Allow passing a playlist URL to diva instead of an id
2020-05-10 11:17:13 +02:00
I-Al-Istannen
f6faacabb0
Move FatalException to errors.py
2020-05-09 00:11:21 +02:00
I-Al-Istannen
19c1e3ac6f
Fail on invalid ILIAS course ids
2020-05-09 00:11:20 +02:00
I-Al-Istannen
afa48c2d2d
Swallow and print errors instead of crashing
2020-05-09 00:10:54 +02:00
I-Al-Istannen
a4c518bf4c
Update date find regex
2020-05-08 22:17:58 +02:00
I-Al-Istannen
057135022f
Try to accept that life sometimes is in English
2020-05-08 22:10:43 +02:00
I-Al-Istannen
755e9aa0d3
Try to add support for Shibboleth TFA token
2020-05-08 21:52:51 +02:00
I-Al-Istannen
c9deca19ca
Remove walrus to lower needed python version
2020-05-08 21:21:33 +02:00
I-Al-Istannen
a0c5572b59
Fix progress bars swallowing a line when they shouldn't
2020-05-08 19:55:53 +02:00
I-Al-Istannen
2d20d2934c
Color warning differently
2020-05-08 19:52:45 +02:00
I-Al-Istannen
2c48ab66d4
Use rich for log colorization
2020-05-08 19:31:54 +02:00
I-Al-Istannen
56f2394001
Add a download progress bar
2020-05-08 17:09:56 +02:00
I-Al-Istannen
bee3d70998
Added a diva playlist downloader
2020-04-30 17:18:45 +02:00
I-Al-Istannen
42345ecc61
Demangle "Morgen" too
2020-04-30 12:05:25 +02:00
I-Al-Istannen
920d521d68
Change PrettyLogger.warn to PrettyLogger.warning
2020-04-25 20:11:51 +02:00
I-Al-Istannen
e0b46a306a
Use warn method in IliasCrawler
2020-04-25 20:07:40 +02:00
I-Al-Istannen
8a42a2a396
Move logging into its own file
2020-04-25 20:02:01 +02:00
I-Al-Istannen
80247400a4
Debug log when starting an ilias download
2020-04-25 13:02:07 +02:00
Joscha
1aaa6e7ab5
Use PathLike everywhere
2020-04-24 18:41:14 +00:00
Joscha
7f53543324
Satisfy pylint and add todo
2020-04-24 18:26:28 +00:00
Joscha
292e516297
Change crawler and downloader output
2020-04-24 18:24:44 +00:00
Joscha
8258fa8919
Add test run option to PFERD
2020-04-24 18:00:21 +00:00
Joscha
5b929f09a2
Move download strategies to downloader
...
Also fixes an issue where the downloader didn't mark files that were not
downloaded due to the strategy used.
2020-04-24 14:27:40 +00:00
Joscha
4d32f863bc
Clean up organizer after synchronizing
2020-04-24 14:17:23 +00:00
Joscha
4e7333b396
Allow specifying paths as strings in Pferd
2020-04-24 11:50:40 +00:00
I-Al-Istannen
4c0e3b493a
Use download_modified_or_new as default strategy
2020-04-24 13:48:06 +02:00
Joscha
2de079a5d3
Add a few Transform combinators
2020-04-24 11:35:46 +00:00
I-Al-Istannen
509e624d47
Satisfy pylint. Useful docstrings? Not quite sure.
2020-04-23 20:35:59 +02:00
I-Al-Istannen
980f69b5af
Fix organizer marking itself causing an error
2020-04-23 20:02:05 +02:00
I-Al-Istannen
0b00a9c26b
Log when starting to synchronize
2020-04-23 19:56:37 +02:00
Joscha
1ef85c45e5
Switch Transform to PurePath
2020-04-23 17:40:43 +00:00
Joscha
5ef5a56e69
Extract Location into separate file
2020-04-23 17:38:28 +00:00
I-Al-Istannen
f3f4be2690
More free functions
2020-04-23 19:21:49 +02:00
I-Al-Istannen
076b8c5a1f
Add download strategies to save bandwidth
...
Only download files that are newer than the local version.
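As an illustration only, such a strategy boils down to a timestamp comparison. The function name `should_download` and its handling of missing timestamps are assumptions, not taken from the code base.

```python
from datetime import datetime
from typing import Optional


def should_download(
    remote_mtime: Optional[datetime],
    local_mtime: Optional[datetime],
) -> bool:
    """Hypothetical strategy: download only new or newer files."""
    # No local copy yet: always download.
    if local_mtime is None:
        return True
    # Remote modification time unknown: download to be safe (an assumption).
    if remote_mtime is None:
        return True
    return remote_mtime > local_mtime
```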
2020-04-23 18:29:20 +02:00
I-Al-Istannen
13bc78c889
Display reason for ignoring an element in ilias crawler
2020-04-23 13:54:58 +02:00
I-Al-Istannen
dc964a9d98
Remove finished TODOs
2020-04-23 13:30:34 +02:00
I-Al-Istannen
c2b14f3db9
ilias crawler: Use direct download link if possible
2020-04-23 13:08:12 +02:00
Joscha
4b59a7c375
Move around TODOs
2020-04-23 10:49:01 +00:00
I-Al-Istannen
bef210ae77
Rename and implement IliasDirectoryFilter
2020-04-23 12:35:18 +02:00
I-Al-Istannen
ea005517cf
Only remove folders if they exist in tmpdir
2020-04-23 12:09:45 +02:00
Joscha
df0eb84a44
Fix TmpDir and Location
...
TmpDir: Clean up before and after, not just after
Location: Resolve path so that parent check works properly
2020-04-23 09:50:32 +00:00
Joscha
2de4255a78
Add Pferd class
2020-04-23 09:50:32 +00:00
Joscha
3c808879c9
Add Transforms and Transformables
2020-04-22 18:25:09 +00:00
I-Al-Istannen
a051e3bcca
ilias crawler: Add some unhelpful documentation
2020-04-22 17:58:19 +02:00
I-Al-Istannen
eb7df036df
WIP: ilias crawler: Also crawl assignments
2020-04-22 14:32:20 +02:00
I-Al-Istannen
23db59e733
WIP: ilias-crawler: Demangle dates
2020-04-22 12:58:44 +02:00
I-Al-Istannen
ac65b06a8e
Satisfy pylint a bit
2020-04-22 01:37:34 +02:00
I-Al-Istannen
8891041069
WIP: crawler: Add opencast video crawler
2020-04-21 23:01:19 +02:00
I-Al-Istannen
70d63e3e90
WIP: Start small ILIAS crawler
2020-04-21 13:32:03 +02:00
I-Al-Istannen
b2a7af2e3e
Store modification_date in IliasDownloadInfo, remove parameters
2020-04-21 13:31:50 +02:00
I-Al-Istannen
23bed48c8c
Satisfy autopep8
2020-04-21 13:30:42 +02:00
Joscha
0926d33798
Use downloader-specific data classes
2020-04-20 18:07:45 +00:00
I-Al-Istannen
55ba2f4070
Fix pylint in downloaders
2020-04-20 19:49:15 +02:00
I-Al-Istannen
d18b48aaf4
Stream in http downloader
2020-04-20 19:45:25 +02:00
Joscha
4ef0ffe3bf
Listen to pylint and mypy
2020-04-20 17:44:58 +00:00
Joscha
ce77995c8f
Rename http downloader module
2020-04-20 17:08:51 +00:00
I-Al-Istannen
ed9245c14d
Remove old organizer
2020-04-20 18:50:23 +02:00
I-Al-Istannen
01e6972c96
Add ilias downloader
2020-04-20 18:49:01 +02:00
I-Al-Istannen
8181ae5b17
Guard http response in context manager
2020-04-20 18:47:46 +02:00
Joscha
6407190ae0
Soupify requests responses properly
2020-04-20 16:38:30 +00:00
I-Al-Istannen
87395faac2
Add base for simple HTTP downloader
2020-04-20 17:43:59 +02:00
I-Al-Istannen
a9e6e7883d
Create temp dir folder in constructor
2020-04-20 17:43:59 +02:00
Joscha
154d6b29dd
Listen to pylint
2020-04-20 15:16:22 +00:00
I-Al-Istannen
62ac569ec4
Revert "Add proposed crawler entry type"
...
This reverts commit 9f1a0a58ab.
Each crawler will have its own data class.
2020-04-20 16:59:20 +02:00
I-Al-Istannen
9f1a0a58ab
Add proposed crawler entry type
2020-04-20 16:54:47 +02:00
Joscha
879a2c7c80
Rewrite ILIAS authenticator
2020-04-20 14:26:30 +00:00
Joscha
ff06c5215e
Fix authenticator
2020-04-20 14:26:29 +00:00
I-Al-Istannen
135a8dce4b
Fix resolve_path allowing paths outside its folder
...
This happened if the directory name was a prefix of the offending file name.
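The class of bug can be reproduced with a string prefix check. The sketch below contrasts it with a component-wise check; both function names and the example paths are made up, and the paths are assumed to be already resolved (no ".." parts).

```python
from pathlib import PurePosixPath


def is_inside_by_prefix(base: PurePosixPath, target: PurePosixPath) -> bool:
    """Buggy variant: compares the paths as strings."""
    return str(target).startswith(str(base))


def is_inside(base: PurePosixPath, target: PurePosixPath) -> bool:
    """Compares whole path components instead of characters."""
    try:
        target.relative_to(base)
        return True
    except ValueError:
        return False


base = PurePosixPath("/sync/dir")
# A sibling directory whose name merely starts with "dir".
outside = PurePosixPath("/sync/dir-other/file.txt")
```

Here `is_inside_by_prefix(base, outside)` wrongly accepts the path, while `is_inside(base, outside)` rejects it.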
2020-04-20 16:07:14 +02:00
I-Al-Istannen
63bbcad918
Add resolve method to tmp_dir
2020-04-20 15:40:07 +02:00
I-Al-Istannen
6584d6a905
Elaborate accept_file in new_organizer
2020-04-20 15:40:07 +02:00
Joscha
5990098ef8
Add UserPassAuthenticator
2020-04-20 13:26:45 +00:00
I-Al-Istannen
f3d3d6bb65
Add some docs to cookie_jar
2020-04-20 14:38:03 +02:00
I-Al-Istannen
b2fe7cc064
Add preliminary logging to organizer and tmp_dir
2020-04-20 14:37:44 +02:00
I-Al-Istannen
930d821dd7
Add a simple organizer
2020-04-20 14:29:48 +02:00
I-Al-Istannen
5c2ff14839
Add "prompt_yes_no" to utils
2020-04-20 14:29:48 +02:00
I-Al-Istannen
a3d6dc7873
Clean up temp_folder
2020-04-20 14:29:48 +02:00
Joscha
53ad1c924b
Add cookie jar
2020-04-20 11:35:26 +00:00
I-Al-Istannen
8c431c7d81
Add a simple temporary folder
2020-04-20 12:08:52 +02:00
Joscha
d5dd5aac06
Fix some mypy errors
2020-04-20 01:54:47 +00:00
Joscha
25043a4aaa
Remove unnecessary files
...
Also document some plans for the new program structure in REWRITE.md
2020-04-19 19:49:43 +00:00
I-Al-Istannen
cf3553175f
Add OS_Exams synchronizer
2020-02-27 14:51:29 +01:00
I-Al-Istannen
bf8b3cf9f7
Hack in support for TI exams
...
This just adds an additional crawl check for AlteKlausuren. This is not
present on the root site but at the suffix `/Klausuren`.
Example config:
```py
# The "Klausur" needs to be copied verbatim!
ti.synchronize("Klausur", "sync dir name",
               transform=ro_19_klausur_transform, filter=ro_19_klausur_filter)
```
2020-02-24 20:58:27 +01:00
I-Al-Istannen
f5bc49160f
Lose 50 minutes of my life (and fix the TGI tut)
2019-12-12 12:50:16 +01:00
I-Al-Istannen
4433696509
[TGI] Add TGI tut
2019-11-18 09:58:16 +01:00
I-Al-Istannen
1407c6d264
Download all TGI files and not just lectures
2019-10-17 22:14:32 +02:00
I-Al-Istannen
1973c931bd
Add support for other years in TGI downloader
2019-10-15 15:37:52 +02:00
I-Al-Istannen
458cc1c6d6
Add support for TGI website
2019-10-15 15:34:59 +02:00
Joscha
f94629a7fa
Fix exceptions with weird content types
...
(hopefully)
2019-09-22 11:55:47 +00:00
I-Al-Istannen
2752e98621
Fix relative url joining in ti downloader
2019-07-26 10:06:01 +02:00