Commit Graph

330 Commits

Author SHA1 Message Date
273d56c39a Properly load crawler config 2021-05-05 23:45:10 +02:00
5497dd2827 Add @noncritical and @repeat decorators 2021-05-05 23:36:54 +02:00
bbfdadc463 Implement output directory 2021-05-05 18:08:34 +02:00
fde811ae5a Document on_conflict option 2021-05-05 12:24:35 +02:00
07e831218e Add sync report 2021-05-02 00:56:10 +02:00
91c33596da Load crawlers from config file 2021-04-30 16:22:14 +02:00
a8dcf941b9 Document possible redownload settings 2021-04-30 15:32:56 +02:00
e7a51decb0 Elaborate on transforms and implement changes 2021-04-29 20:24:18 +02:00
9ec19be113 Document config file format 2021-04-29 20:24:18 +02:00
f776186480 Use PurePath instead of Path
Path should only be used when we need to access the file system. For all other
purposes (mainly crawling), we use PurePath instead since the paths don't
correspond to paths in the local file system.
2021-04-29 20:20:25 +02:00
0096d83387 Simplify Limiter implementation 2021-04-29 20:20:25 +02:00
20a24dbcbf Add changelog 2021-04-29 20:20:25 +02:00
502654d853 Fix mypy errors 2021-04-29 15:47:52 +02:00
d2103d7c44 Document crawler 2021-04-29 15:43:20 +02:00
d96a361325 Test and fix exclusive output 2021-04-29 15:27:16 +02:00
2e85d26b6b Use conductor via context manager 2021-04-29 14:23:28 +02:00
6431a3fb3d Fix some mypy errors 2021-04-29 14:23:09 +02:00
ac3bfd7388 Make progress bars easier to use
The crawler now supports two types of progress bars
2021-04-29 13:53:16 +02:00
3ea86d18a0 Jerry-rig DummyCrawler to run 2021-04-29 13:45:04 +02:00
bbc792f9fb Implement Crawler and DummyCrawler 2021-04-29 13:44:29 +02:00
7e127cd5cc Clean up and fix conductor and limiter
Turns out you have to await an async lock, who knew...
2021-04-29 13:44:04 +02:00
c4fb92c658 Make type hints compatible with Python 3.8 2021-04-29 13:11:58 +02:00
8da1ac6cee Extend mypy config 2021-04-29 11:44:47 +02:00
a18db57e6f Implement terminal conductor 2021-04-29 11:44:47 +02:00
b915e393dd Implement limiter 2021-04-29 10:24:28 +02:00
3a74c23d09 Implement transformer 2021-04-29 09:51:50 +02:00
fbebc46c58 Load and dump config 2021-04-29 09:51:50 +02:00
5595a908d8 Configure entry point 2021-04-27 00:32:21 +02:00
27e4abcfa3 Do project setup from scratch
Following guidelines from the Python Packaging User Guide [1].

This commit intentionally breaks the .gitignore, project dependencies, GitHub
Actions and other stuff. It also removes almost the entire README. The intention
behind this is to get rid of all cruft that has accumulated over time and to have
a fresh start. Only necessary things will be re-added as they're needed.

From now on, I also plan on adding documentation for every feature at the same
time that the feature is implemented. This is to ensure that the documentation
does not become outdated.

[1]: https://packaging.python.org/
2021-04-27 00:07:54 +02:00
c1ab7485e2 Bump version to 2.6.1 v2.6.1 2021-04-19 11:21:56 +02:00
29cd5d1a3c Reflect totality of sanitize_windows_path in return type 2021-04-19 11:10:02 +02:00
6d5d9333ad Force folder to be file-system path 2021-04-19 11:07:25 +02:00
7cc40595dc Allow synchronizing to directory "." 2021-04-14 20:25:25 +02:00
80ae5ddfaa Bump version to v2.6.0 v2.6.0 2021-04-14 19:47:41 +02:00
4f480d117e Install keyring in CI 2021-04-14 19:24:05 +02:00
1f2af3a290 Retry on more I/O Errors 2021-04-13 11:43:22 +02:00
14cdfb6a69 Fix typo in date demangler doc 2021-04-13 11:19:51 +02:00
e2bf84392b [sync_url] Properly declare "no-videos" as flag 2021-04-08 18:12:27 +02:00
946b7a7931 Also crawl .c/.java/.zip from IPD page 2021-02-09 12:30:59 +01:00
9a9018751e Bump version v2.5.4 2021-02-06 22:54:05 +01:00
83b75e8254 syncurl: Sanitize element name on Windows if it is used as folder name
Otherwise the name of the course might not be a valid file name.
2021-02-06 22:53:26 +01:00
35c3fa205d Fixed description of activating venv (#22)
Add 'source' to the venv activate command in the readme

`source` was picked over `.` to conform to the Python recommendation
(https://docs.python.org/3/library/venv.html#module-venv).

This patch also adds the `egg-info` you get when building to the
gitignore.
2021-01-28 21:24:09 +01:00
0b606f02fa Bump version v2.5.3 2021-01-17 10:33:10 +01:00
fb78a6e98e Retry ILIAS downloads a few times and only fail that file 2021-01-06 13:08:10 +01:00
5de68a0400 Bump version v2.5.2 2020-12-30 17:20:30 +01:00
f0562049b6 Remove Python 3.9 method in crawler 2020-12-30 17:18:04 +01:00
0e1077bb50 Bump version v2.5.1 2020-12-30 14:50:49 +01:00
c978e9edf4 Resolve a few pylint warnings 2020-12-30 14:45:46 +01:00
2714ac6be6 Send CSRF token to Shibboleth 2020-12-30 14:34:11 +01:00
9b048a9cfc Canonize meeting names to a properly formatted date 2020-12-30 14:32:59 +01:00
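
Commit f776186480 above explains that Path is reserved for file-system access, while crawled paths use PurePath because they do not correspond to local files. A minimal sketch of that distinction follows, using only the standard pathlib module. The helper `local_target` and the example paths are hypothetical and are not taken from the project's code.

```python
from pathlib import Path, PurePath


def local_target(output_dir: Path, crawled: PurePath) -> Path:
    """Map a crawled, purely lexical path into a local output directory.

    Hypothetical helper for illustration only.
    """
    # A crawled path is just a sequence of names, so it can be checked
    # without ever touching the file system.
    if crawled.is_absolute() or ".." in crawled.parts:
        raise ValueError(f"refusing to leave output directory: {crawled}")
    # Joining onto a Path yields a Path, which is the point where
    # file-system operations become available.
    return output_dir / crawled


# PurePath supports lexical operations only (joining, name, parts, ...).
remote = PurePath("Course") / "Week 1" / "slides.pdf"

# Path adds I/O methods such as exists(), which PurePath lacks.
target = local_target(Path("out"), remote)
```

Keeping crawled paths as PurePath means an accidental call such as `remote.exists()` fails immediately with an AttributeError instead of silently probing the local disk.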