Commit Graph

489 Commits

Author SHA1 Message Date
Joscha
6bd6adb977 Fix tmp file names 2021-05-13 19:36:46 +02:00
Joscha
0acdee15a0 Let crawlers obtain authenticators 2021-05-13 18:57:20 +02:00
Joscha
c3ce6bb31c Fix crawler cleanup not being awaited 2021-05-11 00:28:45 +02:00
Joscha
0459ed093e Add simple authenticator
... including some required authenticator infrastructure
2021-05-11 00:28:03 +02:00
Joscha
d5f29f01c5 Use global conductor instance
The switch from crawler-local conductors to a single pferd-global conductor was
made to prepare for auth section credential providers.
2021-05-11 00:05:04 +02:00
Joscha
595ba8b7ab Remove dummy crawler 2021-05-10 23:47:46 +02:00
Joscha
cec0a8e1fc Fix mypy errors 2021-05-09 01:45:01 +02:00
Joscha
f9b2fd60e2 Document local crawler and auth 2021-05-09 01:33:47 +02:00
Joscha
60cd9873bc Add local file crawler 2021-05-06 01:02:40 +02:00
Joscha
273d56c39a Properly load crawler config 2021-05-05 23:45:10 +02:00
Joscha
5497dd2827 Add @noncritical and @repeat decorators 2021-05-05 23:36:54 +02:00
Joscha
bbfdadc463 Implement output directory 2021-05-05 18:08:34 +02:00
Joscha
fde811ae5a Document on_conflict option 2021-05-05 12:24:35 +02:00
Joscha
07e831218e Add sync report 2021-05-02 00:56:10 +02:00
Joscha
91c33596da Load crawlers from config file 2021-04-30 16:22:14 +02:00
Joscha
a8dcf941b9 Document possible redownload settings 2021-04-30 15:32:56 +02:00
Joscha
e7a51decb0 Elaborate on transforms and implement changes 2021-04-29 20:24:18 +02:00
Joscha
9ec19be113 Document config file format 2021-04-29 20:24:18 +02:00
Joscha
f776186480 Use PurePath instead of Path
Path should only be used when we need to access the file system. For all other
purposes (mainly crawling), we use PurePath instead since the paths don't
correspond to paths in the local file system.
2021-04-29 20:20:25 +02:00
Joscha
0096d83387 Simplify Limiter implementation 2021-04-29 20:20:25 +02:00
Joscha
20a24dbcbf Add changelog 2021-04-29 20:20:25 +02:00
Joscha
502654d853 Fix mypy errors 2021-04-29 15:47:52 +02:00
Joscha
d2103d7c44 Document crawler 2021-04-29 15:43:20 +02:00
Joscha
d96a361325 Test and fix exclusive output 2021-04-29 15:27:16 +02:00
Joscha
2e85d26b6b Use conductor via context manager 2021-04-29 14:23:28 +02:00
Joscha
6431a3fb3d Fix some mypy errors 2021-04-29 14:23:09 +02:00
Joscha
ac3bfd7388 Make progress bars easier to use
The crawler now supports two types of progress bars
2021-04-29 13:53:16 +02:00
Joscha
3ea86d18a0 Jerry-rig DummyCrawler to run 2021-04-29 13:45:04 +02:00
Joscha
bbc792f9fb Implement Crawler and DummyCrawler 2021-04-29 13:44:29 +02:00
Joscha
7e127cd5cc Clean up and fix conductor and limiter
Turns out you have to await an async lock, who knew...
2021-04-29 13:44:04 +02:00
Joscha
c4fb92c658 Make type hints compatible with Python 3.8 2021-04-29 13:11:58 +02:00
Joscha
8da1ac6cee Extend mypy config 2021-04-29 11:44:47 +02:00
Joscha
a18db57e6f Implement terminal conductor 2021-04-29 11:44:47 +02:00
Joscha
b915e393dd Implement limiter 2021-04-29 10:24:28 +02:00
Joscha
3a74c23d09 Implement transformer 2021-04-29 09:51:50 +02:00
Joscha
fbebc46c58 Load and dump config 2021-04-29 09:51:50 +02:00
Joscha
5595a908d8 Configure entry point 2021-04-27 00:32:21 +02:00
Joscha
27e4abcfa3 Do project setup from scratch
Following guidelines from the Python Packaging User Guide [1].

This commit intentionally breaks the .gitignore, project dependencies, GitHub
Actions and other stuff. It also removes almost the entire README. The intention
behind this is to get rid of all cruft that has accumulated over time and to have
a fresh start. Only necessary things will be re-added as they're needed.

From now on, I also plan on adding documentation for every feature at the same
time that the feature is implemented. This is to ensure that the documentation
does not become outdated.

[1]: https://packaging.python.org/
2021-04-27 00:07:54 +02:00
I-Al-Istannen
c1ab7485e2 Bump version to 2.6.1 2021-04-19 11:21:56 +02:00
I-Al-Istannen
29cd5d1a3c Reflect totality of sanitize_windows_path in return type 2021-04-19 11:10:02 +02:00
I-Al-Istannen
6d5d9333ad Force folder to be file-system path 2021-04-19 11:07:25 +02:00
I-Al-Istannen
7cc40595dc Allow synchronizing to directory "." 2021-04-14 20:25:25 +02:00
I-Al-Istannen
80ae5ddfaa Bump version to v2.6.0 2021-04-14 19:47:41 +02:00
I-Al-Istannen
4f480d117e Install keyring in CI 2021-04-14 19:24:05 +02:00
I-Al-Istannen
1f2af3a290 Retry on more I/O Errors 2021-04-13 11:43:22 +02:00
I-Al-Istannen
14cdfb6a69 Fix typo in date demangler doc 2021-04-13 11:19:51 +02:00
I-Al-Istannen
e2bf84392b [sync_url] Properly declare "no-videos" as flag 2021-04-08 18:12:27 +02:00
I-Al-Istannen
946b7a7931 Also crawl .c/.java/.zip from IPD page 2021-02-09 12:30:59 +01:00
I-Al-Istannen
9a9018751e Bump version 2021-02-06 22:54:05 +01:00
I-Al-Istannen
83b75e8254 syncurl: Sanitize element name on Windows if it is used as folder name
Otherwise the name of the course might not be a valid file name.
2021-02-06 22:53:26 +01:00
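
The message of commit f776186480 ("Use PurePath instead of Path") separates paths that are only manipulated from paths that touch the file system. The following is a minimal sketch of that distinction using only the standard pathlib module. It is not code from the repository: the example path, the output directory name "out", and the helper to_local are hypothetical and chosen purely for illustration.

```python
from pathlib import Path, PurePath, PurePosixPath

# Hypothetical path of a crawled element. It describes a location on the
# remote side and does not have to exist on the local disk.
remote = PurePosixPath("Course") / "Sheets" / "sheet01.pdf"

# Pure paths support purely lexical operations.
print(remote.name)    # sheet01.pdf
print(remote.suffix)  # .pdf
print(remote.parent)  # Course/Sheets

# Pure paths have no file-system methods such as exists() or open(),
# so accidental disk access while crawling is ruled out by the type.
print(hasattr(remote, "exists"))  # False


def to_local(output_dir: Path, crawled: PurePath) -> Path:
    """Map a crawled pure path into a concrete local output directory.

    Hypothetical helper: only at this point does a real Path appear.
    """
    return output_dir.joinpath(*crawled.parts)


local = to_local(Path("out"), remote)
print(isinstance(local, Path))  # True
```

Only the value returned by the hypothetical to_local helper is a concrete Path, so file-system calls such as mkdir() or open() stay confined to the code that actually writes output.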