Commit Graph

344 Commits

SHA1 Message Date
e3ee4e515d Disable highlighting of primitives
This commit prevents rich from highlighting python-looking syntax like numbers,
arrays, 'None' etc.
2021-05-13 19:47:44 +02:00
94d6a01cca Use file mtime in local crawler 2021-05-13 19:42:40 +02:00
38bb66a776 Update file metadata in more cases
PFERD now not only updates file metadata when a file is successfully added or
changed, but also when a file is downloaded and then detected to be unchanged.

This could occur for example if a remote file's modification time was bumped,
possibly because somebody touched the file without changing it.
2021-05-13 19:40:10 +02:00
68781a88ab Fix asynchronous methods not being awaited 2021-05-13 19:39:49 +02:00
910462bb72 Log stuff happening to files 2021-05-13 19:37:27 +02:00
6bd6adb977 Fix tmp file names 2021-05-13 19:36:46 +02:00
0acdee15a0 Let crawlers obtain authenticators 2021-05-13 18:57:20 +02:00
c3ce6bb31c Fix crawler cleanup not being awaited 2021-05-11 00:28:45 +02:00
0459ed093e Add simple authenticator
... including some required authenticator infrastructure
2021-05-11 00:28:03 +02:00
d5f29f01c5 Use global conductor instance
The switch from crawler-local conductors to a single pferd-global conductor was
made to prepare for auth section credential providers.
2021-05-11 00:05:04 +02:00
595ba8b7ab Remove dummy crawler 2021-05-10 23:47:46 +02:00
cec0a8e1fc Fix mypy errors 2021-05-09 01:45:01 +02:00
f9b2fd60e2 Document local crawler and auth 2021-05-09 01:33:47 +02:00
60cd9873bc Add local file crawler 2021-05-06 01:02:40 +02:00
273d56c39a Properly load crawler config 2021-05-05 23:45:10 +02:00
5497dd2827 Add @noncritical and @repeat decorators 2021-05-05 23:36:54 +02:00
bbfdadc463 Implement output directory 2021-05-05 18:08:34 +02:00
fde811ae5a Document on_conflict option 2021-05-05 12:24:35 +02:00
07e831218e Add sync report 2021-05-02 00:56:10 +02:00
91c33596da Load crawlers from config file 2021-04-30 16:22:14 +02:00
a8dcf941b9 Document possible redownload settings 2021-04-30 15:32:56 +02:00
e7a51decb0 Elaborate on transforms and implement changes 2021-04-29 20:24:18 +02:00
9ec19be113 Document config file format 2021-04-29 20:24:18 +02:00
f776186480 Use PurePath instead of Path
Path should only be used when we need to access the file system. For all other
purposes (mainly crawling), we use PurePath instead since the paths don't
correspond to paths in the local file system.
2021-04-29 20:20:25 +02:00
0096d83387 Simplify Limiter implementation 2021-04-29 20:20:25 +02:00
20a24dbcbf Add changelog 2021-04-29 20:20:25 +02:00
502654d853 Fix mypy errors 2021-04-29 15:47:52 +02:00
d2103d7c44 Document crawler 2021-04-29 15:43:20 +02:00
d96a361325 Test and fix exclusive output 2021-04-29 15:27:16 +02:00
2e85d26b6b Use conductor via context manager 2021-04-29 14:23:28 +02:00
6431a3fb3d Fix some mypy errors 2021-04-29 14:23:09 +02:00
ac3bfd7388 Make progress bars easier to use
The crawler now supports two types of progress bars
2021-04-29 13:53:16 +02:00
3ea86d18a0 Jerry-rig DummyCrawler to run 2021-04-29 13:45:04 +02:00
bbc792f9fb Implement Crawler and DummyCrawler 2021-04-29 13:44:29 +02:00
7e127cd5cc Clean up and fix conductor and limiter
Turns out you have to await an async lock, who knew...
2021-04-29 13:44:04 +02:00
c4fb92c658 Make type hints compatible with Python 3.8 2021-04-29 13:11:58 +02:00
8da1ac6cee Extend mypy config 2021-04-29 11:44:47 +02:00
a18db57e6f Implement terminal conductor 2021-04-29 11:44:47 +02:00
b915e393dd Implement limiter 2021-04-29 10:24:28 +02:00
3a74c23d09 Implement transformer 2021-04-29 09:51:50 +02:00
fbebc46c58 Load and dump config 2021-04-29 09:51:50 +02:00
5595a908d8 Configure entry point 2021-04-27 00:32:21 +02:00
27e4abcfa3 Do project setup from scratch
Following guidelines from the Python Packaging User Guide [1].

This commit intentionally breaks the .gitignore, project dependencies, GitHub
Actions and other stuff. It also removes almost the entire README. The intention
behind this is to get rid of all cruft that has accumulated over time and to have
a fresh start. Only necessary things will be re-added as they're needed.

From now on, I also plan on adding documentation for every feature at the same
time that the feature is implemented. This is to ensure that the documentation
does not become outdated.

[1]: https://packaging.python.org/
2021-04-27 00:07:54 +02:00
c1ab7485e2 Bump version to 2.6.1 (tag: v2.6.1) 2021-04-19 11:21:56 +02:00
29cd5d1a3c Reflect totality of sanitize_windows_path in return type 2021-04-19 11:10:02 +02:00
6d5d9333ad Force folder to be file-system path 2021-04-19 11:07:25 +02:00
7cc40595dc Allow synchronizing to directory "." 2021-04-14 20:25:25 +02:00
80ae5ddfaa Bump version to v2.6.0 (tag: v2.6.0) 2021-04-14 19:47:41 +02:00
4f480d117e Install keyring in CI 2021-04-14 19:24:05 +02:00
1f2af3a290 Retry on more I/O Errors 2021-04-13 11:43:22 +02:00