595de88d96
Fix authenticator and crawler names
...
Now, the "auth:" and "crawl:" parts are considered part of the name. This fixes
crawlers not being able to find their authenticators.
2021-05-15 15:25:05 +02:00
a6fdf05ee9
Allow variable whitespace in arrow rules
2021-05-15 15:25:05 +02:00
f897d7c2e1
Add name variants for all arrows
2021-05-15 15:25:05 +02:00
b0f731bf84
Make crawlers use transformers
2021-05-15 15:25:05 +02:00
302b8c0c34
Fix errors loading local crawler config
...
Apparently getint and getfloat may return None even though this is not
mentioned in their type annotations.
2021-05-15 15:25:05 +02:00
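
Note on 302b8c0c34: the following is a minimal sketch of the behaviour the commit message describes, assuming the config is loaded with Python's standard configparser (the log itself does not say so). The section name, the option name crawl_delay, and the helper get_int_or are hypothetical and only serve the illustration.

```python
import configparser

# Assumption: a configparser-based config; section and option names are made up.
parser = configparser.ConfigParser()
parser.read_string("[crawl:local]\npath = .\n")
section = parser["crawl:local"]

# The option is missing, so the section proxy falls back to None
# instead of raising or returning an int.
delay = section.getint("crawl_delay")


def get_int_or(section, key, default):
    """Hypothetical helper: return an int option, or default if it is unset."""
    value = section.getint(key)
    return default if value is None else value


safe_delay = get_int_or(section, "crawl_delay", 0)
```

Checking for None explicitly, rather than trusting the annotation, is one way to avoid this class of error when loading optional values.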
acd674f0a0
Change limiter logic
...
Now download tasks are a subset of all tasks.
2021-05-15 15:25:05 +02:00
ed2e19a150
Add reasons for invalid values
2021-05-15 15:25:05 +02:00
296a169dd3
Make limiter logic more complex
...
The limiter can now distinguish between crawl and download actions and has a
fancy slot system and delay logic.
2021-05-15 15:25:05 +02:00
1591cb9197
Add options to slow down local crawler
...
These options are meant to make the local crawler behave more like a
network-based crawler for purposes of testing and debugging other parts of the
code base.
2021-05-15 15:25:01 +02:00
0c9167512c
Fix output dir
...
I missed these while renaming the resolve function. Shame on me for not running
mypy earlier.
2021-05-14 21:28:38 +02:00
a673ab0fae
Delete old files
...
I should've done this earlier.
2021-05-14 21:27:44 +02:00
6e5fdf4e9e
Set user agent to "pferd/<version>"
2021-05-14 21:27:44 +02:00
93a5a94dab
Single-source version number
2021-05-14 21:27:44 +02:00
d565df27b3
Add HttpCrawler
2021-05-13 22:28:14 +02:00
e3ee4e515d
Disable highlighting of primitives
...
This commit prevents rich from highlighting python-looking syntax like numbers,
arrays, 'None' etc.
2021-05-13 19:47:44 +02:00
94d6a01cca
Use file mtime in local crawler
2021-05-13 19:42:40 +02:00
38bb66a776
Update file metadata in more cases
...
PFERD now not only updates file metadata when a file is successfully added or
changed, but also when a file is downloaded and then detected to be unchanged.
This could occur for example if a remote file's modification time was bumped,
possibly because somebody touched the file without changing it.
2021-05-13 19:40:10 +02:00
68781a88ab
Fix asynchronous methods not being awaited
2021-05-13 19:39:49 +02:00
910462bb72
Log stuff happening to files
2021-05-13 19:37:27 +02:00
6bd6adb977
Fix tmp file names
2021-05-13 19:36:46 +02:00
0acdee15a0
Let crawlers obtain authenticators
2021-05-13 18:57:20 +02:00
c3ce6bb31c
Fix crawler cleanup not being awaited
2021-05-11 00:28:45 +02:00
0459ed093e
Add simple authenticator
...
... including some required authenticator infrastructure
2021-05-11 00:28:03 +02:00
d5f29f01c5
Use global conductor instance
...
The switch from crawler-local conductors to a single pferd-global conductor was
made to prepare for auth section credential providers.
2021-05-11 00:05:04 +02:00
595ba8b7ab
Remove dummy crawler
2021-05-10 23:47:46 +02:00
cec0a8e1fc
Fix mypy errors
2021-05-09 01:45:01 +02:00
f9b2fd60e2
Document local crawler and auth
2021-05-09 01:33:47 +02:00
60cd9873bc
Add local file crawler
2021-05-06 01:02:40 +02:00
273d56c39a
Properly load crawler config
2021-05-05 23:45:10 +02:00
5497dd2827
Add @noncritical and @repeat decorators
2021-05-05 23:36:54 +02:00
bbfdadc463
Implement output directory
2021-05-05 18:08:34 +02:00
07e831218e
Add sync report
2021-05-02 00:56:10 +02:00
91c33596da
Load crawlers from config file
2021-04-30 16:22:14 +02:00
e7a51decb0
Elaborate on transforms and implement changes
2021-04-29 20:24:18 +02:00
f776186480
Use PurePath instead of Path
...
Path should only be used when we need to access the file system. For all other
purposes (mainly crawling), we use PurePath instead since the paths don't
correspond to paths in the local file system.
2021-04-29 20:20:25 +02:00
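
Note on f776186480: a small sketch of the distinction the commit message draws, using only the standard pathlib module. The concrete path components and the "out" directory are assumptions chosen for illustration, not names taken from the code base.

```python
from pathlib import Path, PurePath

# A crawled path is just a sequence of names; it need not exist locally.
remote = PurePath("course") / "slides" / "01.pdf"

# Pure operations work without touching the file system.
name = remote.name      # "01.pdf"
parts = remote.parts    # ("course", "slides", "01.pdf")

# PurePath offers no file system methods such as exists() or stat(),
# so accidental disk access on a crawled path is not possible.
has_exists = hasattr(remote, "exists")

# Only when writing to disk is the pure path combined with a concrete Path.
local = Path("out") / remote
```

Keeping crawled paths as PurePath means any file system access has to go through an explicit conversion to Path, which makes such access easy to spot.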
0096d83387
Simplify Limiter implementation
2021-04-29 20:20:25 +02:00
502654d853
Fix mypy errors
2021-04-29 15:47:52 +02:00
d2103d7c44
Document crawler
2021-04-29 15:43:20 +02:00
d96a361325
Test and fix exclusive output
2021-04-29 15:27:16 +02:00
2e85d26b6b
Use conductor via context manager
2021-04-29 14:23:28 +02:00
6431a3fb3d
Fix some mypy errors
2021-04-29 14:23:09 +02:00
ac3bfd7388
Make progress bars easier to use
...
The crawler now supports two types of progress bars.
2021-04-29 13:53:16 +02:00
3ea86d18a0
Jerry-rig DummyCrawler to run
2021-04-29 13:45:04 +02:00
bbc792f9fb
Implement Crawler and DummyCrawler
2021-04-29 13:44:29 +02:00
7e127cd5cc
Clean up and fix conductor and limiter
...
Turns out you have to await an async lock, who knew...
2021-04-29 13:44:04 +02:00
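
Note on 7e127cd5cc: a minimal sketch of the pitfall the commit message alludes to, assuming the lock in question is an asyncio.Lock (the log does not name the class). The worker and main functions are hypothetical. Calling lock.acquire() without await only creates a coroutine object and never takes the lock; "async with lock" or "await lock.acquire()" is needed.

```python
import asyncio


async def worker(lock, log, name):
    # "async with" awaits lock.acquire() and releases the lock on exit.
    async with lock:
        log.append(f"{name} start")
        await asyncio.sleep(0)  # yield to the event loop while holding the lock
        log.append(f"{name} end")


async def main():
    lock = asyncio.Lock()
    log = []
    await asyncio.gather(worker(lock, log, "a"), worker(lock, log, "b"))
    return log


log = asyncio.run(main())
# Because the lock is awaited, the two critical sections do not interleave.
```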
c4fb92c658
Make type hints compatible with Python 3.8
2021-04-29 13:11:58 +02:00
a18db57e6f
Implement terminal conductor
2021-04-29 11:44:47 +02:00
b915e393dd
Implement limiter
2021-04-29 10:24:28 +02:00
3a74c23d09
Implement transformer
2021-04-29 09:51:50 +02:00
fbebc46c58
Load and dump config
2021-04-29 09:51:50 +02:00