Joscha
0d10752b5a
Configure explain log level via cli and config file
2021-05-19 17:50:10 +02:00
Joscha
92886fb8d8
Implement --version flag
2021-05-19 17:33:36 +02:00
Joscha
b7a999bc2e
Clean up crawler exceptions and (a)noncritical
2021-05-19 13:25:57 +02:00
Joscha
4b68fa771f
Move logging logic to singleton
- Renamed module and class because "conductor" didn't make a lot of sense
- Used singleton approach (there's only one stdout after all)
- Redesigned progress bars (now with download speed!)
2021-05-18 22:45:19 +02:00
Joscha
0bae009189
Run formatting tools
2021-05-16 14:32:53 +02:00
Joscha
05573ccc53
Add fancy CLI options
2021-05-15 22:22:01 +02:00
Joscha
b70b62cef5
Make crawler sections start with "crawl:"
Also, use only the part of the section name after the "crawl:" as the crawler's
output directory. Now, the implementation matches the documentation again.
2021-05-15 17:24:37 +02:00
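The naming rule from the two commit messages above (the full section name, prefix included, identifies a crawler or authenticator, while only the part after "crawl:" becomes the output directory) can be sketched roughly as follows. This is a minimal illustration using the standard library configparser, not the project's actual loading code: the function crawler_output_dirs, the section names, and the option keys "type" and "auth" are assumptions made up for the example.

```python
from configparser import ConfigParser

# Hypothetical config; the section names and option keys are illustrative only.
CONFIG = """
[auth:example]
type = simple

[crawl:Foo/Bar]
type = local
auth = auth:example
"""

CRAWL_PREFIX = "crawl:"


def crawler_output_dirs(parser: ConfigParser) -> dict:
    """Map each full crawler section name to its output directory.

    The full name (including the prefix) stays the lookup key, while only
    the part after "crawl:" is used as the output directory.
    """
    result = {}
    for name in parser.sections():
        if name.startswith(CRAWL_PREFIX):
            result[name] = name[len(CRAWL_PREFIX):]
    return result


parser = ConfigParser(interpolation=None)
parser.read_string(CONFIG)
dirs = crawler_output_dirs(parser)

# A crawler refers to its authenticator by the full section name.
auth_name = parser["crawl:Foo/Bar"]["auth"]
```

Keeping the prefix in the lookup key means a crawler's "auth" value can be matched directly against the list of section names.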
Joscha
595de88d96
Fix authenticator and crawler names
Now, the "auth:" and "crawl:" parts are considered part of the name. This fixes
crawlers not being able to find their authenticators.
2021-05-15 15:25:05 +02:00
Joscha
b0f731bf84
Make crawlers use transformers
2021-05-15 15:25:05 +02:00
Joscha
acd674f0a0
Change limiter logic
Now download tasks are a subset of all tasks.
2021-05-15 15:25:05 +02:00
Joscha
ed2e19a150
Add reasons for invalid values
2021-05-15 15:25:05 +02:00
Joscha
296a169dd3
Make limiter logic more complex
The limiter can now distinguish between crawl and download actions and has a
fancy slot system and delay logic.
2021-05-15 15:25:05 +02:00
Joscha
6e5fdf4e9e
Set user agent to "pferd/<version>"
2021-05-14 21:27:44 +02:00
Joscha
d565df27b3
Add HttpCrawler
2021-05-13 22:28:14 +02:00
Joscha
68781a88ab
Fix asynchronous methods not being awaited
2021-05-13 19:39:49 +02:00
Joscha
0acdee15a0
Let crawlers obtain authenticators
2021-05-13 18:57:20 +02:00
Joscha
d5f29f01c5
Use global conductor instance
The switch from crawler-local conductors to a single pferd-global conductor was
made to prepare for auth section credential providers.
2021-05-11 00:05:04 +02:00
Joscha
cec0a8e1fc
Fix mypy errors
2021-05-09 01:45:01 +02:00
Joscha
60cd9873bc
Add local file crawler
2021-05-06 01:02:40 +02:00
Joscha
273d56c39a
Properly load crawler config
2021-05-05 23:45:10 +02:00
Joscha
5497dd2827
Add @noncritical and @repeat decorators
2021-05-05 23:36:54 +02:00
Joscha
bbfdadc463
Implement output directory
2021-05-05 18:08:34 +02:00
Joscha
91c33596da
Load crawlers from config file
2021-04-30 16:22:14 +02:00
Joscha
f776186480
Use PurePath instead of Path
Path should only be used when we need to access the file system. For all other
purposes (mainly crawling), we use PurePath instead since the paths don't
correspond to paths in the local file system.
2021-04-29 20:20:25 +02:00
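The distinction drawn in the commit message above can be illustrated with a small sketch: crawled paths are handled as PurePath values, and a concrete Path only appears once something has to be placed in the local file system. The helper local_target and the example paths are hypothetical and not taken from the project.

```python
from pathlib import Path, PurePath


def local_target(output_dir: Path, crawled: PurePath) -> Path:
    """Map a crawled path onto a location below output_dir.

    Hypothetical helper: it rejects absolute paths and ".." components so
    that the result cannot leave the output directory.
    """
    if crawled.is_absolute() or ".." in crawled.parts:
        raise ValueError(f"refusing to leave output directory: {crawled}")
    return output_dir / crawled


# A crawled path: only lexical operations, no file system access.
remote = PurePath("Lectures", "week01", "slides.pdf")
file_name = remote.name
parent = remote.parent

# Only at this point does a concrete, file-system-capable Path come into play.
target = local_target(Path("out"), remote)
```

PurePath offers no methods such as exists() or open(), so code that works on crawled paths cannot touch the local file system by accident.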
Joscha
502654d853
Fix mypy errors
2021-04-29 15:47:52 +02:00
Joscha
d2103d7c44
Document crawler
2021-04-29 15:43:20 +02:00
Joscha
d96a361325
Test and fix exclusive output
2021-04-29 15:27:16 +02:00
Joscha
2e85d26b6b
Use conductor via context manager
2021-04-29 14:23:28 +02:00
Joscha
6431a3fb3d
Fix some mypy errors
2021-04-29 14:23:09 +02:00
Joscha
ac3bfd7388
Make progress bars easier to use
The crawler now supports two types of progress bars.
2021-04-29 13:53:16 +02:00
Joscha
bbc792f9fb
Implement Crawler and DummyCrawler
2021-04-29 13:44:29 +02:00