Mr. Pine
dbc2553b11
Add default show-not-deleted option
...
If set to `no`, PFERD won't print status or report messages for files that were not deleted.
2023-08-26 18:43:01 +02:00
Mr. Pine
443f7fe839
Add no-delete-prompt-overwrite crawler conflict resolution option (#75)
2023-07-29 18:36:33 +02:00
Joscha
aa74604d29
Use utf-8 for report
2022-04-29 23:11:27 +02:00
I-Al-Istannen
e467b38d73
Only reject 1970 timestamps on Windows
2022-01-09 18:23:00 +01:00
I-Al-Istannen
eb4de8ae0c
Ignore 1970 dates, as Windows crashes when calling .timestamp()
2022-01-08 18:14:43 +01:00
Joscha
64a2960751
Align paths in status messages and progress bars
...
Also print "Ignored" when paths are ignored due to transforms
2021-05-31 12:32:42 +02:00
Joscha
adb5d4ade3
Print files that are *not* deleted by cleanup
...
These are files that are not present on the remote source any more, but still
present locally. They also show up in the report.
2021-05-26 10:58:19 +02:00
Joscha
07a75a37c3
Fix FileNotFoundError on Windows
2021-05-25 15:57:03 +00:00
Joscha
980578d05a
Avoid downloading in some cases
...
Depending on how on_conflict is set, we can determine a few situations where
downloading is never necessary.
2021-05-25 15:20:30 +02:00
Joscha
eb8b915813
Fix path prefix on Windows
...
Previously, the path prefix was only set if "windows_paths" was true, regardless
of OS. Now the path prefix is always set on Windows and never set on other OSes.
2021-05-25 14:23:38 +02:00
Joscha
bce3dc384d
Deduplicate path names in crawler
...
Also rename files so they follow the restrictions for Windows file names if
we're on Windows.
2021-05-25 12:11:15 +02:00
Joscha
27b5a8e490
Rename log.action to log.status
2021-05-23 22:40:33 +02:00
Joscha
ce1dbda5b4
Overhaul colours
...
"Crawled" and "Downloaded" are now printed less bright than "Crawling" and
"Downloading" as they're not as important. Explain topics are printed in yellow
to stand out a bit more from the cyan action messages.
2021-05-23 21:33:04 +02:00
Joscha
6ca0ecdf05
Load and store reports
2021-05-23 20:46:29 +02:00
Joscha
5edd868d5b
Fix always-smart redownloading the wrong files
2021-05-23 18:49:34 +02:00
Joscha
74c7b39dc8
Clean up files in alphabetical order
2021-05-23 18:39:25 +02:00
Joscha
445dffc987
Reword some explanations
2021-05-23 18:35:32 +02:00
Joscha
c0cecf8363
Log crawl and download actions more extensively
2021-05-23 16:25:44 +02:00
Joscha
b998339002
Fix cleanup logging of paths
2021-05-23 16:25:44 +02:00
Joscha
245c9c3dcc
Explain output dir decisions and steps
2021-05-23 16:25:44 +02:00
Joscha
803e5628a2
Clean up logging
...
Paths are now (hopefully) logged consistently across all crawlers
2021-05-23 11:37:19 +02:00
Joscha
ec3767c545
Create crawler base dir at start of crawl
2021-05-23 10:52:02 +02:00
Joscha
44ecb2fbe7
Fix cleanup deleting crawler's base directory
2021-05-23 10:45:37 +02:00
Joscha
ec95dda18f
Unify crawling and downloading steps
...
Now, the progress bar, limiter etc. for downloading and crawling are all handled
via the reusable CrawlToken and DownloadToken context managers.
2021-05-22 21:36:53 +02:00
Joscha
b4d97cd545
Improve output dir and report error handling
2021-05-22 20:54:42 +02:00
Joscha
a7c025fd86
Implement reusable FileSinkToken for OutputDirectory
2021-05-19 17:16:23 +02:00
Joscha
4b68fa771f
Move logging logic to singleton
...
- Renamed module and class because "conductor" didn't make a lot of sense
- Used singleton approach (there's only one stdout after all)
- Redesigned progress bars (now with download speed!)
2021-05-18 22:45:19 +02:00
Joscha
0bae009189
Run formatting tools
2021-05-16 14:32:53 +02:00
Joscha
9fd356d290
Ensure tmp files are deleted
...
This doesn't seem to fix the case where an exception bubbles up to the top of
the event loop. It also doesn't seem to fix the case when a KeyboardInterrupt is
thrown, since that never makes its way into the event loop in the first place.
Both of these cases lead to the event loop stopping, which means that the tmp
file cleanup doesn't get executed even though it's inside a "with" or "finally".
2021-05-15 23:00:40 +02:00
Joscha
989032fe0c
Fix cookies getting deleted
2021-05-15 22:25:48 +02:00
Joscha
05573ccc53
Add fancy CLI options
2021-05-15 22:22:01 +02:00
Joscha
0c9167512c
Fix output dir
...
I missed these while renaming the resolve function. Shame on me for not running
mypy earlier.
2021-05-14 21:28:38 +02:00
Joscha
d565df27b3
Add HttpCrawler
2021-05-13 22:28:14 +02:00
Joscha
38bb66a776
Update file metadata in more cases
...
PFERD now not only updates file metadata when a file is successfully added or
changed, but also when a file is downloaded and then detected to be unchanged.
This could occur for example if a remote file's modification time was bumped,
possibly because somebody touched the file without changing it.
2021-05-13 19:40:10 +02:00
Joscha
68781a88ab
Fix asynchronous methods not being awaited
2021-05-13 19:39:49 +02:00
Joscha
910462bb72
Log stuff happening to files
2021-05-13 19:37:27 +02:00
Joscha
6bd6adb977
Fix tmp file names
2021-05-13 19:36:46 +02:00
Joscha
60cd9873bc
Add local file crawler
2021-05-06 01:02:40 +02:00
Joscha
bbfdadc463
Implement output directory
2021-05-05 18:08:34 +02:00