739dd95850
Use Last-Modified and ETag headers to determine KIT-IPD file versions (#95)
Co-authored-by: I-Al-Istannen <i-al-istannen@users.noreply.github.com>
2024-10-27 19:03:47 +01:00
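A minimal sketch of the idea behind this commit, assuming hypothetical helpers `version_from_headers` and `needs_download`: a version token is derived from the ETag (falling back to Last-Modified) and compared with the token stored from the previous run. This is not PFERD's actual code; the names, the preference order and the `etag:`/`last-modified:` token format are illustrative assumptions.

```python
from typing import Mapping, Optional


def version_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Derive a version token from HTTP response headers (hypothetical helper).

    Prefers the ETag and falls back to Last-Modified. Returns None if the
    server sent neither, so versions cannot be compared.
    """
    # A plain dict is case-sensitive; HTTP client libraries usually hand out
    # case-insensitive header mappings instead.
    etag = headers.get("ETag")
    if etag:
        return f"etag:{etag}"
    last_modified = headers.get("Last-Modified")
    if last_modified:
        return f"last-modified:{last_modified}"
    return None


def needs_download(stored: Optional[str], headers: Mapping[str, str]) -> bool:
    """True unless the stored and current versions are both known and equal."""
    current = version_from_headers(headers)
    if stored is None or current is None:
        return True
    return stored != current
```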
dbc2553b11
Add default show-not-deleted option
If set to `no`, PFERD won't print status or report messages for files that are not deleted
2023-08-26 18:43:01 +02:00
443f7fe839
Add no-delete-prompt-overwrite crawler conflict resolution option (#75)
2023-07-29 18:36:33 +02:00
aa74604d29
Use utf-8 for report
2022-04-29 23:11:27 +02:00
e467b38d73
Only reject 1970 timestamps on Windows
2022-01-09 18:23:00 +01:00
eb4de8ae0c
Ignore 1970 dates as Windows crashes when calling .timestamp()
2022-01-08 18:14:43 +01:00
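A sketch of the guard these two commits describe, assuming a hypothetical `safe_timestamp` helper. The platform is passed in explicitly so the behavior can be checked off Windows, and returning `None` stands for "no usable timestamp"; how PFERD itself handles that case is not shown here.

```python
import sys
from datetime import datetime
from typing import Optional


def safe_timestamp(
    mtime: Optional[datetime], platform_name: str = sys.platform
) -> Optional[float]:
    """Convert a datetime to a POSIX timestamp, skipping 1970 dates on Windows.

    Per the commit messages, calling .timestamp() on such dates crashes on
    Windows, so they are treated as "no timestamp" there. Other platforms
    convert them as usual.
    """
    if mtime is None:
        return None
    if platform_name == "win32" and mtime.year == 1970:
        return None
    return mtime.timestamp()
```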
64a2960751
Align paths in status messages and progress bars
Also print "Ignored" when paths are ignored due to transforms
2021-05-31 12:32:42 +02:00
adb5d4ade3
Print files that are *not* deleted by cleanup
These are files that are not present on the remote source any more, but still
present locally. They also show up in the report.
2021-05-26 10:58:19 +02:00
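A sketch of how such files can be identified, assuming the crawl records every path it produced: stale files are those present locally but absent from that set. The `may_delete` callback is a hypothetical stand-in for whatever configuration or prompt decides whether a given file may be removed, and `plan_cleanup` only plans, it deletes nothing.

```python
from pathlib import PurePosixPath
from typing import Callable, Iterable, List, Set, Tuple


def plan_cleanup(
    local_files: Iterable[PurePosixPath],
    seen_remote: Set[PurePosixPath],
    may_delete: Callable[[PurePosixPath], bool],
) -> Tuple[List[PurePosixPath], List[PurePosixPath]]:
    """Split stale local files into (to_delete, not_deleted), sorted by path.

    A file is stale if it exists locally but the crawl no longer produced it.
    """
    to_delete: List[PurePosixPath] = []
    not_deleted: List[PurePosixPath] = []
    for path in sorted(p for p in local_files if p not in seen_remote):
        if may_delete(path):
            to_delete.append(path)
        else:
            # These are the files that would be printed and listed in the report.
            not_deleted.append(path)
    return to_delete, not_deleted
```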
07a75a37c3
Fix FileNotFoundError on Windows
2021-05-25 15:57:03 +00:00
980578d05a
Avoid downloading in some cases
Depending on how on_conflict is set, we can determine a few situations where
downloading is never necessary.
2021-05-25 15:20:30 +02:00
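A sketch of the kind of shortcut this commit describes. The `OnConflict` enum is a hypothetical stand-in whose values are assumed to mirror a subset of PFERD's `on_conflict` options, and `should_download` is illustrative rather than PFERD's implementation. It assumes `no-delete` never removes local files, so a directory sitting at the target path can never be replaced by the download.

```python
from enum import Enum
from pathlib import Path


class OnConflict(Enum):
    # Assumed option values, modelled on PFERD's on_conflict choices.
    PROMPT = "prompt"
    LOCAL_FIRST = "local-first"
    REMOTE_FIRST = "remote-first"
    NO_DELETE = "no-delete"


def should_download(local_path: Path, on_conflict: OnConflict) -> bool:
    """Return False when a download could never change what ends up locally."""
    if not local_path.exists():
        # Nothing there yet: the file has to be fetched.
        return True
    if on_conflict == OnConflict.LOCAL_FIRST:
        # The local version always wins, so a download would be discarded.
        return False
    if not local_path.is_file():
        # Something other than a file (e.g. a directory) occupies the path.
        # Under no-delete it is never removed, so downloading is pointless.
        return on_conflict != OnConflict.NO_DELETE
    # A regular file exists; further heuristics (mtime, size, ...) would go here.
    return True
```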
eb8b915813
Fix path prefix on Windows
Previously, the path prefix was only set if "windows_paths" was true, regardless
of OS. Now the path prefix is always set on Windows and never set on other OSes.
2021-05-25 14:23:38 +02:00
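A sketch of the rule as stated (prefix on Windows, never elsewhere, independent of `windows_paths`). The commit does not name the prefix; this sketch assumes it is Windows' extended-length path prefix `\\?\`, and `add_path_prefix` is a hypothetical name.

```python
import sys

# The four characters \\?\ : Windows' extended-length path prefix (assumed).
EXTENDED_PREFIX = "\\\\?\\"


def add_path_prefix(path: str, platform_name: str = sys.platform) -> str:
    """Prefix an absolute path on Windows; leave it untouched elsewhere.

    The decision depends only on the OS, not on a windows_paths-style option.
    The prefix is only meaningful for absolute Windows paths.
    """
    if platform_name != "win32":
        return path
    if path.startswith(EXTENDED_PREFIX):
        return path  # already prefixed, avoid doubling it
    return EXTENDED_PREFIX + path
```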
bce3dc384d
Deduplicate path names in crawler
Also rename files so they follow the restrictions for Windows file names if
we're on Windows.
2021-05-25 12:11:15 +02:00
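A sketch of both parts of this commit, using hypothetical helpers `sanitize_windows_name` and `deduplicate`. The `_1`, `_2`, ... suffix scheme and the `_` replacement character are assumptions chosen for illustration, not necessarily what PFERD produces; the forbidden characters, reserved device names and the trailing dot/space rule follow Windows' documented file naming restrictions.

```python
import re
from typing import Set

# Characters Windows forbids in file names, plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names Windows reserves, with or without an extension.
_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def sanitize_windows_name(name: str) -> str:
    """Make a single path component acceptable to Windows (hypothetical helper)."""
    name = _FORBIDDEN.sub("_", name)
    # Windows does not allow names ending in a dot or a space.
    name = name.rstrip(". ") or "_"
    if name.split(".")[0].upper() in _RESERVED:
        name = "_" + name
    return name


def deduplicate(name: str, taken: Set[str]) -> str:
    """Return name, or name with the smallest free _N suffix, and reserve it.

    Comparison is case-sensitive; a Windows-aware variant would also fold case.
    """
    candidate = name
    stem, dot, ext = name.rpartition(".")
    if not dot or not stem:
        # No extension (or a dotfile such as ".bashrc"): suffix the whole name.
        stem, ext = name, ""
    else:
        ext = "." + ext
    i = 1
    while candidate in taken:
        candidate = f"{stem}_{i}{ext}"
        i += 1
    taken.add(candidate)
    return candidate
```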
27b5a8e490
Rename log.action to log.status
2021-05-23 22:40:33 +02:00
ce1dbda5b4
Overhaul colours
"Crawled" and "Downloaded" are now printed less bright than "Crawling" and
"Downloading" as they're not as important. Explain topics are printed in yellow
to stand out a bit more from the cyan action messages.
2021-05-23 21:33:04 +02:00
6ca0ecdf05
Load and store reports
2021-05-23 20:46:29 +02:00
5edd868d5b
Fix always-smart redownloading the wrong files
2021-05-23 18:49:34 +02:00
74c7b39dc8
Clean up files in alphabetical order
2021-05-23 18:39:25 +02:00
445dffc987
Reword some explanations
2021-05-23 18:35:32 +02:00
c0cecf8363
Log crawl and download actions more extensively
2021-05-23 16:25:44 +02:00
b998339002
Fix cleanup logging of paths
2021-05-23 16:25:44 +02:00
245c9c3dcc
Explain output dir decisions and steps
2021-05-23 16:25:44 +02:00
803e5628a2
Clean up logging
Paths are now (hopefully) logged consistently across all crawlers
2021-05-23 11:37:19 +02:00
ec3767c545
Create crawler base dir at start of crawl
2021-05-23 10:52:02 +02:00
44ecb2fbe7
Fix cleanup deleting crawler's base directory
2021-05-23 10:45:37 +02:00
ec95dda18f
Unify crawling and downloading steps
Now, the progress bar, limiter etc. for downloading and crawling are all handled
via the reusable CrawlToken and DownloadToken context managers.
2021-05-22 21:36:53 +02:00
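A sketch of the reusable-token idea, assuming `asyncio`: one async context manager acquires a shared concurrency limit, records start and end, and always releases on exit, whether the body finishes or raises. The `Limiter` class and its `token` method are hypothetical, and a plain list stands in for PFERD's progress bars; the same token serves both crawl and download steps via the `kind` argument.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List


class Limiter:
    """Hypothetical shared limiter: caps concurrent tasks and records activity."""

    def __init__(self, max_concurrent: int) -> None:
        self._sem = asyncio.Semaphore(max_concurrent)
        self.log: List[str] = []

    @asynccontextmanager
    async def token(self, kind: str, path: str) -> AsyncIterator[None]:
        # Wait for a free slot, then "open a progress bar" for this path.
        async with self._sem:
            self.log.append(f"{kind} start {path}")
            try:
                yield
            finally:
                # Runs even if the body raised, so the slot is always freed.
                self.log.append(f"{kind} done {path}")
```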
b4d97cd545
Improve output dir and report error handling
2021-05-22 20:54:42 +02:00
a7c025fd86
Implement reusable FileSinkToken for OutputDirectory
2021-05-19 17:16:23 +02:00
4b68fa771f
Move logging logic to singleton
- Renamed module and class because "conductor" didn't make a lot of sense
- Used singleton approach (there's only one stdout after all)
- Redesigned progress bars (now with download speed!)
2021-05-18 22:45:19 +02:00
0bae009189
Run formatting tools
2021-05-16 14:32:53 +02:00
9fd356d290
Ensure tmp files are deleted
This doesn't seem to fix the case where an exception bubbles up to the top of
the event loop. It also doesn't seem to fix the case when a KeyboardInterrupt is
thrown, since that never makes its way into the event loop in the first place.
Both of these cases lead to the event loop stopping, which means that the tmp
file cleanup doesn't get executed even though it's inside a "with" or "finally".
2021-05-15 23:00:40 +02:00
989032fe0c
Fix cookies getting deleted
2021-05-15 22:25:48 +02:00
05573ccc53
Add fancy CLI options
2021-05-15 22:22:01 +02:00
0c9167512c
Fix output dir
I missed these while renaming the resolve function. Shame on me for not running
mypy earlier.
2021-05-14 21:28:38 +02:00
d565df27b3
Add HttpCrawler
2021-05-13 22:28:14 +02:00
38bb66a776
Update file metadata in more cases
PFERD now not only updates file metadata when a file is successfully added or
changed, but also when a file is downloaded and then detected to be unchanged.
This could occur for example if a remote file's modification time was bumped,
possibly because somebody touched the file without changing it.
2021-05-13 19:40:10 +02:00
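A sketch of that flow, assuming the download lands in a temporary file first: if its content matches the existing file, the temporary copy is discarded, but the modification time is refreshed either way. `commit_download` and its status strings are hypothetical.

```python
import filecmp
import os
from pathlib import Path
from typing import Optional


def commit_download(tmp_path: Path, local_path: Path, remote_mtime: Optional[float]) -> str:
    """Move a downloaded temp file into place; update mtime even if unchanged."""
    if local_path.is_file() and filecmp.cmp(tmp_path, local_path, shallow=False):
        # Same content: keep the existing file and drop the temporary copy...
        tmp_path.unlink()
        status = "unchanged"
    else:
        status = "changed" if local_path.exists() else "added"
        os.replace(tmp_path, local_path)
    # ...but refresh metadata in every case, so a bumped remote mtime is recorded.
    if remote_mtime is not None:
        os.utime(local_path, (remote_mtime, remote_mtime))
    return status
```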
68781a88ab
Fix asynchronous methods not being awaited
2021-05-13 19:39:49 +02:00
910462bb72
Log stuff happening to files
2021-05-13 19:37:27 +02:00
6bd6adb977
Fix tmp file names
2021-05-13 19:36:46 +02:00
60cd9873bc
Add local file crawler
2021-05-06 01:02:40 +02:00
bbfdadc463
Implement output directory
2021-05-05 18:08:34 +02:00