11 Commits

SHA1        Author          Date
    Message

ef7d5ea2d3  I-Al-Istannen   2021-10-30 18:09:05 +02:00
    Allow storing crawler-specific data in reports

7b062883f6  Joscha          2021-05-31 12:33:37 +02:00
    Use raw paths for --debug-transforms
    Previously, the already-transformed paths were used, which meant that
    --debug-transforms was cumbersome to use (as you had to remove all transforms
    and crawl once before getting useful results).

adb5d4ade3  Joscha          2021-05-26 10:58:19 +02:00
    Print files that are *not* deleted by cleanup
    These are files that are not present on the remote source any more, but still
    present locally. They also show up in the report.

bce3dc384d  Joscha          2021-05-25 12:11:15 +02:00
    Deduplicate path names in crawler
    Also rename files so they follow the restrictions for windows file names if
    we're on windows.

6ca0ecdf05  Joscha          2021-05-23 20:46:29 +02:00
    Load and store reports

b4d97cd545  Joscha          2021-05-22 20:54:42 +02:00
    Improve output dir and report error handling

989032fe0c  Joscha          2021-05-15 22:25:48 +02:00
    Fix cookies getting deleted

d565df27b3  Joscha          2021-05-13 22:28:14 +02:00
    Add HttpCrawler

60cd9873bc  Joscha          2021-05-06 01:02:40 +02:00
    Add local file crawler

bbfdadc463  Joscha          2021-05-05 18:08:34 +02:00
    Implement output directory

07e831218e  Joscha          2021-05-02 00:56:10 +02:00
    Add sync report