Joscha
1c1f781be4
Reword some log messages
2021-05-24 13:17:28 +02:00
Joscha
c687d4a51a
Implement cookie sharing
2021-05-24 13:10:44 +02:00
I-Al-Istannen
fca62541ca
De-duplicate element names in ILIAS crawler
...
This prevents any conflicts caused by multiple files with the same name.
Conflicts may still arise due to transforms, but that is out of our
control and a user error.
2021-05-24 00:24:31 +02:00
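
The de-duplication described in the commit above could look roughly like the following. This is only a sketch: the function name `deduplicate_names` and the `_N` suffix scheme are assumptions for illustration, not the naming the ILIAS crawler actually uses.

```python
from typing import List, Set


def deduplicate_names(names: List[str]) -> List[str]:
    """Return the names in order, renaming repeats so every result is unique.

    Hypothetical helper: the "_<n>" suffix is an assumed scheme.
    """
    used: Set[str] = set()
    result: List[str] = []
    for name in names:
        candidate = name
        suffix = 1
        # Keep trying new suffixes until the candidate is unused, so a
        # generated name never collides with a name that already exists.
        while candidate in used:
            candidate = f"{name}_{suffix}"
            suffix += 1
        used.add(candidate)
        result.append(candidate)
    return result
```

As the commit notes, a later transform can still map two unique names onto the same path; a helper like this only guarantees uniqueness before transforms run.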
I-Al-Istannen
3ab3581f84
Add timeout for HTTP connection
2021-05-23 23:41:05 +02:00
I-Al-Istannen
8dd0689420
Add keyring authentication to ILIAS CLI
2021-05-23 23:04:18 +02:00
Joscha
79be6e1dc5
Switch some other options to BooleanOptionalAction
2021-05-23 22:49:09 +02:00
Joscha
edbd92dbbf
Add --status and --report flags
2021-05-23 22:41:59 +02:00
Joscha
27b5a8e490
Rename log.action to log.status
2021-05-23 22:40:33 +02:00
Joscha
1f400d5964
Implement BooleanOptionalAction
2021-05-23 22:26:59 +02:00
Joscha
0ca0680165
Simplify --version
2021-05-23 21:40:48 +02:00
Joscha
ce1dbda5b4
Overhaul colours
...
"Crawled" and "Downloaded" are now printed less bright than "Crawling" and
"Downloading" as they're not as important. Explain topics are printed in yellow
to stand out a bit more from the cyan action messages.
2021-05-23 21:33:04 +02:00
Joscha
9cce78669f
Print report after all crawlers have finished
2021-05-23 21:17:13 +02:00
Joscha
6ca0ecdf05
Load and store reports
2021-05-23 20:46:29 +02:00
I-Al-Istannen
6e9f8fd391
Add a keyring authenticator
2021-05-23 19:44:12 +02:00
Joscha
2fdf24495b
Restructure crawling and auth related modules
2021-05-23 19:16:42 +02:00
Joscha
bbf9f8f130
Add -C as alias for --crawler
2021-05-23 19:06:09 +02:00
I-Al-Istannen
37f8d84a9c
Output total number of HTTP requests in HTTP crawler
2021-05-23 19:00:01 +02:00
Joscha
5edd868d5b
Fix always-smart redownloading the wrong files
2021-05-23 18:49:34 +02:00
Joscha
e4e5e83be6
Fix downloader using crawl bar
...
Looks like I made a dumb copy-paste error. Now the download bar shows the proper
progress and speed again.
2021-05-23 18:39:43 +02:00
Joscha
74c7b39dc8
Clean up files in alphabetical order
2021-05-23 18:39:25 +02:00
Joscha
445dffc987
Reword some explanations
2021-05-23 18:35:32 +02:00
I-Al-Istannen
d97d6bf147
Fix handling nested ILIAS folders
2021-05-23 18:29:28 +02:00
I-Al-Istannen
79efdb56f7
Adjust ILIAS HTML explain messages
2021-05-23 18:24:25 +02:00
Joscha
a9af56a5e9
Improve specifying crawlers via CLI
...
Instead of removing the sections of unselected crawlers from the config file,
crawler selection now happens in the Pferd after loading the crawlers and is
more sophisticated. It also has better error messages.
2021-05-23 18:18:50 +02:00
I-Al-Istannen
59f13bb8d6
Explain ILIAS HTML parsing and add some warnings
2021-05-23 18:14:54 +02:00
I-Al-Istannen
463f8830d7
Add warn_contd
2021-05-23 18:14:54 +02:00
I-Al-Istannen
05ad06fbc1
Only enclose get_page in iorepeat in ILIAS crawler
...
We previously also gathered in there, which could lead to some more
surprises when the method was retried.
2021-05-23 18:14:51 +02:00
Joscha
29d5a40c57
Replace asyncio.gather with custom Crawler function
2021-05-23 17:25:16 +02:00
Joscha
c0cecf8363
Log crawl and download actions more extensively
2021-05-23 16:25:44 +02:00
Joscha
b998339002
Fix cleanup logging of paths
2021-05-23 16:25:44 +02:00
Joscha
245c9c3dcc
Explain output dir decisions and steps
2021-05-23 16:25:44 +02:00
I-Al-Istannen
d8f26a789e
Implement CLI command for ILIAS crawler
2021-05-23 13:30:42 +02:00
I-Al-Istannen
e1d18708b3
Rename "no_videos" to "videos"
2021-05-23 13:30:42 +02:00
Joscha
b44b49476d
Fix noncritical and anoncritical decorators
...
I must've forgotten to update the anoncritical decorator when I last changed the
noncritical decorator. Also, every exception should make the crawler not
error_free, not just CrawlErrors.
2021-05-23 13:24:53 +02:00
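
The behaviour described in the commit above (any exception clears `error_free`, not only a `CrawlError`) might be sketched as follows. The `Crawler` and `CrawlError` classes here are minimal stand-ins, and the choice to swallow `CrawlError` while re-raising everything else is an assumption; the project's real decorators may differ.

```python
import functools
from typing import Any, Callable, Optional


class CrawlError(Exception):
    """Stand-in for the project's crawl error type (assumed)."""


class Crawler:
    """Stand-in crawler that only tracks whether an error occurred."""

    def __init__(self) -> None:
        self.error_free = True


def noncritical(f: Callable[..., Any]) -> Callable[..., Optional[Any]]:
    """Keep the crawl going after a CrawlError, but always record failures."""

    @functools.wraps(f)
    def wrapper(self: Crawler, *args: Any, **kwargs: Any) -> Optional[Any]:
        try:
            return f(self, *args, **kwargs)
        except CrawlError:
            # Expected failure: note it and continue with the crawl.
            self.error_free = False
            return None
        except Exception:
            # Unexpected failure: still mark the crawler, then propagate.
            self.error_free = False
            raise

    return wrapper
```

An async variant (the `anoncritical` counterpart) would presumably wrap an `async def` and `await` the call inside the same `try` block, which is why the two decorators have to be kept in sync.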
Joscha
7e0bb06259
Clean up TODOs
2021-05-23 12:47:30 +02:00
I-Al-Istannen
ecdedfa1cf
Add no-videos flag to ILIAS crawler
2021-05-23 12:37:01 +02:00
I-Al-Istannen
3d4b997d4a
Retry crawl_url and work around Python's closure handling
...
Closures capture the enclosing scope and not the current values of its
variables. Therefore, any type-narrowing performed by mypy on captured
variables is lost inside the closure.
2021-05-23 12:28:15 +02:00
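
The closure behaviour mentioned in the commit above can be illustrated with a small, self-contained sketch. The function names are hypothetical and not taken from the crawler; whether mypy reports an error for the un-narrowed variable depends on the mypy version, so treat the comments about mypy as an assumption about its behaviour at the time.

```python
from typing import Callable, List, Optional


def late_binding() -> List[int]:
    # Each lambda looks `i` up in the enclosing scope when it is called,
    # so all three see the final value of `i`.
    callbacks = [lambda: i for i in range(3)]
    return [cb() for cb in callbacks]


def make_shouter(url: Optional[str]) -> Callable[[], str]:
    if url is None:
        raise ValueError("url is required")
    # At this point mypy has narrowed `url` from Optional[str] to str,
    # but that narrowing may not carry over into a nested function,
    # because `url` could be reassigned before the closure runs.

    # Workaround: bind the narrowed value to a new, non-Optional name.
    narrowed_url: str = url

    def shout() -> str:
        # Using `url.upper()` here could make mypy complain about
        # Optional[str]; `narrowed_url` is plainly a str.
        return narrowed_url.upper()

    return shout
```

Calling `late_binding()` returns `[2, 2, 2]`, which shows that the closures share one scope rather than each holding its own copy of `i`.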
Joscha
e81005ae4b
Fix CLI arguments
2021-05-23 12:24:21 +02:00
I-Al-Istannen
33a81a5f5c
Document authentication in HTTP crawler and rename prepare_request
2021-05-23 11:55:34 +02:00
Joscha
25e2abdb03
Improve transformer explain wording
2021-05-23 11:45:14 +02:00
Joscha
803e5628a2
Clean up logging
...
Paths are now (hopefully) logged consistently across all crawlers
2021-05-23 11:37:19 +02:00
Joscha
c88f20859a
Explain config file dumping
2021-05-23 11:04:50 +02:00
Joscha
ec3767c545
Create crawler base dir at start of crawl
2021-05-23 10:52:02 +02:00
Joscha
729ff0a4c7
Fix simple authenticator output
2021-05-23 10:45:37 +02:00
Joscha
6fe51e258f
Number rules starting at 1
2021-05-23 10:45:37 +02:00
Joscha
44ecb2fbe7
Fix cleanup deleting crawler's base directory
2021-05-23 10:45:37 +02:00
I-Al-Istannen
53e031d9f6
Reuse dl/cl for I/O retries in ILIAS crawler
2021-05-23 00:28:27 +02:00
I-Al-Istannen
8ac85ea0bd
Fix a few typos in HttpCrawler
2021-05-22 23:37:34 +02:00
I-Al-Istannen
adfdc302d7
Save cookies after successful authentication in HTTP crawler
2021-05-22 23:30:32 +02:00
I-Al-Istannen
3053278721
Move HTTP crawler to own file
2021-05-22 23:23:21 +02:00