Commit Graph

  • 4fefb98d71 Add a wrapper to pretty-print ValueErrors in argparse parsers I-Al-Istannen 2021-05-25 11:57:59 +02:00
  • ffda4e43df Add extension to link files I-Al-Istannen 2021-05-25 11:40:41 +02:00
  • 69cb2a7734 Add Links option to ilias crawler I-Al-Istannen 2021-05-25 11:33:45 +02:00
  • c33de233dc Add script for releasing new versions Joscha 2021-05-24 20:08:49 +02:00
  • 85f89a7ff3 Interpret accordions and expandable headers as virtual folders I-Al-Istannen 2021-05-24 18:53:00 +02:00
  • 9ce20216b5 Do not set a timeout for whole HTTP request I-Al-Istannen 2021-05-24 18:32:18 +02:00
  • 1739c54091 Add checklist for releasing new versions Joscha 2021-05-24 17:50:17 +02:00
  • d8bd1f518a Set up build and release workflow Joscha 2021-05-24 15:43:53 +02:00
  • 86ba47541b Fix cookie loading and saving Joscha 2021-05-24 16:53:50 +02:00
  • 492ec6a932 Detect and skip ILIAS tests I-Al-Istannen 2021-05-24 16:32:29 +02:00
  • 342076ee0e Handle exercise detail containers in ILIAS html parser I-Al-Istannen 2021-05-24 16:22:51 +02:00
  • d44f6966c2 Log authentication attempts in HTTP crawler I-Al-Istannen 2021-05-24 16:22:11 +02:00
  • 5c76193045 Set up pyinstaller Joscha 2021-05-24 15:21:25 +02:00
  • 1c1f781be4 Reword some log messages Joscha 2021-05-24 13:17:28 +02:00
  • c687d4a51a Implement cookie sharing Joscha 2021-05-24 13:10:19 +02:00
  • fca62541ca De-duplicate element names in ILIAS crawler I-Al-Istannen 2021-05-24 00:24:31 +02:00
  • 3ab3581f84 Add timeout for HTTP connection I-Al-Istannen 2021-05-23 23:40:28 +02:00
  • 8dd0689420 Add keyring authentication to ILIAS CLI I-Al-Istannen 2021-05-23 23:04:18 +02:00
  • be4b1040f8 Document status and report options Joscha 2021-05-23 22:51:42 +02:00
  • 79be6e1dc5 Switch some other options to BooleanOptionalAction Joscha 2021-05-23 22:49:09 +02:00
  • edbd92dbbf Add --status and --report flags Joscha 2021-05-23 22:41:59 +02:00
  • 27b5a8e490 Rename log.action to log.status Joscha 2021-05-23 22:39:07 +02:00
  • 1f400d5964 Implement BooleanOptionalAction Joscha 2021-05-23 22:26:41 +02:00
  • 0ca0680165 Simplify --version Joscha 2021-05-23 21:40:48 +02:00
  • ce1dbda5b4 Overhaul colours Joscha 2021-05-23 21:27:37 +02:00
  • 9cce78669f Print report after all crawlers have finished Joscha 2021-05-23 21:13:06 +02:00
  • 6ca0ecdf05 Load and store reports Joscha 2021-05-23 20:46:12 +02:00
  • 6e9f8fd391 Add a keyring authenticator I-Al-Istannen 2021-05-23 19:44:12 +02:00
  • 2fdf24495b Restructure crawling and auth related modules Joscha 2021-05-23 19:16:42 +02:00
  • bbf9f8f130 Add -C as alias for --crawler Joscha 2021-05-23 19:05:56 +02:00
  • 37f8d84a9c Output total amount of http requests in HTTP Crawler I-Al-Istannen 2021-05-23 19:00:01 +02:00
  • 5edd868d5b Fix always-smart redownloading the wrong files Joscha 2021-05-23 18:49:34 +02:00
  • e4e5e83be6 Fix downloader using crawl bar Joscha 2021-05-23 18:39:43 +02:00
  • 74c7b39dc8 Clean up files in alphabetical order Joscha 2021-05-23 18:39:25 +02:00
  • 445dffc987 Reword some explanations Joscha 2021-05-23 18:35:32 +02:00
  • d97d6bf147 Fix handling nested ILIAS folders I-Al-Istannen 2021-05-23 18:29:28 +02:00
  • 79efdb56f7 Adjust ILIAS html explain messages I-Al-Istannen 2021-05-23 18:22:29 +02:00
  • a9af56a5e9 Improve specifying crawlers via CLI Joscha 2021-05-23 18:16:25 +02:00
  • 59f13bb8d6 Explain ILIAS HTML parsing and add some warnings I-Al-Istannen 2021-05-23 18:12:51 +02:00
  • 463f8830d7 Add warn_contd I-Al-Istannen 2021-05-23 18:12:34 +02:00
  • 05ad06fbc1 Only enclose get_page in iorepeat in ILIAS crawler I-Al-Istannen 2021-05-23 17:24:05 +02:00
  • 29d5a40c57 Replace asyncio.gather with custom Crawler function Joscha 2021-05-23 17:25:16 +02:00
  • c0cecf8363 Log crawl and download actions more extensively Joscha 2021-05-23 16:22:58 +02:00
  • b998339002 Fix cleanup logging of paths Joscha 2021-05-23 16:22:38 +02:00
  • 245c9c3dcc Explain output dir decisions and steps Joscha 2021-05-23 16:22:14 +02:00
  • d8f26a789e Implement CLI Command for ilias crawler I-Al-Istannen 2021-05-23 13:26:40 +02:00
  • e1d18708b3 Rename "no_videos" to videos I-Al-Istannen 2021-05-23 13:26:23 +02:00
  • b44b49476d Fix noncritical and anoncritical decorators Joscha 2021-05-23 13:23:28 +02:00
  • 7e0bb06259 Clean up TODOs Joscha 2021-05-23 12:47:30 +02:00
  • ecdedfa1cf Add no-videos flag to ILIAS crawler I-Al-Istannen 2021-05-23 12:36:09 +02:00
  • 3d4b997d4a Retry crawl_url and work around Python's closure handling I-Al-Istannen 2021-05-23 12:24:10 +02:00
  • e81005ae4b Fix CLI arguments Joscha 2021-05-23 11:57:59 +02:00
  • 33a81a5f5c Document authentication in HTTP crawler and rename prepare_request I-Al-Istannen 2021-05-23 11:55:34 +02:00
  • 25e2abdb03 Improve transformer explain wording Joscha 2021-05-23 11:45:14 +02:00
  • 803e5628a2 Clean up logging Joscha 2021-05-23 11:30:16 +02:00
  • c88f20859a Explain config file dumping Joscha 2021-05-23 11:04:50 +02:00
  • ec3767c545 Create crawler base dir at start of crawl Joscha 2021-05-23 10:52:02 +02:00
  • 729ff0a4c7 Fix simple authenticator output Joscha 2021-05-23 10:44:59 +02:00
  • 6fe51e258f Number rules starting at 1 Joscha 2021-05-23 10:44:18 +02:00
  • 44ecb2fbe7 Fix cleanup deleting crawler's base directory Joscha 2021-05-23 10:44:04 +02:00
  • 53e031d9f6 Reuse dl/cl for I/O retries in ILIAS crawler I-Al-Istannen 2021-05-23 00:28:27 +02:00
  • 8ac85ea0bd Fix a few typos in HttpCrawler I-Al-Istannen 2021-05-22 23:37:34 +02:00
  • adfdc302d7 Save cookies after successful authentication in HTTP crawler I-Al-Istannen 2021-05-22 23:30:32 +02:00
  • 3053278721 Move HTTP crawler to own file I-Al-Istannen 2021-05-22 23:23:21 +02:00
  • 4d07de0d71 Adjust forum log message in ilias crawler I-Al-Istannen 2021-05-22 23:20:21 +02:00
  • 953a1bba93 Adjust to new crawl / download names I-Al-Istannen 2021-05-22 23:18:05 +02:00
  • e724ff7c93 Fix normal arrow Joscha 2021-05-22 20:44:59 +00:00
  • 62f0f7bfc5 Explain crawling and partially explain downloading Joscha 2021-05-22 20:39:57 +00:00
  • 9cb2b68f09 Fix arrow parsing error messages Joscha 2021-05-22 20:39:29 +00:00
  • 1bbc0b705f Improve transformer error handling Joscha 2021-05-22 20:38:56 +00:00
  • 662191eca9 Fix crash as soon as first cl or dl token was acquired Joscha 2021-05-22 20:25:58 +00:00
  • 8fad8edc1e Remove duplicated beautifulsoup4 dependency Joscha 2021-05-22 20:02:15 +00:00
  • ae3d80664c Update local crawler to new crawler structure Joscha 2021-05-22 21:46:05 +02:00
  • e21795ee35 Make file cleanup part of default crawler behaviour Joscha 2021-05-22 21:45:51 +02:00
  • ec95dda18f Unify crawling and downloading steps Joscha 2021-05-22 21:36:53 +02:00
  • 098ac45758 Remove deprecated repeat decorators Joscha 2021-05-22 21:06:13 +02:00
  • 9889ce6b57 Improve PFERD error handling Joscha 2021-05-22 21:05:32 +02:00
  • b4d97cd545 Improve output dir and report error handling Joscha 2021-05-22 20:54:42 +02:00
  • afac22c562 Handle abort in exclusive output state correctly Joscha 2021-05-22 18:58:00 +02:00
  • 552cd82802 Run async input and password getters in daemon thread Joscha 2021-05-22 18:37:53 +02:00
  • dfde0e2310 Improve reporting of unexpected exceptions Joscha 2021-05-22 18:36:25 +02:00
  • 54dd2f8337 Clean up main and improve error handling Joscha 2021-05-22 16:47:24 +02:00
  • b5785f260e Extract CLI argument parsing to separate module Joscha 2021-05-22 15:03:45 +02:00
  • 98b8ca31fa Add some todos Joscha 2021-05-22 14:45:32 +02:00
  • 4b104b6252 Try out some HTTP authentication handling I-Al-Istannen 2021-05-21 12:02:51 +02:00
  • 83d12fcf2d Add some explains to ilias crawler and use crawler exceptions I-Al-Istannen 2021-05-20 14:58:54 +02:00
  • e4f9560655 Only retry on aiohttp errors in ILIAS crawler I-Al-Istannen 2021-05-19 22:01:09 +02:00
  • 8cfa818f04 Only call should_crawl once I-Al-Istannen 2021-05-19 21:57:55 +02:00
  • 81301f3a76 Rename the ilias crawler to ilias web crawler I-Al-Istannen 2021-05-19 21:41:17 +02:00
  • 2976b4d352 Move ILIAS file templates to own file I-Al-Istannen 2021-05-19 21:37:10 +02:00
  • 9f03702e69 Split up ilias crawler in multiple files I-Al-Istannen 2021-05-19 21:34:36 +02:00
  • 3300886120 Explain config file loading Joscha 2021-05-19 18:10:17 +02:00
  • 0d10752b5a Configure explain log level via cli and config file Joscha 2021-05-19 17:48:51 +02:00
  • 92886fb8d8 Implement --version flag Joscha 2021-05-19 17:32:23 +02:00
  • 5916626399 Make noqua comment more specific Joscha 2021-05-19 17:16:59 +02:00
  • a7c025fd86 Implement reusable FileSinkToken for OutputDirectory Joscha 2021-05-19 17:16:23 +02:00
  • b7a999bc2e Clean up crawler exceptions and (a)noncritical Joscha 2021-05-19 13:25:57 +02:00
  • 3851065500 Fix local crawler's download bars Joscha 2021-05-18 23:23:40 +02:00
  • 4b68fa771f Move logging logic to singleton Joscha 2021-05-18 22:43:46 +02:00
  • 1525aa15a6 Fix link template error and use indeterminate progress bar I-Al-Istannen 2021-05-18 22:40:28 +02:00