Commit Graph

  • 6fe51e258f Number rules starting at 1 Joscha 2021-05-23 10:44:18 +0200
  • 44ecb2fbe7 Fix cleanup deleting crawler's base directory Joscha 2021-05-23 10:44:04 +0200
  • 53e031d9f6 Reuse dl/cl for I/O retries in ILIAS crawler I-Al-Istannen 2021-05-23 00:28:27 +0200
  • 8ac85ea0bd Fix a few typos in HttpCrawler I-Al-Istannen 2021-05-22 23:37:34 +0200
  • adfdc302d7 Save cookies after successful authentication in HTTP crawler I-Al-Istannen 2021-05-22 23:30:32 +0200
  • 3053278721 Move HTTP crawler to own file I-Al-Istannen 2021-05-22 23:23:21 +0200
  • 4d07de0d71 Adjust forum log message in ilias crawler I-Al-Istannen 2021-05-22 23:20:21 +0200
  • 953a1bba93 Adjust to new crawl / download names I-Al-Istannen 2021-05-22 23:18:05 +0200
  • e724ff7c93 Fix normal arrow Joscha 2021-05-22 20:44:59 +0000
  • 62f0f7bfc5 Explain crawling and partially explain downloading Joscha 2021-05-22 20:39:57 +0000
  • 9cb2b68f09 Fix arrow parsing error messages Joscha 2021-05-22 20:39:29 +0000
  • 1bbc0b705f Improve transformer error handling Joscha 2021-05-22 20:38:56 +0000
  • 662191eca9 Fix crash as soon as first cl or dl token was acquired Joscha 2021-05-22 20:25:58 +0000
  • 8fad8edc1e Remove duplicated beautifulsoup4 dependency Joscha 2021-05-22 20:02:15 +0000
  • ae3d80664c Update local crawler to new crawler structure Joscha 2021-05-22 21:46:05 +0200
  • e21795ee35 Make file cleanup part of default crawler behaviour Joscha 2021-05-22 21:45:51 +0200
  • ec95dda18f Unify crawling and downloading steps Joscha 2021-05-22 21:36:53 +0200
  • 098ac45758 Remove deprecated repeat decorators Joscha 2021-05-22 21:06:13 +0200
  • 9889ce6b57 Improve PFERD error handling Joscha 2021-05-22 21:05:32 +0200
  • b4d97cd545 Improve output dir and report error handling Joscha 2021-05-22 20:54:42 +0200
  • afac22c562 Handle abort in exclusive output state correctly Joscha 2021-05-22 18:58:00 +0200
  • 552cd82802 Run async input and password getters in daemon thread Joscha 2021-05-22 18:37:53 +0200
  • dfde0e2310 Improve reporting of unexpected exceptions Joscha 2021-05-22 18:36:25 +0200
  • 54dd2f8337 Clean up main and improve error handling Joscha 2021-05-22 16:47:24 +0200
  • b5785f260e Extract CLI argument parsing to separate module Joscha 2021-05-22 15:03:45 +0200
  • 98b8ca31fa Add some todos Joscha 2021-05-22 14:45:32 +0200
  • 4b104b6252 Try out some HTTP authentication handling I-Al-Istannen 2021-05-21 12:02:51 +0200
  • 83d12fcf2d Add some explains to ilias crawler and use crawler exceptions I-Al-Istannen 2021-05-20 14:58:54 +0200
  • e4f9560655 Only retry on aiohttp errors in ILIAS crawler I-Al-Istannen 2021-05-19 22:01:09 +0200
  • 8cfa818f04 Only call should_crawl once I-Al-Istannen 2021-05-19 21:57:55 +0200
  • 81301f3a76 Rename the ilias crawler to ilias web crawler I-Al-Istannen 2021-05-19 21:41:17 +0200
  • 2976b4d352 Move ILIAS file templates to own file I-Al-Istannen 2021-05-19 21:37:10 +0200
  • 9f03702e69 Split up ilias crawler in multiple files I-Al-Istannen 2021-05-19 21:34:36 +0200
  • 3300886120 Explain config file loading Joscha 2021-05-19 18:10:17 +0200
  • 0d10752b5a Configure explain log level via cli and config file Joscha 2021-05-19 17:48:51 +0200
  • 92886fb8d8 Implement --version flag Joscha 2021-05-19 17:32:23 +0200
  • 5916626399 Make noqua comment more specific Joscha 2021-05-19 17:16:59 +0200
  • a7c025fd86 Implement reusable FileSinkToken for OutputDirectory Joscha 2021-05-19 17:16:23 +0200
  • b7a999bc2e Clean up crawler exceptions and (a)noncritical Joscha 2021-05-19 13:25:57 +0200
  • 3851065500 Fix local crawler's download bars Joscha 2021-05-18 23:23:40 +0200
  • 4b68fa771f Move logging logic to singleton Joscha 2021-05-18 22:43:46 +0200
  • 1525aa15a6 Fix link template error and use indeterminate progress bar I-Al-Istannen 2021-05-18 22:40:28 +0200
  • db1219d4a9 Create a link file in ILIAS crawler I-Al-Istannen 2021-05-17 21:31:22 +0200
  • b8efcc2ca5 Respect filters in ILIAS crawler I-Al-Istannen 2021-05-17 21:30:26 +0200
  • 0bae009189 Run formatting tools Joscha 2021-05-16 14:32:53 +0200
  • 3efec53f51 Configure code checking and formatting tools Joscha 2021-05-16 14:31:43 +0200
  • 8b76ebb3ef Rename IliasCrawler to KitIliasCrawler I-Al-Istannen 2021-05-16 13:28:06 +0200
  • 467ea3a37e Document ILIAS-Crawler arguments in CONFIG.md I-Al-Istannen 2021-05-16 13:26:58 +0200
  • 2b6235dc78 Fix pylint warnings (and 2 found bugs) in ILIAS crawler I-Al-Istannen 2021-05-16 13:17:12 +0200
  • cd5aa61834 Set max line length for pylint I-Al-Istannen 2021-05-16 13:17:01 +0200
  • 5ccb17622e Configure pycodestyle to use a max line length of 110 I-Al-Istannen 2021-05-16 13:01:41 +0200
  • 1c226c31aa Add some repeat annotations to the ILIAS crawler I-Al-Istannen 2021-05-16 13:01:30 +0200
  • 9ec0d3e16a Implement date-demangling in ILIAS crawler I-Al-Istannen 2021-05-16 11:54:42 +0200
  • cf6903d109 Retry crawling on I/O failure I-Al-Istannen 2021-05-15 22:46:26 +0200
  • 9fd356d290 Ensure tmp files are deleted Joscha 2021-05-15 23:00:40 +0200
  • 989032fe0c Fix cookies getting deleted Joscha 2021-05-15 22:25:41 +0200
  • 05573ccc53 Add fancy CLI options Joscha 2021-05-15 21:33:51 +0200
  • c454fabc9d Add support for exercises in ILIAS crawler I-Al-Istannen 2021-05-15 21:40:17 +0200
  • 7d323ec62b Implement video downloads in ilias crawler I-Al-Istannen 2021-05-15 21:29:43 +0200
  • c7494e32ce Start implementing crawling in ILIAS crawler I-Al-Istannen 2021-05-15 20:42:18 +0200
  • 1123c8884d Implement an IliasPage I-Al-Istannen 2021-05-15 18:57:17 +0200
  • e1104f888d Add tfa authenticator Joscha 2021-05-15 18:27:16 +0200
  • 8c32da7f19 Let authenticators provide username and password separately Joscha 2021-05-15 18:24:03 +0200
  • d63494908d Properly invalidate exceptions Joscha 2021-05-15 17:37:05 +0200
  • b70b62cef5 Make crawler sections start with "crawl:" Joscha 2021-05-15 17:23:33 +0200
  • 868f486922 Rename local crawler path to target Joscha 2021-05-15 17:12:25 +0200
  • b2a2b5999b Implement ILIAS auth and crawl home page I-Al-Istannen 2021-05-15 15:18:51 +0200
  • 595de88d96 Fix authenticator and crawler names Joscha 2021-05-15 15:18:16 +0200
  • a6fdf05ee9 Allow variable whitespace in arrow rules Joscha 2021-05-15 15:13:34 +0200
  • f897d7c2e1 Add name variants for all arrows Joscha 2021-05-15 15:06:45 +0200
  • b0f731bf84 Make crawlers use transformers Joscha 2021-05-15 14:03:15 +0200
  • 302b8c0c34 Fix errors loading local crawler config Joscha 2021-05-15 13:32:13 +0200
  • acd674f0a0 Change limiter logic Joscha 2021-05-15 13:21:38 +0200
  • b0f9e1e8b4 Add vscode directory to gitignore I-Al-Istannen 2021-05-15 11:20:20 +0200
  • ed2e19a150 Add reasons for invalid values Joscha 2021-05-15 00:39:55 +0200
  • 296a169dd3 Make limiter logic more complex Joscha 2021-05-15 00:38:46 +0200
  • 1591cb9197 Add options to slow down local crawler Joscha 2021-05-14 21:41:24 +0200
  • 0c9167512c Fix output dir Joscha 2021-05-14 21:28:38 +0200
  • a673ab0fae Delete old files Joscha 2021-05-14 00:20:59 +0200
  • 6e5fdf4e9e Set user agent to "pferd/<version>" Joscha 2021-05-14 00:09:58 +0200
  • 93a5a94dab Single-source version number Joscha 2021-05-13 23:52:46 +0200
  • d565df27b3 Add HttpCrawler Joscha 2021-05-13 22:28:14 +0200
  • 961f40f9a1 Document simple authenticator Joscha 2021-05-13 19:55:04 +0200
  • e3ee4e515d Disable highlighting of primitives Joscha 2021-05-13 19:47:44 +0200
  • 94d6a01cca Use file mtime in local crawler Joscha 2021-05-13 19:42:40 +0200
  • 38bb66a776 Update file metadata in more cases Joscha 2021-05-13 19:40:10 +0200
  • 68781a88ab Fix asynchronous methods being not awaited Joscha 2021-05-13 19:39:49 +0200
  • 910462bb72 Log stuff happening to files Joscha 2021-05-13 19:37:27 +0200
  • 6bd6adb977 Fix tmp file names Joscha 2021-05-13 19:36:46 +0200
  • 0acdee15a0 Let crawlers obtain authenticators Joscha 2021-05-13 18:57:20 +0200
  • c3ce6bb31c Fix crawler cleanup not being awaited Joscha 2021-05-11 00:28:45 +0200
  • 0459ed093e Add simple authenticator Joscha 2021-05-11 00:27:43 +0200
  • d5f29f01c5 Use global conductor instance Joscha 2021-05-10 23:50:16 +0200
  • 595ba8b7ab Remove dummy crawler Joscha 2021-05-10 23:47:46 +0200
  • cec0a8e1fc Fix mymy errors Joscha 2021-05-09 01:45:01 +0200
  • f9b2fd60e2 Document local crawler and auth Joscha 2021-05-09 01:33:47 +0200
  • 60cd9873bc Add local file crawler Joscha 2021-05-06 01:02:40 +0200
  • 273d56c39a Properly load crawler config Joscha 2021-05-05 23:45:10 +0200
  • 5497dd2827 Add @noncritical and @repeat decorators Joscha 2021-05-05 23:36:54 +0200
  • bbfdadc463 Implement output directory Joscha 2021-05-05 18:08:34 +0200