Commit Graph

  • db1219d4a9 Create a link file in ILIAS crawler I-Al-Istannen 2021-05-17 21:31:22 +02:00
  • b8efcc2ca5 Respect filters in ILIAS crawler I-Al-Istannen 2021-05-17 21:30:26 +02:00
  • 0bae009189 Run formatting tools Joscha 2021-05-16 14:32:53 +02:00
  • 3efec53f51 Configure code checking and formatting tools Joscha 2021-05-16 14:31:43 +02:00
  • 8b76ebb3ef Rename IliasCrawler to KitIliasCrawler I-Al-Istannen 2021-05-16 13:28:06 +02:00
  • 467ea3a37e Document ILIAS-Crawler arguments in CONFIG.md I-Al-Istannen 2021-05-16 13:26:58 +02:00
  • 2b6235dc78 Fix pylint warnings (and 2 found bugs) in ILIAS crawler I-Al-Istannen 2021-05-16 13:17:12 +02:00
  • cd5aa61834 Set max line length for pylint I-Al-Istannen 2021-05-16 13:17:01 +02:00
  • 5ccb17622e Configure pycodestyle to use a max line length of 110 I-Al-Istannen 2021-05-16 13:01:41 +02:00
  • 1c226c31aa Add some repeat annotations to the ILIAS crawler I-Al-Istannen 2021-05-16 13:01:30 +02:00
  • 9ec0d3e16a Implement date-demangling in ILIAS crawler I-Al-Istannen 2021-05-16 11:54:42 +02:00
  • cf6903d109 Retry crawling on I/O failure I-Al-Istannen 2021-05-15 22:46:26 +02:00
  • 9fd356d290 Ensure tmp files are deleted Joscha 2021-05-15 23:00:40 +02:00
  • 989032fe0c Fix cookies getting deleted Joscha 2021-05-15 22:25:41 +02:00
  • 05573ccc53 Add fancy CLI options Joscha 2021-05-15 21:33:51 +02:00
  • c454fabc9d Add support for exercises in ILIAS crawler I-Al-Istannen 2021-05-15 21:40:17 +02:00
  • 7d323ec62b Implement video downloads in ilias crawler I-Al-Istannen 2021-05-15 21:29:43 +02:00
  • c7494e32ce Start implementing crawling in ILIAS crawler I-Al-Istannen 2021-05-15 20:42:18 +02:00
  • 1123c8884d Implement an IliasPage I-Al-Istannen 2021-05-15 18:57:17 +02:00
  • e1104f888d Add tfa authenticator Joscha 2021-05-15 18:27:16 +02:00
  • 8c32da7f19 Let authenticators provide username and password separately Joscha 2021-05-15 18:24:03 +02:00
  • d63494908d Properly invalidate exceptions Joscha 2021-05-15 17:37:05 +02:00
  • b70b62cef5 Make crawler sections start with "crawl:" Joscha 2021-05-15 17:23:33 +02:00
  • 868f486922 Rename local crawler path to target Joscha 2021-05-15 17:12:25 +02:00
  • b2a2b5999b Implement ILIAS auth and crawl home page I-Al-Istannen 2021-05-15 15:18:51 +02:00
  • 595de88d96 Fix authenticator and crawler names Joscha 2021-05-15 15:18:16 +02:00
  • a6fdf05ee9 Allow variable whitespace in arrow rules Joscha 2021-05-15 15:13:34 +02:00
  • f897d7c2e1 Add name variants for all arrows Joscha 2021-05-15 15:06:45 +02:00
  • b0f731bf84 Make crawlers use transformers Joscha 2021-05-15 14:03:15 +02:00
  • 302b8c0c34 Fix errors loading local crawler config Joscha 2021-05-15 13:32:13 +02:00
  • acd674f0a0 Change limiter logic Joscha 2021-05-15 13:21:38 +02:00
  • b0f9e1e8b4 Add vscode directory to gitignore I-Al-Istannen 2021-05-15 11:20:20 +02:00
  • ed2e19a150 Add reasons for invalid values Joscha 2021-05-15 00:39:55 +02:00
  • 296a169dd3 Make limiter logic more complex Joscha 2021-05-15 00:38:46 +02:00
  • 1591cb9197 Add options to slow down local crawler Joscha 2021-05-14 21:41:24 +02:00
  • 0c9167512c Fix output dir Joscha 2021-05-14 21:28:38 +02:00
  • a673ab0fae Delete old files Joscha 2021-05-14 00:20:59 +02:00
  • 6e5fdf4e9e Set user agent to "pferd/<version>" Joscha 2021-05-14 00:09:58 +02:00
  • 93a5a94dab Single-source version number Joscha 2021-05-13 23:52:46 +02:00
  • d565df27b3 Add HttpCrawler Joscha 2021-05-13 22:28:14 +02:00
  • 961f40f9a1 Document simple authenticator Joscha 2021-05-13 19:55:04 +02:00
  • e3ee4e515d Disable highlighting of primitives Joscha 2021-05-13 19:47:44 +02:00
  • 94d6a01cca Use file mtime in local crawler Joscha 2021-05-13 19:42:40 +02:00
  • 38bb66a776 Update file metadata in more cases Joscha 2021-05-13 19:40:10 +02:00
  • 68781a88ab Fix asynchronous methods being not awaited Joscha 2021-05-13 19:39:49 +02:00
  • 910462bb72 Log stuff happening to files Joscha 2021-05-13 19:37:27 +02:00
  • 6bd6adb977 Fix tmp file names Joscha 2021-05-13 19:36:46 +02:00
  • 0acdee15a0 Let crawlers obtain authenticators Joscha 2021-05-13 18:57:20 +02:00
  • c3ce6bb31c Fix crawler cleanup not being awaited Joscha 2021-05-11 00:28:45 +02:00
  • 0459ed093e Add simple authenticator Joscha 2021-05-11 00:27:43 +02:00
  • d5f29f01c5 Use global conductor instance Joscha 2021-05-10 23:50:16 +02:00
  • 595ba8b7ab Remove dummy crawler Joscha 2021-05-10 23:47:46 +02:00
  • cec0a8e1fc Fix mymy errors Joscha 2021-05-09 01:45:01 +02:00
  • f9b2fd60e2 Document local crawler and auth Joscha 2021-05-09 01:33:47 +02:00
  • 60cd9873bc Add local file crawler Joscha 2021-05-06 01:02:40 +02:00
  • 273d56c39a Properly load crawler config Joscha 2021-05-05 23:45:10 +02:00
  • 5497dd2827 Add @noncritical and @repeat decorators Joscha 2021-05-05 23:36:54 +02:00
  • bbfdadc463 Implement output directory Joscha 2021-05-05 18:08:34 +02:00
  • fde811ae5a Document on_conflict option Joscha 2021-05-05 00:55:55 +02:00
  • 07e831218e Add sync report Joscha 2021-05-02 00:56:10 +02:00
  • 91c33596da Load crawlers from config file Joscha 2021-04-30 16:22:14 +02:00
  • a8dcf941b9 Document possible redownload settings Joscha 2021-04-30 15:32:56 +02:00
  • e7a51decb0 Elaborate on transforms and implement changes Joscha 2021-04-29 20:13:46 +02:00
  • 9ec19be113 Document config file format Joscha 2021-04-29 18:55:08 +02:00
  • f776186480 Use PurePath instead of Path Joscha 2021-04-29 16:52:00 +02:00
  • 0096d83387 Simplify Limiter implementation Joscha 2021-04-29 16:37:42 +02:00
  • 20a24dbcbf Add changelog Joscha 2021-04-29 16:14:50 +02:00
  • 502654d853 Fix mypy errors Joscha 2021-04-29 15:47:52 +02:00
  • d2103d7c44 Document crawler Joscha 2021-04-29 15:43:20 +02:00
  • d96a361325 Test and fix exclusive output Joscha 2021-04-29 15:26:10 +02:00
  • 2e85d26b6b Use conductor via context manager Joscha 2021-04-29 14:23:28 +02:00
  • 6431a3fb3d Fix some mypy errors Joscha 2021-04-29 14:23:09 +02:00
  • ac3bfd7388 Make progress bars easier to use Joscha 2021-04-29 13:53:16 +02:00
  • 3ea86d18a0 Jerry-rig DummyCrawler to run Joscha 2021-04-29 13:45:04 +02:00
  • bbc792f9fb Implement Crawler and DummyCrawler Joscha 2021-04-29 13:44:29 +02:00
  • 7e127cd5cc Clean up and fix conductor and limiter Joscha 2021-04-29 13:43:50 +02:00
  • c4fb92c658 Make type hints compatible with Python 3.8 Joscha 2021-04-29 13:11:58 +02:00
  • 8da1ac6cee Extend mypy config Joscha 2021-04-29 11:25:13 +02:00
  • a18db57e6f Implement terminal conductor Joscha 2021-04-29 11:25:00 +02:00
  • b915e393dd Implement limiter Joscha 2021-04-29 10:24:28 +02:00
  • 3a74c23d09 Implement transformer Joscha 2021-04-29 09:51:25 +02:00
  • fbebc46c58 Load and dump config Joscha 2021-04-27 12:41:49 +02:00
  • cccd68e04a Bump version to v2.6.2 (v2.6.2, v2) I-Al-Istannen 2021-04-29 00:18:26 +02:00
  • 2bd40a5f30 Fix -p and -u flags I-Al-Istannen 2021-04-29 00:15:12 +02:00
  • 5595a908d8 Configure entry point Joscha 2021-04-27 00:29:42 +02:00
  • 27e4abcfa3 Do project setup from scratch Joscha 2021-04-26 23:46:44 +02:00
  • 2ca1101326 Fix typo in sync_url I-Al-Istannen 2021-04-19 14:53:16 +02:00
  • c1ab7485e2 Bump version to 2.6.1 (v2.6.1) I-Al-Istannen 2021-04-19 11:21:56 +02:00
  • 29cd5d1a3c Reflect totality of sanitize_windows_path in return type I-Al-Istannen 2021-04-19 11:10:02 +02:00
  • 6d5d9333ad Force folder to be file-system path I-Al-Istannen 2021-04-19 11:07:25 +02:00
  • 7cc40595dc Allow synchronizing to directory "." I-Al-Istannen 2021-04-14 20:25:25 +02:00
  • 80ae5ddfaa Bump version to v2.6.0 (v2.6.0) I-Al-Istannen 2021-04-14 19:47:41 +02:00
  • 4f480d117e Install keyring in CI I-Al-Istannen 2021-04-14 19:24:05 +02:00
  • 1f2af3a290 Retry on more I/O Errors I-Al-Istannen 2021-04-13 11:32:55 +02:00
  • 14cdfb6a69 Fix typo in date demangler doc I-Al-Istannen 2021-04-13 11:19:51 +02:00
  • e2bf84392b [sync_url] Properly declare "no-videos" as flag I-Al-Istannen 2021-04-08 18:12:27 +02:00
  • 946b7a7931 Also crawl .c/.java/.zip from IPD page I-Al-Istannen 2021-02-09 12:30:59 +01:00
  • 9a9018751e Bump version v2.5.4 I-Al-Istannen 2021-02-06 22:54:05 +01:00
  • 83b75e8254 syncurl: Sanitize element name on windows if it is used as folder name I-Al-Istannen 2021-02-06 22:51:08 +01:00
  • 35c3fa205d Fixed description of activating venv (#22) Toorero 2021-01-28 21:24:09 +01:00