pferd

mirror of https://github.com/Garmelon/PFERD.git synced 2026-02-18 23:02:23 +01:00

Author	SHA1	Message	Date
I-Al-Istannen	83d12fcf2d	Add some explains to ilias crawler and use crawler exceptions	2021-05-20 14:58:54 +02:00
I-Al-Istannen	e4f9560655	Only retry on aiohttp errors in ILIAS crawler. This patch removes quite a few retries and now only retries the ilias element method. Every other HTTP-interacting method (except for the root requests) is called from there and should be covered. In the future we also want to retry the root a few times, but that will be done after the download sink API is adjusted.	2021-05-19 22:01:09 +02:00
I-Al-Istannen	8cfa818f04	Only call should_crawl once	2021-05-19 21:57:55 +02:00
I-Al-Istannen	81301f3a76	Rename the ilias crawler to ilias web crawler	2021-05-19 21:41:17 +02:00
I-Al-Istannen	2976b4d352	Move ILIAS file templates to own file	2021-05-19 21:37:10 +02:00
I-Al-Istannen	9f03702e69	Split up ilias crawler in multiple files. The ilias crawler contained a crawler and an HTML parser, now they are split in two.	2021-05-19 21:34:36 +02:00
Joscha	3300886120	Explain config file loading	2021-05-19 18:11:43 +02:00
Joscha	0d10752b5a	Configure explain log level via cli and config file	2021-05-19 17:50:10 +02:00
Joscha	92886fb8d8	Implement --version flag	2021-05-19 17:33:36 +02:00
Joscha	5916626399	Make noqua comment more specific	2021-05-19 17:16:59 +02:00
Joscha	a7c025fd86	Implement reusable FileSinkToken for OutputDirectory	2021-05-19 17:16:23 +02:00
Joscha	b7a999bc2e	Clean up crawler exceptions and (a)noncritical	2021-05-19 13:25:57 +02:00
Joscha	3851065500	Fix local crawler's download bars. Display the pure path instead of the local path.	2021-05-18 23:23:40 +02:00
Joscha	4b68fa771f	Move logging logic to singleton. Renamed module and class because "conductor" didn't make a lot of sense; used singleton approach (there's only one stdout after all); redesigned progress bars (now with download speed!).	2021-05-18 22:45:19 +02:00
I-Al-Istannen	1525aa15a6	Fix link template error and use indeterminate progress bar	2021-05-18 22:40:28 +02:00
I-Al-Istannen	db1219d4a9	Create a link file in ILIAS crawler. This allows us to crawl links and represent them in the file system. Users can choose between an ILIAS-imitation (that optionally auto-redirects) and a plain text variant.	2021-05-17 21:44:54 +02:00
I-Al-Istannen	b8efcc2ca5	Respect filters in ILIAS crawler	2021-05-17 21:30:26 +02:00
Joscha	0bae009189	Run formatting tools	2021-05-16 14:32:53 +02:00
I-Al-Istannen	8b76ebb3ef	Rename IliasCrawler to KitIliasCrawler	2021-05-16 13:28:06 +02:00
I-Al-Istannen	2b6235dc78	Fix pylint warnings (and 2 found bugs) in ILIAS crawler	2021-05-16 13:17:12 +02:00
I-Al-Istannen	1c226c31aa	Add some repeat annotations to the ILIAS crawler	2021-05-16 13:01:56 +02:00
I-Al-Istannen	9ec0d3e16a	Implement date-demangling in ILIAS crawler	2021-05-16 13:01:56 +02:00
I-Al-Istannen	cf6903d109	Retry crawling on I/O failure	2021-05-16 13:01:56 +02:00
Joscha	9fd356d290	Ensure tmp files are deleted. This doesn't seem to fix the case where an exception bubbles up to the top of the event loop. It also doesn't seem to fix the case when a KeyboardInterrupt is thrown, since that never makes its way into the event loop in the first place. Both of these cases lead to the event loop stopping, which means that the tmp file cleanup doesn't get executed even though it's inside a "with" or "finally".	2021-05-15 23:00:40 +02:00
Joscha	989032fe0c	Fix cookies getting deleted	2021-05-15 22:25:48 +02:00
Joscha	05573ccc53	Add fancy CLI options	2021-05-15 22:22:01 +02:00
I-Al-Istannen	c454fabc9d	Add support for exercises in ILIAS crawler	2021-05-15 21:40:17 +02:00
I-Al-Istannen	7d323ec62b	Implement video downloads in ilias crawler	2021-05-15 21:32:32 +02:00
I-Al-Istannen	c7494e32ce	Start implementing crawling in ILIAS crawler. The ilias crawler can now crawl quite a few filetypes, splits off folders and crawls them concurrently.	2021-05-15 20:42:18 +02:00
I-Al-Istannen	1123c8884d	Implement an IliasPage. This allows PFERD to semantically understand ILIAS HTML and is the foundation for the ILIAS crawler. This patch extends the ILIAS crawler to crawl the personal desktop and print the elements on it.	2021-05-15 18:59:23 +02:00
Joscha	e1104f888d	Add tfa authenticator	2021-05-15 18:27:16 +02:00
Joscha	8c32da7f19	Let authenticators provide username and password separately	2021-05-15 18:27:03 +02:00
Joscha	d63494908d	Properly invalidate exceptions. The simple authenticator now properly invalidates its credentials. Also, the invalidation functions have been given better names and documentation.	2021-05-15 17:37:05 +02:00
Joscha	b70b62cef5	Make crawler sections start with "crawl:". Also, use only the part of the section name after the "crawl:" as the crawler's output directory. Now, the implementation matches the documentation again.	2021-05-15 17:24:37 +02:00
Joscha	868f486922	Rename local crawler path to target	2021-05-15 17:12:25 +02:00
I-Al-Istannen	b2a2b5999b	Implement ILIAS auth and crawl home page. This commit introduces the necessary machinery to authenticate with ILIAS and crawl the home page. It can't do much yet and just silently fetches the homepage.	2021-05-15 15:25:05 +02:00
Joscha	595de88d96	Fix authenticator and crawler names. Now, the "auth:" and "crawl:" parts are considered part of the name. This fixes crawlers not being able to find their authenticators.	2021-05-15 15:25:05 +02:00
Joscha	a6fdf05ee9	Allow variable whitespace in arrow rules	2021-05-15 15:25:05 +02:00
Joscha	f897d7c2e1	Add name variants for all arrows	2021-05-15 15:25:05 +02:00
Joscha	b0f731bf84	Make crawlers use transformers	2021-05-15 15:25:05 +02:00
Joscha	302b8c0c34	Fix errors loading local crawler config. Apparently getint and getfloat may return a None even though this is not mentioned in their type annotations.	2021-05-15 15:25:05 +02:00
Joscha	acd674f0a0	Change limiter logic. Now download tasks are a subset of all tasks.	2021-05-15 15:25:05 +02:00
Joscha	ed2e19a150	Add reasons for invalid values	2021-05-15 15:25:05 +02:00
Joscha	296a169dd3	Make limiter logic more complex. The limiter can now distinguish between crawl and download actions and has a fancy slot system and delay logic.	2021-05-15 15:25:05 +02:00
Joscha	1591cb9197	Add options to slow down local crawler. These options are meant to make the local crawler behave more like a network-based crawler for purposes of testing and debugging other parts of the code base.	2021-05-15 15:25:01 +02:00
Joscha	0c9167512c	Fix output dir. I missed these while renaming the resolve function. Shame on me for not running mypy earlier.	2021-05-14 21:28:38 +02:00
Joscha	a673ab0fae	Delete old files. I should've done this earlier.	2021-05-14 21:27:44 +02:00
Joscha	6e5fdf4e9e	Set user agent to "pferd/<version>"	2021-05-14 21:27:44 +02:00
Joscha	93a5a94dab	Single-source version number	2021-05-14 21:27:44 +02:00
Joscha	d565df27b3	Add HttpCrawler	2021-05-13 22:28:14 +02:00

279 Commits