Joscha 
							
						 
					 
					
						
						
							
						
						b5785f260e 
					 
					
						
						
							
							Extract CLI argument parsing to separate module  
						
						 
						
						
						
						
					 
					
						2021-05-22 15:03:45 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						98b8ca31fa 
					 
					
						
						
							
							Add some todos  
						
						 
						
						
						
						
					 
					
						2021-05-22 14:45:46 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						4b104b6252 
					 
					
						
						
							
							Try out some HTTP authentication handling  
						
						 
						
						
						
						
						This is by no means final yet and will change a bit once the dl and cl
are changed, but it might serve as a first try. It is also wholly
untested. 
						
						
					 
					
						2021-05-21 12:02:51 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						83d12fcf2d 
					 
					
						
						
							
							Add some explains to ilias crawler and use crawler exceptions  
						
						 
						
						
						
						
					 
					
						2021-05-20 14:58:54 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						e4f9560655 
					 
					
						
						
							
							Only retry on aiohttp errors in ILIAS crawler  
						
						 
						
						
						
						
						This patch removes quite a few retries and now only retries the ilias
element method. Every other HTTP-interacting method (except for the root
requests) is called from there and should be covered.
In the future we also want to retry the root a few times, but that
will be done after the download sink API is adjusted. 
						
						
					 
					
						2021-05-19 22:01:09 +02:00  
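The retry policy described in e4f9560655 (retry one central method, and only for a specific class of errors) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `retry_on` and `TransientError` are hypothetical names, and `TransientError` merely stands in for whatever aiohttp exception type the crawler catches.

```python
import asyncio
import functools


class TransientError(Exception):
    """Hypothetical stand-in for a network error worth retrying."""


def retry_on(exc_type, attempts):
    """Retry an async function, but only when it raises exc_type."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return await func(*args, **kwargs)
                except exc_type:
                    # Out of attempts: let the last error propagate.
                    if attempt == attempts:
                        raise
        return wrapper
    return decorator
```

Any other exception type passes through the wrapper on the first attempt, so programming errors are not masked by retries.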
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						8cfa818f04 
					 
					
						
						
							
							Only call should_crawl once  
						
						 
						
						
						
						
					 
					
						2021-05-19 21:57:55 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						81301f3a76 
					 
					
						
						
							
							Rename the ilias crawler to ilias web crawler  
						
						 
						
						
						
						
					 
					
						2021-05-19 21:41:17 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						2976b4d352 
					 
					
						
						
							
							Move ILIAS file templates to own file  
						
						 
						
						
						
						
					 
					
						2021-05-19 21:37:10 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						9f03702e69 
					 
					
						
						
							
							Split up ilias crawler in multiple files  
						
						 
						
						
						
						
						The ilias crawler contained a crawler and an HTML parser, now they are
split in two. 
						
						
					 
					
						2021-05-19 21:34:36 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						3300886120 
					 
					
						
						
							
							Explain config file loading  
						
						 
						
						
						
						
					 
					
						2021-05-19 18:11:43 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						0d10752b5a 
					 
					
						
						
							
							Configure explain log level via cli and config file  
						
						 
						
						
						
						
					 
					
						2021-05-19 17:50:10 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						92886fb8d8 
					 
					
						
						
							
							Implement --version flag  
						
						 
						
						
						
						
					 
					
						2021-05-19 17:33:36 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						5916626399 
					 
					
						
						
							
							Make noqua comment more specific  
						
						 
						
						
						
						
					 
					
						2021-05-19 17:16:59 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						a7c025fd86 
					 
					
						
						
							
							Implement reusable FileSinkToken for OutputDirectory  
						
						 
						
						
						
						
					 
					
						2021-05-19 17:16:23 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						b7a999bc2e 
					 
					
						
						
							
							Clean up crawler exceptions and (a)noncritical  
						
						 
						
						
						
						
					 
					
						2021-05-19 13:25:57 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						3851065500 
					 
					
						
						
							
							Fix local crawler's download bars  
						
						 
						
						
						
						
						Display the pure path instead of the local path. 
						
						
					 
					
						2021-05-18 23:23:40 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						4b68fa771f 
					 
					
						
						
							
							Move logging logic to singleton  
						
						 
						
						
						
						
						- Renamed module and class because "conductor" didn't make a lot of sense
- Used singleton approach (there's only one stdout after all)
- Redesigned progress bars (now with download speed!) 
						
						
					 
					
						2021-05-18 22:45:19 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						1525aa15a6 
					 
					
						
						
							
							Fix link template error and use indeterminate progress bar  
						
						 
						
						
						
						
					 
					
						2021-05-18 22:40:28 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						db1219d4a9 
					 
					
						
						
							
							Create a link file in ILIAS crawler  
						
						 
						
						
						
						
						This allows us to crawl links and represent them in the file system.
Users can choose between an ILIAS-imitation (that optionally
auto-redirects) and a plain text variant. 
						
						
					 
					
						2021-05-17 21:44:54 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						b8efcc2ca5 
					 
					
						
						
							
							Respect filters in ILIAS crawler  
						
						 
						
						
						
						
					 
					
						2021-05-17 21:30:26 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						0bae009189 
					 
					
						
						
							
							Run formatting tools  
						
						 
						
						
						
						
					 
					
						2021-05-16 14:32:53 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						3efec53f51 
					 
					
						
						
							
							Configure code checking and formatting tools  
						
						 
						
						
						
						
						Checking
- mypy
- flake8 (which uses pyflakes and pycodestyle)
Formatting
- autopep8
- isort 
						
						
					 
					
						2021-05-16 14:31:43 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						8b76ebb3ef 
					 
					
						
						
							
							Rename IliasCrawler to KitIliasCrawler  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:28:06 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						467ea3a37e 
					 
					
						
						
							
							Document ILIAS-Crawler arguments in CONFIG.md  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:26:58 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						2b6235dc78 
					 
					
						
						
							
							Fix pylint warnings (and 2 found bugs) in ILIAS crawler  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:17:12 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						cd5aa61834 
					 
					
						
						
							
							Set max line length for pylint  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:17:01 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						5ccb17622e 
					 
					
						
						
							
							Configure pycodestyle to use a max line length of 110  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:01:56 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						1c226c31aa 
					 
					
						
						
							
							Add some repeat annotations to the ILIAS crawler  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:01:56 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						9ec0d3e16a 
					 
					
						
						
							
							Implement date-demangling in ILIAS crawler  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:01:56 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						cf6903d109 
					 
					
						
						
							
							Retry crawling on I/O failure  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:01:56 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						9fd356d290 
					 
					
						
						
							
							Ensure tmp files are deleted  
						
						 
						
						
						
						
						This doesn't seem to fix the case where an exception bubbles up to the top of
the event loop. It also doesn't seem to fix the case when a KeyboardInterrupt is
thrown, since that never makes its way into the event loop in the first place.
Both of these cases lead to the event loop stopping, which means that the tmp
file cleanup doesn't get executed even though it's inside a "with" or "finally". 
						
						
					 
					
						2021-05-15 23:00:40 +02:00  
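For context on 9fd356d290, the usual shape of "write to a tmp file, then move it into place, and clean up in a finally" looks roughly like the sketch below. The function name `write_via_tmp` and its `fail` parameter are hypothetical and only serve the illustration. The sketch covers the ordinary case where an exception propagates through the coroutine; it does not address the two cases the commit message says remain unfixed.

```python
import asyncio
import os
import tempfile
from pathlib import Path


async def write_via_tmp(target: Path, chunks, fail=False):
    """Write chunks to a tmp file next to target, then move it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp_")
    tmp = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
                await asyncio.sleep(0)  # yield to the event loop between chunks
            if fail:
                raise RuntimeError("simulated download failure")
        # Only reached on success; replaces target in one step.
        os.replace(tmp, target)
    finally:
        # After a successful replace the tmp file is already gone,
        # so a missing file is not an error here (Python 3.8+).
        tmp.unlink(missing_ok=True)
```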
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						989032fe0c 
					 
					
						
						
							
							Fix cookies getting deleted  
						
						 
						
						
						
						
					 
					
						2021-05-15 22:25:48 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						05573ccc53 
					 
					
						
						
							
							Add fancy CLI options  
						
						 
						
						
						
						
					 
					
						2021-05-15 22:22:01 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						c454fabc9d 
					 
					
						
						
							
							Add support for exercises in ILIAS crawler  
						
						 
						
						
						
						
					 
					
						2021-05-15 21:40:17 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						7d323ec62b 
					 
					
						
						
							
							Implement video downloads in ilias crawler  
						
						 
						
						
						
						
					 
					
						2021-05-15 21:32:32 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						c7494e32ce 
					 
					
						
						
							
							Start implementing crawling in ILIAS crawler  
						
						 
						
						
						
						
						The ilias crawler can now crawl quite a few filetypes, splits off
folders and crawls them concurrently. 
						
						
					 
					
						2021-05-15 20:42:18 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						1123c8884d 
					 
					
						
						
							
							Implement an IliasPage  
						
						 
						
						
						
						
						This allows PFERD to semantically understand ILIAS HTML and is the
foundation for the ILIAS crawler. This patch extends the ILIAS crawler
to crawl the personal desktop and print the elements on it. 
						
						
					 
					
						2021-05-15 18:59:23 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						e1104f888d 
					 
					
						
						
							
							Add tfa authenticator  
						
						 
						
						
						
						
					 
					
						2021-05-15 18:27:16 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						8c32da7f19 
					 
					
						
						
							
							Let authenticators provide username and password separately  
						
						 
						
						
						
						
					 
					
						2021-05-15 18:27:03 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						d63494908d 
					 
					
						
						
							
							Properly invalidate exceptions  
						
						 
						
						
						
						
						The simple authenticator now properly invalidates its credentials. Also, the
invalidation functions have been given better names and documentation. 
						
						
					 
					
						2021-05-15 17:37:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						b70b62cef5 
					 
					
						
						
							
							Make crawler sections start with "crawl:"  
						
						 
						
						
						
						
						Also, use only the part of the section name after the "crawl:" as the crawler's
output directory. Now, the implementation matches the documentation again.
						
						
					 
					
						2021-05-15 17:24:37 +02:00  
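The naming rule from b70b62cef5 can be illustrated with a short sketch: sections whose name starts with "crawl:" are crawlers, and the text after the prefix becomes the output directory. The helper `crawler_output_dirs` is a hypothetical name for this illustration and is not taken from the project.

```python
import configparser
from pathlib import Path

CRAWL_PREFIX = "crawl:"


def crawler_output_dirs(parser):
    """Map each crawler section name to the directory derived from it."""
    result = {}
    for name in parser.sections():
        if name.startswith(CRAWL_PREFIX):
            # The full section name stays the key; only the part after
            # "crawl:" is used as the output directory.
            result[name] = Path(name[len(CRAWL_PREFIX):])
    return result
```

With sections `[crawl:Foo]` and `[auth:bar]`, only the first is treated as a crawler, and its directory is `Foo`.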
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						868f486922 
					 
					
						
						
							
							Rename local crawler path to target  
						
						 
						
						
						
						
					 
					
						2021-05-15 17:12:25 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						b2a2b5999b 
					 
					
						
						
							
							Implement ILIAS auth and crawl home page  
						
						 
						
						
						
						
						This commit introduces the necessary machinery to authenticate with
ILIAS and crawl the home page.
It can't do much yet and just silently fetches the homepage. 
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						595de88d96 
					 
					
						
						
							
							Fix authenticator and crawler names  
						
						 
						
						
						
						
						Now, the "auth:" and "crawl:" parts are considered part of the name. This fixes
crawlers not being able to find their authenticators. 
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						a6fdf05ee9 
					 
					
						
						
							
							Allow variable whitespace in arrow rules  
						
						 
						
						
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						f897d7c2e1 
					 
					
						
						
							
							Add name variants for all arrows  
						
						 
						
						
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						b0f731bf84 
					 
					
						
						
							
							Make crawlers use transformers  
						
						 
						
						
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						302b8c0c34 
					 
					
						
						
							
							Fix errors loading local crawler config  
						
						 
						
						
						
						
						Apparently getint and getfloat may return a None even though this is not
mentioned in their type annotations. 
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
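The behaviour noted in 302b8c0c34 is easy to reproduce with the standard library: on a section proxy, `getint` and `getfloat` return `None` for a missing key instead of raising. The section and key names below are made up for the example, and `get_int_or_default` is a hypothetical helper showing one way to guard against the `None`.

```python
import configparser

parser = configparser.ConfigParser()
parser.read_string("""
[crawl:example]
max_concurrent_tasks = 4
""")

section = parser["crawl:example"]

# Present key: converted to int as expected.
present = section.getint("max_concurrent_tasks")

# Missing key: the section proxy falls back to None silently.
missing_int = section.getint("no_such_key")
missing_float = section.getfloat("no_such_key")


def get_int_or_default(section, key, default):
    """Return the int value for key, or default if the key is absent."""
    value = section.getint(key)
    return default if value is None else value
```

Passing `fallback=` explicitly to `getint` is another option; the helper just makes the `None` case visible.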
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						acd674f0a0 
					 
					
						
						
							
							Change limiter logic  
						
						 
						
						
						
						
						Now download tasks are a subset of all tasks. 
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
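One way to read "download tasks are a subset of all tasks" in acd674f0a0 is that a download must hold a general task slot and, in addition, a download slot. The sketch below models that with two nested semaphores. It is an assumption about the intent, not the project's implementation; the `Limiter` class shown here and its method names are hypothetical.

```python
import asyncio
from contextlib import asynccontextmanager


class Limiter:
    """Limit total concurrent tasks, and downloads as a subset of them."""

    def __init__(self, max_tasks, max_downloads):
        self._tasks = asyncio.Semaphore(max_tasks)
        self._downloads = asyncio.Semaphore(max_downloads)

    @asynccontextmanager
    async def limit_task(self):
        async with self._tasks:
            yield

    @asynccontextmanager
    async def limit_download(self):
        # A download counts as a task, so it takes a task slot first
        # and only then competes for a download slot.
        async with self._tasks:
            async with self._downloads:
                yield


async def demo():
    # Created inside the coroutine so the semaphores belong to this loop.
    limiter = Limiter(max_tasks=3, max_downloads=1)
    running = 0
    peak = 0

    async def download():
        nonlocal running, peak
        async with limiter.limit_download():
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)
            running -= 1

    await asyncio.gather(*(download() for _ in range(5)))
    return peak
```

`asyncio.run(demo())` returns the highest number of downloads that ran at the same time, which the download semaphore caps at `max_downloads`.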
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						b0f9e1e8b4 
					 
					
						
						
							
							Add vscode directory to gitignore  
						
						 
						
						
						
						
					 
					
						2021-05-15 15:25:05 +02:00