Joscha 
							
						 
					 
					
						
						
							
						
						b7a999bc2e 
					 
					
						
						
							
							Clean up crawler exceptions and (a)noncritical  
						
						 
						
						
						
						
					 
					
						2021-05-19 13:25:57 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						3851065500 
					 
					
						
						
							
							Fix local crawler's download bars  
						
						 
						
						
						
						
						Display the pure path instead of the local path. 
						
						
					 
					
						2021-05-18 23:23:40 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						4b68fa771f 
					 
					
						
						
							
							Move logging logic to singleton  
						
						 
						
						
						
						
						- Renamed module and class because "conductor" didn't make a lot of sense
- Used singleton approach (there's only one stdout after all)
- Redesigned progress bars (now with download speed!) 
						
						
					 
					
						2021-05-18 22:45:19 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						1525aa15a6 
					 
					
						
						
							
							Fix link template error and use indeterminate progress bar  
						
						 
						
						
						
						
					 
					
						2021-05-18 22:40:28 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						db1219d4a9 
					 
					
						
						
							
							Create a link file in ILIAS crawler  
						
						 
						
						
						
						
						This allows us to crawl links and represent them in the file system.
Users can choose between an ILIAS-imitation (that optionally
auto-redirects) and a plain text variant. 
						
						
					 
					
						2021-05-17 21:44:54 +02:00  
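
The two link-file variants mentioned in db1219d4a9 could be sketched roughly as follows. This is only an illustration: the helper name `render_link_file`, its parameters, and the HTML layout are assumptions made for this example and are not taken from PFERD's actual template.

```python
import html
from typing import Optional


def render_link_file(name: str, url: str,
                     redirect_delay: Optional[int] = None,
                     plain: bool = False) -> str:
    """Return the contents of a link file (hypothetical helper, not PFERD's API)."""
    if plain:
        # Plain text variant: just the target URL on a single line.
        return url + "\n"

    safe_url = html.escape(url, quote=True)
    safe_name = html.escape(name)

    # Optional auto-redirect: a <meta http-equiv="refresh"> tag asks the
    # browser to load the target after the given number of seconds.
    refresh = ""
    if redirect_delay is not None:
        refresh = f'<meta http-equiv="refresh" content="{redirect_delay}; url={safe_url}">\n'

    return (
        '<!DOCTYPE html>\n<html>\n<head>\n<meta charset="utf-8">\n'
        f"{refresh}<title>{safe_name}</title>\n</head>\n"
        f'<body><a href="{safe_url}">{safe_name}</a></body>\n</html>\n'
    )
```

Calling `render_link_file("Slides", "https://example.com", plain=True)` yields a one-line text file, while passing `redirect_delay=3` produces an HTML page that forwards to the link after three seconds.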
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						b8efcc2ca5 
					 
					
						
						
							
							Respect filters in ILIAS crawler  
						
						 
						
						
						
						
					 
					
						2021-05-17 21:30:26 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						0bae009189 
					 
					
						
						
							
							Run formatting tools  
						
						 
						
						
						
						
					 
					
						2021-05-16 14:32:53 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						3efec53f51 
					 
					
						
						
							
							Configure code checking and formatting tools  
						
						 
						
						
						
						
Checking:
- mypy
- flake8 (which uses pyflakes and pycodestyle)

Formatting:
- autopep8
- isort
						
						
					 
					
						2021-05-16 14:31:43 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						8b76ebb3ef 
					 
					
						
						
							
							Rename IliasCrawler to KitIliasCrawler  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:28:06 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						467ea3a37e 
					 
					
						
						
							
							Document ILIAS-Crawler arguments in CONFIG.md  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:26:58 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						2b6235dc78 
					 
					
						
						
							
							Fix pylint warnings (and 2 found bugs) in ILIAS crawler  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:17:12 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						cd5aa61834 
					 
					
						
						
							
							Set max line length for pylint  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:17:01 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						5ccb17622e 
					 
					
						
						
							
							Configure pycodestyle to use a max line length of 110  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:01:56 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						1c226c31aa 
					 
					
						
						
							
							Add some repeat annotations to the ILIAS crawler  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:01:56 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						9ec0d3e16a 
					 
					
						
						
							
							Implement date-demangling in ILIAS crawler  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:01:56 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						cf6903d109 
					 
					
						
						
							
							Retry crawling on I/O failure  
						
						 
						
						
						
						
					 
					
						2021-05-16 13:01:56 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						9fd356d290 
					 
					
						
						
							
							Ensure tmp files are deleted  
						
						 
						
						
						
						
						This doesn't seem to fix the case where an exception bubbles up to the top of
the event loop. It also doesn't seem to fix the case when a KeyboardInterrupt is
thrown, since that never makes its way into the event loop in the first place.
Both of these cases lead to the event loop stopping, which means that the tmp
file cleanup doesn't get executed even though it's inside a "with" or "finally". 
						
						
					 
					
						2021-05-15 23:00:40 +02:00  
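
For context, the "with"/"finally" cleanup pattern that 9fd356d290 refers to might look like the sketch below. It shows only the case where cleanup does run, namely when the failure is handled inside the running event loop. The names `download_to` and `main` and the simulated failure are assumptions for illustration, not PFERD code.

```python
import asyncio
import os
import tempfile
from pathlib import Path


async def download_to(target: Path, data: bytes, fail: bool = False) -> None:
    """Write data to a temp file next to target, then move it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp_")
    tmp = Path(tmp_name)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            await asyncio.sleep(0)  # stand-in for awaiting network I/O
            if fail:
                raise OSError("simulated download failure")
        os.replace(tmp, target)
    finally:
        # Runs whenever the coroutine is unwound, e.g. by a handled exception.
        # After a successful os.replace the temp file is already gone.
        tmp.unlink(missing_ok=True)


async def main(directory: Path) -> list:
    try:
        await download_to(directory / "file.bin", b"data", fail=True)
    except OSError:
        pass  # handled inside the event loop, so the cleanup has already run
    return sorted(p.name for p in directory.iterdir())
```

Running `asyncio.run(main(some_empty_dir))` leaves the directory empty. The commit message describes situations where the loop stops before such `finally` blocks get a chance to run, which this sketch does not attempt to solve.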
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						989032fe0c 
					 
					
						
						
							
							Fix cookies getting deleted  
						
						 
						
						
						
						
					 
					
						2021-05-15 22:25:48 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						05573ccc53 
					 
					
						
						
							
							Add fancy CLI options  
						
						 
						
						
						
						
					 
					
						2021-05-15 22:22:01 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						c454fabc9d 
					 
					
						
						
							
							Add support for exercises in ILIAS crawler  
						
						 
						
						
						
						
					 
					
						2021-05-15 21:40:17 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						7d323ec62b 
					 
					
						
						
							
							Implement video downloads in ilias crawler  
						
						 
						
						
						
						
					 
					
						2021-05-15 21:32:32 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						c7494e32ce 
					 
					
						
						
							
							Start implementing crawling in ILIAS crawler  
						
						 
						
						
						
						
The ILIAS crawler can now crawl quite a few file types. It splits off
folders and crawls them concurrently.
						
						
					 
					
						2021-05-15 20:42:18 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						1123c8884d 
					 
					
						
						
							
							Implement an IliasPage  
						
						 
						
						
						
						
						This allows PFERD to semantically understand ILIAS HTML and is the
foundation for the ILIAS crawler. This patch extends the ILIAS crawler
to crawl the personal desktop and print the elements on it. 
						
						
					 
					
						2021-05-15 18:59:23 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						e1104f888d 
					 
					
						
						
							
							Add tfa authenticator  
						
						 
						
						
						
						
					 
					
						2021-05-15 18:27:16 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						8c32da7f19 
					 
					
						
						
							
							Let authenticators provide username and password separately  
						
						 
						
						
						
						
					 
					
						2021-05-15 18:27:03 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						d63494908d 
					 
					
						
						
							
							Properly invalidate exceptions  
						
						 
						
						
						
						
						The simple authenticator now properly invalidates its credentials. Also, the
invalidation functions have been given better names and documentation. 
						
						
					 
					
						2021-05-15 17:37:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						b70b62cef5 
					 
					
						
						
							
							Make crawler sections start with "crawl:"  
						
						 
						
						
						
						
Also, use only the part of the section name after the "crawl:" as the crawler's
output directory. Now, the implementation matches the documentation again.
						
						
					 
					
						2021-05-15 17:24:37 +02:00  
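
The naming rule from b70b62cef5 can be illustrated with the standard `configparser` module. This is a minimal sketch under assumptions: the example config, its option names, and the helper `crawler_output_dirs` are made up for this illustration and are not PFERD's implementation.

```python
import configparser
from pathlib import Path

# Illustrative config only; section and option names are assumptions.
EXAMPLE = """
[DEFAULT]
working_dir = .

[auth:ilias]
type = simple

[crawl:Foo]
type = local

[crawl:Bar/Baz]
type = local
"""


def crawler_output_dirs(config_text: str) -> dict:
    """Map each full crawler section name to its output directory."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    result = {}
    for section in parser.sections():
        # Only sections starting with "crawl:" describe crawlers.
        if section.startswith("crawl:"):
            # The part after the prefix becomes the output directory.
            result[section] = Path(section[len("crawl:"):])
    return result
```

With the example above, `crawler_output_dirs(EXAMPLE)` maps `crawl:Foo` to `Foo` and `crawl:Bar/Baz` to `Bar/Baz`, while the `auth:ilias` section is ignored. The full section name, prefix included, stays the lookup key.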
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						868f486922 
					 
					
						
						
							
							Rename local crawler path to target  
						
						 
						
						
						
						
					 
					
						2021-05-15 17:12:25 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						b2a2b5999b 
					 
					
						
						
							
							Implement ILIAS auth and crawl home page  
						
						 
						
						
						
						
						This commit introduces the necessary machinery to authenticate with
ILIAS and crawl the home page.
It can't do much yet and just silently fetches the homepage. 
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						595de88d96 
					 
					
						
						
							
							Fix authenticator and crawler names  
						
						 
						
						
						
						
						Now, the "auth:" and "crawl:" parts are considered part of the name. This fixes
crawlers not being able to find their authenticators. 
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						a6fdf05ee9 
					 
					
						
						
							
							Allow variable whitespace in arrow rules  
						
						 
						
						
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						f897d7c2e1 
					 
					
						
						
							
							Add name variants for all arrows  
						
						 
						
						
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						b0f731bf84 
					 
					
						
						
							
							Make crawlers use transformers  
						
						 
						
						
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						302b8c0c34 
					 
					
						
						
							
							Fix errors loading local crawler config  
						
						 
						
						
						
						
						Apparently getint and getfloat may return a None even though this is not
mentioned in their type annotations. 
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
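
The behaviour described in 302b8c0c34 can be reproduced with the standard library alone: on a section proxy, `getint` and `getfloat` return `None` for a missing option instead of raising. The section and option names below, as well as the `get_int` helper, are assumptions for illustration.

```python
import configparser

parser = configparser.ConfigParser()
parser.read_string("[crawl:local]\ncrawl_delay = 0.5\ntasks = 2\n")
section = parser["crawl:local"]

# The option is absent, so the section proxy falls back to None.
missing = section.getint("max_depth")

# Present options are converted as expected.
present = section.getfloat("crawl_delay")


def get_int(section: configparser.SectionProxy, key: str, default: int) -> int:
    """Read an int option, replacing a missing value with an explicit default."""
    value = section.getint(key)
    return default if value is None else value
```

Wrapping the lookup as in `get_int` (or passing `fallback=` directly) keeps the `None` case from leaking into code that expects a number.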
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						acd674f0a0 
					 
					
						
						
							
							Change limiter logic  
						
						 
						
						
						
						
						Now download tasks are a subset of all tasks. 
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
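
One way to read "download tasks are a subset of all tasks" from acd674f0a0 is that a download occupies a general task slot and a download slot at the same time. The sketch below follows that reading using two `asyncio.Semaphore` objects. The class and method names are hypothetical and the delay logic mentioned elsewhere in the log is left out, so this is not PFERD's limiter.

```python
import asyncio
from contextlib import asynccontextmanager


class Limiter:
    """Hypothetical sketch: every download also occupies a general task slot."""

    def __init__(self, task_limit: int, download_limit: int) -> None:
        self._tasks = asyncio.Semaphore(task_limit)
        self._downloads = asyncio.Semaphore(download_limit)

    @asynccontextmanager
    async def limit_crawl(self):
        # Crawl actions only need a general task slot.
        async with self._tasks:
            yield

    @asynccontextmanager
    async def limit_download(self):
        # A download counts as a task, so it takes a task slot first...
        async with self._tasks:
            # ...and additionally one of the scarcer download slots.
            async with self._downloads:
                yield


async def demo() -> int:
    """Start six downloads and report the peak number running at once."""
    limiter = Limiter(task_limit=4, download_limit=2)
    active = 0
    peak = 0

    async def download() -> None:
        nonlocal active, peak
        async with limiter.limit_download():
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the actual transfer
            active -= 1

    await asyncio.gather(*(download() for _ in range(6)))
    return peak
```

In this sketch a download that is waiting for a download slot still holds a task slot. Whether that trade-off is acceptable depends on the limits chosen.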
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						b0f9e1e8b4 
					 
					
						
						
							
							Add vscode directory to gitignore  
						
						 
						
						
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						ed2e19a150 
					 
					
						
						
							
							Add reasons for invalid values  
						
						 
						
						
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						296a169dd3 
					 
					
						
						
							
							Make limiter logic more complex  
						
						 
						
						
						
						
						The limiter can now distinguish between crawl and download actions and has a
fancy slot system and delay logic. 
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						1591cb9197 
					 
					
						
						
							
							Add options to slow down local crawler  
						
						 
						
						
						
						
						These options are meant to make the local crawler behave more like a
network-based crawler for purposes of testing and debugging other parts of the
code base. 
						
						
					 
					
						2021-05-15 15:25:01 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						0c9167512c 
					 
					
						
						
							
							Fix output dir  
						
						 
						
						
						
						
						I missed these while renaming the resolve function. Shame on me for not running
mypy earlier. 
						
						
					 
					
						2021-05-14 21:28:38 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						a673ab0fae 
					 
					
						
						
							
							Delete old files  
						
						 
						
						
						
						
I should've done this earlier.
						
						
					 
					
						2021-05-14 21:27:44 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						6e5fdf4e9e 
					 
					
						
						
							
							Set user agent to "pferd/<version>"  
						
						 
						
						
						
						
					 
					
						2021-05-14 21:27:44 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						93a5a94dab 
					 
					
						
						
							
							Single-source version number  
						
						 
						
						
						
						
					 
					
						2021-05-14 21:27:44 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						d565df27b3 
					 
					
						
						
							
							Add HttpCrawler  
						
						 
						
						
						
						
					 
					
						2021-05-13 22:28:14 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						961f40f9a1 
					 
					
						
						
							
							Document simple authenticator  
						
						 
						
						
						
						
					 
					
						2021-05-13 19:55:04 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						e3ee4e515d 
					 
					
						
						
							
							Disable highlighting of primitives  
						
						 
						
						
						
						
						This commit prevents rich from highlighting python-looking syntax like numbers,
arrays, 'None' etc. 
						
						
					 
					
						2021-05-13 19:47:44 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						94d6a01cca 
					 
					
						
						
							
							Use file mtime in local crawler  
						
						 
						
						
						
						
					 
					
						2021-05-13 19:42:40 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						38bb66a776 
					 
					
						
						
							
							Update file metadata in more cases  
						
						 
						
						
						
						
						PFERD now not only updates file metadata when a file is successfully added or
changed, but also when a file is downloaded and then detected to be unchanged.
This could occur for example if a remote file's modification time was bumped,
possibly because somebody touched the file without changing it. 
						
						
					 
					
						2021-05-13 19:40:10 +02:00  
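
The metadata rule from 38bb66a776 can be sketched as follows: the modification time is refreshed even when the downloaded content turns out to be identical. The `store` helper, its status strings, and the `demo` function are assumptions for illustration and do not mirror PFERD's output directory code.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def store(path: Path, new_content: bytes, mtime: Optional[float]) -> str:
    """Hypothetical helper returning 'added', 'changed' or 'unchanged'."""
    if not path.exists():
        status = "added"
    elif path.read_bytes() != new_content:
        status = "changed"
    else:
        status = "unchanged"

    # Only touch the file contents when they actually differ.
    if status != "unchanged":
        path.write_bytes(new_content)

    # Update the timestamp in all three cases, including "unchanged",
    # so a bumped remote mtime is reflected locally.
    if mtime is not None:
        os.utime(path, (mtime, mtime))
    return status


def demo() -> list:
    """Store the same content twice with different remote timestamps."""
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "notes.txt"
        results = [
            store(target, b"v1", 1_000_000_000.0),
            store(target, b"v1", 1_000_000_500.0),
        ]
        results.append(target.stat().st_mtime)
        return results
```

In `demo`, the second call reports `unchanged`, yet the file ends up with the newer timestamp.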
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						68781a88ab 
					 
					
						
						
							
							Fix asynchronous methods being not awaited  
						
						 
						
						
						
						
					 
					
						2021-05-13 19:39:49 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						910462bb72 
					 
					
						
						
							
							Log stuff happening to files  
						
						 
						
						
						
						
					 
					
						2021-05-13 19:37:27 +02:00