Joscha 
							
						 
					 
					
						
						
							
						
						29d5a40c57 
					 
					
						
						
							
							Replace asyncio.gather with custom Crawler function  
						
						
						
						
					 
					
						2021-05-23 17:25:16 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						c0cecf8363 
					 
					
						
						
							
							Log crawl and download actions more extensively  
						
						
						
						
					 
					
						2021-05-23 16:25:44 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						b998339002 
					 
					
						
						
							
							Fix cleanup logging of paths  
						
						
						
						
					 
					
						2021-05-23 16:25:44 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						245c9c3dcc 
					 
					
						
						
							
							Explain output dir decisions and steps  
						
						
						
						
					 
					
						2021-05-23 16:25:44 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						d8f26a789e 
					 
					
						
						
							
							Implement CLI Command for ilias crawler  
						
						
						
						
					 
					
						2021-05-23 13:30:42 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						e1d18708b3 
					 
					
						
						
							
							Rename "no_videos" to videos  
						
						
						
						
					 
					
						2021-05-23 13:30:42 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						b44b49476d 
					 
					
						
						
							
							Fix noncritical and anoncritical decorators  
						
						
						
						
						I must have forgotten to update the anoncritical decorator when I last changed
the noncritical decorator. Also, every exception should mark the crawler as not
error_free, not just CrawlErrors. 
						
						
					 
					
						2021-05-23 13:24:53 +02:00 
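The behaviour this commit message describes can be illustrated with a minimal sketch. It is not the project's actual code: the class `DemoCrawler`, its methods, and the decorator bodies are assumptions made for illustration. Only the names `noncritical`, `anoncritical`, `error_free`, and `CrawlError` come from the message above.

```python
import asyncio
from functools import wraps


class CrawlError(Exception):
    """Stand-in for the crawler-specific error type (assumed)."""


def noncritical(func):
    """Swallow any exception from a sync method and flag the crawler."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            func(self, *args, **kwargs)
        except Exception:
            # Every exception counts, not only CrawlError.
            self.error_free = False
    return wrapper


def anoncritical(func):
    """Async counterpart of noncritical with the same semantics."""
    @wraps(func)
    async def wrapper(self, *args, **kwargs):
        try:
            await func(self, *args, **kwargs)
        except Exception:
            self.error_free = False
    return wrapper


class DemoCrawler:
    """Hypothetical crawler used only to exercise the decorators."""

    def __init__(self) -> None:
        self.error_free = True

    @noncritical
    def parse(self) -> None:
        raise CrawlError("expected failure")

    @anoncritical
    async def fetch(self) -> None:
        raise ValueError("unexpected failure")
```

Keeping both wrappers structurally identical makes it harder for the sync and async variants to drift apart, which is the kind of mismatch the commit message mentions.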
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						7e0bb06259 
					 
					
						
						
							
							Clean up TODOs  
						
						
						
						
					 
					
						2021-05-23 12:47:30 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						ecdedfa1cf 
					 
					
						
						
							
							Add no-videos flag to ILIAS crawler  
						
						
						
						
					 
					
						2021-05-23 12:37:01 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						3d4b997d4a 
					 
					
						
						
							
							Retry crawl_url and work around Python's closure handling  
						
						
						
						
						Closures capture the scope and not the variables. Therefore, any
type-narrowing performed by mypy on captured variables is lost inside
the closure. 
						
						
					 
					
						2021-05-23 12:28:15 +02:00 
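A small sketch of the closure behaviour mentioned in this commit message, assuming nothing about the project's code: the function names `late_binding` and `make_describer` are hypothetical. The first function shows that a closure looks up the variable when it runs. The second shows one common workaround for lost type narrowing, which is binding the narrowed value to a new, non-optional name.

```python
from typing import Callable, List, Optional


def late_binding() -> List[int]:
    callbacks = []
    for i in range(3):
        # Each lambda refers to the variable `i` in the enclosing scope,
        # not to the value `i` had when the lambda was created.
        callbacks.append(lambda: i)
    return [callback() for callback in callbacks]


def make_describer(url: Optional[str]) -> Callable[[], str]:
    if url is None:
        raise ValueError("url must not be None")

    # At this point a type checker treats `url` as `str`. Inside a nested
    # function it may fall back to `Optional[str]`, because `url` could be
    # rebound before the closure runs. A fresh name with a non-optional
    # annotation avoids relying on the narrowing.
    checked_url: str = url

    def describe() -> str:
        return checked_url.upper()

    return describe
```

Whether a given mypy version keeps the narrowing inside the nested function depends on the version and on whether the variable is reassigned later, so the extra name is a defensive choice rather than a requirement.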
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						e81005ae4b 
					 
					
						
						
							
							Fix CLI arguments  
						
						
						
						
					 
					
						2021-05-23 12:24:21 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						33a81a5f5c 
					 
					
						
						
							
							Document authentication in HTTP crawler and rename prepare_request  
						
						
						
						
					 
					
						2021-05-23 11:55:34 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						25e2abdb03 
					 
					
						
						
							
							Improve transformer explain wording  
						
						
						
						
					 
					
						2021-05-23 11:45:14 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						803e5628a2 
					 
					
						
						
							
							Clean up logging  
						
						
						
						
						Paths are now (hopefully) logged consistently across all crawlers 
						
						
					 
					
						2021-05-23 11:37:19 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						c88f20859a 
					 
					
						
						
							
							Explain config file dumping  
						
						
						
						
					 
					
						2021-05-23 11:04:50 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						ec3767c545 
					 
					
						
						
							
							Create crawler base dir at start of crawl  
						
						
						
						
					 
					
						2021-05-23 10:52:02 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						729ff0a4c7 
					 
					
						
						
							
							Fix simple authenticator output  
						
						
						
						
					 
					
						2021-05-23 10:45:37 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						6fe51e258f 
					 
					
						
						
							
							Number rules starting at 1  
						
						
						
						
					 
					
						2021-05-23 10:45:37 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						44ecb2fbe7 
					 
					
						
						
							
							Fix cleanup deleting crawler's base directory  
						
						
						
						
					 
					
						2021-05-23 10:45:37 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						53e031d9f6 
					 
					
						
						
							
							Reuse dl/cl for I/O retries in ILIAS crawler  
						
						
						
						
					 
					
						2021-05-23 00:28:27 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						8ac85ea0bd 
					 
					
						
						
							
							Fix a few typos in HttpCrawler  
						
						
						
						
					 
					
						2021-05-22 23:37:34 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						adfdc302d7 
					 
					
						
						
							
							Save cookies after successful authentication in HTTP crawler  
						
						
						
						
					 
					
						2021-05-22 23:30:32 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						3053278721 
					 
					
						
						
							
							Move HTTP crawler to own file  
						
						
						
						
					 
					
						2021-05-22 23:23:21 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						4d07de0d71 
					 
					
						
						
							
							Adjust forum log message in ilias crawler  
						
						
						
						
					 
					
						2021-05-22 23:20:21 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						953a1bba93 
					 
					
						
						
							
							Adjust to new crawl / download names  
						
						
						
						
					 
					
						2021-05-22 23:18:05 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						e724ff7c93 
					 
					
						
						
							
							Fix normal arrow  
						
						
						
						
					 
					
						2021-05-22 20:44:59 +00:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						62f0f7bfc5 
					 
					
						
						
							
							Explain crawling and partially explain downloading  
						
						
						
						
					 
					
						2021-05-22 20:39:57 +00:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						9cb2b68f09 
					 
					
						
						
							
							Fix arrow parsing error messages  
						
						
						
						
					 
					
						2021-05-22 20:39:29 +00:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						1bbc0b705f 
					 
					
						
						
							
							Improve transformer error handling  
						
						
						
						
					 
					
						2021-05-22 20:38:56 +00:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						662191eca9 
					 
					
						
						
							
							Fix crash as soon as first cl or dl token was acquired  
						
						
						
						
					 
					
						2021-05-22 20:25:58 +00:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						8fad8edc1e 
					 
					
						
						
							
							Remove duplicated beautifulsoup4 dependency  
						
						
						
						
					 
					
						2021-05-22 20:02:15 +00:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						ae3d80664c 
					 
					
						
						
							
							Update local crawler to new crawler structure  
						
						
						
						
					 
					
						2021-05-22 21:46:36 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						e21795ee35 
					 
					
						
						
							
							Make file cleanup part of default crawler behaviour  
						
						
						
						
					 
					
						2021-05-22 21:45:51 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						ec95dda18f 
					 
					
						
						
							
							Unify crawling and downloading steps  
						
						
						
						
						Now, the progress bar, limiter etc. for downloading and crawling are all handled
via the reusable CrawlToken and DownloadToken context managers. 
						
						
					 
					
						2021-05-22 21:36:53 +02:00 
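The idea of handling the limiter and progress reporting through a reusable context manager can be sketched as follows. This is an illustration under stated assumptions, not the project's CrawlToken or DownloadToken: the class `Limiter`, its `token` method, and the log list standing in for a progress bar are all hypothetical.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List


class Limiter:
    """Hypothetical limiter that also records start and end of each task."""

    def __init__(self, max_concurrent: int) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self.log: List[str] = []  # stands in for a progress bar

    @asynccontextmanager
    async def token(self, kind: str, path: str) -> AsyncIterator[None]:
        # Wait for a free slot, then report progress around the body.
        async with self._semaphore:
            self.log.append(f"start {kind} {path}")
            try:
                yield
            finally:
                self.log.append(f"done {kind} {path}")


async def demo() -> List[str]:
    limiter = Limiter(max_concurrent=1)

    async def crawl(path: str) -> None:
        async with limiter.token("crawl", path):
            await asyncio.sleep(0)  # placeholder for real work

    await asyncio.gather(crawl("a"), crawl("b"))
    return limiter.log
```

With a single slot, the second task only starts after the first has left its `async with` block, so crawling and downloading code can share the same limiting and reporting logic without repeating it.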
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						098ac45758 
					 
					
						
						
							
							Remove deprecated repeat decorators  
						
						
						
						
					 
					
						2021-05-22 21:13:25 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						9889ce6b57 
					 
					
						
						
							
							Improve PFERD error handling  
						
						
						
						
					 
					
						2021-05-22 21:13:25 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						b4d97cd545 
					 
					
						
						
							
							Improve output dir and report error handling  
						
						
						
						
					 
					
						2021-05-22 20:54:42 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						afac22c562 
					 
					
						
						
							
							Handle abort in exclusive output state correctly  
						
						
						
						
						If the event loop is stopped while something holds the exclusive output, the
"log" singleton is now reset so the main thread can print a few more messages
before exiting. 
						
						
					 
					
						2021-05-22 18:58:19 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						552cd82802 
					 
					
						
						
							
							Run async input and password getters in daemon thread  
						
						
						
						
						Previously, it ran in the event loop's default executor, which would block until
all its workers were done working.
If Ctrl+C was pressed while input or a password was being read, the
asyncio.run() call in the main thread would be interrupted, but the input
thread would not. This meant that multiple key presses (either enter or a second
Ctrl+C) were necessary to stop a running PFERD in some circumstances.
This change instead runs the input functions in daemon threads so they exit as
soon as the main thread exits. 
						
						
					 
					
						2021-05-22 18:37:53 +02:00 
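One way to run a blocking call in a daemon thread and await its result is sketched below. The helper name `run_in_daemon_thread` is an assumption for illustration and is not taken from the project. The sketch uses only `threading.Thread(daemon=True)`, `loop.create_future()`, and `loop.call_soon_threadsafe()` from the standard library.

```python
import asyncio
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


async def run_in_daemon_thread(func: Callable[[], T]) -> T:
    """Run a blocking function in a daemon thread and await its result."""
    loop = asyncio.get_running_loop()
    future: "asyncio.Future[T]" = loop.create_future()

    def worker() -> None:
        try:
            result = func()
        except Exception as error:
            # Futures are not thread-safe, so hand the outcome to the loop.
            loop.call_soon_threadsafe(future.set_exception, error)
        else:
            loop.call_soon_threadsafe(future.set_result, result)

    # A daemon thread does not keep the interpreter alive, so a blocked
    # input() call cannot delay exit once the main thread is done.
    threading.Thread(target=worker, daemon=True).start()
    return await future
```

A caller might use it as `await run_in_daemon_thread(lambda: input("Username: "))`. Unlike the default executor, nothing waits for this thread during shutdown.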
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						dfde0e2310 
					 
					
						
						
							
							Improve reporting of unexpected exceptions  
						
						
						
						
					 
					
						2021-05-22 18:36:25 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						54dd2f8337 
					 
					
						
						
							
							Clean up main and improve error handling  
						
						
						
						
					 
					
						2021-05-22 16:47:24 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						b5785f260e 
					 
					
						
						
							
							Extract CLI argument parsing to separate module  
						
						
						
						
					 
					
						2021-05-22 15:03:45 +02:00 
						 
				 
			
				
					
						
							
							
								Joscha 
							
						 
					 
					
						
						
							
						
						98b8ca31fa 
					 
					
						
						
							
							Add some todos  
						
						
						
						
					 
					
						2021-05-22 14:45:46 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						4b104b6252 
					 
					
						
						
							
							Try out some HTTP authentication handling  
						
						
						
						
						This is by no means final yet and will change a bit once the dl and cl
are changed, but it might serve as a first try. It is also wholly
untested. 
						
						
					 
					
						2021-05-21 12:02:51 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						83d12fcf2d 
					 
					
						
						
							
							Add some explains to ilias crawler and use crawler exceptions  
						
						
						
						
					 
					
						2021-05-20 14:58:54 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						e4f9560655 
					 
					
						
						
							
							Only retry on aiohttp errors in ILIAS crawler  
						
						
						
						
						This patch removes quite a few retries and now only retries the ilias
element method. Every other HTTP-interacting method (except for the root
requests) is called from there and should be covered.
In the future we also want to retry the root a few times, but that
will be done after the download sink API is adjusted. 
						
						
					 
					
						2021-05-19 22:01:09 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						8cfa818f04 
					 
					
						
						
							
							Only call should_crawl once  
						
						
						
						
					 
					
						2021-05-19 21:57:55 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						81301f3a76 
					 
					
						
						
							
							Rename the ilias crawler to ilias web crawler  
						
						
						
						
					 
					
						2021-05-19 21:41:17 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						2976b4d352 
					 
					
						
						
							
							Move ILIAS file templates to own file  
						
						
						
						
					 
					
						2021-05-19 21:37:10 +02:00 
						 
				 
			
				
					
						
							
							
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						9f03702e69 
					 
					
						
						
							
							Split up ilias crawler in multiple files  
						
						
						
						
						The ilias crawler contained both a crawler and an HTML parser; now they are
split in two. 
						
						
					 
					
						2021-05-19 21:34:36 +02:00