Joscha 
							
						 
					 
					
						
						
							
						
						b0f731bf84 
					 
					
						
						
							
							Make crawlers use transformers  
						
						 
						
						
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						302b8c0c34 
					 
					
						
						
							
							Fix errors loading local crawler config  
						
						 
						
						
						
						
						Apparently getint and getfloat may return a None even though this is not
mentioned in their type annotations. 
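
A minimal sketch of the behaviour this commit message describes, using the standard library `configparser`. The section name `crawl:local`, the option names, and the helper `get_int_or_default` are hypothetical and chosen only for illustration; they are not taken from the PFERD code. On a section proxy, `getint` and `getfloat` fall back to `None` when the option is missing, so the result has to be checked before it is used as a number.

```python
import configparser

parser = configparser.ConfigParser()
parser.read_string("""
[crawl:local]
crawl_delay = 0.5
""")

section = parser["crawl:local"]

# An option that is present is converted as expected.
delay = section.getfloat("crawl_delay")

# A missing option does not raise on a section proxy: the implicit
# fallback is None, so the result is effectively Optional[int].
missing = section.getint("download_speed")


def get_int_or_default(section, key, default):
    """Hypothetical helper: read an int option, guarding against None."""
    value = section.getint(key)
    return default if value is None else value


print(delay, missing, get_int_or_default(section, "download_speed", 7))
```

Checking for `None` explicitly (rather than relying on truthiness) keeps a legitimate value of `0` from being replaced by the default.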
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						acd674f0a0 
					 
					
						
						
							
							Change limiter logic  
						
						 
						
						
						
						
						Now download tasks are a subset of all tasks. 
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								I-Al-Istannen 
							
						 
					 
					
						
						
							
						
						b0f9e1e8b4 
					 
					
						
						
							
							Add vscode directory to gitignore  
						
						 
						
						
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						ed2e19a150 
					 
					
						
						
							
							Add reasons for invalid values  
						
						 
						
						
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						296a169dd3 
					 
					
						
						
							
							Make limiter logic more complex  
						
						 
						
						
						
						
						The limiter can now distinguish between crawl and download actions and has a
fancy slot system and delay logic. 
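
The idea of a limiter with separate crawl and download slots plus a delay could look roughly like the sketch below. This is not the PFERD implementation: the class name `SketchLimiter`, its parameters, and the choice of `asyncio.Semaphore` for the slots are all assumptions made for illustration.

```python
import asyncio
import time
from contextlib import asynccontextmanager


class SketchLimiter:
    """Hypothetical limiter: bounded slots per action type plus a start delay."""

    def __init__(self, crawl_slots, download_slots, delay):
        self._crawl = asyncio.Semaphore(crawl_slots)
        self._download = asyncio.Semaphore(download_slots)
        self._delay = delay
        self._last_start = float("-inf")
        self._delay_lock = asyncio.Lock()

    async def _wait_for_delay(self):
        # Space out the start of consecutive actions by at least `delay` seconds.
        async with self._delay_lock:
            remaining = self._last_start + self._delay - time.monotonic()
            if remaining > 0:
                await asyncio.sleep(remaining)
            self._last_start = time.monotonic()

    @asynccontextmanager
    async def limit_crawl(self):
        async with self._crawl:
            await self._wait_for_delay()
            yield

    @asynccontextmanager
    async def limit_download(self):
        async with self._download:
            await self._wait_for_delay()
            yield


async def demo():
    # Created inside the running loop so the primitives bind to it.
    limiter = SketchLimiter(crawl_slots=4, download_slots=2, delay=0.01)
    active = 0
    peak = 0

    async def fake_download():
        nonlocal active, peak
        async with limiter.limit_download():
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.05)
            active -= 1

    await asyncio.gather(*(fake_download() for _ in range(5)))
    return peak


peak = asyncio.run(demo())
print("peak concurrent downloads:", peak)
```

With two download slots, the number of simultaneously running fake downloads never exceeds two, no matter how many are started.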
						
						
					 
					
						2021-05-15 15:25:05 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						1591cb9197 
					 
					
						
						
							
							Add options to slow down local crawler  
						
						 
						
						
						
						
						These options are meant to make the local crawler behave more like a
network-based crawler for purposes of testing and debugging other parts of the
code base. 
						
						
					 
					
						2021-05-15 15:25:01 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						0c9167512c 
					 
					
						
						
							
							Fix output dir  
						
						 
						
						
						
						
						I missed these while renaming the resolve function. Shame on me for not running
mypy earlier. 
						
						
					 
					
						2021-05-14 21:28:38 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						a673ab0fae 
					 
					
						
						
							
							Delete old files  
						
						 
						
						
						
						
						I should've done this earlier 
						
						
					 
					
						2021-05-14 21:27:44 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						6e5fdf4e9e 
					 
					
						
						
							
							Set user agent to "pferd/<version>"  
						
						 
						
						
						
						
					 
					
						2021-05-14 21:27:44 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						93a5a94dab 
					 
					
						
						
							
							Single-source version number  
						
						 
						
						
						
						
					 
					
						2021-05-14 21:27:44 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						d565df27b3 
					 
					
						
						
							
							Add HttpCrawler  
						
						 
						
						
						
						
					 
					
						2021-05-13 22:28:14 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						961f40f9a1 
					 
					
						
						
							
							Document simple authenticator  
						
						 
						
						
						
						
					 
					
						2021-05-13 19:55:04 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						e3ee4e515d 
					 
					
						
						
							
							Disable highlighting of primitives  
						
						 
						
						
						
						
						This commit prevents rich from highlighting python-looking syntax like numbers,
arrays, 'None' etc. 
						
						
					 
					
						2021-05-13 19:47:44 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						94d6a01cca 
					 
					
						
						
							
							Use file mtime in local crawler  
						
						 
						
						
						
						
					 
					
						2021-05-13 19:42:40 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						38bb66a776 
					 
					
						
						
							
							Update file metadata in more cases  
						
						 
						
						
						
						
						PFERD now not only updates file metadata when a file is successfully added or
changed, but also when a file is downloaded and then detected to be unchanged.
This could occur for example if a remote file's modification time was bumped,
possibly because somebody touched the file without changing it. 
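
The behaviour described above can be sketched as follows. The function `finish_download`, its parameters, and the status strings are hypothetical; the sketch only assumes that a download first lands in a temporary file and that the remote modification time is known.

```python
import filecmp
import os
from pathlib import Path


def finish_download(tmp: Path, target: Path, remote_mtime: float) -> str:
    """Hypothetical helper: move a finished download into place.

    The target's modification time is updated in every case, including
    the one where the downloaded content turns out to be unchanged.
    """
    existed = target.exists()
    if existed and filecmp.cmp(tmp, target, shallow=False):
        # Same bytes as before: discard the download, keep the old file.
        tmp.unlink()
        status = "unchanged"
    else:
        os.replace(tmp, target)
        status = "changed" if existed else "added"

    # Record the remote mtime so the next run does not download again
    # just because the remote timestamp was bumped.
    os.utime(target, (remote_mtime, remote_mtime))
    return status
```

Passing `shallow=False` to `filecmp.cmp` forces a byte-by-byte comparison instead of trusting the size and timestamps of the two files.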
						
						
					 
					
						2021-05-13 19:40:10 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						68781a88ab 
					 
					
						
						
							
							Fix asynchronous methods being not awaited  
						
						 
						
						
						
						
					 
					
						2021-05-13 19:39:49 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						910462bb72 
					 
					
						
						
							
							Log stuff happening to files  
						
						 
						
						
						
						
					 
					
						2021-05-13 19:37:27 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						6bd6adb977 
					 
					
						
						
							
							Fix tmp file names  
						
						 
						
						
						
						
					 
					
						2021-05-13 19:36:46 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						0acdee15a0 
					 
					
						
						
							
							Let crawlers obtain authenticators  
						
						 
						
						
						
						
					 
					
						2021-05-13 18:57:20 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						c3ce6bb31c 
					 
					
						
						
							
							Fix crawler cleanup not being awaited  
						
						 
						
						
						
						
					 
					
						2021-05-11 00:28:45 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						0459ed093e 
					 
					
						
						
							
							Add simple authenticator  
						
						 
						
						
						
						
						... including some required authenticator infrastructure 
						
						
					 
					
						2021-05-11 00:28:03 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						d5f29f01c5 
					 
					
						
						
							
							Use global conductor instance  
						
						 
						
						
						
						
						The switch from crawler-local conductors to a single pferd-global conductor was
made to prepare for auth section credential providers. 
						
						
					 
					
						2021-05-11 00:05:04 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						595ba8b7ab 
					 
					
						
						
							
							Remove dummy crawler  
						
						 
						
						
						
						
					 
					
						2021-05-10 23:47:46 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						cec0a8e1fc 
					 
					
						
						
							
							Fix mypy errors  
						
						 
						
						
						
						
					 
					
						2021-05-09 01:45:01 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						f9b2fd60e2 
					 
					
						
						
							
							Document local crawler and auth  
						
						 
						
						
						
						
					 
					
						2021-05-09 01:33:47 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						60cd9873bc 
					 
					
						
						
							
							Add local file crawler  
						
						 
						
						
						
						
					 
					
						2021-05-06 01:02:40 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						273d56c39a 
					 
					
						
						
							
							Properly load crawler config  
						
						 
						
						
						
						
					 
					
						2021-05-05 23:45:10 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						5497dd2827 
					 
					
						
						
							
							Add @noncritical and @repeat decorators  
						
						 
						
						
						
						
					 
					
						2021-05-05 23:36:54 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						bbfdadc463 
					 
					
						
						
							
							Implement output directory  
						
						 
						
						
						
						
					 
					
						2021-05-05 18:08:34 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						fde811ae5a 
					 
					
						
						
							
							Document on_conflict option  
						
						 
						
						
						
						
					 
					
						2021-05-05 12:24:35 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						07e831218e 
					 
					
						
						
							
							Add sync report  
						
						 
						
						
						
						
					 
					
						2021-05-02 00:56:10 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						91c33596da 
					 
					
						
						
							
							Load crawlers from config file  
						
						 
						
						
						
						
					 
					
						2021-04-30 16:22:14 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						a8dcf941b9 
					 
					
						
						
							
							Document possible redownload settings  
						
						 
						
						
						
						
					 
					
						2021-04-30 15:32:56 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						e7a51decb0 
					 
					
						
						
							
							Elaborate on transforms and implement changes  
						
						 
						
						
						
						
					 
					
						2021-04-29 20:24:18 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						9ec19be113 
					 
					
						
						
							
							Document config file format  
						
						 
						
						
						
						
					 
					
						2021-04-29 20:24:18 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						f776186480 
					 
					
						
						
							
							Use PurePath instead of Path  
						
						 
						
						
						
						
						Path should only be used when we need to access the file system. For all other
purposes (mainly crawling), we use PurePath instead since the paths don't
correspond to paths in the local file system. 
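
A short sketch of the distinction, using only `pathlib` from the standard library. The function `local_target` and the example paths are hypothetical: a crawled path is kept as a `PurePath`, which offers path arithmetic but no file system access, and is only combined with a concrete `Path` at the point where something is written to disk.

```python
from pathlib import Path, PurePath


def local_target(output_dir: Path, remote: PurePath) -> Path:
    """Hypothetical helper: map a crawled path into the output directory."""
    # Reject paths that could escape the output directory.
    if remote.is_absolute() or ".." in remote.parts:
        raise ValueError(f"refusing suspicious path: {remote}")
    return output_dir / remote


# A crawled path: purely a name, with no relation to the local disk.
remote = PurePath("Course") / "Slides" / "01-intro.pdf"

# PurePath has no methods that touch the file system.
print(hasattr(remote, "exists"))  # PurePath objects lack exists()

# Only here does the path become something that can be opened or written.
target = local_target(Path("output"), remote)
print(target)
```

Keeping the two types apart lets a type checker flag accidental file system calls on paths that only exist on the remote side.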
						
						
					 
					
						2021-04-29 20:20:25 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						0096d83387 
					 
					
						
						
							
							Simplify Limiter implementation  
						
						 
						
						
						
						
					 
					
						2021-04-29 20:20:25 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						20a24dbcbf 
					 
					
						
						
							
							Add changelog  
						
						 
						
						
						
						
					 
					
						2021-04-29 20:20:25 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						502654d853 
					 
					
						
						
							
							Fix mypy errors  
						
						 
						
						
						
						
					 
					
						2021-04-29 15:47:52 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						d2103d7c44 
					 
					
						
						
							
							Document crawler  
						
						 
						
						
						
						
					 
					
						2021-04-29 15:43:20 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						d96a361325 
					 
					
						
						
							
							Test and fix exclusive output  
						
						 
						
						
						
						
					 
					
						2021-04-29 15:27:16 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						2e85d26b6b 
					 
					
						
						
							
							Use conductor via context manager  
						
						 
						
						
						
						
					 
					
						2021-04-29 14:23:28 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						6431a3fb3d 
					 
					
						
						
							
							Fix some mypy errors  
						
						 
						
						
						
						
					 
					
						2021-04-29 14:23:09 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						ac3bfd7388 
					 
					
						
						
							
							Make progress bars easier to use  
						
						 
						
						
						
						
						The crawler now supports two types of progress bars 
						
						
					 
					
						2021-04-29 13:53:16 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						3ea86d18a0 
					 
					
						
						
							
							Jerry-rig DummyCrawler to run  
						
						 
						
						
						
						
					 
					
						2021-04-29 13:45:04 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						bbc792f9fb 
					 
					
						
						
							
							Implement Crawler and DummyCrawler  
						
						 
						
						
						
						
					 
					
						2021-04-29 13:44:29 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						7e127cd5cc 
					 
					
						
						
							
							Clean up and fix conductor and limiter  
						
						 
						
						
						
						
						Turns out you have to await an async lock, who knew... 
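
A small illustration of the pitfall this commit message alludes to, using `asyncio.Lock` from the standard library. The function names and the counter are made up for the example. Calling `lock.acquire()` without `await` merely creates a coroutine object and never takes the lock; `async with lock:` awaits the acquisition and releases the lock on exit.

```python
import asyncio


async def increment(lock, counter):
    # `async with` awaits lock.acquire() and releases the lock afterwards.
    async with lock:
        current = counter["value"]
        # Yield to the event loop; without the lock, other tasks could
        # read the same stale value here and updates would be lost.
        await asyncio.sleep(0)
        counter["value"] = current + 1


async def main():
    lock = asyncio.Lock()
    counter = {"value": 0}
    await asyncio.gather(*(increment(lock, counter) for _ in range(10)))
    return counter["value"]


result = asyncio.run(main())
print(result)
```

Because every read-modify-write happens while the lock is held, all ten increments are applied.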
						
						
					 
					
						2021-04-29 13:44:04 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						c4fb92c658 
					 
					
						
						
							
							Make type hints compatible with Python 3.8  
						
						 
						
						
						
						
					 
					
						2021-04-29 13:11:58 +02:00  
					
					
						 
						
						
							
							
							
							
							
							 
						
					 
				 
			
				
					
						
							
							
								 
								Joscha 
							
						 
					 
					
						
						
							
						
						8da1ac6cee 
					 
					
						
						
							
							Extend mypy config  
						
						 
						
						
						
						
					 
					
						2021-04-29 11:44:47 +02:00