Commit Graph

164 Commits

Author SHA1 Message Date
I-Al-Istannen
42345ecc61 Demangle "Morgen" too 2020-04-30 12:05:25 +02:00
I-Al-Istannen
920d521d68 Change PrettyLogger.warn to PrettyLogger.warning 2020-04-25 20:11:51 +02:00
I-Al-Istannen
e0b46a306a Use warn method in IliasCrawler 2020-04-25 20:07:40 +02:00
I-Al-Istannen
8a42a2a396 Move logging into its own file 2020-04-25 20:02:01 +02:00
I-Al-Istannen
80247400a4 Debug log when starting an ilias download 2020-04-25 13:02:07 +02:00
Joscha
1aaa6e7ab5 Use PathLike everywhere 2020-04-24 18:41:14 +00:00
Joscha
7f53543324 Satisfy pylint and add todo 2020-04-24 18:26:28 +00:00
Joscha
292e516297 Change crawler and downloader output 2020-04-24 18:24:44 +00:00
Joscha
8258fa8919 Add test run option to PFERD 2020-04-24 18:00:21 +00:00
Joscha
5b929f09a2 Move download strategies to downloader
Also fixes an issue where the downloader didn't mark files that were not
downloaded due to the strategy used.
2020-04-24 14:27:40 +00:00
Joscha
4d32f863bc Clean up organizer after synchronizing 2020-04-24 14:17:23 +00:00
Joscha
4e7333b396 Allow specifying paths as strings in Pferd 2020-04-24 11:50:40 +00:00
I-Al-Istannen
4c0e3b493a Use download_modified_or_new as default strategy 2020-04-24 13:48:06 +02:00
Joscha
2de079a5d3 Add a few Transform combinators 2020-04-24 11:35:46 +00:00
I-Al-Istannen
509e624d47 Satisfy pylint. Useful docstrings? Not quite sure. 2020-04-23 20:35:59 +02:00
I-Al-Istannen
980f69b5af Fix organizer marking itself causing an error 2020-04-23 20:02:05 +02:00
I-Al-Istannen
0b00a9c26b Log when starting to synchronize 2020-04-23 19:56:37 +02:00
Joscha
1ef85c45e5 Switch Transform to PurePath 2020-04-23 17:40:43 +00:00
Joscha
5ef5a56e69 Extract Location into separate file 2020-04-23 17:38:28 +00:00
I-Al-Istannen
f3f4be2690 More free functions 2020-04-23 19:21:49 +02:00
I-Al-Istannen
076b8c5a1f Add download strategies to save bandwidth
Only download files that are newer than the local version.
2020-04-23 18:29:20 +02:00
I-Al-Istannen
13bc78c889 Display reason for ignoring an element in ilias crawler 2020-04-23 13:54:58 +02:00
I-Al-Istannen
dc964a9d98 Remove finished TODOs 2020-04-23 13:30:34 +02:00
I-Al-Istannen
c2b14f3db9 ilias crawler: Use direct download link if possible 2020-04-23 13:08:12 +02:00
Joscha
4b59a7c375 Move around TODOs 2020-04-23 10:49:01 +00:00
I-Al-Istannen
bef210ae77 Rename and implement IliasDirectoryFilter 2020-04-23 12:35:18 +02:00
I-Al-Istannen
ea005517cf Only remove folders if they exist in tmpdir 2020-04-23 12:09:45 +02:00
Joscha
df0eb84a44 Fix TmpDir and Location
TmpDir: Clean up before and after, not just after
Location: Resolve path so that parent check works properly
2020-04-23 09:50:32 +00:00
Joscha
2de4255a78 Add Pferd class 2020-04-23 09:50:32 +00:00
Joscha
3c808879c9 Add Transforms and Transformables 2020-04-22 18:25:09 +00:00
I-Al-Istannen
a051e3bcca ilias crawler: Add some unhelpful documentation 2020-04-22 17:58:19 +02:00
I-Al-Istannen
eb7df036df WIP: ilias crawler: Also crawl assignments 2020-04-22 14:32:20 +02:00
I-Al-Istannen
23db59e733 WIP: ilias-crawler: Demangle dates 2020-04-22 12:58:44 +02:00
I-Al-Istannen
ac65b06a8e Satisfy pylint a bit 2020-04-22 01:37:34 +02:00
I-Al-Istannen
8891041069 WIP: crawler: Add opencast video crawler 2020-04-21 23:01:19 +02:00
I-Al-Istannen
70d63e3e90 WIP: Start small ILIAS crawler 2020-04-21 13:32:03 +02:00
I-Al-Istannen
b2a7af2e3e Store modification_date in IliasDownloadInfo, remove parameters 2020-04-21 13:31:50 +02:00
I-Al-Istannen
23bed48c8c Satisfy autopep8 2020-04-21 13:30:42 +02:00
Joscha
0926d33798 Use downloader-specific data classes 2020-04-20 18:07:45 +00:00
I-Al-Istannen
55ba2f4070 Fix pylint in downloaders 2020-04-20 19:49:15 +02:00
I-Al-Istannen
d18b48aaf4 Stream in http downloader 2020-04-20 19:45:25 +02:00
Joscha
4ef0ffe3bf Listen to pylint and mypy 2020-04-20 17:44:58 +00:00
Joscha
ce77995c8f Rename http downloader module 2020-04-20 17:08:51 +00:00
I-Al-Istannen
ed9245c14d Remove old organizer 2020-04-20 18:50:23 +02:00
I-Al-Istannen
01e6972c96 Add ilias downloader 2020-04-20 18:49:01 +02:00
I-Al-Istannen
8181ae5b17 Guard http response in context manager 2020-04-20 18:47:46 +02:00
Joscha
6407190ae0 Soupify requests responses properly 2020-04-20 16:38:30 +00:00
I-Al-Istannen
87395faac2 Add base for simple HTTP downloader 2020-04-20 17:43:59 +02:00
I-Al-Istannen
a9e6e7883d Create temp dir folder in constructor 2020-04-20 17:43:59 +02:00
Joscha
154d6b29dd Listen to pylint 2020-04-20 15:16:22 +00:00
I-Al-Istannen
62ac569ec4 Revert "Add proposed crawler entry type"
This reverts commit 9f1a0a58ab.

Each crawler will have its own data class.
2020-04-20 16:59:20 +02:00
I-Al-Istannen
9f1a0a58ab Add proposed crawler entry type 2020-04-20 16:54:47 +02:00
Joscha
879a2c7c80 Rewrite ILIAS authenticator 2020-04-20 14:26:30 +00:00
Joscha
ff06c5215e Fix authenticator 2020-04-20 14:26:29 +00:00
I-Al-Istannen
135a8dce4b Fix resolve_path allowing paths outside its folder
This happened if the directory name was a prefix of the offending file name.
2020-04-20 16:07:14 +02:00
I-Al-Istannen
63bbcad918 Add resolve method to tmp_dir 2020-04-20 15:40:07 +02:00
I-Al-Istannen
6584d6a905 Elaborate accept_file in new_organizer 2020-04-20 15:40:07 +02:00
Joscha
5990098ef8 Add UserPassAuthenticator 2020-04-20 13:26:45 +00:00
I-Al-Istannen
f3d3d6bb65 Add some docs to cookie_jar 2020-04-20 14:38:03 +02:00
I-Al-Istannen
b2fe7cc064 Add preliminary logging to organizer and tmp_dir 2020-04-20 14:37:44 +02:00
I-Al-Istannen
930d821dd7 Add a simple organizer 2020-04-20 14:29:48 +02:00
I-Al-Istannen
5c2ff14839 Add "prompt_yes_no" to utils 2020-04-20 14:29:48 +02:00
I-Al-Istannen
a3d6dc7873 Clean up temp_folder 2020-04-20 14:29:48 +02:00
Joscha
53ad1c924b Add cookie jar 2020-04-20 11:35:26 +00:00
I-Al-Istannen
8c431c7d81 Add a simple temporary folder 2020-04-20 12:08:52 +02:00
Joscha
d5dd5aac06 Fix some mypy errors 2020-04-20 01:54:47 +00:00
Joscha
25043a4aaa Remove unnecessary files
Also document some plans for the new program structure in REWRITE.md
2020-04-19 19:49:43 +00:00
I-Al-Istannen
cf3553175f Add OS_Exams synchronizer 2020-02-27 14:51:29 +01:00
I-Al-Istannen
bf8b3cf9f7 Hack in support for TI exams
This just adds an additional crawl check for AlteKlausuren. This is not
present on the root site but at the suffix `/Klausuren`.
Example config:

```py
# The "Klausur" needs to be copied verbatim!
ti.synchronize("Klausur", "sync dir name",
               transform=ro_19_klausur_transform, filter=ro_19_klausur_filter)
```
2020-02-24 20:58:27 +01:00
I-Al-Istannen
f5bc49160f Lose 50 minutes of my life (and fix the TGI tut) 2019-12-12 12:50:16 +01:00
I-Al-Istannen
4433696509 [TGI] Add TGI tut 2019-11-18 09:58:16 +01:00
I-Al-Istannen
1407c6d264 Download all TGI files and not just lectures 2019-10-17 22:14:32 +02:00
I-Al-Istannen
1973c931bd Add support for other years in TGI downloader 2019-10-15 15:37:52 +02:00
I-Al-Istannen
458cc1c6d6 Add support for TGI website 2019-10-15 15:34:59 +02:00
Joscha
f94629a7fa Fix exceptions with weird content types
(hopefully)
2019-09-22 11:55:47 +00:00
I-Al-Istannen
2752e98621 Fix relative url joining in ti downloader 2019-07-26 10:06:01 +02:00
Joscha
ea01dc7cb2 Allow even more types of files 2019-07-05 08:48:43 +00:00
Joscha
77056e6f8d Allow more types of files 2019-07-04 12:16:42 +00:00
Joscha
d468a45662 Allow wolfram files 2019-06-11 12:42:55 +00:00
I-Al-Istannen
67da4e69fa Add colorful log output
Highlight the important operations (new, modified) in different colours.
2019-06-07 13:28:55 +02:00
Joscha
2016f61bf8 Crawl more of the TI page 2019-05-09 11:04:24 +00:00
Joscha
c72e92db18 Make Ti downloader authentication more robust 2019-05-06 12:04:01 +00:00
Joscha
44b4204517 Add basic Ti downloader 2019-05-06 11:54:36 +00:00
Joscha
d730d0064c Conform to other files' __all__ 2019-04-26 09:45:24 +00:00
Joscha
ae6cc40fb5 Rename ILIAS crawler to ilias
To be consistent with the other classes' capitalisation of acronyms
2019-04-26 04:29:12 +00:00
Joscha
0891e7f1bc Fix logging messages not appearing 2019-04-26 03:58:11 +00:00
Joscha
9693e1d968 Make logging easier 2019-04-25 19:53:13 +00:00
Joscha
f1ba618378 Remove unnecessary files 2019-04-25 19:18:19 +00:00
Joscha
dfddc93039 Move norbert from aiohttp to requests
Also fix streaming (when downloading) in the other classes.
2019-04-25 19:15:36 +00:00
Joscha
f0c42ce8ec Clean up
Use shorter name for responses, like in the requests doc.

Change Organizer's __all__ to be more in line with the other __all__s.
2019-04-25 19:02:48 +00:00
Joscha
82adeb324f Move ffm stuff from aiohttp to requests 2019-04-25 19:01:53 +00:00
Joscha
9bae030186 Move ilias stuff from aiohttp to requests 2019-04-25 18:52:48 +00:00
Joscha
c7a9a42b3d Allow files of type application/msword 2019-04-24 12:34:50 +00:00
Joscha
5a1bf2188b Switch from tabs to spaces 2019-04-24 12:34:20 +00:00
Joscha
3019e4255b Replace "/" in file names with "." 2018-12-14 09:27:12 +00:00
Joscha
616a8d96a2 Sort norbert files while downloading 2018-12-05 11:44:35 +00:00
Joscha
2d9223b8e6 Add norbert synchronizer 2018-11-29 10:26:58 +00:00
Joscha
bdc0e8ad03 Remember files correctly for cleaning up 2018-11-28 08:59:07 +00:00
Joscha
dad33b8c7f Save identically named files under different names 2018-11-27 17:23:32 +00:00
Joscha
98a2b5db34 Fix tut crawling 2018-11-27 10:28:39 +00:00
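
Note on 135a8dce4b (resolve_path allowing paths outside its folder): the message says the check failed when the directory name was a prefix of the offending file name. The sketch below shows how a string-prefix check can produce that failure and how a component-wise check avoids it. It is not the project's actual resolve_path code; the function names is_inside_naive and is_inside and the example paths are hypothetical and only serve as illustration.

```python
from pathlib import Path


def is_inside_naive(base: Path, target: Path) -> bool:
    # Hypothetical buggy check: compares the paths as plain strings, so a
    # sibling like ".../foobar" passes when base is ".../foo".
    return str(target.resolve()).startswith(str(base.resolve()))


def is_inside(base: Path, target: Path) -> bool:
    # Hypothetical fixed check: compares whole path components, so target
    # must be base itself or one of its descendants.
    base_resolved = base.resolve()
    target_resolved = target.resolve()
    return base_resolved == target_resolved or base_resolved in target_resolved.parents


# Example paths (assumed for illustration; they do not need to exist).
base = Path("/tmp/pferd_demo/foo")
offending = Path("/tmp/pferd_demo/foobar/file.txt")
legitimate = base / "sub" / "file.txt"

print(is_inside_naive(base, offending))  # the prefix check accepts the sibling
print(is_inside(base, offending))        # the component check rejects it
print(is_inside(base, legitimate))       # real descendants are still accepted
```

Resolving both paths before comparing matches the remark in df0eb84a44 that the path has to be resolved for the parent check to work properly.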