This works as follows:
 * get-sources is a Python script that downloads files from the web and converts them to the desired target format
 * sources.txt lists the files' sources, attribution, licensing, etc.
 * web.ini contains login details for sites that require them (and as such is not included in the source tree; see web.ini.sample for the format)
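
As a rough illustration of how login details in an INI file can be read from Python, here is a minimal sketch using the standard configparser module. The layout shown (one section per site, with "username" and "password" keys) and the helper names load_logins and login_for are assumptions made for this example only; the actual format is whatever web.ini.sample describes, and get-sources may read the file differently.

```python
import configparser

# Hypothetical web.ini contents: one section per site, with assumed
# "username" and "password" keys. The real layout is in web.ini.sample.
EXAMPLE_INI = """
[example.org]
username = alice
password = secret
"""


def load_logins(text):
    """Return {site: (username, password)} parsed from INI-formatted text."""
    # Interpolation is disabled so that a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    logins = {}
    for site in parser.sections():
        logins[site] = (parser.get(site, "username"), parser.get(site, "password"))
    return logins


def login_for(logins, site):
    """Return the credentials for a site, or None if no login is configured."""
    return logins.get(site)


if __name__ == "__main__":
    logins = load_logins(EXAMPLE_INI)
    print(login_for(logins, "example.org"))
```

To read a real file rather than a string, parser.read("web.ini") can be used in place of read_string; it silently skips a missing file, which suits a config that is deliberately kept out of the source tree.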

