[HacktionLab] Static archive of site - the best command

Charlie Harvey charlie at newint.org
Tue Jul 18 10:59:07 UTC 2023


Hi,

For the sake of completeness, here are some other wget params that can
be useful:

--wait 1  to pause one second between page fetches (if overloading the
server may be an issue)

-e robots=off to ignore robots.txt

-c  to resume partially downloaded files if you get halfway through and
need to restart

--user-agent="Mozilla"  if the site has Cloudflare in front of it (it
can block wget, curl et al. based on their user-agent string)
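Putting these together with the flags from the command m3shrom posted
(quoted further down), a combined run might look something like this.
It's a sketch, not a tested recipe: mirror_site and DRY_RUN are made-up
names for illustration, and https://example.org/ is just a placeholder.

```shell
# Sketch only: mirror_site and DRY_RUN are hypothetical names, not wget features.
# Flags used:
#   --mirror            recurse with timestamping and no depth limit
#   --page-requisites   also fetch the images, CSS, etc. each page needs
#   --convert-links     rewrite links so the copy works when browsed locally
#   --adjust-extension  save HTML (and CSS) files with a matching suffix
#   --wait=1            pause one second between requests
#   -e robots=off       ignore robots.txt
#   --user-agent=...    send a browser-like user-agent string
mirror_site() {
  # If DRY_RUN is set and non-empty, ${DRY_RUN:+echo} expands to "echo",
  # so the wget command is printed instead of executed.
  ${DRY_RUN:+echo} wget \
    --mirror \
    --page-requisites \
    --convert-links \
    --adjust-extension \
    --wait=1 \
    -e robots=off \
    --user-agent="Mozilla" \
    "$1"
}
```

Source it and run mirror_site https://example.org/, or
DRY_RUN=1 mirror_site https://example.org/ to print the command without
fetching anything. Add -c if a run gets interrupted and you need to
restart it, and --span-hosts plus --domains=... (as in m3shrom's
command) if the site pulls content from more than one hostname.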

Cheers,

On 18/07/2023 11:14, Mike Harris wrote:
> Thanks for the suggestions, all. I will try the wget command first, as
> I need to set up a new WP site for them while providing a static
> archive of their original site, so they can link to their old docs.
> 
> The original site was built using some bespoke hosting company’s thing,
> called “Webs” or similar. They then got bought by VistaPrint, and then
> some of the little (maverick) sites like this one (a district
> association of allotment associations) were told their sites are
> going dark with (apparently) no offer of an archive of their site or
> anything … grrrr >:-(
> 
> 
> 
> Mike Harris
> 
> XtreamLab
> W: https://XtreamLab.net
> T: +44 7811 671 893
> 
>> On 18 Jul 2023, at 10:22, Nick Sellen <hacktionlab at nicksellen.co.uk>
>> wrote:
>>
>> 
>> Also worth mentioning the Webarch service that does this:
>> https://archived.website/ (it uses httrack; see
>> https://www.webarchitects.coop/archiving)
>>
>> ------- Original Message -------
>> On Tuesday, July 18th, 2023 at 08:57, m3shrom <m3shrom at riseup.net> wrote:
>>
>>> This post has some good content:
>>>
>>> https://www.stevenmaude.co.uk/posts/archiving-a-wordpress-site-with-wget-and-hosting-for-free
>>>
>>> It's focused on WordPress but is potentially relevant for other sites.
>>>
>>> Here's a sample command I used for a WordPress multisite network:
>>>
>>> wget --page-requisites --convert-links --adjust-extension --mirror \
>>>   --span-hosts \
>>>   --domains=mcrblogs.co.uk,www.mcrblogs.co.uk,edlab.org.uk \
>>>   mcrblogs.co.uk/afrocats
>>>
>>> nice one
>>> mick
>>>
>>> On 17/07/2023 23:23, Mike Harris wrote:
>>>> Hi all, but especially Mick,
>>>>
>>>> Last year Mick gave a talk on recovering the old SchNEWS website and producing a static version of it through some clever use of curl or wget.
>>>>
>>>> What’s the best command to get a complete, functional static copy of an entire website, including all the content it links to?
>>>>
>>>> I ask because I need to grab a site for someone that’s about to ‘go dark’, and no one has the login details to get at the file system.
>>>>
>>>> Cheers,
>>>>
>>>> Mike.
>>>>
>>>> _______________________________________________
>>>> HacktionLab mailing list
>>>> HacktionLab at lists.aktivix.org
>>>> https://lists.aktivix.org/mailman/listinfo/hacktionlab
>>
> 


-- 
Charlie Harvey
IT Director
New Internationalist

t: +44 (0)1865 403 249
m: +44 (0)7912 327 288
w: https://newint.org/
k: https://ox4.li/gpgkey/

** Shop with a conscience: https://ethicalshop.org **

New Internationalist is an independent not-for-profit communications
cooperative. We publish the multi-award winning New Internationalist
magazine, a range of books from non-fiction to graphic novels, and
run an ethical mail order business. Learn more at: https://newint.org/

Incorporated in the UK under no. 1005239.

Old Music Hall, 106-108 Cowley Rd., Oxford, OX4 1JE, UK
PO Box 819, Markham, Ontario, L3P 8A2, Canada
