A little shell script that I’ve used to generate the list of pages that make up a given site:
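#!/bin/bash
if [ "$#" -eq 2 ]
then
    # Crawl the site without keeping the files, then collect the unique URLs wget requested, minus query strings
    wget -erobots=off --mirror --delete-after --reject .jpg,.png,.gif,.swf,.css,.js,.txt,.pdf,.rtf,.odt,.doc "$1" 2>&1 | grep -F -- "$1" | cut -d " " -f 4 | cut -d "?" -f 1 | sort -u > "$2"
    # Remove the directory tree that --mirror leaves behind
    rm -rf -- "${1#http://}"
else
    echo "Usage: ./scriptname http://testsite outputfile" >&2
    exit 1
fi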
Save it under any file name, make it executable with chmod +x, and then launch it from the shell:
./scriptname http://testsite outputfile
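To see what the grep/cut/sort part of the pipeline does, here is a minimal sketch that runs it on a few made-up log lines instead of a live crawl. The function name extract_urls and the sample lines are only illustrative, and the sketch assumes wget prints each request as a timestamp followed by two spaces and the URL, which can differ between wget versions and locales.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): reads wget log text
# on stdin and prints the unique URLs containing BASE_URL, without query strings.
extract_urls() {
    grep -F -- "$1" | cut -d " " -f 4 | cut -d "?" -f 1 | sort -u
}

# Made-up log lines in the shape assumed above; the two spaces after the
# timestamp make the URL the 4th space-separated field.
sample_log='--2024-01-01 10:00:00--  http://testsite/
--2024-01-01 10:00:01--  http://testsite/about.html?ref=home
--2024-01-01 10:00:02--  http://testsite/about.html
Length: 1234 (1.2K) [text/html]'

result=$(printf '%s\n' "$sample_log" | extract_urls "http://testsite")
printf '%s\n' "$result"
```

The two about.html lines collapse into one entry because the query string is cut off before the duplicates are removed.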