wget login pages
May 15, 2008
How do you scrape a page that you have to log in to see? Well, one way is to submit the login form with --post-data and save the cookies the site sends back, though whether this works depends on how the site keeps track of the session.
$ wget http://site/login/index.php --post-data "username=user&password=pass" --save-cookies=cookies.txt --keep-session-cookies
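wget sends the --post-data string as-is, and the form data is expected to be percent-encoded, so a password containing characters like @, & or a space can break the login. Below is a minimal bash sketch of a hypothetical urlencode helper (my own name, not part of wget) that encodes a single field value before it goes into the POST string. It assumes bash and keeps only unreserved characters unescaped.

```shell
#!/usr/bin/env bash
# urlencode: hypothetical helper that percent-encodes one form field value
# so it can be placed inside wget's --post-data string.
urlencode() {
  local LC_ALL=C                 # work byte by byte, independent of the user's locale
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;               # unreserved characters pass through
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;; # everything else becomes %XX
    esac
  done
  printf '%s\n' "$out"
}
```

Used with the login command, still with the placeholder site and field names:
$ wget http://site/login/index.php --post-data "username=$(urlencode 'user')&password=$(urlencode 'p@ss w&rd')" --save-cookies=cookies.txt --keep-session-cookies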
Then, to grab other pages:
$ wget --load-cookies=cookies.txt http://site/someotherpage/index.php
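--load-cookies only helps if the login actually stored a session cookie, and wget only sends a cookie to a host that matches the cookie's domain. A quick sanity check before scraping can catch a failed login. This is a rough sketch with a hypothetical has_cookie_for helper; it assumes the Netscape cookie-file format that --save-cookies writes (tab-separated: domain, subdomain flag, path, secure flag, expiry, name, value) and only does an exact domain match.

```shell
#!/usr/bin/env bash
# has_cookie_for: hypothetical helper; succeeds if the cookie jar in $1
# holds at least one cookie whose domain field equals host $2.
# Simplified: exact domain match only, subdomain flag is ignored.
has_cookie_for() {
  local jar="$1" host="$2"
  awk -v host="$host" '
    BEGIN { FS = "\t" }
    { sub(/^#HttpOnly_/, "") }               # curl-style marker for HttpOnly cookies
    /^#/ || NF < 7 { next }                  # skip comments and blank lines
    { d = $1; sub(/^\./, "", d) }            # ".site" and "site" both count
    d == host { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$jar"
}
```

For example:
$ has_cookie_for cookies.txt site && wget --load-cookies=cookies.txt http://site/someotherpage/index.php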