wget login pages

How do you scrape a page that you have to log in to reach? One way is to submit the login form with --post-data and save the cookies that come back, though whether this works depends on how the site tracks the session (it needs to be cookie-based).

$ wget http://site/login/index.php --post-data "username=user&password=pass" --save-cookies=cookies.txt --keep-session-cookies

Then, to grab other pages, load the saved cookies:

$ wget --load-cookies=cookies.txt http://site/someotherpage/index.php
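
If you do this a lot, the two steps can be wrapped in a couple of shell functions. This is only a rough sketch: the function names login and fetch, the WGET and COOKIES variables, and the -O /dev/null (which throws away the login response page) are my own additions, and the URLs and form field names are the same placeholders as above. Your site's login form will almost certainly use different field names.

```shell
#!/bin/sh
# Sketch: log in once with wget, then reuse the saved cookies.
# Names, URLs, and form fields here are placeholders, not a real site.

WGET="${WGET:-wget}"               # set WGET=echo to print the arguments as a dry run
COOKIES="${COOKIES:-cookies.txt}"  # file where the cookies are kept

# login URL FORM_BODY
# POSTs the already URL-encoded form body and saves all cookies,
# including session cookies that would otherwise be dropped.
login() {
    "$WGET" --save-cookies="$COOKIES" --keep-session-cookies \
            --post-data="$2" -O /dev/null "$1"
}

# fetch URL
# Downloads a page, sending the cookies stored by login().
fetch() {
    "$WGET" --load-cookies="$COOKIES" "$1"
}

# Example (placeholders):
#   login 'http://site/login/index.php' 'username=user&password=pass'
#   fetch 'http://site/someotherpage/index.php'
```

Note that --post-data sends the string as-is, so a username or password containing characters like & or = has to be percent-encoded before you pass it in.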
