<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>WebstersProdigy &#187; Network</title>
	<atom:link href="http://webstersprodigy.net/category/computers/network/feed/" rel="self" type="application/rss+xml" />
	<link>http://webstersprodigy.net</link>
	<description>Updates every other Friday... usually</description>
	<lastBuildDate>Sat, 26 May 2012 06:58:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='webstersprodigy.net' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>WebstersProdigy &#187; Network</title>
		<link>http://webstersprodigy.net</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://webstersprodigy.net/osd.xml" title="WebstersProdigy" />
	<atom:link rel='hub' href='http://webstersprodigy.net/?pushpress=hub'/>
		<item>
		<title>Linkedin Crawler</title>
		<link>http://webstersprodigy.net/2010/11/13/linkedin-crawler/</link>
		<comments>http://webstersprodigy.net/2010/11/13/linkedin-crawler/#comments</comments>
		<pubDate>Sat, 13 Nov 2010 02:27:09 +0000</pubDate>
		<dc:creator>webstersprodigy</dc:creator>
				<category><![CDATA[Network]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[gradproject]]></category>
		<category><![CDATA[linkedin]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://webstersprodigy.net/?p=731</guid>
		<description><![CDATA[The following is also source used in the grad project. I&#8217;ll post the actual paper at some point. But here is the LinkedIn crawler portion with the applicable source. By its nature, this code is breakable, and may not work even at the time of posting. But it did work long enough for me to [...]]]></description>
			<content:encoded><![CDATA[<p>The following is also source used in the grad project. I&#8217;ll post the actual paper at some point. But here is the LinkedIn crawler portion with the applicable source. By its nature, this code is breakable and may not work even at the time of posting, but it did work long enough for me to gather addresses, which was the point.</p>
<p>Usage is/was</p>
<blockquote><p>LinkedinPageGatherer.py Linkedinusername Linkedinpassword</p></blockquote>
<p>Following is an excerpt from the &#8216;paper&#8217;.</p>
<p>The HTMLParser libraries are more resilient to changes in source. Both the HTMLParser and lxml libraries offer their own ways to process broken HTML. HTMLParser was chosen as the better fit for these problems [lxml][htmlparsing].</p>
<p>There has been an effort to keep all HTML-specific logic in easily debuggable places, so that if the generated HTML changes, the parsing code can be updated to match (assuming equivalent information is still available). However, changes in the source are frequent, and the code has had to be modified roughly every 3 months to keep up with changes in HTML layout.</p>
<p>Unfortunately, although the functionality is simple, this program has grown much more complex because of roadblocks put in place by both LinkedIn and Google.</p>
<p>To search LinkedIn directly, it is necessary to have a LinkedIn account. With an account, it is possible to search with or without connections, although the search criteria differ depending on the type of account. Because of this, the tool has to manage cookies to keep track of the HTTP session. In addition, LinkedIn includes a nonce as a POST parameter on each page, and that nonce must be retrieved and POSTed back with every form submission. Because of the nonce, the tool also has to log in at the login page, save the nonce and the cookie, and then proceed to the search through the same path an actual user would.</p>
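<p>The nonce handling above comes down to pulling one hidden form field out of the login page. The crawler source below does this with a plain string search; a minimal sketch of a sturdier alternative, written for Python 3&#8217;s html.parser module rather than the Python 2 used in the tool, could look like this. HiddenFieldParser and hidden_field are hypothetical helper names, and csrfToken is simply the field name the login code below looks for.</p>
<p><pre class="brush: python;">
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Hypothetical helper: collect name/value pairs of hidden input fields."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a value may be None
        attributes = dict(attrs)
        if tag == "input" and attributes.get("type") == "hidden":
            name = attributes.get("name")
            if name:
                self.fields[name] = attributes.get("value") or ""


def hidden_field(page_html, field_name):
    """Return the hidden field's value, or None if the page lacks it."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    parser.close()
    return parser.fields.get(field_name)
</pre></p>
<p>Because the parser looks attributes up by name, it keeps working if the page reorders or reformats the input&#8217;s attributes, which an exact string search does not.</p>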
<p>Once the tool is able to search for companies, there is an additional limitation: with a free account, a search displays only 100 results. This is inconvenient, as the actual number of matching employees is often much larger. The tool I&#8217;ve written uses various criteria (such as location, title, etc.) to perform multiple, more specific searches of 100 results each. Extra information is harvested at each search for use in later searches, and as the searches become more specific, the tool inserts any unique people it finds into a list of users. When the initial search runs, LinkedIn reports the total number of results (although it only lets the account view 100 at a time), so the tool uses this total as a stopping condition: it stops when a given percentage of that number has been reached or when a certain number of searches have failed to find anyone new.</p>
<p>This is easier to illustrate with an example. In the case of FPL, there are over 2000 results. However, it can be asserted that at least one of the results is from a certain Miami address. Using this as a search restriction, the total results may be reduced to 500, the first 100 of which can be inserted. It can also be asserted that at least one result from the Miami address is a project manager. With this additional restriction there are only 5 results, and these in turn supply different criteria for further advanced searches. Using this iterative approach, it is possible to gather most of the 2000 results. In the program I have written, this functionality is still experimental and the parameters must be adjusted.</p>
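<p>The iterative narrowing in the last two paragraphs can be sketched independently of LinkedIn. This is a hypothetical Python 3 sketch, not the tool&#8217;s actual code: search stands in for any search that returns a capped list of results, each result is assumed to be a dict of attributes such as title and location, and every attribute seen in a result becomes a candidate restriction for a later, narrower search. It uses the same two stopping conditions described above: a target fraction of the reported total, or too many searches that add nobody new.</p>
<p><pre class="brush: python;">
import math
from collections import deque


def refine_search(search, total, target_fraction=0.7, max_misses=100):
    """Collect unique results through progressively narrower searches.

    search(filters) is a hypothetical callable returning the capped list of
    results matching a tuple of (field, value) filters; each result is a
    dict of attributes such as "title" and "location".
    """
    target = math.ceil(target_fraction * total)
    found = {}               # all of a result's attributes -> the result
    queued = {()}            # filter tuples already scheduled
    pending = deque([()])    # start with the unrestricted search
    misses = 0
    while pending:
        filters = pending.popleft()
        used_fields = dict(filters)
        before = len(found)
        for person in search(filters):
            found.setdefault(tuple(sorted(person.items())), person)
            # each attribute not already restricted becomes a narrower search
            for field, value in sorted(person.items()):
                if field in used_fields:
                    continue
                narrower = tuple(sorted(filters + ((field, value),)))
                if narrower not in queued:
                    queued.add(narrower)
                    pending.append(narrower)
        if len(found) == before:
            misses += 1
        # stop once the target share of the reported total is in hand ...
        if max(0, target - len(found)) == 0:
            break
        # ... or after too many searches that added nobody new
        if misses == max_misses:
            break
    return list(found.values())
</pre></p>
<p>In this sketch, people whose attributes never appear in any capped page of results are never reached, which is one reason an approach like this gathers most, rather than all, of the reported total.</p>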
<p>One additional difficulty with LinkedIn is that these results do not display a name, only a job title associated with the company. Obviously, this is not ideal: a name is necessary for even the most basic spear phishing attacks, and an email may sound slightly awkward if addressed to &#8220;Dear Project Manager in the Cyber Security Group&#8221;. The solution I found for retrieving employee names is to use Google. Using specific Google queries built from the LinkedIn search results, it is possible to retrieve the names associated with a given company and job title.</p>
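<p>In the source below, the name is read from the public profile URL that Google returns, using index arithmetic around /pub/. As a hedged sketch of the same step, assuming profile URLs shaped like .../pub/some-name/... or .../in/some-name (the two patterns GoogleQueery accepts), Python 3&#8217;s urllib.parse can split the path instead; profile_slug is a hypothetical helper.</p>
<p><pre class="brush: python;">
from urllib.parse import urlsplit


def profile_slug(url):
    """Return the path segment after /pub/ or /in/, or "" if there is none."""
    # "/pub/jane-doe/1/2/3" splits into ["pub", "jane-doe", "1", "2", "3"]
    segments = [part for part in urlsplit(url).path.split("/") if part]
    for marker in ("pub", "in"):
        if marker in segments:
            following = segments[segments.index(marker) + 1:]
            if following:
                return following[0]
    return ""
</pre></p>
<p>Splitting on path segments avoids the off-by-one results that a failed find() call (which returns -1) produces in index arithmetic.</p>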
<p>Google has a use policy prohibiting automated crawlers. Because of this policy, it performs various checks on queries to verify that the browser is a known, real browser. If it is not, Google returns a 403 status stating that the browser is not known. To circumvent this, a packet dump was captured from a valid browser. The code now has a snippet that sends information exactly like an actual browser would, along with randomized time delays to mimic a person. Over the long run it should be impossible for Google to tell the difference, since whatever checks they do can be mimicked. The code includes several configurable browsers to masquerade as. Below is the code snippet for the default spoofed browser, Firefox running on Linux.</p>
<p><pre class="brush: python;">
def getHeaders(self, browser=&quot;ubuntuFF&quot;):
  #ubuntu firefox spoof
  if browser == &quot;ubuntuFF&quot;:
    headers = {
      &quot;Host&quot;: &quot;www.google.com&quot;,
      &quot;User-Agent&quot;: &quot;Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091109 Ubuntu/9.10 (karmic) Firefox/3.5&quot;,
      &quot;Accept&quot; : &quot;text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8&quot;,
      &quot;Accept-Language&quot; : &quot;en-us,en;q=0.5&quot;,
      &quot;Accept-Charset&quot; : &quot;ISO-8859-1,utf-8;q=0.7,*;q=0.7&quot;,
      &quot;Keep-Alive&quot; : &quot;300&quot;,
      &quot;Proxy-Connection&quot; : &quot;keep-alive&quot;
    }
...
</pre><br />
Although both Google and LinkedIn make it difficult to automate information mining, their approaches will fundamentally fail against a motivated adversary. Because these companies want to make information available to users, the same information can also be retrieved automatically. CAPTCHA technology has been one traditional solution, though by its nature it suffers from similar design flaws.</p>
<p>The LinkedIn crawler program demonstrates how an attacker could target a company to harvest people’s names, which can often be mapped to email addresses, as demonstrated in previous sections.</p>
<p>GoogleQueery.py</p>
<p><pre class="brush: python;">
#! /usr/bin/python

#class to make google queries
#must masquerade as a legitimate browser
#Using this violates Google ToS

import httplib
import urllib
import sys
import HTMLParser
import re

#class is basically fed a google url for linkedin for the
#sole purpose of getting a linkedin link
class GoogleQueery(HTMLParser.HTMLParser):
  def __init__(self, goog_url):
    HTMLParser.HTMLParser.__init__(self)
    self.linkedinurl = []
    query = urllib.urlencode({&quot;q&quot;: goog_url})
    conn = httplib.HTTPConnection(&quot;www.google.com&quot;)
    headers = self.getHeaders()
    conn.request(&quot;GET&quot;, &quot;/search?hl=en&amp;&quot;+query, headers=headers)
    resp = conn.getresponse()
    data = resp.read()
    self.feed(data)
    self.get_num_results(data)
    conn.close()

  #this is necessary because google wants to be mean and 403 based on... not sure
  #but it seems  I must look like a real browser to get a 200
  def getHeaders(self, browser=&quot;chromium&quot;):
    #if browser == &quot;random&quot;:
      #TODO randomize choice
    #ubuntu firefox spoof
    if browser == &quot;ubuntuFF&quot;:
      headers = {
        &quot;Host&quot;: &quot;www.google.com&quot;,
        &quot;User-Agent&quot;: &quot;Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091109 Ubuntu/9.10 (karmic) Firefox/3.5&quot;,
        &quot;Accept&quot; : &quot;text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8&quot;,
        &quot;Accept-Language&quot; : &quot;en-us,en;q=0.5&quot;,
        &quot;Accept-Charset&quot; : &quot;ISO-8859-1,utf-8;q=0.7,*;q=0.7&quot;,
        &quot;Keep-Alive&quot; : &quot;300&quot;,
        &quot;Proxy-Connection&quot; : &quot;keep-alive&quot;
        }
    elif browser == &quot;chromium&quot;:
      headers = {
        &quot;Host&quot;: &quot;www.google.com&quot;,
        &quot;Proxy-Connection&quot;: &quot;keep-alive&quot;,
        &quot;User-Agent&quot;: &quot;Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.5 Safari/533.2&quot;,
        &quot;Referer&quot;: &quot;http://www.google.com/&quot;,
        &quot;Accept&quot;: &quot;application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5&quot;,
        &quot;Avail-Dictionary&quot;: &quot;FcpNLYBN&quot;,
        &quot;Accept-Language&quot;: &quot;en-US,en;q=0.8&quot;,
        &quot;Accept-Charset&quot;: &quot;ISO-8859-1,utf-8;q=0.7,*;q=0.3&quot;
      }
    elif browser == &quot;ie&quot;:
      headers = {
        &quot;Host&quot;: &quot;www.google.com&quot;,
        &quot;Proxy-Connection&quot;: &quot;keep-alive&quot;,
        &quot;User-Agent&quot;: &quot;Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)&quot;,
        &quot;Referer&quot;: &quot;http://www.google.com/&quot;,
        &quot;Accept&quot;: &quot;application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5&quot;,
        &quot;Accept-Language&quot;: &quot;en-US,en;q=0.8&quot;,
        &quot;Accept-Charset&quot;: &quot;ISO-8859-1,utf-8;q=0.7,*;q=0.3&quot;
      }
    return headers

  def get_num_results(self, data):
    index = re.search(r&quot;&lt;b&gt;1&lt;/b&gt; - &lt;b&gt;[\d]+&lt;/b&gt; of [\w]*[ ]?&lt;b&gt;([\d,]+)&quot;, data)
    try:
      self.numResults = int(index.group(1).replace(&quot;,&quot;, &quot;&quot;))
    except (AttributeError, ValueError):
      #no match (index is None) or the count was not a number
      self.numResults = 0
      if not &quot;- did not match any documents. &quot; in data:
        print &quot;Warning: numresults parsing problem&quot;
        print &quot;setting number of results to 0&quot;

  def handle_starttag(self, tag, attrs):
    try:
      if tag == &quot;a&quot; and (((&quot;linkedin.com/pub/&quot; in attrs[0][1])
                    or  (&quot;linkedin.com/in&quot; in attrs[0][1]))
                    and (&quot;http://&quot; in attrs[0][1])
                    and (&quot;search?q=cache&quot; not in attrs[0][1])
                    and (&quot;/dir/&quot; not in attrs[0][1])):
        self.linkedinurl.append(attrs[0][1])
        #print self.linkedinurl
      #perhaps add a google cache option here in the future
    except IndexError:
      pass

#for testing
if __name__ == &quot;__main__&quot;:
  url = 'site:linkedin.com &quot;PROJECT ADMINISTRATOR at CAT INL QATAR W.L.L.&quot; &quot;Qatar&quot;'
  m = GoogleQueery(url)
  print m.numResults, m.linkedinurl

</pre></p>
<p>LinkedinHTMLParser.py</p>
<p><pre class="brush: python;">
#! /usr/bin/python

#this should probably be put in LinkedinPageGatherer.py

import HTMLParser

from person_searchobj import person_searchobj

class LinkedinHTMLParser(HTMLParser.HTMLParser):
  &quot;&quot;&quot;
  subclass of HTMLParser specifically for parsing Linkedin names to person_searchobjs
  requires a call to .feed(data), stored data in the personArray
  &quot;&quot;&quot;
  def __init__(self):
    HTMLParser.HTMLParser.__init__(self)
    self.personArray = []
    self.personIndex = -1
    self.inGivenName = False
    self.inFamilyName = False
    self.inTitle = False
    self.inLocation = False

  def handle_starttag(self, tag, attrs):
    try:
      if tag == &quot;li&quot; and attrs[0][0] == &quot;class&quot; and (&quot;vcard&quot; in attrs[0][1]):
        self.personIndex += 1
        self.personArray.append(person_searchobj())
      if attrs[0][1] == &quot;given-name&quot; and self.personIndex &gt;=0:
        self.inGivenName = True
      elif attrs[0][1] == &quot;family-name&quot; and self.personIndex &gt;= 0:
        self.inFamilyName = True
      elif tag == &quot;dd&quot; and attrs[0][1] == &quot;title&quot; and self.personIndex &gt;= 0:
        self.inTitle = True
      elif tag == &quot;span&quot; and attrs[0][1] == &quot;location&quot; and self.personIndex &gt;= 0:
        self.inLocation = True
    except IndexError:
      pass

  def handle_endtag(self, tag):
    if tag == &quot;span&quot;:
      self.inGivenName = False
      self.inFamilyName = False
      self.inLocation = False
    elif tag == &quot;dd&quot;:
      self.inTitle = False

  def handle_data(self, data):
    if self.inGivenName:
      self.personArray[self.personIndex].givenName = data.strip()
    elif self.inFamilyName:
      self.personArray[self.personIndex].familyName = data.strip()
    elif self.inTitle:
      self.personArray[self.personIndex].title = data.strip()
    elif self.inLocation:
      self.personArray[self.personIndex].location = data.strip()

#for testing - use a file since this is just a parser
if __name__ == &quot;__main__&quot;:
  import sys
  file = open (&quot;test.htm&quot;)
  df = file.read()
  parser = LinkedinHTMLParser()
  parser.feed(df)
  print &quot;================&quot;
  for person in parser.personArray:
    print person.goog_printstring()
  file.close()
</pre></p>
<p>LinkedinPageGatherer.py &#8211; this is what should be called directly.</p>
<p><pre class="brush: python;">
#!/usr/bin/python

import urllib
import urllib2
import sys
import time
import copy
import pickle
import math

from person_searchobj import person_searchobj
from LinkedinHTMLParser import LinkedinHTMLParser
from GoogleQueery import GoogleQueery

#TODO add a test function that tests the website format for easy diagnostics when HTML changes
#TODO use HTMLParser like a sane person
class LinkedinPageGatherer:
  &quot;&quot;&quot;
  class that generates the initial linkedin queries using the company name
  as a search parameter. These search strings will be searched using google
  to obtain additional information (these limited initial search strings usually lack
  vital info like names)
  &quot;&quot;&quot;
  def __init__(self, companyName, login, password, maxsearch=100,
               totalresultpercent=.7, maxskunk=100):
    &quot;&quot;&quot;
    login and password are params for a valid linkedin account
    maxsearch is the number of results - linkedin limits unpaid accounts to 100
    totalresultpercent is the fraction of the total results this script will try to find
    maxskunk is the number of unproductive searches this class will attempt before giving up
    &quot;&quot;&quot;
    #list of person_searchobj
    self.people_searchobj = []
    self.companyName = companyName
    self.login = login
    self.password = password
    self.fullurl = (&quot;http://www.linkedin.com/search?search=&amp;company=&quot;+urllib.quote_plus(companyName)+
                    &quot;&amp;currentCompany=currentCompany&quot;, &quot;&amp;page_num=&quot;, &quot;0&quot;)
    self.opener = self.linkedin_login()
    #for the smart_people_adder
    self.searchSpecific = []
    #can only look at 100 people at a time. Parameters used to narrow down queries
    self.total_results = self.get_num_results()
    self.maxsearch = maxsearch
    self.totalresultpercent = totalresultpercent
    #self.extraparameters = {&quot;locationinfo&quot; : [], &quot;titleinfo&quot; : [], &quot;locationtitle&quot; : [] }
    #extraparameters is a simple stack that adds keywords to restrict the search
    self.extraparameters = []
    #TODO can only look at 100 people at a time - like to narrow down queries
    #and auto grab more
    currrespercent = 0.0
    skunked = 0
    currurl = self.fullurl[0] + self.fullurl[1]
    extraparamindex = 0

    while currrespercent &lt; self.totalresultpercent and skunked &lt;= maxskunk:
      numresults = self.get_num_results(currurl)
      save_num = len(self.people_searchobj)

      print &quot;-------&quot;
      print &quot;currurl&quot;, currurl
      print &quot;percentage&quot;, currrespercent
      print &quot;skunked&quot;, skunked
      print &quot;numresults&quot;, numresults
      print &quot;save_num&quot;, save_num

      #10 results per page; float division so a partial last page is still fetched
      for i in range (0, int(min(math.ceil(self.maxsearch/10.0), math.ceil(numresults/10.0)))):
        #function adds to self.people_searchobj
        print &quot;currurl&quot; + currurl + str(i)
        self.return_people_links(currurl + str(i))
      currrespercent = float(len(self.people_searchobj))/self.total_results
      if save_num == len(self.people_searchobj):
        skunked += 1
      for i in self.people_searchobj:
        pushTitles = [(&quot;title&quot;, gName) for gName in i.givenName.split()]
        #TODO this could be improved for more detailed results, etc, but keeping it simple for now
        pushKeywords = [(&quot;keywords&quot;, gName) for gName in i.givenName.split()]
        pushTotal = pushTitles[:] + pushKeywords[:]
        #append to extraparameters if unique
        self.push_search_parameters(pushTotal)
      print &quot;parameters&quot;, self.extraparameters
      #get a new url to search for, if necessary
      #use the extra params in title, &quot;keywords&quot; parameters
      try:
        refineel = self.extraparameters[extraparamindex]
        extraparamindex += 1
        currurl = self.fullurl[0] + &quot;&amp;&quot; + refineel[0] + &quot;=&quot; + refineel[1] + self.fullurl[1]
      except IndexError:
        break

  &quot;&quot;&quot;
  #TODO: This idea is fine, but we should get names first to better distinguish people
  #also maybe should be moved
  def smart_people_adder(self):
    #we've already done a basic search, must do more
    if &quot;basic&quot; in self.searchSpecific:
  &quot;&quot;&quot;
  def return_people_links(self, linkedinurl):
    req = urllib2.Request(linkedinurl)
    fd = self.opener.open(req)
    pagedata = &quot;&quot;
    while 1:
      data = fd.read(2056)
      pagedata = pagedata + data
      if not len(data):
        break
    #print pagedata
    self.parse_page(pagedata)

  def parse_page(self, page):
    thesePeople = LinkedinHTMLParser()
    thesePeople.feed(page)
    for newperson in thesePeople.personArray:
      unique = True
      for oldperson in self.people_searchobj:
        #if all these things match but they really are different people, they
        #will likely still be found as unique google results
        if (oldperson.givenName == newperson.givenName and
            oldperson.familyName == newperson.familyName and
            oldperson.title == newperson.title and
            oldperson.location == newperson.location):
              unique = False
              break
      if unique:
        self.people_searchobj.append(newperson)
  &quot;&quot;&quot;
    print &quot;=======================&quot;
    for person in self.people_searchobj:
      print person.goog_printstring()
  &quot;&quot;&quot;

  #return the number of results, very breakable
  def get_num_results(self, url=None):
    #by default return total in company
    if url is None:
      fd = self.opener.open(self.fullurl[0] + self.fullurl[1] + &quot;1&quot;)
    else:
      fd = self.opener.open(url)
    data = fd.read()
    fd.close()
    searchstr = '&lt;p class=&quot;summary&quot;&gt;'
    sindex = data.find(searchstr) + len(searchstr)
    eindex = data.find(&quot;&lt;/strong&gt;&quot;, sindex)
    return(int(data[sindex:eindex].strip().strip(&quot;&lt;strong&gt;&quot;).replace(&quot;,&quot;, &quot;&quot;).strip()))

  #returns an opener object that contains valid cookies
  def linkedin_login(self):
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor())
    urllib2.install_opener(opener)
    #login page
    fd = opener.open(&quot;https://www.linkedin.com/secure/login?trk=hb_signin&quot;)
    data = fd.read()
    fd.close()
    #csrf 'prevention' login value
    searchstr = &quot;&quot;&quot;&lt;input type=&quot;hidden&quot; name=&quot;csrfToken&quot; value=&quot;ajax:&quot;&quot;&quot;
    sindex = data.find(searchstr) + len(searchstr)
    eindex = data.find('&quot;', sindex)
    params = urllib.urlencode(dict(csrfToken=&quot;ajax:-&quot;+data[sindex:eindex],
                              session_key=self.login,
                              session_password=self.password,
                              session_login=&quot;Sign+In&quot;,
                              session_rikey=&quot;&quot;))
    #need the second request to get the csrf stuff, initial cookies
    request = urllib2.Request(&quot;https://www.linkedin.com/secure/login&quot;)
    request.add_header(&quot;Host&quot;, &quot;www.linkedin.com&quot;)
    request.add_header(&quot;Referer&quot;, &quot;https://www.linkedin.com/secure/login?trk=hb_signin&quot;)
    time.sleep(1.5)
    fd = opener.open(request, params)
    data = fd.read()
    if '&lt;div id=&quot;header&quot; class=&quot;guest&quot;&gt;' in data:
      print &quot;Linkedin authentication failed. Please supply a valid linkedin account&quot;
      sys.exit(1)
    else:
      print &quot;Linkedin authentication Successful&quot;
    fd.close()
    return opener

  def push_search_parameters(self, extraparam):
    uselesswords = [ &quot;for&quot;, &quot;the&quot;, &quot;and&quot;, &quot;at&quot;, &quot;in&quot;]
    for pm in extraparam:
      pm = (pm[0], pm[1].strip().lower())
      if (pm not in self.extraparameters) and (pm[1] not in uselesswords) and pm != None:
        self.extraparameters.append(pm)

class LinkedinTotalPageGather(LinkedinPageGatherer):
  &quot;&quot;&quot;
  Overhead class that generates the person_searchobjs, using GoogleQueery
  &quot;&quot;&quot;
  def __init__(self, companyName, login, password):
    LinkedinPageGatherer.__init__(self, companyName, login, password)
    extraPeople = []
    for person in self.people_searchobj:
      mgoogqueery = GoogleQueery(person.goog_printstring())
      #making the assumption that each pub url is a unique person
      count = 0
      for url in mgoogqueery.linkedinurl:
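        #grab the real name from the url (public profiles use /pub/ or /in/)
        marker = &quot;/pub/&quot; if &quot;/pub/&quot; in url else &quot;/in/&quot;
        begindex = url.find(marker) + len(marker)
        endindex = url.find(&quot;/&quot;, begindex)
        if endindex == -1:
          endindex = len(url)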
        if count == 0:
          person.url = url
          person.name = url[begindex:endindex]
        else:
          extraObj = copy.deepcopy(person)
          extraObj.url = url
          extraObj.name = url[begindex:endindex]
          extraPeople.append(extraObj)
        count += 1
      print person
    print &quot;Extra People&quot;
    for person in extraPeople:
      print person
      self.people_searchobj.append(person)

if __name__ == &quot;__main__&quot;:
  #args are email and password for linkedin; the company name is read interactively
  company = raw_input(&quot;Company to search for: &quot;)
  my = LinkedinTotalPageGather(company, sys.argv[1], sys.argv[2])
</pre></p>
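<p>Two details above are easy to get wrong: the duplicate check in parse_page compares every field by hand, and the page count relies on division. A minimal Python 3 sketch of both, assuming 10 results per page (the divisor the paging loop uses) and a 100-result cap, keys each person by a tuple of the compared fields and rounds the page count up so a partial last page is still fetched. Person, add_unique and pages_to_fetch are hypothetical names; Person only stands in for person_searchobj.</p>
<p><pre class="brush: python;">
import math
from collections import namedtuple

# Hypothetical stand-in for person_searchobj, holding the compared fields
Person = namedtuple("Person", "givenName familyName title location")


def add_unique(known, seen_keys, new_people):
    """Append each person whose compared fields have not been seen yet."""
    for person in new_people:
        key = (person.givenName, person.familyName, person.title, person.location)
        if key not in seen_keys:
            seen_keys.add(key)
            known.append(person)
    return known


def pages_to_fetch(num_results, max_results=100, per_page=10):
    """Pages needed to show up to max_results results, counting a partial page."""
    # Python 3 "/" is true division, so math.ceil rounds a partial page up
    return math.ceil(min(num_results, max_results) / per_page)
</pre></p>
<p>Keying on a tuple replaces the nested loop over every previously seen person with a single set lookup.</p>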
<p>person_searchobj.py</p>
<p><pre class="brush: python;">
#! /usr/bin/python

class person_searchobj():
  &quot;&quot;&quot;this object is used for the google search and the final person object&quot;&quot;&quot;

  def __init__ (self, givenname=&quot;&quot;, familyname=&quot;&quot;, title=&quot;&quot;, organization=&quot;&quot;, location=&quot;&quot;):
    &quot;&quot;&quot;
    given name could be a title in this case, does not matter in terms of google
    but then may have to change for the final person object
    &quot;&quot;&quot;
    #&quot;name&quot; is their actual name, unlike givenName and family name which are linkedin names
    self.name = &quot;&quot;
    self.givenName = givenname
    self.familyName = familyname
    self.title = title
    self.organization = organization
    self.location = location

    #this is retrieved by GoogleQueery
    self.url = &quot;&quot;

  def goog_printstring(self):
    &quot;&quot;&quot;return the google print string used for queries&quot;&quot;&quot;
    retrstr = &quot;site:linkedin.com &quot;
    for i in  [self.givenName, self.familyName, self.title, self.organization, self.location]:
      if i != &quot;&quot;:
        retrstr += '&quot;' + i +'&quot; '
    return retrstr

  def __repr__(self):
    &quot;&quot;&quot;Overload __repr__ for easy printing. Mostly for debugging&quot;&quot;&quot;
    return (self.name + &quot;\n&quot; +
            &quot;------\n&quot; +
            &quot;GivenName: &quot; + self.givenName + &quot;\n&quot; +
            &quot;familyName: &quot; + self.familyName + &quot;\n&quot; +
            &quot;Title: &quot; + self.title + &quot;\n&quot; +
            &quot;Organization: &quot; + self.organization + &quot;\n&quot; +
            &quot;Location: &quot; + self.location + &quot;\n&quot; +
            &quot;URL: &quot; + self.url + &quot;\n\n&quot;)

</pre> </p>
]]></content:encoded>
			<wfw:commentRss>http://webstersprodigy.net/2010/11/13/linkedin-crawler/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/be2c27a28b3788a3b9a7a8fa243d2978?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">webstersprodigy</media:title>
		</media:content>
	</item>
		<item>
		<title>email_spider</title>
		<link>http://webstersprodigy.net/2010/08/13/email_spider/</link>
		<comments>http://webstersprodigy.net/2010/08/13/email_spider/#comments</comments>
		<pubDate>Fri, 13 Aug 2010 02:07:00 +0000</pubDate>
		<dc:creator>webstersprodigy</dc:creator>
				<category><![CDATA[Network]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[gradproject]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://webstersprodigy.net/?p=719</guid>
		<description><![CDATA[This was a small part of a project that was itself about 1/3 of my graduate project. I used it to collect certain information. Here is the excerpt from the paper. Website Email Spider Program In order to automatically process publicly available email addresses, a simple tool was developed, with source code available in Appendix [...]]]></description>
			<content:encoded><![CDATA[<p>This was a small part of a project that was itself about 1/3 of my graduate project. I used it to collect certain information. Here is the excerpt from the paper.</p>
<p>Website Email Spider Program</p>
<p>In order to automatically process publicly available email addresses, a simple tool was developed, with source code available in Appendix A. An automated tool can process web pages in a way that is less error-prone than manual methods, and it also makes processing the sheer number of websites possible (or at least less tedious).<br />
This tool begins at a few root pages, which can be given as a comma-delimited list. From these, it follows all unique links, keeping track of a queue of pages so that pages are usually not revisited (a page can still be revisited if the server is case-insensitive or if equivalent pages are dynamically generated with unique URLs). In addition, the base class is passed a website scope so that pages outside of that scope are not spidered. By default, the scope is simply a regular expression containing the organization&#8217;s base domain name.</p>
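<p>The crawl order and scope check can be sketched without any networking. In this minimal Python 3 sketch, fetch_links is a hypothetical callable standing in for downloading a page and extracting its links, the scope is a regular expression as in the tool, and max_pages plays the role of the page limit; it illustrates breadth-first spidering in general and is not the tool&#8217;s source.</p>
<p><pre class="brush: python;">
import re
from collections import deque


def crawl(roots, fetch_links, scope_pattern, max_pages=10000):
    """Breadth-first crawl from comma-delimited roots, staying inside scope.

    fetch_links(url) is a hypothetical callable returning the links on a page.
    Returns the URLs visited, in the order they were visited.
    """
    scope = re.compile(scope_pattern)
    queue = deque(url.strip() for url in roots.split(","))
    seen = set(queue)
    visited = []
    while queue and len(visited) != max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            # skip pages already queued and pages outside the scope
            if link not in seen and scope.search(link):
                seen.add(link)
                queue.append(link)
    return visited
</pre></p>
<p>Because the queue is first-in, first-out, pages close to the roots are visited before deeper ones, which fits the observation that most addresses turn up early in a crawl.</p>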
<p>The contents of each requested page are searched with the following regular expression to identify common email formats:</p>
<p>[\w_.-]{3,}@[\w_.-]{6,}</p>
<p>The minimum repetition counts of 3 and 6 were necessary because shorter matches produced false positives from various encodings. This regular expression will not match every email address, but it does match the most common ones with a minimum of false positives. In addition, the matched addresses are checked against a blacklist of uninteresting generic addresses (such as help@example.com, info@example.com, or sales@example.com).</p>
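<p>Matching and filtering can be sketched in a few lines of Python 3. The pattern is the one described above, written as a raw Python string; the blacklist here holds only the three example local parts, and lowercasing addresses before de-duplicating them is an assumption of this sketch rather than something the tool is known to do.</p>
<p><pre class="brush: python;">
import re

# The email pattern from the text, as a raw string
EMAIL_RE = re.compile(r"[\w_.-]{3,}@[\w_.-]{6,}")

# Hypothetical blacklist of generic local parts, following the examples above
GENERIC_LOCAL_PARTS = {"help", "info", "sales"}


def extract_emails(page_text):
    """Return unique, non-generic addresses in order of first appearance."""
    found = []
    for match in EMAIL_RE.findall(page_text):
        address = match.lower()
        local_part = address.split("@", 1)[0]
        if local_part not in GENERIC_LOCAL_PARTS and address not in found:
            found.append(address)
    return found
</pre></p>
<p>Filtering on the local part keeps the blacklist independent of the domain being crawled.</p>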
<p>These email addresses are kept in memory and reported when the program completes or is interrupted. Note that because of the dynamic nature of some pages, the spider can potentially run forever and must be interrupted. One example is a calendar application whose links go back in time indefinitely. Most addresses seemed to be found within the first 1,000 pages crawled, and a limit of 10,000 pages was chosen as a reasonable scope. This limit was reached several times. However, because the spider crawls breadth-first, most unique addresses were found early in the process, and crawling more pages gave diminishing returns. Even so, websites with more pages also tended to yield more email addresses (see analysis section).</p>
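<p>Breadth-first order falls out of treating the list of found pages as a FIFO queue. Here is a minimal Python 3 sketch of that crawl loop, with a page cap, under a few assumptions. The link extractor is passed in as a function, and crawl and toy_links are hypothetical names. The cap counts pages visited, whereas the original tool caps the list of discovered URLs instead.</p>

```python
from collections import deque


def crawl(roots, get_links, max_pages=10000):
    """Breadth-first crawl that never queues the same URL twice.

    get_links(url) must return an iterable of absolute URLs found on that page.
    Returns the URLs in the order they were visited.
    """
    queue = deque(roots)
    seen = set(roots)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()  # FIFO order is what makes the crawl breadth-first
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def toy_links(url):
    """A toy site; the /cal?m=N pages link backwards forever, like a calendar."""
    if url.startswith("/cal?m="):
        n = int(url.split("=")[1])
        return ["/cal?m=%d" % (n - 1)]
    return {"/": ["/about", "/staff", "/cal?m=0"],
            "/about": ["/"],
            "/staff": ["/staff/alice"]}.get(url, [])


print(crawl(["/"], toy_links, max_pages=6))
# ['/', '/about', '/staff', '/cal?m=0', '/staff/alice', '/cal?m=-1']
```

<p>Without max_pages, the calendar branch alone would keep the loop running indefinitely. With the cap, the shallow pages (where most addresses tend to be) are still visited first.</p>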
<p>Much of the logic in the spidering tool is dedicated to correctly parsing HTML. Web pages vary widely in how they write links, and many sites use a mix of directory traversal (../), absolute URLs, and relative URLs. It is no surprise there are so many security vulnerabilities related to browsers parsing this complex data.</p>
<p>Some effort is also made to make the software more efficient by ignoring links to files such as documents and executables. An exception would catch the processing error if such a file were fetched, but downloading these files still consumes resources.</p>
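<p>In a modern rewrite, much of the hand-written link handling could be delegated to the standard library. Python 3's urllib.parse.urljoin implements the RFC 3986 rules for relative references, including ../ and //host forms. Below is a minimal sketch that resolves an href against its page and skips unwanted file types. The name resolve_link and the shortened extension list are assumptions, not the tool's code.</p>

```python
from urllib.parse import urldefrag, urljoin, urlsplit

# Illustrative subset of file types not worth downloading when hunting for addresses.
SKIP_EXTENSIONS = (".pdf", ".exe", ".doc", ".docx", ".jpg", ".jpeg", ".png",
                   ".gif", ".css", ".zip", ".gz", ".mp3", ".mp4", ".avi")


def resolve_link(page_url, href):
    """Turn an href found on page_url into an absolute URL, or None to skip it."""
    href = href.strip()
    if not href or href.startswith("#"):
        return None  # same-page anchor, skipped just like the original tool does
    absolute, _fragment = urldefrag(urljoin(page_url, href))  # drop "#section"
    parts = urlsplit(absolute)
    if parts.scheme not in ("http", "https"):
        return None  # mailto:, javascript:, ftp:, ...
    if parts.path.lower().endswith(SKIP_EXTENSIONS):
        return None  # lower() makes .JPG and .PDF match too; query strings are ignored
    return absolute


base = "http://example.com/dept/staff/index.html"
print(resolve_link(base, "../about.html"))           # http://example.com/dept/about.html
print(resolve_link(base, "//cdn.example.com/x.html"))  # http://cdn.example.com/x.html
print(resolve_link(base, "Report.PDF"))              # None
```

<p>The extension check looks only at the path component, so a link like photo.jpg?size=large is still recognized as an image.</p>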
<p>Using this tool is straightforward, but a certain familiarity is expected: it was not developed for end users but for this specific experiment. For example, a URL is best given in the form http://example.com/. In its current state, the tool takes the scope (here, example.com) from the first URL and uses it to verify that spidered addresses stay within a reasonable scope. It prints debugging messages constantly because every site seemed to have its own parsing quirks. Other formats and usages may work, but little effort went into making this software easy to use.</p>
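<p>The scope check deserves a closer look. The tool uses the host name as a regular expression with re.search, which matches anywhere in the URL. Below is a minimal Python 3 sketch that compares host names instead, under the assumption that subdomains of the scope should count as in scope. This is a swapped-in technique, not what the tool does, and default_scope and in_scope are hypothetical names.</p>

```python
import re
from urllib.parse import urlsplit


def default_scope(first_url):
    """Derive the default scope (the host name) from the first root URL."""
    host = urlsplit(first_url).hostname
    if not host:
        # "example.com" without a scheme parses as a path, so no host is found
        raise ValueError("URL must look like http://example.com/: %r" % first_url)
    return host


def in_scope(url, scope):
    """True if url's host is the scope domain itself or one of its subdomains."""
    host = urlsplit(url).hostname or ""
    return host == scope or host.endswith("." + scope)


# re.search(scope, url) is a substring match, so this off-site URL slips through:
print(bool(re.search("example.com", "http://evil.test/?u=example.com")))  # True
print(in_scope("http://evil.test/?u=example.com", "example.com"))         # False
print(in_scope("http://mail.example.com/staff", "example.com"))           # True
```

<p>This also explains why a bare example.com is a poor starting point. Without the http:// prefix, there is no host to derive the scope from.</p>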
<div id="_mcePaste">Here is the source.</div>
<p><pre class="brush: python;">
#!/usr/bin/python

import HTMLParser
import urllib2
import re
import sys
import signal
import socket

socket.setdefaulttimeout(20)

#spider is meant for a single url
#proto can be http, https, or any
class PageSpider(HTMLParser.HTMLParser):
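  def __init__(self, url, scope, searchList=None, emailList=None, errorDict=None):
    HTMLParser.HTMLParser.__init__(self)
    self.url = url
    self.scope = scope
    #avoid mutable default arguments, which would be shared between calls
    self.searchList = searchList if searchList is not None else []
    self.emailList = emailList if emailList is not None else []
    if errorDict is None:
      errorDict = {&quot;link&quot;:0, &quot;parse&quot;:0, &quot;decoding&quot;:0}
    try:
      urlre = re.search(r&quot;(\w+):[/]+([^/]+).*&quot;, self.url)
      self.baseurl = urlre.group(2)
      self.proto = urlre.group(1)
    except AttributeError:
      raise Exception(&quot;URLFormat&quot;, &quot;URL passed is invalid&quot;)
    if self.scope is None:
      self.scope = self.baseurl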
    try:
      req = urllib2.urlopen(self.url)
      htmlstuff = req.read()
    except KeyboardInterrupt:
      raise
    except urllib2.HTTPError:
      #not able to fetch a url eg 404
      errorDict[&quot;link&quot;] += 1
      print &quot;Warning: link error&quot;
      return
    except urllib2.URLError:
      errorDict[&quot;link&quot;] += 1
      print &quot;Warning: URLError&quot;
      return
    except ValueError:
      errorDict[&quot;link&quot;] += 1
      print &quot;Warning link error&quot;
      return
    except:
      print &quot;Unknown Error&quot;, self.url
      errorDict[&quot;link&quot;] += 1
      return
    emailre = re.compile(r&quot;[\w_.-]{3,}@[\w_.-]{2,}\.[\w_.-]{2,}&quot;)
    nemail = re.findall(emailre, htmlstuff)
    for i in nemail:
      if i not in self.emailList:
        self.emailList.append(i)
    try:
      self.feed(htmlstuff)
    except HTMLParser.HTMLParseError:
      errorDict[&quot;parse&quot;] += 1
      print &quot;Warning: HTML Parse Error&quot;
      pass
    except UnicodeDecodeError:
      errorDict[&quot;decoding&quot;] += 1
      print &quot;Warning: Unicode Decode Error&quot;
      pass
  #link targets that are not worth downloading (compared case-insensitively)
  skipExtensions = (&quot;.pdf&quot;, &quot;.exe&quot;, &quot;.doc&quot;, &quot;.docx&quot;, &quot;.jpg&quot;, &quot;.jpeg&quot;,
                    &quot;.png&quot;, &quot;.css&quot;, &quot;.gif&quot;, &quot;.mp3&quot;, &quot;.mp4&quot;, &quot;.mov&quot;,
                    &quot;.avi&quot;, &quot;.flv&quot;, &quot;.wmv&quot;, &quot;.wav&quot;, &quot;.ogg&quot;, &quot;.odt&quot;,
                    &quot;.zip&quot;, &quot;.gz&quot;, &quot;.bz&quot;, &quot;.tar&quot;, &quot;.xls&quot;, &quot;.xlsx&quot;,
                    &quot;.qt&quot;, &quot;.divx&quot;)
  def handle_starttag(self, tag, attrs):
    if (tag == &quot;a&quot; or tag == &quot;link&quot;) and attrs:
      #process the url formats, make sure the base is in scope
      for k, v in attrs:
        #check it's a non-empty href, that it's within scope, and not a skipped file type
        if (k == &quot;href&quot; and v and
            (((&quot;http&quot; in v) and re.search(self.scope, v)) or (&quot;http&quot; not in v)) and
            not v.lower().endswith(self.skipExtensions)):
          #Also todo - modify regex so that &gt;= 3 chars in front &gt;= 7 chars in back
          url = self.urlProcess(v)
          #TODO 10000 is completely arbitrary
          if (url is not None) and (url not in self.searchList) and len(self.searchList) &lt; 10000:
            self.searchList.append(url)
  #returns complete url in the form http://stuff/bleh
  #as input handles (./url, ../url, /url, //stuff/bleh/url, http://stuff/bleh/url)
  def urlProcess(self, link):
    link = link.strip()
    if &quot;http&quot; in link:
      return link
    elif link.startswith(&quot;//&quot;):
      return self.proto + &quot;://&quot; + link[2:]
    elif link.startswith(&quot;/&quot;):
      return self.proto + &quot;://&quot; + self.baseurl + link
    elif link.startswith(&quot;#&quot;):
      return None
    elif &quot;:&quot; not in link and &quot; &quot; not in link:
      #start from the directory of the current page (without modifying self.url,
      #which the page's other links still need), then go up one level per ../
      baseDir = self.url[:self.url.rfind(&quot;/&quot;) + 1]
      while link.startswith(&quot;../&quot;):
        link = link[3:]
        #TODO [8:-1] is just a heuristic to avoid climbing above the host
        if &quot;/&quot; in baseDir[8:-1]:
          baseDir = baseDir[:baseDir.rfind(&quot;/&quot;, 0, -1) + 1]
      return baseDir + link
    return None

class SiteSpider:
  def __init__(self, searchList, scope=None, verbosity=True, maxDepth=4):
    #TODO maxDepth logic
    #necessary to add to this list to avoid infinite loops
    self.searchList = searchList
    self.emailList = []
    self.errors = {&quot;decoding&quot;:0, &quot;link&quot;:0, &quot;parse&quot;:0, &quot;connection&quot;:0, &quot;unknown&quot;:0}
    if scope is None:
      try:
        urlre = re.search(r&quot;(\w+):[/]+([^/]+).*&quot;, self.searchList[0])
        self.scope = urlre.group(2)
      except AttributeError:
        raise Exception(&quot;URLFormat&quot;, &quot;URL passed is invalid&quot;)
    else:
      self.scope = scope
    index = 0
    threshold = 0
    while True:
      try:
        PageSpider(self.searchList[index], self.scope, self.searchList, self.emailList, self.errors)
        if verbosity:
          print self.searchList[index]
          print &quot; Total Emails:&quot;, len(self.emailList)
          print &quot; Pages Processed:&quot;, index
          print &quot; Pages Found:&quot;, len(self.searchList)
        index += 1
      except IndexError:
        #no pages left to process
        break
      except KeyboardInterrupt:
        break
      except:
        #skip the page that failed instead of retrying it until the threshold
        index += 1
        threshold += 1
        print &quot;Warning: unknown error&quot;
        self.errors[&quot;unknown&quot;] += 1
        if threshold &gt;= 40:
          break
    #generic mailboxes, matched against the part before the @
    garbageEmails =   [ &quot;help&quot;,
                        &quot;info&quot;,
                        &quot;webmaster&quot;,
                        &quot;contact&quot;,
                        &quot;sales&quot; ]
    print &quot;REPORT&quot;
    print &quot;----------&quot;
    for email in self.emailList:
      if email.split(&quot;@&quot;)[0].lower() not in garbageEmails:
        print email
    print &quot;\nTotal Emails:&quot;, len(self.emailList)
    print &quot;Pages Processed:&quot;, index
    print &quot;Errors:&quot;, self.errors

if __name__ == &quot;__main__&quot;:
  if len(sys.argv) != 2:
    print &quot;usage: %s http://example.com/[,http://example.com/other/]&quot; % sys.argv[0]
    sys.exit(1)
  SiteSpider(sys.argv[1].split(&quot;,&quot;))

</pre> </p>
]]></content:encoded>
			<wfw:commentRss>http://webstersprodigy.net/2010/08/13/email_spider/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/be2c27a28b3788a3b9a7a8fa243d2978?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">webstersprodigy</media:title>
		</media:content>
	</item>
		<item>
		<title>overthewire vortex level 0</title>
		<link>http://webstersprodigy.net/2010/07/25/overthewire-vortex-level-1/</link>
		<comments>http://webstersprodigy.net/2010/07/25/overthewire-vortex-level-1/#comments</comments>
		<pubDate>Sun, 25 Jul 2010 05:49:59 +0000</pubDate>
		<dc:creator>webstersprodigy</dc:creator>
				<category><![CDATA[Network]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[socket]]></category>

		<guid isPermaLink="false">http://webstersprodigy.net/?p=714</guid>
		<description><![CDATA[SPOILER. These games are awesome. Find them at http://www.overthewire.org.]]></description>
			<content:encoded><![CDATA[<p>SPOILER. These games are awesome. Find them at http://www.overthewire.org. </p>
<p><pre class="brush: python;">
#!/usr/bin/python

#edited so it doesn't quite work...

import socket
import struct

HOST='host'
PORT=1111
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST,PORT))

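blob = &quot;&quot;
#TCP is a byte stream, so the 16 bytes (4 unsigned ints) may arrive split
#across several recv() calls; keep reading until all of them are here
while len(blob) &lt; 16:
  data = s.recv(16 - len(blob))
  if not data:
    raise Exception(&quot;connection closed before 16 bytes arrived&quot;)
  blob = blob + data

print &quot;DATA: &quot;, repr(blob)
print len(blob)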
#blob should be 4 unsigned ints
intdata = struct.unpack(&quot;IIII&quot;, blob)
total=0
for i in intdata:
  total += i

myblob = struct.pack(&quot;I&quot;, total)
s.send(myblob)

pw = s.recv(1024)
print pw
s.close()
</pre> </p>
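<p>One detail worth pinning down is byte order. The format "IIII" uses the client machine's native byte order, which only works if the client matches the server. Below is a minimal Python 3 sketch of reading the four integers with an explicit format, assuming the service sends them little-endian as an x86 host would. The sample bytes are made up for illustration.</p>

```python
import struct

# "<4I": little-endian, standard 4-byte unsigned ints, no native alignment.
FMT = "<4I"
assert struct.calcsize(FMT) == 16

wire = bytes.fromhex("01000000" "02000000" "ff000000" "00010000")  # made-up sample
values = struct.unpack(FMT, wire)
print(values)                          # (1, 2, 255, 256)
print(struct.unpack(">4I", wire)[0])   # 16777216: same bytes read big-endian
print(struct.pack("<I", sum(values)))  # b'\x02\x02\x00\x00'
```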
]]></content:encoded>
			<wfw:commentRss>http://webstersprodigy.net/2010/07/25/overthewire-vortex-level-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/be2c27a28b3788a3b9a7a8fa243d2978?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">webstersprodigy</media:title>
		</media:content>
	</item>
		<item>
		<title>nmap script to try and detect login pages</title>
		<link>http://webstersprodigy.net/2010/04/07/nmap-script-to-try-and-detect-login-pages/</link>
		<comments>http://webstersprodigy.net/2010/04/07/nmap-script-to-try-and-detect-login-pages/#comments</comments>
		<pubDate>Wed, 07 Apr 2010 20:18:00 +0000</pubDate>
		<dc:creator>webstersprodigy</dc:creator>
				<category><![CDATA[GrayHat]]></category>
		<category><![CDATA[Network]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[lua]]></category>
		<category><![CDATA[nmap]]></category>

		<guid isPermaLink="false">http://webstersprodigy.net/?p=660</guid>
		<description><![CDATA[The title sort of explains it.]]></description>
			<content:encoded><![CDATA[<p>The title sort of explains it.</p>
<p><pre class="brush: python;">
description = [[
Attempts to check if a login page exists on the port.
]]

---
-- @output
-- 80/tcp open  http
-- |_ http-login-form: Login Form Detected at /

-- HTTP authentication information gathering script
-- rev 1.0 (2010-02-06)

author = &quot;Rich Lundeen &lt;mopey@webstersprodigy.net&gt;&quot;

license = &quot;Same as Nmap--See http://nmap.org/book/man-legal.html&quot;

categories = {&quot;ioactive&quot;}

require(&quot;shortport&quot;)
require(&quot;http&quot;)
require(&quot;pcre&quot;)

portrule = shortport.port_or_service({80, 443, 8080}, {&quot;http&quot;,&quot;https&quot;})

parse_url = function(url)
  local re = pcre.new(&quot;^([^:]*):[/]*([^/]*)&quot;, 0, &quot;C&quot;)
  local s, e, t = re:exec(url, 0, 0)
  if not s then
    --not an absolute URL (e.g. a relative Location header)
    return nil
  end
  local proto = string.sub(url, t[1], t[2])
  local host = string.sub(url, t[3], t[4])
  local path = string.sub(url, t[4] + 1)
  local port = string.find(host, &quot;:&quot;)
  if port ~= nil then
    --TODO check bounds and sanity; the port is converted to a number
    --so it can be compared with port.number
    local thost = string.sub(host, 0, port-1)
    port = tonumber(string.sub(host, port+1))
    host = thost
  else
    if proto == &quot;http&quot; then
      port = 80
    elseif proto == &quot;https&quot; then
      port = 443
    end
  end
  return host, port, path
end

--attempting to be compatible with nessus function in http.inc
--in this case, host is a url - it should use get_http_page
--get_http_page = function(port, host, redirect)
  

--port and host are the objects passed to the action function
--redirect is the maximum number of redirects to follow, to prevent loops
get_http_page_nmap = function(port, host, redirect, path)
  if path == nil or path == &quot;&quot; then
    path = &quot;/&quot;
  end
  if redirect == nil then
    redirect = 2
  end
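  local answer = http.get(host, port, path)
  if ((answer.header ~= nil) and (answer.header.location ~= nil) and (redirect &gt; 0) and
      (answer.status ~= nil) and (answer.status &gt;=300) and (answer.status &lt; 400)) then
    local nhost, nport, npath = parse_url(answer.header.location)
    if nhost == nil then
      --relative redirect (e.g. &quot;/login&quot;) stays on the same service
      if string.sub(answer.header.location, 1, 1) == &quot;/&quot; then
        return get_http_page_nmap(port, host, redirect-1, answer.header.location)
      end
      return answer, path
    elseif (((nhost ~= host.targetname) and (nhost ~= host.ip) and
        (nhost ~= host.name)) or nport ~= port.number ) then
      --cannot redirect more, different service
      return answer, path
    else
      return get_http_page_nmap(port, host, redirect-1, npath)
    end
  end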
  return answer, path
end

action = function(host, port)
  local result, path = get_http_page_nmap(port, host, 3)
  --seems to be a bug in the matching
  local loginflags = pcre.flags().CASELESS + pcre.flags().MULTILINE
  local loginre = {
     pcre.new(&quot;&lt;script&gt;[^&gt;]*login&quot;    , loginflags, &quot;C&quot;),
     pcre.new(&quot;&lt;[^&gt;]*login&quot;           , loginflags, &quot;C&quot;),
     pcre.new(&quot;&lt;script&gt;[^&gt;]*password&quot; , loginflags, &quot;C&quot;),
     pcre.new(&quot;&lt;script&gt;[^&gt;]*user&quot;     , loginflags, &quot;C&quot;),
     pcre.new(&quot;&lt;input[^&gt;)]*user&quot;      , loginflags, &quot;C&quot;),
     pcre.new(&quot;&lt;input[^&gt;)]*pass&quot;      , loginflags, &quot;C&quot;),
     pcre.new(&quot;&lt;input[^&gt;)]*pwd&quot;       , loginflags, &quot;C&quot;) }

  local loginform = false
  --nothing to search if the request failed
  if result.body == nil then
    return nil
  end
  for i,v in ipairs(loginre) do
    local ismatch, j = v:match(result.body, 0)
    if ismatch then
      loginform = true
      break
    end
  end
  if loginform then
    return &quot;Login Form Detected at &quot; .. path
  end
end
</pre> </p>
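<p>The detection itself is just pattern matching over the page body. The Python 3 sketch below applies the same heuristic, which is handy for trying out the patterns outside Nmap. It assumes the body has already been fetched. looks_like_login is a hypothetical name, and only four of the script's seven patterns are carried over. A pattern like &lt;[^&gt;]*login matches any tag that mentions login, so an ordinary link to a login page also counts.</p>

```python
import re

# Subset of the script's patterns; IGNORECASE plays the role of pcre's CASELESS.
LOGIN_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"<[^>]*login",
    r"<input[^>]*user",
    r"<input[^>]*pass",
    r"<input[^>]*pwd",
)]


def looks_like_login(body):
    """True if any login-ish pattern matches somewhere in the HTML body."""
    return any(p.search(body) for p in LOGIN_PATTERNS)


form = '<form><input type="text" name="username"><input type="password" name="pw"></form>'
print(looks_like_login(form))                                        # True
print(looks_like_login("<html><body><h1>Welcome</h1></body></html>"))  # False
print(looks_like_login('<a href="/Login.aspx">Sign in</a>'))         # True (just a link)
```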
]]></content:encoded>
			<wfw:commentRss>http://webstersprodigy.net/2010/04/07/nmap-script-to-try-and-detect-login-pages/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/be2c27a28b3788a3b9a7a8fa243d2978?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">webstersprodigy</media:title>
		</media:content>
	</item>
		<item>
		<title>snmp cheatsheet</title>
		<link>http://webstersprodigy.net/2010/01/16/snmp-cheatsheet/</link>
		<comments>http://webstersprodigy.net/2010/01/16/snmp-cheatsheet/#comments</comments>
		<pubDate>Sat, 16 Jan 2010 00:49:26 +0000</pubDate>
		<dc:creator>webstersprodigy</dc:creator>
				<category><![CDATA[Network]]></category>
		<category><![CDATA[snmp]]></category>

		<guid isPermaLink="false">http://webstersprodigy.net/?p=653</guid>
		<description><![CDATA[In my line of work, I come across SNMP default community strings quite a bit. I seem to always be searching for a reference on how to query various things - and also what I might change.]]></description>
			<content:encoded><![CDATA[<p>In my line of work, I come across SNMP default community strings quite a bit. I seem to always be searching for a reference on how to query various things &#8211; and also what I might change.</p>
<p>I&#8217;ve found the following page very useful:</p>
<p>http://www-01.ibm.com/software/webservers/httpservers/doc/v1319/9ac2mib.htm</p>
<p>An example query, here asking for the system description (sysDescr.0), might be:</p>
<blockquote><p>snmpget -v 2c -c private 70.37.128.238 sysDescr.0</p></blockquote>
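<p>To see what a query like this puts on the wire, here is a minimal Python 3 sketch that hand-encodes an SNMPv2c GetRequest for sysDescr.0 (OID 1.3.6.1.2.1.1.1.0) in BER, using only the standard library. The helper names are hypothetical. Only non-negative integers and a single variable binding are handled, and the reply is returned undecoded, so snmpget remains the practical tool. The encoding also makes one thing plain: the community string travels in clear text in every request.</p>

```python
import socket


def encode_length(n):
    """BER length: short form below 128, otherwise long form."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag, content):
    return bytes([tag]) + encode_length(len(content)) + content


def encode_int(value):
    """Non-negative INTEGER; the extra byte keeps the sign bit clear."""
    return tlv(0x02, value.to_bytes(value.bit_length() // 8 + 1, "big"))


def _base128(n):
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append(0x80 | (n & 0x7F))  # continuation bit on all but the last byte
        n >>= 7
    return bytes(reversed(out))


def encode_oid(oid):
    arcs = [int(a) for a in oid.strip(".").split(".")]
    body = _base128(40 * arcs[0] + arcs[1])  # the first two arcs share one value
    for arc in arcs[2:]:
        body += _base128(arc)
    return tlv(0x06, body)


def build_get_request(community, oid, request_id=1):
    varbind = tlv(0x30, encode_oid(oid) + b"\x05\x00")  # value slot is NULL
    pdu = tlv(0xA0, encode_int(request_id) + encode_int(0) + encode_int(0)
              + tlv(0x30, varbind))                    # GetRequest-PDU
    return tlv(0x30, encode_int(1)                     # version 1 means SNMPv2c
               + tlv(0x04, community.encode()) + pdu)


def snmp_get_raw(host, community, oid, timeout=2.0):
    """Send one GetRequest to UDP port 161 and return the raw reply bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_get_request(community, oid), (host, 161))
        reply, _addr = sock.recvfrom(65535)
    return reply
```

<p>For example, snmp_get_raw("192.0.2.10", "private", "1.3.6.1.2.1.1.1.0") would send the same kind of request as the snmpget line above. In that call, 192.0.2.10 is a placeholder documentation address.</p>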
]]></content:encoded>
			<wfw:commentRss>http://webstersprodigy.net/2010/01/16/snmp-cheatsheet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/be2c27a28b3788a3b9a7a8fa243d2978?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">webstersprodigy</media:title>
		</media:content>
	</item>
		<item>
		<title>proxychains &#8211; handy tool!</title>
		<link>http://webstersprodigy.net/2009/12/06/proxychains-handy-tool/</link>
		<comments>http://webstersprodigy.net/2009/12/06/proxychains-handy-tool/#comments</comments>
		<pubDate>Sun, 06 Dec 2009 06:11:51 +0000</pubDate>
		<dc:creator>webstersprodigy</dc:creator>
				<category><![CDATA[Network]]></category>
		<category><![CDATA[Security Tools]]></category>

		<guid isPermaLink="false">http://webstersprodigy.net/?p=643</guid>
		<description><![CDATA[proxychains is a pretty amazing tool, available at http://proxychains.sourceforge.net/. It is a versatile proxy tool for folks like me who want a program's connections to come from a proxy, or from a chain of proxies, rather than from their own IP. For me, the main uses are proxying GUI port-scanning tools like Nessus and routing traffic through Tor.]]></description>
			<content:encoded><![CDATA[<p>proxychains is a pretty amazing tool, available at http://proxychains.sourceforge.net/. It is a versatile proxy tool for folks like me who want a program&#8217;s connections to come from a proxy, or from a chain of proxies, rather than from their own IP. For me, the main uses are proxying GUI port-scanning tools like Nessus and routing traffic through Tor.</p>
<p>Proxying port scans can be handy if you want the traffic to come from a different address. For example, you might have an SSH server somewhere that you&#8217;d like to scan from, or you might want to port scan through Tor. To port scan through an SSH server:</p>
<blockquote><p>ssh -D 2323 mysshserver</p>
<p>#edit /etc/proxychains.conf  so socks4 is set to 2323</p>
<p>#socks4  127.0.0.1 2323</p>
<p>proxychains nmap -T4&#8230;</p></blockquote>
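<p>Then nmap&#8217;s connections will appear to come from your SSH server. Very cool! One caveat: proxychains only redirects full TCP connections. So nmap has to use a connect scan (-sT) and skip ping-based host discovery (-Pn in current nmap). SYN scans and ping probes are built from raw packets and bypass the proxy. In addition, you can set up a Tor proxy, have proxychains point to it in proxychains.conf, and launch your program the same way. The program&#8217;s TCP traffic then all goes through Tor, so if you wanted, you could even port scan through Tor.</p>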
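<p>With the socks4 entry above, each TCP connection the program opens becomes a SOCKS4 CONNECT request to 127.0.0.1:2323, and ssh -D then carries it to the far end. Below is a minimal Python 3 sketch of that handshake, covering the request layout and the "granted" reply code. The function names are hypothetical, and proxychains itself is not Python. Note that plain SOCKS4 carries only an IPv4 address, so the hostname has to be resolved locally first. That is one way DNS can leak, and it is the reason SOCKS4a and SOCKS5 can carry names instead.</p>

```python
import socket
import struct


def socks4_request(dest_ip, dest_port, user_id=b""):
    """SOCKS4 CONNECT: version 4, command 1, port, IPv4 address, user id, NUL."""
    return (struct.pack(">BBH", 4, 1, dest_port) + socket.inet_aton(dest_ip)
            + user_id + b"\x00")


def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("proxy closed the connection")
        buf += chunk
    return buf


def socks4_connect(sock, dest_ip, dest_port):
    """Ask the SOCKS4 proxy on an already-connected socket to open a tunnel."""
    sock.sendall(socks4_request(dest_ip, dest_port))
    reply = recv_exact(sock, 8)  # version 0, status, port, address
    if reply[1] != 0x5A:         # 0x5A (90) means "request granted"
        raise ConnectionError("SOCKS4 request rejected, code %d" % reply[1])
    return sock                  # from here on, bytes flow to dest_ip:dest_port


# With "ssh -D 2323 mysshserver" running, usage would look like:
#   s = socket.create_connection(("127.0.0.1", 2323))
#   socks4_connect(s, "192.0.2.10", 80)   # 192.0.2.10 is a placeholder address
```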
<p>A usually more legitimate use is to launch Firefox through Tor using proxychains. This is better than simply setting the proxy in Firefox itself. With only a browser proxy setting, there can still be DNS leakage, potential Flash leakage, and so on. When the browser is launched through proxychains, its child processes (plugins included) inherit the same proxying, so their TCP connections also go through Tor.</p>
]]></content:encoded>
			<wfw:commentRss>http://webstersprodigy.net/2009/12/06/proxychains-handy-tool/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/be2c27a28b3788a3b9a7a8fa243d2978?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">webstersprodigy</media:title>
		</media:content>
	</item>
		<item>
		<title>Auto Pw Change</title>
		<link>http://webstersprodigy.net/2009/11/13/auto-pw-change/</link>
		<comments>http://webstersprodigy.net/2009/11/13/auto-pw-change/#comments</comments>
		<pubDate>Fri, 13 Nov 2009 23:57:51 +0000</pubDate>
		<dc:creator>webstersprodigy</dc:creator>
				<category><![CDATA[Network]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://webstersprodigy.net/?p=631</guid>
		<description><![CDATA[I had to change this script a lot, so take it with a grain of salt. That said, we changed about 1,000 LOCAL passwords in a couple of hours, which would otherwise have taken all day and been a lot more boring.]]></description>
			<content:encoded><![CDATA[<p>I had to change this script a lot, so take it with a grain of salt. That said, we changed about 1,000 LOCAL passwords in a couple of hours, which would otherwise have taken all day and been a lot more boring.</p>
<p><pre class="brush: python;">
#!/usr/bin/python

import pexpect

#candidate current passwords; most likely should be first for speed
passlist = [&quot;pass1&quot;, &quot;pass2&quot;, &quot;pass3&quot;]
user=&quot;root&quot;
newpass=&quot;newpass&quot;

#open hosts file, one host per line
#most critical should be listed in file first for speed
hostfile=open(&quot;hosts.txt&quot;, &quot;r&quot;)

for host in hostfile:
  host = host.strip()
  if not host:
    continue
  changeSuccessful = False
  #need to find the currpass to change it
  #so auth by key may not be ideal in this case
  #-t forces a remote tty, which passwd needs for its prompts
  p = pexpect.spawn(&quot;ssh -t &quot; + user + &quot;@&quot; + host + &quot; passwd&quot;)

  #try block so it doesn't crash the program
  try:
    #different systems vary with exact text
    conn_result = p.expect([&quot;assword:&quot;, pexpect.EOF, &quot;Are you sure you want to continue&quot;])
    if conn_result == 2:
      print &quot;accepting public key for &quot;, host
      p.sendline(&quot;yes&quot;)
      conn_result = p.expect([&quot;assword:&quot;, pexpect.EOF])
    if conn_result == 0:
      for password in passlist:
        print &quot;trying password for &quot;, host
        p.sendline(password)
        pass_result = p.expect([&quot;denied&quot;, &quot;current.*assword:&quot;, &quot;new.*assword&quot;, pexpect.EOF])
        if pass_result == 0:
          #wait for ssh to prompt again; stop if it gave up and disconnected
          if p.expect([&quot;assword:&quot;, pexpect.EOF]) == 1:
            break
          continue
        if pass_result == 3:
          break
        if pass_result == 1:
          p.sendline(password)
          p.expect(&quot;new.*assword:&quot;)
        #this should execute if a key OR password was accepted
        p.sendline(newpass)
        p.expect(&quot;new.*assword:&quot;)
        p.sendline(newpass)
        #let passwd finish before the connection is closed
        p.expect(pexpect.EOF)
        changeSuccessful = True
        print &quot;Successful pwchange: host &quot; + host
        break
    if not changeSuccessful:
      print &quot;Unsuccessful pwchange: host &quot; + host
  except Exception as e:
    print &quot;Uncaught exception: host &quot; + host, e
  p.close()

hostfile.close()


</pre> </p>
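<p>Everything in the loop hinges on which prompt passwd prints after login, and the exact wording differs between systems. A small, hypothetical helper like the one below (classify_prompt and PROMPT_PATTERNS are illustrative names, not part of the original script) lets you check expect() patterns against sample prompts offline before pointing the script at real hosts. It assumes pexpect's usual behaviour of compiling string patterns with re.DOTALL and choosing the match that starts earliest in the output, with ties going to the earlier pattern; the sample prompts are typical examples, not an exhaustive list.</p>

```python
import re

# Hypothetical patterns in the style of the script's expect() list; index order matters.
PROMPT_PATTERNS = [
    r"denied",                # 0: ssh rejected the password
    r"[Cc]urrent.*assword:",  # 1: passwd asks for the current password first
    r"[Nn]ew.*assword:",      # 2: passwd goes straight to the new password
]


def classify_prompt(output, patterns=PROMPT_PATTERNS):
    """Return the index of the pattern whose match starts earliest in `output`,
    preferring the earlier pattern on a tie, or None if nothing matches."""
    best = None
    for index, pattern in enumerate(patterns):
        match = re.search(pattern, output, re.DOTALL)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), index)
    return None if best is None else best[1]
```

<p>For example, classify_prompt("Enter new UNIX password: ") returns 2, while a "Permission denied, please try again." line returns 0.</p>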
]]></content:encoded>
			<wfw:commentRss>http://webstersprodigy.net/2009/11/13/auto-pw-change/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/be2c27a28b3788a3b9a7a8fa243d2978?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">webstersprodigy</media:title>
		</media:content>
	</item>
		<item>
		<title>Paper Fun: Simplified Single Packet Authorization</title>
		<link>http://webstersprodigy.net/2009/07/10/paper-fun-simplified-single-packet-authorization/</link>
		<comments>http://webstersprodigy.net/2009/07/10/paper-fun-simplified-single-packet-authorization/#comments</comments>
		<pubDate>Fri, 10 Jul 2009 08:37:07 +0000</pubDate>
		<dc:creator>webstersprodigy</dc:creator>
				<category><![CDATA[Network]]></category>
		<category><![CDATA[paper]]></category>
		<category><![CDATA[spa]]></category>

		<guid isPermaLink="false">http://webstersprodigy.net/?p=588</guid>
		<description><![CDATA[Port Knocking and Single Packet Authorization (SPA) are relatively new (circa 2004 and later) techniques used to enable anonymous, temporary activation of remote network services that are otherwise blocked by a firewall. These techniques greatly enhance the so-called "zero-day" exploit resilience of systems that properly implement them, but they have weaknesses and, more importantly, share a weakness common to most security augmentation systems: human nature. This paper presents a framework for securely enabling remote services in a manner that focuses on the human factor, a concept often neglected in security research and the key reason such systems rarely see widespread use in the real world. The primary focus is to make SPA easier for humans to interact with.]]></description>
			<content:encoded><![CDATA[<p>Another paper, to be presented next week at WORLDCOMP.</p>
<p>Port Knocking and Single Packet Authorization (SPA) are relatively new (circa 2004 and later) techniques used to enable anonymous, temporary activation of remote network services that are otherwise blocked by a firewall. These techniques greatly enhance the so-called &#8220;zero-day&#8221; exploit resilience of systems that properly implement them, but they have weaknesses and, more importantly, share a weakness common to most security augmentation systems: human nature. This paper presents a framework for securely enabling remote services in a manner that focuses on the human factor, a concept often neglected in security research and the key reason such systems rarely see widespread use in the real world. The primary focus is to make SPA easier for humans to interact with.</p>
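<p>To make the core SPA idea concrete: the client sends one authenticated packet, and the server only opens the firewall if that packet verifies. The sketch below is a generic illustration under stated assumptions, not the framework from the paper. It uses a hypothetical pre-shared key with HMAC-SHA256 over a timestamp, the requested port and a random nonce, and rejects stale or replayed packets; it deliberately leaves out encryption, binding to the client's IP address, and the actual firewall change a real implementation would need.</p>

```python
import hashlib
import hmac
import os
import struct
import time

# Hypothetical pre-shared key; a real deployment would provision one per client.
SHARED_KEY = b"example-shared-secret"
MAX_SKEW = 30                  # assumed validity window in seconds
TAG_LEN = 32                   # HMAC-SHA256 output size in bytes
NONCE_LEN = 16
HEADER = struct.Struct("!QH")  # 8-byte timestamp + 2-byte requested port


def build_spa_packet(key, port, now=None):
    """Client side: timestamp + requested port + random nonce, then an HMAC tag."""
    timestamp = int(time.time()) if now is None else int(now)
    body = HEADER.pack(timestamp, port) + os.urandom(NONCE_LEN)
    return body + hmac.new(key, body, hashlib.sha256).digest()


class SpaServer:
    """Server side: decide whether a single packet should open a port."""

    def __init__(self, key, max_skew=MAX_SKEW):
        self.key = key
        self.max_skew = max_skew
        self.seen = set()  # nonces already accepted, to block replays

    def verify(self, packet, now=None):
        """Return the requested port if the packet is authentic, fresh and unseen, else None."""
        if len(packet) != HEADER.size + NONCE_LEN + TAG_LEN:
            return None
        body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
        expected = hmac.new(self.key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # wrong key or tampered packet
        timestamp, port = HEADER.unpack(body[:HEADER.size])
        nonce = body[HEADER.size:]
        current = int(time.time()) if now is None else int(now)
        if abs(current - timestamp) > self.max_skew or nonce in self.seen:
            return None  # stale or replayed
        self.seen.add(nonce)
        # A real SPA daemon would now add a short-lived firewall rule for the sender.
        return port
```

<p>The seen-nonce set grows without bound in this sketch; a long-running server would expire entries older than the validity window.</p>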
<p style="margin-bottom:0;">The PDF of the paper is here: <a href="https://skydrive.live.com/redir.aspx?cid=19794fac33285fd5&amp;resid=19794FAC33285FD5!147&amp;parid=19794FAC33285FD5!112">Simplified Single Packet Authorization_1.4</a></p>
]]></content:encoded>
			<wfw:commentRss>http://webstersprodigy.net/2009/07/10/paper-fun-simplified-single-packet-authorization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/be2c27a28b3788a3b9a7a8fa243d2978?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">webstersprodigy</media:title>
		</media:content>
	</item>
		<item>
		<title>Paper fun: Concerns with Time-Space Based Wireless Security</title>
		<link>http://webstersprodigy.net/2009/07/09/paper-fun-concerns-with-time-space-based-wireless-security/</link>
		<comments>http://webstersprodigy.net/2009/07/09/paper-fun-concerns-with-time-space-based-wireless-security/#comments</comments>
		<pubDate>Thu, 09 Jul 2009 22:36:51 +0000</pubDate>
		<dc:creator>webstersprodigy</dc:creator>
				<category><![CDATA[Network]]></category>
		<category><![CDATA[paper]]></category>
		<category><![CDATA[wireless]]></category>

		<guid isPermaLink="false">http://webstersprodigy.net/?p=584</guid>
		<description><![CDATA[Wireless ad-hoc network protocols are a topic of much recent discussion and development. This has prompted many researchers to develop interesting and promising-sounding protocols that deserve careful examination. One such protocol, Authenticated Protocol for Wireless Ad Hoc Networks (APEC), was designed by Robert Hiromoto and Hope Forsmann[1]. APEC has been the subject of an increasing amount of scientific discussion and research at universities, laboratories, and professional conferences. In this paper, we examine APEC in depth and discuss many potential problems with the protocol that must be addressed if APEC is to achieve widespread acceptance.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m presenting this at WORLDCOMP this year.</p>
<p>Abstract:</p>
<p>Wireless ad-hoc network protocols are a topic of much recent discussion and development. This has prompted many researchers to develop interesting and promising-sounding protocols that deserve careful examination. One such protocol, Authenticated Protocol for Wireless Ad Hoc Networks (APEC), was designed by Robert Hiromoto and Hope Forsmann[1]. APEC has been the subject of an increasing amount of scientific discussion and research at universities, laboratories, and professional conferences. In this paper, we examine APEC in depth and discuss many potential problems with the protocol that must be addressed if APEC is to achieve widespread acceptance.</p>
<p>Paper: <a href="https://skydrive.live.com/redir.aspx?cid=19794fac33285fd5&amp;resid=19794FAC33285FD5!146&amp;parid=19794FAC33285FD5!112">probelm_with_time_0.7</a></p>
]]></content:encoded>
			<wfw:commentRss>http://webstersprodigy.net/2009/07/09/paper-fun-concerns-with-time-space-based-wireless-security/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/be2c27a28b3788a3b9a7a8fa243d2978?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">webstersprodigy</media:title>
		</media:content>
	</item>
		<item>
		<title>browsing with firefox, tor, refcontrol, and noscript on ubuntu</title>
		<link>http://webstersprodigy.net/2009/05/08/browsing-with-firefox-tor-refcontrol-and-noscript-on-ubuntu/</link>
		<comments>http://webstersprodigy.net/2009/05/08/browsing-with-firefox-tor-refcontrol-and-noscript-on-ubuntu/#comments</comments>
		<pubDate>Fri, 08 May 2009 20:54:43 +0000</pubDate>
		<dc:creator>webstersprodigy</dc:creator>
				<category><![CDATA[GrayHat]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Network]]></category>
		<category><![CDATA[firefox]]></category>
		<category><![CDATA[referer]]></category>
		<category><![CDATA[tor]]></category>
		<category><![CDATA[ubuntu]]></category>

		<guid isPermaLink="false">http://webstersprodigy.net/?p=534</guid>
		<description><![CDATA[I am doing some research that involves a *lot* of Google searches. Because this research involves a significant number of directed queries, it seems sensible to hide this information as much as practical. If a web host repeatedly notices sequential names in Google referer URLs, this might raise suspicion or alter behavior, which could skew results. Similarly, it is desirable to hide IP information from both the web host (for similar reasons) and possibly even the search engines.]]></description>
			<content:encoded><![CDATA[<p>This is a topic that&#8217;s been covered a lot. However, it took a bit of research to find a solution that worked for me, so I thought I&#8217;d write about it here.</p>
<p>I am doing some research that involves a *lot* of Google searches. Because this research involves a significant number of directed queries, it seems sensible to hide this information as much as practical. If a web host repeatedly notices sequential names in Google referer URLs, this might raise suspicion or alter behavior, which could skew results. Similarly, it is desirable to hide IP information from both the web host (for similar reasons) and possibly even the search engines.</p>
<p>First, to avoid any changes to my usual browsing, I created a new Firefox profile using:</p>
<blockquote><p>firefox -ProfileManager</p></blockquote>
<p>To run both Firefox profiles at once, start the first one as normal and the second one with these additional options:</p>
<blockquote><p>firefox -P &lt;new-profile&gt; -no-remote</p></blockquote>
<p>I added this to my taskbar next to the regular firefox %u launcher, so I can choose a profile with one click.</p>
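<p>If you would rather script the launcher than click a taskbar entry, the same command can be built and started from Python. This is a minimal sketch: the function names and the profile name "research" used in the example are hypothetical, and it assumes the firefox binary is on the PATH.</p>

```python
import shutil
import subprocess


def firefox_profile_argv(profile, binary="firefox"):
    """Command line for a separate Firefox instance running `profile`."""
    # -P picks the profile; -no-remote starts a new instance instead of
    # handing the request to the Firefox that is already running.
    return [binary, "-P", profile, "-no-remote"]


def launch_profile(profile, binary="firefox"):
    """Start Firefox with `profile` if the binary is on PATH; return the process or None."""
    argv = firefox_profile_argv(profile, binary)
    if shutil.which(binary) is None:
        return None
    return subprocess.Popen(argv)
```

<p>Calling launch_profile("research") next to a normal Firefox session gives the two independent browsers described above.</p>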
<p>To hide the HTTP referer, I chose a Firefox extension called RefControl, <a href="https://addons.mozilla.org/en-US/firefox/addon/953">https://addons.mozilla.org/en-US/firefox/addon/953</a>. It simply replaces the referer of every request with a configurable value. Although this is certainly possible with a more traditional proxy (like Paros), RefControl&#8217;s ease of use is essential given the sheer number of queries performed for this research. Over the course of the research I changed the referer several times, using values like “yahoo.com” and “cnn.com”. Although the traffic patterns may still look suspicious to an administrator who carefully monitors the logs, the referer reveals virtually no information about what is being searched for.</p>
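<p>RefControl does this inside the browser; for scripted queries the same effect comes from setting the Referer header yourself. The sketch below uses Python's urllib and rotates through decoy referers automatically, whereas the post changed the RefControl setting by hand. The decoy URLs and the make_requests name are assumptions modelled on the “yahoo.com” and “cnn.com” examples above.</p>

```python
import itertools
import urllib.request

# Decoy referers, assumed for illustration (modelled on the values used in the post).
DECOY_REFERERS = ("http://www.yahoo.com/", "http://www.cnn.com/")


def make_requests(urls, referers=DECOY_REFERERS):
    """Build one urllib Request per URL, cycling the Referer header through decoys."""
    rotation = itertools.cycle(referers)
    return [urllib.request.Request(url, headers={"Referer": next(rotation)}) for url in urls]
```

<p>Passing each request to urllib.request.urlopen() would then send the decoy value instead of the real search results page.</p>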
<p>To obfuscate the IP address, I used tor and privoxy. Tor bounces HTTP requests through a distributed network of relays around the world. An in-depth discussion of Tor is beyond the scope of this post, but in a nutshell “it prevents somebody watching your Internet connection from learning what sites you visit, and it prevents the sites you visit from learning your physical location” http://www.torproject.org/. Privoxy is additionally used to keep things like Flash or DNS lookups from leaking information. Since both privoxy and tor are required, install them:</p>
<blockquote><p>apt-get install tor privoxy</p></blockquote>
<p>To get privoxy to work with tor, I uncommented the following line in privoxy&#8217;s config file (if it&#8217;s not there, just add it):</p>
<blockquote><p>forward-socks4a / localhost:9050 .</p></blockquote>
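<p>This config change can also be scripted. The helper below is hypothetical (enable_tor_forwarding is an illustrative name): it uncomments the forward-socks4a line if a commented copy exists, appends it otherwise, and leaves the text alone if the line is already active. It works on the config text only, so you would still read and write privoxy&#8217;s config file yourself (typically /etc/privoxy/config on Ubuntu, as root) and restart privoxy afterwards.</p>

```python
# The line from the post: forward everything to Tor's SOCKS port on this machine.
FORWARD_LINE = "forward-socks4a / localhost:9050 ."


def _normalize(text):
    """Collapse runs of whitespace so spacing differences don't matter."""
    return " ".join(text.split())


def enable_tor_forwarding(config_text, line=FORWARD_LINE):
    """Return privoxy config text with `line` active: uncomment it or append it."""
    target = _normalize(line)
    out, found = [], False
    for raw in config_text.splitlines():
        current = _normalize(raw)
        if current == target:
            found = True  # already enabled; keep as-is
        elif not found and _normalize(current.lstrip("#")) == target:
            raw = line  # uncomment the existing example line
            found = True
        out.append(raw)
    if not found:
        out.append(line)  # not present at all, so add it
    return "\n".join(out) + "\n"
```

<p>Because an already-active line is left untouched, running the helper twice produces the same file.</p>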
<p>Despite these advantages, this setup made browsing for names quite slow. I really like Torbutton: in the not-so-distant past I had to modify the proxy settings every time I wanted to switch between browsing with and without Tor, and Torbutton reduces that to a single click.</p>
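<p>Torbutton only covers the browser. A script that issues its own queries can use the same tor chain by sending its HTTP through privoxy. This minimal sketch assumes privoxy is listening on its default address, 127.0.0.1:8118, and uses only urllib from the standard library; privoxy_opener and fetch are illustrative names.</p>

```python
import urllib.request

# Privoxy's default listen address; change it if listen-address in its config differs.
PRIVOXY = "http://127.0.0.1:8118"


def privoxy_opener(proxy=PRIVOXY):
    """Return a urllib opener that sends HTTP and HTTPS requests through privoxy (and so tor)."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def fetch(url, opener=None, timeout=60):
    """Fetch `url` via privoxy; the generous timeout allows for slow tor circuits."""
    opener = opener or privoxy_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

<p>Expect fetch() to be noticeably slower than a direct request, for the same reason browsing was slow.</p>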
<p>Lastly, the NoScript Firefox extension (http://noscript.net/) was used to mitigate JavaScript-based attacks that might be used to obtain IP information.</p>
]]></content:encoded>
			<wfw:commentRss>http://webstersprodigy.net/2009/05/08/browsing-with-firefox-tor-refcontrol-and-noscript-on-ubuntu/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/be2c27a28b3788a3b9a7a8fa243d2978?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">webstersprodigy</media:title>
		</media:content>
	</item>
	</channel>
</rss>
