Simpson’s Paradox

Statistics can be weird. Just when you think you’ve made peace with the game show paradox and the birthday paradox, along comes this one. I think people in general need to realize that we as humans are just not that good at intuiting probability.

From John Rice’s Statistics Textbook:

A black urn contains 5 red and 6 green balls, and a white urn contains 3 red and 4 green balls. You are allowed to choose an urn and then choose a ball at random from the urn. If you choose a red ball, you get a prize. Which urn should you choose to draw from? If you draw from the black urn, the probability of choosing a red ball is 5/11 (the number of ways you can draw a red ball divided by the total number of outcomes). If you choose to draw from the white urn, the probability of choosing a red ball is 3/7, so you should choose to draw from the black urn.

Now consider another game in which a second black urn has 6 red and 3 green balls, and a second white urn has 9 red and 5 green balls. If you draw from the black urn, the probability of a red ball is 6/9, whereas if you choose to draw from the white urn, the probability is 9/14. Again, you should choose to draw from the black urn.

In the final game, the contents of the second black urn are added to the first black urn, and the contents of the second white urn are added to the first white urn. Again, you can choose which urn to draw from. Which should you choose? Intuition says choose the black urn, but let’s calculate the probabilities. The black urn now contains 11 red and 9 green, so the probability of drawing a red ball from it is 11/20 = .55. The white urn now contains 12 red and 9 green balls, so the probability of drawing a red ball from it is 12/21 = .571.  So, you should choose the white urn.

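When you think about it, it actually makes sense. The pooled urns are not equal mixes of the two games: the black urn gets 11 of its 20 balls from the first game, where red is rarer, while the white urn gets 14 of its 21 balls from the second game, where red is more common. That is enough to pull the white urn’s combined rate past the black urn’s. Still, a little weird though.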
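The arithmetic in the three games can be checked with a few lines of Python. This is only a sketch: the helper names `p_red` and `pooled` are made up for illustration, and the ball counts are copied from the passage above.

```python
from fractions import Fraction

# (red, green) counts for each urn in the two games from the passage above.
black = [(5, 6), (6, 3)]
white = [(3, 4), (9, 5)]

def p_red(red, green):
    """Probability of drawing a red ball from an urn with these counts."""
    return Fraction(red, red + green)

def pooled(urns):
    """Probability of red after tipping all the given urns into one."""
    red = sum(r for r, g in urns)
    green = sum(g for r, g in urns)
    return p_red(red, green)

for game, (b, w) in enumerate(zip(black, white), start=1):
    print(f"game {game}: black {float(p_red(*b)):.3f}  white {float(p_red(*w)):.3f}")
print(f"pooled: black {float(pooled(black)):.3f}  white {float(pooled(white)):.3f}")
# game 1: black 0.455  white 0.429
# game 2: black 0.667  white 0.643
# pooled: black 0.550  white 0.571
```

Black wins each game on its own, yet white wins once the urns are pooled.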

Another common example is batting averages. Here is an example Wikipedia gives: David Justice has a better batting average than Derek Jeter two years in a row, but his combined average over both years is worse.

                1995             1996             Combined
Derek Jeter     12/48    .250    183/582  .314    195/630  .310
David Justice   104/411  .253    45/140   .321    149/551  .270

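Actually, the batting averages make it more intuitive for me. Jeter took almost all of his at-bats in 1996, the better year for both players, while Justice took most of his in 1995.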
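One way to see it: a combined batting average is a weighted average of the yearly averages, with each year weighted by its share of the at-bats. Here is a rough sketch of that; the function names are invented for this example and the numbers come straight from the table.

```python
# (hits, at-bats) per season, copied from the table above.
jeter = {1995: (12, 48), 1996: (183, 582)}
justice = {1995: (104, 411), 1996: (45, 140)}

def combined(seasons):
    """Total hits divided by total at-bats."""
    hits = sum(h for h, ab in seasons.values())
    at_bats = sum(ab for h, ab in seasons.values())
    return hits / at_bats

def weights(seasons):
    """Each season's share of the player's total at-bats."""
    total = sum(ab for h, ab in seasons.values())
    return {year: ab / total for year, (h, ab) in seasons.items()}

def weighted_average(seasons):
    """Yearly averages weighted by at-bat share; equals combined()."""
    w = weights(seasons)
    return sum(w[year] * h / ab for year, (h, ab) in seasons.items())

for name, seasons in (("Jeter", jeter), ("Justice", justice)):
    shares = ", ".join(f"{y}: {s:.0%}" for y, s in weights(seasons).items())
    print(f"{name}: combined {combined(seasons):.3f}, at-bat share {shares}")
```

Jeter's combined number is dominated by his 1996 average, Justice's by his 1995 average, which is why the yearly comparison and the combined comparison can point in opposite directions.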

Golay G24

This is a dirty implementation of the Golay error-correcting code in python.

This is a solution to 18.13 problem 1 from Trappe and Washington’s Cryptography book. You need bash, python, and the numpy library. To run it, run golay.sh. The algorithm is in golay.py.

golay.py

#!/usr/bin/env python
 
from numpy import *
import sys
 
geng24 = array([[1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,1,1,0,0,0,1,0],
                [0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,1,1,1,0,0,0,1],
                [0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,1,1,0,1,1,1,0,0,0],
                [0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,1,1,0,1,1,1,0,0],
                [0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,1,1,0,1,1,1,0],
                [0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,1,1,0,1,1,1],
                [0,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,1,0,1,1,0,1,1],
                [0,0,0,0,0,0,0,1,0,0,0,0,1,1,1,0,0,0,1,0,1,1,0,1],
                [0,0,0,0,0,0,0,0,1,0,0,0,1,1,1,1,0,0,0,1,0,1,1,0],
                [0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,1,1,0,0,0,1,0,1,1],
                [0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,1,1,1,0,0,0,1,0,1],
                [0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,1,1,1,1,1,1,1,1,1]])
 
B = array([[1,1,1,0,1,1,1,0,0,0,1,0],
           [1,0,1,1,0,1,1,1,0,0,0,1],
           [1,1,0,1,1,0,1,1,1,0,0,0],
           [1,0,1,0,1,1,0,1,1,1,0,0],
           [1,0,0,1,0,1,1,0,1,1,1,0],
           [1,0,0,0,1,0,1,1,0,1,1,1],
           [1,1,0,0,0,1,0,1,1,0,1,1],
           [1,1,1,0,0,0,1,0,1,1,0,1],
           [1,1,1,1,0,0,0,1,0,1,1,0],
           [1,0,1,1,1,0,0,0,1,0,1,1],
           [1,1,0,1,1,1,0,0,0,1,0,1],
           [0,1,1,1,1,1,1,1,1,1,1,1]])
 
errorvector = array([0,0,0,0,0,0,0,0,0,0,0,0])
 
def weight(obj):
  wt = 0
  for i in obj:
    wt += i
  return wt
 
geng24_transpose = geng24.transpose()
 
#r = array([1,1,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,1,0])
#r = array([0,1,0,0,0,0,1,1,0,1,0,1,1,0,1,1,0,1,0,0,0,0,1,1])
#r = array([0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,1,1,1,0,1,1,0,0])
#r = array([1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1])
 
r = []
for i in sys.argv[1].split(','):
  r.append(int(i))
r = array(r)
 
s = (dot(r,geng24_transpose) % 2 )
 
sdotb = (dot(s,B) % 2 )
 
#e = array([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
e = zeros( (1,24), dtype=int8)
e = e.flat
 
zeros12 = zeros(12, dtype=int)

# Step 3: wt(s) <= 3 means every error is in the first 12 positions.
if weight(s) <= 3:
  print("ERROR CORRECTION 3")
  e = concatenate((s, zeros12))

# Step 4: wt(sB) <= 3 means every error is in the last 12 positions.
if weight(sdotb) <= 3:
  print("ERROR CORRECTION 4")
  e_new = concatenate((zeros12, sdotb))
  e = (e + e_new) % 2

# Step 5: one error at position j in the last 12, at most two in the first 12.
for j in range(12,24):
  if weight((geng24[:,j] + s)%2) <= 2:
    print("ERROR CORRECTION 5")
    e[j] = 1
    e_new = concatenate(((geng24[:,j] + s)%2, zeros12))
    e = (e + e_new) % 2

# Step 6: one error at position j in the first 12, at most two in the last 12.
for j in range(0,12):
  if weight((sdotb + B[j,:])%2) <= 2:
    print("ERROR CORRECTION 6")
    e[j] = 1
    e_new = concatenate((zeros12, (sdotb + B[j,:])%2))
    e = (e + e_new) % 2

print("\ne =  [ " + " ".join(str(i) for i in e) + " ]\n")

# corrected codeword; the message is its first 12 bits
c = ((e + r)%2)
m = array(c[:12])

print("r =  " + str(r))
print("c =  " + str(c))
print("m =  " + str(m))

Here is golay.sh

#!/bin/bash
 
echo "Running with Example Problem"
echo "==========================="
./golay.py 1,1,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,1,0
 
echo "
Running with Problem1"
echo "==========================="
./golay.py 0,1,0,0,0,0,1,1,0,1,0,1,1,0,1,1,0,1,0,0,0,0,1,1
 
echo "
Running with Problem2"
echo "==========================="
./golay.py 0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,1,1,1,0,1,1,0,0
 
echo "
Running with Problem3"
echo "==========================="
./golay.py 1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1
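To try inputs beyond the four above, you can build your own received words: encode a 12-bit message with G = [I | B], flip up to three bits, and print the result in the comma-separated form golay.py expects. The sketch below is not part of the original solution. It assumes (from reading the B matrix in golay.py) that the inner 11x11 block of B is a circulant, and the message and error positions are arbitrary picks.

```python
import numpy as np

# First row of the 11x11 circulant block inside B (an observation read off
# the B matrix in golay.py: each later row is the previous one rotated right).
FIRST_ROW = [1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0]

def build_generator():
    """Rebuild G = [I | B] as a 12x24 array of 0s and 1s."""
    B = np.zeros((12, 12), dtype=int)
    B[:11, 0] = 1                      # first column: ones except the last row
    for i in range(11):
        B[i, 1:] = np.roll(FIRST_ROW, i)
    B[11, 1:] = 1                      # last row: a zero followed by eleven ones
    return np.hstack([np.eye(12, dtype=int), B])

def encode(message, G):
    """Codeword for a 12-bit message; the first 12 bits are the message."""
    return np.dot(message, G) % 2

def corrupt(codeword, positions):
    """Copy of the codeword with the bits at the given positions flipped."""
    r = codeword.copy()
    r[list(positions)] ^= 1
    return r

def syndrome(r, G):
    """s = r G^T mod 2, the same quantity golay.py computes."""
    return np.dot(r, G.T) % 2

G = build_generator()
m = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1])  # arbitrary example message
c = encode(m, G)
r = corrupt(c, [2, 13, 20])                          # arbitrary error positions
print(",".join(str(bit) for bit in r))               # argument for ./golay.py
```

A clean codeword has an all-zero syndrome, so `syndrome(r, G)` depends only on which bits were flipped. With three or fewer flips, golay.py should hand back `c` and `m`.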

Encrypt a message with RSA in python

Update
As an aside while I’m going through old posts: there’s a lot more that needs to be done to make this “good”. For one, the message shouldn’t be fed directly to RSA; it should go through a padding scheme (or, for signatures, a hash) first. For another, there’s no authentication. For another, unpickling data straight off the network is unsafe and can lead to code execution. What’s below is just a toy showing basic usage.

For some people in my class this was easy, and for others it was difficult. Some people have spent a good 40 hours on this, so I thought I’d post some code to help out. There isn’t much documentation on the crypto modules.
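If it helps to see what the library calls boil down to, here is textbook RSA on tiny integers in plain Python. It is a toy sketch only: the primes, exponent, and message are made-up illustration values, there is no padding, and `encrypt`/`decrypt` here are local helpers, not the Crypto module’s methods. It also passes the public key around as JSON integers, which sidesteps the pickle concern from the update note.

```python
import json

# Toy parameters, far too small for real use (illustration values only).
p, q = 61, 53
n = p * q                      # modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent; needs Python 3.8+

def encrypt(m, public_key):
    """Textbook RSA: c = m^e mod n."""
    n, e = public_key
    return pow(m, e, n)

def decrypt(c, private_key):
    """Textbook RSA: m = c^d mod n."""
    n, d = private_key
    return pow(c, d, n)

# Ship the public key as plain JSON integers instead of a pickled object.
wire = json.dumps({"n": n, "e": e})
received = json.loads(wire)
public_key = (received["n"], received["e"])

c = encrypt(65, public_key)
print(c, decrypt(c, (n, d)))   # 2790 65
```

Only `n` and `e` cross the wire; `d` never leaves the side that generated the key.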

server.py

#!/usr/bin/env python

from Crypto.Hash import MD5
from Crypto.PublicKey import RSA
from Crypto.Util import randpool

import pickle
import socket
import sys

#generate the RSA key
blah = randpool.RandomPool()
RSAKey = RSA.generate(512, blah.get_bytes)

RSAPubKey = RSAKey.publickey()

#listen for a connection
host = ''
port = 12345

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((host,port))
s.listen(1)

print "Server is running on port %d; press Ctrl-C to terminate." % port

while 1:
  clientsock, clientaddr = s.accept()
  print "got connection from ", clientsock.getpeername()
  #send the public key over
  clientsock.send(pickle.dumps(RSAPubKey))

  rcstring = ''
  while 1:
    buf = clientsock.recv(1024)
    rcstring += buf
    if not len(buf):
      break
  clientsock.close()
  #done with the network stuff, at least for this connection

  #encmessage is the cipher text
  encmessage = pickle.loads(rcstring)

  print RSAKey.decrypt(encmessage)

client.py

#!/usr/bin/env python
from Crypto.Hash import MD5
from Crypto.PublicKey import RSA
from Crypto.Util import randpool

import pickle
import socket

host = 'localhost'
port = 12345

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

s.connect((host, port))

#a real client should loop on recv until a delimiter (or a known length)
#has been read; a single recv may return only part of the key
rcstring = s.recv(2048)

#this object is of type RSAobj_c and holds only the public key, so
#encryption is possible, but not decryption
publickey = pickle.loads(rcstring)

print publickey

#encrypt the top secret data
secretText = publickey.encrypt("Hello, this is Rich.", 32)

s.sendall(pickle.dumps(secretText))
s.close()
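The comment in client.py about looping until a delimiter is read points at a real gap: one `recv` call is not guaranteed to return a whole message. One common approach is a length prefix. The sketch below is a suggestion, not what the code above does: it is written for Python 3, the helper names `send_msg`, `recv_exact`, and `recv_msg` are made up, and the payloads are placeholder bytes.

```python
import socket
import struct

def send_msg(sock, payload):
    """Send one message: 4-byte big-endian length, then the payload bytes."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock, count):
    """Keep calling recv until exactly count bytes have arrived."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        chunks.append(chunk)
        count -= len(chunk)
    return b"".join(chunks)

def recv_msg(sock):
    """Read the length prefix, then exactly that many payload bytes."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

# Demo on a connected pair of local sockets, so no server is needed.
left, right = socket.socketpair()
send_msg(left, b"public key bytes")
send_msg(left, b"ciphertext bytes")
first = recv_msg(right)
second = recv_msg(right)
print(first, second)
left.close()
right.close()
```

With framing like this, the server would not need to wait for the client to close the connection to know the ciphertext is complete, and both messages can share one socket.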