
So, I tested my little bounding-box idea, and it seems to work. Just throw away any vertex that is farther than the distance limit from the one you're comparing with along any of the principal axes. The fastest way to implement this may depend on a number of factors, but I brought the running time down to 41% of the original on the first try, which I count as a success. A diff file with the changes I made is attached. I decided to do the bounding-box test even when the pyd is present, but since the pyd version without my test still seems to be faster (I haven't tested this, but that's what it looks like from your numbers), that might not be ideal. So you might want to experiment some more with this.
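To illustrate the bounding-box test, here is a minimal sketch in Python. It assumes the comparison is a plain Euclidean distance check against a fixed limit on 3D coordinates. The names `within_box`, `close_vertices` and `limit` are made up for this example and are not necessarily the ones used in the attached diff.

```python
import math


def within_box(a, b, limit):
    """Cheap pre-test: True if b differs from a by at most `limit` on every axis."""
    return (abs(a[0] - b[0]) <= limit
            and abs(a[1] - b[1]) <= limit
            and abs(a[2] - b[2]) <= limit)


def close_vertices(target, vertices, limit):
    """Return the vertices that lie within `limit` of `target` (hypothetical helper)."""
    result = []
    for v in vertices:
        if not within_box(target, v, limit):
            # Too far along at least one axis, so it cannot be within
            # `limit` in Euclidean distance either. Skip the costly test.
            continue
        dx = target[0] - v[0]
        dy = target[1] - v[1]
        dz = target[2] - v[2]
        if math.sqrt(dx * dx + dy * dy + dz * dz) <= limit:
            result.append(v)
    return result
```

The box test can only reject vertices that the full distance test would reject anyway, so the result is unchanged. How much time it saves depends on how many vertices fall outside the box.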
-- I'm not mad at you, just Westphalian.