At DevelopmentNow we are doing some work for a client that involves scraping and reformatting their data to build a mobile-friendly website. This is a big customer with a big IT department, so as you might expect we often need access to their internal network and are either denied it or delayed to the point where we cannot build and test the code that hits their sites. One such situation arose recently, when we needed to point our scraper code at their internal QA machine, which runs the latest version of their site, to determine whether the changes in that version were compatible with our code. Unfortunately, the QA site is only accessible from a machine to which I have SSH access, and only while I am logged in to the VPN they provided. What to do?

I first tried creating an SSH tunnel and having our scraper code connect to it. A tunnel like that forwards to a single host (qa-machine) on their network, but qa-machine redirects back and forth with an intermediate login server, so I got blocked there. A SOCKS proxy seemed like the best bet, since it is not tied to one destination and will carry traffic to any host.
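To make the difference between the two kinds of tunnel concrete, here is a minimal sketch that only builds the two ssh command lines. The helper names are made up for illustration, port 80 on qa-machine is an assumption (the real port may differ), and the -N flag (no remote command, just forwarding) is my addition rather than something I originally used.

```ruby
# Hypothetical helpers that build the argv for each kind of SSH tunnel.
# Nothing here runs ssh; pass the array to system(*argv) to start a tunnel.

# Local forward: one local port maps to exactly one remote host and port,
# so redirects to any other host (such as a login server) are not covered.
def local_forward_argv(gateway, local_port, target_host, target_port)
  ["ssh", "-N", "-L", "#{local_port}:#{target_host}:#{target_port}", gateway]
end

# Dynamic forward: ssh listens as a SOCKS proxy on local_port, and the
# destination is chosen per connection by the client application.
def dynamic_forward_argv(gateway, local_port)
  ["ssh", "-N", "-D", local_port.to_s, gateway]
end

# Port 80 is an assumed value for the QA site.
puts local_forward_argv("clientmachine", 8080, "qa-machine", 80).join(" ")
puts dynamic_forward_argv("clientmachine", 8080).join(" ")
```

The first form is what got stuck on the login redirects; the second is the SOCKS approach used in the rest of this post.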

I was hoping I could just establish a SOCKS proxy with my SSH client, point my Safari browser at it, and do some initial testing. In other words, run ssh -D 8080 clientmachine, change my network settings to use a SOCKS proxy on localhost:8080, and browse to qa-machine. However, the Cisco VPN client seems to hijack all connections, so even after doing this everything was still going through the VPN, which meant I could not connect to the site I wanted. When I am logged into clientmachine over SSH I can reach qa-machine with a tool like wget, but I cannot reach it from desktop apps like Firefox on my OS X laptop while connected to the VPN.

I was able to run Firefox on clientmachine over X11 forwarding, but it was slow and the browser crashed often enough to make this intolerable. A cool idea, but unusable.

My real goal, after all, was to run our Rails app code locally in such a way that it behaved as if it were running on clientmachine, which is a machine with access to qa-machine. I could have installed the code on clientmachine and started the Rails app there, but then I had no idea how I would connect from my machine to the app running on clientmachine. Perhaps some SSH reverse tunnel? I did not want to go there.

I then checked whether mechanize (the gem we use as the base of our scraper code) had support for SOCKS proxies, and it did not. The one problem with SOCKS proxies is that your application has to support them, and lots of software does not. Web browsers generally do, but not much else, so you typically cannot use your favorite Twitter client. So this was a dead end.
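To show what "supporting SOCKS" means for an application, here is a rough sketch of the first bytes a SOCKS5 client has to send before any real traffic, following RFC 1928 and assuming a proxy that needs no authentication. The function names are hypothetical, port 80 is an assumed value, and this is not how mechanize or any particular library does it.

```ruby
# Sketch of the opening bytes of a SOCKS5 session (RFC 1928),
# assuming a proxy that accepts the "no authentication" method.

# Greeting: version 5, one auth method offered, method 0x00 (no auth).
def socks5_greeting
  [0x05, 0x01, 0x00].pack("C*")
end

# CONNECT request addressed by hostname (address type 0x03):
# version, command, reserved byte, address type, name length,
# the name itself, then the port as a 16-bit big-endian integer.
def socks5_connect_request(host, port)
  [0x05, 0x01, 0x00, 0x03, host.bytesize].pack("C*") + host.b + [port].pack("n")
end

# Port 80 is an assumed value for the QA site.
p socks5_greeting.bytes
p socks5_connect_request("qa-machine", 80).bytes
```

An application that does not know how to speak this little protocol will just open a plain TCP connection to the destination, which is why the proxy has to be supported by the application or injected underneath it.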

Finally, I looked around and found a neat little gem called socksify. Once installed, it gives you an executable that re-routes all TCP connections made by a Ruby program through a SOCKS proxy. So I first created the SOCKS proxy using ssh -D 8080 clientmachine and then ran socksify_ruby localhost 8080 ./script/server -p 9090. I then hit http://localhost:9090 and voila! With this in place, all of the scraper code running inside our Rails app is directed through the SOCKS proxy, so to the scraper it looks as if it is running on clientmachine, where qa-machine is accessible. I was not sure whether the VPN was hijacking the traffic again (I’d be highly surprised, since this was happening over an encrypted SSH connection, but…), so I shut down the proxy by killing its window. When I then tried to hit the Rails app I got “Connection refused”, which means the app really was sending its traffic through the proxy. Nice little utility if you need to tunnel your traffic through a SOCKS proxy.
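The "Connection refused" check can also be done before starting the app. Here is a minimal sketch using only Ruby's standard socket library; the helper name proxy_listening? is made up for illustration, and the host and port are the ones from the commands above. It only tells you that something is accepting connections on the port, not that it is actually a SOCKS proxy.

```ruby
require "socket"

# Hypothetical helper: returns true if something accepts TCP connections
# on host:port, false if the connection attempt fails (for example with
# "Connection refused") or the host name cannot be resolved.
def proxy_listening?(host, port)
  TCPSocket.new(host, port).close
  true
rescue SystemCallError, SocketError
  false
end

if proxy_listening?("localhost", 8080)
  puts "Port 8080 is open; start the app with socksify_ruby"
else
  puts "Nothing listening on localhost:8080; run ssh -D 8080 clientmachine first"
end
```

Running this after killing the ssh window should print the second message, which is the same failure the Rails app hit.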