FreedomCoder Information for free-minded geeks

13 Jun 2010

Github Email Crawler

A small proof of concept that harvests email addresses using Google, Ruby and some Sunday coding.

One of the things that GitHub discloses (if provided) is the user's email address. Beyond that, it also discloses information such as full name, website, location, etc., as shown in the image below.

Taking advantage of this, and of the fact that these pages are crawled and indexed by Google, it is really simple to search for GitHub profile pages using a query close to this one:

site:github.com intitle:Profile
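As a rough sketch, the query above can be turned into a search URL with Ruby's standard library. The Google endpoint (`www.google.com/search` with a `q` parameter) and the variable names are assumptions for illustration, not something taken from the original script.

```ruby
require 'uri'

# The query string from the post; the search endpoint below is an assumption.
query = 'site:github.com intitle:Profile'

# encode_www_form takes care of escaping the colons and the space.
search_url = URI::HTTPS.build(
  host: 'www.google.com',
  path: '/search',
  query: URI.encode_www_form(q: query)
)

puts search_url
```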

After that it is just a matter of retrieving each profile URL with a really simple regex like:

response.body.scan(/"http:\/\/github\.com\/([^"\/]+)"/)
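Here is a minimal sketch of that extraction step, run against a made-up HTML fragment rather than a live results page. The `extract_profiles` helper and the sample markup are hypothetical. The pattern is a slightly tighter variant of the idea above: it stops at the first quote or slash, so each match is a single user name.

```ruby
# Hypothetical helper: pull GitHub user names out of a chunk of HTML.
def extract_profiles(html)
  # scan with one capture group returns an array of one-element arrays,
  # so flatten it and drop duplicates.
  html.scan(/"http:\/\/github\.com\/([^"\/]+)"/).flatten.uniq
end

# Made-up sample markup standing in for a search results page.
sample = '<a href="http://github.com/alice">alice</a> ' \
         '<a href="http://github.com/bob">bob</a> ' \
         '<a href="http://github.com/alice">alice again</a>'

profiles = extract_profiles(sample)
puts profiles
```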

which should be followed by a GET request for each and every profile. It is worth mentioning that some email addresses are encoded to keep simple bots from crawling them, but this is easy to bypass, since they are only URL-encoded.

require 'cgi'
text.gsub!(/eval\(decodeURIComponent\('.*?'\)\)/) { |a| CGI.unescape(a) }
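To see the decoding in isolation, here is a small sketch that runs on a made-up string. The `decode_obfuscated` helper and the sample payload are assumptions for illustration. Unlike the one-liner above, it replaces the whole `eval(...)` wrapper with just the decoded payload.

```ruby
require 'cgi'

# Hypothetical helper: replace each eval(decodeURIComponent('...')) wrapper
# with its URL-decoded payload.
def decode_obfuscated(text)
  text.gsub(/eval\(decodeURIComponent\('(.*?)'\)\)/) do
    CGI.unescape(Regexp.last_match(1))
  end
end

# Made-up sample; real pages may wrap the payload differently.
sample = "<script>eval(decodeURIComponent('%3Ca%20href%3D%22mailto%3Auser%40example.com%22%3Euser%40example.com%3C%2Fa%3E'))</script>"

decoded = decode_obfuscated(sample)
puts decoded
```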

Once we have the profiles we can start gathering emails from the github.com site. Even though this is just a simple proof of concept, there is plenty of information that could be gathered to aid different types of social-engineering attacks.

Anyway, after a few minutes I had a really crappy and simple script that crawls Google and finds GitHub.com profiles in order to obtain all the disclosed email addresses.

(As I previously mentioned, this could be expanded to harvest much more profile-able data.)

Enjoy ! :)

