Github2S3: Back Up GitHub Repositories to Amazon S3

Author: Akhil Bansal

Cross Posted from http://vinsol.com/blog

Nearly everyone knows GitHub by now: it is a service for hosting git repositories. At Vinsol, we use GitHub extensively to host all (50+) of our git repositories. Although GitHub is an awesome service, we really miss one feature: archiving a repository somewhere outside GitHub, similar to archiving a project in Basecamp. Since every GitHub account has a limit on the number of private repositories, we wanted a way to archive or back up inactive repositories to S3 so that we stay within this limit.

As GitHub provides no such feature, we wrote a Ruby script last week that backs up git repositories with all their tags and branches. The script reads the repository information from a YML file and uploads each compressed repository to S3.

You can download the Ruby script and the YML file from https://github.com/vinsol/github2s3. After downloading, edit github_repos.yml to add your repositories and their clone URLs. You also need to set your AWS access key, AWS secret access key, and bucket name in github2s3.rb:

# AWS S3 credentials
AWS_ACCESS_KEY_ID = "ACCESS_KEY"
AWS_SECRET_ACCESS_KEY = "SECRET_KEY"

# S3 bucket name to put dumps
S3_BUCKET = "github-backup"
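For the github_repos.yml side of the configuration, here is a minimal sketch of how such a file could be laid out and read from Ruby. The layout and the git_clone_url key are assumptions made for illustration; check the github_repos.yml shipped in the repository for the format the script actually expects.

```ruby
require "yaml"

# Hypothetical layout for github_repos.yml. The key name "git_clone_url"
# is an assumption, not necessarily what github2s3.rb reads.
SAMPLE_REPOS_YML = <<~YML
  test_repo:
    git_clone_url: "git@github.com:account/test_repo.git"
  another_test_repo:
    git_clone_url: "git@github.com:account/another_test_repo.git"
YML

# Parse the YAML text and return a hash of repository name => clone URL.
def load_repos(yaml_text)
  data = YAML.safe_load(yaml_text) || {}
  data.each_with_object({}) do |(name, attrs), repos|
    repos[name] = attrs.fetch("git_clone_url")
  end
end

load_repos(SAMPLE_REPOS_YML).each do |name, url|
  puts "#{name} -> #{url}"
end
```

To read a real file instead of the inline sample, File.read("github_repos.yml") can be passed to load_repos.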

Once you have made these changes, run the script with “ruby github2s3.rb”. It clones each repository listed in the YML file, compresses it, and uploads it to S3.
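The clone and compress steps can be sketched roughly as follows. This is not the code from github2s3.rb: the backup_commands helper, the working directory, and the choice of git clone --mirror plus tar are assumptions chosen because a mirror clone keeps all branches and tags. The sketch only builds and prints the commands, and the upload to S3 is left out.

```ruby
require "shellwords"

# Hypothetical helper: returns the shell commands (as argument arrays)
# that would mirror-clone one repository and pack it into a tarball.
def backup_commands(name, clone_url, work_dir = "/tmp/github2s3")
  mirror_dir = File.join(work_dir, "#{name}.git")
  archive    = File.join(work_dir, "#{name}.tar.gz")
  [
    # --mirror copies every ref (all branches and tags), not just the default branch
    ["git", "clone", "--mirror", clone_url, mirror_dir],
    # -C makes the archive contain "<name>.git" rather than the full path
    ["tar", "-czf", archive, "-C", work_dir, "#{name}.git"]
  ]
end

backup_commands("test_repo", "git@github.com:account/test_repo.git").each do |cmd|
  puts cmd.shelljoin
end
```

Each argument array could be handed to system(*cmd) to actually run it. Keeping the arguments as an array avoids shell quoting problems with unusual repository names.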

Note:
– You must have permission to clone the GitHub repositories
– You must have git and Ruby installed, along with the aws-s3 and colorize gems
– You can also pass clone URLs as command-line arguments, e.g.: ruby github2s3.rb git@github.com:account/test_repo.git git@github.com:account/another_test_repo.git
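When clone URLs come from the command line rather than the YML file, a name for each backup has to be derived from the URL. One way this could be done is sketched below. The repos_from_args helper is hypothetical, and the actual script may name its backups differently.

```ruby
# Hypothetical helper: maps a list of clone URLs to a hash of
# repository name => clone URL, using the last path segment
# without its ".git" suffix as the name.
def repos_from_args(args)
  args.each_with_object({}) do |url, repos|
    repos[File.basename(url, ".git")] = url
  end
end

sample_args = [
  "git@github.com:account/test_repo.git",
  "git@github.com:account/another_test_repo.git"
]

repos_from_args(sample_args).each do |name, url|
  puts "#{name} -> #{url}"
end
```

In a real script, ARGV would be passed in place of sample_args.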

Restore:
Restoring a repository from a backup is very simple: download the repository backup from S3, uncompress it, and run:

git push --mirror git@github.com:mycompany/my-new-repo.git

That’s it.
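For reference, the restore steps might be scripted along these lines. The sketch assumes the backup has already been downloaded from S3 and that the archive unpacks to a single repository directory. The restore_commands helper, the paths, and the directory name are all hypothetical, and the commands are printed rather than executed.

```ruby
require "shellwords"

# Hypothetical helper: returns the commands that unpack a downloaded
# backup and push every ref to a new, empty remote repository.
def restore_commands(archive_path, repo_dir_name, new_remote_url,
                     work_dir = "/tmp/github2s3-restore")
  repo_dir = File.join(work_dir, repo_dir_name)
  [
    ["mkdir", "-p", work_dir],
    ["tar", "-xzf", archive_path, "-C", work_dir],
    # git -C runs the push from inside the unpacked repository
    ["git", "-C", repo_dir, "push", "--mirror", new_remote_url]
  ]
end

restore_commands(
  "test_repo.tar.gz",
  "test_repo.git",
  "git@github.com:mycompany/my-new-repo.git"
).each { |cmd| puts cmd.shelljoin }
```

Note that git push --mirror overwrites the refs on the target, so it is safest to point it at a freshly created repository.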

If you have any feedback or suggestions on this approach to archiving GitHub projects, please share them in the comments.
