While reviewing my old college IT class's backup system (disk space was running low), I found what I deem a terrible flaw… a weekly 1:1 backup of EVERY user's files, close to 40 GB each week. I thought this could be done better. I know for a fact that many of the files get backed up again and again… they never change. Ever.
I then set myself a challenge: to code my own incremental backup system, using only Python and the Linux filesystem, extfs.
In my previous college's IT classes, the students were the ones running the IT 'department'. This meant that students were responsible for each other's data in backup situations. A simple solution was enforced: buying 2 TB of storage (two 1 TB drives) and setting up a weekly 1:1 backup of ALL users (including those who had graduated).
It was awesome at first. From then on, any user could browse every user's directory as it looked in any given week and see what had changed, and so forth. The webserver was even set up to serve each user's directory from different weeks via a subdomain like "week35" for week 35. For example, my site is available here: http://rtgkom.dk/~michaelgb07/, it is possible to view my news feed here: http://rtgkom.dk/~michaelgb07/…, and here it is shown as a copy from Week 1: http://week01.rtgkom.dk/~michaelgb07/….
The downside, of course, was the 1:1 backup, which took its toll on hard disk space. I decided to find a solution. Chief among the requirements: the transparency must not be harmed. This meant I couldn't just copy the changed and new files into a new week's directory, since I'd lose the context of the previous weeks (the files that already existed, and the files that had been deleted). It also meant the system could not require a special tool to view the backups (no database approach, and no viewer for XML files and the like); it had to be done in the pure file system.
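One way to satisfy both requirements on extfs is hard links: each new week's directory is a complete, browsable tree, but unchanged files share their inode with the previous week's copy, so only changed data costs disk space. The sketch below is a minimal illustration of that idea, not the actual system described later; the `snapshot` function name and the size/mtime change check are my own assumptions.

```python
import os
import shutil

def snapshot(source, prev_week, new_week):
    """Create a full-looking weekly snapshot that only stores changed files.

    Unchanged files are hard-linked to the previous week's copy, so every
    week's directory can be browsed as a complete 1:1 tree while new data
    is written to disk only once. (Illustrative sketch, not the real tool.)
    """
    for dirpath, dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(new_week, rel), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            prev = os.path.join(prev_week, rel, name)
            dst = os.path.join(new_week, rel, name)
            st = os.stat(src)
            if os.path.exists(prev):
                pst = os.stat(prev)
                # Crude "unchanged" heuristic: same size and same mtime.
                if st.st_size == pst.st_size and int(st.st_mtime) == int(pst.st_mtime):
                    os.link(prev, dst)   # unchanged: share the old inode
                    continue
            shutil.copy2(src, dst)       # new or changed: store a fresh copy
```

Deleted files simply never appear in the new week's tree, so browsing week N versus week N+1 shows additions, changes, and deletions with no extra tooling; this is essentially what `rsync --link-dest` or `cp -al` plus `rsync` do.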