Automatically restore files from lost+found

6 thoughts on “Automatically restore files from lost+found”

  1. The problem is that the data structure you're using is horribly slow for this purpose. You could speed it up *tremendously* by using a prefix tree. Of course, since you want to stick with bash, we'll have to resort to a simpler version.
    Basically, you split ls-md5sum-files.txt into 16 files, one per hex digit: ls-md5sum-files-0.txt, …, ls-md5sum-files-f.txt.
    At build time, sort your hashes into these files according to the first character of the hash. Do the same at grep time: search only the file that matches the first character of the hash you're looking for.
    This should speed up your search by roughly a factor of 16. You can split on the first 2 characters instead, giving you roughly a 256x speed-up, and so on.
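
A minimal sketch of the bucketing idea above, in Python rather than bash and using in-memory lists in place of the 16 text files. The function names `build_buckets` and `find_path` are made up for illustration, and the "roughly 16x" figure assumes the digests are spread evenly over the hex digits.

```python
from collections import defaultdict


def build_buckets(entries, prefix_len=1):
    """Group (md5_hex, path) pairs by the first prefix_len hex characters."""
    buckets = defaultdict(list)
    for digest, path in entries:
        digest = digest.lower()
        buckets[digest[:prefix_len]].append((digest, path))
    return buckets


def find_path(buckets, digest, prefix_len=1):
    """Scan only the bucket whose key matches the start of the digest."""
    digest = digest.lower()
    for candidate, path in buckets.get(digest[:prefix_len], []):
        if candidate == digest:
            return path
    return None
```

With `prefix_len=2` the same code gives 256 buckets; the lookup still scans a bucket linearly, which mirrors grepping one of the smaller files.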

  2. How about extending rdiff-backup or one of the other backup tools to do something similar?

  3. I don't have time to test this, but what about placing the md5 at the start of each line, sorting both files, then stepping through them a line at a time, finding and moving matches along the way?

    It just depends on whether two sorts would be quicker than grepping the file each time.
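
The sort-then-step idea above is essentially a merge join. Here is a rough Python sketch, assuming both listings have already been read into lists of `(md5_hex, path)` tuples; `merge_matches` is a hypothetical name, and actually moving the files is left out.

```python
def merge_matches(backup, recovered):
    """Return (recovered_path, backup_path) pairs that share an md5 digest.

    Both arguments are lists of (md5_hex, path) tuples in any order.
    """
    backup = sorted(backup)
    recovered = sorted(recovered)
    matches = []
    i = j = 0
    while i < len(backup) and j < len(recovered):
        b_hash, b_path = backup[i]
        r_hash, r_path = recovered[j]
        if b_hash == r_hash:
            matches.append((r_path, b_path))
            j += 1  # keep i in place: several recovered files may share a digest
        elif b_hash < r_hash:
            i += 1
        else:
            j += 1
    return matches
```

The two sorts cost O(n log n) each and the walk is linear, whereas grepping the full list once per recovered file grows with the product of the two list sizes.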

  4. Load the hashes into a Perl hash like so:

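    find . -type f -print0 |
    xargs -0 openssl sha1 |
    perl -ne '
    sub parsesha1 {
        # Body was lost in the original; assumes lines like: SHA1(path)= hexdigest
        return $_[0] =~ /^SHA1\((.*)\)= ([0-9a-f]+)$/ ? ($2, $1) : ();
    }
    ($hash, $file) = parsesha1($_);
    $file_hashes{$hash} = $file if defined $hash;
    END {
        foreach $hash (keys %file_hashes) {
            print "$hash:", $file_hashes{$hash}, "\n";
        }
    }
    '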

  5. The thing is, Bash really sucks at handling file names with spaces when you loop over them (which is what happens with the “for file in $filelist” construct). Don't do it. Plus, I don't think there is a hash table in bash; of course, you can make your own…

    Use perl instead, as I said before. Or, what would take even less time to code, use Python's pickle module to store the hashes + files in a machine-readable format, then just load it when you start up again. I haven't done this myself, but I've seen it done, and it seems trivial and a good thing to learn.
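
For what the pickle suggestion could look like, here is a small sketch. It assumes the index is a plain dict mapping each digest to a list of paths; the helper names `save_index` and `load_index` are invented for the example.

```python
import pickle


def save_index(index, filename):
    """Write the digest -> [paths] dict to disk in pickle format."""
    with open(filename, "wb") as fh:
        pickle.dump(index, fh)


def load_index(filename):
    """Read the dict back; only load pickle files you created yourself."""
    with open(filename, "rb") as fh:
        return pickle.load(fh)
```

Because the paths are stored as ordinary Python strings, file names containing spaces need no quoting or escaping, and a lookup such as `index.get(digest)` replaces the grep over the text file.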

