NOTE: you can find the newest version at: http://blog.windfluechter.net/content/blog/2011/03/30/1095-updated-automatically-restore-files-lostfound
Ok, my last version was a pure Bash solution: working, but slow. There were some comments on how to improve the performance, and I finally decided to reimplement the second script in Python.
The Bash script didn’t finish within a day. The Python script finishes after 1-2 hours in my test scenario. So, here are the scripts again:
make-lsLR.sh – call this regularly (cron) to create the needed files that are stored in /root/. Of course you can alter the location easily and exclude other directories from being scanned.
check_lost+found.py – The second script is to be run when fsck has messed up your files and moved them into the lost+found directory. It takes 3 arguments: 1) the source directory where your messed-up lost+found directory is, 2) the target directory to which the data will be saved and 3) a switch to actually make it happen instead of a dry-run.
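For readers who only want the idea, here is a minimal sketch of the matching step. It is not the downloadable script: the index format (md5sum-style lines, "checksum, whitespace, path") and the names md5_of, load_index and restore are assumptions made for this example.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to save memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_index(index_path):
    """Map md5 -> original path. Assumes md5sum-style lines (hypothetical format)."""
    index = {}
    with open(index_path, encoding="utf-8", errors="surrogateescape") as handle:
        for line in handle:
            # One split only, so paths containing spaces stay in one piece.
            parts = line.rstrip("\n").split(None, 1)
            if len(parts) == 2:
                index[parts[0]] = parts[1]
    return index


def restore(source, target, index, make_it_happen=False):
    """Plan copies from source to target/<original path>; copy only if asked."""
    planned = []
    for name in sorted(os.listdir(source)):
        candidate = os.path.join(source, name)
        if not os.path.isfile(candidate):
            continue
        original = index.get(md5_of(candidate))
        if original is None:
            continue  # checksum not in the index, leave the file alone
        destination = os.path.join(target, original.lstrip("/"))
        planned.append((candidate, destination))
        if make_it_happen:
            os.makedirs(os.path.dirname(destination), exist_ok=True)
            shutil.copy2(candidate, destination)
    return planned
```

Calling restore() without make_it_happen=True only returns the planned (source, destination) pairs, which mirrors the dry-run default described above. The sketch only looks at regular files directly inside the source directory.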
4 thoughts on “Automatically restore files from lost+found – improved”
I can never resist the urge to offer code reviews to publicly posted Python code. Feel free to ignore it.
0. As an aside, Python's insistence on indentation breaks down in cases like this — when buggy blog software removes it. This is about the only downside of semantic indentation that I've seen.
1. The string module is obsolete, you can express most of the operations more concisely by using string object methods, e.g. entry.split(' ', 2), entry.replace(”
2. It would be less jarring if you picked one consistent way of referring to functions from the same module — e.g. os.path.isdir versus just isdir after using 'from os.path import *'.
3. Many experienced Python programmers will suggest avoiding 'import *', except in very rare special cases.
4. A bare 'except:' clause often catches too much (e.g. the KeyboardInterrupt exception when the user tries to abort your script by pressing ^C), it's a good habit to explicitly list the exceptions you want to catch (OSError in your case).
5. There's no need to strip the newline if all you're doing with the line of text is splitting it.
6. some_list[4:] is indistinguishable from some_list
7. str(something_that_is_already_a_string) is a no-op.
8. beware string.split(' '): it splits on a single space character, unlike string.split() which splits on runs of whitespace characters; use string.split(None, 4) if you want to limit the number of splits
9. if you don't care about compatibility with older Python versions, 'if md5s in sfiles:' looks better than 'if sfiles.has_key(md5s):'.
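Points 4, 8 and 9 are easier to see in a few lines of code. This is only an illustration: the line format and the directory name below are made up for the example and are not what the scripts actually use.

```python
import os

# Hypothetical index line: four metadata fields, then a path containing a space.
line = "f  100 644 d41d8cd98f00b204e9800998ecf8427e /home/user/my file.txt\n"

# Point 8: split(" ") breaks on every single space, so the double space
# yields an empty field and the path is cut in two.
by_space = line.split(" ")

# split() collapses runs of whitespace, but still cuts the path in two.
by_runs = line.split()

# split(None, 4) stops after four splits, so the path stays in one piece.
# With a split limit the last field keeps its trailing newline, so strip it.
fields = line.rstrip("\n").split(None, 4)
path = fields[4]

# Point 9: a membership test instead of has_key().
sfiles = {"d41d8cd98f00b204e9800998ecf8427e": path}
found = fields[3] in sfiles

# Point 4: catch only the error you expect, so ^C still aborts the script.
try:
    os.rmdir("/nonexistent/hypothetical/dir")
    removed = True
except OSError:
    removed = False
```

Note that once a split limit is used, the newline is no longer discarded automatically, so the advice in point 5 only holds for a plain split().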
Thanks for your comments!
ad 0) yes, true, that's why I put a download link there as well… 😉
ad 1) there's no mention of string being obsolete in my 10-year-old Python book, but I'll try to keep this in mind. 🙂
ad 2) well, true.
ad 3) same, but I wouldn't call myself an experienced Python programmer, just an occasional one… 🙂
ad 4) Thanks! changed.
ad 5, 6, 7, 8) list[4:] is because of filenames with spaces in them, and without the additional  I often got ['/path/file'] strings, which resulted in target path names like /target/to/['/path/filename']. I think this was also the reason for the str() function there. Hmmm, OTOH, str() might cause the ['']. I'll try with string.split(None, 4) then…
ad 9) this isn't covered by my 10-year-old book, I think, so thanks again! 🙂
Hint: There's no need to re-compute md5sums for the files listed in /var/lib/dpkg/info/*.md5sums
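A rough sketch of how those checksums could be read instead of recomputed. It assumes the usual layout of these files (one "checksum, whitespace, path" pair per line, with the path relative to the filesystem root); the function name load_dpkg_md5sums is made up for this example.

```python
import glob
import os


def load_dpkg_md5sums(info_dir="/var/lib/dpkg/info"):
    """Map md5 -> absolute path from dpkg's per-package *.md5sums files."""
    known = {}
    for listing in sorted(glob.glob(os.path.join(info_dir, "*.md5sums"))):
        with open(listing, encoding="utf-8", errors="surrogateescape") as handle:
            for line in handle:
                parts = line.rstrip("\n").split(None, 1)
                if len(parts) == 2:
                    # Paths in these files have no leading slash, so add one.
                    # Files with identical content share a checksum; the
                    # last one read wins in this simple mapping.
                    known[parts[0]] = "/" + parts[1]
    return known
```

The info_dir parameter is there so the function can be pointed at a copy of the directory, for example one taken from a backup.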
Oh, thx… true, but files installed by dpkg can be re-installed easily anyway. So the main focus is on private files like images, pictures and documents. I just collect all files for simplicity.
I plan to write another script to fix file permissions in case of a mistake done with chown/chmod, btw…