We can laugh (almost) about it now, but...
Our operations group, a VMS group but trying to learn UNIX, was assigned account administration. They were cleaning up a few non-used accounts like they do on VMS - backup and purge. When they came across the account "sccs", which had never been accessed, away it went. The "deleteuser" utility fom DEC asks if you would like to delete all the files in the account. Seems reasonable, huh?
On a old decstation 3100 I was deleting last semesters users to try to dig up some disk space, I also deleted some test users at the same time.
One user took longer then usual, so I hit control-c and tried ls. "ls: command not found"
Turns out that the test user had / as the home directory and the remove user script in ultrix just happily blew away the whole disk.
ftp, telnet, rcp, rsh, etc were all gone. Had to go to tapes, and had one LONG rebuild of X11R5.
Fortunately it wasn't our primary system, and I'm only a student....
We have a home-grown admin system that controls accounts on all of our machines. It has a remove user operation that removes the user from all machines at the same time in the middle of the night.
Well, one night, the thing goes off and tries to remove a user with the home directory '/'. All the machines went down, with varying ammounts of stuff missing (depending on how soon the script, rm, find, and other importing things were clobbered).
Nobody knew what what was going on! The systems were restored from backup, and things seemed to be going OK, until the next night when the remove-user script was fired off by cron again.
This time, Corporate Security was called in, and the admin group's supervisor was called back from his vacation (I think there's something in there about a helicopter picking the guy up from a rafting trip in the Grand Canyon).
By chance, somebody checked the cron scripts, and all was well for the next night...
I was working on a line printer spooler, which lived in /etc. I wanted to remove it, and so issued the command "rm /etc/lpspl." There was only one problem. Out of habit, I typed "passwd" after "/etc/" and removed the password file. Oops.
I called up the person who handled backups, and he restored the password file.
A couple of days later, I did it again! This time, after he restored it, he made a link, /etc/safe_from_tim.
About a week later, I overwrote /etc/passwd, rather than removing it.
After he restored it again, he installed a daemon that kept a copy of /etc/passwd, on another file system, and automatically restored it if it appeared to have been damaged.
Fortunately, I finished my work on /etc/lpspl around this time, so we didn't have to see if I could find a way to wipe out a couple of filesystems...
After a real bad crash (tm) and having been an admin (on an RS/6000) for less than a month (honest it wasn't my fault, yea right stupid) we got to test our backup by doing:
# cd /
# rm -rf *
ohhhhhhhh sh*t i hope those tapes are good.
Ya know it's kinda funny (in a perverse way) to watch the system just slowly go away.
My mistake on SunOS (with OpenWindows) was to try and clean up all the '.*' directories in /tmp. Obviously "rm -rf /tmp/*" missed these, so I was very careful and made sure I was in /tmp and then executed "rm -rf ./.*".
I will never do this again. If I am in any doubt as to how a wildcard will expand I will echo it first.
Cleaning out an old directory, I did 'rm *', then noticed several files that began with dot (.profile, etc) still there. So, in a fit of obtuse brilliance, I typed...
rm -rf .* &
By the time I got it stopped, it had chewed through 3 filesystems which all had to be restored from tape (.* expands to ../*, and the -r makes it keep walking up the directory tree). Live and learn...
rik@nella15.cc.monash.edu.au (Rik Harris) writes: [snippet about "using 'find' in an auto-cleanup script which blew away half of the source" deleted. -ed.]
If you're doing this using find always put -xdev in:
find /tmp/ -xdev -fstype 4.2 -type f -atime +5 -exec rm {} \;
This stops find from working its way down filesystems mounted under /tmp/. If you're using, say, perl you have to stat . and .. and see if they are mounted on the same device. The fstype 4.2 is pure paranoia.
Needless to say, I once forgot to do this. All was well for some weeks until Convex's version of NQS decided to temporarily mount /mnt under /tmp... Interestingly, only two people noticed. Yes, the chief op. keeps good backups!
Other triumphs: I created a list of a user's files that hadn't been accessed for three months and a perl script for him to delete them. Of course, it had to be tested, I mislaid a quote from a print statement... This did turn into a triumph, he only wanted a small fraction of them back so we saved 20 MB.
I once deleted the only line from within an if..then statement in rc.local, the sun refused to come up, and it was surprisingly difficult to come up single user with a writeable file system.
AIX is a whole system of nightmares strung together. If you stray outside of the sort of setup IBM implicitly assume you have (all IBM kit, no non IBM hosts on the network, etc.) you're liable to end up in deep doodoo.
One thing I would like all vendors to do (I know one or two do) is to give root the option of logging in using another shell. Am I the only one to have mangled a root shell?
Just imagine having the sendmail.cf file in /etc. Now, I was working on the sendmail stuff and had come up with lots of sendmail.cf.xxx which I wanted to get rid of so I typed "rm -f sendmail.cf. *". At first I was surprised about how much time it took to remove some 10 files or so. Hitting the interrupt key, when I finally saw what had happened was way to late, though.
Fortune has it that I'm a very lazy person. That's why I never bothered to just back up directories with data that changes often. Therefore I managed to restore /etc successfully before rebooting... :-) Happy end, after all. Of course I had lost the only well working version of my sendmail.cf...
Once I was going to make a new file system using mkfs. The device I wanted
to make it on was /dev/c0d1s8. The device name that I used, however, was
/dev/c0d0s8 which held a very important application. I had always been a
little annoyed by the 10 second wait that mkfs has before it actually makes
the file system. I'm sure glad it waited that time though. I probably waited
9.9 seconds before I realized my mistake and hit that DEL key just in time.
That was a near disaster avoided.
[ I wish all systems were like that. Linux mkfs doesn't wait, but at ]
[ least I have the source! -ed. ]
Another time I wasn't so lucky. I was a very new SA, and I was trying to clean some junk out of a system. I was in /usr/bin when I noticed a sub directory that didn't belong there. A former SA had put it there. I did an ls on it and determined that it could be zapped. Forgetting that I was still in /usr/bin, I did an rm *. No 10 second idiot proofing with rm. Now if some one would only create an OS with a "Do what I mean, not what I say" feature.
Gary "Experience is what allows you to recognize a mistake the second time you make it." Fowler
I once had "gnu-emacs" aliased to 'em' (and 'emacs' etc)
One day I wanted to edit the start up file and mistyped
# rm /etc/rc.local
instead of the obvious.
*Fortunately* I had just finished a backup and was now finding out the joys of tar and it's love of path names. [./etc/rc.local and /etc/rc.local and etc/rc.local) are *not* the same for tar and TK-50s take a *long* time search for non-existant files :(]
Of course the BREAK (Ctrl-P) key on a VAX and an Ultrix manual and a certain /etc/ttys line are just a horror story waiting to happen! Especially when the VAX and manuals are in a unsupervised place :)
Most of our disks reside on a single, high-powered server. We decided this probably wasn't too good an idea, and put a new disk on one of the workstations (particularly since the w/s has a faster transfer rate than the server does!). It's still really useful to be able to use all disks from the one machine, so I mounted the w/s disk on the server. I said to myself (being a Friday afternoon...see previous post) "it's only temporary.../mnt is already being used...I'll mount it in /tmp". So, I mounted on /tmp/a (or something). This was fine for a few hours, but then the auto-cleanup script kicked in, and blew away half of my source (the stuff over 2 weeks old). I didn't notice this for a few days, though. After I figured out what had happened, and restored the files (we _do_ have a good backup strategy), everything was OK.
Until a few months later. We were trying to convince a sysadmin from another site that he shouldn't NFS export his disks rw,root to everyone, so I mounted the disk to put a few suid root programs in his home directory to convince him. Well, it's only a temporary mount, so....
You guessed it, another Friday afternoon. I did a umount /tmp/b, and forgot about it. I noticed this one about halfway through the next day. (NFS over a couple of 64k links is pretty slow). The disk had not unmounted because it was busy...busy with two find scripts, happily checking for suid programs, and deleting anything over a week old. A df on the filesystem later showed about 12% full :-( Sorry Craig.
Now, I create /mnt1, /mnt2, /mnt3.... :-)
Remember....Friday afternoons are BAD news.
Well, after reading some of the stories in this thread I guess I can tell mine. I got an RS/6000 mod. 220 for my office about 6 months ago. The OS was preloaded so I had little chance to learn that process. Being used to a full-screen editor I was not happy with vi so I read in the manual that INED (IBM's editor for AIX) was full-screen and I logged in as root and installed it. I immediately started to play with the new editor and somehow found a series of keys that told the editor to delete the current directory. To this day I don't know what that sequence of keys was, but I was unfortunately in the /etc directory when I found it, and I got a prompt that said "do you want to remove this?" and I thought i was just removing the file I had been playing with but instead I removed /etc!
I got the chance to learn how to install AIX from scratch. I did reinstall INED even though I was a little gun-shy but I made sure that whenever I used it from then on I was *not* root. I have since decided that EMACS may be a better choice.
Well, waddya know... Some half hour ago, coming back from root (I was installing m4 on our system) [Shit, all my neato emacs tricks won't work. Damn, damn, damn kill, kill, KILL] to my own userid, I got this little message: "Can't find home directory /mnt0/crissl." and an other: "Can't lstat .". [Grrrrr, ^S and ^Q haven't been remapped...]
Guess what happened, not an hour ago... A collegue of mine was emptying some directories of computer-course accounts. As I did a "ps -t" on his tty, what did I see? "rm -rf .*"
Well, I'm not alone, he got sixteen other homedirectories as well. And guess what filesystems we don't make incremental backups of... And why not? Beats me...
I haven't killed him yet, he first has to restore the lot.
And for those "touch \-i" fans out there: you wouldn't have been protected...
A few months ago in comp.sys.hp, someone posted about their repairs to an HP 7x0, after a new sysadmin had just started work. They {the new person} had been looking throught the file system to try to make some space, saw /dev and the mainly 0 length files therein. Next command was "rm -f /dev/*" and they wondered why they couldn't login ;)
I think the result was that the new person was sent on a sysamin's course a.s.a.p
> ... if you're trying rm -rf / you'll NEVER get a clear disk - at least
> /bin/rm (and if it reached /bin/rmdir before scanning some directories
> then add a lot of empty directories). I've seen it once...
Then it must be version-dependent. On this Sun, "cp /bin/rm foo" followed by "./foo foo" does not leave a foo behind, and strings shows that rm appears not to call rmdir (which makes sense, as it can just use unlink()).
In any case, I'm reminded of the following article. This is a classic which, like the story of Mel, has been on the net several times; it was in this newsgroup in January. It was first posted in 1986.
Have you ever left your terminal logged in, only to find when you came back to it that a (supposed) friend had typed "rm -rf ~/*" and was hovering over the keyboard with threats along the lines of "lend me a fiver 'til Thursday, or I hit return"? Undoubtedly the person in question would not have had the nerve to inflict such a trauma upon you, and was doing it in jest. So you've probably never experienced the worst of such disasters....
It was a quiet Wednesday afternoon. Wednesday, 1st October, 15:15
BST, to be precise, when Peter, an office-mate of mine, leaned away
from his terminal and said to me, "Mario, I'm having a little trouble
sending mail." Knowing that msg was capable of confusing even the
most capable of people, I sauntered over to his terminal to see what
was wrong. A strange error message of the form (I forget the exact
details) "cannot access /foo/bar for userid 147" had been issued by
msg. My first thought was "Who's userid 147?; the sender of the
message, the destination, or what?" So I leant over to another
terminal, already logged in, and typed
grep 147 /etc/passwd
only to receive the response
/etc/passwd: No such file or directory.
Instantly, I guessed that something was amiss. This was confirmed
when in response to
ls /etc
I got
ls: not found.
I suggested to Peter that it would be a good idea not to try anything
for a while, and went off to find our system manager.
When I arrived at his office, his door was ajar, and within ten
seconds I realised what the problem was. James, our manager, was
sat down, head in hands, hands between knees, as one whose world has
just come to an end. Our newly-appointed system programmer, Neil, was
beside him, gazing listlessly at the screen of his terminal. And at
the top of the screen I spied the following lines:
# cd
# rm -rf *
Oh, shit, I thought. That would just about explain it.
I can't remember what happened in the succeeding minutes; my memory is
just a blur. I do remember trying ls (again), ps, who and maybe a few
other commands beside, all to no avail. The next thing I remember was
being at my terminal again (a multi-window graphics terminal), and
typing
cd /
echo *
I owe a debt of thanks to David Korn for making echo a built-in of his
shell; needless to say, /bin, together with /bin/echo, had been
deleted. What transpired in the next few minutes was that /dev, /etc
and /lib had also gone in their entirety; fortunately Neil had
interrupted rm while it was somewhere down below /news, and /tmp, /usr
and /users were all untouched.
Meanwhile James had made for our tape cupboard and had retrieved what claimed to be a dump tape of the root filesystem, taken four weeks earlier. The pressing question was, "How do we recover the contents of the tape?". Not only had we lost /etc/restore, but all of the device entries for the tape deck had vanished. And where does mknod live? You guessed it, /etc. How about recovery across Ethernet of any of this from another VAX? Well, /bin/tar had gone, and thoughtfully the Berkeley people had put rcp in /bin in the 4.3 distribution. What's more, none of the Ether stuff wanted to know without /etc/hosts at least. We found a version of cpio in /usr/local, but that was unlikely to do us any good without a tape deck.
Alternatively, we could get the boot tape out and rebuild the root filesystem, but neither James nor Neil had done that before, and we weren't sure that the first thing to happen would be that the whole disk would be re-formatted, losing all our user files. (We take dumps of the user files every Thursday; by Murphy's Law this had to happen on a Wednesday). Another solution might be to borrow a disk from another VAX, boot off that, and tidy up later, but that would have entailed calling the DEC engineer out, at the very least. We had a number of users in the final throes of writing up PhD theses and the loss of a maybe a weeks' work (not to mention the machine down time) was unthinkable.
So, what to do? The next idea was to write a program to make a device descriptor for the tape deck, but we all know where cc, as and ld live. Or maybe make skeletal entries for /etc/passwd, /etc/hosts and so on, so that /usr/bin/ftp would work. By sheer luck, I had a gnuemacs still running in one of my windows, which we could use to create passwd, etc., but the first step was to create a directory to put them in. Of course /bin/mkdir had gone, and so had /bin/mv, so we couldn't rename /tmp to /etc. However, this looked like a reasonable line of attack.
By now we had been joined by Alasdair, our resident UNIX guru, and as luck would have it, someone who knows VAX assembler. So our plan became this: write a program in assembler which would either rename /tmp to /etc, or make /etc, assemble it on another VAX, uuencode it, type in the uuencoded file using my gnu, uudecode it (some bright spark had thought to put uudecode in /usr/bin), run it, and hey presto, it would all be plain sailing from there. By yet another miracle of good fortune, the terminal from which the damage had been done was still su'd to root (su is in /bin, remember?), so at least we stood a chance of all this working.
Off we set on our merry way, and within only an hour we had managed to concoct the dozen or so lines of assembler to create /etc. The stripped binary was only 76 bytes long, so we converted it to hex (slightly more readable than the output of uuencode), and typed it in using my editor. If any of you ever have the same problem, here's the hex for future reference:
070100002c000000000000000000000000000000000000000000000000000000
0000dd8fff010000dd8f27000000fb02ef07000000fb01ef070000000000bc8f
8800040000bc012f65746300
I had a handy program around (doesn't everybody?) for converting ASCII hex to binary, and the output of /usr/bin/sum tallied with our original binary. But hang on---how do you set execute permission without /bin/chmod? A few seconds thought (which as usual, lasted a couple of minutes) suggested that we write the binary on top of an already existing binary, owned by me...problem solved.
So along we trotted to the terminal with the root login, carefully remembered to set the umask to 0 (so that I could create files in it using my gnu), and ran the binary. So now we had a /etc, writable by all. From there it was but a few easy steps to creating passwd, hosts, services, protocols, (etc), and then ftp was willing to play ball. Then we recovered the contents of /bin across the ether (it's amazing how much you come to miss ls after just a few, short hours), and selected files from /etc. The key file was /etc/rrestore, with which we recovered /dev from the dump tape, and the rest is history.
Now, you're asking yourself (as I am), what's the moral of this story? Well, for one thing, you must always remember the immortal words, DON'T PANIC. Our initial reaction was to reboot the machine and try everything as single user, but it's unlikely it would have come up without /etc/init and /bin/sh. Rational thought saved us from this one.
The next thing to remember is that UNIX tools really can be put to unusual purposes. Even without my gnuemacs, we could have survived by using, say, /usr/bin/grep as a substitute for /bin/cat.
And the final thing is, it's amazing how much of the system you can delete without it falling apart completely. Apart from the fact that nobody could login (/bin/login?), and most of the useful commands had gone, everything else seemed normal. Of course, some things can't stand life without say /etc/termcap, or /dev/kmem, or /etc/utmp, but by and large it all hangs together.
I shall leave you with this question: if you were placed in the same
situation, and had the presence of mind that always comes with
hindsight, could you have got out of it in a simpler or easier way?
Answers on a postage stamp to:
Mario Wolczko
Some time ago, I was editing our cron file to remove core more than a day old. Unfortunately, thru recursing into VI sessions, I ended up saving an intermediate (wron) version of this file with an extra '-o' in it.
find / -name core -o -atime +1 -exec /bin/rm {} \;
The cute thing about this is that it leaves ALL core files intact, and removes any OTHER file that hasn't been accessed in the last 24 hours.
Although the script ran at 4AM, I was the first person to notice this, in the early afternoon.. I started to get curious when I noticed that SOME man pages were missing, while others were. Up till then, I was pleased to see that we finally had some free disk space. Then I started to notice the pattern.
Really unpleasant was the fact that no system backups had taken place all summer (and this was a research lab).
The only saving grace is that most of the really active files had been accessed in the previous day (thank god I didn't do this on a saturday). I was also lucky that I'd used tar the previous day, as well.
I still felt sick having to tell people in the lab what happened.
As some older sys admins may remember, BSD 4.1 used to display unprintable characters as a questionmark.
An unfortunate friend of mine had managed to create an executable with a name consisting of a single DEL character, so it showed up as "?*".
He tried to remove it.
"rm ?*"
he was quite frustrated by the time he asked me for help, because
he had such a hard time getting his files restored. Every time he walked
up to a sys-admin type and explained what happened, they'd go "you did
WHAT?", he'd explain again, and they'd go into a state of uncontrolable
giggles, and he'd walk away. I only giggled controlably.
This was at a time (~star wars) when it was known to many as "the mythical rm star".
The SCO man page for the rm command states:
It is also forbidden to remove the root directory of a given file system.
Well, just to test it out, I one day decided to try "rm -r /" on one of our test machines. The man page is correct, but if you read carefully, it doesn't say anything about all of the files underneath that filesystem....
A while back I installed System V R4 on my 386 at home for development purposes... I was compiling programs both in my home directory, and in /usr/local/src ... so in order to reduce unnecessary disk space I decided to use cron to delete .o files that weren't accessed for over a day...
I put the following command in the root cron...
find / -type f -name \*.o -atime +1 -exec /usr/bin/rm -f {} \;
(instead of putting)
find /home/bcutter -type f -name \*.o -atime +1 -exec /usr/bin/rm -f {} \;
find /usr/local/src -type f -name \*.o -atime +1 -exec /usr/bin/rm -f {} \;
The result was that a short time later I was unable to compile software. What the first line was doing was zapping the files like /usr/lib/crt1.o .. and later I found out all the Kernel object files...
OOPS! After this happened a second time (after re-installing the files from tape) I tracked down the offending line and fixed it....
Yet another case of creating work by trying to avoid extra work (in this case a second find line).