Here's one that will show that you shouldn't work on a system that you don't thoroughly understand.
At my "previous" employer I was instructed to install a new (larger) disk drive in an RS/6000 system. Since a full backup of the system had been done the previous day, I just looked at the file systems via a df to see which were on the drive that I was replacing. After this I did a tape backup of these filesystems, ran smit, and removed them. I then installed the new disk and brought the system back up. When I ran smit and was able to install the new drive and set up the file systems, I figured that this was going to be an easy one. WRONG!! I was aware that you could expand filesystems under AIX, but was not aware that it would expand them 'across physical drives'!!! I first realized that I was in trouble when I went to read in the backup tape and cpio was not found. I did an ls of the /usr/bin directory and it said that the file was there, but when I tried to run it, it was not found. And of course when I went looking for the original install tape it was not to be found....
When I had first gotten my NeXTstation, it had the lil' 105M hard drive in it. I had a 330M external, but alas, no cable for it. (Life was not fun when I was essentially netbooting off a "test" machine.... ".. um, guys, did you just reboot is-next?")
Finally got the cable, just in time for the winter holiday (read: no network). Brought the machine home, and I figured I'd just copy the configuration files over from the internal to the external (as a nice gesture to my users so they wouldn't have to change their passwords and everything).
The external was a brand new BuildDisk'd disk (had stock NeXTstep on it). NeXT keeps the private information of each machine (/dev, /etc, stuff like that) in a /private directory to make netbooting easier.
Hey, I'll just move /private from the 105M to /private on the external. So I deleted the external's /private and tried to move it via the workspace.
/dev is in /private.
/dev contains device files. Can't move them.
BUT. The workspace happily deleted all the files it DID copy, so the internal couldn't boot (no /etc) and the external couldn't boot (no /dev). This is before the advent of boot floppies so I was stuck for about a week at home with $5000 of NeXT computer that I couldn't boot.
The moral? *NEVER* move something important. Copy, VERIFY, and THEN delete.
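The "copy, VERIFY, and THEN delete" rule can be sketched in a few lines of Python. This is only an illustration, not what the NeXT workspace does: the function names are made up, the destination is assumed not to exist yet, and the tree is assumed to hold nothing but plain files and directories. Anything else (device files like the ones in /dev, symlinks) makes it refuse before touching a thing.

```python
import hashlib
import os
import shutil


def file_digest(path):
    """Return the SHA-256 hex digest of a regular file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_verify_delete(src, dst):
    """Copy src to dst, verify every file, and only then delete src.

    Returns True if the move completed. Returns False, leaving src
    untouched, if the tree contains something that is not a plain file
    or if any copy fails verification.
    """
    # Refuse up front if anything is not a plain file (e.g. device nodes).
    sources = []
    for dirpath, _dirnames, filenames in os.walk(src):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                return False
            sources.append(path)

    # Copy. copytree raises if dst already exists.
    shutil.copytree(src, dst)

    # Verify every copied file against its original.
    for path in sources:
        copy = os.path.join(dst, os.path.relpath(path, src))
        if not os.path.isfile(copy) or file_digest(copy) != file_digest(path):
            return False

    # Only now is it safe to delete the original.
    shutil.rmtree(src)
    return True
```

The point of the ordering is that a failure at any step leaves the original exactly where it was.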
I'm currently trying to work out how ISC Unix/386 handles COFF files, and discovered the /shlib directory, which I suspected wasn't really used (*wrong*). So, to try it out, I did:
So far, so good. So, put it back:
Oops! So, tried it from a different system, but didn't have permission, so:
OK, so let's just cp them across.
Then I wrote a program which just did a link(2) of the directories. Yes, gcc and ld didn't have any problems, but even after the link was in place, it still didn't work. I had to reboot (but nothing else), after which it did work. No idea why that made any difference.
I run on a 386/25. Small system, 4 inbound lines, etc. I was installing a new SCSI drive to complement my 2 MFM's. Took me forever to get everything just right. Things finally worked, so I figured I would shutdown and play with the jumper settings to see what this thing could do. What did I do? Well, I just turned off the power, that's all.
erk. Just rebuilt the kernel, did not do a haltsys, or a shutdown, or anything. Just shut the power off. ARGH! Took me 3 weeks to clean up the mess.
You tend to get in this cycle of "try" "haltsys" "power off" "change jumpers" "power on" "try". Well, once everything worked, I guess I was a wee bit excited and forgot a step. :-)
Two miserable flubs:
My moral of the story is: when you are doing something BIG, type the command and reread what you've typed about 100 times to make sure it's sunk in (:
After about four months as a Unix sysadm, and still feeling rather like a novice, I was asked to "upgrade" a Sun lab (3/280 server and ten 3/50 diskless clients) from SunOS 4.0.3 to 4.1 -- of course, this "upgrade" was actually a complete re-install.
Well, the server had no tape drive, not even any SCSI controller. There were no other machines on its subnet other than the clients, so I had no boothost (at that time, I did not know that the routers could be reconfigured to pass the appropriate rarp packets, nor do I think our network people would have taken kindly to such a hack!). The clients did have SCSI controllers, but I had no portable tape drive. Luckily, I had a portable disk.
So, with great trepidation (remember, I was still a novice), I set up one of the clients, with the spare disk, to be a boothost. I booted the server off the client and read the miniroot from a tape on a remote machine, and copied it to the server's swap partition. Then I manually booted the miniroot on the server by booting off the temporary boothost with the appropriate options, and specified the server's swap partition as containing the kernel to be loaded. Once in the miniroot, I started up routed to permit me to reach the tapehost, and finally invoked suninstall. From then on, it worked like a charm.
Needless to say, I was extremely pleased with myself for figuring all of this out. I then settled down to do the "easy stuff", and got around to configuring NIS (Yellow Pages). I decided to get rid of everything I didn't need, under the assumption that a smaller system is easier to understand and keep track of. The Sun System and Network Administration Manual, which is in many ways an admirable tome, had on page 476 a section on "Preparing Files on NIS Clients", which said:
Of course, I finally connected the error message "unknown protocol" with the removed /etc/protocols (and other) files, restored these files, after which everything was fine again. I was pretty mad, since I had wasted a whole day on this problem, but *technically*, the Sun manual above is correct.
It just neglected to mention that of course, *no* machine is running NIS at boot time, therefore *every* machine needs valid data in the networks, services, protocols, and ethers files *at boot time*. Grrr!
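A quick sanity check before rebooting would have caught this. The following is a rough Python sketch under stated assumptions: the list of files is just the four named above, the function name is invented, and "present and non-empty" is only a crude stand-in for "valid data".

```python
import os

# The four files named in the story; adjust for your own system.
BOOT_TIME_FILES = ("networks", "services", "protocols", "ethers")


def missing_boot_files(etc_dir="/etc", names=BOOT_TIME_FILES):
    """Return the names that are absent or empty in etc_dir."""
    missing = []
    for name in names:
        path = os.path.join(etc_dir, name)
        # A missing file and a zero-length file are equally useless at boot.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

An empty list back from missing_boot_files() means all four files are at least there with something in them.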
My story happened on a Sun SPARCstation 2.
I once wanted to update libc.so.1.7 to libc.so.1.8 by myself, so I got root and then ftp'd libc.so.1.8 into my /lib. Unfortunately there was not enough room on this partition, so all I got was a file with zero length.
The problem is that I then ran /usr/etc/ldconfig in the directory /lib, and that was all it took. No command could be executed any more, because ld.so checked for /libc.so.1.8, it being the newest one. All I needed was a statically linked mv, but Sun does not usually provide the source. Even going single user didn't help. So I had to install a miniroot on the swap partition, cp /bin/mv from the CD-ROM, and execute it.
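The zero-length file is avoidable if the new library only ever appears under its real name once it is complete. Here is a minimal Python sketch of that idea. It makes several assumptions: the function name and the dot-prefixed temporary name are invented for illustration, and the temporary file sits in the same directory as the target so that the final rename stays within one filesystem.

```python
import os
import shutil


def install_file(src, dst):
    """Copy src to dst without ever leaving a truncated dst behind."""
    directory, base = os.path.split(dst)
    # Hypothetical temporary name, chosen so it does not look like a library.
    tmp = os.path.join(directory, "." + base + ".tmp")
    try:
        shutil.copyfile(src, tmp)  # fails here if the disk fills up
        if os.path.getsize(tmp) != os.path.getsize(src):
            raise OSError("short copy: %s" % tmp)
        os.replace(tmp, dst)  # atomic rename within one filesystem
    except OSError:
        # Clean up the partial copy; dst was never touched.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

If the partition runs out of room, the copy fails on the temporary name and the real name never shows up half-written.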
I have been trying to put an AT&T 3B2/310 machine on the net for a while. I'll skip the unbelievable hardware problems. I'll skip the paranoid system admins who forced me to build a temporary net to show them that the ethernet board worked. Anyway, I get it up and running on the temp net; it works fine, a little slow, but hey. OK, so I'm ready to stick it on the net. You need to power down to do that, right? So, I powered down. Bad, bad, bad mistake. I had been running a sysadm shell script, because I needed to change a password so that I could get into an account. Well, would you believe that the script, despite the fact that I wasn't in the passwd option anymore, held onto the passwd file! Stupid machine, stupid script. Anyway... what that means is that when I boot up the machine, it passes diagnostics (a small miracle), runs unix, and doesn't let anyone log in! I almost freaked. Anyway, so...
There's an undocumented option on the installation disks called 'magic mode'. At one point it offers 4 options (none of which is magic). If you type magic mode at that point, you can get it... Believe it or not, some AT&T person had the nerve, and bizarre sense of humor, to add one extra line to magic mode: you see, when you type 'magic mode' it says
Poof!
That was just about the last thing I wanted to see... the rest was in a sense trivial... ran an fsck... it fixed it all for me. So the moral of the story... never ever assume that some prepackaged script that you are running does anything right.