fork: Resource temporarily unavailable
This afternoon I took the time to upgrade the userpoints module to version 3, merging in point expirations, point categorizations and an API change. Most of this work was already completed but was sitting on MNN's dev server. When I logged into MNN's server I was greeted by a rather unfriendly shell message: "fork: Resource temporarily unavailable". I didn't think much of it at first, but my tune quickly changed when I got the same message while attempting to vi my bits of code.
oh boy.. this is going to be fun
Having no idea what this error was truly referring to I immediately tried to get a process list to see what was running on the server. Again I was greeted by the same error. I then jumped over to /var/log to check the syslog.
The syslog was zero bytes!!! NOT GOOD
Now it had turned from a minor inconvenience into a serious issue. The syslog file was hosed, I couldn't get a process list, in fact I couldn't do anything. Thoughts of hacking, being owned, etc. flew through my mind and also through Andy's, whom I had quickly contacted on iChat to help resolve the issue (after all, it's his box now).
I tried to su to root to elevate my authority. No luck. Not a password problem, though: there were no resources to spawn the shell.
Andy rushed over to the server room to try the direct terminal, same issue (which made sense).
Google to the rescue! Apparently we had a resource issue. We needed to free up resources on the box before we could do anything. A quick reboot would have solved the issue but it would have also destroyed the evidence of whatever was causing it. The next step was to start killing off processes. But how?
ps wouldn't run due to lack of resources and I couldn't cat/more/less anything in /var/run. The init.d scripts wouldn't run either. Heck, ls wouldn't even run. The only application I could run was "kill" but not "killall" (kill is a bash builtin, so the shell doesn't have to fork a new process to run it; killall is a separate program). What a wonderful predicament. I had a way to kill processes but no way to know which process to kill.. or did I?....
It turns out that the most basic feature in bash was actually the most helpful. <tab> <tab> would return a directory listing if coupled with cd. Even though I couldn't use ls I could do a cd <tab> <tab> to get the directory listing. The first thing I did was a cd /proc/ <tab> <tab>. 8,300 items!!! This is a development box; it shouldn't have more than 150 items. Some daemon or application must have gone crazy and spawned many, many child processes. But which one?
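The same listing can be had without tab completion, because glob expansion is done by the shell itself and needs no new process. A minimal sketch, assuming bash; count_pids is a made-up helper name, not something that was on the box:

```shell
# Count the numeric entries (one per process) in a proc-style directory
# using only bash builtins: no ls, no wc, nothing to fork.
# count_pids is a hypothetical helper; the directory defaults to /proc.
count_pids() {
    local dir=${1:-/proc}
    local entries=("$dir"/[0-9]*)
    # With no match the pattern stays literal, so check the first entry exists.
    [[ -e ${entries[0]} ]] || entries=()
    echo "${#entries[@]}"
}

count_pids
```

Calling count_pids directly (not inside $(...)) keeps everything in the current shell, which is the whole point when fork is failing.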
If I had cat I could look at /proc/<id>/cmdline to determine what was what, but without resources the only thing I could do was "kill", and even that only worked every other time.
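For what it's worth, bash's read builtin can pull the program name out of a cmdline file without spawning anything, since the redirection is handled by the shell. This is a sketch of that idea, not what was done at the time; show_cmd is a made-up name and the proc root is a parameter only so it can be tried on a test directory:

```shell
# Print "PID: argv[0]" for each process using only bash builtins.
# show_cmd is a hypothetical helper; the root defaults to /proc.
show_cmd() {
    local root=${1:-/proc} dir name
    for dir in "$root"/[0-9]*; do
        # Skip unmatched globs and processes that vanished or are unreadable.
        [[ -r $dir/cmdline ]] || continue
        name=
        # cmdline is NUL-separated; reading up to the first NUL yields argv[0].
        IFS= read -r -d '' name < "$dir/cmdline"
        # Kernel threads have an empty cmdline, shown here as "?".
        echo "${dir##*/}: ${name:-?}"
    done
}

show_cmd
```

With thousands of entries the output scrolls by fast, but a runaway program name repeated over and over would be hard to miss.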
My solution, although not perfect, was to kill off the highest-numbered processes, as they would most likely be related to the runaway daemon/application. Again bash to the rescue! This quick script helped to save the day:
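# The glob is expanded by bash itself, so no ls is needed
for i in /proc/3????
do
    # ${i##*/} strips the leading /proc/ and leaves just the PID
    sudo kill -9 "${i##*/}"
done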
Fortunately I was in the sudo file for the kill application so I had the authority to kill without remorse (-9). The script didn't complete on the first run; it ran out of resources, but it DID kill processes. I kept running it and after about 3 runs the system started to breathe.. a little. I was eventually able to properly shut down apache2 with the init script (i.e. /etc/init.d/apache2 stop). This freed up enough resources for me to finally grab a ps listing.
RETROCLIENT from Retrospect!!!! It was all over the place like Jackson Pollock, Linux edition
I quickly killed the parent process of retroclient which took all of the children with it. The listing of /proc immediately dropped from a listing of over 8,000 to 113. Oh Retrospect how do I love thee? Let me count the ways....0