Do not Panic! Remote Server (Hetzner) not rebooting any more – A Solution

I went through this experience recently. First of all, don’t panic! I panicked, and because of this, I made a mistake: I didn’t wait long enough for it to come online. Had I waited up to 60 minutes, it would probably have come online (see reason below). The story:

I had broken packages on my Ubuntu 10.04 server and decided to fix them by

apt-get update
apt-get upgrade

While updating, I noticed that the package grub-pc  also was upgraded, apparently a new bootloader (or bootloader configuration) was installed. This made me feel uncomfortable, since I didn’t know if the server would reboot after this upgrade. So, because of the saying “The devil you know is better than the devil you don’t know” (and a desire to sleep peacefully at night!) I decided to reboot the server and see what would happen. To my big dismay, it did not come online. SSH connections failed with “Port 22: Connection refused” .

I panicked and asked the Hetzner Support (which is very responsive and supportive btw!) to install a LARA Remote Console so that I could see the text output of the booting screen. After some regular startup text, the screen became blank. I panicked even more and took immediate steps to move all data to a new server. It took me 6 hours to complete the most important parts, and another full week of restless work to finish it. We will see that it was not neccessary.

Most regular servers at the Hetzner datacenter are running Software RAID. It seems that after a reboot (especially if you send a Hardware Reset) the OS needs some time to re-sync or check the file system. I am not sure what caused the delay in my case. Re-syncing the entire RAID array can take up to 1-2 hours, depending on your hardware and disc space.

So, wait at least 1 hour for it to come online, especially after a hardware reset! When you activate Hetzner’s Rescue system (which is very good btw!) it will stay active for a minimum of 1 hour, so your server will be down for 1 hour at least, in any event. So you are not losing much by waiting a bit longer.

Now, in my case, I assumed that Grub2 was broken. So I activated the Hetzner Rescue System, booted into it, and reinstalled Grub2. I have found the following method here and it worked for me. First you have to mount the regular RAID filesystem under /mnt :

mount /dev/md2 /mnt
mount /dev/md1 /mnt/boot
mount -t dev -o bind /dev /mnt/dev
mount -t proc -o bind /proc /mnt/proc
mount -t sys -o bind /sys /mnt/sys
chroot /mnt

At this point, you are in your regular root directory. To reinstall Grub2 the Debian Way, I did:

apt-get install --reinstall grub-pc

To make really sure, I reconfigured the package:

dpkg-reconfigure grub-pc

It will ask you where to install the bootloader. I selected:

[*] /dev/sda
[*] /dev/sdb

No errors were reported. I rebooted again and it did not come online immediately, for the reasons previously mentioned. I waited long enough (in my case, 15 minutes) and it did come online. So, rule number one is: Don’t panic!

A solution for MySQL Assertion failure FIL_NULL

A defective RAM module recently caused data corruption in MySQL tables. MySQL would log the following to /var/log/syslog  in regular intervals, about every few minutes:

140125 5:04:41 InnoDB: Assertion failure in thread 140046518261504 in file fut0lst.ic line 83
InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA
InnoDB: We intentionally generate a memory trap.

Reading MySQL documentation and various blogs didn’t help much. I ran CHECK TABLES  on all the tables and they all reported OK. Then I ran

mysqlcheck --all-databases

and still all tables reported OK. Nevertheless the Assertion Failures continued. Then I stumbled across this excellent blog post, which suggested to dump evertying into a .sql file, wipe /var/lib/mysql (not without making a backup, mind you!), and re-import everything from scratch. This is what I did and it worked.

I recorded the passwords for each database and collected it into a SQL script like this:

GRANT ALL PRIVILEGES ON database_name1 TO 'username1'@'localhost' IDENTIFIED BY 'password1';
GRANT ALL PRIVILEGES ON database_name2 TO 'username2'@'localhost' IDENTIFIED BY 'password2';
etc.

When you dump all databases into a .sql file, it will not dump the permissions, so you will need to restore them later with this script. Next, the dumping part, then removal and reinstallation of mysql (Danger here: When you remove mysql-server, all packges which depend on it also will be removed!):

cd ~
mysqldump -u root -ppass --all-databases > alldbs.sql
cd /var/lib
cp -vpr mysql mysql-backup
apt-get remove --purge mysql-server-5.5
apt-get install mysql-server

Here, I had to reset the MySQL admin password because it didn’t work any more, so I ran:

dpkg-reconfigure mysql-server-5.5

Then, I re-imported all databases from the dump file:

cd ~
mysql -u root -ppass < alldbs.sql

Then I run the SQL permission script that I mention above.

For me, this resulted in no more Assertion Failures. Yay!

 

 

Adventures with various Segfaults due to defective RAM

If you get various segfaults on your Linux server, like these:

spamd child[2656]: segfault at 200251c208 ip 00007fa039223684 sp 00007fff77953680 error 4 in libperl.so.5.14.2[7fa03916a000+177000]

or:

clamd[3311]: segfault at 1000000008 ip 00007f00200b3751 sp 00007fff3e2cef60 error 4 in libclamav.so.6.1.17[7f001fff1000+988000]

or

php5[14914]: segfault at 7fff7d2939c8 ip 00000000006bf04d sp 00007fff6d293860 error 6 in php5[400000+6f3000]

or

PassengerHelper[11644]: segfault at ffffffffca4ef420 ip 0000000000492fea sp 00007f5b81e991d0 error 7 in PassengerHelperAgent[400000+203000]

etc. etc., then, no, your system is not suddenly crazy. Nor are you. It is highly likely that you RAM is defective. You should reboot your server and run the  RAM test from your boot manager (Grub always has such a test) to see if it can detect faulty RAM.

If you are operating a server that you can’t reboot because you can’t tolerate downtime, there is an excellent tool calledmemtester , which is a memory test for a running system. It is part of the Debian distribution, installit with apt-get install memtester  Check top to see how much free RAM there is available. Say you have 10GB RAM free, then ask memterst to test 8GB of it (so that 2GB are remaining free for the running system to operate). In my case, memtester indeed detected faults.

I ran

memtester 8000 3

It outputted stuff like this:

Loop 1/3:
Stuck Address : ok 
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : testing  30FAILURE: 0xffffffffffffffff != 0xfffffffbffffffff at offset
0x36e77910.
Block Sequential : ok 
Checkerboard : ok 
Bit Spread : ok 
Bit Flip : ok 
Walking Ones : ok 
Walking Zeroes : ok 
8-bit Writes : ok
16-bit Writes : ok

So, when I replaced the RAM, the Segfaults stopped. You can runmemtester  regularly to make sure the RAM is okay. Healty RAM is a very crucial part of your successful hosting operation!

In my case however, the segfaults corrupted MySQL tables, which I had to clean up. All’s well that ends well!

 

How to install Guest Additions with Folder Sharing on VirtualBox – The Debian Way

Debian Wheezy as guest Operating System with Folder Sharing

I always forget where to download the proper version of the Virtualbox GuestAdditions, so here is a short tutorial about how to do it quickly and easily in the Debian way. I always need folder sharing between the guest and host OS, so you will see how this is done also. I will be using the current stable distribution Debian Wheezy as host system (without a running display manager, just the console), and  as the guest system also Debian Wheezy. It turns out that Debian has even packaged the Guest Additions for virtualbox.

First, install Virtualbox and the Guest Additions:

apt-get install virtualbox virtualbox-guest-additions

Then install Debian Wheezy as a guest system. Next, mount the Guest Additions ISO file by navigating through the menu system of Virtualbox:

  • Devices -> CD/DVD Drives -> Coose a virtual CD/DVD disk file …

Choose the ISO from this path:

/usr/share/virtuabox/VBoxGuestAdditions.iso

Next, in your guest OS, mount the virtual CD, and run the installer:

mount /dev/cdrom /mnt
cd /mnt
./VBoxLinuxAdditions.run

Next shut down your guest OS (don’t to this in your host OS 😉 ):

halt

Now set up the shared folder in the menu system of Virtualbox:

  • Settings -> Shared Folders -> +
  • Select Folder
  • Check “Auto-mount”

Re-start your guest OS, and you will have the shared folder auto-mounted in /media/sf_Public . Voila!

File permissions for successful SSH login via authorized_keys

If you want to ssh into your server without being repeatedly prompted for the password you can copy your public ssh key into a file called authorized_keys  in the .ssh subdirectory of the home directory of the remove server account. However, this works only if the permissions for this file are set correctly.

First, if you have not done so already, generate the public key for your local user:

ssh-keygen

 

This will create a file ~/.ssh/id_rsa.pub

Append the only line in this file into the file ~/.ssh/authorized_keys  of the remote user account. Create the directory and file if it does not exist.

Now try to ssh into your remote account. If ssh is still asking for the remote user’s password, check the permissions of the following files and directories:

  • The permissions of the home directory of the remote user must be 755
  • The permissions of the remote .ssh directory must be 700
  • The permissions of the remote authorized_keys file must be 600

… of course all of those must be owned by the remote user, and not by root.

Now, you should be able to ssh into the remote account without being asked for the password!