Adventures with various Segfaults due to defective RAM

Note: This post is 7 years old. Some information may no longer be correct or even relevant. Please, keep this in mind while reading.

If you get various segfaults on your Linux server, like these:

spamd child[2656]: segfault at 200251c208 ip 00007fa039223684 sp 00007fff77953680 error 4 in[7fa03916a000+177000]


clamd[3311]: segfault at 1000000008 ip 00007f00200b3751 sp 00007fff3e2cef60 error 4 in[7f001fff1000+988000]


php5[14914]: segfault at 7fff7d2939c8 ip 00000000006bf04d sp 00007fff6d293860 error 6 in php5[400000+6f3000]


PassengerHelper[11644]: segfault at ffffffffca4ef420 ip 0000000000492fea sp 00007f5b81e991d0 error 7 in PassengerHelperAgent[400000+203000]

etc. etc., then, no, your system is not suddenly crazy. Nor are you. It is highly likely that you RAM is defective. You should reboot your server and run the  RAM test from your boot manager (Grub always has such a test) to see if it can detect faulty RAM.

If you are operating a server that you can’t reboot because you can’t tolerate downtime, there is an excellent tool calledmemtester , which is a memory test for a running system. It is part of the Debian distribution, installit with apt-get install memtester  Check top to see how much free RAM there is available. Say you have 10GB RAM free, then ask memterst to test 8GB of it (so that 2GB are remaining free for the running system to operate). In my case, memtester indeed detected faults.

I ran

memtester 8000 3

It outputted stuff like this:

Loop 1/3:
Stuck Address : ok 
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : testing  30FAILURE: 0xffffffffffffffff != 0xfffffffbffffffff at offset
Block Sequential : ok 
Checkerboard : ok 
Bit Spread : ok 
Bit Flip : ok 
Walking Ones : ok 
Walking Zeroes : ok 
8-bit Writes : ok
16-bit Writes : ok

So, when I replaced the RAM, the Segfaults stopped. You can runmemtester  regularly to make sure the RAM is okay. Healty RAM is a very crucial part of your successful hosting operation!

In my case however, the segfaults corrupted MySQL tables, which I had to clean up. All’s well that ends well!