When Apache webserver won't start & semaphores

Our alerting system started telling us one of our Apache systems (Graphite front end) wasn't running. After a while I joined in the diagnosis of what why it still wasn't running and how to get it running again. Trying to restart Apache only produced this:

service httpd start  ; tail -f error_log
Starting httpd:                                            [  OK  ]
[Tue Mar XX 06:21:41 2016] [notice] Digest: done
Configuration Failed
[Tue Mar XX 06:23:33 2016] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Tue Mar XX 06:23:33 2016] [notice] Digest: generating secret for digest authentication ...
[Tue Mar XX 06:23:33 2016] [notice] Digest: done
Configuration Failed
[Tue Mar XX 06:24:39 2016] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Tue Mar XX 06:24:39 2016] [notice] Digest: generating secret for digest authentication ...
[Tue Mar XX 06:24:39 2016] [notice] Digest: done
Configuration Failed

It was looking fine for a fraction of a second and then died with "Configuration Failed". A quick check of apachectl -t showed that the conf file was fine and that no one had changed it, so it wasn't referring to that. I also checked directory permissions, disk space and ionodes available - all looked fine. One option was that we could move all of the conf/vhosts files and try a clean start, but that wouldn't explain the problem.

Going deeper instead, I ran strace (it's possible I could have used httpd -e DEBUG instead of strace):

strace -f -o /tmp/apache.trace /usr/sbin/httpd

34583 read(10, "FGF95a\t\timage/unknown\n#\n# GRR 95"..., 4096) = 4096
34583 read(10, " The contributor claims:\n#   I c"..., 4096) = 851
34583 read(10, "", 4096)                = 0
34583 close(10)                         = 0
34583 open("/etc/httpd/conf/magic", O_RDONLY|O_CLOEXEC) = 10
34583 fcntl(10, F_GETFD)                = 0x1 (flags FD_CLOEXEC)
34583 fcntl(10, F_SETFD, FD_CLOEXEC)    = 0
34583 read(10, "# Magic data for mod_mime_magic "..., 4096) = 4096
34583 read(10, "o figure out what's inside.\n\n# s"..., 4096) = 4096
34583 read(10, "FGF95a\t\timage/unknown\n#\n# GRR 95"..., 4096) = 4096
34583 read(10, " The contributor claims:\n#   I c"..., 4096) = 851
34583 read(10, "", 4096)                = 0
34583 close(10)                         = 0
34583 write(2, "[Tue Mar 07 00:29:06 2017] [noti"..., 92) = 92
34583 open("/dev/urandom", O_RDONLY)    = 10
34583 read(10, "#\27\356'm[7T\266\265\373\374\203\16/_\375\236\10\200", 20) = 20
34583 close(10)                         = 0
34583 write(2, "[Tue Mar 07 00:29:06 2017] [noti"..., 49) = 49
34583 mmap(NULL, 500008, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0x7f1cd2f02000
34583 semget(IPC_PRIVATE, 1, IPC_CREAT|0600) = -1 ENOSPC (No space left on device)
34583 write(2, "Configuration Failed\n", 21) = 21
34583 select(0, NULL, NULL, NULL, {0, 10000}) = 0 (Timeout)
34583 close(9)                          = 0
34583 close(8)                          = 0
34583 close(7)                          = 0
34583 munmap(0x7f1cd2f02000, 500008)    = 0
34583 close(6)                          = 0
34583 close(5)                          = 0
34583 munmap(0x7f1cc8690000, 2248032)   = 0
34583 munmap(0x7f1cc82ea000, 3823040)   = 0
....

That looked weird. These two lines were interesting:
34583 semget(IPC_PRIVATE, 1, IPC_CREAT|0600) = -1 ENOSPC (No space left on device)
34583 write(2, "Configuration Failed\n", 21) = 21

Checking ipcs -s showed a large number of apache owned semaphores still in the system. Since apache wasn't running and wouldn't run, we decided to clear these down with a simple:
ipcs -s | grep apache | awk '{print $2}' | while read id
do
   ipcrm sem "$id"
done

This cleared up the semaphore list and allowed apache to restart. Under normal working conditions, apache only uses 2-4 semaphores. Looking on the web, this appears to be related to abrupt stops of Apache. Off to check if the team had been doing this and not realizing :)
Hope this helps anyone else trying to solve this problem.

Comments

Popular Posts