This is a slightly technical post, which I wanted to put out there to help anyone who runs into the same problem. If you're not using Linux Containers (LXC), you can safely skip it.
While using LXC, you may find that certain utility commands, such as lxc-stop or lxc-ls, fail to run properly and hang. If you're running lots of containers on a host (like we do with Viaduct), you need to identify which container is causing the problem.
To find the container that is causing the command to hang, run strace on the hanging command:
strace lxc-ls --fancy
If the hang is caused by a crashed container, the output should stop on a line starting with recvmsg. Look up a few lines to find the corresponding connect line. It will look like the line below and will contain the name of the container that isn't responding.
connect(3, {sa_family=AF_FILE, path=@"/var/lib/lxc/name-of-container/command"}, 50) = 0
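If you have many containers, picking the name out of the trace by eye gets tedious. Here is a minimal sketch of how the lookup could be scripted, assuming the strace output has been saved to a file and that the socket path follows the /var/lib/lxc/name-of-container/command form shown above. The function name lxc_hung_container is my own and is not part of LXC.

```shell
#!/bin/sh
# Hypothetical helper (not an LXC tool): read strace output on stdin and
# print the container name from the last connect() line whose path looks
# like /var/lib/lxc/<name>/command. Assumes the path format shown above.
lxc_hung_container() {
    sed -n 's|.*connect(.*path=@\{0,1\}"/var/lib/lxc/\([^/"]*\)/command".*|\1|p' | tail -n 1
}
```

One way to use it: run strace with -o to write the trace to a file (for example strace -o /tmp/lxc-ls.trace lxc-ls --fancy), interrupt it with Ctrl-C once it hangs, then run lxc_hung_container < /tmp/lxc-ls.trace. The last connect line is the one that matters, because it is the connection the command was waiting on when it stopped.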
From here you can determine which container is having issues. To resolve this, you may be able to simply kill the associated processes. As lxc-stop isn't working, you'll need to identify the processes using the ps command on the host machine. The command below returns a list of processes in tree format, allowing you to see which processes belong to a given lxc-start command. Just send a kill -9 to the child processes to stop the container.
ps axjf
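If the tree is long, the child PIDs can also be pulled out of the same output. This is a rough sketch, assuming the usual procps column order for ps axjf (PPID in the first column, PID in the second) and that you have already read the PID of the relevant lxc-start process off the tree. The function name lxc_child_pids is hypothetical, and it only lists direct children, not the whole subtree.

```shell
#!/bin/sh
# Hypothetical helper: read `ps axjf` output on stdin and print the PIDs of
# the direct children of the parent PID given as the first argument.
# Assumes column 1 is PPID and column 2 is PID, and skips the header line.
lxc_child_pids() {
    awk -v parent="$1" 'NR > 1 && $1 == parent { print $2 }'
}
```

For example, ps axjf | lxc_child_pids 1234 prints the children of PID 1234 (a made-up PID standing in for your lxc-start process). Check the list against the tree before passing anything to kill -9, since that signal gives the processes no chance to clean up.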