This is a slightly technical post that I wanted to put out there to help anyone who runs into the same problem. If you're not using Linux Containers (LXC), you can safely skip it.

While using LXC, you may find that certain LXC commands hang instead of completing, which leaves you unable to run utility commands such as lxc-stop or lxc-ls. If you're running lots of containers on a host (as we do with Viaduct), you need to identify which container is causing the problem.

The first step is to find out which container is making the command hang. You can do this by running strace on the hanging command:

strace lxc-ls --fancy

If a crashed container is the cause, the strace output should stop on a line starting with recvmsg. Look a few lines above it for the matching connect call. It will look like the line below and will contain the name of the container that isn't responding:

connect(3, {sa_family=AF_FILE, path=@"/var/lib/lxc/name-of-container/command"}, 50) = 0
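On a busy host the trace can be long, so it may help to pull the name out automatically. The sketch below is a hypothetical helper, last_lxc_socket_container, that reads an strace log on stdin and prints the container name from the last connect call to an LXC command socket. It assumes the socket path has the form shown above; it does not check that a recvmsg actually follows the connect, so glance at the end of the trace to confirm the command really stopped there.

```shell
#!/bin/sh
# Hypothetical helper: print the container name from the last connect() to an
# LXC command socket in an strace log read on stdin.
# Assumes the socket path looks like the example above:
#   path=@"/var/lib/lxc/<name>/command"
# (some strace versions print sun_path= instead of path=; that also matches).
last_lxc_socket_container() {
    grep 'connect(' |
        sed -n 's|.*path=@"/var/lib/lxc/\([^/"]*\)/command".*|\1|p' |
        tail -n 1
}
```

For example, to avoid sitting at a hung terminal you might wrap the traced command in timeout and write the trace to a file (the 30-second limit and the /tmp path are arbitrary choices):

timeout 30 strace -f -o /tmp/lxc-ls.strace lxc-ls --fancy
last_lxc_socket_container < /tmp/lxc-ls.strace

Here -f makes strace follow any child processes lxc-ls creates, and -o sends the trace to a file instead of the terminal.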

From there you know which container is having issues. To resolve this, you may be able to simply kill the associated processes. Because lxc-stop isn't working, you'll need to identify those processes with the ps command on the host machine. The command below lists processes in a tree format, so you can see which processes belong to a given lxc-start command. Send kill -9 to the child processes to stop the container.

ps axjf
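If you'd rather not pick PIDs out of the tree by eye, the same lookup can be sketched with awk. The hypothetical helper below, lxc_start_children, reads "pid ppid args" lines (as printed by ps -eo pid=,ppid=,args=, used here instead of ps axjf only because its columns are easier to parse) and prints the direct children of the lxc-start process for one container. It assumes the container was started with a separate "-n <name>" argument; other spellings such as --name=<name> won't match.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of the direct children of the
# lxc-start process for one container. Reads "pid ppid args" lines on stdin,
# e.g. from: ps -eo pid=,ppid=,args=
# Assumption: the container was started as "lxc-start ... -n <name> ...".
lxc_start_children() {
    awk -v name="$1" '
        # Remember every process; parents are matched once all lines are read.
        { pid[NR] = $1; ppid[NR] = $2 }
        # Mark lxc-start processes whose arguments include "-n <name>".
        $3 == "lxc-start" || $3 ~ /\/lxc-start$/ {
            for (i = 4; i < NF; i++)
                if ($i == "-n" && $(i + 1) == name)
                    starter[$1] = 1
        }
        # Print each process whose parent is one of the marked lxc-start PIDs.
        END {
            for (i = 1; i <= NR; i++)
                if (ppid[i] in starter)
                    print pid[i]
        }
    '
}
```

Usage might look like this, listing the PIDs first and only then killing them:

ps -eo pid=,ppid=,args= | lxc_start_children name-of-container
ps -eo pid=,ppid=,args= | lxc_start_children name-of-container | xargs -r kill -9

Check the listed PIDs against the ps axjf tree before running the second line. The direct child of lxc-start is normally the container's init; assuming the container has its own PID namespace, as LXC containers normally do, killing that init causes the kernel to SIGKILL every other process in the container as well.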
