author: Dr. Greg Wettstein <greg@wind.enjellic.com> 2013-03-19 07:26:33 +0000
committer: Ian Jackson <Ian.Jackson@eu.citrix.com> 2013-03-25 12:40:30 +0000
commit: 6cffb2b469a55032a2900ccb8776c0082f346758
tree: b91a7b4e516ca2ff16421f5ae1f558cfbeff1a29
parent: 753d16c1d0d5e194546de1a9f67034d3e6576844
tools: Retry blktap2 tapdisk message on interrupt.

Re-start blktap2 IPC select call on interrupt.

We hunted this miserable bug for a long time. The teardown of a
blktap2 tapdisk instance is carried out inconsistently up to and
including the 4.2.1 release. The problem appears to be a classic
'Heisenbug' which disappears if a single function call is added to
the tapdisk shutdown path. It is likely this bug has existed for the
life of the blktap2 code.

Control messages to manipulate a tapdisk instance are sent over a
UNIX domain socket. A select call is used on both the read and write
paths to wait on I/O and to set a timeout for the transmission and
reception of the control plane messages.

The existing code fails receipt or transmission of the control
message on any type of error return from the select call. The xl
control process receives a signal while waiting in the select call,
which in turn causes select to return -1 with errno set to EINTR.
This prematurely terminates the teardown of the tapdisk instance.
Since multiple messages are needed to implement a full teardown, the
tapdisk instance can be left in various states ranging from fully
connected to only the minor being left allocated.

The fix is straightforward. Check the return code from the select
call and retry the read or write of the control message if errno is
set to EINTR. The problem manifests itself in the read path, but
there appears to be little reason not to add the fix to the write
path as well. Both paths appear to be cut-and-paste copies of each
other.

Signed-off-by: Dr. Greg Wettstein <greg@enjellic.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Diffstat (limited to 'tools/blktap2')
-rw-r--r-- tools/blktap2/control/tap-ctl-ipc.c | 10
1 files changed, 8 insertions, 2 deletions
diff --git a/tools/blktap2/control/tap-ctl-ipc.c b/tools/blktap2/control/tap-ctl-ipc.c
index cc6160e9da..c8aad1ccda 100644
--- a/tools/blktap2/control/tap-ctl-ipc.c
+++ b/tools/blktap2/control/tap-ctl-ipc.c
@@ -64,8 +64,11 @@ tap_ctl_read_message(int fd, tapdisk_message_t *message, int timeout)
 		FD_SET(fd, &readfds);
 		ret = select(fd + 1, &readfds, NULL, NULL, t);
-		if (ret == -1)
+		if (ret == -1) {
+			if (errno == EINTR)
+				continue;
 			break;
+		}
 		else if (FD_ISSET(fd, &readfds)) {
 			ret = read(fd, message + offset, len - offset);
 			if (ret <= 0)
@@ -114,8 +117,11 @@ tap_ctl_write_message(int fd, tapdisk_message_t *message, int timeout)
 		 * bit more time than expected. */
 		ret = select(fd + 1, NULL, &writefds, NULL, t);
-		if (ret == -1)
+		if (ret == -1) {
+			if (errno == EINTR)
+				continue;
 			break;
+		}
 		else if (FD_ISSET(fd, &writefds)) {
 			ret = write(fd, message + offset, len - offset);
 			if (ret <= 0)
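
For readers who want to see the retry pattern outside the context of the diff, here is a minimal, self-contained sketch of a read loop that restarts select() when it is interrupted. It is not the blktap2 implementation: the helper name `read_all_retry`, its error-code convention, and the choice to rebuild the timeout on every pass are assumptions made for illustration (the patched code reuses its timeval, as the comment in the write path notes).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of blktap2: read exactly len bytes from
 * fd, waiting up to timeout_sec seconds per select() call (0 means wait
 * indefinitely). Returns 0 on success or a negative errno value. */
static int read_all_retry(int fd, void *buf, size_t len, int timeout_sec)
{
	char *p = buf;
	size_t offset = 0;

	while (offset < len) {
		fd_set readfds;
		struct timeval tv, *t = NULL;
		ssize_t n;
		int ret;

		/* Rebuild the fd set and timeout on every pass: their
		 * contents are not reliable after select() fails. */
		FD_ZERO(&readfds);
		FD_SET(fd, &readfds);
		if (timeout_sec > 0) {
			tv.tv_sec = timeout_sec;
			tv.tv_usec = 0;
			t = &tv;
		}

		ret = select(fd + 1, &readfds, NULL, NULL, t);
		if (ret == -1) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return -errno;		/* genuine failure */
		}
		if (ret == 0)
			return -ETIMEDOUT;	/* nothing arrived in time */

		n = read(fd, p + offset, len - offset);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		if (n == 0)
			return -EIO;	/* peer closed before the full message */
		offset += (size_t)n;
	}
	return 0;
}
```

Because the timeout is reset on each retry, a caller that is interrupted repeatedly may wait longer in total than `timeout_sec`; a stricter version would track a deadline and shrink the timeout on each pass.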