Block Tap User-level Interfaces
Andrew Warfield
andrew.warfield@cl.cam.ac.uk
February 8, 2005

NOTE #1: The blktap is _experimental_ code.  It works for me.  Your
mileage may vary.  Don't use it for anything important.  Please.  ;)

NOTE #2: All of the interfaces here are likely to change.  This is all
early code, and I am checking it in because others want to play with
it.  If you use it for anything, please let me know!

Overview:
---------

This directory contains a library and a set of example applications for
the block tap device.  The block tap hooks into the split block device
interfaces above Xen, allowing them to be extended.  This extension can
be done in userspace with the help of a library.

The tap can be installed either as an interposition domain in between
a frontend and backend driver pair, or as a terminating backend, in
which case it is responsible for serving all requests itself.

There are two reasons that you might want to use the tap,
corresponding to these configurations:

 1. To examine or modify a stream of block requests while they are
    in-flight (e.g. to encrypt data, or add data-driven watchpoints)

 2. To prototype a new backend driver, serving requests from the tap
    rather than passing them along to the XenLinux blkback driver.
    (e.g. to forward block requests to a remote host)

Interface:
----------

At the moment, the tap interface is similar in spirit to that of the
Linux netfilter.  Requests are messages from a client (frontend)
domain to a disk (backend) domain.  Responses are messages travelling
back, acknowledging the completion of a request.  The library allows
chains of functions to be attached to these events.  In addition,
hooks may be attached to handle control messages, which signify things
like connections from new domains.

At present the control messages especially expose a lot of the
underlying driver interfaces.  This may change in the future in order
to simplify writing hooks.

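To make the idea of chained hooks concrete, here is a rough stand-alone
model of a request chain.  It is only a sketch and is not taken from
the library: every sketch_* name, the sketch_request_t type, and the
SKETCH_PASS value are hypothetical stand-ins invented for illustration.
Of the library's return codes, this document names only BLKTAP_STOLEN;
the assumption made here is that a hook either passes a request along
to the next hook or steals it, which ends the chain.

```c
/* Sketch only: a stand-alone model of chained request hooks.
 * None of this is blktaplib code; all sketch_* names are hypothetical. */
#include <stdio.h>
#include <stddef.h>

#define SKETCH_PASS   0  /* assumed: hand the request to the next hook */
#define SKETCH_STOLEN 1  /* assumed: hook consumed the request, stop here */

/* Hypothetical stand-in for blkif_request_t. */
typedef struct {
    unsigned long id;
    unsigned long sector;
    int           is_write;
} sketch_request_t;

typedef int (*sketch_request_hook_fn)(sketch_request_t *);

#define SKETCH_MAX_HOOKS 8

static struct {
    const char            *name;
    sketch_request_hook_fn fn;
} sketch_hooks[SKETCH_MAX_HOOKS];
static size_t sketch_nhooks;

/* Append a hook to the chain.  Returns 0 on success, -1 if the hook is
 * NULL or the table is full. */
int sketch_register_request_hook(const char *name, sketch_request_hook_fn fn)
{
    if (fn == NULL || sketch_nhooks == SKETCH_MAX_HOOKS)
        return -1;
    sketch_hooks[sketch_nhooks].name = name;
    sketch_hooks[sketch_nhooks].fn   = fn;
    sketch_nhooks++;
    return 0;
}

/* Run the hooks in registration order.  Returns SKETCH_STOLEN as soon as
 * one hook steals the request; otherwise SKETCH_PASS, meaning the request
 * would carry on towards the backend. */
int sketch_dispatch_request(sketch_request_t *req)
{
    for (size_t i = 0; i < sketch_nhooks; i++) {
        if (sketch_hooks[i].fn(req) == SKETCH_STOLEN)
            return SKETCH_STOLEN;
    }
    return SKETCH_PASS;
}

/* Example hook: print the request and pass it on unchanged. */
int sketch_dump_hook(sketch_request_t *req)
{
    printf("req id=%lu sector=%lu %s\n",
           req->id, req->sector, req->is_write ? "write" : "read");
    return SKETCH_PASS;
}

/* Example hook: steal every write, let reads through. */
int sketch_steal_writes_hook(sketch_request_t *req)
{
    return req->is_write ? SKETCH_STOLEN : SKETCH_PASS;
}
```

Registering sketch_dump_hook and then sketch_steal_writes_hook gives a
chain in which every request is printed, reads fall through to the end,
and writes stop at the second hook.  In the real tap, a hook that steals
a request is the case where a response would later need to be injected
by hand.
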
Here are the public interfaces:

These allow hook functions to be chained:

 void blktap_register_ctrl_hook(char *name, int (*ch)(control_msg_t *));
 void blktap_register_request_hook(char *name, int (*rh)(blkif_request_t *));
 void blktap_register_response_hook(char *name, int (*rh)(blkif_response_t *));

This allows a response to be injected, in the case where a request has
been removed using BLKTAP_STOLEN.

 void blktap_inject_response(blkif_response_t *);

These let you add file descriptors and handlers to the main poll loop:

 int  blktap_attach_poll(int fd, short events, int (*func)(int));
 void blktap_detach_poll(int fd);

This starts the main poll loop:

 int  blktap_listen(void);

Example:
--------

blkimg.c uses an image on the local file system to serve requests to a
domain.  Here's what it looks like:

---[blkimg.c]---

/* blkimg.c
 *
 * file-backed disk.
 */

#include "blktaplib.h"
#include "blkimglib.h"

int main(int argc, char *argv[])
{
    image_init();

    blktap_register_ctrl_hook("image_control", image_control);
    blktap_register_request_hook("image_request", image_request);
    blktap_listen();

    return 0;
}

----------------

All of the real work is in blkimglib.c, but this illustrates the
actual tap interface well enough.  image_control() will be called with
all control messages.  image_request() handles requests.  As it reads
from an on-disk image file, no requests are ever passed on to a
backend, and so there will be no responses to process -- so there is
nothing registered as a response hook.

Other examples:
---------------

Here is a list of other examples in the directory:

Things that terminate a block request stream:

  blkimg    - Use an image file/device to serve requests
  blkgnbd   - Use a remote gnbd server to serve requests
  blkaio    - Use libaio... (DOES NOT WORK)

Things that don't:

  blkdump   - Print in-flight requests.
  blkcow    - Really inefficient copy-on-write disks using libdb to store
              writes.

There are examples of plugging these things together; for instance,
blkcowgnbd is a read-only gnbd device with copy-on-write to a local
file.

TODO:
-----

- Make session tracking work.  At the moment these generally just handle a
  single front-end client at a time.

- Integrate with Xend.  Need to cleanly pass an image identifier in the
  connect message.

- Make an asynchronous file-io terminator.  The libaio attempt is
  tragically stalled because mapped foreign pages make pfn_valid fail
  (they are VM_IO), and so cannot be passed to aio as targets.  A
  better solution may be to tear the disk interfaces out of the real
  backend and expose them somehow.

- Make CoW suck less.

- Do something more along the lines of dynamic linking for the plugins, so
  that they don't all need a new main().