Remaining Checkpoint work

Kip Macy listed a number of tasks left for his checkpoint/restart work, not all of which he will be covering. If this sounds interesting to you, jump in!

“The following are the steps required to make the mechanism complete, not all of which I necessarily intend to do in the near future.

1) set a default disposition for SIGCKPT and SIGCKPTEXIT for non-checkpoint-aware applications, thus allowing them to be checkpointed/migrated

2) write out the inode and dev_t for the application itself

3) add new version of ckpt_restore system call that will exec the file

4) at checkpoint, iterate through the file descriptor table and write out the index, inode and dev_t for each vnode right after the point where the signal state is stored in the checkpoint file

5) reopen files at the appropriate indexes from the inode+dev_t on restore

6) re-factor elf_coredump to take a struct file * so that one can write checkpoint state to a socket

7) re-factor new ckpt_restore function to ignore offsets so that it can read from a socket

8) write a simple daemon to accept connections and pass the descriptor to the new version of ckpt_restore

9) add support for multi-threaded core dumps to DragonFly (5 line change). The only reason I put this last is because 95% of the work is in downloading the LinuxThreads library and writing a test application.

At this point DragonFly will have support for process migration of multi-threaded processes. If someone wants to, adding a unified pid space (bproc) would not be hard. The above-mentioned process migration support provides substantially more functionality than bproc’s vmadump.

If someone else wants to chip in I’d be happy to provide guidance. For me all the fun is in figuring out how to do something. At this point the remainder of the work is a SMOP :-).”
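
To make steps 4 and 5 a little more concrete, here is a rough user-space sketch of the bookkeeping they describe: walk the descriptor table and record the index, inode and dev_t of every open slot. This is not Kip's code and not the kernel implementation. The record layout (struct ckpt_fd_rec) and the function name (ckpt_collect_fds) are made up for illustration, and the post does not specify the actual checkpoint file format. The kernel version would walk the process's file descriptor table directly rather than probing with fstat().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical record: one entry per open descriptor (not the real format). */
struct ckpt_fd_rec {
    int   index;  /* slot in the descriptor table */
    ino_t inode;  /* inode number of the underlying file */
    dev_t dev;    /* device the inode lives on */
};

/*
 * Probe descriptors 0 .. max_fd-1 and record every one that is open.
 * Returns the number of records written to recs (at most max_recs).
 */
static int
ckpt_collect_fds(struct ckpt_fd_rec *recs, int max_recs, int max_fd)
{
    struct stat st;
    int fd, n = 0;

    assert(recs != NULL);

    for (fd = 0; fd < max_fd && n < max_recs; fd++) {
        if (fstat(fd, &st) != 0)
            continue;  /* typically EBADF: slot not in use */
        recs[n].index = fd;
        recs[n].inode = st.st_ino;
        recs[n].dev = st.st_dev;
        n++;
    }
    return n;
}
```

A caller would write the returned records into the checkpoint file after the signal state, as step 4 suggests. The restore side (step 5) is the harder half: user space has no portable way to open a file by inode and dev_t alone, which is presumably why that lookup belongs in the kernel.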