Xen 3.0 and the Art of Virtualisation
Ian Pratt of the University of Cambridge described both the features of the upcoming 3.0 release of the Xen virtualisation system and virtualisation more generally. Xen's current stable release is 2.4. I walked away with a better understanding of virtualisation than I previously had.
Virtualisation, Pratt explained, is a single operating system image creating the appearance of multiple operating systems on one system. In essence, it is chroot on steroids. Full virtualisation is the comprehensive emulation of an existing system.
Para-virtualisation is similar, but in this scenario, a guest operating system running on top of a real operating system is aware that it is not in actual control of the computer and is only a virtual machine. Xen and User-mode Linux both fall under this category of virtualisation.
The x86 architecture common to most desktop computers today was not designed with virtualisation in mind, and Pratt described it as a bit of a pig to virtualise.
Pratt asked, "why virtualise?" and provided fairly straightforward answers.
Many data-centres have hundreds or thousands of machines running single operating systems, often each running a single piece of software or service. With virtualisation, each one of those machines can host several operating systems, each running its own set of services, massively reducing the amount of hardware needed for the operation.
Xen takes this one step further and allows clusters of virtual machine hosts with load balancing and fail-over systems.
Pratt explained that if a Xen virtual machine host in a Xen cluster detects imminent hardware failure, it can hand off its virtual machine guest operating systems to another node and die peacefully, without taking the services it was hosting with it. Meanwhile, people using the services may not even be aware that anything changed as they would continue more or less uninterrupted.
Using the same principle, the Xen virtual machine hosting clusters allow load balancing. If several virtual machines are running across a few hosts, the host cluster can transfer busier virtual machines to less busy hosts to avoid overloading any one node in that cluster. This allows an even higher number of virtual machines to run on the same amount of hardware and can serve to further reduce hardware costs for an organisation.
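To make the idea concrete, here is a minimal sketch of that kind of rebalancing in Python. It is an illustration only, not Xen's actual policy: the greedy strategy, the host and guest names, and the per-guest load figures (in percent of a host's capacity) are all assumptions made up for the example.

```python
def host_load(vms):
    """Total load on one host: the sum of its guests' loads (in percent)."""
    return sum(vms.values())


def rebalance(hosts, max_moves=10):
    """Greedily move guests from the busiest host to the least busy one.

    hosts maps a host name to a dict of {guest_name: load}. The dict is
    updated in place, and the list of (guest, source, destination) moves
    is returned. This is a hypothetical policy, not Xen's own.
    """
    moves = []
    for _ in range(max_moves):
        busiest = max(hosts, key=lambda h: host_load(hosts[h]))
        idlest = min(hosts, key=lambda h: host_load(hosts[h]))
        gap = host_load(hosts[busiest]) - host_load(hosts[idlest])
        # Moving a guest only narrows the gap if its load is smaller than the gap.
        candidates = {vm: load for vm, load in hosts[busiest].items() if 0 < load < gap}
        if not candidates:
            break
        # Prefer the guest whose load is closest to half the gap.
        vm = min(candidates, key=lambda name: abs(candidates[name] - gap / 2))
        hosts[idlest][vm] = hosts[busiest].pop(vm)
        moves.append((vm, busiest, idlest))
    return moves


# Hypothetical two-node cluster with made-up loads.
cluster = {
    "node-a": {"web": 50, "db": 25, "mail": 15},
    "node-b": {"dns": 10},
}
print(rebalance(cluster))
print({name: host_load(vms) for name, vms in cluster.items()})
```

In a real cluster each of those moves would be a migration of a running guest, which is why the cost of a single migration matters so much.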
Within a virtual machine host server, each virtual machine should be contained, explained Pratt, so that if one becomes infected with malicious software or otherwise suffers some kind of problem, the risk to the other virtual machines on the same server is reduced.
In order to run Xen, only the kernel needs replacing. No software above that has to be aware of its new role as a guest operating system within a larger system. Xen currently works with Linux versions 2.4 and 2.6(.12), OpenBSD, FreeBSD, Plan 9, and Solaris. Because guest kernels have to communicate with hardware like any other kernels, they must be patched to be aware of their parent operating system and talk to the hardware through Xen. A guest kernel attempting to make direct contact with the hardware on the system will likely fail.
Modifications to the Linux 2.6 kernel to make it work with Xen were limited to changes in the arch/ kernel source subdirectory, claimed Pratt. Linux, he said, is very portable.
Virtualised kernels have to understand two sets of times, while normal kernels only have to be aware of one, noted Pratt.
A normal kernel that is not in a virtual machine has full access to all the hardware on the system at all times. Its sense of time is real. A second going by in kernel time is a second going by on the clock on the wall. However, when a kernel is being virtualised, a second going by for the kernel can be several seconds of real time, as it is sharing the hardware with all the other kernels on that same computer. Therefore a virtualised kernel must be aware of both real wall clock time and virtual processor time, the time during which it has actual access to the hardware.
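The gap between the two clocks is easy to see in a toy model. The sketch below assumes a simple round-robin scheduler with fixed time slices, which is an assumption for illustration and not a description of Xen's real scheduler; the `GuestClock` class and its field names are likewise invented for this example.

```python
from dataclasses import dataclass


@dataclass
class GuestClock:
    wall_ns: int = 0     # real time that has passed on the clock on the wall
    virtual_ns: int = 0  # time this guest actually held the processor


def run_round_robin(num_guests, slice_ns, rounds):
    """Give each guest one time slice per round and track both clocks."""
    clocks = [GuestClock() for _ in range(num_guests)]
    for _ in range(rounds):
        for running in range(num_guests):
            for index, clock in enumerate(clocks):
                # Wall clock time passes for every guest, running or not.
                clock.wall_ns += slice_ns
                # Virtual time only advances for the guest on the CPU.
                if index == running:
                    clock.virtual_ns += slice_ns
    return clocks


# Four guests sharing one CPU in 10 ms slices for 25 rounds.
for clock in run_round_robin(num_guests=4, slice_ns=10_000_000, rounds=25):
    print(clock.wall_ns / 1e9, "s wall,", clock.virtual_ns / 1e9, "s virtual")
```

With four guests sharing one processor, each guest sees one full second of wall clock time go by but only a quarter of a second of processor time, which is the mismatch a virtualised kernel has to account for.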
Among the features coming in Xen 3.0 is support for X86_64 and for SMP systems. Coming soon to a Xen near you is the ability for guest kernels to use virtual CPUs up to a maximum of 32 per system (even if there are not that many real CPUs!) and add and remove them while running, taking hot swapping to a whole new virtual level.
While I do not fully understand protection rings (perhaps someone who does can elaborate in the comments), Pratt explained how Xen runs under 32-bit x86 versus 64-bit x86 in terms of them. In X86_32, Xen runs in ring 0, the guest kernel runs in ring 1, and the user-space provided to the virtual machine runs in ring 3. In X86_64, Xen runs in ring 0 and the virtual machine's user-space runs in ring 3, but this time, the guest kernel also runs in ring 3 because of the massive memory address space provided by the extra 32 bits. With 8 terabytes of memory address space available, Xen can assign different large blocks of memory using widely separate addresses, where it would be more constrained under the 32-bit model.
The goal of the SMP support system in Xen is to make it both decent and secure. SMP scheduling, however, is difficult. Gang scheduling, where multiple jobs are sent to multiple CPUs at the same time, said Pratt, can cause CPU cycles to be wasted, and so processes have to be dynamically managed to maintain efficiency.
For memory management, Pratt said, Xen operates differently from other virtualisation systems. It assigns page-tables for kernel and user-space in virtual machines to use, but does not control them once assigned. For discussion between kernel-space and user-space memory, however, requests do have to be made through the Xen server. Virtual machines are restricted to memory they own and cannot leave that memory space, except under special, controlled shared memory circumstances between virtual machines.
The Xen team is working toward the goal of having unmodified, original kernels run under Xen, allowing legacy Linux kernels, Windows, and other operating systems to run on top of Xen without knowing that they are inside a virtual machine. Before that can happen though, Xen needs to be able to intercept all system calls from the guest kernels that can cause failures and handle them as if Xen is not there.
Pratt returned to the topic of load balancing and explained the process of transferring a virtual machine from one host in a Xen cluster to another.
Assuming two nodes of a cluster are on a good network together, a 1GB memory image would take 8 seconds in ideal circumstances to transfer to another host before it could be resumed. This is a lengthy down-time that can be noticed by mission-critical services and users, so a better system had to be created to transfer a running virtual machine from one node to another.
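The 8 second figure is consistent with a gigabit link: a 1GB image is 8 gigabits, and 8 gigabits at 1 gigabit per second takes 8 seconds. The link speed is my inference from the numbers, since the talk only assumed a good network, and the function name below is made up for the example.

```python
def transfer_seconds(image_bytes, link_bits_per_second):
    """Ideal time to push a memory image over a link, ignoring all overhead."""
    return image_bytes * 8 / link_bits_per_second


GIGABYTE = 10**9             # bytes in the memory image (decimal gigabyte)
GIGABIT_PER_SECOND = 10**9   # assumed link speed, not stated in the talk

print(transfer_seconds(GIGABYTE, GIGABIT_PER_SECOND))  # 8.0
```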
The solution they came up with was to devote ten percent of the resources used by the virtual machine being moved to the transfer itself, thus not significantly impacting its performance in the meantime. The entire memory block in which the virtual machine is operating is then copied to its new home, repeatedly. Each time, only those parts of memory which have changed since the last copy are transferred, and because not everything changes, each cycle goes a little faster and leaves fewer changes for the next. Eventually, there are so few differences between the old and new hosts' copies of the virtual machine's memory that the virtual machine is stopped, the last changes in memory are copied over, and the virtual machine is resumed at its new location. Total down-time in the case of a busy webserver he showed statistics for was on the order of 165 milliseconds, after approximately a minute and a half of copying memory over in preparation.
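The shape of that loop can be sketched in a few lines of Python. This is a simplified model and not Xen's code: it assumes the guest dirties a fixed percentage of however many pages were just sent, and the page counts, the percentage, and the stopping threshold are invented for illustration.

```python
def precopy_rounds(total_pages, dirty_percent, stop_threshold, max_rounds=30):
    """Model iterative pre-copy of a running guest's memory.

    Returns (pages sent in each round while the guest keeps running,
    pages left to send during the final pause).
    """
    to_send = total_pages  # the first round copies everything
    rounds = []
    for _ in range(max_rounds):
        if to_send <= stop_threshold:
            break  # few enough dirty pages left: pause and finish
        rounds.append(to_send)
        # Assumption: while those pages were being copied, the guest
        # dirtied dirty_percent of that many pages again.
        to_send = to_send * dirty_percent // 100
    return rounds, to_send


# 1GB of 4KB pages, 10% re-dirtied per round, pause once 50 or fewer remain.
live_rounds, final_pages = precopy_rounds(262144, 10, 50)
print(live_rounds)   # pages copied per round while the guest runs
print(final_pages)   # pages copied while the guest is paused
```

Only the last, small batch is copied while the guest is stopped, which is why the down-time ends up in milliseconds even though the preparation takes far longer.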
A virtual machine running a Quake 3 server while grad students played the game managed the transition with down-time ranging from 40 to 50 milliseconds, causing the grad students to not even be aware that any changes were taking place.
Pratt said that the road-map for Xen 3.1 sees improved performance, enhanced control tools, improved tuning and optimisation, and less manual configuration to make it work.
He commented that Xen has a vibrant developer community and strong vendor support which is assisting in the development of the project.
Intel architect Gordon McFadden ran another virtualisation-related talk in the afternoon entitled "Case study: Usage of Virtualised GNU/Linux to Support Binary Testing Across Multiple Distributions".
The basic problem that faced McFadden was that he was charged with running multiple Linux Standard Base tests on multiple distributions on multiple platforms, repeatedly, and could not acquire additional hardware to perform the task.
He described the LSB tests as time consuming, taking up to eight hours each, but not hard on the CPU. The logical solution was to run the tests concurrently using virtual machines. Once a test was launched and under way on one virtual machine on a real machine, instead of waiting several hours or all day for it to finish, another test could be launched in another virtual machine on the same machine. McFadden's virtual machine of choice for the project was the User-Mode Linux (UML) virtual machine.
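A back-of-the-envelope calculation shows why this pays off. The sketch below assumes that running tests side by side does not slow any of them down, which is only plausible because the tests are light on the CPU; the number of tests and of concurrent guests in the example are hypothetical, not figures from the talk.

```python
import math


def total_hours(num_tests, hours_per_test, concurrent_guests=1):
    """Wall clock hours to finish all tests, run in batches of guests."""
    batches = math.ceil(num_tests / concurrent_guests)
    return batches * hours_per_test


# Hypothetical workload: four eight-hour test runs.
print(total_hours(4, 8))                       # one at a time
print(total_hours(4, 8, concurrent_guests=2))  # two guests side by side
```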
The setup McFadden and his team used was the Gentoo Linux distribution riding on top of kernel 2.6.11 and an XFS file-system. His reasoning for using Gentoo was not philosophical, but simply that he had not used it before and wanted to try something new. The file-systems of the virtual machines were ext2 or ext3, but appeared to the host system as flat files on the XFS file-system.
The tests were run on a 4GHz hyper-threaded system with 1GB of RAM, and covered Novell Linux Desktop 10, Red Hat Enterprise Linux 3 and 4, and Red Flag Linux. Each test case ran on an 8GB virtual file-system and was assigned either 384MB or 512MB of RAM.
To set up the systems, each was installed normally and then dd'ed into a flat file to be mounted and used by the UML kernel.
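For readers who have not used dd this way, the operation is a plain byte-for-byte copy of an installed partition into an ordinary file. The Python below is a rough equivalent written for illustration, not what McFadden's team ran; the function name and the paths in the usage comment are hypothetical.

```python
def copy_to_image(source_path, image_path, block_size=1024 * 1024):
    """Copy source_path byte for byte into image_path, one block at a time.

    Returns the number of bytes written.
    """
    copied = 0
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break  # end of the source device or file
            dst.write(block)
            copied += len(block)
    return copied


# Hypothetical usage: image an installed partition for a guest to boot from.
# copy_to_image("/dev/sdb1", "/srv/uml/guest-root.img")
```

The resulting file holds a complete ext2 or ext3 file-system, which is why it looks like a single flat file to the host.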
The guest kernels were instantiated and loaded, and each popped up an xterm for management. Each test could then be run by logging into the xterm, starting NFS on the guest system, and running a test.
The result of the whole process was a quickly reusable hardware platform that was economical both fiscally and in lab and desk space, though McFadden did not relate the results of the LSB tests themselves.
Using virtual machines for testing has limitations as well, McFadden noted. For one, it cannot be used to test hardware, and resource sharing can sometimes become a problem. For example, if two kernels are vying for control of one network interface, performance will be below par for both.
McFadden said he had alternatives to using virtualisation to run his tests, but using boot loaders to continually load different operating systems would have taken a lot longer, with long delays since multiple tasks could not be performed at the same time. His other alternative, VMware, was to be avoided as he was already familiar with it and wanted to learn something new.