In my last blog post, I covered three frequently asked questions about using the Xilinx Zynq-7000 Virtual Platform as a VirtualBox appliance. Today, I'll cover the next most frequently asked question, which concerns simulation performance. This should not be considered an official benchmark, as it is only a quick trial of various configurations on a single computer, but hopefully it provides some useful performance information.
How much slower is it to run the Xilinx Zynq Virtual Platform with VirtualBox compared to a native Linux machine?
For some time I couldn't give a good answer because a fair comparison requires the simulations to be run on the same hardware. Normally, VirtualBox users have a Windows host machine and a Linux guest. Other users have native Linux and no ability to run Windows. This makes it difficult to get an apples-to-apples comparison of various configurations. Since I'm not brave enough to turn my Cadence-supplied Windows laptop into a dual-boot system, I purchased a new laptop and installed native Linux so I can compare both scenarios on the same hardware:
- Native Ubuntu
- Ubuntu Host with VirtualBox Ubuntu Linux guest
This should give some insight into the speed difference between native Linux and VirtualBox. Since the same machine can also boot Windows 7, I may also run VirtualBox with the Windows 7 host to see if there is any difference between the Ubuntu host and the Windows host, but there are only so many hours in a day so I'm going to skip this for now.
The hardware is a Lenovo Z575 laptop with a quad-core AMD A8 processor and 8 GB of RAM. There are also some operating system differences: the VirtualBox guest is Ubuntu 10.10 and the native Linux is Ubuntu 12.04.
To compare performance I used two tests running on the Zynq Virtual Platform. The first is a standard SMP Linux boot followed by running an application, and the second is a graphics test. The graphics test is a bare-metal executable that tests a particular Xylon graphics IP that can be added to the Zynq Virtual Platform. The graphics test runs on only a single ARM Cortex-A9 core.
The Linux test was run for 5 minutes of simulation time, during which Linux was booted and an application was run; this is the now-famous MRI brain picture. The graphics test was run to completion, which is a little less than 3 minutes of simulated time. To keep things simple I recorded only the elapsed time reported by the simulation.
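To make the relationship between elapsed time and simulated time concrete, here is a minimal Python sketch. The function names are my own for illustration, and the elapsed values in the usage lines are hypothetical placeholders, not measurements from these tests; only the 300 simulated seconds comes from the test description above.

```python
def slowdown(elapsed_s, simulated_s):
    """Wall-clock seconds needed to simulate one second of target time."""
    if simulated_s <= 0:
        raise ValueError("simulated time must be positive")
    return elapsed_s / simulated_s


def percent_slower(elapsed_a_s, elapsed_b_s):
    """How much slower run A is than run B, in percent of B's elapsed time."""
    if elapsed_b_s <= 0:
        raise ValueError("reference elapsed time must be positive")
    return 100.0 * (elapsed_a_s - elapsed_b_s) / elapsed_b_s


if __name__ == "__main__":
    # HYPOTHETICAL elapsed times, chosen only to show the arithmetic.
    native_elapsed = 600.0
    virtualbox_elapsed = 660.0
    print(slowdown(native_elapsed, 300.0))
    print(percent_slower(virtualbox_elapsed, native_elapsed))
```

Comparing elapsed times for the same simulated duration is enough for a rough answer; the slowdown factor is only useful when comparing tests of different simulated lengths.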
The results are shown below; all elapsed times are in seconds.
[Table: elapsed time by configuration (including VirtualBox with Linux Host) for SMP Linux (300 simulated sec) and Graphics (174 simulated sec)]
The Linux test was as expected; there was a performance penalty for using VirtualBox, but the graphics test was actually faster on VirtualBox!
Thinking it over, I realized the virtual machine was configured for 2 processors while the native Linux machine has 4. I wondered whether a scheduling issue on native Linux was making it slower with more cores because the process was migrating back and forth between them.
Using the system performance monitor I did see some process migration, but not too much. I used the Linux taskset command to set the processor affinity for the ncsim process to run only on 1 core. This removed the migration, but had little or no effect on the elapsed time of the simulation. As expected, the Linux scheduler is fine and doesn't need any help.
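For readers who want to repeat the affinity experiment from a script, here is a rough Python sketch that does what `taskset -c <cpu> <command>` does: it starts a command with its CPU affinity restricted to one core. It is Linux-only, `run_pinned` is a name I made up for illustration, and the probe command is a stand-in; the exact taskset invocation used with ncsim is not shown here.

```python
import os
import subprocess
import sys


def run_pinned(cmd, cpu):
    """Run cmd with its CPU affinity restricted to a single core (Linux only)."""
    def pin():
        # Executed in the child just before exec; pid 0 means "this process".
        os.sched_setaffinity(0, {cpu})

    return subprocess.run(
        cmd, preexec_fn=pin, check=True, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Pick a core this process is actually allowed to use.
    cpu = min(os.sched_getaffinity(0))
    # Stand-in command that reports its own affinity; replace it with the
    # simulator command line when trying this for real.
    probe = [sys.executable, "-c",
             "import os; print(sorted(os.sched_getaffinity(0)))"]
    print(run_pinned(probe, cpu).stdout.strip())
```

Pinning removes migration between cores, which makes it a cheap way to rule the scheduler in or out before looking elsewhere.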
The simulator is essentially a single-threaded program that will take 100% of one CPU when running. There is little or no disk or network I/O in these tests, so it's a CPU-bound process.
The system monitor for the dual-core VirtualBox configuration shows one CPU pegged at 100%, with some process migration while the test is running.
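One quick way to check whether a command is CPU-bound is to compare the CPU time it consumed with the wall-clock time it took. The sketch below is an assumption-laden illustration rather than how I measured things here: `cpu_to_wall_ratio` is a hypothetical helper, it relies on the Unix-only `resource` module, and the busy loop is a stand-in for the simulator.

```python
import resource
import subprocess
import sys
import time


def cpu_to_wall_ratio(cmd):
    """Run cmd and return its CPU seconds divided by wall-clock seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User plus system CPU time charged to the child that just finished.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return cpu / wall


if __name__ == "__main__":
    # Stand-in workload: a pure-CPU loop with no disk or network I/O.
    busy = [sys.executable, "-c", "sum(range(5_000_000))"]
    print(cpu_to_wall_ratio(busy))
```

A ratio close to 1 suggests a single-threaded, CPU-bound process; a much lower ratio suggests the process spends its time waiting on I/O or on other processes.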
Still somewhat perplexed about the graphics test, I tried setting the number of processors in VirtualBox to 1 and then 4 and ran the graphics test again. The results are below.
[Table: Graphics Test Elapsed Time in sec by Number of VirtualBox CPUs]
The results show the dual-core configuration is the best (which is why I configured the Zynq virtual appliance this way in the first place). Using only 1 core doesn't leave any room for the other processes to run, and ncsim doesn't get a full 100% of a CPU. Using 4 cores just adds scheduling overhead and doesn't help a single-threaded program.
Still perplexed, I went back to the native Linux configuration to figure out why the graphics test was slower. One thing I noticed in the system monitor was that the compiz window manager was taking quite a lot of CPU time even when not much was happening. On a hunch, I logged out of the machine, logged back in using the Unity 2D option, and ran the graphics test again.
I clicked the Ubuntu logo on the login screen to get the choices. I could also have tried GNOME Classic, but I didn't.
Sure enough, this was the cause of the slow graphics test. I ran both the SMP Linux test and the graphics test again using Unity 2D. The results are below.
Although the Linux test result was the same for both 3D and 2D Unity, the graphics test result was very different. Using Unity 2D, the elapsed time on native Linux was now much shorter than on VirtualBox.
In summary, I was able to answer the question of native Linux compared to VirtualBox. There is definitely a performance penalty for VirtualBox, and it was actually more than I would have expected.
I also enjoy computers and software because whenever I set out to learn something, there is always something more that wasn't obvious at the start.
My goal was to provide a rough estimate of the performance difference between native Linux and VirtualBox for those engineers who are deciding how to set up and use the Zynq Virtual Platform. Clearly, the VirtualBox image is very easy to set up and run, but for those engineers looking for maximum performance, a native Linux installation is the best way to go.