Xen problems with VMs on 2.6.32-5-xen-amd64?

On Saturday there were some updates for Squeeze/Stable; among other things, a new version of the Xen hypervisor and updated kernels were downloaded and installed:

gate:~# dir /var/cache/apt/archives/
base-files_6.0squeeze7_amd64.deb                 libxenstore3.0_4.0.1-5.6_amd64.deb
bind9-host_1%3a9.7.3.dfsg-1~squeeze9_amd64.deb   linux-base_2.6.32-48_all.deb
dbus_1.2.24-4+squeeze2_amd64.deb                 linux-image-2.6.32-5-amd64_2.6.32-48_amd64.deb
dbus-x11_1.2.24-4+squeeze2_amd64.deb             linux-image-2.6.32-5-xen-amd64_2.6.32-48_amd64.deb
firmware-linux-free_2.6.32-48_all.deb            lock
gzip_1.3.12-9+squeeze1_amd64.deb                 openssh-client_1%3a5.5p1-6+squeeze3_amd64.deb
host_1%3a9.7.3.dfsg-1~squeeze9_all.deb           openssh-server_1%3a5.5p1-6+squeeze3_amd64.deb
libbind9-60_1%3a9.7.3.dfsg-1~squeeze9_amd64.deb  openssl_0.9.8o-4squeeze14_amd64.deb
libcups2_1.4.4-7+squeeze3_amd64.deb              partial
libdbus-1-3_1.2.24-4+squeeze2_amd64.deb          perl_5.10.1-17squeeze5_amd64.deb
libdbus-glib-1-2_0.88-2.1+squeeze1_amd64.deb     perl-base_5.10.1-17squeeze5_amd64.deb
libdns69_1%3a9.7.3.dfsg-1~squeeze9_amd64.deb     perl-modules_5.10.1-17squeeze5_all.deb
libisc62_1%3a9.7.3.dfsg-1~squeeze9_amd64.deb     ssh_1%3a5.5p1-6+squeeze3_all.deb
libisccc60_1%3a9.7.3.dfsg-1~squeeze9_amd64.deb   tzdata_2012g-0squeeze1_all.deb
libisccfg62_1%3a9.7.3.dfsg-1~squeeze9_amd64.deb  xen-hypervisor-4.0-amd64_4.0.1-5.6_amd64.deb
libldap-2.4-2_2.4.23-7.3_amd64.deb               xen-linux-system-2.6.32-5-xen-amd64_2.6.32-48_amd64.deb
liblwres60_1%3a9.7.3.dfsg-1~squeeze9_amd64.deb   xenstore-utils_4.0.1-5.6_amd64.deb
libperl5.10_5.10.1-17squeeze5_amd64.deb          xen-utils-4.0_4.0.1-5.6_amd64.deb
libssl0.9.8_0.9.8o-4squeeze14_amd64.deb

Unfortunately this update appears to be problematic on my Xen hosting server. Last night, for the second time, some of the virtual network interfaces disappeared or stopped working. For example, I have two VMs: one running the webserver and one running the databases. Between these two VMs there is a bridge on the dom0, and both VMs have a VIF attached to that (internal) bridge. What happens is that this bridge becomes inaccessible from within the webserver VM.
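
For context, the internal bridge on dom0 is a plain Linux bridge with no physical NIC attached. Roughly, the setup looks like this; the addresses and interface names in this sketch are illustrative, not my exact configuration:

# /etc/network/interfaces on dom0 (sketch; names and address are assumptions)
auto xenbr1
iface xenbr1 inet static
    address 192.168.10.1
    netmask 255.255.255.0
    bridge_ports none    # internal-only bridge, no physical port
    bridge_stp off
    bridge_fd 0

# Each domU config then attaches a VIF to it, e.g.:
# vif = ['bridge=xenbr0', 'bridge=xenbr1']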

Sadly there’s not much to see in the log files. I just spotted this on dom0: 

Feb 26 01:01:29 gate kernel: [12697.907512] vif3.1: Frag is bigger than frame.
Feb 26 01:01:29 gate kernel: [12697.907550] vif3.1: fatal error; disabling device
Feb 26 01:01:29 gate kernel: [12697.919921] xenbr1: port 3(vif3.1) entering disabled state
Feb 26 01:22:00 gate kernel: [13928.644888] vif2.1: Frag is bigger than frame.
Feb 26 01:22:00 gate kernel: [13928.644920] vif2.1: fatal error; disabling device
Feb 26 01:22:00 gate kernel: [13928.663571] xenbr1: port 2(vif2.1) entering disabled state
Feb 26 01:40:44 gate kernel: [15052.629280] vif7.1: Frag is bigger than frame.
Feb 26 01:40:44 gate kernel: [15052.629314] vif7.1: fatal error; disabling device
Feb 26 01:40:44 gate kernel: [15052.641725] xenbr1: port 6(vif7.1) entering disabled state
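
The disabled ports can also be seen on dom0 itself; something along these lines (output omitted):

# List the ports attached to the internal bridge
brctl show xenbr1
# Per-port state; "disabled" matches the log messages above
brctl showstp xenbr1
# Link state of a single backend interface
ip link show vif3.1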

This matches the number of VMs that lost their internal connection to the bridge. On the webserver VM I see this output:

Feb 26 01:59:01 vserv1 kernel: [16113.539767] IPv6: sending pkt_too_big to self
Feb 26 01:59:01 vserv1 kernel: [16113.539794] IPv6: sending pkt_too_big to self
Feb 26 02:30:54 vserv1 kernel: [18026.407517] IPv6: sending pkt_too_big to self
Feb 26 02:30:54 vserv1 kernel: [18026.407546] IPv6: sending pkt_too_big to self
Feb 26 02:30:54 vserv1 kernel: [18026.434761] IPv6: sending pkt_too_big to self
Feb 26 02:30:54 vserv1 kernel: [18026.434787] IPv6: sending pkt_too_big to self
Feb 26 03:39:16 vserv1 kernel: [22128.768214] IPv6: sending pkt_too_big to self
Feb 26 03:39:16 vserv1 kernel: [22128.768240] IPv6: sending pkt_too_big to self
Feb 26 04:39:51 vserv1 kernel: [25764.250170] IPv6: sending pkt_too_big to self
Feb 26 04:39:51 vserv1 kernel: [25764.250196] IPv6: sending pkt_too_big to self
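
The netback error "Frag is bigger than frame." is what actually disables the device; the pkt_too_big messages suggest that oversized or heavily fragmented packets are the trigger. A workaround that is sometimes suggested for frag-related netback trouble, which I have not verified against this particular bug, is to disable segmentation offloads on the affected interfaces:

# Inside the domU, on the interface facing the internal bridge
# (eth1 is an assumption; use whatever the VIF-backed interface is):
ethtool -K eth1 tso off gso off sg off

# Or on dom0, per backend VIF:
ethtool -K vif3.1 tso off gso off sg off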

Rebooting a VM results in a non-working VM: it gets paused on creation, the Xen scripts complain about hotplug scripts not working, and the Xen log shows this:

[2013-02-25 13:06:34 5470] DEBUG (XendDomainInfo:101)
XendDomainInfo.create(['vm', ['name', 'vserv1'], ['memory', '2048'],
['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash',
'restart'], ['on_xend_start', 'ignore'], ['on_xend_stop', 'ignore'],
['vcpus', '2'], ['oos', 1], ['bootloader', '/usr/lib/xen-4.0/bin/pygrub'],
['bootloader_args', ''], ['image', ['linux', ['root', '/dev/xvdb '],
['videoram', 4], ['tsc_mode', 0], ['nomigrate', 0]]], ['s3_integrity', 1],
['device', ['vbd', ['uname', 'phy:/dev/lv/vserv1-boot'], ['dev', 'xvda'],
['mode', 'w']]], ['device', ['vbd', ['uname', 'phy:/dev/lv/vserv1-disk'],
['dev', 'xvdb'], ['mode', 'w']]], ['device', ['vbd', ['uname',
'phy:/dev/lv/vserv1-swap'], ['dev', 'xvdc'], ['mode', 'w']]], ['device',
['vbd', ['uname', 'phy:/dev/lv/vserv1mirror'], ['dev', 'xvdd'], ['mode',
'w']]]])
[2013-02-25 13:06:34 5470] DEBUG (XendDomainInfo:2508)
XendDomainInfo.constructDomain
[2013-02-25 13:06:34 5470] DEBUG (balloon:220) Balloon: 2100000 KiB free;
need 16384; done.
[2013-02-25 13:06:34 5470] DEBUG (XendDomain:464) Adding Domain: 39
[2013-02-25 13:06:34 5470] DEBUG (XendDomainInfo:2818)
XendDomainInfo.initDomain: 39 256
[2013-02-25 13:06:34 5781] DEBUG (XendBootloader:113) Launching bootloader
as ['/usr/lib/xen-4.0/bin/pygrub', '--args=root=/dev/xvdb  ',
'--output=/var/run/xend/boot/xenbl.6040', '/dev/lv/vserv1-boot'].
[2013-02-25 13:06:39 5470] DEBUG (XendDomainInfo:2845)
_initDomain:shadow_memory=0x0, memory_static_max=0x80000000,
memory_static_min=0x0.
[2013-02-25 13:06:39 5470] INFO (image:182) buildDomain os=linux dom=39
vcpus=2
[2013-02-25 13:06:39 5470] DEBUG (image:721) domid	    = 39
[2013-02-25 13:06:39 5470] DEBUG (image:722) memsize	    = 2048
[2013-02-25 13:06:39 5470] DEBUG (image:723) image	    =
/var/run/xend/boot/boot_kernel.xj7W_t
[2013-02-25 13:06:39 5470] DEBUG (image:724) store_evtchn   = 1
[2013-02-25 13:06:39 5470] DEBUG (image:725) console_evtchn = 2
[2013-02-25 13:06:39 5470] DEBUG (image:726) cmdline	    =
root=UUID=ed71a39f-fd2e-4035-8557-493686baa151 ro root=/dev/xvdb
[2013-02-25 13:06:39 5470] DEBUG (image:727) ramdisk	    =
/var/run/xend/boot/boot_ramdisk.QavuAo
[2013-02-25 13:06:39 5470] DEBUG (image:728) vcpus	    = 2
[2013-02-25 13:06:39 5470] DEBUG (image:729) features	    =
[2013-02-25 13:06:39 5470] DEBUG (image:730) flags	    = 0
[2013-02-25 13:06:39 5470] DEBUG (image:731) superpages     = 0
[2013-02-25 13:06:40 5470] INFO (XendDomainInfo:2367) createDevice: vbd :
{'uuid': '04d99772-cf27-aecf-2d1b-c73eaf657410', 'bootable': 1, 'driver':
'paravirtualised', 'dev': 'xvda', 'uname': 'phy:/dev/lv/vserv1-boot',
'mode': 'w'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing
{'virtual-device': '51712', 'device-type': 'disk', 'protocol':
'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend':
'/local/domain/0/backend/vbd/39/51712'} to
/local/domain/39/device/vbd/51712.
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing
{'domain': 'vserv1', 'frontend': '/local/domain/39/device/vbd/51712',
'uuid': '04d99772-cf27-aecf-2d1b-c73eaf657410', 'bootable': '1', 'dev':
'xvda', 'state': '1', 'params': '/dev/lv/vserv1-boot', 'mode': 'w',
'online': '1', 'frontend-id': '39', 'type': 'phy'} to
/local/domain/0/backend/vbd/39/51712.
[2013-02-25 13:06:40 5470] INFO (XendDomainInfo:2367) createDevice: vbd :
{'uuid': 'e46cb89f-3e54-41d2-53bd-759ed6c690d2', 'bootable': 0, 'driver':
'paravirtualised', 'dev': 'xvdb', 'uname': 'phy:/dev/lv/vserv1-disk',
'mode': 'w'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing
{'virtual-device': '51728', 'device-type': 'disk', 'protocol':
'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend':
'/local/domain/0/backend/vbd/39/51728'} to
/local/domain/39/device/vbd/51728.
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing
{'domain': 'vserv1', 'frontend': '/local/domain/39/device/vbd/51728',
'uuid': 'e46cb89f-3e54-41d2-53bd-759ed6c690d2', 'bootable': '0', 'dev':
'xvdb', 'state': '1', 'params': '/dev/lv/vserv1-disk', 'mode': 'w',
'online': '1', 'frontend-id': '39', 'type': 'phy'} to
/local/domain/0/backend/vbd/39/51728.
[2013-02-25 13:06:40 5470] INFO (XendDomainInfo:2367) createDevice: vbd :
{'uuid': 'e2d61860-7448-1843-3935-6b63c5d2878e', 'bootable': 0, 'driver':
'paravirtualised', 'dev': 'xvdc', 'uname': 'phy:/dev/lv/vserv1-swap',
'mode': 'w'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing
{'virtual-device': '51744', 'device-type': 'disk', 'protocol':
'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend':
'/local/domain/0/backend/vbd/39/51744'} to
/local/domain/39/device/vbd/51744.
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing
{'domain': 'vserv1', 'frontend': '/local/domain/39/device/vbd/51744',
'uuid': 'e2d61860-7448-1843-3935-6b63c5d2878e', 'bootable': '0', 'dev':
'xvdc', 'state': '1', 'params': '/dev/lv/vserv1-swap', 'mode': 'w',
'online': '1', 'frontend-id': '39', 'type': 'phy'} to
/local/domain/0/backend/vbd/39/51744.
[2013-02-25 13:06:40 5470] INFO (XendDomainInfo:2367) createDevice: vbd :
{'uuid': 'd314a46e-1ce9-0e8d-b009-3f08e29735f5', 'bootable': 0, 'driver':
'paravirtualised', 'dev': 'xvdd', 'uname': 'phy:/dev/lv/vserv1mirror',
'mode': 'w'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing
{'virtual-device': '51760', 'device-type': 'disk', 'protocol':
'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend':
'/local/domain/0/backend/vbd/39/51760'} to
/local/domain/39/device/vbd/51760.
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing
{'domain': 'vserv1', 'frontend': '/local/domain/39/device/vbd/51760',
'uuid': 'd314a46e-1ce9-0e8d-b009-3f08e29735f5', 'bootable': '0', 'dev':
'xvdd', 'state': '1', 'params': '/dev/lv/vserv1mirror', 'mode': 'w',
'online': '1', 'frontend-id': '39', 'type': 'phy'} to
/local/domain/0/backend/vbd/39/51760.
[2013-02-25 13:06:40 5470] DEBUG (XendDomainInfo:3400) Storing VM details:
{'on_xend_stop': 'ignore', 'shadow_memory': '0', 'uuid':
'04541225-6d3c-3cae-a4c4-0b6d4ccfac7a', 'on_reboot': 'restart',
'start_time': '1361794000.37', 'on_poweroff': 'destroy', 'bootloader_args':
'', 'on_xend_start': 'ignore', 'on_crash': 'restart', 'xend/restart_count':
'0', 'vcpus': '2', 'vcpu_avail': '3', 'bootloader':
'/usr/lib/xen-4.0/bin/pygrub', 'image': "(linux (kernel ) (args
'root=/dev/xvdb  ') (superpages 0) (tsc_mode 0) (videoram 4) (pci ())
(nomigrate 0) (notes (HV_START_LOW 18446603336221196288) (FEATURES
'!writable_page_tables|pae_pgdir_above_4gb') (VIRT_BASE
18446744071562067968) (GUEST_VERSION 2.6) (PADDR_OFFSET 0) (GUEST_OS linux)
(HYPERCALL_PAGE 18446744071578882048) (LOADER generic) (SUSPEND_CANCEL 1)
(PAE_MODE yes) (ENTRY 18446744071584289280) (XEN_VERSION xen-3.0)))",
'name': 'vserv1'}
[2013-02-25 13:06:40 5470] DEBUG (XendDomainInfo:1804) Storing domain
details: {'console/ring-ref': '2143834', 'image/entry':
'18446744071584289280', 'console/port': '2', 'store/ring-ref': '2143835',
'image/loader': 'generic', 'vm':
'/vm/04541225-6d3c-3cae-a4c4-0b6d4ccfac7a',
'control/platform-feature-multiprocessor-suspend': '1',
'image/hv-start-low': '18446603336221196288', 'image/guest-os': 'linux',
'cpu/1/availability': 'online', 'image/virt-base': '18446744071562067968',
'memory/target': '2097152', 'image/guest-version': '2.6', 'image/pae-mode':
'yes', 'description': '', 'console/limit': '1048576', 'image/paddr-offset':
'0', 'image/hypercall-page': '18446744071578882048',
'image/suspend-cancel': '1', 'cpu/0/availability': 'online',
'image/features/pae-pgdir-above-4gb': '1',
'image/features/writable-page-tables': '0', 'console/type': 'xenconsoled',
'name': 'vserv1', 'domid': '39', 'image/xen-version': 'xen-3.0',
'store/port': '1'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing
{'protocol': 'x86_64-abi', 'state': '1', 'backend-id': '0', 'backend':
'/local/domain/0/backend/console/39/0'} to
/local/domain/39/device/console/0.
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing
{'domain': 'vserv1', 'frontend': '/local/domain/39/device/console/0',
'uuid': 'c8819aed-c78f-02b8-0ef7-1600abd15add', 'frontend-id': '39',
'state': '1', 'location': '2', 'online': '1', 'protocol': 'vt100'} to
/local/domain/0/backend/console/39/0.
[2013-02-25 13:06:40 5470] DEBUG (XendDomainInfo:1891)
XendDomainInfo.handleShutdownWatch
[2013-02-25 13:06:40 5470] DEBUG (DevController:139) Waiting for devices
vif2.
[2013-02-25 13:06:40 5470] DEBUG (DevController:139) Waiting for devices
vif.
[2013-02-25 13:06:40 5470] DEBUG (DevController:139) Waiting for devices
vscsi.
[2013-02-25 13:06:40 5470] DEBUG (DevController:139) Waiting for devices
vbd.
[2013-02-25 13:06:40 5470] DEBUG (DevController:144) Waiting for 51712.
[2013-02-25 13:06:40 5470] DEBUG (DevController:628) hotplugStatusCallback
/local/domain/0/backend/vbd/39/51712/hotplug-status.
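
Since xend hangs there waiting for hotplug-status, the backend state can at least be inspected directly with the xenstore tools (xenstore-utils was among the updated packages); the paths below are taken from the log above:

# Did the hotplug script ever report a status for the stuck vbd?
xenstore-ls /local/domain/0/backend/vbd/39/51712
xenstore-read /local/domain/0/backend/vbd/39/51712/hotplug-status

# The hotplug scripts also log here on Debian:
tail -f /var/log/xen/xen-hotplug.log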

From my point of view, either the Xen hypervisor or the kernel is broken, but it's hard for me to tell which. Maybe it would be easier to upgrade the system from Squeeze to Wheezy and get rid of this problem that way? On the other hand, that would only sidestep the problem rather than actually solve it.

Is anyone else experiencing problems with this latest update of Xen and the kernel?

UPDATE: Bug #701744 filed.

UPDATE 2: Downgrading the kernel and hypervisor on dom0 to the following packages from snapshot.debian.org seems to have solved the problem (a sketch of the downgrade procedure follows the list).

  • xen-hypervisor-4.0-amd64_4.0.1-5.4_amd64.deb
  • linux-image-2.6.32-5-xen-amd64_2.6.32-46_amd64.deb
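
For the record, the downgrade itself is straightforward; a sketch (the snapshot.debian.org URLs are abbreviated here, look up the exact snapshot paths yourself):

# Fetch the old packages from snapshot.debian.org
wget http://snapshot.debian.org/archive/debian/.../xen-hypervisor-4.0-amd64_4.0.1-5.4_amd64.deb
wget http://snapshot.debian.org/archive/debian/.../linux-image-2.6.32-5-xen-amd64_2.6.32-46_amd64.deb

# Install them over the newer versions and put them on hold so that
# apt does not pull the broken versions back in:
dpkg -i xen-hypervisor-4.0-amd64_4.0.1-5.4_amd64.deb \
        linux-image-2.6.32-5-xen-amd64_2.6.32-46_amd64.deb
echo xen-hypervisor-4.0-amd64 hold | dpkg --set-selections
echo linux-image-2.6.32-5-xen-amd64 hold | dpkg --set-selections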

Note that I haven’t tested yet with the newest kernel update from DSA-2632-1.

UPDATE 3: After running with the older versions of the hypervisor and kernel, I have now upgraded the hypervisor to xen-hypervisor-4.0-amd64_4.0.1-5.6_amd64.deb and rebooted. Let's see whether it runs stably or not: if not, the hypervisor is at fault; if it does, it's the kernel.

UPDATE 4: Apparently it's the kernel that is buggy, and the kernel from DSA-2632-1 is affected as well. So the current workaround is to downgrade to linux-image-2.6.32-5-xen-amd64_2.6.32-46_amd64.deb.


5 thoughts on “Xen problems with VMs on 2.6.32-5-xen-amd64?”

  1. Xen broken… maybe
    Hi,

    Same bad experience for me too, and no clue in the log files.
    After the upgrade and a reboot there was no network for any DomU, and no access via the console either.

    I disabled “auto-start” for the DomUs and rebooted the Dom0… Then I started all DomUs by hand, and it worked. I don’t know why; it’s the first time I’ve experienced this problem.

    I notice that there is already a new upgrade for the Xen kernel!

    Kind regards

    1. For me it works when I just

      For me it works when I just reboot the dom0; the previously broken VIFs will then work again. This makes me believe that the cause might be in the Xen code for the VIFs and in xend, since it manages creating and removing VIFs when a domU is started or shut down.

      Can you try the new kernel? I’ll only be able to do so this evening…

  2. tagged “Bug” but no bug filed/referenced?!?
    Hi,

    Ranting on one’s blog is surely a way to vent some frustration, but if you really want to see this fixed, you should (search for an existing bug or) file a bug and refer to that bug in your blog post.

    Otherwise this is just useless noise for most people.

    cheers,
    Holger

    1. Usually I only do file

      Usually I only file bug reports when I know either what is causing the issue or which package is affected. In this case I don’t have any hard evidence whether it’s the fault of the Xen hypervisor or a problem within the kernel package. Filing a bug that just says “It’s broken!” doesn’t help anyone!

      Instead, I’m asking for comments on whether this really is a bug and, if so, in which package. Maybe it’s even a hardware fault? Filing useless bug reports only adds more useless noise. It’s not the first time that I’ve started with a blog post and reported a bug afterwards, once I had collected more data for the report.

  3. It’s the kernel
    A relief to finally find this. I saw the same thing a few months after an upgrade on i686 to the same version (simultaneous with the hypervisor):
    linux-image-2.6.32-5-686:i386 2.6.32-46 -> 2.6.32-48squeeze1

    Looks like the regression and some patches have been identified: http://lists.xen.org/archives/html/xen-devel/2013-05/msg02604.html

    Jun 14 02:22:14 redacted kernel: [8686890.476295] vif12.0: Frag is bigger than frame.
    Jun 14 02:22:14 redacted kernel: [8686890.480277] vif12.0: fatal error; disabling device
    Jun 14 02:22:14 redacted kernel: [8686890.500255] eth0: port 3(vif12.0) entering disabled state

    and on trying to reconfigure I get “Error: Device 0 (vif) could not be connected. Hotplug scripts not working.” If this happens frequently, we’ll need to downgrade.
