Sie sind hier


Xen randomly crashing server - part 2

Some weeks ago I blogged about "Xen randomly crashing server". The problem back then was that I couldn't get any information why the server reboots. Using a netconsole was not possible, because netconsole refused to work with the bridge that is used for Xen networking. Luckily my colocation partner connected the second network port of my server to the network so that I could use eth1 instead of the bridged eth0 for netconsole.

Today the server crashed several times and I was able to collect some more information than just the screenshots from IPMI/KVM console as shown in my last blog entry (full netconsole output is attached as a file): 

May 12 11:56:39 [829681.040596] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-2
May 12 11:56:39 [829681.040647] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.0a 01/03/2014
May 12 11:56:39 [829681.040701] task: ffffffff8181a460 ti: ffffffff81800000 task.ti: ffffffff81800000
May 12 11:56:39 [829681.040749] RIP: e030:[<ffffffff812b7e56>]
May 12 11:56:39  [<ffffffff812b7e56>] memcpy+0x6/0x110
May 12 11:56:39 [829681.040802] RSP: e02b:ffff880280e03a58  EFLAGS: 00010286
May 12 11:56:39 [829681.040834] RAX: ffff88026eec9070 RBX: ffff88023c8f6b00 RCX: 00000000000000ee
May 12 11:56:39 [829681.040880] RDX: 00000000000004a0 RSI: ffff88006cd1f000 RDI: ffff88026eec9422
May 12 11:56:39 [829681.040927] RBP: ffff880280e03b38 R08: 00000000000006c0 R09: ffff88026eec9062
May 12 11:56:39 [829681.040973] R10: 0100000000000000 R11: 00000000af9a2116 R12: ffff88023f440d00
May 12 11:56:39 [829681.041020] R13: ffff88006cd1ec66 R14: ffff88025dcf1cc0 R15: 00000000000004a8
May 12 11:56:39 [829681.041075] FS:  0000000000000000(0000) GS:ffff880280e00000(0000) knlGS:ffff880280e00000
May 12 11:56:39 [829681.041124] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
May 12 11:56:39 [829681.041153] CR2: ffff88006cd1f000 CR3: 0000000271ae8000 CR4: 0000000000042660
May 12 11:56:39 [829681.041202] Stack:
May 12 11:56:39 [829681.041225]  ffffffff814d38ff
May 12 11:56:39  ffff88025b5fa400
May 12 11:56:39  ffff880280e03aa8
May 12 11:56:39  9401294600a7012a
May 12 11:56:39 
May 12 11:56:39 [829681.041287]  0100000000000000
May 12 11:56:39  ffffffff814a000a
May 12 11:56:39  000000008181a460
May 12 11:56:39  00000000000080fe
May 12 11:56:39 
May 12 11:56:39 [829681.041346]  1ad902feff7ac40e
May 12 11:56:39  ffff88006c5fd980
May 12 11:56:39  ffff224afc3e1600
May 12 11:56:39  ffff88023f440d00
May 12 11:56:39 
May 12 11:56:39 [829681.041407] Call Trace:
May 12 11:56:39 [829681.041435]  <IRQ>
May 12 11:56:39 
May 12 11:56:39 [829681.041441]
May 12 11:56:39  [<ffffffff814d38ff>] ? ndisc_send_redirect+0x3bf/0x410
May 12 11:56:39 [829681.041506]  [<ffffffff814a000a>] ? ipmr_device_event+0x7a/0xd0
May 12 11:56:39 [829681.041548]  [<ffffffff814bc74c>] ? ip6_forward+0x71c/0x850
May 12 11:56:39 [829681.041585]  [<ffffffff814c9e54>] ? ip6_route_input+0xa4/0xd0
May 12 11:56:39 [829681.041621]  [<ffffffff8141f1a3>] ? __netif_receive_skb_core+0x543/0x750
May 12 11:56:39 [829681.041729]  [<ffffffff8141f42f>] ? netif_receive_skb_internal+0x1f/0x80
May 12 11:56:39 [829681.041771]  [<ffffffffa0585eb2>] ? br_handle_frame_finish+0x1c2/0x3c0 [bridge]
May 12 11:56:39 [829681.041821]  [<ffffffffa058c757>] ? br_nf_pre_routing_finish_ipv6+0xc7/0x160 [bridge]
May 12 11:56:39 [829681.041872]  [<ffffffffa058d0e2>] ? br_nf_pre_routing+0x562/0x630 [bridge]
May 12 11:56:39 [829681.041907]  [<ffffffffa0585cf0>] ? br_handle_local_finish+0x80/0x80 [bridge]
May 12 11:56:39 [829681.041955]  [<ffffffff8144fb65>] ? nf_iterate+0x65/0xa0
May 12 11:56:39 [829681.041987]  [<ffffffffa0585cf0>] ? br_handle_local_finish+0x80/0x80 [bridge]
May 12 11:56:39 [829681.042035]  [<ffffffff8144fc16>] ? nf_hook_slow+0x76/0x130
May 12 11:56:39 [829681.042067]  [<ffffffffa0585cf0>] ? br_handle_local_finish+0x80/0x80 [bridge]
May 12 11:56:39 [829681.042116]  [<ffffffffa0586220>] ? br_handle_frame+0x170/0x240 [bridge]
May 12 11:56:39 [829681.042148]  [<ffffffff8141ee24>] ? __netif_receive_skb_core+0x1c4/0x750
May 12 11:56:39 [829681.042185]  [<ffffffff81009f9c>] ? xen_clocksource_get_cycles+0x1c/0x20
May 12 11:56:39 [829681.042217]  [<ffffffff8141f42f>] ? netif_receive_skb_internal+0x1f/0x80
May 12 11:56:39 [829681.042251]  [<ffffffffa063f50f>] ? xenvif_tx_action+0x49f/0x920 [xen_netback]
May 12 11:56:39 [829681.042299]  [<ffffffffa06422f8>] ? xenvif_poll+0x28/0x70 [xen_netback]
May 12 11:56:39 [829681.042331]  [<ffffffff8141f7b0>] ? net_rx_action+0x140/0x240
May 12 11:56:39 [829681.042367]  [<ffffffff8106c6a1>] ? __do_softirq+0xf1/0x290
May 12 11:56:39 [829681.042397]  [<ffffffff8106ca75>] ? irq_exit+0x95/0xa0
May 12 11:56:39 [829681.042432]  [<ffffffff8135a285>] ? xen_evtchn_do_upcall+0x35/0x50
May 12 11:56:39 [829681.042469]  [<ffffffff8151669e>] ? xen_do_hypervisor_callback+0x1e/0x30
May 12 11:56:39 [829681.042499]  <EOI>
May 12 11:56:39 
May 12 11:56:39 [829681.042506]
May 12 11:56:39  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
May 12 11:56:39 [829681.042561]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
May 12 11:56:39 [829681.042592]  [<ffffffff81009e7c>] ? xen_safe_halt+0xc/0x20
May 12 11:56:39 [829681.042627]  [<ffffffff8101c8c9>] ? default_idle+0x19/0xb0
May 12 11:56:39 [829681.042666]  [<ffffffff810a83e0>] ? cpu_startup_entry+0x340/0x400
May 12 11:56:39 [829681.042705]  [<ffffffff81903076>] ? start_kernel+0x497/0x4a2
May 12 11:56:39 [829681.042735]  [<ffffffff81902a04>] ? set_init_arg+0x4e/0x4e
May 12 11:56:39 [829681.042767]  [<ffffffff81904f69>] ? xen_start_kernel+0x569/0x573
May 12 11:56:39 [829681.042797] Code:
May 12 11:56:39  <f3>
May 12 11:56:39 
May 12 11:56:39 [829681.043113] RIP
May 12 11:56:39  [<ffffffff812b7e56>] memcpy+0x6/0x110
May 12 11:56:39 [829681.043145]  RSP <ffff880280e03a58>
May 12 11:56:39 [829681.043170] CR2: ffff88006cd1f000
May 12 11:56:39 [829681.043488] ---[ end trace 1838cb62fe32daad ]---
May 12 11:56:39 [829681.048905] Kernel panic - not syncing: Fatal exception in interrupt
May 12 11:56:39 [829681.048978] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)

I'm not that good at reading this kind of output, but to me it seems that ndisc_send_redirect is at fault. When googling for "ndisc_send_redirect" you can find a patch on and Debian bug #804079, both seem to be related to IPv6.

When looking at the linux kernel source mentioned in the lkml patch I see that this patch is already applied (line 1510): 

        if (ha) 
                ndisc_fill_addr_option(buff, ND_OPT_TARGET_LL_ADDR, ha);

So, when the patch was intended to prevent "leading to data corruption or in the worst case a panic when the skb_put failed" it does not help in my case or in the case of #804079.

Any tips are appreciated!

PS: I'll contribute to that bug in the BTS, of course!

Symbol für unformatierten Text syslog-xen-crash.txt24.27 KB

Xen randomly crashing server

It's a long story... an oddessey of almost two years...

But to start from the beginning: Back then I rented a server at Hetzner until they decided to bill for every IP address you got from them. I got a /26 in the past and so I would have to pay for every IP address of that subnet in addition to the server rent of 79.- EUR/month. That would have meant nearly doubling the monthly costs. So I moved with my server from Hetzner to rrbone Net, which offered me a /26 on a rented Cisco C200 M2 server for a competitve price.

After migrating the VMs from Hetzner to rrbone with the same setup that was running just fine at Hetzner I experienced spontaneous reboots of the server, sometimes several times per day and in short time frame. The hosting provider was very, very helpful in debugging this like exchanging the memory, setting up a remote logging service for the CIMC and such. But in the end we found no root cause for this. The CIMC logs showed that the OS was rebooting the machine.

Anyway, I then bought my own server and exchanged the Cisco C200 by my own hardware, but the reboots still happen as before. Sometimes the servers runs for weeks, sometimes the server crashes 4-6 times a day, but usually it's like a pattern: when it crashes and reboots, it will do that again within a few hours and after the second reboot the chances are high that the server will run for several days without a reboot - or even weeks.

The strange thing is, that there are absolutely no hints in the logs, neither syslog or in the Xen logs, so I assume that's something quite deep in the kernel that causes the reboot. Another hint is, that the reboots fairly often happened, when I used my Squid proxy on one of the VMs to access the net. I'm connecting for example by SSH with portforwarding to one VM, whereas the proxy runs on another VM, which led to network traffic between the VMs. Sometimes the server crashed on the very firsts proxy requests. So, I exchanged Squid by tinyproxy or other proxies, moved the proxy from one VM to that VM I connect to using SSH, because I thought that the inter-VM traffic may cause the machine to reboot. Moving the proxy to another virtual server I rented at another hosting provider to host my secondary nameserver did help a little bit, but with no real hard proof and statistics, just an impression of mine.

I moved from xm toolstack to xl toolstack as well, but didn't help either. The reboots are still happening and in the last few days very frequent. Even with the new server I exchanged the memory, used memory mirroring as well, because I thought that it might be a faulty memory module or something, but still rebooting out of the blue.

During the last weekend I configured grub to include "noreboot" command line and then got my first proof that somehow the Xen network stack is causing the reboots: 

This is a screenshot of the IPMI console, so it's not showing the full information of that kernel oops, but as you can see, there are most likely such parts involved like bridge, netif, xenvif and the physical igb NIC.

Here's another screenshot of a crash from this night: 

Slightly different information, but still somehow network involved as you can see in the first line (net_rx_action).

So the big question is: is this a bug Xen or with my setup? I'm using xl toolstack, the xl.conf is basically the default, I think: 

## Global XL config file ##

# automatically balloon down dom0 when xen doesn't have enough free
# memory to create a domain

# full path of the lockfile used by xl during domain creation

# default vif script

With this the default network scripts of the distribution (i.e. Debian stable) should be used. The network setup consists of two brdiges: 

auto xenbr0
iface xenbr0 inet static
        bridge_ports eth0
        pre-up brctl addbr xenbr0

auto xenbr1
iface xenbr1 inet static
        pre-up brctl addbr xenbr1

There are some more lines to that config like setting up some iptables rules with up commands and such. But as you can see my eth0 NIC is part of the "main" xen bridge with all the IP addresses that are reachable from the outside. The second bridge is used for internal networking like database connections and such.

I would rather like to use a netconsole to capture the full debug output in case of a new crash, but unfortunately this only works until the bridge is brought up and in place: 

[    0.000000] Command line: placeholder root=UUID=c3....22 ro debug ignore_loglevel loglevel=7 netconsole=port@,514@5.45.x.y/e0:ac:f1:4c:y:x
[   32.565624] netpoll: netconsole: local port $port
[   32.565683] netpoll: netconsole: local IPv4 address
[   32.565742] netpoll: netconsole: interface 'eth0'
[   32.565799] netpoll: netconsole: remote port 514
[   32.565855] netpoll: netconsole: remote IPv4 address 5.45.x.y
[   32.565914] netpoll: netconsole: remote ethernet address e0:ac:f1:4c:y:x
[   32.565982] netpoll: netconsole: device eth0 not up yet, forcing it
[   36.126294] netconsole: network logging started
[   49.802600] netconsole: network logging stopped on interface eth0 as it is joining a master device

So, the first question is: how to use netconsole with an interface that is used on a bridge?

The second question is: is the setup with two bridges with Xen ok? I've been using this setup for years now and it worked fairly well on the Hetzner server as well, although I used there xm toolstack with a mix of bridge and routed setup, because Hetzner didn't like to see the MAC addresses of the other VMs on the switch and shut the port down if that happens.


Letsencrypt - when your blog entries don't show up on Planet Debian

Recently there is much talk on Planet Debian about LetsEncrypt certs. This is great, because using HTTPS everywhere improves security and gives the NSA some more work to decrypt the traffic.

However, when you enabled your blog with a LetsEncrypt cert, you might run into the same problem as I: your new article won't show up on Planet Debian after changing your feed URI to HTTPS. The reason seems to be quite simple: planet-venus, which is the software behind Planet Debian seems to have problems with SNI enabled websites.

When following the steps outlined in the Debian Wiki, you can check this by yourself: 

INFO:planet.runner:Fetching via 5
ERROR:planet.runner:HttpLib2Error: Server presented certificate that does not match host {'subjectAltName': (('DNS', ''), ('DNS', '')), 'notBefore': u'Jan 26 18:05:00 2016 GMT', 'caIssuers': (u'',), 'OCSP': (u'',), 'serialNumber': u'01839A051BF9D2873C0A3BAA9FD0227C54D1', 'notAfter': 'Apr 25 18:05:00 2016 GMT', 'version': 3L, 'subject': ((('commonName', u''),),), 'issuer': ((('countryName', u'US'),), (('organizationName', u"Let's Encrypt"),), (('commonName', u"Let's Encrypt Authority X1"),))} via 5

I've filed bug #813313 for this. So, this might explain why your blog post doesn't appear on Planet Debian. Currently there seem 18 sites to be affected by this cert mismatch.


rpcbind listening on all interfaces

Currently I'm testing GlusterFS as a replicating network filesystem. GlusterFS depends on rpcbind package. No problem with that, but I usually want to have the services that run on my machines to only listen on those addresses/interfaces that are needed to fulfill the task. This is especially important, because rpcbind can be abused by remote attackers for rpc amplification attacks (dDoS). So, the rpcbind man page states: 

-h : Specify specific IP addresses to bind to for UDP requests. This option may be specified multiple times and is typically necessary when running on a multi-homed host. If no -h option is specified, rpcbind will bind to INADDR_ANY, which could lead to problems on a multi-homed host due to rpcbind returning a UDP packet from a different IP address than it was sent to. Note that when specifying IP addresses with -h, rpcbind will automatically add and if IPv6 is enabled, ::1 to the list.

Ok, although there is neither a /etc/default/rpcbind.conf nor a /etc/rpcbind.conf nor a sample-rpcbind.conf under /usr/share/doc/rpcbind, some quick websearch revealed a sample config file. I'm using this one: 

# /etc/init.d/rpcbind

# Cause rpcbind to do a "warm start" utilizing a state file (default)
# OPTIONS="-w "

# Uncomment the following line to restrict rpcbind to localhost only for UDP requests
# -h ::1"

# Uncomment the following line to enable libwrap TCP-Wrapper connection logging

As you can see, I want to bind to After a /etc/init.d/rpcbind restart verifying that everything works as desired with netstat is showing...

tcp 0 0* LISTEN 0 2084266 30777/rpcbind
tcp6 0 0 :::111 :::* LISTEN 0 2084272 30777/rpcbind
udp 0 0* 0 2084265 30777/rpcbind
udp 0 0* 0 2084264 30777/rpcbind
udp 0 0* 0 2084260 30777/rpcbind
udp6 0 0 :::848 :::* 0 2084271 30777/rpcbind
udp6 0 0 ::1:111 :::* 0 2084267 30777/rpcbind

Whoooops! Although I've specified that rpcbind should only listen to (and localhost as described by the man page) rpcbind is still listening on all addresses. Quick check if the process is using the correct options: 

root     30777  0.0  0.0  37228  2360 ?        Ss   16:11   0:00 /sbin/rpcbind -h -l

Hmmm, yes, -h is specified. Ok, something is going wrong here...

According to an entry in Ubuntus Launchpad I'm not the only one that experienced this problem. However this Launchpad entry mentioned that upstream seems to have a fix in version 0.2.3, but I experienced the same behaviour in stable as well as in unstable, where the package version is 0.2.3-0.2. Apparently the problem still exists in Debian unstable.

I'm somewhat undecided whether to file a normal bug against rpcbind or if I should label it as a security bug, because it opens a service to the public that can be abused for amplification attacks, although you might have configured rpcbind to just listen on internal addresses.


Problems with DaviCal after Wheezy Upgrade

It's been a while since Wheezy was released, but the problems with DaviCal started with that upgrade. I don't know whether this is a DaviCal bug or not, maybe partly. This is just a informational note, before I'll file a bug report (or not).

First problem was that I couldn't add any contacts anymore (CardDAV) from OS X. A friend of mine has the same issue as he's using my server for that. He mailed that he's getting the following error of the Contacts app under OS X: 

[NSInvalidArgumentException] -[CoreDAVNullParser rootElement]:
unrecognized selector sent to instance 0x7f91bad6b1d0

When I looked at the webfrontend I discovered that the database of DaviCal didn't get updated. I don't know whether this is a general problem or it just happened to me. Anyway. Executing the proper database upgrade script delivered by DaviCal was no problem at all. Adding new contacts worked again.

But then I discovered somewhen later that my calendars on the iPhone didn't update anymore. That was a bigger problem to solve, because it appears that it's an IOS 6.x problem and not a DaviCal issue. Finally I found on Google this mail on the DaviCal mailing list: 

As I started using DAViCal, I created calenders with "named" paths, not these
long names, like you stated in your message. With iOS 6 this was not a good

I created every calender new with the iOS 6 device and copied all
calender-items to the new calendars. Was a work auf 15 minutes for me. I
described it in a blog-post (german):

Not a real solution for the problem, but a workaround.

So, as linked page is in German, I'll rephrase the "solution" here: 

The problem seems to be that the old principal path names like user/calendar or user/home do not work anymore under iOS 6. Instead you'll need to create a new calender from your iPhone. So, just configure your calendar account as usually. You'll end with an empyt calendar. Now create a new event in your calendar on the iPhone. This new calendar should show up under OS X (or other clients). There you can export your existing appointments to an *.ics file which you can import in DaviCal webfrontend to the new prinicipal collection. Your dates should now show up on your iPhone again. But you'll end up with duplicate entries in iCal app under OS X. You need to define your new principal as default calendar to be able to delete your old default calendar, if that's not possible. If everything went well, you can share your dates between OS X and your iPhone via DaviCal again. At least this worked for me. :-)

It's late over here, so I'll postpone writing the bugreport for now...


Xen problems with VMs on 2.6.32-5-xen-amd64?

On Saturday there were some updates for Squeeze/Stable, for example a new/updated version of Xen hypervisor and kernels were downloaded and installed: 

gate:~# dir /var/cache/apt/archives/
base-files_6.0squeeze7_amd64.deb                 libxenstore3.0_4.0.1-5.6_amd64.deb
bind9-host_1%3a9.7.3.dfsg-1~squeeze9_amd64.deb   linux-base_2.6.32-48_all.deb
dbus_1.2.24-4+squeeze2_amd64.deb                 linux-image-2.6.32-5-amd64_2.6.32-48_amd64.deb
dbus-x11_1.2.24-4+squeeze2_amd64.deb             linux-image-2.6.32-5-xen-amd64_2.6.32-48_amd64.deb
firmware-linux-free_2.6.32-48_all.deb            lock
gzip_1.3.12-9+squeeze1_amd64.deb                 openssh-client_1%3a5.5p1-6+squeeze3_amd64.deb
host_1%3a9.7.3.dfsg-1~squeeze9_all.deb           openssh-server_1%3a5.5p1-6+squeeze3_amd64.deb
libbind9-60_1%3a9.7.3.dfsg-1~squeeze9_amd64.deb  openssl_0.9.8o-4squeeze14_amd64.deb
libcups2_1.4.4-7+squeeze3_amd64.deb              partial
libdbus-1-3_1.2.24-4+squeeze2_amd64.deb          perl_5.10.1-17squeeze5_amd64.deb
libdbus-glib-1-2_0.88-2.1+squeeze1_amd64.deb     perl-base_5.10.1-17squeeze5_amd64.deb
libdns69_1%3a9.7.3.dfsg-1~squeeze9_amd64.deb     perl-modules_5.10.1-17squeeze5_all.deb
libisc62_1%3a9.7.3.dfsg-1~squeeze9_amd64.deb     ssh_1%3a5.5p1-6+squeeze3_all.deb
libisccc60_1%3a9.7.3.dfsg-1~squeeze9_amd64.deb   tzdata_2012g-0squeeze1_all.deb
libisccfg62_1%3a9.7.3.dfsg-1~squeeze9_amd64.deb  xen-hypervisor-4.0-amd64_4.0.1-5.6_amd64.deb
libldap-2.4-2_2.4.23-7.3_amd64.deb               xen-linux-system-2.6.32-5-xen-amd64_2.6.32-48_amd64.deb
liblwres60_1%3a9.7.3.dfsg-1~squeeze9_amd64.deb   xenstore-utils_4.0.1-5.6_amd64.deb
libperl5.10_5.10.1-17squeeze5_amd64.deb          xen-utils-4.0_4.0.1-5.6_amd64.deb

Unfortunately this update appears to be problematic on my Xen hosting server. This night it happened the second time that some of the virtual network interfaces disappeared or turned out to be non-working. For example I have two VMs: one running the webserver and one running the databases. Between these two VMs there's a bridge on the dom0 and both VMs have a VIF to that (internal) bridge. What happens is that this bridge becomes inaccessible from within the webserver VM.

Sadly there's not much to see in the log files. I just spotted this on dom0: 

Feb 26 01:01:29 gate kernel: [12697.907512] vif3.1: Frag is bigger than frame.
Feb 26 01:01:29 gate kernel: [12697.907550] vif3.1: fatal error; disabling device
Feb 26 01:01:29 gate kernel: [12697.919921] xenbr1: port 3(vif3.1) entering disabled state
Feb 26 01:22:00 gate kernel: [13928.644888] vif2.1: Frag is bigger than frame.
Feb 26 01:22:00 gate kernel: [13928.644920] vif2.1: fatal error; disabling device
Feb 26 01:22:00 gate kernel: [13928.663571] xenbr1: port 2(vif2.1) entering disabled state
Feb 26 01:40:44 gate kernel: [15052.629280] vif7.1: Frag is bigger than frame.
Feb 26 01:40:44 gate kernel: [15052.629314] vif7.1: fatal error; disabling device
Feb 26 01:40:44 gate kernel: [15052.641725] xenbr1: port 6(vif7.1) entering disabled state

This corresponds to the number of VMs having lost their internal connection to the bridge. On the webserver VM I see this output: 

Feb 26 01:59:01 vserv1 kernel: [16113.539767] IPv6: sending pkt_too_big to self
Feb 26 01:59:01 vserv1 kernel: [16113.539794] IPv6: sending pkt_too_big to self
Feb 26 02:30:54 vserv1 kernel: [18026.407517] IPv6: sending pkt_too_big to self
Feb 26 02:30:54 vserv1 kernel: [18026.407546] IPv6: sending pkt_too_big to self
Feb 26 02:30:54 vserv1 kernel: [18026.434761] IPv6: sending pkt_too_big to self
Feb 26 02:30:54 vserv1 kernel: [18026.434787] IPv6: sending pkt_too_big to self
Feb 26 03:39:16 vserv1 kernel: [22128.768214] IPv6: sending pkt_too_big to self
Feb 26 03:39:16 vserv1 kernel: [22128.768240] IPv6: sending pkt_too_big to self
Feb 26 04:39:51 vserv1 kernel: [25764.250170] IPv6: sending pkt_too_big to self
Feb 26 04:39:51 vserv1 kernel: [25764.250196] IPv6: sending pkt_too_big to self

Rebooting the VMs will result in a non-working VM as it will get paused on creation and Xen scripts complain about not working hotplug scripts and Xen logs shows this: 

[2013-02-25 13:06:34 5470] DEBUG (XendDomainInfo:101)
XendDomainInfo.create(['vm', ['name', 'vserv1'], ['memory', '2048'],
['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash',
'restart'], ['on_xend_start', 'ignore'], ['on_xend_stop', 'ignore'],
['vcpus', '2'], ['oos', 1], ['bootloader', '/usr/lib/xen-4.0/bin/pygrub'],
['bootloader_args', ''], ['image', ['linux', ['root', '/dev/xvdb '],
['videoram', 4], ['tsc_mode', 0], ['nomigrate', 0]]], ['s3_integrity', 1],
['device', ['vbd', ['uname', 'phy:/dev/lv/vserv1-boot'], ['dev', 'xvda'],
['mode', 'w']]], ['device', ['vbd', ['uname', 'phy:/dev/lv/vserv1-disk'],
['dev', 'xvdb'], ['mode', 'w']]], ['device', ['vbd', ['uname',
'phy:/dev/lv/vserv1-swap'], ['dev', 'xvdc'], ['mode', 'w']]], ['device',
['vbd', ['uname', 'phy:/dev/lv/vserv1mirror'], ['dev', 'xvdd'], ['mode',
[2013-02-25 13:06:34 5470] DEBUG (XendDomainInfo:2508)
[2013-02-25 13:06:34 5470] DEBUG (balloon:220) Balloon: 2100000 KiB free;
need 16384; done.
[2013-02-25 13:06:34 5470] DEBUG (XendDomain:464) Adding Domain: 39
[2013-02-25 13:06:34 5470] DEBUG (XendDomainInfo:2818)
XendDomainInfo.initDomain: 39 256
[2013-02-25 13:06:34 5781] DEBUG (XendBootloader:113) Launching bootloader
as ['/usr/lib/xen-4.0/bin/pygrub', '--args=root=/dev/xvdb  ',
'--output=/var/run/xend/boot/xenbl.6040', '/dev/lv/vserv1-boot'].
[2013-02-25 13:06:39 5470] DEBUG (XendDomainInfo:2845)
_initDomain:shadow_memory=0x0, memory_static_max=0x80000000,
[2013-02-25 13:06:39 5470] INFO (image:182) buildDomain os=linux dom=39
[2013-02-25 13:06:39 5470] DEBUG (image:721) domid	    = 39
[2013-02-25 13:06:39 5470] DEBUG (image:722) memsize	    = 2048
[2013-02-25 13:06:39 5470] DEBUG (image:723) image	    =
[2013-02-25 13:06:39 5470] DEBUG (image:724) store_evtchn   = 1
[2013-02-25 13:06:39 5470] DEBUG (image:725) console_evtchn = 2
[2013-02-25 13:06:39 5470] DEBUG (image:726) cmdline	    =
root=UUID=ed71a39f-fd2e-4035-8557-493686baa151 ro root=/dev/xvdb
[2013-02-25 13:06:39 5470] DEBUG (image:727) ramdisk	    =
[2013-02-25 13:06:39 5470] DEBUG (image:728) vcpus	    = 2
[2013-02-25 13:06:39 5470] DEBUG (image:729) features	    =
[2013-02-25 13:06:39 5470] DEBUG (image:730) flags	    = 0
[2013-02-25 13:06:39 5470] DEBUG (image:731) superpages     = 0
[2013-02-25 13:06:40 5470] INFO (XendDomainInfo:2367) createDevice: vbd :
{'uuid': '04d99772-cf27-aecf-2d1b-c73eaf657410', 'bootable': 1, 'driver':
'paravirtualised', 'dev': 'xvda', 'uname': 'phy:/dev/lv/vserv1-boot',
'mode': 'w'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing
{'virtual-device': '51712', 'device-type': 'disk', 'protocol':
'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend':
'/local/domain/0/backend/vbd/39/51712'} to
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing
{'domain': 'vserv1', 'frontend': '/local/domain/39/device/vbd/51712',
'uuid': '04d99772-cf27-aecf-2d1b-c73eaf657410', 'bootable': '1', 'dev':
'xvda', 'state': '1', 'params': '/dev/lv/vserv1-boot', 'mode': 'w',
'online': '1', 'frontend-id': '39', 'type': 'phy'} to
[2013-02-25 13:06:40 5470] INFO (XendDomainInfo:2367) createDevice: vbd :
{'uuid': 'e46cb89f-3e54-41d2-53bd-759ed6c690d2', 'bootable': 0, 'driver':
'paravirtualised', 'dev': 'xvdb', 'uname': 'phy:/dev/lv/vserv1-disk',
'mode': 'w'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing
{'virtual-device': '51728', 'device-type': 'disk', 'protocol':
'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend':
'/local/domain/0/backend/vbd/39/51728'} to
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing
{'domain': 'vserv1', 'frontend': '/local/domain/39/device/vbd/51728',
'uuid': 'e46cb89f-3e54-41d2-53bd-759ed6c690d2', 'bootable': '0', 'dev':
'xvdb', 'state': '1', 'params': '/dev/lv/vserv1-disk', 'mode': 'w',
'online': '1', 'frontend-id': '39', 'type': 'phy'} to
[2013-02-25 13:06:40 5470] INFO (XendDomainInfo:2367) createDevice: vbd :
{'uuid': 'e2d61860-7448-1843-3935-6b63c5d2878e', 'bootable': 0, 'driver':
'paravirtualised', 'dev': 'xvdc', 'uname': 'phy:/dev/lv/vserv1-swap',
'mode': 'w'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing
{'virtual-device': '51744', 'device-type': 'disk', 'protocol':
'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend':
'/local/domain/0/backend/vbd/39/51744'} to
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing
{'domain': 'vserv1', 'frontend': '/local/domain/39/device/vbd/51744',
'uuid': 'e2d61860-7448-1843-3935-6b63c5d2878e', 'bootable': '0', 'dev':
'xvdc', 'state': '1', 'params': '/dev/lv/vserv1-swap', 'mode': 'w',
'online': '1', 'frontend-id': '39', 'type': 'phy'} to
[2013-02-25 13:06:40 5470] INFO (XendDomainInfo:2367) createDevice: vbd :
{'uuid': 'd314a46e-1ce9-0e8d-b009-3f08e29735f5', 'bootable': 0, 'driver':
'paravirtualised', 'dev': 'xvdd', 'uname': 'phy:/dev/lv/vserv1mirror',
'mode': 'w'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing
{'virtual-device': '51760', 'device-type': 'disk', 'protocol':
'x86_64-abi', 'backend-id': '0', 'state': '1', 'backend':
'/local/domain/0/backend/vbd/39/51760'} to
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing
{'domain': 'vserv1', 'frontend': '/local/domain/39/device/vbd/51760',
'uuid': 'd314a46e-1ce9-0e8d-b009-3f08e29735f5', 'bootable': '0', 'dev':
'xvdd', 'state': '1', 'params': '/dev/lv/vserv1mirror', 'mode': 'w',
'online': '1', 'frontend-id': '39', 'type': 'phy'} to
[2013-02-25 13:06:40 5470] DEBUG (XendDomainInfo:3400) Storing VM details:
{'on_xend_stop': 'ignore', 'shadow_memory': '0', 'uuid':
'04541225-6d3c-3cae-a4c4-0b6d4ccfac7a', 'on_reboot': 'restart',
'start_time': '1361794000.37', 'on_poweroff': 'destroy', 'bootloader_args':
'', 'on_xend_start': 'ignore', 'on_crash': 'restart', 'xend/restart_count':
'0', 'vcpus': '2', 'vcpu_avail': '3', 'bootloader':
'/usr/lib/xen-4.0/bin/pygrub', 'image': "(linux (kernel ) (args
'root=/dev/xvdb  ') (superpages 0) (tsc_mode 0) (videoram 4) (pci ())
(nomigrate 0) (notes (HV_START_LOW 18446603336221196288) (FEATURES
'!writable_page_tables|pae_pgdir_above_4gb') (VIRT_BASE
18446744071562067968) (GUEST_VERSION 2.6) (PADDR_OFFSET 0) (GUEST_OS linux)
(HYPERCALL_PAGE 18446744071578882048) (LOADER generic) (SUSPEND_CANCEL 1)
(PAE_MODE yes) (ENTRY 18446744071584289280) (XEN_VERSION xen-3.0)))",
'name': 'vserv1'}
[2013-02-25 13:06:40 5470] DEBUG (XendDomainInfo:1804) Storing domain
details: {'console/ring-ref': '2143834', 'image/entry':
'18446744071584289280', 'console/port': '2', 'store/ring-ref': '2143835',
'image/loader': 'generic', 'vm':
'control/platform-feature-multiprocessor-suspend': '1',
'image/hv-start-low': '18446603336221196288', 'image/guest-os': 'linux',
'cpu/1/availability': 'online', 'image/virt-base': '18446744071562067968',
'memory/target': '2097152', 'image/guest-version': '2.6', 'image/pae-mode':
'yes', 'description': '', 'console/limit': '1048576', 'image/paddr-offset':
'0', 'image/hypercall-page': '18446744071578882048',
'image/suspend-cancel': '1', 'cpu/0/availability': 'online',
'image/features/pae-pgdir-above-4gb': '1',
'image/features/writable-page-tables': '0', 'console/type': 'xenconsoled',
'name': 'vserv1', 'domid': '39', 'image/xen-version': 'xen-3.0',
'store/port': '1'}
[2013-02-25 13:06:40 5470] DEBUG (DevController:95) DevController: writing
{'protocol': 'x86_64-abi', 'state': '1', 'backend-id': '0', 'backend':
'/local/domain/0/backend/console/39/0'} to
[2013-02-25 13:06:40 5470] DEBUG (DevController:97) DevController: writing
{'domain': 'vserv1', 'frontend': '/local/domain/39/device/console/0',
'uuid': 'c8819aed-c78f-02b8-0ef7-1600abd15add', 'frontend-id': '39',
'state': '1', 'location': '2', 'online': '1', 'protocol': 'vt100'} to
[2013-02-25 13:06:40 5470] DEBUG (XendDomainInfo:1891)
[2013-02-25 13:06:40 5470] DEBUG (DevController:139) Waiting for devices
[2013-02-25 13:06:40 5470] DEBUG (DevController:139) Waiting for devices
[2013-02-25 13:06:40 5470] DEBUG (DevController:139) Waiting for devices
[2013-02-25 13:06:40 5470] DEBUG (DevController:139) Waiting for devices
[2013-02-25 13:06:40 5470] DEBUG (DevController:144) Waiting for 51712.
[2013-02-25 13:06:40 5470] DEBUG (DevController:628) hotplugStatusCallback

From my point of view, either Xen hypervisor or the kernel seems to be broken, but it's hard to tell for me. Maybe it would be easier to update the system from Squeeze to Wheezy and get rid off this problem that way? On the other hand this would solve the problem.

Are there any other people experiencing problems with that latest update of Xen and kernel?

UPDATE: Bug #701744 filed.

UPDATE 2: Downgrading kernel and hypervisor on dom0 to the following packages from seems to have solved that problem.

  • xen-hypervisor-4.0-amd64_4.0.1-5.4_amd64.deb
  • linux-image-2.6.32-5-xen-amd64_2.6.32-46_amd64.deb

Note that I haven't tested yet with the newest kernel update from DSA-2632-1.

UPDATE 3: After running with older versions of hypervisor and kernel I now have upgraded the hypervisor to xen-hypervisor-4.0-amd64_4.0.1-5.6_amd64.deb and rebooted. Let's see whether it's running stable or not. If not it's the hypervisor, if yes, it's the kernel.

UPDATE 4: apparently it's the kernel which is buggy. And the kernel from DSA-2632-1 is affected as well. So, the current workaround is to downgrad to linux-image-2.6.32-5-xen-amd64_2.6.32-46_amd64.deb.


100% CPU load due to Leap Second

This morning Gregor Samsa woke up... oh, pardon! This morning I woke up and found myself puzzled, because my home server was eating up all of my 4 cores CPU cycles. Especially mysqld was high on CPU load. 100% CPU load for the mysql-server instance and 100% CPU load for akonadiservers own mysqld instance. Restarting KDE and mysql-server didn't help on my Debian unstable machine. Next step was upgrading the system. Sometimes this helps indeed, but not today.

Looking at for mysql-server didn't reveal any help as well. So my next logical step was to ask on #debian-devel in IRC. And my question was very quick answered: 

11:28 < ij> since tonight I've got two mysqld processes running at 100% CPU, one spawned by akonadi and
            the other is the mysqld from mysql-server (unstable that is). is this an already known issue?
            haven't found anything on b.d.o for mysql-server, though
11:29 < mrvn> ij: topic
11:29 < mrvn> you need to set the time
11:30 < ij> waaaaah!
11:30 < mrvn> ij: indeed.

The topic was at that time: 

 100% CPU? Reset leap second

So, it was caused by the leap second. Although you might suspect mysql doing some nasty things (which, IMHO, is always a good guess ;)), the issue is this time within the Linux kernel itself, as a commit on clearifies.

To fix this issue you need to set the time manually using the following command or just reboot: 

date -s "`date`"

So far I found these applications being hit by this kernel bug: 

  • mysql-server
  • akonadi (as it uses own mysql instances)
  • Firefox
  • Openfire Jabber server (because it's using Java, which seems to trigger the problem as mysql does)
  • Virtualbox' VBoxSVC process
  • puppetmaster from package puppet, reported by Michael
  • mythfrontend, reported by pos on #debian-devel
  • Jetty, Hudson, Puppet agent and master, reported by Christian
  • milter-greylist, reported by E. Recio
  • dovecot, reported by Diogo Resende
  • Google Chrome, reported by Erik B. Andersen
  • if you find more apps, please comment and I'll include them here...

So, hope this helps and many thanks to mrvn and infinity on #debian-devel for the help!


Grub2 fails booting from second VG

I just got a new harddisk, because my old RAID of 3x 1 TB disks are getting filled and I'm running out of space. In the end I'll have replaced my old disks by 3x 4 TB disks. Currently the setup is as this: 

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x38d543a1

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *          63      514079      257008+  83  Linux
/dev/sda2          514080     4707044     2096482+  82  Linux swap / Solaris
/dev/sda3         4707045    13092974     4192965   83  Linux
/dev/sda4        13092975  1953520064   970213545    5  Extended
/dev/sda5        13093038    15197489     1052226   83  Linux
/dev/sda6        15197553  1953520064   969161256   83  Linux

Same to the other 2 disks, of course. So, I have (for x=[abc]): 

  • sdx1: RAID1 for /boot
  • sdx2: RAID1 for swap
  • sdx3: RAID1 for a rescue system
  • sdx5: RAID1 for /
  • sdx6: RAID5 for LVM for /usr, /var, /home & everything else

For the new disks I want a more simple layout like this maybe:

  • sdx1: RAID1 for /boot
  • sdx2: RAID1 for swap
  • sdx3: RAID5 for LVM for /, /usr, /var, /home & everything else

If it works, I would be fine without any special /boot partition anyway, but there's another problem: on the old disk set I have LVM and a Volume Group named "vg" and on the new disk I have a Volume Group named "lv": 

PV         VG   Fmt  Attr PSize PFree 
/dev/md4   vg   lvm2 a--  1.81t 129.53g
/dev/md6   lv   lvm2 a--  3.64t   3.58t

I made some Logical Volumes on "lv", copied over my system, ran update-grub and grub-install /dev/sdX and rebooted. In the grub menu I can select my desired new root partition, but when I try to boot from it, grub is not able to find the root device. Config lines within grub looks like this (from memory): 

search --no-floppy  --fs-uuid --set-root=0d54bd89-6628-499f-b835-e146a6fd895f

The UUID within grub matches the same UUID of /dev/lv/root, but grub states that I can't find the disk. The needed modules like for LVM and RAID are loaded, so I assume a problem with multiple Volume Groups, because a simple "ls" from within grub just shows Logical Volumes for Volume Group "vg", but not a single one of the second Volume Group "vg".

Is there a limitation in Grub2 for the number of supported Volume Groups? Does it only find the first created Volume Group? Is there a way to work around this limitation or is it a bug?


  • The mainboard is an Asus P5P43TD PRO with latest BIOS firmware (version 0710) from Asus website.
  • The 4 TB disk is a Hitachi HDS724040ALE640 and the disk is recognized in BIOS as 4 TB
  • The disk is labeled with a GPT partition table.
  • Filesystem on /boot is ext3 and XFS on all other partitons/LVs.

Roundcube doesn't work anymore because of suhosin

Well, yesterday out of nothing my webmailer roundcube started to refuse to work. At least as I remember it. For some reasons reloading the Inbox just showed the "Loading..." message on the screen, but there was no list of mails anymore. Funny enough other folders do actually work as before. But anyway, doing an update did not help and improve anything. (I really don't know whether I updated before or after because of the first occurence of this issue.)

There's an entry in syslog when loading the Inbox folder: 

Oct 26 07:24:59 muaddib suhosin[32432]: ALERT - Include filename (' ?.php') is an URL that is not allowed (attacker '', file '/usr/share/roundcube/program/include/iniset.php', line 110

This lead to bug #1488086 in the Roundcube issue tracker which states: 

This messages made me wonder why suhosin thinks there's an include going on. Line 111 of iniset.php shows:


It seems like roundcube wants to include what is displayed in the subject, which happens to be a url - and suhosin legitimately blocks this attempt.

In short, I can send an email to a user on a suhosin protected mail server and make his inbox unavailable. Needless to say, the user cannot delete this email himself via RoundCube. In my case, I had to delete the email file on the server to make roundcube show the inbox again.

In Debian there's bug #619411 that is related to PATH setting in iniset.php, but I'm not sure if this is really related to #1488086 in the Roundcube issue tracker and my problem? However, disabling suhosin doesn't seem the right way to "solve" this issue and the trac issue tracker suggests a security related problem.

Anyway, I filed this as bug #646675 in Debian, waiting for the bug number. But when someone else knows some quick fixes or something I can try, please speak up! :-) 

UPDATE: It seems as if some mail triggered this issue like reported in the Roundcube ticket. After filtering my mails with Iceweasel, I'm being able to read my Inbox now again.


Probleme mit Views und iCal Feed in Drupal6

Normalerweise bin ich ja super zufrieden mit meinen Drupal-installationen, aber ab und zu treibt mich auch mal ein Zusatzmodul in den Wahnsinn. Seit kurzem ist es wieder so weit und die Protagonisten dieses Mal sind Views und dessen iCal Feed. Das Problem ist: eigentlich ist Views ordnungsgemäß konfiguriert, nur - es funktioniert nicht. Das heißt, daß im iCal Feed keine Daten enthalten sind. Erst einmal die Konfiguration des Views: 

Das ganze resultiert dann in einem SQL-Statement, das wie folgt ausschaut, aber halt kein Ergebnis zur Folge hat, außer dem Header des iCal-Feeds: 

Das Spannende oder auch Unverständliche ist aber dann das Resultat, wenn man das SQL-Statement direkt in die Datenbank absetzt:

drupal6_sideburns=# SELECT node.title AS node_title,
drupal6_sideburns-# node.type AS node_type
drupal6_sideburns-#  FROM node node
drupal6_sideburns-#  LEFT JOIN content_type_event node_data_field_datetime ON node.vid = node_data_field_datetime.vid
drupal6_sideburns-#  WHERE (node.status <> 0)
drupal6_sideburns-#     AND ((TO_CHAR(node_data_field_datetime.field_datetime_value::timestamp with time zone AT TIME ZONE 'Europe/Berlin', 'YYYY-MM-DD') >= '2011-01-01') OR ((TO_CHAR(node_data_field_datetime.field_datetime_value::timestamp with time zone AT TIME ZONE 'Europe/Berlin', 'YYYY') <= '2011' AND TO_CHAR(node_data_field_datetime.field_datetime_value::timestamp with time zone AT TIME ZONE 'Europe/Berlin', 'YYYY') >= '2011')));
             node_title              | node_type
 Rockaholics Birthday Bash           | event
 Tribute to Johnny Cash              | event
 Fifties Rock'n'Roll und Hula Party  | event
 A Tribute to Johnny Cash            | event
 Live Rock'n'Roll Musik!             | event
 Herbstparty im Herberts             | event
 Oldies, Schlager, Evergreens & Hits | event
 Hansesailparty                      | event
 Live in der Moccabar im KTC Rostock | event
 Live im Herberts in Warnemünde      | event
 Alabama Neujahrs Party              | event
(11 rows)

Während also Views selber keine Ergebnisse findet bzw. darstellt, gibt es die gewünschten Ergebnisse sehr wohl - allerdings nur in der Datenbank. Es ist also die Frage, warum Views diese Ergebnisse nicht darstellen kann? 

Zur Info sei noch erklärt, daß im View field_datatime als Argument die Granularität "Jahr" (YYYY) hat, während es als Filter die Granularität "Tag" (YYYY-MM-DD) hat. Somit kann man im SQL nachvollziehen, welcher Teil des Statements nun vom Argument und welcher vom Filter kommt.



Theme by Danetsoft and Danang Probo Sayekti inspired by Maksimer