
Upgrade to Debian Stretch - GlusterFS fails to mount

Before I upgraded from Jessie to Stretch, everything worked like a charm with GlusterFS on Debian. But after I upgraded the first VM to Debian Stretch, I discovered that glusterfs-client was unable to mount the storage on the Jessie servers. The GlusterFS log showed this:

[2017-06-24 12:51:53.240389] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.8 (args: /usr/sbin/glusterfs --read-only --fuse-mountopts=nodev,noexec --volfile-server=192.168.254.254 --volfile-id=/le --fuse-mountopts=nodev,noexec /etc/letsencrypt.sh/certs)
[2017-06-24 12:51:54.534826] E [mount.c:318:fuse_mount_sys] 0-glusterfs-fuse: ret = -1
[2017-06-24 12:51:54.534896] I [mount.c:365:gf_fuse_mount] 0-glusterfs-fuse: direct mount failed (Invalid argument) errno 22, retry to mount via fusermount
[2017-06-24 12:51:56.668254] I [MSGID: 101190] [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-06-24 12:51:56.671649] E [glusterfsd-mgmt.c:1590:mgmt_getspec_cbk] 0-glusterfs: failed to get the 'volume file' from server
[2017-06-24 12:51:56.671669] E [glusterfsd-mgmt.c:1690:mgmt_getspec_cbk] 0-mgmt: failed to fetch volume file (key:/le)
[2017-06-24 12:51:57.014502] W [glusterfsd.c:1327:cleanup_and_exit] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_handle_reply+0x90) [0x7fbea36c4a20] -->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x494) [0x55fbbaed06f4] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54) [0x55fbbaeca444] ) 0-: received signum (0), shutting down
[2017-06-24 12:51:57.014564] I [fuse-bridge.c:5794:fini] 0-fuse: Unmounting '/etc/letsencrypt.sh/certs'.
[2017-06-24 16:44:45.501056] I [MSGID: 100030] [glusterfsd.c:2454:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.8.8 (args: /usr/sbin/glusterfs --read-only --fuse-mountopts=nodev,noexec --volfile-server=192.168.254.254 --volfile-id=/le --fuse-mountopts=nodev,noexec /etc/letsencrypt.sh/certs)
[2017-06-24 16:44:45.504038] E [mount.c:318:fuse_mount_sys] 0-glusterfs-fuse: ret = -1
[2017-06-24 16:44:45.504084] I [mount.c:365:gf_fuse_mount] 0-glusterfs-fuse: direct mount failed (Invalid argument) errno 22, retry to mount via fusermount

After some searching on the Internet I found Debian bug #858495, but no solution for my problem. Some search results recommended setting "option rpc-auth-allow-insecure on", but this didn't help. In the end I joined #gluster on Freenode and got some hints there:

JoeJulian | ij__: debian breaks apart ipv4 and ipv6. You'll need to remove the ipv6 ::1 address from localhost in /etc/hosts or recombine your ip stack (it's a sysctl thing)
JoeJulian | It has to do with the decisions made by the debian distro designers. All debian versions should have that problem. (yes, server side).

Removing ::1 from /etc/hosts and from the lo interface did the trick, and I could mount GlusterFS storage from the Jessie servers in my Stretch VMs again. However, when I upgraded the GlusterFS storage servers to Stretch as well, this workaround stopped working. Some more searching on the Internet led me to this posting on the gluster-users mailing list:

We had seen a similar issue and Rajesh has provided a detailed explanation on why at [1]. I'd suggest you to not to change glusterd.vol but execute "gluster volume set <volname> transport.address-family inet" to allow Gluster to listen on IPv4 by default.

Setting this option instantly fixed my issues with mounting glusterfs storages.

So, whatever is wrong with GlusterFS in Debian, it seems to have something to do with IPv4 vs. IPv6. When IPv6 is disabled in GlusterFS, it works. I added this information to #858495.
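For reference, the fix from the mailing list boils down to a single command per volume; a minimal sketch, where "myvol" is a placeholder for your actual volume name:

```shell
# Make Gluster listen on IPv4 by default (run on the server side,
# once per volume). "myvol" is a placeholder volume name.
gluster volume set myvol transport.address-family inet
```

With this set, the earlier /etc/hosts workaround on the clients should no longer be necessary.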

Kategorie: 
 

Back to the roots: FidoNet - I'm back!

Last month I blogged about FidoNet. This month I can report that I'm back in FidoNet. While I was 2:2449/413 back then, my new node number is now 2:2452/413@fidonet. The old network 2:2449 is still listed in the FidoNet nodelist but no longer active; maybe I can revive that network at a later time. Who knows.

The other problem I complained about last month was missing software in Debian. There are binkd and ifcico as mailer software and crashmail and ifmail as tossers, but no reader software. So how did I get started again? First, I got into the mood by watching all parts of the BBS Documentary.


It's a nice watch: even if you don't plan to start a BBS or join FidoNet like I did, you can see Tom Jennings and others talking about BBSes in general and FidoNet. It's a nice way-back machine, and it actually made me start my comeback to FidoNet. I tried to compile some projects from SourceForge like fidoip or GoldEdPlus, but all of them were in a state where they didn't compile under Debian without additional work - at least those that include debian/rules.

So I decided to reactivate my old FidoNet software on my Amiga. Instead of GMS_Mailer I found AmiBinkd on Aminet, which runs quite well. With that setup I was able to call other FidoNet nodes and do some file requests. That way I found out that 2:2452/250 is one of the still reachable FidoNet boxes in Germany, and soon I became 2:2452/413 - still running on my Amiga with Mailmanager as tosser and reader and AmiBinkd as mailer. Using FidoNet is quite different nowadays: you don't need to call out via phone line anymore, but use Internet connections instead. Although this is nice, much faster, comes with no additional costs, and lets you use "crash mail", it's not the same fun as dialing into a mailbox by modem and hearing the typical squeaking sound of a modem connecting. So I bought a Zyxel U-1496E modem on eBay for € 5.50 and connected it to my FritzBox 7490. This works quite well, and I could place calls via the modem using TrapDoor as mailer on my Amiga.

Anyway, using my Amiga was only a temporary solution to get me up & running again. The goal is to run a full-featured FidoNet node on Debian on my colocated server in the datacenter. In the meantime I was able to switch the DNS record from my Amiga to the server in the datacenter, which is now running binkd from Debian and the Husky suite as tosser.

Husky is a complete FidoNet suite, including a tosser, areafix, filefix, a TIC file processor, etc. However, there are no Debian packages available - at least none that are easy to find. Philipp Giebel pointed me in a FidoNet echoarea to his personal repository for Debian and Raspbian:

https://www.kuehlbox.wtf/index.php#repo

He was very helpful in getting me started on Linux with Husky and shared many of his config files with me. Big thanks for that! He also used our discussions to write a blog article about this. Although it's German only, you can find the necessary config files there:

https://www.stimpyrama.org/blog/17-computer/138-ftnsetup

It covers nearly all necessary aspects:

  • how to set up his repo in your apt sources
  • how to install the necessary packages
  • configuration of husky, binkd and goldedplus, with example configs
  • some tips & tricks, like keyboard shortcuts for goldedplus, etc.

So, this is really helpful for everyone who wants to join FidoNet as well.

You can use goldedplus as a reader for Fidonet, or when you just want to be a point and not a full node, you might want to try OpenXP on Linux. OpenXP includes everything you'll need for a point, like a mailer, reader and tosser. You can even use it as a mail reader via POP3/IMAP or to read Internet News (aka newsgroups).

It's still possible to run a FidoNet node on Amiga, on Linux and of course on other operating systems like Windows and even OS/2. And with HotdogEd there is even FidoNet software available for your Android smartphone!

But why Fidonet if you already have the Internet at your fingertips? Well, this is something you need to decide for yourself, but for me there are several reasons why I joined Fidonet after 17 years of inactivity again:

  • It's not the Internet! :-)  This means basically no spam mails. At least I didn't experience any spam so far.
  • It's a small and welcoming community.
  • There is not only FidoNet itself (with zones 1:* to 5:*), but other zones as well, for example AmigaNet with zone 39:* or fsxNet with zone 21:*. FTN technology makes it easy to set up your own network based on a certain topic.
  • It's a technology that enabled people to communicate worldwide with each other, long before the Internet was available for everyone! This is some kind of technical heritage I find worthwhile to preserve.
  • Although most of us can enjoy a free and open Internet, this is not true for everyone in the world. Nowadays some regimes decide to block and censor the Internet for their citizens. FidoNet or FTN technology can enable those citizens to still communicate freely and without censorship, even when Tor no longer works because the Internet as a whole has been taken down in a country. Often enough phone lines still work, and therefore you can use modems to connect to mailboxes and exchange mails and files. FTN is optimized for this kind of dial-up connection, and this is one of the main reasons why I want to offer connections to my FidoNet node not only via the Internet but also by modem.

So, be invited to join Fidonet as well!


Back to the roots: FidoNet

Back in the good old days there was no Facebook, Google+, Skype and no XMPP servers for people to communicate with each other. The first "social communities" were Bulletin Board Systems (BBSes) - if you want to see them as social communities. Often those BBSes offered not only communication possibilities to online users but also ways to communicate with others while being offline. From today's point of view being offline is a strange concept, but it was a common scenario 20-30 years ago, because being online meant dialing via a modem and a phone line into a BBS or - at a later time - an Internet provider. Those BBSes interconnected with each other, and networks grew that allowed exchanging messages between different BBSes - or mailboxes. One of those networks was FidoNet.

When I went "online" back then, I called into a BBS, a mailbox. I don't know why, but when I read messages from others, the mailbox crashed quite frequently. So the "sysop" of that mailbox offered to make me a FidoNet point - just to keep me from crashing his mailbox all the time. So there I was: a FidoNet point, reachable under the FidoNet address 2:2449/413.19. At some point I took over the mailbox from the old sysop, because he moved out of town. Then the Internet arose in the late 1990s, making all those BBSes, mailboxes, and networks such as FidoNet obsolete.

However, it was a whole lot of fun back then. So much fun that I plan to join FidoNet again. Yes, it's still there! Instead of using dial-up connections via modems, most nodes in FidoNet now offer connections via the Internet as well.

A FidoNet system (node) usually consists of a mailer that does the exchange with other systems, a tosser that "routes" the mail to the recipients, and a reader with which you can finally read and write messages to others. Back in the old days I ran my mailbox on my Amiga 3000 with a Zyxel U-1496E+ modem, later with an ISDN card called ISDN-Master. The software used was first TrapDoor as mailer and TrapToss as a tosser. Later replaced by GMS Mailer as a mailer and MailManager as a tosser and reader.

Unfortunately GMS Mailer is not able to handle connections via the Internet. For this you'll need something like binkd, which is available as a Debian package. So, a quick search for FidoNet packages on Debian reveals this:

# apt-cache search fidonet
crashmail - JAM and *.MSG capable Fidonet tosser
fortunes-es - Spanish fortune database
htag - A tagline/.signature adder for email, news and FidoNet messages
ifcico - Fidonet Technology transport package
ifgate - Internet to Fidonet gateway
ifmail - Internet to Fidonet gateway
jamnntpd - NNTP Server allowing newsreaders to access a JAM messagebase
jamnntpd-dbg - debugging symbols for jamnntpd
lbdb - Little Brother's DataBase for the mutt mail reader

So, there are at least two different mailers (ifcico and binkd) and crashmail as a tosser. What is missing is a FidoNet reader. Older Debian releases had GoldEd+, but this package was removed from Debian some years ago. There is still some upstream development of GoldEd+, but when I tried it, the compile failed. So there is no easy way to get a full FidoNet node running on Debian, which is sad.

Yes, FidoNet is perhaps outdated technology, but it's still alive and I would like to get a FidoNet node running again. Are there any other FidoNet nodes running on Debian whose operators could assist in setting one up? There may be some fully integrated solutions like MysticBBS, but I'm unsure about those.

So, any tips and hints are welcome! :-)


Migrating from Owncloud 7 on Debian to Nextcloud 11

These days I got a mail from my hosting provider stating that my Owncloud instance is insecure, because the online scan from scan.nextcloud.com had mailed them. However, the scan seemed quite bogus: it reported issues that were listed as already solved in Debian's changelog file. But unfortunately the last changelog entry was from January 5th, 2016. So there had been more than a whole year without security updates for Owncloud in Debian stable.

In a discussion with the Nextcloud team I complained a little that the scan/check is not appropriate. The Nextcloud team replied very helpfully with additional information, including two Debian bug reports clarifying that the Owncloud package will most likely be removed in the next release: #816376 and #822681.

So, as there is no Nextcloud package in Debian unstable as of now, there was no other way than to manually upgrade & migrate to Nextcloud. This went fairly well:

ownCloud 7 -> ownCloud 8.0 -> ownCloud 8.1 -> ownCloud 8.2 -> ownCloud 9.0 -> ownCloud 9.1 -> Nextcloud 10 -> Nextcloud 11

There were some smaller caveats:

  1. When migrating from OC 9.0 to OC 9.1 you need to migrate your addressbooks and calendars as described in the OC 9.0 Release Notes
  2. When migrating from OC 9.1 to Nextcloud 10, the OC 9.1 version number is higher than what the Nextcloud upgrade script expects, so it warns that you can't downgrade your installation. The fix was simply to change the OC version in config.php
  3. The Documents app of OC 7 is no longer available in Nextcloud 11 and is replaced by the Collabora app, which is way more complex to set up
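For the second caveat, the edit is a single line in config.php; a hedged sketch - the exact version string to use depends on your installation, "9.1.0.0" below is just an example value:

```php
// config/config.php - lower the recorded version so the Nextcloud
// updater accepts the migration ("9.1.0.0" is an example value).
'version' => '9.1.0.0',
```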

The installation and setup of the Docker image for collabora/code was the main issue, because I wanted to be able to edit documents in my cloud. For some reason Nextcloud couldn't connect to my Docker installation. After some web searches I found "Can't connect to Collabora Online", which led me to the next entry in the Nextcloud support forum. But in the end it was this posting that finally made it work for me. In short, I needed to add...

DOCKER_OPTS="--storage-driver=devicemapper"

to /etc/default/docker.
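For what it's worth, on newer Docker versions the same setting would typically live in /etc/docker/daemon.json rather than /etc/default/docker; a sketch, not tested on this setup:

```json
{
  "storage-driver": "devicemapper"
}
```

Note that switching storage drivers makes existing images and containers invisible until you switch back.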

So, in the end everything worked out well and my cloud instance is secure again. :-)

UPDATE 2016-02-18 10:52:
Sadly, with that working Collabora Online container from Docker I now face this issue of zombie loolforkit processes inside the container.


Automatically update TLSA records on new Letsencrypt Certs

I've been using DNSSEC for quite some time now and it is working quite well. When LetsEncrypt went public beta I jumped on the train and migrated many services to LE-based TLS. However, there was still one small problem with LE certs:

When a new cert is issued, all of the old TLSA resource records are no longer valid and might cause problems for strictly validating DNSSEC clients. It took a while until my pain was big enough to finally fix it with some scripts.

There are at least two scripts involved:

1) dnssec.sh
This script does all of my DNSSEC handling. You can just run "dnssec.sh enable-dnssec domain.tld" and everything is configured, so that you only need to copy the appropriate keys into the web interface of your DNS registrar.

host:~/bin# dnssec.sh
No parameter given.
Usage: dnsec.sh MODE DOMAIN

MODE can be one of the following:
enable-dnssec : perform all steps to enable DNSSEC for your domain
edit-zone     : safely edit your zone after enabling DNSSEC
create-dnskey : create new dnskey only
load-dnskey   : loads new dnskeys and signs the zone with them
show-ds       : shows DS records of zone
zoneadd-ds    : adds DS records to the zone file
show-dnskey   : extract DNSKEY record that needs to uploaded to your registrar
update-tlsa   : update TLSA records with new TLSA hash, needs old and new TLSA hashes as additional parameters

For updating zone files, just run "dnssec.sh edit-zone domain.tld" to add new records and such, and the script will take care of e.g. increasing the serial of the zone file. I find this very convenient, so I often use this script for non-DNSSEC-enabled domains as well.

Above you can spot the command line option "update-tlsa". This option needs the old and the new TLSA hashes besides the domain.tld parameter. It is used by the second script:

2) check_tlsa.sh
This is a quite simple Bash script that parses domains.txt from the letsencrypt.sh script, looks up the old TLSA hash in the zone files (structured in TLD/domain.tld directories), compares the old hash with the new one (obtained by invoking tlsagen.sh), and if the hashes differ, calls dnssec.sh with the proper parameters:

#!/bin/bash
set -e
LEPATH="/etc/letsencrypt.sh"
# Iterate over the first column of domains.txt (the certificate's main name).
for i in $(awk '{print $1}' "${LEPATH}/domains.txt") ; do
        # Reduce the hostname to domain.tld (assumes a single-label TLD).
        domain=$(echo "$i" | awk 'BEGIN {FS="."} ; {print $(NF-1)"."$NF}')
        TLD=$(echo "$i" | awk 'BEGIN {FS="."}; {print $NF}')
        # Look up the old TLSA hash in the zone file under /etc/bind/TLD/domain.tld.
        OLDTLSA=$(grep -i "in.*tlsa" "/etc/bind/${TLD}/${domain}" | grep "${i}" | head -n 1 | awk '{print $NF}')
        if [ -n "${OLDTLSA}" ] ; then
                # Usage: tlsagen.sh cert.pem host[:port] usage selector mtype
                NEWTLSA=$(/path/to/tlsagen.sh "${LEPATH}/certs/${i}/fullchain.pem" "${i}" 3 1 1 | awk '{print $NF}')
                if [ "${OLDTLSA}" != "${NEWTLSA}" ] ; then
                        /path/to/dnssec.sh update-tlsa "${domain}" "${OLDTLSA}" "${NEWTLSA}" > /dev/null
                        echo "TLSA RR update for ${i}"
                fi
        fi
done

So, quite simple and obviously a quick hack. For sure someone else could write a cleaner and more sophisticated implementation of the same thing, but at least it works for me™. Use it at your own risk and do whatever you want with these scripts (licensed as public domain).
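For context, a published "3 1 1" TLSA record in the zone looks like `_443._tcp.example.com. IN TLSA 3 1 1 <hash>`, where the hash is the SHA-256 digest of the certificate's DER-encoded public key. A minimal stand-in for what tlsagen.sh computes, using plain openssl (the function name is mine, not part of the scripts above):

```shell
# Compute a "3 1 1" TLSA hash: SHA-256 over the DER-encoded
# SubjectPublicKeyInfo extracted from a PEM certificate.
tlsa_311_hash() {
        openssl x509 -in "$1" -noout -pubkey \
                | openssl pkey -pubin -outform der \
                | openssl dgst -sha256 \
                | awk '{print $NF}'
}
```

Called as e.g. `tlsa_311_hash /etc/letsencrypt.sh/certs/example.com/fullchain.pem`, it prints the 64-character hex hash that goes into the TLSA record.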

You can invoke check_tlsa.sh right after your crontab call for letsencrypt.sh. In a more sophisticated setup it should be fairly easy to invoke these scripts from letsencrypt.sh post hooks as well.
Please find the files attached to this page (remove the .txt extension after saving, of course).

 

Attachments:
check_tlsa.sh.txt (812 bytes)
dnssec.sh.txt (3.88 KB)

Request for Adoption: Buildd.Net project

I've been running Buildd.Net for quite a long time. Buildd.Net is a project that focuses on the autobuilders, not the packages. It started back when the m68k port had a small website running on kullervo, an m68k buildd. Kullervo was too loaded to deal with the increased traffic of that website, so together with Stephen Marenka I moved the page from kullervo to my server under the domain m68k.bluespice.org. Over time I got many requests to do the same for other archs as well, so I started to hack the code to deal with different archs: Buildd.Net was born.

Since then many years have passed, and Buildd.Net evolved into a rather complex project, capable of dealing with different archs and different releases, such as unstable, backports, non-free, etc. Sadly, the wanna-build output changed over the years as well, and I no longer had the time to keep up with the changes.

Buildd.Net is based on: 

  • some Bash scripts
  • some Python scripts
  • a PostgreSQL database
  • gnuplot for some graphs
  • some small Perl scripts
  • ... and maybe more...

As long as I was more deeply involved with the m68k autobuilders and others, I found Buildd.Net quite informative, as I could get a quick overview of how all the buildds were performing. Based on the PostgreSQL database we could easily spot if a package was stuck on one of the buildds without watching the buildd logs directly.

Storing the information from the buildds about built packages in an SQL database can give you some benefits. Originally my plan was to use that kind of information for a better autobuilder setup. In the past it happened that large packages were built by buildds with, let's say, 64 MB of RAM while smaller packages were built on buildds with 128 MB of RAM. Eventually this led to failed builds or excessive build times. Likewise, m68k buildds like Apple Centris boxes suffered from slow disk I/O, while some Amiga buildds had reasonable disk speeds (think 160 kB/s vs. 2 MB/s).

As you can see, there is/was a lot of room for optimizing how packages are distributed between buildds. This could have been done by analyzing the statistics and some scripting, but was never implemented because of missing skills and time on my side.

The lack of time to keep up with the changes of the official wanna-build output (like new package states) is the main reason why I want to give Buildd.Net into good hands. If you are interested in this project, please contact me! I still believe that Buildd.Net can be beneficial to the Debian project. :-)


Xen randomly crashing server - part 2

Some weeks ago I blogged about "Xen randomly crashing server". The problem back then was that I couldn't get any information on why the server reboots. Using netconsole was not possible, because netconsole refused to work with the bridge that is used for Xen networking. Luckily my colocation partner rrbone.net connected the second network port of my server to the network, so that I could use eth1 instead of the bridged eth0 for netconsole.
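A netconsole setup on the dedicated eth1 can be sketched roughly like this; the IPs, MAC address and ports below are placeholders, not my actual values:

```shell
# Load netconsole: stream kernel messages out of eth1
# (local 192.0.2.10) to a remote syslog listener at 192.0.2.1:6666.
# Format: netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@[tgt-ip]/[tgt-mac]
modprobe netconsole netconsole=6665@192.0.2.10/eth1,6666@192.0.2.1/00:11:22:33:44:55
```

On the receiving side, something like `nc -u -l 6666` or a syslog daemon listening on that UDP port collects the output.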

Today the server crashed several times, and I was able to collect more information than just the screenshots from the IPMI/KVM console shown in my last blog entry (the full netconsole output is attached as a file):

May 12 11:56:39 31.172.31.251 [829681.040596] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-4-amd64 #1 Debian 3.16.7-ckt25-2
May 12 11:56:39 31.172.31.251 [829681.040647] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.0a 01/03/2014
May 12 11:56:39 31.172.31.251 [829681.040701] task: ffffffff8181a460 ti: ffffffff81800000 task.ti: ffffffff81800000
May 12 11:56:39 31.172.31.251 [829681.040749] RIP: e030:[<ffffffff812b7e56>]
May 12 11:56:39 31.172.31.251  [<ffffffff812b7e56>] memcpy+0x6/0x110
May 12 11:56:39 31.172.31.251 [829681.040802] RSP: e02b:ffff880280e03a58  EFLAGS: 00010286
May 12 11:56:39 31.172.31.251 [829681.040834] RAX: ffff88026eec9070 RBX: ffff88023c8f6b00 RCX: 00000000000000ee
May 12 11:56:39 31.172.31.251 [829681.040880] RDX: 00000000000004a0 RSI: ffff88006cd1f000 RDI: ffff88026eec9422
May 12 11:56:39 31.172.31.251 [829681.040927] RBP: ffff880280e03b38 R08: 00000000000006c0 R09: ffff88026eec9062
May 12 11:56:39 31.172.31.251 [829681.040973] R10: 0100000000000000 R11: 00000000af9a2116 R12: ffff88023f440d00
May 12 11:56:39 31.172.31.251 [829681.041020] R13: ffff88006cd1ec66 R14: ffff88025dcf1cc0 R15: 00000000000004a8
May 12 11:56:39 31.172.31.251 [829681.041075] FS:  0000000000000000(0000) GS:ffff880280e00000(0000) knlGS:ffff880280e00000
May 12 11:56:39 31.172.31.251 [829681.041124] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
May 12 11:56:39 31.172.31.251 [829681.041153] CR2: ffff88006cd1f000 CR3: 0000000271ae8000 CR4: 0000000000042660
May 12 11:56:39 31.172.31.251 [829681.041202] Stack:
May 12 11:56:39 31.172.31.251 [829681.041225]  ffffffff814d38ff
May 12 11:56:39 31.172.31.251  ffff88025b5fa400
May 12 11:56:39 31.172.31.251  ffff880280e03aa8
May 12 11:56:39 31.172.31.251  9401294600a7012a
May 12 11:56:39 31.172.31.251 
May 12 11:56:39 31.172.31.251 [829681.041287]  0100000000000000
May 12 11:56:39 31.172.31.251  ffffffff814a000a
May 12 11:56:39 31.172.31.251  000000008181a460
May 12 11:56:39 31.172.31.251  00000000000080fe
May 12 11:56:39 31.172.31.251 
May 12 11:56:39 31.172.31.251 [829681.041346]  1ad902feff7ac40e
May 12 11:56:39 31.172.31.251  ffff88006c5fd980
May 12 11:56:39 31.172.31.251  ffff224afc3e1600
May 12 11:56:39 31.172.31.251  ffff88023f440d00
May 12 11:56:39 31.172.31.251 
May 12 11:56:39 31.172.31.251 [829681.041407] Call Trace:
May 12 11:56:39 31.172.31.251 [829681.041435]  <IRQ>
May 12 11:56:39 31.172.31.251 
May 12 11:56:39 31.172.31.251 [829681.041441]
May 12 11:56:39 31.172.31.251  [<ffffffff814d38ff>] ? ndisc_send_redirect+0x3bf/0x410
May 12 11:56:39 31.172.31.251 [829681.041506]  [<ffffffff814a000a>] ? ipmr_device_event+0x7a/0xd0
May 12 11:56:39 31.172.31.251 [829681.041548]  [<ffffffff814bc74c>] ? ip6_forward+0x71c/0x850
May 12 11:56:39 31.172.31.251 [829681.041585]  [<ffffffff814c9e54>] ? ip6_route_input+0xa4/0xd0
May 12 11:56:39 31.172.31.251 [829681.041621]  [<ffffffff8141f1a3>] ? __netif_receive_skb_core+0x543/0x750
May 12 11:56:39 31.172.31.251 [829681.041729]  [<ffffffff8141f42f>] ? netif_receive_skb_internal+0x1f/0x80
May 12 11:56:39 31.172.31.251 [829681.041771]  [<ffffffffa0585eb2>] ? br_handle_frame_finish+0x1c2/0x3c0 [bridge]
May 12 11:56:39 31.172.31.251 [829681.041821]  [<ffffffffa058c757>] ? br_nf_pre_routing_finish_ipv6+0xc7/0x160 [bridge]
May 12 11:56:39 31.172.31.251 [829681.041872]  [<ffffffffa058d0e2>] ? br_nf_pre_routing+0x562/0x630 [bridge]
May 12 11:56:39 31.172.31.251 [829681.041907]  [<ffffffffa0585cf0>] ? br_handle_local_finish+0x80/0x80 [bridge]
May 12 11:56:39 31.172.31.251 [829681.041955]  [<ffffffff8144fb65>] ? nf_iterate+0x65/0xa0
May 12 11:56:39 31.172.31.251 [829681.041987]  [<ffffffffa0585cf0>] ? br_handle_local_finish+0x80/0x80 [bridge]
May 12 11:56:39 31.172.31.251 [829681.042035]  [<ffffffff8144fc16>] ? nf_hook_slow+0x76/0x130
May 12 11:56:39 31.172.31.251 [829681.042067]  [<ffffffffa0585cf0>] ? br_handle_local_finish+0x80/0x80 [bridge]
May 12 11:56:39 31.172.31.251 [829681.042116]  [<ffffffffa0586220>] ? br_handle_frame+0x170/0x240 [bridge]
May 12 11:56:39 31.172.31.251 [829681.042148]  [<ffffffff8141ee24>] ? __netif_receive_skb_core+0x1c4/0x750
May 12 11:56:39 31.172.31.251 [829681.042185]  [<ffffffff81009f9c>] ? xen_clocksource_get_cycles+0x1c/0x20
May 12 11:56:39 31.172.31.251 [829681.042217]  [<ffffffff8141f42f>] ? netif_receive_skb_internal+0x1f/0x80
May 12 11:56:39 31.172.31.251 [829681.042251]  [<ffffffffa063f50f>] ? xenvif_tx_action+0x49f/0x920 [xen_netback]
May 12 11:56:39 31.172.31.251 [829681.042299]  [<ffffffffa06422f8>] ? xenvif_poll+0x28/0x70 [xen_netback]
May 12 11:56:39 31.172.31.251 [829681.042331]  [<ffffffff8141f7b0>] ? net_rx_action+0x140/0x240
May 12 11:56:39 31.172.31.251 [829681.042367]  [<ffffffff8106c6a1>] ? __do_softirq+0xf1/0x290
May 12 11:56:39 31.172.31.251 [829681.042397]  [<ffffffff8106ca75>] ? irq_exit+0x95/0xa0
May 12 11:56:39 31.172.31.251 [829681.042432]  [<ffffffff8135a285>] ? xen_evtchn_do_upcall+0x35/0x50
May 12 11:56:39 31.172.31.251 [829681.042469]  [<ffffffff8151669e>] ? xen_do_hypervisor_callback+0x1e/0x30
May 12 11:56:39 31.172.31.251 [829681.042499]  <EOI>
May 12 11:56:39 31.172.31.251 
May 12 11:56:39 31.172.31.251 [829681.042506]
May 12 11:56:39 31.172.31.251  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
May 12 11:56:39 31.172.31.251 [829681.042561]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
May 12 11:56:39 31.172.31.251 [829681.042592]  [<ffffffff81009e7c>] ? xen_safe_halt+0xc/0x20
May 12 11:56:39 31.172.31.251 [829681.042627]  [<ffffffff8101c8c9>] ? default_idle+0x19/0xb0
May 12 11:56:39 31.172.31.251 [829681.042666]  [<ffffffff810a83e0>] ? cpu_startup_entry+0x340/0x400
May 12 11:56:39 31.172.31.251 [829681.042705]  [<ffffffff81903076>] ? start_kernel+0x497/0x4a2
May 12 11:56:39 31.172.31.251 [829681.042735]  [<ffffffff81902a04>] ? set_init_arg+0x4e/0x4e
May 12 11:56:39 31.172.31.251 [829681.042767]  [<ffffffff81904f69>] ? xen_start_kernel+0x569/0x573
May 12 11:56:39 31.172.31.251 [829681.042797] Code:
May 12 11:56:39 31.172.31.251  <f3>
May 12 11:56:39 31.172.31.251 
May 12 11:56:39 31.172.31.251 [829681.043113] RIP
May 12 11:56:39 31.172.31.251  [<ffffffff812b7e56>] memcpy+0x6/0x110
May 12 11:56:39 31.172.31.251 [829681.043145]  RSP <ffff880280e03a58>
May 12 11:56:39 31.172.31.251 [829681.043170] CR2: ffff88006cd1f000
May 12 11:56:39 31.172.31.251 [829681.043488] ---[ end trace 1838cb62fe32daad ]---
May 12 11:56:39 31.172.31.251 [829681.048905] Kernel panic - not syncing: Fatal exception in interrupt
May 12 11:56:39 31.172.31.251 [829681.048978] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)

I'm not that good at reading this kind of output, but to me it seems that ndisc_send_redirect is at fault. Googling for "ndisc_send_redirect" turns up a patch on lkml.org and Debian bug #804079, both of which seem to be related to IPv6.

Looking at the Linux kernel source mentioned in the lkml patch, I see that this patch is already applied (line 1510):

        if (ha) 
                ndisc_fill_addr_option(buff, ND_OPT_TARGET_LL_ADDR, ha);

So, while the patch was intended to prevent "leading to data corruption or in the worst case a panic when the skb_put failed", it does not help in my case or in the case of #804079.

Any tips are appreciated!

PS: I'll contribute to that bug in the BTS, of course!

Attachments:
syslog-xen-crash.txt (24.27 KB)

Avoiding Gated Communities with Diaspora, Friendica and others

At the Chaos Communication Congress 32c3 in Hamburg last year, there was a talk by Katharina Nocun titled "A New Kid on the Block - Conditions for a Successful Market Entry of Decentralized Social Networks". This is the short abstract:

The leading social networks are the powerful new gatekeepers of the digital age. Proprietary de facto standards of the dominant companies have lead to the emergence of virtual “information silos” that can barely communicate with one another. Has Diaspora really lost the war? Or is there still a chance to succeed?

Maybe some of you attended that talk or have already seen the recording; if not, the recording is available online and worth watching.

It's all about social networks and gated communities vs. open communities - Facebook on the gated community side and, as an example, Diaspora on the open side.

At timecode 17:20 Katharina mentions that the Top 10 Diaspora pods have more than half a million users. But when you look more closely at the statistics from the-federation.info, you can spot a different result - one that is most likely true for Facebook's marketing statistics as well: there is a difference between total users and currently active users. While the total user count indeed easily surpasses the half-million mark, the active-user count for the last month tells a different story: 15488 active users versus 546783 total users on the Top 10 Diaspora sites. That's only 2.83% active users - quite an awful ratio.

Many users are just quick lurkers who come passing by, have a look at Diaspora (and other alternative networks), register, try it out, and never come back after a few days. I can confirm this from my own Friendica node at Nerdica.net, where I currently have 13 users in total: 7 users never posted any content, 1 user has already been automatically marked as expired because of this, and 8 users never came back after their first day of registration.

Therefore I cannot agree with Katharina's conclusion that Diaspora "is not dead, it's pretty alive". All these alternative social networks are pretty much dead or, to put it in friendlier words, alive only in a rather small niche of small communities, like data/privacy-aware people.

Am I happy about this?

No, definitely not, because I am one of these data/privacy-aware activists. I'm no big fan of monolithic and centralized networks like Facebook. I'm an enthusiastic advocate of self-hosting and of decentralized platforms and communication protocols, such as XMPP.

So, what can be done about these kinds of gated communities like Facebook? Are you still on Facebook because most of your family and friends are over there and not on Diaspora/Friendica? Are you still using Skype instead of XMPP? Why? I'm really interested, because I don't understand it.

PS: please watch the video in full length! Katharina has some other good points as well! :)


Xen randomly crashing server

It's a long story... an odyssey of almost two years...

But to start from the beginning: back then I rented a server at Hetzner, until they decided to bill for every IP address you got from them. I had been given a /26 in the past, so I would have had to pay for every IP address of that subnet in addition to the server rent of 79 EUR/month. That would have nearly doubled the monthly costs. So I moved my server from Hetzner to rrbone Net, which offered me a /26 on a rented Cisco C200 M2 server for a competitive price.

After migrating the VMs from Hetzner to rrbone with the same setup that had been running just fine at Hetzner, I experienced spontaneous reboots of the server, sometimes several times per day within a short time frame. The hosting provider was very, very helpful in debugging this, e.g. by exchanging the memory, setting up a remote logging service for the CIMC and such. But in the end we found no root cause. The CIMC logs only showed that the OS was rebooting the machine.

Anyway, I then bought my own server and replaced the Cisco C200 with my own hardware, but the reboots still happened as before. Sometimes the server runs for weeks, sometimes it crashes 4-6 times a day, but usually it follows a pattern: when it crashes and reboots, it will do so again within a few hours, and after the second reboot the chances are high that the server will run for several days, or even weeks, without a reboot.

The strange thing is that there are absolutely no hints in the logs, neither in syslog nor in the Xen logs, so I assume it's something quite deep in the kernel that causes the reboot. Another hint is that the reboots fairly often happened when I used my Squid proxy on one of the VMs to access the net. For example, I connect by SSH with port forwarding to one VM, while the proxy runs on another VM, which leads to network traffic between the VMs. Sometimes the server crashed on the very first proxy requests. So I replaced Squid with tinyproxy and other proxies, and moved the proxy from its VM to the VM I connect to via SSH, because I thought that the inter-VM traffic might be causing the machine to reboot. Moving the proxy to another virtual server, which I rent at a different hosting provider to host my secondary nameserver, seemed to help a little, but with no real hard proof or statistics, just an impression of mine.

I moved from the xm toolstack to the xl toolstack as well, but that didn't help either. The reboots are still happening, and in the last few days very frequently. Even with the new server I exchanged the memory and used memory mirroring, because I thought a faulty memory module might be the cause, but it still reboots out of the blue.

During the last weekend I configured grub to include the "noreboot" kernel command line option and then got my first proof that somehow the Xen network stack is causing the reboots: 

This is a screenshot of the IPMI console, so it doesn't show the full information of that kernel oops, but as you can see, parts like bridge, netif, xenvif and the physical igb NIC are most likely involved.

Here's another screenshot of a crash from this night: 

Slightly different information, but the network is still somehow involved, as you can see in the first line (net_rx_action).

So the big question is: is this a bug in Xen or in my setup? I'm using the xl toolstack, and the xl.conf is basically the default, I think: 

## Global XL config file ##

# automatically balloon down dom0 when xen doesn't have enough free
# memory to create a domain
autoballoon=0

# full path of the lockfile used by xl during domain creation
#lockfile="/var/lock/xl"

# default vif script
#vif.default.script="vif-bridge"

With this, the default network scripts of the distribution (i.e. Debian stable) should be used. The network setup consists of two bridges: 

auto xenbr0
iface xenbr0 inet static
        address 31.172.31.193
        netmask 255.255.255.192
        gateway 31.172.31.254
        bridge_ports eth0
        pre-up brctl addbr xenbr0

auto xenbr1
iface xenbr1 inet static
        address 192.168.254.254
        netmask 255.255.255.0
        pre-up brctl addbr xenbr1
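As a sanity check, the resulting bridge layout can be inspected with brctl. A small sketch, using the bridge and interface names from the config above:

```shell
# List all bridges and their enslaved ports; eth0 should show up
# under xenbr0, while xenbr1 has no physical port (internal only).
brctl show

# Per-bridge detail (STP state, port list) for the external bridge:
brctl showstp xenbr0
```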

There are some more lines in that config, like setting up some iptables rules with up commands and such. But as you can see, my eth0 NIC is part of the "main" Xen bridge with all the IP addresses that are reachable from the outside. The second bridge is used for internal networking, like database connections and such.

I would rather like to use a netconsole to capture the full debug output in case of a new crash, but unfortunately this only works until the bridge is brought up and in place: 

[    0.000000] Command line: placeholder root=UUID=c3....22 ro debug ignore_loglevel loglevel=7 netconsole=port@31.172.31.193/eth0,514@5.45.x.y/e0:ac:f1:4c:y:x
[   32.565624] netpoll: netconsole: local port $port
[   32.565683] netpoll: netconsole: local IPv4 address 31.172.31.193
[   32.565742] netpoll: netconsole: interface 'eth0'
[   32.565799] netpoll: netconsole: remote port 514
[   32.565855] netpoll: netconsole: remote IPv4 address 5.45.x.y
[   32.565914] netpoll: netconsole: remote ethernet address e0:ac:f1:4c:y:x
[   32.565982] netpoll: netconsole: device eth0 not up yet, forcing it
[   36.126294] netconsole: network logging started
[   49.802600] netconsole: network logging stopped on interface eth0 as it is joining a master device

So, the first question is: how to use netconsole with an interface that is used on a bridge?
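One possible workaround, untested here and an assumption on my part, is the kernel's dynamic netconsole configuration via configfs (described in Documentation/networking/netconsole.txt): instead of passing netconsole= on the command line bound to eth0, the target is created after boot and bound to the bridge device xenbr0 itself, e.g. from a post-up hook:

```shell
# Sketch: dynamic netconsole target via configfs, bound to the
# bridge instead of the enslaved eth0. Values are taken from the
# boot log above; the elided remote IP/MAC stay as placeholders.
modprobe netconsole
mount -t configfs none /sys/kernel/config 2>/dev/null || true
mkdir /sys/kernel/config/netconsole/target1
cd /sys/kernel/config/netconsole/target1
echo xenbr0          > dev_name     # the bridge, not eth0
echo 31.172.31.193   > local_ip
echo 514             > remote_port
echo 5.45.x.y        > remote_ip    # placeholder as in the log
echo e0:ac:f1:4c:y:x > remote_mac   # placeholder as in the log
echo 1               > enabled
```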

The second question is: is the setup with two bridges okay with Xen? I've been using this setup for years now, and it worked fairly well on the Hetzner server too, although there I used the xm toolstack with a mix of bridged and routed setup, because Hetzner didn't like to see the MAC addresses of the other VMs on the switch and shut the port down if that happened.


Letsencrypt: challenging challenges solved

A few weeks ago, in Letsencrypt: challenging challenges, I was wondering how to set up Letsencrypt when a domain is spread across several virtual machines (VMs). One possible solution would be to consolidate everything on one single VM, which is nothing I would like to do. The second option would be to generate the Letsencrypt certs on the webserver and copy them over to the appropriate VM on a regular basis or event-driven. The third option is to use a network share, and this is what I'm using right now.

So, my setup is as follows, after I solved the GlusterFS issue with rpcbind binding to all interfaces although it had been configured to only listen on certain interfaces (the solution was: simply remove all NFS-related stuff):

On Dom0 (the host machine) I run GlusterFS as a server on a small 1 GB LVM volume, as part of a replica with the VM that will do the actual Letsencrypt work: 

Volume Name: le
Type: Replicate
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.x.254:/srv/gfs/le
Brick2: 192.168.x.1:/srv/gfs/le
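For reference, a replica like the one above could have been created along these lines. A sketch, using the brick addresses and paths from the volume info, and assuming the second host is already in the trusted pool:

```shell
# Peer must be known first: gluster peer probe 192.168.x.1
# Create a 2-brick replica volume named "le" and start it.
gluster volume create le replica 2 \
    192.168.x.254:/srv/gfs/le \
    192.168.x.1:/srv/gfs/le
gluster volume start le
gluster volume info le
```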

This ensures that, on reboot of the machine, every other VM using Letsencrypt certs can mount the GlusterFS share, because the host machine will be there for sure, whereas the other VM, which generates the certs with the letsencrypt.sh script, might still be booting. And when the GlusterFS share is missing, services on the other VMs will of course not start because of the missing certs. So the replica on the virtualization host (Dom0) only acts as a kind of always-available network share, because, well, the other VM will not always be there... for example during a kernel update when a reboot is required.

The same setup exists on my mailserver, which acts as the second GlusterFS brick of that replica volume. The mailserver hosts the bind9 nameserver as well, and I might arrange for new domains with Letsencrypt certs to be added to my DNSSEC setup, too. Of course, when the letsencrypt.sh script creates or updates the certs, it needs them mounted in the configured location, so I had to add a line to /etc/fstab: 

192.168.x.254:/le /etc/letsencrypt.sh/certs glusterfs noexec,nodev,_netdev 0 0

Basically the same needs to be done on the other VMs where you want to use the certs, but you may want to mount the share read-only there.
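For those read-only consumers, the fstab entry could look like this. A sketch, simply adding the ro option to the line above:

```
192.168.x.254:/le /etc/letsencrypt.sh/certs glusterfs ro,noexec,nodev,_netdev 0 0
```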

The next step was a little more tricky. When letsencrypt.sh generates new certs, Letsencrypt will contact the webserver for that domain to respond to the ACME challenge. This would require that you run a webserver on each VM for which you want to use Letsencrypt. Well, actually it only requires that there is a webserver somewhere that can answer these requests for that specific domain...

Now, the setup of the webserver (Apache in my case) is like this: 

I'm using the Apache macro module to make things easier, so I created two small configs in /etc/apache2/conf-available and enabled them via a2enconf: letsencrypt-proxy.conf sets up proxying of the ACME challenges to a common website called acme.example.org, and letsencrypt-sslredir.conf sets up the SSL redirection once everything is in place and the domain can be switched over to HTTPS-only.

letsencrypt-proxy.conf: 

<Macro le_proxy>
     ProxyRequests Off
     <Proxy *>
            Order deny,allow
            Allow from all
     </Proxy>
     ProxyPass /.well-known/acme-challenge/ http://acme.windfluechter.net/
     ProxyPassReverse / http://%{HTTP_HOST}/.well-known/acme-challenge/
</Macro>

letsencrypt-sslredir.conf:

<Macro le_sslredir>
    RewriteEngine on
    RewriteCond %{HTTPS} !=on
    RewriteRule . https://%{HTTP_HOST}%{REQUEST_URI}  [L]
</Macro>

So, after all that, the setup of a virtual host for Apache looks like this: 

<Macro example.org>
(lots of setup stuff)
</Macro>
<VirtualHost 31.172.31.x:443 [2a01:a700:4629:x::1]:443>
        Header always set Strict-Transport-Security "max-age=31556926; includeSubDomains"
        SSLEngine on
        # letsencrypt certs:
        SSLCertificateFile /etc/letsencrypt.sh/certs/example.org/fullchain.pem
        SSLCertificateKeyFile /etc/letsencrypt.sh/certs/example.org/privkey.pem
        SSLHonorCipherOrder On
    Use example.org
    Use le_proxy
</VirtualHost>
<VirtualHost 31.172.31.x:80 [2a01:a700:4629:x::1]:80>
    Use example.org
    Use le_proxy
    Use le_sslredir
</VirtualHost>

le_sslredir is only needed when you are sure that you want all traffic redirected to HTTPS. For example, when your blog is listed on planet.debian.org or other planets, you might want to omit it from your HTTP config, because bug #813313 is not yet solved. 

In the end, when creating a new Letsencrypt cert, you need to add the le_proxy macro to your website and add the domain to the letsencrypt.sh config in /etc/letsencrypt.sh/domains.txt; the script will then request a new cert from Letsencrypt, handle the ACME challenge via the URL redirection in le_proxy to your acme.example.org site and finally write your new cert to the GlusterFS share. From that share you can then use the new cert on all VMs that need it, be it your mailserver, webserver or XMPP/SIP server VMs. 
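The renewal step itself can then be run, e.g. from cron. A sketch, assuming letsencrypt.sh lives in /etc/letsencrypt.sh and reads domains.txt from there:

```shell
# Add the new domain (plus any SANs) to the domain list...
echo "example.org www.example.org" >> /etc/letsencrypt.sh/domains.txt

# ...and let letsencrypt.sh sign/renew everything listed there;
# -c/--cron is the script's signing mode.
/etc/letsencrypt.sh/letsencrypt.sh --cron
```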

At least this works for me.

UPDATE:
Of course you should be careful about the file permissions on that GlusterFS share: the automatic key renewal must still work, but the permissions must not be so broad that everyone can obtain your private keys.
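A sketch of what that could look like, assuming you use a dedicated group (Debian's ssl-cert group here) whose members are the services that are allowed to read the keys:

```shell
# Group-readable for ssl-cert members, nothing for "other".
chgrp -R ssl-cert /etc/letsencrypt.sh/certs
chmod -R o-rwx    /etc/letsencrypt.sh/certs

# Services that need the keys (e.g. postfix) are then added to the
# ssl-cert group on their respective VMs.
adduser postfix ssl-cert
```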

