
2008 M68k Porter Meeting

Over at Isotopp, the question is how to keep two "professionals" on a hotline from talking past each other or wasting time pointlessly:

Despite long-standing collaboration, I observe a complete or partial breakdown of communication in many places. A great deal of time is lost because the problem that was understood is often a completely different one from the problem that was meant to be reported. Error reports get lost in details while the overall scenario remains unclear. The helpdesk believes it has already grasped the problem and misses the important parts of the error report.

In the comments, deltatango writes, among other things:

First rule: "Take nothing for granted, question everything." At first you feel a bit silly because you are asking about the most elementary things, but it helps.

Second rule: "Just because a supposed IT person is sitting at the other end does not mean that this person can describe problems, errors and symptoms concretely and in the correct terms." See rule one.

Third rule: "Repeat what you heard back to the caller in your own words and have them confirm that you understood their problem correctly." Aviation does the same thing and knows why: it avoids expensive misunderstandings.

All three rules are worth taking to heart, especially the third, because it prevents misunderstandings. I would add that both sides should let each other finish speaking. With various hotlines it has happened to me that the person at the other end interrupted me and then asked about things I would have mentioned on my own anyway.
Apart from that, TPOSANA (The Practice of System and Network Administration by Limoncelli, Hogan and Chalup) has a dedicated chapter on the subject of hotlines.

All of this can certainly be refined when the hotline is not a general-purpose one. If, as in the case of Martin, who asked the question, it is a business relationship between customer and vendor, the two parties could agree on the following improvements:

  1. Fixed contacts: The vendor names a fixed contact person to whom problems can be addressed. Ideally, not everyone at the customer should call there either, but only a limited number of people.
  2. Checklist/flowchart: Hotline staff normally work through their standard procedure or flowchart for good reason. It is perfectly legitimate to ask whether the network cable is plugged in and whether the LEDs on the card and the switch/router are lit when the customer cannot reach the Internet. You would think this is nonsense and that the cable is "of course" plugged into the right port, but unfortunately that is not always the case.
    If the customer has the same or a similar checklist or flowchart, they can check these things themselves and then tell the contact person how far down the checklist they got and at which point the error occurs.
  3. Common language: Even though it should be obvious, it helps to use the same language and the same terms. If the customer talks about an icon in the taskbar but actually means the systray, the hotline agent may wonder where the customer sees an icon. Both sides should therefore agree on common terms so that they understand each other and do not talk past each other.

Of course there are more ways to improve communication between customer and hotline. In the end both benefit when communication works well. And not every customer is honest enough to admit, commendably, that they have no clue. In any case, I prefer those customers to the ones who pretend to know what they are doing. ;-)


PostgreSQL on SMP machines

Joey recently mailed another announcement for the m68k porter meeting. This year it takes place in Kiel from August 29th to 31st:

Executive summary:

2008 M68k Linux Porter Meeting
August 29th - 31st
University of Kiel, Germany

This summer we are organising a Linux porter meeting especially
targetting the m68k architecture. During the meeting current problems
of the m68k architecture, its integration in Debian, releases
etc. will be discussed.

The meeting will take place at the last weekend in August (29-31) at
the University of Kiel, Germany. Details and participants are
collected here:

This meeting is evolved from the Oldenburg porter meeting that has
started with the m68k architecture but had to stop two years ago.

Interested developers and supporters are invited to join the meeting
and help develop the m68k port of Linux.

If you are interested to attend this meeting, please drop Christian
(cts) or me a note or add yourself to the Wiki page.

Regards,

Joey

Despite all the problems the m68k port has faced in recent years, we're still alive and doing fine. Part of the meeting will be a discussion of the current changes to the m68k port, such as the Aranym buildds and other stuff.

So, if you're interested in the m68k port or the experience of the m68k porters for your embedded architecture, come to Kiel and visit us! :-)


No wanna-build access since DSA-1571 for m68k

Dear lazyweb,

it's well known that PostgreSQL is not multi-threaded, but runs multiple processes that can be scheduled on different CPUs/cores.
We recently bought a dual quad-core system as our new database server, and everything is neat and fine as long as I test the database locally, via socket or via TCP on either IP. In that case the load spreads across all 8 cores:

postgres@dhcp-140:~$ /usr/lib/postgresql/8.3/bin/pgbench -h 192.168.1.140 -c 8 -t 30000 testdb

Cpu0 : 24.1%us, 3.1%sy, 0.0%ni, 70.9%id, 0.0%wa, 0.0%hi, 1.9%si, 0.0%st
Cpu1 : 24.5%us, 4.8%sy, 0.0%ni, 69.0%id, 0.0%wa, 0.0%hi, 1.8%si, 0.0%st
Cpu2 : 17.7%us, 3.2%sy, 0.0%ni, 77.3%id, 0.0%wa, 0.0%hi, 1.9%si, 0.0%st
Cpu3 : 13.5%us, 2.3%sy, 0.0%ni, 83.0%id, 0.0%wa, 0.0%hi, 1.3%si, 0.0%st
Cpu4 : 22.0%us, 5.3%sy, 0.0%ni, 70.9%id, 0.0%wa, 0.0%hi, 1.9%si, 0.0%st
Cpu5 : 23.5%us, 4.2%sy, 0.0%ni, 70.0%id, 0.0%wa, 0.0%hi, 2.3%si, 0.0%st
Cpu6 : 28.3%us, 4.3%sy, 0.0%ni, 63.7%id, 0.0%wa, 0.0%hi, 3.7%si, 0.0%st
Cpu7 : 23.9%us, 4.3%sy, 0.0%ni, 69.1%id, 0.0%wa, 0.0%hi, 2.8%si, 0.0%st

PID USER PR NI VIRT RES SHR S %CPU P %MEM TIME+ COMMAND
3551 postgres 20 0 98.2m 18m 16m R 11 1 0.2 0:05.30 postgres
3553 postgres 20 0 98.2m 18m 16m R 11 3 0.2 0:05.38 postgres
3546 postgres 20 0 98.2m 18m 16m S 7 7 0.2 0:05.52 postgres
3547 postgres 20 0 98.2m 18m 16m S 7 0 0.2 0:04.22 postgres
3548 postgres 20 0 98.2m 18m 16m S 7 2 0.2 0:04.62 postgres
3549 postgres 20 0 98.2m 18m 16m S 7 4 0.2 0:04.92 postgres
3550 postgres 20 0 98.2m 18m 16m S 7 0 0.2 0:05.84 postgres
3552 postgres 20 0 98.2m 18m 16m S 5 6 0.2 0:04.96 postgres

Column P shows the processor each process last ran on. As you can see, postgres runs on basically all cores. But when I access the database from a remote host, something strange happens:

Cpu0 : 22.5%us, 3.7%sy, 0.0%ni, 72.8%id, 0.0%wa, 0.0%hi, 0.9%si, 0.0%st
Cpu1 : 17.8%us, 2.8%sy, 0.0%ni, 79.1%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu2 : 5.6%us, 1.3%sy, 0.0%ni, 92.7%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu3 : 18.7%us, 3.2%sy, 0.0%ni, 76.5%id, 0.0%wa, 0.0%hi, 1.6%si, 0.0%st
Cpu4 : 1.0%us, 0.3%sy, 0.0%ni, 98.3%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu5 : 27.0%us, 2.5%sy, 0.0%ni, 69.8%id, 0.0%wa, 0.0%hi, 0.6%si, 0.0%st
Cpu6 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 34.1%us, 3.7%sy, 0.0%ni, 57.0%id, 0.0%wa, 0.6%hi, 4.6%si, 0.0%st

PID USER PR NI VIRT RES SHR S %CPU P %MEM TIME+ COMMAND
3580 postgres 20 0 98.2m 18m 16m S 13 7 0.2 0:03.56 postgres
3582 postgres 20 0 98.2m 18m 16m S 12 1 0.2 0:03.54 postgres
3586 postgres 20 0 98.2m 18m 16m S 11 7 0.2 0:03.34 postgres
3587 postgres 20 0 98.2m 18m 16m S 11 5 0.2 0:03.36 postgres
3581 postgres 20 0 98.2m 18m 16m S 11 5 0.2 0:03.46 postgres
3583 postgres 20 0 98.2m 18m 16m S 11 7 0.2 0:03.40 postgres
3584 postgres 20 0 98.2m 18m 16m R 11 3 0.2 0:03.36 postgres
3585 postgres 20 0 98.2m 18m 16m S 11 1 0.2 0:03.34 postgres

Postgres now runs only on the cores with an odd CPU ID, that is, cores 1, 3, 5 and 7. This is reproducible and doesn't change when I connect more than one remote client or raise the -c parameter. I'm running Lenny:

dhcp-140:~# dpkg -l | grep postgres
ii postgresql-8.3 8.3.3-1 object-relational SQL database, version 8.3
ii postgresql-client-8.3 8.3.3-1 front-end programs for PostgreSQL 8.3
ii postgresql-client-common 88 manager for multiple PostgreSQL client versi
ii postgresql-common 88 PostgreSQL database-cluster manager
ii postgresql-contrib-8.3 8.3.3-1 additional facilities for PostgreSQL

So, dear lazyweb, if you have any tips for debugging or solving the issue, please comment! I don't know yet whether this is a bug and, if so, whether it's in Postgres or maybe in the kernel (2.6.25-2-amd64).
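
To make the odd-core pattern easier to verify than by eyeballing top, the P column can be tallied with a few lines of Python. This is only a minimal sketch: the process lines are copied from the remote-client run above, and the helper name cpu_histogram and the column index are my own assumptions about that particular top layout, not anything provided by top or PostgreSQL.

```python
from collections import Counter

# Process lines copied from the remote-client top output above.
# Columns: PID USER PR NI VIRT RES SHR S %CPU P %MEM TIME+ COMMAND
TOP_LINES = """\
3580 postgres 20 0 98.2m 18m 16m S 13 7 0.2 0:03.56 postgres
3582 postgres 20 0 98.2m 18m 16m S 12 1 0.2 0:03.54 postgres
3586 postgres 20 0 98.2m 18m 16m S 11 7 0.2 0:03.34 postgres
3587 postgres 20 0 98.2m 18m 16m S 11 5 0.2 0:03.36 postgres
3581 postgres 20 0 98.2m 18m 16m S 11 5 0.2 0:03.46 postgres
3583 postgres 20 0 98.2m 18m 16m S 11 7 0.2 0:03.40 postgres
3584 postgres 20 0 98.2m 18m 16m R 11 3 0.2 0:03.36 postgres
3585 postgres 20 0 98.2m 18m 16m S 11 1 0.2 0:03.34 postgres
"""

# Zero-based index of the "P" (last used processor) column.
# Assumption: the column layout matches the header shown above.
P_COLUMN = 9


def cpu_histogram(top_output):
    """Count how many processes last ran on each CPU (hypothetical helper)."""
    counts = Counter()
    for line in top_output.splitlines():
        fields = line.split()
        # Skip header lines, blank lines and anything too short to parse.
        if len(fields) <= P_COLUMN or not fields[0].isdigit():
            continue
        counts[int(fields[P_COLUMN])] += 1
    return counts


hist = cpu_histogram(TOP_LINES)
for cpu in sorted(hist):
    print(f"cpu{cpu}: {hist[cpu]} backend(s)")
print("even cores in use:", sorted(c for c in hist if c % 2 == 0))
```

Running the same tally over several snapshots would show whether the even cores are really never used or just happened to be idle at the moment top refreshed.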


Google StreetView in Berlin

Most people will know that it's hard for the m68k port to keep up with unstable, mostly because the hardware is not the fastest anymore. Usually we could work around this problem by throwing more hardware at it. But sometimes there are non-m68k problems that prevent the port from keeping up, such as, let's say, no wanna-build access anymore because all ssh pubkeys were revoked due to DSA-1571 (OpenSSL).
The problem was mentioned on the debian-68k mailing list, starting a discussion about the implications for m68k.
Some weeks after the incident, the problem still existed. There was another discussion about how to proceed, and an attempt was made to get the keys installed again. And luckily there was some progress.

But that didn't last long, and finally there was yet another attempt to get wanna-build working again for m68k.
As of this writing, there hasn't been any success in getting wanna-build access back. Two months after the DSA-1571 incident, m68k still cannot build packages via wanna-build, because somebody didn't feel like adding some ssh pubkeys.

Maybe that person is overloaded with other work, but the result is extra workload for other people, who have been scheduling packages by hand on multiple buildds for about 2 months now, although the involved porters have other work to do than acting as a human wanna-build.

I'm very disappointed by those who have failed to update the wanna-build ACLs in a timely manner for this long. And I'm very thankful to everyone who tried to help, especially to Stephen Marenka, who has acted as a human wanna-build in the meantime!


Font-Rendering: to file or not to fle!

Within the last few days, Spiegel Online (German) reported that Google is taking pictures in Berlin for its StreetView service. Today I spotted this one from my balcony:

[Photos: Google StreetView car; Google StreetView car (zoomed)]

So, if you see these cars driving around, it might be a good idea to hide to avoid being photographed and put onto Google StreetView. Yes, I'm no big fan of Google and its giant data collection. I would rather see Google stop StreetView in Germany to protect people's privacy.


Xen and NFS performance

When I started reading planet.debian.org today, I noticed that Russell Coker was writing about a tool unknown to me that prints the UUID, although the Debian version doesn't seem to do that.
[Screenshot: broken font rendering]
It's called fle, apparently.
Sadly, apt-cache showed that there's no package named fle. Then I realized that he was actually speaking of the good old tool called file, with an I between F and L!
Really, I hope the font rendering will improve in Lenny, as this was on Etch... *sigh*


UPS - it's always nice to have one!

Today I discovered that one of my domUs at work was performing slowly on its mounted NFS share. Bonnie++ and dd tests showed a network throughput of just 300 kB/s, whereas the throughput from dom0 to the NFS server was up to 110 MB/s. Some Google searches revealed that Xen has problems with NFS performance with non-standard rsize and wsize settings, and especially with NFS over UDP.
Reducing the rsize and wsize settings didn't help at all; the performance was still awful. After remounting the NFS share via TCP, the performance was as expected. Such a huge difference in performance surprised me. Still, it's a strange bug in Xen, at least in Etch. Maybe it's already fixed in Sid?
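
To spot other domUs that still mount NFS over UDP, one could scan the mount table for the transport option. The following is a rough sketch, assuming the usual /proc/mounts layout (device, mount point, filesystem type, options) and that the transport shows up either as proto=udp/proto=tcp or as a bare udp/tcp option, depending on the kernel. The function name nfs_transports and the sample entries are made up for illustration.

```python
KNOWN_TRANSPORTS = ("udp", "tcp")


def nfs_transports(mounts_text):
    """Map each NFS mount point to "udp", "tcp" or "unknown" (hypothetical helper)."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        transport = "unknown"
        for option in fields[3].split(","):
            if option.startswith("proto="):
                transport = option[len("proto="):]
            elif option in KNOWN_TRANSPORTS:
                transport = option
        result[fields[1]] = transport
    return result


# Hypothetical sample in /proc/mounts format; on a real system one would
# read the text from /proc/mounts instead.
SAMPLE = """\
filer:/export/data /mnt/data nfs rw,vers=3,rsize=32768,wsize=32768,proto=udp 0 0
filer:/export/home /mnt/home nfs rw,vers=3,rsize=32768,wsize=32768,tcp 0 0
/dev/sda1 / ext3 rw 0 0
"""

for mount_point, transport in sorted(nfs_transports(SAMPLE).items()):
    print(mount_point, transport)
```

Any mount reported as udp would be a candidate for remounting via TCP, as described above.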


Flash on PPC

When I arrived home after work today, I noticed that the machines on my parents' home network had all been unreachable and were just recovering from the outage. Two of my m68k autobuilders are located there. Nothing serious, I thought, because the machines sit behind DSL and maybe there was a short DSL outage.
Some minutes later I received two mails from my UPS saying that there had been a power outage. Well, this explains the DSL outage as well, because the DSL equipment is not attached to the UPS (it is located in a different room).

Thu Jul 03 17:56:19 CEST 2008 Power is back. UPS running on mains.
Thu Jul 03 17:56:19 CEST 2008 Mains returned. No longer on UPS batteries.
Thu Jul 03 17:46:49 CEST 2008 Running on UPS batteries.
Thu Jul 03 17:46:43 CEST 2008 Power failure.

Ten minutes of power outage without the machines being shut down. It's an APC UPS BR800I with 3 machines (2x m68k, 1x PIII@550), a SCSI hardware RAID with 7 disks and an 8-port switch hooked up to it. I'm really glad that I bought the UPS earlier this year, exactly for this purpose: to prevent filesystem errors on the buildds when there is a power outage. :-)
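
The exact length of the outage can be computed from the log entries above. This is just a small sketch: the parsing assumes the timestamp layout shown in the mails, drops the CEST token instead of interpreting it, and relies on English month abbreviations for strptime. The function name parse_entry is made up.

```python
from datetime import datetime

# Log lines copied from the UPS notification mails above (newest first).
LOG = """\
Thu Jul 03 17:56:19 CEST 2008 Power is back. UPS running on mains.
Thu Jul 03 17:56:19 CEST 2008 Mains returned. No longer on UPS batteries.
Thu Jul 03 17:46:49 CEST 2008 Running on UPS batteries.
Thu Jul 03 17:46:43 CEST 2008 Power failure.
"""


def parse_entry(line):
    """Split one log line into (timestamp, message); hypothetical helper."""
    # Fields: weekday, month, day, time, timezone, year, message.
    parts = line.split(maxsplit=6)
    # The timezone is ignored because all entries share the same zone.
    stamp = " ".join([parts[1], parts[2], parts[3], parts[5]])
    return datetime.strptime(stamp, "%b %d %H:%M:%S %Y"), parts[6]


entries = [parse_entry(line) for line in LOG.splitlines() if line.strip()]
failed = min(when for when, msg in entries if msg.startswith("Power failure"))
restored = max(when for when, msg in entries if msg.startswith("Power is back"))
outage = restored - failed
print("outage lasted", outage)
```

For this log the difference comes out at 9 minutes and 36 seconds, so "ten minutes" is a fair rounding.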


Information Policy

It's an ongoing drama: Flash player on PowerPC systems. You can either ignore Flash altogether (no Mozilla plugin installed) or install a non-working version of Gnash, which gives you no output for YouTube videos and hanging, CPU-eating processes:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16302 ij 20 0 63964 22m 13m R 75.0 2.2 4:27.56 gtk-gnash
16308 ij 20 0 44792 13m 10m S 5.9 1.3 0:24.46 gtk-gnash
16297 ij 20 0 54172 14m 10m R 2.6 1.4 0:11.07 gtk-gnash
16301 ij 20 0 54172 14m 10m S 2.6 1.4 0:10.92 gtk-gnash

I really doubt that there will ever be a working Flash player for non-Intel compatible architectures. :-(
I'll try swf-player next, but have little hope that it will be better than cr^H^Hgnash....

UPDATE:
swfdec-mozilla doesn't work properly either. It hangs iceweasel just a second after a YouTube video starts playing, and the picture is broken.


AAAAA oder auch: AbKüFi

Well, during the weekend some Debian machines were unreachable, as MJ Ray wrote on debian-devel:

gluck, merkel, samosa and raff uncontactable (192.25.206.* network problem?)

I don't know anything more at this time, but wanted to push a small
message out so that others know it isn't just them and lists and IRC
are both still up, as far as I can see so far.

Apart from the topic on the #debian-devel channel there was no other official notice, afaik.

I think the handling of such issues can be improved. Although apparently nobody knew what had happened or when the problem would be solved, it would have been nice to publish the information that actually was available:

  1. What happened?
  2. When did it happen?
  3. What services are affected?
  4. What is done to solve the problem?
  5. When will the problem be solved most likely (ETA)?
  6. Is there an alternative for the services affected by the problem?
  7. Where can I find updates on the progress?

A webpage similar to that one for DSA-1571 would be a very nice idea, IMHO.
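
Such a notice could even be generated from a small template, so that unknown answers are stated explicitly instead of being left out. The following is only an illustration of the seven questions above: the class StatusNotice and its field names are made up, and the example values are taken from MJ Ray's mail quoted earlier.

```python
from dataclasses import dataclass, fields


@dataclass
class StatusNotice:
    """One field per question in the list above; all names are hypothetical."""
    what_happened: str = "unknown"
    when: str = "unknown"
    affected_services: str = "unknown"
    mitigation: str = "unknown"
    eta: str = "unknown"
    alternatives: str = "unknown"
    updates_at: str = "unknown"

    def render(self):
        """Return the notice as a numbered plain-text list."""
        lines = []
        for number, field in enumerate(fields(self), start=1):
            label = field.name.replace("_", " ").capitalize()
            lines.append(f"{number}. {label}: {getattr(self, field.name)}")
        return "\n".join(lines)


# Only the facts that were actually known at the time are filled in.
notice = StatusNotice(
    what_happened="gluck, merkel, samosa and raff uncontactable",
    when="during the weekend",
    alternatives="lists and IRC are still up",
)
print(notice.render())
```

Even a notice where most answers read "unknown" tells people that the problem is known and where things stand.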

