UNIX tricks and treats


Wednesday 25 August 2010

A handy command to monitor Linux multipath

Works on: Red Hat 5.3 with QLogic Fibre Channel cards

Monitoring failing paths on a Fibre Channel card connected to a SAN on Linux isn't very straightforward.

A handy command to check it in real time would be this one:

watch -n 1 "echo show paths | multipathd -k "

The output would look something like this:

multipathd> hcil    dev  dev_t  pri dm_st   chk_st   next_check

[...]
1:0:3:3 sdam 66:96  50  [failed][faulty] XX........ 4/20
1:0:3:4 sdan 66:112 50  [failed][faulty] XX........ 4/20
0:0:0:0 sda  8:0    50  [active][ready]  XXXXXXXX.. 17/20
0:0:0:1 sdb  8:16   10  [active][ready]  XXXXXXXX.. 17/20
0:0:0:2 sdc  8:32   50  [active][ready]  XXXXXXXX.. 17/20

[...]
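
To get a per-controller summary of output like the listing above, the same command can be piped through a small filter. This is only a sketch: failed_paths_by_host is a hypothetical helper name, and it assumes the line layout shown above (hcil in the first column, [failed] in the dm_st column).

```shell
# Hypothetical helper: count failed paths per SCSI host from "show paths" output.
# Assumes the layout shown above: hcil (host:channel:id:lun) is the first column.
failed_paths_by_host() {
    awk '/\[failed\]/ {
             split($1, hcil, ":")      # hcil[1] is the host (controller) number
             failed[hcil[1]]++
         }
         END {
             for (h in failed)
                 printf "host %s: %d failed\n", h, failed[h]
         }'
}

# Usage on a live system (needs root and a running multipathd):
#   echo "show paths" | multipathd -k | failed_paths_by_host
```

Nothing is printed when no path is marked [failed], so the filter can also be used as a quick check from a monitoring script.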

In this output, controller 1 (the first number of the hcil column) is failing, resulting in 4 failed paths out of 8.

"4/20" and "17/20" are the number of seconds left until the next check.

Leave me a note if this post has been useful to you

Happy computing

Nixman

Monday 12 May 2008

Activating disk cache on a Sun Solaris Server


Works on: Sun Solaris

Solaris disables the disk write cache by default, which has debatable advantages for data integrity and definite disadvantages in terms of I/O performance.

Here are the steps needed to enable the functionality:

# init 1
# format -e
format> cache
format> write_cache
format> display
format> enable
  (if disabled)

Happy computing.

Drop me a comment if this post has been useful to you, or if you see any reason for add-on or modification.

Nixman


Tuesday 6 May 2008

Enabling the disk cache on a Sun Solaris server


(The English version of this post is here)

Works on: Sun Solaris

Out of an almost fundamentalist concern for data integrity, Solaris disables the hard disks' write cache by default, at the expense of disk I/O performance.

Here are the steps to restore this functionality:

# init 1
# format -e
format> cache
format> write_cache
format> display
format> enable
  (if disabled)

Warning! Preferably carry out this procedure before the server goes into production, since it requires switching to the single-user run level.


Leave me a comment if this article has been useful to you.

Have a nice day.

Nixman

Sunday 4 May 2008

Replacing a failing rootvg disk on AIX


Works on: AIX

Let's suppose you're getting permanent hardware errors on hdisk0 when running the errpt -a command on an IBM AIX server.

In order to check that both disks are really assigned to the volume group, you should start with:
lsvg -p rootvg
You should see both hdisk0 and hdisk1 under the PV name.

A second thing to check would be that there really are copies:
lsvg -l rootvg
Check that every logical volume has a 1:2 ratio of LPs to PPs and that its PVs column reads 2. If a logical volume is not mirrored, check that it does not reside on the failing disk with:
lslv -l LV_NAME
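
With many logical volumes, the 1:2 check is easy to get wrong by eye. The filter below is a minimal sketch, not part of AIX: unmirrored_lvs is a hypothetical helper name, and it assumes the usual lsvg -l layout of two header lines followed by the columns LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT.

```shell
# Hypothetical helper: list logical volumes that do not have exactly two
# physical partitions per logical partition, i.e. that are not mirrored 1:2.
# Reads "lsvg -l rootvg" output on stdin; assumes column 3 is LPs and
# column 4 is PPs, after two header lines.
unmirrored_lvs() {
    awk 'NR > 2 && $4 != 2 * $3 { print $1 }'
}

# Usage on the AIX host:
#   lsvg -l rootvg | unmirrored_lvs
```

Each name it prints is a candidate for the lslv -l check above.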

Once you've done these preliminary checks, you can start detaching hdisk0 from the volume group:
unmirrorvg rootvg hdisk0

After running the command, I've sometimes had these messages, which are mostly informational:
0516-1246 rmlvcopy: If hd5 is the boot logical volume, please run 'chpv -c <diskname>'
        as root user to clear the boot record and avoid a potential boot
        off an old boot image that may reside on the disk from which this
        logical volume is moved/removed.
0301-108 mkboot: Unable to read file blocks. Return code: -1
0516-1132 unmirrorvg: Quorum requirement turned on, reboot system for this
        to take effect for rootvg.
0516-1144 unmirrorvg: rootvg successfully unmirrored, user should perform
        bosboot of system to reinitialize boot records.  Then, user must modify
        bootlist to just include:  hdisk0.

Then we remove the disk from the volume group:
reducevg rootvg hdisk0

And remove the device from configuration:
rmdev -dl hdisk0

Then, we will have to power down the machine, as we're dealing with a rootvg disk. However, before doing so, it's preferable to check whether we will boot from the right drive:
bootinfo -b will tell you which drive was last booted from.
If it's the failed drive (hdisk0 in our case), we should switch to the drive that is still usable (hdisk1 in our case) by creating the boot image on hdisk1 and recreating the /dev/ipldevice link, which was deleted by the previous rmdev command:
bosboot -ad /dev/hdisk1

ln /dev/rhdisk1 /dev/ipldevice
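
The decision in this step can be sketched as a small dry-run helper that only prints the commands from this post rather than running them. boot_fix_plan and its three arguments are hypothetical names introduced for illustration.

```shell
# Hypothetical dry-run helper: print the boot-repair commands from this post
# only when the last boot device is the failed disk.
#   $1 = output of "bootinfo -b", $2 = failed disk, $3 = surviving disk
boot_fix_plan() {
    last_boot=$1
    failed=$2
    good=$3
    if [ "$last_boot" = "$failed" ]; then
        echo "bosboot -ad /dev/$good"
        echo "ln /dev/r$good /dev/ipldevice"
    fi
}

# Usage on the AIX host:
#   boot_fix_plan "$(bootinfo -b)" hdisk0 hdisk1
```

Printing instead of executing leaves the final decision to the administrator, which seems the safer choice for commands that touch the boot record.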

Then, we can check bootlist:
bootlist -m normal -o

... And now, we can finally power down our server, replace the failed drive, and power it back on...

Once the server has booted up, we should run:
cfgmgr
so that the OS will recognize the new disk.

To check that AIX really has done its job, run:
lsdev -Cc disk
which should list both disks hdisk0 and hdisk1
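
If you'd rather script this check, a minimal sketch follows. disks_available is a hypothetical helper name; it assumes that lsdev -Cc disk prints the disk name in the first column and its state (Available or Defined) in the second.

```shell
# Hypothetical helper: succeed only if every disk named as an argument is
# listed as "Available" in "lsdev -Cc disk" output read from stdin.
disks_available() {
    listing=$(cat)
    for disk in "$@"; do
        printf '%s\n' "$listing" |
            awk -v d="$disk" '$1 == d && $2 == "Available" { ok = 1 }
                              END { exit !ok }' ||
            return 1
    done
}

# Usage on the AIX host:
#   lsdev -Cc disk | disks_available hdisk0 hdisk1 && echo "both disks available"
```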

Now, we can assign the new disk to the rootvg volume group:
extendvg rootvg hdisk0

Then we mirror the group:
mirrorvg rootvg

Wait for hdisk1 to finish copying onto hdisk0 (it can take some time, as you can imagine). You can check activity with iostat.

You should check that both disks are really assigned to rootvg by typing:
lsvg -p rootvg

An lsvg -l rootvg will show you whether mirroring has worked OK. You should once again have a 1:2 relationship between LPs and PPs.
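
To confirm that nothing is left unsynchronised, the LV STATE column of the same listing can be filtered for stale entries. This is a sketch under the assumption that LV STATE is the sixth column and reads something like open/syncd or open/stale; stale_lvs is a hypothetical helper name.

```shell
# Hypothetical helper: print logical volumes whose LV STATE still contains
# "stale". Reads "lsvg -l rootvg" output on stdin and skips the two header
# lines; assumes LV STATE is the sixth column.
stale_lvs() {
    awk 'NR > 2 && $6 ~ /stale/ { print $1 }'
}

# Usage on the AIX host:
#   lsvg -l rootvg | stale_lvs
```

An empty result suggests every copy is in sync.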

Then, create the boot image on the new disk:
bosboot -ad /dev/hdisk0

Finally, modify the bootlist to take into account both disks:
bootlist -m normal hdisk0 hdisk1
Check with:
bootlist -m normal -o
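
The result of bootlist -m normal -o can also be verified from a script. The sketch below only looks at the first word of each output line, so it should not matter whether the system appends extra details after the disk name; bootlist_has is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if every disk given as an argument appears
# as the first word of a line in "bootlist -m normal -o" output on stdin.
bootlist_has() {
    entries=$(awk '{ print $1 }')
    for disk in "$@"; do
        printf '%s\n' "$entries" | grep -qx "$disk" || return 1
    done
}

# Usage on the AIX host:
#   bootlist -m normal -o | bootlist_has hdisk0 hdisk1 && echo "bootlist OK"
```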
 
And you're finally done!

Happy computing.

Drop me a comment if this post has been useful to you, or if you see any reason for add-on or modification.

Nixman