Friday, May 27, 2011

Both submirrors in a "Needs Maintenance" state


There may be times when a metareplace or metasync fails and the metadevice is still left in a "Needs Maintenance" state. One of the disks may be experiencing a problem on a sector.

Here the mirror d10 is in a "Needs Maintenance" state for both submirrors:
# metastat d10
d10: Mirror
    Submirror 0: d0
      State: Needs maintenance
    Submirror 1: d1
      State: Needs maintenance



Cause

When trying to run the metasync command, the c1t0d0s0 device reported errors in /var/adm/messages:

################################################################
Sep 15 09:11:19 bobbob scsi: WARNING: /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037f396c9,0 (ssd1):
Error for Command: read(10) Error Level: Retryable
Requested Block: 4057844 Error Block: 405796
Vendor: SEAGATE Serial Number: 0107D1MVCF
Sense Key: Media Error
ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0xe4

##########################################################

In this case, the same block is being reported as having problems each time. The bad block can be repaired by running a surface analysis read from the format utility:

format --> analyze --> read on the c1t0d0 disk.
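
For reference, the path through format's interactive menus looks roughly like this (menu output abbreviated; the exact prompts vary by Solaris release):

# format
format> (select c1t0d0 from the disk list)
format> analyze
analyze> read

The read pass then prompts for confirmation and repairs the defective block: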


Ready to analyze (won't harm SunOS). This takes a long time,
but is interruptable with CTRL-C. Continue? y
pass 0
Medium error during read: block 4057969 (0x3deb71) (1404/16/101)
ASC: 0x11 ASCQ: 0x0 24619/26/53
pass 1
24619/26/53
Total of 1 defective blocks repaired.


Solution

With the bad block repaired, resync the mirror and check its state:

# metasync d10
# metastat d10

##########################
d10: Mirror
    Submirror 0: d0
      State: Needs maintenance
    Submirror 1: d1
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 69078879 blocks

d0: Submirror of d10
    State: Needs maintenance
    Invoke: after replacing "Maintenance" components:
            metareplace d10 c1t0d0s0
    Size: 69078879 blocks
    Stripe 0:
        Device      Start Block  Dbase  State       Hot Spare
        c1t0d0s0              0  No     Last Erred

d1: Submirror of d10
    State: Okay
    Size: 69078879 blocks
    Stripe 0:
        Device      Start Block  Dbase  State       Hot Spare
        c1t1d0s0              0  No     Okay

#############################################


# metareplace -e d10 c1t0d0s0
# metastat d10
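
Immediately after the metareplace -e, metastat typically shows the replaced submirror resyncing before it returns to Okay. A quick way to watch it (a sketch; the progress line and percentage will vary):

# metastat d10 | grep -i progress
    Resync in progress: 15 % done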

Once the resync completes, both submirrors are in an Okay state:

###################
d10: Mirror
    Submirror 0: d0
      State: Okay
    Submirror 1: d1
      State: Okay
#######################




Improving TMPFS performance in Solaris

Improving tmpfs File System Performance


Applies to:
Solaris SPARC Operating System - Version: 8.0 and later [Release: 8.0 and later]
All Platforms
Goal
Performance of the tmpfs file system can be improved by setting the tmpfs tunable "tmp_nopage = 1" in /etc/system. This issue was raised in a Sun bug report.

Solution

Tmpfs is a memory-resident file system. It uses the page cache for caching file data, so files created in a tmpfs file system avoid physical disk reads and writes.

The primary design goal of tmpfs was to improve the read/write performance of short-lived files without invoking network and disk I/O.

Tmpfs does not use dedicated memory such as a "RAM disk". Instead it uses the virtual memory (VM) maintained by the kernel, which allows it to take advantage of VM and kernel resource allocation policies. Tmpfs files are written to and read directly from kernel memory, and pages allocated to tmpfs files are treated the same way as any other physical memory pages.

Physical memory assigned to tmpfs files uses anonymous memory to store the file data. The kernel does not differentiate tmpfs file data from the page cache. During memory pressure, tmpfs pages can be freed and written back to the physical swap device if the page daemon selects them as page-out candidates.

It is the user's responsibility to keep a backup of tmpfs files by copying them to a disk-based file system such as ufs. Otherwise, tmpfs files will be lost in the event of a crash or reboot.

In Solaris, fsflush (the file system flush daemon) is responsible for flushing dirty pages to disk. A page is considered dirty when its contents have been modified in memory and have not yet been synced to disk. For every dirty page in memory, fsflush calls the file system's putpage() routine, which is responsible for writing the page to the backing store. For the ufs file system fsflush calls ufs_putpage(), and similarly for a dirty tmpfs page it calls tmp_putpage(). Pages in memory are identified by vnode and offset.

When a tmpfs file is created or modified, its pages are marked dirty. Tmpfs pages stay dirty until the file is deleted. The only time the tmp_putpage() routine pushes dirty tmpfs pages to the swap device is when the system experiences memory pressure. Systems with no physical swap device, or systems configured with plenty of physical memory, can avoid this overhead by setting the tmpfs tunable

tmpfs:tmp_nopage = 1

in /etc/system. Setting this tunable causes tmp_putpage() to return immediately, avoiding its overhead.

tmp_putpage() Overhead

A great deal of work is done in the tmp_putpage() routine. For every vnode and offset, tmpfs searches for the dirty page in the global page hash list and locks the page. To make sure it can write multiple dirty pages in chunks, it performs a similar search for pages adjacent to the locked page. tmp_putpage() then does a lookup of the backing store for the page. If the physical swap device is full or not configured, it unlocks the pages and returns without writing the dirty pages. The page-out operation to the swap device only happens when free memory (freemem) is low. For every successful page-out, tmp_putpage() increments the tmp_putpagecnt and tmp_pagespushed counters. Systems with no physical swap device, or systems with physical swap but plenty of memory, should show a zero value for tmp_putpagecnt and tmp_pagespushed.
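
On a live system these counters can be read with mdb (a quick sketch, assuming kernel debugger access; /D prints the value as a decimal integer):

# echo 'tmp_putpagecnt/D' | mdb -k
# echo 'tmp_pagespushed/D' | mdb -k

Both should report 0 on systems that never page tmpfs data out.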

If the system has no swap device configured, then the option to use paging out to free up memory is not available.
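To check whether a physical swap device is configured at all, swap -l lists the configured swap devices (it reports "No swap devices configured" when there are none):

# swap -l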

Testing and Verification

Lab tests have shown that copying a large file (1 GB in size) from a tmpfs to a ufs file system gets a huge performance boost when the tmp_nopage tunable is set to 1. Test results are shown below:

tmp_nopage=0 (default)

$ mkfile 1024m /tmp/one

$ ptime cp /tmp/one /fast/one

real 2:27.301
user 0.044
sys 2:27.207

$ mkfile 1024m /tmp/two

$ ptime cp /tmp/two /fast/two

real 2:27.452
user 0.044
sys 2:27.352

tmp_nopage=1

Setting tmp_nopage=1 on a live system using mdb:

# echo 'tmp_nopage/W 1' | mdb -kw

$ rm /tmp/* /fast/*

$ mkfile 1024m /tmp/one

$ ptime cp /tmp/one /fast/one

real 18.767 << 18 seconds instead of over 2 minutes.
user 0.044
sys 18.695

$ mkfile 1024m /tmp/two

$ ptime cp /tmp/two /fast/two

real 19.160
user 0.040
sys 19.095

Setting tmp_nopage permanently

To set this on a permanent basis, the following line should be placed in /etc/system and the system rebooted:

set tmpfs:tmp_nopage=1
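
After the reboot, the tunable can be verified the same way it was set live, by reading it back with mdb (it should report 1):

# echo 'tmp_nopage/D' | mdb -k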

Monday, May 16, 2011

Replacing a mirrored root disk in SVM/SDS in Solaris 10 online

Scenario:
OS: Solaris 10
Sun hardware: E2900
Environment: the system root disk is mirrored with SVM
Disk details:
c1t0d0
c1t1d0
Reason: one of the root disks failed, in this example c1t1d0. How we concluded that the disk had failed (example check commands follow the format output below):
1) It is throwing lots of read/write errors in /var/adm/messages
2) The number of h/w and transport errors is more than 15 in "iostat -en" output
3) The disk shows "not available" in format output:

1. c1t1d0 drive not available
/ssm@0,0/pci@18,700000/scsi@2/sd@1,0
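
A quick sketch of how to check points 1) and 2) above (adjust the device name, and the sd instance in the messages file, for your system):

iostat -en | grep c1t1d0
grep "sd@1,0" /var/adm/messages | tail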


1) The failed disk will show its submirrors in maintenance state in metastat -ac output:

# metastat -ac
d60 m 19GB d61 d62 (maint)
d61 s 19GB c1t0d0s6
d62 s 19GB c1t1d0s6 (maint)
d40 m 11GB d41 d42 (maint)
d41 s 11GB c1t0d0s4
d42 s 11GB c1t1d0s4 (maint)
d10 m 9.8GB d11 d12 (maint)
d11 s 9.8GB c1t0d0s1
d12 s 9.8GB c1t1d0s1 (maint)
d0 m 9.8GB d1 d2 (maint)
d1 s 9.8GB c1t0d0s0
d2 s 9.8GB c1t1d0s0 (maint)
d50 m 17GB d51 d52 (maint)
d51 s 17GB c1t0d0s5
d52 s 17GB c1t1d0s5 (maint)

2) metadb -i is also showing the metadbs in an errored state; this is not true for all cases:

sh# metadb -i
flags first blk block count
a m p luo 16 8192 /dev/dsk/c1t0d0s7
a p luo 8208 8192 /dev/dsk/c1t0d0s7
a p luo 16400 8192 /dev/dsk/c1t0d0s7
a p luo 24592 8192 /dev/dsk/c1t0d0s7
W p l 16 8192 /dev/dsk/c1t1d0s7
W p l 8208 8192 /dev/dsk/c1t1d0s7
W p l 16400 8192 /dev/dsk/c1t1d0s7
W p l 24592 8192 /dev/dsk/c1t1d0s7


NOTE: the failed disk here is c1t1d0 and the affected metadevices are d2, d62, d42, d12 and d52. Please verify the disk target and metadevices, and change them as per the failed target and metadevices in your environment.

3) Detach and clear the metadevices

metadetach -f d0 d2
metadetach -f d60 d62
metadetach -f d40 d42
metadetach -f d10 d12
metadetach -f d50 d52

metaclear d2
metaclear d62
metaclear d42
metaclear d12
metaclear d52

4) Delete the metadbs on the failed disk and confirm

metadb -d c1t1d0s7

metastat -ac
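
To confirm the delete, metadb should now list replicas only on c1t0d0s7; a grep for the failed disk should return nothing:

metadb | grep c1t1d0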

5) Run cfgadm -al to get the failed disk's status; the output should look like below:

sh# cfgadm -al | grep c1t1d0
c1::dsk/c1t1d0 disk connected configured unknown
sh#

6) Remove the disk from the OS:
cfgadm -c unconfigure c1::dsk/c1t1d0

/// If this doesn't work, DO NOT force it with the -f option ///

Confirm that the disk now shows as unconfigured:


sh# cfgadm -al | grep c1t1d0

c1::dsk/c1t1d0 disk connected unconfigured unknown
#


7) Ask the Sun FE to replace the disk, then bring the new disk back into the OS:

cfgadm -al

cfgadm -c configure c1::dsk/c1t1d0

Run cfgadm -al again to confirm the new disk is configured; the output should look like below:

sh# cfgadm -al | grep c1t1d0
c1::dsk/c1t1d0 disk connected configured unknown
sh#

8) Check the new disk with format, then do the following to copy the partition table from the good disk, reinstall the boot block, recreate the metadbs, and recreate and reattach the SVM devices:

prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2

/usr/sbin/installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t1d0s0

metadb -c 4 -a c1t1d0s7

metainit d2 1 1 c1t1d0s0
metainit d62 1 1 c1t1d0s6
metainit d42 1 1 c1t1d0s4
metainit d12 1 1 c1t1d0s1
metainit d52 1 1 c1t1d0s5


metattach d0 d2
metattach d60 d62
metattach d40 d42
metattach d10 d12
metattach d50 d52


9) Hip hip hooray, you are all set. Keep running metastat -ac until the sync is completed; a minimal watch loop is sketched below.
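
A simple way to watch the resync (interrupt with CTRL-C once all mirrors report Okay):

# while true; do metastat -ac; sleep 60; done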