Poor IO performance on x2250 with SATA and ZFS root


Long story short: I installed s10u6 with a ZFS root (a mirror of sd2 and sd3) and some zones quite a long time ago. A few months ago I tried an update; it took 14 hours and was unsuccessful, so a ZFS rollback of all filesystems was performed in safe mode. After that everything seemed to be OK, but later it turned out that disk IO performance had decreased after the failed update. The only indication we have that the problem appeared after the failed update is the SpamAssassin logs, which show an increase in mail scanning times (SA uses Postgres for Bayesian data storage).

I have now gathered "iostat -x sd2 sd3 60" (later with the -xn option) and "vmstat 60" data (not using vmstat in the last few days), and I see that svc_t (with -x) and wsvc_t (with -xn) are very large. They range from 0.x up to 50 or 100 over time. Testing drive IO with "dd if=/dev/zero of=/foo bs=2048k count=200 conv=fsync" can sometimes reach 40MB/s, but at "bad" times it is 10-15MB/s. While using dd I discovered that running the above command 2 or 3 times (or with a bigger count parameter) "fixes" the problem - svc_t (wsvc_t) drops to about 0.x - 10 on average and I can get about 29MB/s or even better with dd.

This week we moved the server to another rack and found that performance got even worse after that (8MB/s and svc_t/wsvc_t of 100-200). We went to the site and noticed that moving the server (or other servers nearby) around seemed to affect how it performs. The "dd effect" ("fix") now has significantly less impact and doesn't help 90% of the time. It seems like vibration in the rack or in the server itself could be the cause. We are also considering experimenting with temperature by opening or closing the rack doors. Sensor readings in iLOM are 28-33 for /SYS/MB/T_AMB0 and 33-36 for AMB1 as the doors are opened or closed.

Average data:

# iostat -xn
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   17.0  136.3  801.9 4362.8  8.8  0.6   57.5    4.0  40  27 c3t0d0
   17.3  136.1  802.6 4362.8  8.3  0.6   54.0    3.6  38  26 c3t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 noname:vold(pid654)

When I stopped the biggest IO generator - monitoring - I also saw frightening asvc_t numbers in the middle of "iostat -xn sd2 sd3 5", surrounded by almost no disk activity:
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    8.6    0.0  320.0  0.0  0.0    0.0    0.2   0   0 c3t0d0
    0.0    8.6    0.0  320.0  0.0  0.0    0.0    0.2   0   0 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   70.4    0.0 5708.9  1.1 11.3   15.8  160.8  35  36 c3t0d0
    0.0   74.8    0.0 6201.4  1.0 11.1   13.5  148.1  33  36 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   30.2    0.0 1033.7  0.2  4.4    6.4  144.8   7  24 c3t0d0
    0.0   21.2    0.0  541.2  0.0  0.7    0.0   34.4   0   7 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c3t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c3t1d0

The first thing we tried, after ruling out the possibility that someone is generating too heavy a load on the system (because 4000 kw/s is hard to call "heavy"), was updating the pieces of the system related to the kernel or the drives. The system is now running:

# uname -a
SunOS noname 5.10 Generic_141415-09 i86pc i386 i86pc Solaris

Right now it seems that kw/s above 3000 goes hand in hand with increased wsvc_t:

# iostat -xn sd2 sd3 5
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   17.0  136.3  801.4 4363.4  8.8  0.6   57.5    4.0  40  27 c3t0d0
   17.3  136.1  802.1 4363.4  8.3  0.6   54.0    3.6  38  26 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   47.0    0.0 2970.6  0.0  0.0    0.4    0.3   2   1 c3t0d0
    0.0   46.6    0.0 2970.6  0.0  0.0    0.9    0.3   4   1 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  134.6    0.0 7521.5 28.5  0.6  211.4    4.3  83  58 c3t0d0
    0.0  133.8    0.0 7489.2 33.0  0.6  246.9    4.2 100  57 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.4   75.2    0.8 3387.1  0.5  0.0    6.4    0.3  13   2 c3t0d0
    0.8   78.6    4.0 3419.4  0.5  0.0    6.6    0.4  14   3 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2  136.0   12.8 7442.9 21.3  0.7  156.5    4.8  89  65 c3t0d0
    0.0  124.0    0.0 7442.8 21.2  0.6  171.2    5.0  88  61 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   61.2    0.0 3495.8  0.2  0.0    3.9    0.3   8   2 c3t0d0
    0.0   57.0    0.0 3495.9  0.1  0.0    2.0    0.3  11   2 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  130.4    0.0 7461.4  4.5 18.8   34.4  143.9  76  71 c3t0d0
    0.0  122.6    0.0 7461.4 19.5  0.6  158.8    5.1  76  62 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.4   50.0    2.0 2388.1  0.6  0.0   11.4    0.4  16   2 c3t0d0
    0.2   50.0    1.6 2388.1  0.9  0.0   17.4    0.3  23   1 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  141.2    0.0 7764.8 13.2  0.4   93.6    3.0  42  42 c3t0d0
    0.0  158.4    0.0 7770.0 13.3  0.4   83.7    2.7  42  43 c3t1d0

I've also tried to use DTrace to gather performance data, but I have mostly used it trying to pinpoint whose actions might be causing the problem, with no luck.
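
To make that more concrete, this is roughly what I plan to try next: a rough, untested sketch of mine using the standard io provider (dev_statname comes from the provider's devinfo_t argument) that prints the distribution of individual IO completion times per device, instead of the averages iostat shows:

#!/usr/sbin/dtrace -qs

/* Rough sketch: per-IO completion latency per device, using the io provider.
   arg0 (the buf pointer) is used to match io:::start with io:::done. */

io:::start
{
        iostart[arg0] = timestamp;
}

io:::done
/iostart[arg0]/
{
        /* latency in milliseconds */
        @lat[args[1]->dev_statname] = quantize((timestamp - iostart[arg0]) / 1000000);
        iostart[arg0] = 0;
}

tick-10sec
{
        printa(@lat);
        trunc(@lat);
}

If the spikes are real at the device level, the quantize output should show a long tail in the 100ms+ buckets during the "bad" periods.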

During the writing of this post an engineer arrived at the site and opened both doors of the rack. The temperature on AMB0 dropped to 32 and wsvc_t dropped instantly:

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   16.8  136.0  792.7 4379.4  8.8  0.6   57.5    4.0  40  27 c3t0d0
   17.1  135.8  793.4 4379.4  8.3  0.5   54.0    3.6  38  26 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  177.5    0.0 10312.3  3.7  0.1   20.9    0.7  31  13 c3t0d0
    0.0  165.6    0.0 8436.5  2.3  0.1   13.9    0.5  18   9 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.4  120.8   38.4 7196.5  0.0  0.0    0.4    0.3   4   4 c3t0d0
    0.2  120.6   12.8 7196.5  0.0  0.0    0.4    0.3   5   4 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  108.6    0.0 6737.2  0.0  0.0    0.3    0.3   3   3 c3t0d0
    0.0  109.0    0.0 6737.2  0.0  0.0    0.4    0.3   4   3 c3t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   92.4    0.0 5857.7  0.0  0.0    0.3    0.3   3   3 c3t0d0
    0.0   92.2    0.0 5857.7  0.0  0.0    0.3    0.3   2   3 c3t1d0

Also, dd gives relatively better results (although still not satisfactory) than before:

# dd if=/dev/zero of=/foo bs=2048k count=100 conv=fsync
100+0 records in
100+0 records out
209715200 bytes (210 MB) copied, 6.20885 seconds, 33.8 MB/s
# dd if=/dev/zero of=/foo bs=4096k count=200 conv=fsync
200+0 records in
200+0 records out
838860800 bytes (839 MB) copied, 35.7555 seconds, 23.5 MB/s

Edited by: Marcis_L on Oct 1, 2009 2:02 AM, added dd information.

Oh, I forgot that I marked this as a question. The question would be: could ambient vibration or temperature variations (a 3 degree decrease) cause the system's IO performance to get worse? Or maybe it's some kind of defect in the server and we should contact Sun's reseller about this strange behaviour?

Don't know if this will help, but we've observed very poor ZFS performance when using ZFS software RAID to mirror (RAID 1) two system disks. This is on x4240 systems. When we removed the ZFS software mirroring and instead relied on hardware mirroring from an installed RAID controller, performance improved to an acceptable level, and we are now extremely pleased with ZFS fileserving performance from the x4240, serving filesystems from disks contained in a SAN.

As for your question: yes, temperature can affect performance, but I doubt vibration would, unless it was causing intermittent disconnects on the network cabling. However, I've seen Sun servers sitting in 90F closed closets behind stacks of newspapers, and they just cruise along. So you may want to check your RAID settings first, as well as the hardware and message logs (/var/adm).

OK. After some trial and observation we moved the server down in the rack to where the air conditioner blows directly at it. Temperatures and fan speeds decreased even further, dd performed better, and wsvc_t decreased to about 50ms.

/SYS/MB/T_AMB0  Temperature  25.000 degrees C
/SYS/MB/T_AMB1  Temperature  30.000 degrees C
....
/SYS/F0/TACH    Fan          4560.000 RPM
/SYS/F1/TACH    Fan          3120.000 RPM
/SYS/F2/TACH    Fan          2640.000 RPM

But after we dismounted the server from its rails and put it on a shelf with plastic air bags beneath it, wsvc_t decreased even more; spikes are now no higher than 25ms and dd can reach as much as 41MB/s.

I forgot to mention that after patching the system we also tried changing the disk drives, but that didn't help. I also wasn't aware of the capabilities of the built-in diagnostic utility (regarding disk testing) and will try that out when server downtime is possible. Also, since this is a mirror of two disks, they mostly perform similarly in terms of IO operation count, bandwidth and service times, but sometimes, when everything goes bad, I can see that the service times start to differ and are no longer in sync (looking at the graphs, as they usually are).

I'm also aware of this [http://blogs.sun.com/brendan/entry/unusual_disk_latency|http://blogs.sun.com/brendan/entry/unusual_disk_latency], which for now seems like the best explanation of this issue.

The only thing that looks suspicious to me is this in /var/adm/messages:

Sep 29 00:04:17 noname unix: [ID 954099 kern.info] NOTICE: IRQ20 is being shared by drivers with different interrupt levels.
Sep 29 00:04:17 noname This may result in reduced system performance.

The ZFS mirror isn't to blame, as I also tested how the system performs with a single disk. Also, I switched the disks to AHCI mode in the BIOS (that was the very first thing I did).
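
In case that shared IRQ matters, I may also try a rough fbt sketch like the one below (untested, my own; it assumes fbt can instrument the ahci driver's interrupt routine, ahci_intr) to see how often the handler fires and how long each invocation takes:

#!/usr/sbin/dtrace -qs

/* Rough sketch: count ahci_intr invocations and time spent in each,
   assuming fbt can instrument the ahci driver's interrupt routine. */

fbt:ahci:ahci_intr:entry
{
        self->ts = timestamp;
        @fires["ahci_intr calls"] = count();
}

fbt:ahci:ahci_intr:return
/self->ts/
{
        /* nanoseconds spent inside the handler */
        @dur["ahci_intr duration (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;
}

tick-10sec
{
        printa(@fires);
        printa(@dur);
        trunc(@fires);
        trunc(@dur);
}

If the handler is being entered far more often than the disks are doing real work, that would at least point back at the shared interrupt.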

Here's a graphical representation of the last 24 hours of "iostat -xn" data: [http://imagebin.ca/view/hSQ2jWC5.html|http://imagebin.ca/view/hSQ2jWC5.html]

Timeline:
Start - 19" rack with the x2250 and some other servers in it. Front rack door closed, rear open.
17:30 - the old Sun Fire that was right above the x2250 is moved 1U upwards and put on a shelf and plastic air bags. The rear rack door gets closed.
around 21:30 - I try dd'ing the root partition, but now it no longer helps.
around 8:00 - I try dd'ing again, with no luck.
9:00 - (wsvc_t drops significantly) the engineer arrives on-site and opens both rack doors.
10:00 - (wsvc_t has risen a little bit again) the engineer lowers the x2250 to the bottom of the rack, where cold air blows directly at the front of the server; wsvc_t drops but later climbs up again.
11:40 - the x2250 is put on plastic air bags - the graph shows the impact: wsvc_t drops lower than ever (and stays there).

Speaking of vibrations, it's probably not relevant, but the server is actually mounted at a height of 120m in a TV tower, and the tower sways accordingly in strong winds.

Believe it or not, but tonight the engineer on site tried screaming at the server. As he has no knowledge of DTrace, he compared results from the following command: dd if=/dev/zero of=/foo bs=1024k count=400 conv=fsync. The results are quite telling. At first, dd showed IO throughput of 27MB/s to 45MB/s. Then two tests while screaming at the disks showed 16MB/s. Right after that, another test without screaming gave 37MB/s, and one more "screaming test" gave 14MB/s. Afterwards there were two more tests without screaming - 38MB/s and 41MB/s.

I guess I'll mark this question answered. Although dd isn't the most precise tool for measuring such a strange thing, it clearly showed the difference in disk performance under varying noise (vibration) conditions. Now we must find a solution. I'll soon have the same kind of results from screaming at another x2250, and then I'll be able to draw some conclusions. Maybe someone here can also try the same test while screaming at their server.

p.s. The tests were done while the server was under load, so there is a chance that these results are a coincidence, but I think that is very unlikely because, as I said, the usual server workload generates a pretty stable (above 5MB/s of writes) IO flow to the disks.

Yeah, right! It would be nice if that were the problem... The noise and some vibration definitely have an impact on disk performance, but they are not the main cause of this problem. The server was moved to a completely different server room and we still see the issue. When analyzing "iostat -xn" output I see that the large wsvc_t times happen when the write operation count increases significantly while the amount of data written to the disks decreases. It looks like something creates a lot of small write operations which spend more time in the wait queue, and that slows down IO.
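
To check that theory at the device level, I'm thinking of something like this io provider sketch (my own, untested; B_READ and the bufinfo/devinfo members are the documented io provider fields) to get the size distribution of the writes that actually reach each disk:

#!/usr/sbin/dtrace -qs

/* Rough sketch: size distribution of write IOs reaching each disk,
   to check the "lots of small writes" theory. */

io:::start
/!(args[0]->b_flags & B_READ)/
{
        /* bytes per write, keyed by device */
        @wsz[args[1]->dev_statname] = quantize(args[0]->b_bcount);
}

tick-10sec
{
        printa(@wsz);
        trunc(@wsz);
}

If the problem really is a flood of small writes, the low buckets (a few KB) should dominate exactly when wsvc_t goes up.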

Marking this as unanswered.

It seems that only dd with "conv=fsync" does the magic - without it the "dd trick" doesn't help.

I modified an sd_cmds.d script I found somewhere and created the following D script:

#!/usr/sbin/dtrace -qs

BEGIN
{
        reftime = timestamp;
}

::sdintr:entry
{
        self->id = timestamp;
}

sd_start_cmds:entry
/self->id/
{
        /*
        printf("%s,%s\n", zonename, execname);
        @c[zonename, execname] = count();
        */
        @c[stack()] = count();
}

::sdintr:return
/self->id/
{
        self->id = 0;
}

tick-5sec
{
        normalize(@c, (timestamp - reftime) / 1000000000);
        reftime = timestamp;
        printf("Timestamp is %d\n", timestamp);
        trunc(@c, 5);
        /*
        printa("%s %@d\n", @c);
        */
        printa(@c);
}

I then ran it while performance was bad and again after it was "fixed" with the "dd magic". The output was the same, except for the counters. When IO was slow:

              sd`sd_return_command+0xd7
              sd`sdintr+0x187
              sata`sata_txlt_nodata_cmd_completion+0x95
              ahci`ahci_intr_cmd_cmplt+0x212
              ahci`ahci_port_intr+0x211
              ahci`ahci_intr+0x8a
              unix`av_dispatch_autovect+0x78
              unix`intr_thread+0x5f
              206

              sd`sd_return_command+0xd7
              sd`sdintr+0x187
              sata`sata_txlt_rw_completion+0x145
              ahci`ahci_intr_set_device_bits+0x1ae
              ahci`ahci_port_intr+0x1f6
              ahci`ahci_intr+0x8a
              unix`av_dispatch_autovect+0x78
              unix`intr_thread+0x5f
             3240

And when it was OK:

              sd`sd_return_command+0xd7
              sd`sdintr+0x187
              sata`sata_txlt_nodata_cmd_completion+0x95
              ahci`ahci_intr_cmd_cmplt+0x212
              ahci`ahci_port_intr+0x211
              ahci`ahci_intr+0x8a
              unix`av_dispatch_autovect+0x78
              unix`intr_thread+0x5f
              275

              sd`sd_return_command+0xd7
              sd`sdintr+0x187
              sata`sata_txlt_rw_completion+0x145
              ahci`ahci_intr_set_device_bits+0x1ae
              ahci`ahci_port_intr+0x1f6
              ahci`ahci_intr+0x8a
              unix`av_dispatch_autovect+0x78
              unix`intr_thread+0x5f
              920

I'm a newbie with DTrace. I have tried, with no success, to create a D script that could show who is to blame for this increased number of write operations. I've created scripts that save zonename and execname when entering syscall::write:entry and then aggregate by them (with count()) in the sd_start_cmds:entry probe - is this the correct way?

#!/usr/sbin/dtrace -qs
#pragma D option dynvarsize=4m
#pragma D option cleanrate=1000hz

BEGIN
{
        start = timestamp;
}

syscall::*write*:entry
{
        self->execname = execname;
        self->id = 1;
        self->op = "write";
}

syscall::*read*:entry
{
        self->execname = execname;
        self->id = 2;
        self->op = "read";
}

syscall::*write*:return,
syscall::*read*:return
/self->id/
{
        self->id = 0;
        self->op = 0;
        self->execname = 0;
}

sd_start_cmds:entry
/self->id/
{
        @c[zonename, execname, self->op] = count();
}

tick-5sec
{
        printf("%s %50d\n", "the timestamp", walltimestamp / 1000000000);
        /*
        trunc(@c,50);
        */
        normalize(@c, (timestamp - start) / 1000000000);
        start = timestamp;
        printa("%s,%s,%s %@15d\n", @c);
        trunc(@c);
}
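
One thing I keep reading is that ZFS batches most application writes and issues them to the disks later from its own threads, so by the time sd_start_cmds runs, the on-CPU thread is usually not the process that did the write - which might explain why the approach above doesn't show much. As an alternative I'm considering a simpler (untested) sketch of mine that attributes the activity at the syscall layer instead, before ZFS takes over:

#!/usr/sbin/dtrace -qs

/* Rough sketch: bytes and calls written per zone/process at the syscall layer.
   This shows who generates the writes, not which disk command they end up in. */

syscall::write:entry,
syscall::pwrite:entry
{
        /* arg2 is the requested byte count for write(2)/pwrite(2) */
        @bytes[zonename, execname] = sum(arg2);
        @calls[zonename, execname] = count();
}

tick-5sec
{
        printf("\n--- writes in the last 5 seconds ---\n");
        printa("bytes  %-16s %-20s %@d\n", @bytes);
        printa("calls  %-16s %-20s %@d\n", @calls);
        trunc(@bytes);
        trunc(@calls);
}

It won't tie a particular disk command back to a process, but it should at least show who is generating all those small writes in the first place.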

