I'm watching strace -p 23339 -tt -T in another window, where 23339 is the PID of our
virt-p2v-server process. Reads just machine-gun past. But now it's doing a bunch of
writes, and they blip past more like individual shotgun blasts. OK, lousy metaphor, but
you get the idea. Some of the writes hang for a while, like the one I caught earlier.
In fact, I just grabbed another couple of hanging writes - see below. I inserted a bunch
of blank space at each hang. And now that we have some footprints, see the action plan
below that.
.
.
.
10:58:04.647333 write(5,
"*\215N\367\273\377\1\0\0\323\343\v\323\213\312\301\371\10\210\17GN\301\342\10\203\376\v~2\203\356"...,
8192) = 8192 <0.233197>
10:58:04.880702 write(5,
"\310\323\343\3\323\213\336\203\300\10\353\10\205\300\17\214\3\2\0\0\212\2174\354\2\0\204\311\2134\275X"...,
8192) = 8192 <0.000043>
10:58:04.880935 write(5,
"\306\213\3513\3523\357\301\311\2\3\305\213t$`\213l$h3\365\213\254$\200\0\0\0003\365\213"...,
8192) = 8192 <0.000034>
10:58:04.881103 write(5,
"\n\213\302\203\360\377\3\373\213i8\v\3073\303\3\365\5\247#\224\253\3\360\301\306\17\213\303\203\360\377"...,
8192) = 8192 <0.000033>
10:58:04.881269 write(5,
"\311\302\34\0\314\314\314\314\314\314\213\377U\213\354\213M\20\213E\fS3\333\210\10@C\205\311\17\216"...,
8192) = 8192 <0.233241>
10:58:05.114669 write(5,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) =
8192 <0.000038>
10:58:05.114932 write(5, "WithTag\0N\0ExFreePoolWithTag\0\265\5st"..., 8192) =
8192 <0.000035>
10:58:05.115095 write(5,
".rdata\0\0$\2\0\0\200C\0\0\200\2\0\0\200C\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
<0.000039>
10:58:05.115306 write(5,
"\377\377\351\263\376\377\377j\0j\1\215\206\264\2\0\0P\377\25\10D\1\0\351\254\376\377\377\215\216\234"...,
8192) = 8192 <0.233190>
10:58:05.348659 write(5,
"\2\2111^[_]\302(\0\314\314\314\314\314\314\377%\264C\1\0\314\314\314\314\314\314\377%\270C"...,
8192) = 8192 <0.000038>
10:58:05.348884 write(5,
"\356\1\0\0\0\25\0\0\0\2\0\0\0\25\0\0\0\0\0\0\0\0\0\0\0\0\0\0@\0\0H"..., 8192) =
8192 <0.000035>
10:58:05.349055 write(5,
"P\0\0\0\32\0\0\0\0\5\0\0 \3\0\0\0\17\0\0\1\0\0\0\30\0\0\0<\0\0\0"...,
8192) = 8192 <0.000082>
10:58:05.349311 write(5,
"u\30\377u\24\377u\20S\377u\10\350\375\30\1\0;\307\211E$\17\204\204\2\0\0009}\364\17"...,
8192
) = 8192 <63.979352>
10:59:09.328867 write(5,
"\1\0\0\213K\4\213E\350;H\ft\r\213M\10\366A\34\1\17\205L\1\0\0\377u\10P\350"...,
8192) = 8192 <0.000070>
10:59:09.329110 write(5,
"WP\350s\331\0\0\211\2050\377\377\377\213M\374\213\2050\377\377\377_^[\350\333\324\0\0\311\302"...,
8192) = 8192 <0.000048>
10:59:09.329268 write(5,
"\354\f\213\r\250Q\367\277SV\301\341\5\213\221\300Q\367\277\213\201\310Q\367\277W\213}\10\213w$"...,
8192) = 8192 <0.000050>
10:59:09.329422 write(5,
"\17\204\244\0\0\0\213G\4;C\fu\31\377u\20\215E\350P\350\354p\377\377j\0S\350\212\234"...,
8192) = 8192 <0.536185>
10:59:09.865891 write(5,
"\300\213E\274\3\310\211M\334\213\n;\371}\2\213\317\3\310\213B\3749E\304\211M\344\213M\304\177"...,
8192) = 8192 <0.000169>
10:59:09.866274 write(5,
"\277\211\204\21\354\27\1\0\2418@\367\277\203\274\1\354\27\1\0\0\17\204\362\0\0\0\213O\f#\313"...,
8192) = 8192 <0.000163>
10:59:09.866663 write(5,
"\300\212g\367\212G\370f\205\300t\3\200\n\0043\300\212g\371\212G\372f\205\300t\3\200\n\0103"...,
8192) = 8192 <0.000168>
10:59:09.867043 write(5,
"\0w\n\306\2\320\212\331B\376\313\353\351\306\2\367Bf\211\nBB\213]\364\210Z\3\301\353\10\210"...,
8192
) = 8192 <189.274151>
11:02:19.141412 write(5,
"\0\4\0\4\2\6\2\6\0\4\0\4\2\6\2\6\1\5\1\5\3\7\3\7\1\5\1\5\3\7\3\7"..., 8192) =
8192 <0.233716>
11:02:19.375339 write(5,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) =
8192 <0.000288>
11:02:19.375800 write(5,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0.text\0\0\0 4\t\0\0\20\0\0"...,
8192) = 8192 <0.000269>
11:02:19.376243 write(5,
"L\2\0\0\205\300t\23Pj\0\377\25|V\t\20\307\206L\2\0\0\0\0\0\0\213\206P\2\0"...,
8192) = 8192 <0.000279>
11:02:19.376693 write(5,
"\0\0\302\10\0j\0S\215L$ Q\213\317\350\234\365\377\377\213\360\205\366|\305\213\317\350\317\345\377"...,
8192) = 8192 <0.234436>
11:02:19.611311 write(5,
"U\213\354j\377h\373\332\10\20d\241\0\0\0\0Pd\211%\0\0\0\0\203\354\24S\213]\20V"...,
8192) = 8192 <0.000288>
11:02:19.611772 write(5,
"\3\0\0\213\317\306D$\34\4\350i\237\7\0\307\7\24\371\t\20\215\276h\3\0\0\213\317\306D$"...,
8192) = 8192 <0.000295>
11:02:19.612241 write(5,
"\374\353\0023\300\213P,\271\1\0\0\0;\321t\r\211H,\213@\10\213\20QP\377R\f_3"...,
8192) = 8192 <0.000265>
11:02:19.612677 write(5,
"\213\6W\213\316\307F,\270\2\0\0\377P8\215L$\20Q3\333h\314\263\t\20\211\\$\30\377"...,
8192) = 8192 <0.232419>
11:02:19.845273 write(5,
"V\213\361\213Fd\205\300t&\213F`\205\300t\7P\377\25\314V\t\20\213FhP\377\25\330R"...,
8192) = 8192 <0.000290>
.
.
.
Based on the evidence we've been able to uncover, it feels to me like we're having
a problem writing to the NFS server. Here's why I think so.
virt-p2v-server pretty much reads blocks from the source server and writes to the Export
NFS share - right? Reads are fast and furious, writes are very slow and sometimes even
hang for ***minutes*** at a time.
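To put numbers on that instead of eyeballing the scroll, here is a rough Python sketch that pulls the slow calls out of a saved strace -tt -T capture. It assumes the format shown above (a timestamp, the call, and the elapsed time in angle brackets at the end) and that a record may be wrapped over several lines. The function names and the 0.1 second threshold are my own choices for illustration, not anything strace or virt-p2v-server provides.

```python
import re

# A record starts with a -tt style timestamp, e.g. "10:58:05.349311 ".
RECORD_START = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{6} ")
# -T appends the time spent in the call, e.g. "<63.979352>".
DURATION = re.compile(r"<(\d+\.\d+)>")
# Timestamp followed by the syscall name and its opening parenthesis.
SYSCALL = re.compile(r"^(\S+) (\w+)\(")


def _emit(buf):
    """Turn the buffered lines of one record into (timestamp, syscall, seconds)."""
    text = " ".join(part.strip() for part in buf)
    head = SYSCALL.match(text)
    durations = DURATION.findall(text)
    if head and durations:
        # The elapsed time is the last <...> group in the record.
        yield head.group(1), head.group(2), float(durations[-1])


def parse_records(lines):
    """Yield (timestamp, syscall, seconds) tuples, joining wrapped records."""
    buf = []
    for line in lines:
        line = line.rstrip("\n")
        starts_record = bool(RECORD_START.match(line))
        if starts_record and buf:
            yield from _emit(buf)
            buf = []
        # Lines before the first timestamp are ignored.
        if starts_record or buf:
            buf.append(line)
    if buf:
        yield from _emit(buf)


def slow_calls(lines, threshold=0.1):
    """Return the records whose elapsed time is at least `threshold` seconds."""
    return [rec for rec in parse_records(lines) if rec[2] >= threshold]
```

Feeding it the capture (for example `slow_calls(open("trace.txt"))`, where trace.txt is a hypothetical file holding the strace output) should list the roughly quarter-second writes along with the 63 and 189 second hangs.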
If my hunch is right, then it seems to me we have an issue with either the virtual NIC on
our Fedora16-64P2V VM, the NFS server, or the network in between.
The source server isn't part of the picture because we've tried several source
servers and the behavior is the same for every one.
We've simplified the network as much as we can. We have a brand new HP Procurve
2910al-24G and all the players are directly connected to that switch. The fast reads
and the slow writes all travel over that same switch, just to/from different sources
and destinations, so I think we can eliminate the network.
That leaves either the conversion server or NFS server as culprits.
So here's what I'd like to try:
Let's eliminate the Storagetek NFS Server first.
Let's build up that 500 GB box Fredy mentioned as a physical F16 system and set it up
as an NFS server. Change the RHEV Export domain to point to this new NFS server and try
some P2Vs using the same Fedora VM as our conversion server. Observe the results with
strace and overall.
If this doesn't change the symptoms, then let's add the virt-v2v stuff to that 500
GB Fedora system and we'll use it both as a conversion server and NFS server and try
again.
And finally, to round out the testing, if we still have energy left, we can set up the
Storagetek NFS server again as the RHEV export domain and try some more P2Vs using that
500 GB physical Fedora system as the conversion server, writing to the Storagetek.
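For any of those rounds, it might also help to time plain writes to the mounted export without a full P2V in the loop. The Python sketch below is only a rough probe under my own assumptions: it writes 8192-byte blocks (the size seen in the strace output) to a file and reports any write that takes longer than a threshold. The function name, block count, and threshold are made up for illustration, and the path you pass in would be wherever the export happens to be mounted.

```python
import os
import time


def time_writes(path, count=1000, block_size=8192, threshold=0.1):
    """Write `count` blocks to `path`; return (index, seconds) for slow writes."""
    block = os.urandom(block_size)
    slow = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i in range(count):
            start = time.perf_counter()
            written = 0
            # os.write may write fewer bytes than asked, so loop until done.
            while written < block_size:
                written += os.write(fd, block[written:])
            elapsed = time.perf_counter() - start
            if elapsed >= threshold:
                slow.append((i, elapsed))
    finally:
        os.close(fd)
    return slow
```

If this shows the same stalls against the Storagetek mount but not against the new Fedora NFS server, that points at the NFS server rather than the conversion software. It does not call fsync, so it measures the write() call the same way strace -T does, not time to stable storage.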
- Greg