January 12, 2017

nova volume-attach bug

Symptoms

Create a network volume with Cinder and attach it to an instance. The usual command is:

nova volume-attach servicename volumeid

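However, attaching a fifth volume to the instance failed. Specifically, nova volume-attach returned successfully, but cinder list did not show the volume as in-use, and fdisk -l inside the instance did not show the device. nova volume-attach runs asynchronously:

  1. The controller sends the attach request to RabbitMQ.
  2. The target compute node receives it and calls the Nova libvirt driver to perform the attach.
  3. The driver calls the libvirt API to carry it out.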
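
The asynchronous flow above explains why the command line can report success while the attach fails later. Here is a toy model of that behaviour. It is not Nova code: a standard-library queue stands in for RabbitMQ, and the names api_cast and compute_worker are hypothetical.

```python
import queue
import threading


def api_cast(bus, request):
    """Controller side: enqueue the request and return at once (fire-and-forget)."""
    bus.put(request)
    return "accepted"


def compute_worker(bus, errors):
    """Compute side: take one request and record any failure locally."""
    request = bus.get()
    try:
        # Stand-in for the libvirt attach call failing on the compute node.
        raise RuntimeError("attach failed for " + request["volume_id"])
    except RuntimeError as exc:
        errors.append(str(exc))  # only visible in the compute node's log
    finally:
        bus.task_done()


bus = queue.Queue()
compute_errors = []
worker = threading.Thread(target=compute_worker, args=(bus, compute_errors))
worker.start()

status = api_cast(bus, {"volume_id": "vol-1"})  # the caller already sees success
worker.join()

print(status)          # accepted
print(compute_errors)  # ['attach failed for vol-1']
```

The caller never sees the exception, which matches the symptom: no error on the controller side, and the failure only shows up in the compute log.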

Analysis

The controller side logged no errors, but the compute node logged the following:
    2016-06-07 10:03:31.802 4622 ERROR oslo_messaging.rpc.dispatcher [req-69a5223d-efaf-41b0-a687-3873c71e0872 a8378d7f10f04ad69121411eca42d8b5 176e6938dadd45dcaa847d8242778321 - - -] Exception during message handling: internal error: Could not find scsi controller with index 1 required for device
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     executor_callback))
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     executor_callback)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     result = func(ctxt, **new_args)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6761, in attach_volume
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     bdm=bdm)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 460, in decorated_function
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 88, in wrapped
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/exception.py", line 71, in wrapped
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     return f(self, context, *args, **kw)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 344, in decorated_function
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     LOG.warning(msg, e, instance_uuid=instance_uuid)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 315, in decorated_function
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 372, in decorated_function
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     kwargs["instance"], e, sys.exc_info())
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 360, in decorated_function
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     return function(self, context, *args, **kwargs)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4841, in attach_volume
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     do_attach_volume(context, instance, driver_bdm)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 445, in inner
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1119, in attach_volume
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     self._disconnect_volume(connection_info, disk_dev)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     six.reraise(self.type_, self.value, self.tb)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1108, in attach_volume
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     virt_dom.attachDeviceFlags(conf.to_xml(), flags)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     result = proxy_call(self._autowrap, f, *args, **kwargs)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     rv = execute(f, *args, **kwargs)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     six.reraise(c, e, tb)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     rv = meth(*args, **kwargs)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 557, in attachDeviceFlags
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     if ret == -1: raise libvirtError ("virDomainAttachDeviceFlags() failed", dom=self)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher libvirtError: internal error: Could not find scsi controller with index 1 required for device
    
The traceback is long, so here are the key parts. First, the libvirt call that raised the error:
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 557, in attachDeviceFlags
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher     if ret == -1: raise libvirtError ("virDomainAttachDeviceFlags() failed", dom=self)
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher libvirtError: internal error: Could not find scsi controller with index 1 required for device
    
Second, the attach_volume function in the Nova libvirt driver:
    2016-06-07 10:03:31.802 4622 TRACE oslo_messaging.rpc.dispatcher   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1108, in attach_volume
    
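Take a look at the attach_volume function in /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py.
Roughly, Nova first builds an XML document from the attach information, such as the target device (/dev/sda), the volume type (Cinder), and the Ceph authentication details. The key piece is the flags value, a bit mask passed to libvirt that says how the device should be attached. There are basically two kinds:
  1. Affect the next boot
  2. Affect the current domain

flags is initialized to VIR_DOMAIN_AFFECT_CONFIG, whose value is 2. Nova then checks the instance state; if the instance is running, flags is bitwise-ORed with VIR_DOMAIN_AFFECT_LIVE, which makes it 3. What does this value mean in practice? The virsh documentation describes the matching options, --config and --live.
Once Nova has the XML and the flags value ready, it calls virDomainAttachDeviceFlags. Where does that function come from? libvirt.py calls libvirtmod.virDomainAttachDeviceFlags, and libvirtmod is a compiled .so module installed by the python-libvirt RPM package.
In short, Nova does the preparation, and the function that does the real work is libvirtmod.virDomainAttachDeviceFlags, reached through libvirt.py. It takes three arguments: the domain object, the XML, and flags.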
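
The flags arithmetic described above can be sketched in a few lines. This is an illustration, not the Nova source: the function name attach_flags is hypothetical, and the two constants are redefined locally with the values libvirt assigns to them so the snippet runs without libvirt installed.

```python
# Values of libvirt's modification-impact constants, redefined here so the
# sketch is self-contained.
VIR_DOMAIN_AFFECT_LIVE = 1    # change the running domain
VIR_DOMAIN_AFFECT_CONFIG = 2  # change the persistent configuration (next boot)


def attach_flags(instance_is_running):
    """Return the flags value for an attach, following the logic described above."""
    flags = VIR_DOMAIN_AFFECT_CONFIG
    if instance_is_running:
        flags |= VIR_DOMAIN_AFFECT_LIVE
    return flags


print(attach_flags(True))   # 3
print(attach_flags(False))  # 2

# With a real libvirt domain object, the call seen in the traceback is:
#     virt_dom.attachDeviceFlags(conf.to_xml(), flags)
```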

Reproduction

Can the same attach be reproduced with virsh? Yes:
    virsh attach-device --domain instance-00001430 --file 1.xml --config --live
    
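--domain corresponds to the first argument of virDomainAttachDeviceFlags, the domain object
--file 1.xml corresponds to the generated XML
--config --live corresponds to flags = 3, because flags is the bitwise OR of these two values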
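
The mapping between the flags bits and the virsh options can be written out as a small helper. This is a sketch under assumptions: the function name virsh_attach_command is hypothetical, and it only builds the argument list without running virsh.

```python
import shlex

VIR_DOMAIN_AFFECT_LIVE = 1    # matches virsh --live
VIR_DOMAIN_AFFECT_CONFIG = 2  # matches virsh --config


def virsh_attach_command(domain, xml_path, flags):
    """Build the virsh attach-device argument list equivalent to the given flags."""
    argv = ["virsh", "attach-device", "--domain", domain, "--file", xml_path]
    if flags & VIR_DOMAIN_AFFECT_CONFIG:
        argv.append("--config")
    if flags & VIR_DOMAIN_AFFECT_LIVE:
        argv.append("--live")
    return argv


command = virsh_attach_command("instance-00001430", "1.xml", 3)
print(" ".join(shlex.quote(arg) for arg in command))
# virsh attach-device --domain instance-00001430 --file 1.xml --config --live
```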
The XML generated by nova volume-attach is:
    <disk type="network" device="disk">
    <driver name="qemu" type="raw" cache="writeback"/>
    <source protocol="rbd" name="volumes/volume-6a804a02-8bf7-4ae9-9fcc-af625281c1bf">
     <host name="10.138.123.194" port="6789"/>
     <host name="10.138.123.195" port="6789"/>
     <host name="10.138.123.196" port="6789"/>
    </source>
    <auth username="cinder">
     <secret type="ceph" uuid="12a18a3c-e67b-4aa4-968b-e60e458fe041"/>
    </auth>
    <target bus="scsi" dev="sdh"/>
    <serial>6a804a02-8bf7-4ae9-9fcc-af625281c1bf</serial>
    </disk>
    
Running this command fails with the same error that was recorded in the Nova log:
    Could not find scsi controller with index 1 required for device
    
At this point we can conclude that the attach failure is caused by virsh (libvirt) itself, not by Nova.
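
One possible reading of the error message rests on an assumption that this post does not verify: if the default address assignment places 7 units on each SCSI controller, then the target sdh (index 7) would be sent to controller 1, which the instance does not have. The sketch below only shows that arithmetic. The constant 7 and both function names are hypothetical and are not taken from libvirt's source.

```python
import string

# ASSUMPTION: 7 disks per SCSI controller in the default address assignment.
UNITS_PER_CONTROLLER = 7


def disk_index(dev):
    """Map a target name such as 'sda' or 'sdh' to a zero-based index.

    Only single-letter suffixes ('sda' to 'sdz') are handled.
    """
    suffix = dev[2:]
    if not dev.startswith("sd") or len(suffix) != 1 or suffix not in string.ascii_lowercase:
        raise ValueError("unsupported device name: " + dev)
    return ord(suffix) - ord("a")


def assumed_default_address(dev):
    """Return the controller/unit pair implied by the assumption above."""
    idx = disk_index(dev)
    return {
        "controller": idx // UNITS_PER_CONTROLLER,
        "unit": idx % UNITS_PER_CONTROLLER,
    }


print(assumed_default_address("sdg"))  # {'controller': 0, 'unit': 6}
print(assumed_default_address("sdh"))  # {'controller': 1, 'unit': 0}
```

Under this assumption sdh is the first device that needs a second controller, which is consistent with the message "Could not find scsi controller with index 1".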

Other attempts

1. Add a SCSI controller
    cat add-scsi-controller.xml
    <controller type="scsi" index="1" model="virtio-scsi"/>
    
Hot-add the controller to the instance:
    virsh attach-device --config instance-0000142f add-scsi-controller.xml --live

Then run the attach:
    nova volume-attach servicename volumeid
    
2. Attach to an explicit slot on the SCSI controller
    <disk type="network" device="disk">
    <driver name="qemu" type="raw" cache="writeback"/>
    <source protocol="rbd" name="volumes/volume-6a804a02-8bf7-4ae9-9fcc-af625281c1bf">
     <host name="10.138.123.194" port="6789"/>
     <host name="10.138.123.195" port="6789"/>
     <host name="10.138.123.196" port="6789"/>
    </source>
    <auth username="cinder">
     <secret type="ceph" uuid="12a18a3c-e67b-4aa4-968b-e60e458fe041"/>
    </auth>
    <target bus="scsi" dev="sdh"/>
    <serial>6a804a02-8bf7-4ae9-9fcc-af625281c1bf</serial>
    <address type="drive" controller="0" bus="0" target="0" unit="7"/>
    </disk>
    
Attach it with the following command:
    virsh attach-device --domain instance-00001430 --file 1.xml --config --live
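
Pinning the address by hand, as in the XML above, can also be scripted. This is a minimal sketch using only the Python standard library. The function name pin_drive_address and the shortened sample XML are hypothetical; the address attributes are the ones used in this post.

```python
import xml.etree.ElementTree as ET


def pin_drive_address(disk_xml, unit, controller=0):
    """Return disk_xml with a single explicit <address type="drive"> element."""
    disk = ET.fromstring(disk_xml)
    # Drop any existing address so the result has exactly one.
    for old in disk.findall("address"):
        disk.remove(old)
    ET.SubElement(disk, "address", {
        "type": "drive",
        "controller": str(controller),
        "bus": "0",
        "target": "0",
        "unit": str(unit),
    })
    return ET.tostring(disk, encoding="unicode")


# Shortened sample; a real disk element also carries source, auth and serial.
SAMPLE_DISK = (
    '<disk type="network" device="disk">'
    '<driver name="qemu" type="raw" cache="writeback"/>'
    '<target bus="scsi" dev="sdh"/>'
    '</disk>'
)

pinned = pin_drive_address(SAMPLE_DISK, unit=7)
print(pinned)
```

The output can be saved as 1.xml and passed to virsh attach-device as shown above.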
    

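Note: attaching this way does not update the Cinder database, so OpenStack does not know that the volume is attached to this instance.

Because the instance already had other volumes attached, I took the XML of one that was already attached and modified it: export the instance XML with virsh dumpxml, cut out the disk section, replace the volume ID with the one to attach, change name='scsi0-0-0-5' to name='scsi0-0-0-7', and change unit='5' to unit='7'. The result looks like this: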

<disk type="network" device="disk">
  <driver name="qemu" type="raw" cache="writeback"/>
  <auth username="cinder">
    <secret type="ceph" uuid="12a18a3c-e67b-4aa4-968b-e60e458fe041"/>
  </auth>
  <source protocol="rbd" name="volumes/volume-6a804a02-8bf7-4ae9-9fcc-af625281c1bf">
    <host name="10.138.123.194" port="6789"/>
    <host name="10.138.123.195" port="6789"/>
    <host name="10.138.123.196" port="6789"/>
  </source>
  <backingStore/>
  <target dev="sdh" bus="scsi"/>
  <serial>6a804a02-8bf7-4ae9-9fcc-af625281c1bf</serial>
  <alias name="scsi0-0-0-7"/>
  <address type="drive" controller="0" bus="0" target="0" unit="7"/>
</disk>

Then run the command again:

virsh attach-device --domain instance-00001430 --file 1.xml --config --live

The instance now attaches the /dev/sdh device normally.
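
When editing the XML by hand it helps to know which units are already taken on the controller. The sketch below reads the output of virsh dumpxml and lists the units in use. It assumes the dump has been saved to a string; the function name used_scsi_units and the sample domain XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def used_scsi_units(domain_xml, controller=0):
    """Return the sorted unit numbers of SCSI disks on the given controller."""
    root = ET.fromstring(domain_xml)
    units = []
    for disk in root.iter("disk"):
        target = disk.find("target")
        address = disk.find("address")
        if target is None or address is None:
            continue
        if target.get("bus") != "scsi" or address.get("type") != "drive":
            continue
        if address.get("unit") is None:
            continue
        if int(address.get("controller", "0")) == controller:
            units.append(int(address.get("unit")))
    return sorted(units)


# Shortened sample domain with two SCSI disks on controller 0.
SAMPLE_DOMAIN = """
<domain type="kvm">
  <devices>
    <disk type="network" device="disk">
      <target dev="sda" bus="scsi"/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>
    <disk type="network" device="disk">
      <target dev="sdf" bus="scsi"/>
      <address type="drive" controller="0" bus="0" target="0" unit="5"/>
    </disk>
  </devices>
</domain>
"""

print(used_scsi_units(SAMPLE_DOMAIN))  # [0, 5]
```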

The following command also shows the devices currently attached to the instance:

virsh qemu-monitor-command --hmp instance-00001430   "info block"

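Conclusion

virsh fails when attaching a SCSI device at number 7, which is why the instance could not attach its fifth network volume. I filed this in Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1343302. Latest update: upstream has replied, and the bug is fixed in libvirt 1.3.5.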

Permalink: https://www.opsdev.cn/post/nova-volume-attach-bug.html
