From: AMEET M. PARANJAPE <aparanja@redhat.com>
Date: Thu, 18 Dec 2008 15:15:21 -0500
Subject: [openib] restore traffic in connected mode on HCA
Message-id: 20081218201442.8117.91571.sendpatchset@squad5-lp1.lab.bos.redhat.com
O-Subject: [PATCH RHEL5.3 BZ477000] restore data transmission in connected mode on any HCA
Bugzilla: 477000
RH-Acked-by: Doug Ledford <dledford@redhat.com>
RH-Acked-by: David Howells <dhowells@redhat.com>

RHBZ#:
======
https://bugzilla.redhat.com/show_bug.cgi?id=477000

Description:
===========
This patch restores traffic between systems using IPoIB connected mode (CM).
It assigns the receive array for CM mode.

I have tested this patch with netperf (multiple instances) on several
different combinations of HCAs between RHEL 5.3 (build 126) and RHEL 5.2,
between two System p machines and between a System p and a System x machine.

RHEL Version Found:
================
RHEL 5.3 Beta Snapshot3

kABI Status:
============
No symbols were harmed.

Brew:
=====
Built on all platforms.
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=1621075

Upstream Status:
================
This is specific to RHEL 5.3. The problem is not seen in mainline.

Test Status:
============
To recreate the problem:
1. Install RHEL 5.3 snapshot 6 on any platform and reboot the node; the
   default is ipoib-cm mode.
2. Run "ping" from one node to another node; the remote node is unreachable.
3. echo datagram >/sys/class/net/ib0/mode
4. Ping the remote node again; it works with ipoib-ud mode.

Validate the fix:
1. Apply this patch, then rebuild and reload the ib_ipoib module.
2. Run "ping" from one node to another node; it works across different
   platforms and different HCAs.
3. Run a "netperf/netserver" multiple-streams test; ipoib-cm works fine.
4. echo datagram >/sys/class/net/ib0/mode
5. Run a ping or netperf/netserver test; it works with ipoib-ud mode.

On RHEL 5.3, edit /etc/sysconfig/network-scripts/ifcfg-ib* to comment out
"CONNECTED_MODE" and "MTU" and execute "/etc/init.d/openibd restart" to
change to UD mode, instead of using "echo datagram > ....". That is not
supported with RHEL 5.3.

===============================================================
Ameet Paranjape 978-392-3903 ext 23903
IBM on-site partner

Proposed Patch:
===============

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 5f87d20..315a434 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -375,7 +375,7 @@ struct ib_sge *sge)
 		sge[i].length = PAGE_SIZE;
 
 	wr->next = NULL;
-	wr->sg_list = priv->cm.rx_sge;
+	wr->sg_list = sge;
 	wr->num_sge = priv->cm.num_frags;
 }
 
@@ -1563,7 +1563,7 @@ destory_srq:
 int ipoib_cm_dev_init(struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
-	int i, ret;
+	int i, ret, j;
 	struct ib_device_attr attr;
 
 	INIT_LIST_HEAD(&priv->cm.passive_ids);
@@ -1601,6 +1601,25 @@ int ipoib_cm_dev_init(struct net_device *dev)
 		priv->cm.num_frags = IPOIB_CM_RX_SG;
 	}
 
+	if (ipoib_cm_has_srq(dev)) {
+		for (j = 0; j < ipoib_recvq_size; ++j) {
+			for (i = 0; i < priv->cm.num_frags; ++i)
+				priv->cm.rx_wr_arr[j].rx_sge[i].lkey =
+					priv->mr->lkey;
+
+			priv->cm.rx_wr_arr[j].rx_sge[0].length =
+				IPOIB_CM_HEAD_SIZE;
+			for (i = 1; i < priv->cm.num_frags; ++i)
+				priv->cm.rx_wr_arr[j].rx_sge[i].length =
+					PAGE_SIZE;
+
+			priv->cm.rx_wr_arr[j].wr.sg_list =
+				priv->cm.rx_wr_arr[j].rx_sge;
+			priv->cm.rx_wr_arr[j].wr.num_sge = priv->cm.num_frags;
+		}
+		priv->cm.head = &priv->cm.rx_wr_arr[0];
+	}
+
 	ipoib_cm_init_rx_wr(dev, &priv->cm.rx_wr, priv->cm.rx_sge);
 
 	if (ipoib_cm_has_srq(dev)) {
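
Illustration (not part of the patch):
=====================================
The loop added to ipoib_cm_dev_init() can be tried outside the kernel with
the rough userspace sketch below. Everything prefixed sketch_/SKETCH_ is a
hypothetical stand-in invented for this illustration (the struct layouts,
the fragment limit, and the head/page sizes are assumptions, not the real
kernel definitions). It only shows the idea the patch relies on: each
receive work request gets its own scatter/gather array, with the first
entry sized for the header and the rest sized as full pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's. */
#define SKETCH_MAX_FRAGS 16
#define SKETCH_HEAD_SIZE 256u   /* plays the role of IPOIB_CM_HEAD_SIZE */
#define SKETCH_PAGE_SIZE 4096u  /* plays the role of PAGE_SIZE */

/* Simplified scatter/gather entry (assumed layout). */
struct sketch_sge {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

/* Simplified receive work request (assumed layout). */
struct sketch_recv_wr {
	struct sketch_sge *sg_list;
	int num_sge;
};

/* One array slot: a work request plus its private SGE array. */
struct sketch_rx_wr {
	struct sketch_recv_wr wr;
	struct sketch_sge rx_sge[SKETCH_MAX_FRAGS];
};

/*
 * Mirror of the loop in the patch: set the lkey on every fragment,
 * size fragment 0 for the header and the others as whole pages, then
 * point each work request at its own SGE array.
 */
void sketch_init_rx_wr_arr(struct sketch_rx_wr *arr, size_t n,
			   int num_frags, uint32_t lkey)
{
	size_t j;
	int i;

	assert(num_frags >= 1 && num_frags <= SKETCH_MAX_FRAGS);

	for (j = 0; j < n; ++j) {
		for (i = 0; i < num_frags; ++i)
			arr[j].rx_sge[i].lkey = lkey;

		arr[j].rx_sge[0].length = SKETCH_HEAD_SIZE;
		for (i = 1; i < num_frags; ++i)
			arr[j].rx_sge[i].length = SKETCH_PAGE_SIZE;

		arr[j].wr.sg_list = arr[j].rx_sge;
		arr[j].wr.num_sge = num_frags;
	}
}
```

A caller would allocate an array of struct sketch_rx_wr (one slot per
receive queue entry) and call sketch_init_rx_wr_arr() once before posting
any receives, much as the patch does for priv->cm.rx_wr_arr when an SRQ is
in use.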