From: Doug Ledford <dledford@redhat.com>
Date: Thu, 11 Dec 2008 12:08:06 -0500
Subject: [openib] fix ipoib oops in unicast_arp_send
Message-id: 1229015286.32405.103.camel@firewall.xsintricity.com
O-Subject: [Patch RHEL5.3] Fix ipoib oops in unicast_arp_send
Bugzilla: 476005
RH-Acked-by: Peter Martuccelli <peterm@redhat.com>

This addresses https://bugzilla.redhat.com/show_bug.cgi?id=476005

After the last IPoIB oops patch, further testing turned up this
additional item.  It's already been submitted and accepted upstream into
the mainline kernel and into OFED 1.4.  I can't reproduce (my cluster is
too small), but IBM reports that this solves an issue that caused
something like 1400 machines to drop out of a cluster simultaneously.

--
Doug Ledford <dledford@redhat.com>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband

commit ff79ae80837cf45cb703b34824dd3862d2ddcb24
Author: Yossi Etigin <yosefe@Voltaire.COM>
Date:   Wed Nov 12 10:24:39 2008 -0800

    IPoIB: Fix crash in path_rec_completion()

    Fix a crash in path_rec_completion() during an SM up/down loop.  If
    more than one path record request is issued, the first completion
    releases path->done, allowing ipoib_flush_paths() to free the path,
    and thus corrupting it for the second completion.

    Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
    change events") added the field path->valid and changed the test
    "if (!path)" to "if (!path || !path->valid)".  This change made it
    possible for a path with an outstanding query to pass the test and
    issue another query on the same path.  Having two queries on the same
    path leads to a crash.

    This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1325>.

    Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
    Signed-off-by: Roland Dreier <rolandd@cisco.com>

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 0f03d20..f7028ff 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -669,7 +669,7 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
 			skb_push(skb, sizeof *phdr);
 			__skb_queue_tail(&path->queue, skb);
 
-			if (path_rec_start(dev, path)) {
+			if (!path->query && path_rec_start(dev, path)) {
 				spin_unlock(&priv->lock);
 				path_free(dev, path);
 				return;