nForce LAN port STRANGENESS (really.. you gotta read this)



Ok, so I've had this rig since about May and, except for some fun w/ memory that I resolved by putting good stuff in, it's been rock solid... until this week. (Apologies in advance -- this is a long post, as the problem is very specific and I've done a lot of troubleshooting on it.)

 

I started having problems sending e-mail. (Client is Eudora, for what it's worth.) Some e-mails would go out, some wouldn't. I run my own mail server, and everything is GigE LAN-connected. I also couldn't send the same e-mails directly to my ISP (RoadRunner) mail servers, so I knew it wasn't the mail server. Other machines on the LAN *could* send the same e-mail, so I knew it was something with this box.

 

Troubleshooting further, I found I could generally send short e-mails, but anything over a few lines wouldn't go. Given that I have access to the mail server, I could see the logs on the other end, and see that the session was being disconnected before the e-mail was finished. (WTF??)

 

This one had me baffled, but I was determined. I installed Ethereal on both machines and started capturing SMTP traffic on each end. What I found baffled me even further. Text versions of the trace files, edited for privacy, are attached for anyone who wants to play along with the home game. Python is the mail server, Zoot is the mail client.

 

In short, the mail client sends the standard SMTP commands (MAIL FROM:, RCPT TO:, DATA, etc.) and starts sending the mail body. This e-mail happens to span 2 Ethernet frames (TCP segments). From the client (Zoot) side, you can see both segments of the e-mail body go out, but from the mail server (Python) side, you never see the first segment of the body. The second arrives, out of sequence, and the mail server starts asking for a retransmit by re-sending its ACK for the last segment it received in order.

 

The client starts doing a TCP retransmit on the segment, but it never appears to get to the mail server. Interesting ....
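For anyone who wants to see the pattern without opening the traces, here is a minimal sketch of what the two captures are showing. The sequence numbers and lengths are made up for illustration (they are not taken from the attached trace files), and `missing_segments` and `receiver_acks` are hypothetical helper names, not part of Ethereal or any library. The first function diffs the sender-side and receiver-side captures; the second simulates a receiver that can only ACK the next in-order byte it expects.

```python
# Hypothetical data: each segment is (sequence number, payload length).
# These values are invented for illustration, not read from the real traces.

def missing_segments(sent, received):
    """Segments that appear in the sender capture but not in the receiver capture."""
    seen = set(received)
    return [seg for seg in sent if seg not in seen]


def receiver_acks(arrivals, initial_seq):
    """Simulate cumulative ACKs: after each arrival, ACK the next in-order byte expected."""
    expected = initial_seq
    buffered = {}          # out-of-order segments waiting for the gap to fill
    acks = []
    for seq, length in arrivals:
        buffered[seq] = length
        # Advance past every contiguous segment we now hold.
        while expected in buffered:
            expected += buffered.pop(expected)
        acks.append(expected)
    return acks


client_capture = [(1000, 1460), (2460, 700)]  # both body segments leave the client
server_capture = [(2460, 700)]                # only the second one reaches the server

print(missing_segments(client_capture, server_capture))
print(receiver_acks(server_capture, initial_seq=1000))
print(receiver_acks(client_capture, initial_seq=1000))
```

With the first segment missing, the simulated server can only keep ACKing the start of the gap, which matches the repeated ACKs in the server-side trace. With both segments present, the ACK advances past the whole body.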

 

Think this is my OS? Nope... as soon as I switched to the Marvell Yukon NIC (same IP address, same GigE port, same Cat5e cable), everything is perfectly fine. Switch back to the nForce Vitesse, it goes into the crapper again.

 

 

I call this a serious bug in the nForce NIC drivers, as they should be handling the TCP sequence. Anyone wanna disagree with me ???

 

I haven't had time to do anything else (clear CMOS, re-test) because I'd really like to get to the bottom of this and wanted to collect more failure data. However, I did upgrade from the ForceWare 6.33 drivers I was running (stock from mobo CD) to the 6.69 drivers (upgraded all - NIC, bus, IDE, etc.), with same results.

 

Another interesting tidbit -- this machine is seeding anywhere from 1-10 torrents at any given time, so it pushes a lot of traffic all day. If I look at the TCP stats, I see more than 2% TCP retransmits. I'd expect somewhere between 0 and 1%, so 2% is a bit high. May be nothing.
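As a rough sketch of how a retransmit percentage like that can be worked out: the snippet below pulls the "Segments Sent" and "Segments Retransmitted" counters out of `netstat -s` style text and divides one by the other. The sample text and its numbers are invented for illustration, the exact layout of the real output is an assumption, and `parse_tcp_counters` and `retransmit_rate` are hypothetical names. Treat the ratio as an approximation, since how the OS counts retransmitted segments within the sent total may differ.

```python
import re


def parse_tcp_counters(text):
    """Extract the two TCP counters of interest from netstat-style text."""
    counters = {}
    pattern = re.compile(r"\s*(Segments Sent|Segments Retransmitted)\s*=\s*(\d+)")
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def retransmit_rate(segments_sent, segments_retransmitted):
    """Retransmitted segments as a percentage of segments sent."""
    if segments_sent <= 0:
        raise ValueError("segments_sent must be positive")
    return 100.0 * segments_retransmitted / segments_sent


# Invented sample; real counter values would come from the machine in question.
SAMPLE = """
TCP Statistics for IPv4

  Segments Received                   = 900000
  Segments Sent                       = 1000000
  Segments Retransmitted              = 21000
"""

counters = parse_tcp_counters(SAMPLE)
rate = retransmit_rate(counters["Segments Sent"], counters["Segments Retransmitted"])
print(f"{rate:.1f}% of sent segments were retransmitted")
```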

 

The GigE switch (SMC 8508) in between has been power-cycled, and all machines have been rebooted/power-cycled. Though the switch supports them, Jumbo Frames are not in use.

 

I think this NIC (the PHY part) is hosed... anyone have any thoughts??


Guest Spartacus

It's not a hardware issue.

 

Try setting the ActiveArmor default to "notoffloadable" rather than "offloadable".

 

Either that or uninstall the Nvidia network junk altogether.


That is your problem. And if you downloaded anything (did you?) and tried to install, say, QuickTime, you would realize that all downloads are corrupt. It's the ActiveArmor BS, and really you should just uninstall it, because if you're behind a router (I don't remember if you are or not) you should be okay, as you probably well know...


Sorry, but that's not it. I never installed any of the nVidia firewall stuff. I'm behind a Cisco 1721 router, so it's not needed.

 

And to make things more fun, if I do turn off Segmentation Offload in the nForce Vitesse controller, that NIC totally stops working. Default route vanishes from "route print," and it doesn't show up in the "ipconfig /all" output.

 

The only way to get it back was to uninstall it (reboot optional), rescan for hardware and let XP pick it up again.

 

It's not ActiveArmor.


"It's not a hardware issue."

It could be a hardware issue...? It only seems to happen on 1 of the 2 network ports, both on the same machine.

I would suggest getting someone with a similar setup to test the issue; failing that, get the mobo RMA'd.

 

Also - how are you negotiating the network speed? Are your network switch and onboard LAN both set to auto-negotiate? I've had issues with packets dropping where the negotiation was interfering.

Try checking the other network devices and set your NVLAN to the same settings.


"Also - how are you negotiating the network speed? Are your network switch and onboard LAN both set to auto-negotiate? I've had issues with packets dropping where the negotiation was interfering.

Try checking the other network devices and set your NVLAN to the same settings."

 

I've gone through all the settings for speed/duplex, including 10 Mbit/Half duplex. Fails no matter what it's set to. That's the first thing I checked. The fact that the first frame isn't getting to the destination (0 hops through switch) but the rest do is just freaky, and no other protocols (that I can tell) seem to fail -- FTP, BitTorrent, SMB/CIFS, etc.

 

In all the years (going on 15) of doing IT and tech support, this is one of the strangest things I've seen. The first packet literally just didn't get sent.

 

I guess I'll clear the CMOS and test whether it's something there, but that's a giant leap, given that this box hasn't been rebooted (other than to troubleshoot this problem once it started) since April. In other words, no changes were made, it just started failing.


Got a spare 100Mb card or two lying around? Try a 100Mb card at one end, then the other, then at both.

 

The server wouldn't happen to have a SATA drive on an NForce/Silicon Image chipset, would it?

 

 

I'm out of GigE NICs at the moment, but an Intel Pro/1000 GT is on the way from Intel ($35 shipped -- look for their Evaluation program!!). I gave away all my 10/100s. Once the Pro/1000 arrives, I'll slap that in and move off the onboard permanently (the 3Com/Marvell Yukon just can't hold a candle to an Intel... and all my other NICs are Intel Pro/1000 MT, including the one in the mail server).

 

The server is a P3/550 running Win2K Server, with a 4GB SCSI boot disk and a 4x160GB RAID5 array on an Adaptec 2400A RAID controller. No SATA on the server end. (And yes, the server is aging -- was built w/ RAID5 back when RAID5 couldn't be had onboard...)

