
Help! Unstable server, more unstable at LOWER FSB!!



Hi,

 

My system has become incredibly unstable since I built a server from the parts listed in my signature. It runs Debian Linux with software RAID5. Everything seems to run smoothly until, about once a week (though it varies between one day and one month), the system crashes in the most terrible way: all sorts of errors scroll past on the screen, disks may drop out of the array, etc., etc.

 

So what might be the problem? Bad disks? Bad cables? Bad controllers? Bad BIOS settings?

By now I have swapped out many parts, disconnected the whole array from the motherboard (by actually pulling the power cables), and tried many other things, but the problem persists. My conclusion is that the problem must lie with the motherboard, the memory or the processor (or MAYBE the PCI VGA card; I don't know, but it does nothing most of the time).

 

The following characteristics stand out:

- The errors don't seem to correlate with how busy the system is: it may hang while idle just as easily as under full load;

- None of the parts get hot in any way;

- With FSB settings at standard 200 MHz, the system hangs within a minute;

- With FSB settings at 220 MHz, the system usually hangs within a day;

- With FSB settings at 250 MHz, the system seems the most stable: it hangs about once a week;

- With FSB settings at 260 MHz, the system hangs about once a day;

- MemTest runs smoothly at all of the above FSB settings (the memory is rated at 275 MHz, so one would expect this).

 

Current Genie BIOS settings are the following:

 

FSB Bus Frequency: 250 MHz

FSB/LDT Frequency Ratio: x4.0

LDT Bus Transfer Width: ↓16 ↑16

CPU/FSB Frequency Ratio: x10.0

PCI eXpress Frequency: 100 MHz

K8 Cool'n'Quiet: Disabled

CPU VID Startup Value: 1.475 V

CPU VID Control: 1.475 V

CPU VID Special Control: Above VID * 104%

LDT Voltage Control: 1.30 V

Chip Set Voltage Control: 1.60 V

DRAM Voltage Control: 2.80 V

DRAM +.03 V if not 3.2 V: Disabled
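To make the FSB observations above easier to compare, here is a minimal Python sketch that multiplies each tested FSB by the ratios posted above. It assumes each clock is simply FSB x ratio (as on Athlon 64 boards), that these ratios were unchanged in every FSB test (the post doesn't say), and a nominal 1000 MHz HyperTransport (LDT) link clock. The constant names and the "above nominal" check are illustrative, not BIOS features.

```python
# Hypothetical helper: derive the clocks implied by the posted Genie BIOS ratios.
# Assumptions: each clock = FSB x ratio, the ratios were the same in every test,
# and the HyperTransport (LDT) link is nominally 1000 MHz.

CPU_RATIO = 10.0       # "CPU/FSB Frequency Ratio: x10.0"
LDT_RATIO = 4.0        # "FSB/LDT Frequency Ratio: x4.0"
DRAM_RATIO = 1.0       # "RAM/FSB: 01/01"
DRAM_RATED_MHZ = 275   # memory rating quoted in the post
HT_NOMINAL_MHZ = 1000  # assumed nominal LDT link clock


def derived_clocks(fsb_mhz):
    """Return (cpu, ldt, dram) clocks in MHz for a given FSB."""
    return fsb_mhz * CPU_RATIO, fsb_mhz * LDT_RATIO, fsb_mhz * DRAM_RATIO


def report(fsb_mhz):
    """One-line summary, flagging clocks above the assumed limits."""
    cpu, ldt, dram = derived_clocks(fsb_mhz)
    notes = []
    if ldt > HT_NOMINAL_MHZ:
        notes.append("LDT above nominal")
    if dram > DRAM_RATED_MHZ:
        notes.append("DRAM above rating")
    line = f"FSB {fsb_mhz}: CPU {cpu:.0f} MHz, LDT {ldt:.0f} MHz, DRAM {dram:.0f} MHz"
    return line + (f" ({', '.join(notes)})" if notes else "")


for fsb in (200, 220, 250, 260):
    print(report(fsb))
```

Under these assumptions, 260 MHz pushes the LDT link to 1040 MHz, while at 200 MHz every clock is below its limit, so the arithmetic by itself doesn't explain why 200 MHz is the worst case.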

 

Current DRAM Configuration:

 

DRAM Frequency Set (MHz): 200=RAM/FSB: 01/01

CPC: Enabled

Tcl: 3.0

Trcd: 04 Bus Clocks

Tras: 08 Bus Clocks

Trp: 04 Bus Clocks

Trc: 09 Bus Clocks

Trfc: 16 Bus Clocks

Trrd: 03 Bus Clocks

Twr: 03 Bus Clocks

Twtr: 02 Bus Clocks

Trwt: 03 Bus Clocks

Tref: 3120 Cycles

Odd Divisor Correct: Disabled

DRAM Bank Interleave: Enabled

 

DQS Skew Control: Increase Skew

DQS Skew Value: 0

DRAM Drive Strength: Level 6

DRAM Data Drive Strength: Level 4

Max Async Latency: 08.0 Nanoseconds

DRAM Response Time: Fast

Read Preamble Time: 06.0 Nanoseconds

IdleCycle Limit: 256 Cycles

Dynamic Counter: Disabled

R/W Queue Bypass: 16x

Bypass Max: 07x

32 Byte Granularity: Disabled (4 Bursts)
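Since the timings above are given in bus clocks, here is a minimal Python sketch converting them to nanoseconds at a few of the FSB settings tried. It assumes the 1:1 RAM/FSB divider shown above (so one bus clock is 1000 / FSB ns) and that Tref is counted in the same clock, which is an assumption about how the BIOS labels it. The function names are hypothetical.

```python
# Convert the posted DRAM timings (in bus clocks) to nanoseconds at several FSBs.
# Assumes 1:1 RAM/FSB, so one bus clock = 1000 / fsb_mhz ns, and that Tref is
# counted in the same DRAM clock (an assumption about the BIOS label).

TIMINGS = {  # values copied from the post, in bus clocks
    "Tcl": 3.0, "Trcd": 4, "Tras": 8, "Trp": 4, "Trc": 9,
    "Trfc": 16, "Trrd": 3, "Twr": 3, "Twtr": 2, "Trwt": 3,
}
TREF_CYCLES = 3120


def clock_period_ns(fsb_mhz):
    return 1000.0 / fsb_mhz


def timings_ns(fsb_mhz):
    """Each timing expressed in nanoseconds at the given FSB."""
    period = clock_period_ns(fsb_mhz)
    return {name: cycles * period for name, cycles in TIMINGS.items()}


def tref_us(fsb_mhz):
    return TREF_CYCLES / fsb_mhz  # cycles / MHz = microseconds


def trc_consistent(t):
    # tRC (activate to activate, same bank) normally spans at least tRAS + tRP.
    return t["Trc"] >= t["Tras"] + t["Trp"]


for fsb in (200, 250, 260):
    ns = timings_ns(fsb)
    print(f"FSB {fsb}: CL {ns['Tcl']:.1f} ns, tRCD {ns['Trcd']:.1f} ns, "
          f"tRAS {ns['Tras']:.1f} ns, Tref {tref_us(fsb):.2f} us")
print("Trc >= Tras + Trp:", trc_consistent(TIMINGS))
```

With fixed cycle counts, a lower FSB means longer (looser) absolute timings, so the DRAM timings alone would be expected to get more relaxed, not tighter, as the FSB drops. The sketch also flags that the posted Trc (9) is shorter than Tras + Trp (8 + 4); whether the memory controller actually applies it as entered is not something this arithmetic can tell.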

 

 

Well, that's about it. Any help in solving this problem would be greatly appreciated!!! (Unstable servers are a classy lady..)

 

Thank you,

 

Regards,

 

Warner


The words 'server' and 'overclocking' don't belong together if you are really serious; if not, it's your rig.....

 

Possible culprits include, but are not limited to:

 

SATA cabling: the newer cables have a slightly larger bump built into the back of the plug to help with contact.

 

RAM: even though it passes MemTest, there can still be problems.......

 

Motherboard: could be a flaky one...

 

Video card: I have had systems do strange things with a dicey video card, and it befuddled me because the errors made it look like something else was wrong.....

 

HD: could be a hidden problem. Have you run a drive fitness test on all your drives? I bring this up because last year I had a problem with one of my spare (IDE) storage drives in a system with a SATA RAID 0 OS setup. Even though it had nothing to do with the RAID array, the HD would just fritz the system once in a while; once I removed it, things went back to normal......

 

Bummer, chasing problems like this can be frustrating: it takes time to see whether the system is stable after each change, and it's like you need a duplicate of every part to help chase the problem down....

 

I am sure others will chime in with ideas today....

 

laterz,

 

baldy


Hey Baldy (and others), thanks for the reply!

 

> the words 'server' and 'overclocking' don't belong together, if you are really serious

 

I know! :-( The thing is, when I rigged this up as a server (the motherboard, memory and CPU were previously in my workstation), I clocked it back down to stock to make it as stable a server as possible... BUT it then turned out that the whole thing wouldn't even boot at 'only' 200 MHz FSB. That is why I went back to my proven (workstation) settings to get it going. While experimenting with different settings, I found that changing the FSB had a very significant effect on stability... IN THE WRONG DIRECTION!

 

The SATA cabling seems OK (at least, I found that the errors keep occurring after disconnecting every single cable). I also replaced the PATA cabling, without result.

 

Motherboard and video card: indeed, there might be a problem with either one of them, or both. But to my knowledge, that doesn't explain the "Stability-In-Opposite-Direction-Defying-The-Laws-Of-Physics" issue... I hope someone comes up with a theory!

 

Finally, the boot HD could indeed be a problem. Maybe I should run extra tests on it.

 

Have a nice day!

 

Warner


You could try running one of the live Linux distributions to see whether the HDDs are in fact causing the problem.

 

You would have a complete OS running from RAM, and then you could see whether it stays up longer.

 

I believe Linux is more thorough in its use of RAM than Windows and can reveal memory problems more readily (Windows tends to use the lower memory addresses, whereas Linux cycles through all of memory).

 

Try Ubuntu, which is Debian-based and quite fully featured. I use it to run BF2 and BF2142 servers for our LAN parties.


  • 2 weeks later...

So... I have now found that the RAM is the most probable culprit. Can anybody tell me whether putting it in another slot would make a difference? I have also heard rumors that setting CPC to 2T (Disabled) instead of 1T (Enabled) might help. Would that result in a big performance loss? What is the difference between those two options?

 

Thanks!

 

Warner

