
Memtest stable, but unstable with anything else



Status so far: The first stick ran memtest overnight (50 passes) without errors, at Sharp's settings. The second stick is running now.

 

 

@ExRoadie

 

Thanks, but I am aware of that: that's why I Ghosted the original Windows installation to DVD, and am running Knoppix and other stuff from CD. That's one thing you learn after building the first few PCs :rolleyes:

 

I did start down the "reinstall Windows" path (to rule out the possibility that the Ghost image contained something that had been trashed), but then ran into the same problem that other forum members have seen when trying to install Windows. So I chose to use the Knoppix live CD as an additional level of stability testing (after memtest and/or the Windows memory diagnostics program), before risking a dodgy Windows installation.


- Sharp:

Yes, I have tested them individually, but that was a while ago now. However, I did a successful memtest on both sticks (dual channel) only a couple of days ago, and I would expect both sticks to be OK. Do you have any experience of RAM sticks passing memtest together, but failing individually? It wouldn't seem logical, but reading these forums it seems that logic sometimes has its limitations :) , at least at the level we puny mortals are able to apply it.

 

 

-CsA-TAZ:

The SPD isn't quite correct: recently, Crucial's webpage has stated the rating of the Ballistix as 2-2-2-6. I'm pretty sure I've tested the timings you've proposed somewhere along the way, but I'll give them a try anyhow.

 

Thanks to you both for your input.

Yes, they do, but those are not the SPD default values. To find those out, use CPU-Z: it should tell you which timings apply at which speeds (200 MHz, 250 MHz or 300 MHz, if you were using DDR600).

I don't think the problem is your memory, but at the same time it's impossible to guess what it is, even when you start from scratch. Anyway, good luck!
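As a rough illustration of why a module's timings are listed per speed: a timing given in clock cycles corresponds to less absolute time as the clock rises, which is one reason a module rated 2-2-2-6 at one speed may need looser timings at a higher one. A minimal sketch, assuming only the clock period matters (the function name is hypothetical):

```python
def cycles_to_ns(cycles: int, clock_mhz: float) -> float:
    """Convert a latency given in memory-clock cycles to nanoseconds."""
    # One clock period lasts 1000 / f ns when f is in MHz.
    return cycles * 1000.0 / clock_mhz


# CAS latency 2 at the clock speeds mentioned above (200, 250, 300 MHz)
for mhz in (200, 250, 300):
    print(f"CL2 @ {mhz} MHz = {cycles_to_ns(2, mhz):.2f} ns")
```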


@mc123

 

Using a Ghost image is OK if you've been able to qualify it. The problem most forum members run into is setting up the OS on an overclocked rig. Using a Ghost image while overclocked is just as iffy.

 

From experience... Always use stock settings for an OS install or image rebuild.


@CsA-TAZ

 

Thanks for wishing me luck, I think I'll need it! By the way, I've never seen it mentioned in these forums but memtest86 v1.60 (and possibly earlier versions too) can also display detailed RAM timings by selecting "Advanced Options" and then "A64 options" (I may have the titles wrong but the menu selections are 9 and 5, respectively).

 

 

@ExRoadie

I agree. But right now, even though memtest and other low-level stuff runs fine, I can't even find stock settings that are stable in Knoppix (let alone Windows). My system seems stable with the CPU at stock, and a single stick of RAM at 100 MHz (1:2 divider) - I haven't done any long-term testing with that configuration - but such a low-performance setup isn't very satisfactory as a long-term solution. I would be rather happier if I could just get the system stable with dual-channel RAM and CPU both at stock, and I'd work from there.


All right, the second stick of RAM got through 57 passes of memtest without errors. So both sticks still memtest OK, individually.

 

Back to dual channel, then. I put the first stick into slot 4 and restarted memtest. It failed in test #8 after a few passes - this can usually be cured (on my rig) by raising Vdimm a tad, but I couldn't enter the BIOS setup - it would either freeze in the first screen as I've described earlier or just show a blinking cursor instead of the blue setup screen. Dual channel is becoming a major pain :(

 

So I removed the 2nd stick again, got into the BIOS setup without mishap, raised the Vdimm by 0.1V to 2.8V, saved, powered down, put the 2nd stick back into slot 4 and powered up. Memtest is now chugging along (for now, anyway).

 

I have to go away on business for a couple of days (I'll leave memtest running in the meantime), but I'll post an update when I get back.

 

Watch this space! (If you're at all interested ;) )


It's good to know I'm not alone in this; with problems like these it's easy to feel lonely :)

 

I got back from my trip to find memtest all screwed up: just a blue screen with a blinking yellow block, vaguely reminiscent of memtest's screen but with no characters, and it looked as if the screen resolution had been set to 320x200 or maybe lower. There's no way of knowing whether it crashed after 2 hours or 2 days.

 

So I rebooted, and memtest ran 34 passes without error. Then, for better test coverage (memtest reserves an area of RAM for its own use), I swapped the RAM sticks (from slot 2 to slot 4, and vice-versa) and ran 44 passes without error.

 

(Call me paranoid, but I don't consider RAM "memtest stable" unless it can make at least 32 passes without errors.)

 

Up till this point, I still had both the Seasonic and the Fortron attached; but having the Fortron didn't seem to contribute to (or detract from) stability, so I removed it and left the Seasonic to power everything, as originally.

 

More of the paranoid stuff: Last night the RAM ran 300 passes of Windows Memory Diagnostic's standard test, and 3 passes of the extended tests, all without errors.

 

After having read favourable mentions of GoldMemory, I started running that this morning (quick test, I'm still waiting for the registration to complete). 3 passes without errors so far.

 

All this has been with the settings kindly proposed by Sharp (see post no. 4 in this thread), except that Vdimm has been raised to 2.8V.

 

If GoldMemory hasn't detected errors by this evening, I'll have to conclude that the RAM (and the PSU) are OK, and turn my attention to the board and CPU. Any ideas on how to isolate the problem further would be welcome.

 

I'm beginning to fear the worst: that all my components are error-free individually, but can't be combined into a stable system. :sad:


Hello Sharp,

 

Yes, I've been using your settings since you proposed them (but had to increase Vdimm to 2.8V to get RAM to run at all in dual channel mode).

 

As for heat: When I tried to find the max. OC for the CPU, I saved MBM5's logs (at least those where the system had run stably for more than a few minutes) for future reference. I've had a quick look through these, and the following temps are the absolute maximum recorded, each from a different test:

CPU: 45 deg C

PWMIC: 74 deg C

NF4: 52 deg C

HDD: 32 deg C

Typically, though, the max temps at the settings in my sig would be CPU: 45, PWMIC: 65, NF4: 50, HDD: 30, give or take a degree. And this is with the LDT and chipset voltages maxed out.
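Going through a pile of saved logs for peak temperatures can be automated. A minimal sketch, assuming the logs have been exported as CSV with one column per sensor; the real MBM5 log layout is not shown in this thread, so the column names and sample values below are hypothetical:

```python
import csv
import io


def max_temps(logs, sensors):
    """Return the highest reading per sensor across several CSV-formatted logs."""
    peaks = {name: float("-inf") for name in sensors}
    for text in logs:
        for row in csv.DictReader(io.StringIO(text)):
            for name in sensors:
                value = (row.get(name) or "").strip()
                if value:  # skip blank or missing readings
                    peaks[name] = max(peaks[name], float(value))
    return peaks


# Hypothetical log excerpts, not real data from this system
log1 = "Time,CPU,PWMIC,NF4,HDD\n12:00,40,60,48,29\n"
log2 = "Time,CPU,PWMIC,NF4,HDD\n18:30,42,63,47,30\n"
print(max_temps([log1, log2], ["CPU", "PWMIC", "NF4", "HDD"]))
```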

 

I achieved the PWMIC temp of 74 degrees at the "mad overclock" settings of 253x11 @ 1.695V (nominal Vcore, i.e. set in the BIOS, not reported). The system actually ran stably with 2 instances of Prime95 or SP2004 (I don't remember which) for 8 1/2 hours, after which I stopped them. But I felt that the PWMIC temp was higher than I liked, and that I'd rather run at a lower Vcore (although at 43 deg C the CPU temp was fine). The single stick of RAM was on the 1:2 divider.
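For readers unfamiliar with the "253x11" notation: it means an HTT reference clock of 253 MHz times a CPU multiplier of 11. A minimal sketch of the arithmetic, assuming the 1:2 divider simply halves the HTT clock (that matches the 100 MHz at stock quoted earlier). On the Athlon 64 the memory controller actually derives the RAM clock from the CPU clock through an integer divisor, so the real figure may differ slightly from this nominal value. The function names are hypothetical:

```python
def cpu_clock_mhz(htt_mhz: float, multiplier: float) -> float:
    """Core clock = HTT reference clock x CPU multiplier."""
    return htt_mhz * multiplier


def nominal_ram_clock_mhz(htt_mhz: float, num: int, den: int) -> float:
    """Nominal memory clock for a BIOS divider written as num:den (e.g. 1:2)."""
    return htt_mhz * num / den


# Settings mentioned in the posts above
print(cpu_clock_mhz(253, 11))            # "253x11"
print(nominal_ram_clock_mhz(200, 1, 2))  # stock HTT on the 1:2 divider
print(nominal_ram_clock_mhz(253, 1, 2))  # 253 MHz HTT on the 1:2 divider
```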

 

For a laugh, though, I did experiment with putting an 80 mm fan over the PWMICs: that lowered the temps to 65 deg C or so (from 74 deg), but since I was going to a lower Vcore anyway the fan never became a fixture. When I started finding the max. RAM OC I put a 120 mm fan over the Ballistix instead (just in case), and it's been there since.

 

Judging from what others have written, in this forum for example, I guess that the "non-mad-overclocking" temps are pretty much average, thanks to water cooling and my plethora of fans:

* 1x120mm rear exhaust

* 1x80mm top exhaust

* 2x120mm front intake (on the two lower drive cages)

* 2x120mm, sucking in through the radiator at the bottom of the case (and blowing on the X800XL)

* 1x120mm blowing on the Ballistix (mounted using a slot cover, a great idea I came across on one of the forums here - I don't remember the "inventor", but thanks to whoever it was!)

* Finally, a VF700-Cu on the X800XL (and the stock chipset fan on the NF4).

 

It's not as noisy as you'd think :)

It's a positive-pressure case, all right, but the Stacker's good ventilation helps the air escape, and thus aids airflow.

 

Update: 73% of pass 7 of GoldMemory complete, no errors


My GoldMemory registration came through at last, by which time my system had gotten through 19 passes of the GoldMemory Quick test without errors. Armed with the registered version 6.92 of GoldMemory, I rebooted and fired up the “Thorough” test.

 

The system hung three quarters through the 4th pass (corresponding to about 10-12 hours’ testing) :(. Perplexed, I restarted GoldMemory; this time, it reported errors – thousands of them – almost immediately, from test no. 2 (of 711) onwards. Really strange, since nothing in the system had been changed. The only thing that I can think of having changed was that we had opened the door to the patio, allowing cool air into the house. The system being tested is about four feet from the door, so its temperature may have dropped a tad.

 

The errors reported by GoldMemory (still “Thorough” testing) were located at various memory locations; sometimes the errors started in test no. 2, at other times in test no. 5. All errors seemed to be located in the upper 16 bits of the data word, but otherwise at seemingly random bit positions. I tried swapping the RAM sticks around, without any apparent change. I then removed RAM Stick B (I marked the sticks "A" and "B" a while ago, to help keep track of what’s what) and no more errors occurred. I replaced Stick A with Stick B, which immediately resulted in errors. I swapped the RAM sticks several times, with Stick B failing consistently with hundreds or thousands of errors no later than test no. 5 each time, and Stick A not reporting errors.
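One way to check a claim like "all errors are in the upper 16 bits" is to XOR each expected value with the value actually read back: every set bit in the result is a failing bit position. A minimal sketch, assuming a 32-bit data word and a list of (expected, actual) pairs; GoldMemory's own report format is not shown here, so the sample pairs are hypothetical:

```python
def failing_bits(expected: int, actual: int, width: int = 32) -> list[int]:
    """Return the bit positions (0 = least significant) that read back wrong."""
    diff = (expected ^ actual) & ((1 << width) - 1)
    return [bit for bit in range(width) if (diff >> bit) & 1]


def all_in_upper_half(errors, width: int = 32) -> bool:
    """True if every failing bit lies in the upper half of the word."""
    half = width // 2
    return all(
        bit >= half
        for expected, actual in errors
        for bit in failing_bits(expected, actual, width)
    )


# Hypothetical (expected, actual) pairs as a memory tester might report them
errors = [(0xFFFFFFFF, 0xFFFBFFFF), (0x00000000, 0x20000000)]
for expected, actual in errors:
    print(hex(expected), hex(actual), failing_bits(expected, actual))
print("all in upper 16 bits:", all_in_upper_half(errors))
```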

 

I ran memtest86 on Stick B, and it failed quickly. It had passed previously (see my earlier posts).

 

To rule out a motherboard error I ran GoldMemory "Thorough" testing on Stick A, and got 10 passes without errors (this took almost 26 hours). Now tonight I replaced Stick A with Stick B, expecting a flood of errors again. But – no errors!!! :confused:

 

I opened the patio door again, let the system cool down, and restarted the test – no errors on Stick B. :confused: :confused:

Borrowed the wife’s hair dryer, heated up the motherboard and RAM, and restarted the test – no errors. :confused: :confused: :confused:

 

IF I ABSOLUTELY MUST HAVE ERRORS, I WANT THEM CONSISTENTLY, DAMMIT!!! :mad::mad::mad:

 

I’ve worked with computers and other electronics professionally, and in a technical capacity, for over 20 years, and I thought I’d seen it all: PCB traces with hairline fractures, mismatched thermal characteristics, cache defects, bad decoupling, ground loops, crosstalk, short circuits (including some caused by mouse turds - don’t ask), you name it. But this I cannot explain.

 

Usually, electronics either work, or they don’t. In my experience, they rarely (if ever) work, then don’t work, then work again, given identical conditions. But this is what seems to be happening here.

 

What the heck is going on?!

 

I’ve gone back to dual channel (Stick A in slot 2, Stick B in slot 4) and restarted GoldMemory’s Thorough test. I’ll let it run for at least 24 hours. I’m not sure I’ll be able to conclude anything logically on the basis of that, but I’m running out of ideas about what else to do.

 

Again, any ideas on how to isolate the problem are welcome.
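With errors that come and go, a written record of every run can make patterns easier to spot than memory alone. A minimal sketch of such a tally, assuming each run is logged with the stick(s) installed, a free-form condition note and a pass/fail result; the records below are hypothetical, and with this few runs the rates are hints rather than proof:

```python
from collections import defaultdict
from typing import NamedTuple


class Run(NamedTuple):
    stick: str      # stick(s) installed, e.g. "A", "B" or "A+B"
    condition: str  # hypothetical note, e.g. "normal", "door open", "warm"
    passed: bool    # True if the run completed without errors


def failure_rate(runs, field):
    """Fraction of failed runs for each distinct value of the given field."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for run in runs:
        key = getattr(run, field)
        totals[key] += 1
        if not run.passed:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}


runs = [
    Run("A", "normal", True),
    Run("B", "normal", False),
    Run("B", "door open", True),
    Run("B", "warm", True),
    Run("A+B", "normal", False),
]
print(failure_rate(runs, "stick"))
print(failure_rate(runs, "condition"))
```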


Ah! The joys of living on the edge!

 

I would clean the edges of the RAM with 90% or higher isopropyl alcohol and a lint free cloth.

 


Ah! The joys of living on the edge!

 

Lol, not much living, more like hanging on to the edge by the fingernails!

 

I would clean the edges of the RAM with 90% or higher isopropyl alcohol and a lint free cloth.

 

Good idea, ExRoadie, many thanks. I'll do that when I get home. (Ahem, I'm at work now.) So far, I've been the one needing the alcohol most :). (Not isopropyl, though, in case you were wondering.) I should've thought of that myself, but it gets so that you can't see the wood for the trees. So it's great that there are some very helpful and knowledgeable people in these forums.

 

The dual channel test I had going crashed at some point, with a colourful pattern in diagonal stripes across the screen and blinking blocks - probably the GoldMemory version of the screwed-up display I saw earlier with memtest86.

 

Last night I tested the RAM sticks again individually, and Stick B persisted with its "now you see me, now you don't" errors. But tonight I'll buy some isopropyl alcohol at the chemist's (drugstore, to North Americans), give both sticks a good swab and see if that helps.
