mc123

Members
  • Content Count

    17

About mc123

  • Rank
    New Member
  1. I described my problems to the retailer, who agreed that it sounded like a defect in one of the Ballistix sticks, so I successfully RMA'ed the 2x512MB PC3200 Ballistix kit. After all the hassle I've had, I was feeling rather sorry for myself and requested an upgrade to 2x1GB PC4000 Ballistix to cheer myself up. I'll have to pay the difference in price, of course, but I guess my mental well-being is worth it. I should be getting the new Ballistix kit within a day or two, and I'll test the living daylights out of it.
  2. OK, following ExRoadie's suggestion, I used an abrasive to gently clean the contacts on the edge connectors of both sticks of RAM (each on both sides), and then cleaned them again with isopropyl alcohol. Alas, it didn't help reduce the goofiness of Stick B. Ah well, it was worth a try. For one last check before the final decision to RMA the RAM, I tested each stick in the Asus K8N-based machine that I originally used to validate it, before it ever got near the Ultra-D. Stick B failed miserably, causing BIOS boot failures similar to those I've seen on the Ultra-D with that stick, and often caused the K8N to report that the BIOS checksum was wrong (it probably uses the RAM to calculate and/or store the checksum). In the rare cases I got the K8N to boot memtest86+ from CD, thousands of errors were reported straight away. With Stick A, on the other hand, I didn't experience errors on the K8N (at least not with the relatively limited testing I performed). Stick A ran stably all night on the Ultra-D, at stock CPU/RAM settings and with two instances of mprime, and was still going strong when I left for work. My general impression is that Stick B now fails more consistently, and more severely. The intermittent failures I experienced in the early phases of testing, with long periods of working OK in-between, rather confused the picture and threw me off. Damn right, ExRoadie! Sad but true. Well, I'll just finish off testing stability with Stick A, and if all goes well I'll get Windows installed and then RMA the RAM. I'm pretty sure I have enough evidence now to successfully pursue my RMA claim. As before, I'll give the new RAM a jolly good beating with memtest86+ - and not least GoldMemory - before accepting it as stable. I'll post my findings - good or bad - with the new RAM here.
  3. Sorry, I should have made it clear that the temps I listed were the highest recorded by MBM5 when I was trying to find the max. stable CPU overclock, with two instances of SP2004/Prime95 running over a period of several hours. At idle (but still with OC'ed settings) the average/max temps reported by MBM5 are typically (degrees C): CPU: 32 avg, 36 max; PWMIC: 36 avg, 40 max; NF4: 48-49, pretty constant; HDD: 29, pretty constant. Ambient temp is 22-24 deg C. But yes, if the PWMIC had been 74 deg C at idle, it would have been much too high, and could well have gotten close to its max. rating of 110 deg C at load. Then I would have earned the title of DFI Global Warmer - as it is, I'm still just a rookie.
  4. That's the first time somebody has advised me to use an eraser to clear the RAM. I'll have to dig around to find a suitable eraser - these days, erasers are usually of the white, rubbery type, and 2HBs are near-extinct - but I'll certainly give it a try, without making too much gold dust. Thanks for the tip, ExRoadie.
  5. OK, now I've decided to RMA the RAM. Although cleaning the edge connectors was an excellent idea, it didn't help; but now both GoldMemory and memtest86 often fail on Stick B. Not quite consistently, but often enough for me to collect the evidence to show that Stick B is defective. Stick A is running without problems at 2-2-2-6 1T: I'm actually writing this in Knoppix, with two instances of mprime (the Linux equivalent of Prime95) running at the same time. I haven't been able to reach this level of stability with Stick B in the system. It rather bugs me that Stick B doesn't fail consistently, but I can't think of anything else to do to support or refute the contention that Stick B actually is bad. If the system stays mprime stable for 24 hours with Stick A alone, I guess I'll reinstall Windows and all the apps later in the week, Ghost the partition to a DVD (again), just in case the partition I first Ghosted had been trashed, and RMA both sticks of RAM next week (if I RMA the RAM tomorrow, I most likely won't get the replacement RAM before early next week anyway). This will be the second time I've RMA'ed Ballistix because of a defective module - nevertheless, the Ballistix modules that have worked have been excellent, and I'm willing to give them another chance. But one more strike, and they'll be out for good and instead I'll probably go for Mushkin Redline or some flavour of OCZ. I'll let you know how things pan out with the replacement RAM when I've tested it. And many thanks to Sharp, ExRoadie and -CsA-TAZ for their advice and encouragement; it is very much appreciated. So I guess it's so long for now...
  6. Lol, not much living, more like hanging on to the edge by the fingernails! Good idea, ExRoadie, many thanks. I'll do that when I get home. (Ahem, I'm at work now.) So far, I've been the one needing the alcohol most. (Not isopropyl, though, in case you were wondering.) Should've thought of that myself, but it gets so that you can't see the wood for the trees. So it's great that there are some very helpful and knowledgeable people in these forums. The dual channel test I had going crashed at some point, with a colourful pattern in diagonal stripes across the screen and blinking blocks - probably the GoldMemory version of the screwed-up display I saw earlier with memtest86. Last night I tested the RAM sticks again individually, and Stick B persisted with its "now you see me, now you don't" errors. But tonight I'll buy some isopropyl alcohol at the chemist's (drugstore, to North Americans), give both sticks a good swab and see if that helps.
  7. My GoldMemory registration came through at last, by which time my system had gotten through 19 passes of the GoldMemory Quick test without errors. Armed with the registered version 6.92 of GoldMemory, I rebooted and fired up the “Thorough” test. The system hung three quarters through the 4th pass (corresponding to about 10-12 hours’ testing) . Perplexed, I restarted GoldMemory; this time, it reported errors – thousands of them – almost immediately, from test no. 2 (of 711) onwards. Really strange, since nothing in the system had been changed. The only thing that I can think of having changed was that we had opened the door to the patio, allowing cool air into the house. The system being tested is about four feet from the door, so its temperature may have dropped a tad. The errors reported by GoldMemory (still “Thorough” testing) were located at various memory locations; sometimes the errors started in test no. 2, at other times in test no. 5. All errors seemed to be located in the upper 16 bits of the data word, but otherwise at seemingly random bit positions. I tried swapping the RAM sticks around, without any apparent change. I then removed RAM Stick B (I marked the sticks "A" and "B" a while ago, to help keep track of what’s what) and no more errors occurred. I replaced Stick A with Stick B, which immediately resulted in errors. I swapped the RAM sticks several times, with Stick B failing consistently with hundreds or thousands of errors no later than test no. 5 each time, and Stick A not reporting errors. I ran memtest86 on Stick B, and it failed quickly. It had passed previously (see my earlier posts). To rule out a motherboard error I ran GoldMemory "Thorough" testing on Stick A, and got 10 passes without errors (this took almost 26 hours). Now tonight I replaced Stick A with Stick B, expecting a flood of errors again. But – no errors!!! :confused: I opened the patio door again, let the system cool down, and restarted the test – no errors on Stick B. 
:confused: :confused: Borrowed the wife’s hair dryer, heated up the motherboard and RAM, and restarted the test – no errors. :confused: :confused: :confused: IF I ABSOLUTELY MUST HAVE ERRORS, I WANT THEM CONSISTENTLY, DAMMIT!!! :mad::mad::mad: I’ve worked with computers and other electronics professionally, and in a technical capacity, for over 20 years, and I thought I’d seen it all: PCB traces with hairline fractures, mismatched thermal characteristics, cache defects, bad decoupling, ground loops, crosstalk, short circuits (including some caused by mouse turds - don’t ask), you name it. But this I cannot explain. Usually, electronics either work, or they don’t. In my experience, they rarely (if ever) work, then don’t work, then work again, given identical conditions. But this is what seems to be happening here. What the heck is going on?! I’ve gone back to dual channel (Stick A in slot 2, Stick B in slot 4) and restarted GoldMemory’s Thorough test. I’ll let it run for at least 24 hours. I’m not sure I’ll be able to logically conclude anything on the basis of that, but I’m running out of ideas what else to do. Again, any ideas on how to isolate the problem are welcome.
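The observation in the post above, that all errors seemed to sit in the upper 16 bits of the data word, is the kind of thing that can be checked mechanically. Here is a minimal Python sketch, assuming a 32-bit data word. The sample (expected, actual) pairs are made up for illustration and are not taken from the actual GoldMemory reports, and the function names are hypothetical.

```python
def failing_bits(expected: int, actual: int, width: int = 32) -> list[int]:
    """Return the bit positions (0 = least significant) where two words differ."""
    diff = (expected ^ actual) & ((1 << width) - 1)
    return [bit for bit in range(width) if (diff >> bit) & 1]


def all_in_upper_half(errors, width: int = 32) -> bool:
    """True if every failing bit of every error lies in the upper half of the word."""
    half = width // 2
    return all(
        bit >= half
        for expected, actual in errors
        for bit in failing_bits(expected, actual, width)
    )


# Hypothetical sample data, NOT copied from any real test log.
sample_errors = [
    (0xFFFFFFFF, 0xFFFBFFFF),  # one bit dropped in the upper half
    (0x00000000, 0x20010000),  # two bits stuck high in the upper half
    (0xAAAAAAAA, 0xAAABAAAA),  # one bit flipped in the upper half
]

if __name__ == "__main__":
    for expected, actual in sample_errors:
        print(f"{expected:08X} -> {actual:08X}: bits {failing_bits(expected, actual)}")
    print("all errors in upper 16 bits:", all_in_upper_half(sample_errors))
```

If errors reliably cluster in one half of the word, that would point towards particular chips or data lines on the stick rather than something system-wide, though with a stick that fails this intermittently it is only a hint.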
  8. Hello Sharp, Yes, I've been using your settings since you proposed them (but had to increase Vdimm to 2.8V to get RAM to run at all in dual channel mode). As for heat: When I tried to find the max. OC for the CPU, I saved MBM5's logs (at least those where the system had run stably for more than a few minutes) for future reference. I've had a quick look through these, and the following temps are the absolute maximum recorded, each from a different test: CPU: 45 deg C; PWMIC: 74 deg C; NF4: 52 deg C; HDD: 32 deg C. Typically, though, the max temps at the settings in my sig would be CPU: 45, PWMIC: 65, NF4: 50, HDD: 30, give or take a degree. And this is with the LDT and chipset voltages maxed out. I achieved the PWMIC temp of 74 degrees at the "mad overclock" settings of 253x11 @ 1.695V (nominal Vcore, i.e. set in the BIOS, not reported). The system actually ran stably with 2 instances of Prime95 or SP2004 (I don't remember which) for 8 1/2 hours, when I stopped them. But I felt that the PWMIC temp was higher than I liked, and that I'd rather run at a lower Vcore (although at 43 deg C the CPU temp was fine). The single stick of RAM was on the 1:2 divider. For a laugh, though, I did experiment with putting an 80 mm fan over the PWMICs: that lowered the temps to 65 deg C or so (from 74 deg), but since I was going to a lower Vcore anyway the fan never became a fixture. When I started finding the max. RAM OC I put a 120mm fan over the Ballistix instead (just in case), and it's been there since. 
Judging from what others have written, in this forum for example, I guess that the "non-mad-overclocking" temps are pretty much near-average, thanks to water cooling and my plethora of fans: * 1x120mm rear exhaust * 1x80mm top exhaust * 2x120mm front intake (on the two lower drive cages) * 2x120mm, sucking in through the radiator at the bottom of the case (and blowing on the X800XL) * 1x120mm blowing on the Ballistix (mounted using a slot cover, a great idea I came across on one of the forums here - I don't remember the "inventor", but thanks to whoever it was!) * Finally, a VF700-Cu on the X800XL (and the stock chipset fan on the NF4). It's not as noisy as you'd think. It's a positive-pressure case, all right, but the Stacker's good ventilation helps the air escape, and thus aids airflow. Update: 73% of pass 7 of GoldMemory complete, no errors.
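For anyone not used to the "253x11" shorthand in the post above: it is the reference clock in MHz times the CPU multiplier. A small sketch of the arithmetic follows. It assumes, as a simplification, that the 1:2 divider simply puts the RAM at half the reference clock; real hardware may round the memory clock slightly differently. The function names are purely illustrative.

```python
from fractions import Fraction


def cpu_clock_mhz(ref_clock_mhz: int, multiplier: int) -> int:
    """CPU core clock = reference clock x CPU multiplier."""
    return ref_clock_mhz * multiplier


def ram_clock_mhz(ref_clock_mhz: int, divider: Fraction) -> float:
    """Simplified RAM clock: reference clock scaled by the memory divider.

    Assumption: ignores any integer-divider rounding done by the hardware.
    """
    return float(ref_clock_mhz * divider)


if __name__ == "__main__":
    one_to_two = Fraction(1, 2)
    print("CPU at 253x11:", cpu_clock_mhz(253, 11), "MHz")
    print("RAM at 253 MHz on 1:2:", ram_clock_mhz(253, one_to_two), "MHz")
    print("RAM at 200 MHz on 1:2:", ram_clock_mhz(200, one_to_two), "MHz")
```

So the "mad overclock" works out to a 2783 MHz core clock with the single stick idling along at roughly 126.5 MHz under this simplification, i.e. the RAM was well out of the picture while the CPU limit was being probed.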
  9. It's good to know I'm not alone on this; with problems like these it's easy to feel lonely. I got back from my trip to find memtest all screwed up: just a blue screen with a blinking yellow block - vaguely reminiscent of memtest's screen, but no characters, and it looks as if the screen resolution had been set to 320x200 or maybe lower. There's no way of knowing whether it crashed after 2 hours or 2 days. So I rebooted, and memtest ran 34 passes without error. Then, for better test coverage (memtest reserves an area of RAM for its own use), I swapped the RAM sticks (from slot 2 to slot 4, and vice-versa) and ran 44 passes without error. (Call me paranoid, but I don't consider RAM "memtest stable" unless it can make at least 32 passes without errors.) Up till this point, I still had both the Seasonic and the Fortron attached; but having the Fortron didn't seem to contribute to (or detract from) stability, so I removed it and left the Seasonic to power everything, as originally. More of the paranoid stuff: Last night the RAM ran 300 passes of Windows Memory Diagnostic's standard test, and 3 passes of the extended tests, all without errors. After having read favourable mentions of GoldMemory, I started running that this morning (quick test, I'm still waiting for the registration to complete). 3 passes without errors so far. All this has been with the settings kindly proposed by Sharp (see post no. 4 in this thread), except that Vdimm has been raised to 2.8V. If GoldMemory hasn't detected errors by this evening, I'll have to conclude that the RAM (and the PSU) are OK, and turn my attention to the board and CPU. Any ideas on how to isolate the problem further would be welcome. I'm beginning to fear the worst: that all my components are error-free individually, but can't be combined into a stable system. :sad:
  10. All right, the second stick of RAM got through 57 passes of memtest without errors. So both sticks still memtest OK, individually. Back to dual channel, then. I put the first stick into slot 4 and restarted memtest. It failed in test #8 after a few passes - this can usually be cured (on my rig) by raising Vdimm a tad, but I couldn't enter the BIOS setup - it would either freeze in the first screen as I've described earlier or just show a blinking cursor instead of the blue setup screen. Dual channel is becoming a major pain. So I removed the 2nd stick again, got into the BIOS setup without mishap, raised the Vdimm by 0.1V to 2.8V, saved, powered down, put the 2nd stick back into slot 4 and powered up. Memtest is now chugging along (for now, anyway). I have to go away on business for a couple of days (I'll leave memtest running in the meantime), but I'll post an update when I get back. Watch this space! (If you're at all interested.)
  11. @CsA-TAZ Thanks for wishing me luck, I think I'll need it! By the way, I've never seen it mentioned in these forums but memtest86 v1.60 (and possibly earlier versions too) can also display detailed RAM timings by selecting "Advanced Options" and then "A64 options" (I may have the titles wrong but the menu selections are 9 and 5, respectively). @ExRoadie I agree. But right now, even though memtest and other low-level stuff runs fine, I can't even find stock settings that are stable in Knoppix (let alone Windows). My system seems stable with the CPU at stock, and a single stick of RAM at 100 MHz (1:2 divider) - I haven't done any long-term testing with that configuration - but such a low-performance setup isn't very satisfactory as a long-term solution. I would be rather happier if I could just get the system stable with dual-channel RAM and CPU both at stock, and I'd work from there.
  12. Status so far: The first stick ran memtest overnight (50 passes) without errors, at Sharp's settings. The second stick is running now. @ExRoadie Thanks, but I am aware of that: that's why I Ghosted the original Windows installation to DVD, and am running Knoppix and other stuff from CD. That's one thing you learn after building the first few PCs. I did start on the "reinstall Windows" path - to rule out that the Ghost image contained something that had been trashed - but then ran into the same problem that other forum members have seen when trying to install Windows. So I chose to use the Knoppix live CD to perform an additional level of stability testing (after memtest and/or the Windows memory diagnostics program), before risking a dodgy Windows installation.
  13. @ExRoadie I couldn't agree with you more - I try not to assume anything. One stick is being memtested (at Sharp's suggested timings) as I write this; 18 passes so far, no errors (4 1/2 hours' testing). The other will go in tomorrow morning. And the guide was where I started - please don't ask me to start from scratch! (Breaks down in tears) @kuniva For the reasons I've stated in my first post, RMA'ing will be my last resort. Without hard evidence that a particular component is actually defective, chances are the retailer will simply return it to me (and charge me for his time used to check it, and for packing & postage).
  14. - Sharp: Yes, I have tested them individually, but it's a while ago now. But I did a successful memtest on both sticks (dual channel) only a couple of days ago, and I would expect both sticks to be OK. Do you have any experience of RAM sticks passing memtest together, but failing individually? It wouldn't seem logical, but reading these forums it seems that logic sometimes has its limitations, at least at the level we puny mortals are able to apply it. -CsA-TAZ: The SPD isn't quite correct: recently, Crucial's webpage has stated the rating of the Ballistix as 2-2-2-6. I'm pretty sure I've tested the timings you've proposed somewhere along the way, but I'll give them a try anyhow. Thanks to you both for your input.
  15. OK, the continued story: I left the CMOS to clear for 10-15 minutes (having done all the power, battery and jumper stuff), but when I tried to start up again I had the previously-seen BIOS-booting problem again. This time I noted that two of the diagnostic LEDs were on, indicating that the graphics adapter hadn't been detected (certainly, the LCD screen detected no video signal). For kicks, I powered down, removed the second stick of RAM (leaving one stick in slot 2), powered up and this time 3 LEDs were on, indicating RAM not detected. Odd. I've been through this scenario before, but Sharp's comment about the PSU started me thinking, and I recalled that the Stacker comes with what you might call a jumpstart extension cable that allows two PSUs to be started at the same time. So I got that installed along with a second PSU - an old 300W, 20-pin Fortron - so that the Seasonic powered the Ultra-D and a single optical drive, and the Fortron powered the fans, HDD, water pump etc. This setup didn't alleviate the startup problem either, so I left the CMOS to clear again, this time all night (I needed some sleep anyhow). This morning, I still had the startup problem, but eventually defeated it, loaded optimised defaults, rebooted, and entered and saved the settings proposed by Sharp. Unfortunately, these settings weren't stable either: - The BIOS sometimes hangs in the first screen, just after displaying the "Main Processor" line (before displaying the memory size); - Knoppix hangs while it's loading files, or shortly after the graphical desktop is started; - Windows gets stuck in the boot-selection screen, or in the copyright screen (with the moving bar thingy), or reboots from there; or BSODs (PAGE_FAULT_IN_NONPAGED_AREA). (All this is still with only one stick of RAM.) This is pretty much par for the course. 
Thinking back, it's my overall impression that I've had more startup problems (BIOS hanging in the first screen or not starting at all, with LEDs indicating that RAM or graphics haven't been detected) with the CPU at stock or near-stock settings than at the 304x9 CPU overclock (usually with RAM at 195 or 210 MHz, depending on the divider). Hang on a sec... (...goes off and reboots the PC several times...) Hm, hanging in the BIOS seems to be much more prevalent at these stock settings - say, one BIOS hang in 3-4 reboots - than at the 304x9 overclock that I've been using for much of the past several weeks, which would hang in the BIOS once for maybe every 50 reboots (give or take a couple of dozen either way), at least an order of magnitude less often. This doesn't necessarily have to have anything to do with the settings themselves: I shifted from OC'ed to stock settings because of the problems I was having - if the problems now seem worse, it could (imaginably) be because I have one or more components that are gradually degrading, rather than because the stock settings are bad as such. I'll have to do some more experimenting to confirm or reject this hypothesis. Here's what I'll try: a) Go back to single PSU (Seasonic) b) Go back to dual channel (no big hopes for improvement here, but you never know...) c) Try 304x9 CPU overclock, with RAM at 195 or 210 MHz, keeping the settings proposed by Sharp. I'll report my findings here. Sharp, thank you very much for your efforts so far. If you - or anyone else - have any more ideas, well, just keep 'em coming, they'll all be very much appreciated. Many thanks in advance.
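To put rough numbers on the comparison in the post above (one BIOS hang in 3-4 reboots at stock versus one in roughly 50 at the 304x9 overclock), here is a quick Python sketch. It uses only the eyeballed estimates from the post, not measured data, and the names are made up for illustration.

```python
from fractions import Fraction


def hang_rate(hangs: int, reboots: int) -> Fraction:
    """Fraction of reboots that ended in a BIOS hang."""
    return Fraction(hangs, reboots)


# Eyeballed estimates from the post, not measurements.
stock_low = hang_rate(1, 4)     # one hang in 4 reboots
stock_high = hang_rate(1, 3)    # one hang in 3 reboots
overclocked = hang_rate(1, 50)  # one hang in about 50 reboots

# How many times more often the BIOS hangs at stock than when overclocked.
ratio_low = stock_low / overclocked
ratio_high = stock_high / overclocked

if __name__ == "__main__":
    print(f"stock hangs {float(ratio_low):.1f}x to {float(ratio_high):.1f}x more often")
```

With these figures the stock settings hang somewhere between 12.5 and about 16.7 times as often, which is where the "order of magnitude" comes from. Since the 1-in-50 figure is itself a loose estimate, a proper comparison would need the reboots and hangs actually counted for each configuration.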