OzSnoal

Cold Boot? Memory Died? [email protected] = who you need to contact


@Oscar_Wu

 

How is RAM detected by the BIOS? It looks to me that this is the crux of the problem.

 

You are looking at the SPD (serial presence detect, a serial EEPROM on the module that holds its details: timings, voltage, and amount of RAM), since we know, and you have seen for yourself, that the RAM itself is working fine.

 

So, we can safely say the cause is elsewhere, perhaps in how you determine that RAM is present. Since the SPD is a chip all by itself, what will happen if this chip is not working correctly?

 

My Suggestions:

(1) Unsolder the SPD chip from a working module of the same brand and model, solder it onto the 'bad' module, and see what happens. This will prove whether the SPD chip itself is the problem.

(2) You can modify the BIOS code to anything you like while testing :-). What happens if you hardcode (just for testing) the correct settings for the 'bad' DIMM in the BIOS code, so that you do not even try to access the SPD and just assume the correct values for that DIMM? Does it then pass all tests?

(3) Again, for debugging, you could save the values being sent by the SPD into a memory area to be read out later. In this way you could see if you are receiving corrupted data...

 

I will use stick 1 as an example; this stick needs a very high voltage to work at boot.

After a lot of trial and error and tracing through the BIOS, this is what I found:

1. The SPD on it reads back the same correct information as on a good stick.

2. The whole SPD memory detection sequence completes. That is, the BIOS detects that memory is present on the memory bus and finishes setting all the timing parameters in the K8 registers.

3. Once all that is done, the first thing the BIOS does is check whether the memory can pass a read/write test of the first 256K. If it fails, the BIOS jumps to the beep sequence; that is why you hear BEEP, BEEP when there is no memory or the memory is bad ...

Here is a code example to show how this works.

After the SPD timing detection is completed, we do something like this:

 

=========================================================

xor eax,eax ;test pattern = 00000000h
xor dx,dx ;start at segment 0000h

Next_64K_Test:
mov es,dx ;segment to test
cld ;string instructions count upward

;fill the first 32K of the segment with the pattern in eax
;(00000000h on the first pass; eax is not reset, so the pattern alternates on later segments)
mov cx,2000h ;2000h dwords = 32K to write
xor di,di
rep stosd

;fill the second 32K with the inverted pattern (FFFFFFFFh on the first pass)
not eax
mov cx,2000h
rep stosd ;di continues from 8000h

;check the first 32K memory contents
not eax ;get pattern to check
mov cx,2000h ;32K of memory to read
xor di,di
repe scasd ;stops at the first mismatch
jnz short Beep_Out ;error occurred

;Do_next64k:

;check the second 32K memory contents
not eax ;get pattern to check
mov cx,2000h ;32K of memory to read
repe scasd
jnz short Beep_Out ;error occurred

add dh,10h ;next 64K segment
cmp dh,40h ;reached the 256K address?
jne short Next_64K_Test
jmp short Ok_256K

Beep_Out:

=========================================================

We fill the first 32K of memory with the data pattern 00000000h

and the second 32K of memory with the data pattern FFFFFFFFh,

then read them back to check that the data read back equals the data written.

The loop then repeats for the 3rd, 4th, ... 32K blocks until the first 256K is completely verified.
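
For readers who do not follow 16-bit assembly, here is a rough Python model of the same fill-and-verify loop. It is only a sketch: the `Ram` class and its `stuck_high` fault map are invented for illustration, and the simulated fault (a bit at byte address 0006h that always reads as 1) is an assumption based on the symptom described for stick 1, not a dump from real hardware.

```python
import struct

SEGMENT_BYTES = 0x10000   # one real-mode segment = 64K
HALF_BYTES = 0x8000       # each pattern covers 32K (2000h dwords)
SEGMENTS = 4              # segments 0000h, 1000h, 2000h, 3000h = 256K

class Ram:
    """Toy RAM model (hypothetical). stuck_high maps byte address -> bits that always read 1."""
    def __init__(self, size, stuck_high=None):
        self.cells = bytearray(size)
        self.stuck_high = stuck_high or {}

    def write_dword(self, addr, value):
        struct.pack_into("<I", self.cells, addr, value)

    def read_dword(self, addr):
        raw = bytearray(self.cells[addr:addr + 4])
        for i in range(4):
            raw[i] |= self.stuck_high.get(addr + i, 0)
        return struct.unpack("<I", raw)[0]

def first_256k_test(ram):
    """Return the address of the first bad dword, or None if all 256K pass."""
    pattern = 0x00000000
    for seg in range(SEGMENTS):
        base = seg * SEGMENT_BYTES
        inverse = pattern ^ 0xFFFFFFFF
        halves = ((0, pattern), (HALF_BYTES, inverse))
        for start, value in halves:                      # the two "rep stosd" fills
            for off in range(start, start + HALF_BYTES, 4):
                ram.write_dword(base + off, value)
        for start, value in halves:                      # the two "scasd" checks
            for off in range(start, start + HALF_BYTES, 4):
                if ram.read_dword(base + off) != value:
                    return base + off                    # BIOS would jump to Beep_Out
        pattern = inverse   # eax is not reset in the loop, so the patterns swap

    return None

good = Ram(SEGMENTS * SEGMENT_BYTES)
bad = Ram(SEGMENTS * SEGMENT_BYTES, stuck_high={0x0006: 0x04})
good_result = first_256k_test(good)
bad_result = first_256k_test(bad)
```

With the assumed fault, the model stops at the dword starting at 0004h, which is the dword that contains byte 0006h.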

 

With stick 1, which fails to boot at SPD timings @ 2.6V, the test fails in the read-back of the 1st 32K of memory ...

 

The first 32K memory fill code

 

mov cx,2000h ;2000h dwords = 32K to write

xor di,di

rep stosd

 

repeatedly writes a dword of all 0 to memory at es:[di]. During the first 32K fill, es = 0 and di covers 0 to cx*4 - 1 = 2000h*4 - 1 = 7FFFh = 32767; 0~32767 is 32768 bytes = 32K ...

That means the memory range 0000:0000h ~ 0000:7FFFh is written to all 0 ...

 

The BIOS code then continues by filling the 2nd 32K of memory, from 0000:8000h to 0000:FFFFh, with all 1s ...
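
The address arithmetic can be double-checked with a few lines of Python. This is only a sanity check of the numbers in the post; the variable names are made up, and `linear` is the standard real-mode rule (segment times 16, plus offset).

```python
DWORD_BYTES = 4
CX = 0x2000   # dword count loaded into cx before each rep stosd

def linear(segment, offset):
    """Real-mode linear address: segment * 16 + offset."""
    return (segment << 4) + offset

# Offset ranges (first byte, last byte) covered by the two fills in one segment
first_fill = (0x0000, CX * DWORD_BYTES - 1)
second_fill = (first_fill[1] + 1, 2 * CX * DWORD_BYTES - 1)

# dh steps 00h, 10h, 20h, 30h, so dx (the segment) is 0000h, 1000h, 2000h, 3000h
segments = [dh << 8 for dh in range(0x00, 0x40, 0x10)]

# Last byte touched by the whole test
top = linear(segments[-1], second_fill[1])

print(hex(first_fill[1]), hex(second_fill[0]), hex(second_fill[1]), hex(top))
```

The last tested byte is linear address 3FFFFh, so the four segments together cover exactly 256K.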

 

After that is completed, we read back the first 32K of data and check that it is all 0 ... With stick 1, address 0000:0006h always returns 04h instead of the expected 00h, so the BIOS code jumps to the BEEP loop and stops booting ...

 

My guess is that one or two of the 16 chips on stick 1 are not working correctly @ 2.6V in the early stage; that's why we write all 0 to the first 32K but some bit apparently reads back as 1 ...
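
To see which bit is misbehaving, XOR the expected byte with the byte read back. The Python sketch below does that for the 00h/04h case. The second helper, `data_line`, is a loud assumption: it maps the address to a DQ line as if the module sat on a plain 64-bit bus with no channel or bank interleaving, which may not match how the K8 memory controller is actually configured on this board.

```python
def flipped_bits(expected, actual, width=8):
    """Bit positions (0 = least significant) where actual differs from expected."""
    diff = expected ^ actual
    return [bit for bit in range(width) if (diff >> bit) & 1]

def data_line(address, bit, bus_bytes=8):
    """DQ line index, ASSUMING a plain 64-bit bus with no interleaving (hypothetical mapping)."""
    return (address % bus_bytes) * 8 + bit

bits = flipped_bits(0x00, 0x04)       # expected 00h, read back 04h
line = data_line(0x0006, bits[0])     # only meaningful under the assumption above
```

Only bit 2 differs, which fits the guess that a single bit, rather than a whole chip, is failing at 2.6V.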

 

That's all I found here. I don't think all 16 chips on the damaged stick are damaged; maybe only 1 bit in one or two of them does not work as expected at SPD timings @ 2.6V ... That causes the BIOS to detect the DRAM as BAD and stop booting ...

 

Update :

 

New finding about stick 1's condition ... after 1 day of burn-in at 3.2V (using the 3.3V jumper), 250MHz, 2-2-5-2 ... stick 1 now requires 2.8V to be completely stable at SPD timings (3-3-8-3) ... It is getting worse than when I first got it ...

Share this post


Link to post
Share on other sites

@Oskar

Thanks for the post.

 

Now I can see how you detect bad memory.

 

I have seen this behaviour during memtesting when I first got my sticks: certain single bits (never more than one bit) got stuck. It was in the memtest routine that steps a single bit through the memory range. However, fiddling with CMOS timings eventually got the error to go away in memtest. Tref was the main culprit, along with Max Async Latency, which I used to have at 6ns and now have at 7ns.

 

Below is an important question to me:

Do you do this memory testing on 'soft' restarts as well? (I can soft restart as many times as I like) If you do memory testing on soft restarts, then the memory is getting 'jammed' into some bad state on powerup and in the case when it does not jam up, it works well (hence me being able to restart).

 

Could I ask you to try one more thing:

1. The SPD on it reads back the same correct information as on a good stick.

2. The whole SPD memory detection sequence completes. That is, the BIOS detects that memory is present on the memory bus and finishes setting all the timing parameters in the K8 registers.

3. Check whether the memory can pass the read/write test of the first 256K.

 

First, let me say that I cannot find a flaw in the way you are detecting RAM, and I am not claiming that what I am about to suggest is the correct way of doing things, but since we are testing, it is worthwhile to try 'hacks' to see if the memory can ever boot correctly.

a) When do you apply the CMOS settings to the RAM? Is it after your point '3' above? (I would think so, as you are running off SPD settings for now.)

b) My request:

As I have seen how fussy my RAM is about timings: when the RAM fails your read/write test, could you try applying your CMOS timings and rerunning the tests?

 

I put my extras in bold.

 

mov bx,0 ;added: flag, 0 = SPD timings in use
RetryTestWithNewTimings: ;added
xor eax,eax
xor dx,dx

Next_64K_Test:
mov es,dx ;segment to test
cld

;fill the first 32K of the segment with the pattern in eax (00000000h on the first pass)
mov cx,2000h ;2000h dwords = 32K to write
xor di,di
rep stosd

;fill the second 32K with the inverted pattern (FFFFFFFFh on the first pass)
not eax
mov cx,2000h
rep stosd

;check the first 32K memory contents
not eax ;get pattern to check
mov cx,2000h ;32K of memory to read
xor di,di
repe scasd
jnz short Beep_Out ;error occurred

;Do_next64k:

;check the second 32K memory contents
not eax ;get pattern to check
mov cx,2000h ;32K of memory to read
repe scasd
jnz short Beep_Out ;error occurred

add dh,10h ;next 64K segment
cmp dh,40h ;reached the 256K address?
jne short Next_64K_Test
jmp short Ok_256K

Beep_Out:
test bx,bx ;added: already retried with CMOS timings? (a plain mov would not set the flags)
jnz Beep_Out_After_CMOS_Timings ;added: yes, give up
call Apply_CMOS_Timings ;added: placeholder name for your routine that applies the CMOS timings; it must preserve bx (you could hardcode the timings instead of fetching them from CMOS)
mov bx,1 ;added: flag, 1 = CMOS timings in use
jmp short RetryTestWithNewTimings ;added

Beep_Out_After_CMOS_Timings: ;added
;your normal beep out code goes here
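
The control flow of this suggestion (test once with SPD timings, retry once with CMOS timings, beep only if both fail) can be sketched in Python as below. Everything here is hypothetical scaffolding: `post_memory_check` and `FakeBoard` are names invented for the example and stand in for the BIOS routines; none of it is real BIOS code.

```python
def post_memory_check(test_passes, apply_cmos_timings):
    """Return 'ok', or 'beep' if the test fails under both timing sets."""
    using_cmos = False                # plays the role of the bx flag
    while True:
        if test_passes():
            return "ok"               # Ok_256K
        if using_cmos:
            return "beep"             # Beep_Out_After_CMOS_Timings
        apply_cmos_timings()          # first failure: switch timings and retry
        using_cmos = True

class FakeBoard:
    """Stand-in for the hardware: the test result depends on the active timings."""
    def __init__(self, passes_with_spd, passes_with_cmos):
        self.results = {"spd": passes_with_spd, "cmos": passes_with_cmos}
        self.timings = "spd"
        self.applied = 0

    def test_passes(self):
        return self.results[self.timings]

    def apply_cmos_timings(self):
        self.timings = "cmos"
        self.applied += 1

fussy = FakeBoard(passes_with_spd=False, passes_with_cmos=True)
dead = FakeBoard(passes_with_spd=False, passes_with_cmos=False)
fussy_result = post_memory_check(fussy.test_passes, fussy.apply_cmos_timings)
dead_result = post_memory_check(dead.test_passes, dead.apply_cmos_timings)
```

A stick that only fails at SPD timings would boot on the second attempt, while a stick that fails either way still ends in the beep loop after exactly one retry.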

Share this post


Link to post
Share on other sites

1. The 256K memory test always runs, whether it is a cold boot, warm boot, warm reset, or cold reset ...

 

2. The main reason stick 1 cannot boot @ 2.6V at SPD timings is the voltage, not the timings ... I have already tried all kinds of timing combinations @ 2.6V, and address 0000:0006h still reads back the wrong value ...

 

The CMOS timings are already set within the SPD timing detection routine ... that is, before the RAM is tested ...

Share this post


Link to post
Share on other sites

Hmm.. my suggestion.... instead of doing the memory test at the SPD default voltage ... would it be possible to apply the DDR voltage from the setting we saved in the BIOS.... :)

Share this post


Link to post
Share on other sites
I don't think UTT is warranted beyond 3.2V

 

OCZ states up to 3.2V in their page. People "say" that using 3.5V is "ok"

 

I wouldn't go beyond 3.3V for 24/7 use, especially if overclocked.

 

And we know many people buy hardware to use phase change and "super overclock" their systems just to max them out and see how far they can reach. Many times it isn't really "rock solid", or maybe it is, but you can fall victim to instabilities in the near future.

 

3.5V lifetime warranty AFAIK...

Share this post


Link to post
Share on other sites

I understand what you are saying Oskar. I was suggesting timings other than SPD, just to see if that could get around the 'memory failure'.

 

How do you explain this situation, and does your stick behave in the same way?

 

I used to think that I had cold boot problems because my RAM did not have enough voltage, I have since installed the OCZ booster which means my RAM always has the voltage it needs at startup. I still have cold boot problems. But NEVER EVER have I had warm restart problems.

 

My RAM behaves like this:

1) Leave my PC off for 1 hour.

2) Turn my PC on, use it for awhile,

3) Restart machine x 100 if I like, eventually shutdown (do NOT remove AC)

(At which time the first 256K of ram is being tested at every restart and is being seen as fine)

(And I will pass memtest and OCCT)

4) Wait 30 seconds.

5) Turn my PC on - beep beep beep

6) Turn my PC off, wait 1 hour - everything will work again.

 

Something is being done differently between a cold boot and a warm restart, whether it is hardware that has been hard-wired to a certain state or software that remembers 'something' after a successful startup and between successive reboots.

Share this post


Link to post
Share on other sites

Interesting read here. I find it odd that Mushkin actually recommends this board (tested in house) and their XP4000 Redline modules running tight timings at 3.3 - 3.5 volts. They encourage you to throw volts at these modules with a little active cooling. That doesn't sound like something a memory manufacturer would do if there were possible issues with electron migration.

Share this post


Link to post
Share on other sites

hum....

 

my ex-DFI nF4 (ex, because it is dead) killed itself and killed my ex 3200+ winchi; it also seems to have killed a PC3200 Corsair XPERT memory module, but I'm not 100% sure....

 

also, a friend of mine has a DFI nF4 SLI, and his mobo killed one of his G.Skill PC4400 memory sticks

 

DFI has done nothing about it, and we both know that the problem is in the mobos... we are not newbies, and we know there is something weird with the DFI nF4 series. I hope DFI can say something official about it (not just for me or my friend; all users want to know how to prevent these problems)

 

salut

Share this post


Link to post
Share on other sites
DFI has done nothing about it, and we both know that the problem is in the mobos... we are not newbies, and we know there is something weird with the DFI nF4 series. I hope DFI can say something official about it (not just for me or my friend; all users want to know how to prevent these problems)

 

You know this for sure? You have the testing tools and equipment to verify this as fact? If so, please give us your results and we will gladly make changes based on your facts about the motherboard killing peripherals.

Share this post


Link to post
Share on other sites

 

Ahhhh, i looked in the wrong place then

I took a look at the PC3200 VX

" 400MHz DDR1 / CL 2-3-3-8 at 2.6V / CL 2-2-2-8 at 3.2V"

and forgot to read the bottom line:

* OCZ EVP® (Extended Voltage Protection) is a feature that allows performance enthusiasts to use a VDIMM of 3.5V ± 5% without invalidating their OCZ Lifetime Warranty.

 

Well, if the manufacturer warrants the memory up to 3.5V and it has been fried, IMO it's still more likely that the high voltage is killing the RAM, and the manufacturer should replace it.

 

I wouldn't go beyond 3.2-3.3V for 24/7 use.

 

PS: one thing I noticed: when I searched Newegg.com for 3.3-3.4V DDR RAM, the results were 2 Mushkin modules, which I suppose are equipped with the UTT chips. Has anyone fried these Mushkin modules with the DFI and high voltage?

 

mushkin Redline XP4000 512MB 184-Pin DDR SDRAM DDR 500 (PC 4000) Unbuffered System Memory Model 991439 - Retail

http://www.newegg.com/Product/Product.asp?...N82E16820146390

 

mushkin Redline XP4000 1GB (2 x 512MB) 184-Pin DDR SDRAM DDR 500 (PC 4000) Unbuffered Dual Channel Kit System Memory Model 991440 - Retail

http://www.newegg.com/Product/Product.asp?...N82E16820146392

 

I always thought a Vdimm higher than 3.3V on DDR "I" modules was a risk, but if the manufacturers give a warranty for that, it's their problem :D

Share this post


Link to post
Share on other sites
Guest
This topic is now closed to further replies.