Diagnosing memory errors with IPMI - Connect IT Community | Kaseya
<main> <article class="userContent"> <h2 data-id="summary"><strong>SUMMARY</strong></h2> <p>Diagnosing memory errors with IPMI</p>
<h2 data-id="issue"><strong>ISSUE</strong></h2> <p>Newer Unitrends DPU platforms use IPMI firmware, which can log memory errors. Examples of these platforms:</p>
<table border="0" cellpadding="0" style="border-spacing: 0px;"><tbody><tr><td colspan="1" rowspan="1" valign="top" style="width: 246px;"> <p>Recovery-712</p> <p>Recovery-713</p> <p>Recovery-813</p> <p>Recovery-822</p> </td> <td colspan="1" rowspan="1" valign="top" style="width: 246px;"> <p>Recovery-823</p> <p>Recovery-833-100</p> <p>Recovery-833-200</p> <p>Recovery-943</p> </td> </tr></tbody></table>
<p>Use IPMI commands to view memory errors in the firmware log.</p>
<h2 data-id="resolution"><strong>RESOLUTION</strong></h2>
<ol><li>Download an updated ipmiutil package. If ipmiutil-3.0.0 or later is already installed, skip to step 3. <ul><li>For CentOS 6: <pre class="code codeBlock" spellcheck="false" tabindex="0"> wget <a href="/home/leaving?allowTrusted=1&target=ftp%3A%2F%2Fftp.unitrends.com%2Fsupport%2FHotfixes%2Fipmiutil-3.0.0-1_el6.x86_64.rpm">ftp://ftp.unitrends.com/support/Hotfixes/ipmiutil-3.0.0-1_el6.x86_64.rpm</a></pre> </li> <li>For CentOS 5: <pre class="code codeBlock" spellcheck="false" tabindex="0"> wget <a href="/home/leaving?allowTrusted=1&target=ftp%3A%2F%2Fftp.unitrends.com%2Fsupport%2FHotfixes%2Fipmiutil-3.0.0-1_el5.x86_64.rpm">ftp://ftp.unitrends.com/support/Hotfixes/ipmiutil-3.0.0-1_el5.x86_64.rpm</a></pre> </li> </ul></li>
<li>Upgrade to the downloaded RPM package: <pre class="code codeBlock" spellcheck="false" tabindex="0"> rpm -U ipmiutil-3.0.0*.rpm</pre> </li>
<li>Look for any recent memory events in the system event log: <pre class="code codeBlock" spellcheck="false" tabindex="0"> ipmiutil sel -e </pre> </li> </ol>
<p>Below is sample output of a CPLD error, which is usually caused by a memory fault.</p>
<pre class="code codeBlock" spellcheck="false" tabindex="0">RecId Date/Time_______ SEV Src_ Evt_Type___ Sens# Evt_detail - Trig [Evt_data]
000a 04/10/13 15:03:41 CRT BMC #ff CPLD CATERR Asserted 6f [a0 1c ff]</pre>
<p>Below is sample output of a memory ECC error. When this event appears, run an offline memory test and require a minimum of four clean passes.</p>
<pre class="code codeBlock" spellcheck="false" tabindex="0">RecId Date/Time_______ SEV Src_ Evt_Type___ Sens# Evt_detail - Trig [Evt_data]
7840 08/09/11 15:10:47 MIN BMC Memory #08 Uncorrectable ECC, DIMM6/CPU1 6f [20 ff 10]</pre>
<p>The DIMM identification should be more accurate and easier to interpret in ipmiutil 3.0.0, as shown below. The error shown is typically not a memory fault but rather bad data being passed to memory. Review the operating system logs (messages), dmesg output, and other application logs (/usr/bp/logs.dir) to determine the source of these errors.</p>
<pre class="code codeBlock" spellcheck="false" tabindex="0">ipmiutil ver 3.00
ievents version 3.00
RecId Date/Time_______ SEV Src_ Evt_Type___ Sens# Evt_detail - Trig [Evt_data]
7840 08/09/11 15:10:47 MIN BMC Memory #08 Correctable ECC, P1_DIMMF1 6f [20 ff 50]</pre>
<p>CPLD events are not DIMM-specific. An ECC error event, however, may identify the faulty DIMM; if it does, replace the specified DIMM.</p>
<h2 data-id="cause"><strong>CAUSE</strong></h2>
<p>The BIOS detects a memory error, either through ECC or through the CPLD, and logs it to the IPMI firmware system event log (SEL).</p>
<h2 data-id="notes"><strong>NOTES</strong></h2> <p>See <a rel="nofollow" href="/home/leaving?allowTrusted=1&target=http%3A%2F%2Fipmiutil.sourceforge.net">http://ipmiutil.sourceforge.net</a> for a UserGuide and other files.<br>For more information, see <a rel="nofollow" href="/home/leaving?allowTrusted=1&target=https%3A%2F%2Funitrends-support.zendesk.com%2Fhc%2Fen-us%2Farticles%2F360013151657">Using IPMI LAN for remote access</a>.</p> </article> </main>
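<p>When the SEL is long, it can help to filter the <code>ipmiutil sel -e</code> output for the CPLD and ECC events described in the article above. The following is a minimal sketch, not part of ipmiutil or any Unitrends tooling. It assumes record lines follow the layout of the sample output in this article (a four-digit hexadecimal record ID, then date, time and severity), and the names <code>memory_events</code>, <code>RECORD_RE</code> and <code>SAMPLE</code> are hypothetical.</p>

```python
import re

# Assumption: a record line starts with a 4-hex-digit record ID, an MM/DD/YY
# date, an HH:MM:SS time and a severity code, as in the article's samples.
RECORD_RE = re.compile(
    r"^\s*(?P<rec_id>[0-9a-fA-F]{4})\s+"
    r"(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<sev>\S+)\s+(?P<rest>.*)$"
)

# Assumption: the DIMM label is the token that follows "ECC," in the detail text.
DIMM_RE = re.compile(r"ECC,\s*(\S+)")


def memory_events(sel_text):
    """Return the CPLD CATERR and ECC records found in SEL text."""
    events = []
    for line in sel_text.splitlines():
        match = RECORD_RE.match(line)
        if match is None:
            continue  # banner, column header, or blank line
        rest = match.group("rest")
        if "CATERR" in rest:
            kind = "CPLD CATERR"
        elif "Uncorrectable ECC" in rest:
            kind = "Uncorrectable ECC"
        elif "Correctable ECC" in rest:
            kind = "Correctable ECC"
        else:
            continue  # not one of the memory-related events covered here
        dimm = DIMM_RE.search(rest)
        events.append({
            "rec_id": match.group("rec_id"),
            "date": match.group("date"),
            "time": match.group("time"),
            "sev": match.group("sev"),
            "kind": kind,
            # CPLD events are not DIMM-specific, so this stays None for them.
            "dimm": dimm.group(1) if dimm else None,
        })
    return events


# The three sample records quoted in the article, used as test input.
SAMPLE = """\
ipmiutil ver 3.00
RecId Date/Time_______ SEV Src_ Evt_Type___ Sens# Evt_detail - Trig [Evt_data]
000a 04/10/13 15:03:41 CRT BMC #ff CPLD CATERR Asserted 6f [a0 1c ff]
7840 08/09/11 15:10:47 MIN BMC Memory #08 Uncorrectable ECC, DIMM6/CPU1 6f [20 ff 10]
7840 08/09/11 15:10:47 MIN BMC Memory #08 Correctable ECC, P1_DIMMF1 6f [20 ff 50]
"""

for event in memory_events(SAMPLE):
    print(event["rec_id"], event["date"], event["kind"], event["dimm"])
```

<p>To use it on a live system, save the output of <code>ipmiutil sel -e</code> to a file and pass the file contents to <code>memory_events</code>. If a different ipmiutil version formats record lines differently, the regular expressions will need adjusting.</p>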