This is becoming a saga. One of my new servers went offline 2 days ago. I logged in through a serial session and everything was fine apart from the networking, which was not working.
/etc/rc.d/init.d/network restart
It failed, reporting that the cable wasn’t connected. It was, and the link was live.
A hard reboot brought it back up and I kept my fingers crossed it was a one-off. I knew at the back of my mind it wasn’t, but I kept my fingers crossed anyway.
After much searching of the net I found this is a known problem across a lot of Red Hat based distributions and affects the e1000e driver.
Last night the server went down again. I found that the ELRepo repo has an allegedly fixed driver. I installed this and watched it for 2 hours.
rpm --import http://elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://elrepo.org/elrepo-release-6-4.el6.elrepo.noarch.rpm
yum update
yum install kmod-e1000e
/sbin/shutdown -r now
I then went to bed at around 2AM this morning. I was woken by the klaxon alarm on my BB telling me a server was down. Yup, the fix didn’t fix it.
Further research led me to add this option to the kernel line in /boot/grub/grub.conf:
pcie_aspm=off
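For anyone who would rather script the edit than do it by hand, here is a rough sketch of how it could be done. The function name add_aspm_off is made up for illustration, and it assumes GNU sed and a legacy GRUB config where each boot entry has a line starting with "kernel". It keeps a .bak copy first, and it skips kernel lines that already carry a pcie_aspm setting.

```shell
# Hypothetical helper (not part of any package): append pcie_aspm=off to
# every "kernel" line of a legacy GRUB config that has no pcie_aspm option yet.
# Assumes GNU sed (for -i), as shipped on Red Hat based systems.
add_aspm_off() {
    conf="$1"

    # Keep a backup so a bad edit can be undone from a rescue shell.
    cp -p "$conf" "$conf.bak"

    # On lines starting with "kernel", unless pcie_aspm= is already there,
    # replace any trailing whitespace with " pcie_aspm=off".
    sed -i '/^[[:space:]]*kernel[[:space:]]/{/pcie_aspm=/!s/[[:space:]]*$/ pcie_aspm=off/;}' "$conf"
}

# Usage, as root:
#   add_aspm_off /boot/grub/grub.conf
#   /sbin/shutdown -r now
# After the reboot, check that the running kernel picked the option up:
#   grep -o 'pcie_aspm=[a-z]*' /proc/cmdline
```

The option only takes effect after a reboot, so the check against /proc/cmdline is the quickest way to confirm the kernel really booted with it.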
Update: This worked.
Did your last attempt resolve the problem?
Forgot to write a follow-up.
This fix worked.