Sunday, April 09, 2006

Wikipedia back up - recovery from power outage

Wikipedia reborn
Wikipedia is up again after several hours of downtime. I found the Wikipedia/Wikitech Server Admin Log, which provides some insight into what happened.

It seems that they had a major power failure. Even though they seem to have gotten power back fairly quickly, it took a lot more time to get all the servers up and running properly again.


Here is an excerpt from the Server Admin Log. Please note that all dates and times are in GMT, and that the entries are listed newest first.

April 10
04:26 jeluf: ixia, thistle, lomaria, db1 have broken replication settings, webster has database page corruption. Taking db2 out of rotation to create copies from it.
04:20 jeluf: mounted /home on all DB servers
04:03 brion: ran mass-correction of bad-timestamped entries on enwiki (1529 revision records)
03:05 brion: srv71-srv79 had wrong clock, apparently set to local time instead of UTC.
01:45 brion: irc feeds online. had to rescue udprec from kate's old home dir
01:38 brion: taking thistle and db1 out of rotation; broken replication.
01:32 brion: turning read_only off on adler. seems to be set to go on always on boot.
01:28 brion: things look mostly good; tried to take site read/write but someone has put adler into read-only? examining
01:23 brion: got fs-squids on the right ip. seems to work now.
01:20 brion: had to start lighty on amane
01:18 brion: trying to get fileserver squids+lvs up. (avicenna as lvs master)
01:10 brion: run-icpagent.sh didn't take previously; seems to have helped now
01:04 brion: trying to add 10.0.5.5 on dalembert also. no idea if this is correct. 10.0.5.3 works internally, but squids still don't show anything. there's no explanation for this that is obvious to me.
00:55 brion: added the lvs master ip on dalembert; http'ing to it internally seems to work, but still nothing from outside
00:49 brion: trying starting LVS monitor thingy on dalembert. no clue if it's working
00:45 brion: turning on apaches

April 9
23:45 brion: srv33, srv36 should now replicate properly.
External storage borkgage, 2006-04-09
23:20 brion: looking at srv33, srv36 external storage; jens reports replication seems borked
22:00 brion: added izwinger ip to suda; it wasn't automatic.
21:52 brion: finally got into srv1 and albert. maybe working
21:49 brion: ldap depends on dns; dns is still broken. we can't reach srv1 or albert.
21:32 brion: still trying to get some core machines online (suda booting; albert ?? srv1 ??). kyle should be available in 30 minutes
20:55 brion: bw is onsite and available to poke at machines. there was a power problem; some machines seem to still eb booting
20:42 brion: phoned kyle (message)
20:38 brion: network mostly back up, still trying to get in
19:20 brion: PowerMedium offline?
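
For the curious, here is a rough sketch of how one could work out the time span covered by the log entries above. It only does arithmetic on two timestamps taken from the excerpt: the first entry (19:20, "PowerMedium offline?") and the entry where read_only was turned off on adler (01:32 the next day). The helper name `parse_log_time` is my own invention, the year 2006 is taken from the date of this post, and treating these two entries as the start and end of the outage is an assumption on my part, not something the log states.

```python
from datetime import datetime, timezone


def parse_log_time(day: str, hhmm: str) -> datetime:
    """Combine a date ('YYYY-MM-DD') and a log entry time ('HH:MM') into a UTC datetime."""
    naive = datetime.strptime(f"{day} {hhmm}", "%Y-%m-%d %H:%M")
    # The log times are GMT, so attach the UTC time zone explicitly.
    return naive.replace(tzinfo=timezone.utc)


# Assumption: these two entries roughly bracket the outage.
first_entry = parse_log_time("2006-04-09", "19:20")  # "PowerMedium offline?"
read_write = parse_log_time("2006-04-10", "01:32")   # "turning read_only off on adler"

elapsed = read_write - first_entry
hours, remainder = divmod(int(elapsed.total_seconds()), 3600)
minutes = remainder // 60

print(f"{hours}h {minutes}m")  # prints "6h 12m"
```

That matches the "several hours" of downtime I noticed from the outside.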

By the way, none of my changes got lost, and I was able to finish my edits to the ASCII art article. Check it out.

I also created a new ASCII and a new ANSI. Yes, new ones. I created them for deviantART. Enjoy.

deviantART ANSI | deviantART ASCII

Ciao, Carsten a.k.a. Roy/SAC



...cu at dA
