sign up for an account - free!    forgot password
Username:     Password:    

Kithbridge
Connect Your Organization With New Media
  • Strategy
  • Blog Tracking
  • Custom Blog Feeds
From the creator of
The Truth Laid Bear






Hosted By



Web Design By Plum C



Ecosystem MySQL Issue

The Ecosystem and Showcase rankings are not updated today, because a periodic problem I have with my MySQL database is back.

Basically, the Ecosystem has one huge table with almost a million rows in it which stores all the links that it harvests. Every week or so recently, it simply stops working. When I try to access it through phpMyAdmin, I get:

Can't open file: 'links.MYD'. (errno: 145)

Error

SQL-query :

SHOW KEYS FROM `links`

MySQL said:

Can't open file: 'links.MYD'. (errno: 145)

Back

Thus far, I've found no way to recover from this other than to drop the table, re-add it, and do a full rescan of all webolgs for links. (A full scan happens every day anyway, so thankfully this is never really a case of data loss).

Any suggestions from MySQL gurus out there?

Anyway: unfortunately I'm not going to be able to resolve this today, so stats will be off until tomorrow...


Update 11/15: Fixed for now. Thanks to all for the suggestions.

To describe the table in question a little better, here's the creation script:

CREATE TABLE `links` (
`source_url` varchar(200) NOT NULL default '',
`source_host` varchar(200) NOT NULL default '',
`dest_url` varchar(200) NOT NULL default '',
`dest_host` varchar(200) NOT NULL default '',
`new` varchar(10) default NULL,
`blacklist` varchar(10) default NULL,
KEY `source_host` (`source_host`,`dest_host`),
KEY `source_url` (`source_url`,`dest_url`),
KEY `source_url_2` (`source_url`),
KEY `dest_url` (`dest_url`),
KEY `source_host_2` (`source_host`),
KEY `dest_host` (`dest_host`)
) TYPE=MyISAM;

I am fairly sure I am not using all those indicies at the moment and some are redundant/wasteful.

But basically, it's a very simple table, and I'm not entirely sure there's a way to rearchitect it to be more efficient. It's pretty self-explanatory: it's just a big collection of all links found during the Ecosystem's scans, including data on the source and destination URLs. The 'host' is a somewhat clumsy concept which I use to try to identify a particular blog (so all links going to a particular blog may have different dest_urls, but they should all have the same dest_host).

Anyway. I'd welcome suggestions on how to a) avoid the routine corruption problem, or b) optimize this layout.

Thanks again all...

-NZB

Trackbacks