The MySQL DBA Feed Resource

August 20, 2008

Xaprb

How to unit-test code that interacts with a database
I got some interesting comments on my previous article about unit testing Maatkit, including echoes of my own conversion to the unit-testing religion. One of the objections I've heard a lot about unit-testing is how it's impossible to test code that talks to a database. "It's too hard," ...

August 20, 2008 12:50 AM

August 19, 2008

Xaprb

Microsoft gets their way with so-called XML standard
It has all played itself out according to Microsoft's wishes. They have railroaded through a so-called standard for document representation, gotten it rubber-stamped by so-called standards bodies, and fought their way past all the objections of sensible people and companies. In the process, lots of developing nations have ...

August 19, 2008 11:11 PM

Pythian

Announcement: The Pythian Group and Open Query: Partners


I’d like to share some great news — The Pythian Group and Open Query have become partners!

Open Query is a leading provider of high-quality MySQL, PostgreSQL and related training in Australia and New Zealand. They offer consulting services too, and are also known for their MySQL Graph Storage Engine. Feel free to browse the Open Query website for more info.

Open Query was founded by Arjen Lentz, who was employee number 25 at MySQL AB. If you follow the MySQL community then I’m sure you already read Arjen’s blog.

Since you’re reading this blog, I guess you probably already know what Pythian does, but if you want to learn more, please click through to our home page.

Together with Open Query, we are going to extend our service offerings and strengthen our positions in outsourced database management services, consulting, and training.

by Alex Gorbachev at August 19, 2008 09:07 PM

Scott Noyes

Two for the price of one

Haven’t done any sneaky puzzles in a while. How would you accomplish this?

mysql> CREATE TABLE t1 (id int);
Query OK, 0 rows affected (0.08 sec)

mysql> INSERT INTO t1 VALUES (1);
Query OK, 1 row affected (0.02 sec)

mysql> SELECT COUNT(*) FROM t1;
+----------+
| COUNT(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

by snoyes at August 19, 2008 05:20 PM

Lenz Grimmer

The key to accessing your data: MySQL Connectors and bindings for various languages

Being able to use an Open Source DBMS to manage your data is nice, but what good would it be if you can't easily access it from your applications? One key factor to the popularity of MySQL is probably its wide range of available language bindings, which started with support for C, PHP and Perl from early on.

I've tried to gather a list of languages and their respective MySQL drivers/modules below. It's by no means complete or exhaustive, but I think I covered quite a lot of popular as well as exotic programming languages.

There are a number of connectors that are developed by the Sun Database Group (aka MySQL) itself and are ready to use:

  • Connector/ODBC - Standardized database driver for Windows, Linux, Mac OS X, and Unix platforms.
  • Connector/J - Standardized database driver for Java platforms and development.
  • Connector/Net - Standardized database driver for .NET platforms and development.
  • Connector/MXJ - MBean for embedding the MySQL server in Java applications.
  • MySQL native driver for PHP - mysqlnd - The MySQL native driver for PHP is an additional, alternative way to connect from PHP 6 to the MySQL Server 4.1 or newer.
  • libmysql - The original implementation of the MySQL Client/Server protocol (in C). This library is the basis for a large number of client libraries for other languages.

In addition to the above, there are several other connectors developed by Sun/MySQL, which are still under development:

But it's not only us who develop language bindings for the MySQL server. There is an abundance of drivers that are developed and maintained by the Community, independently from Sun/MySQL (but sometimes with support or guidance from MySQL engineers). The list below is not sorted in any particular order other than the sequence in how I found them over time:

I probably forgot some other drivers/bindings - if you have any more to add, please let me know!

And if you'd like to create your own implementation for your favourite language: the protocol is documented here and here. Jan's additional notes may also be helpful to get you started.

by nospam@example.com (Lenz Grimmer) at August 19, 2008 02:31 PM

Colin Charles

Where I used to live (or how I played with Google Street View)

Where I used to live - Google Street View

This is interesting. Google’s Street View. Yes, I’ve seen a lot about it on the blogosphere, but I decided to finally try it out. The photo is of the house, where I used to live. Zooming in, now I can tell you that to the left of that, is where my dodgy landlord still lives ;)

Actually, more to the point. These pictures were definitely taken this year. I know this because I had the room in front, upstairs, and there were things sticking out between the shutters and the window. This picture is too serene, so must’ve been after November 2007.

I see good potential in Street View. Think about mashups with a site that focuses on you finding rental properties. Now people can comment on the property, look at the surrounding neighbourhood, and basically help you make a better choice at renting.

The real estate industry has moved online (in Australia, I can think of Ray White and LJ Hooker off the top of my head), but it’s not really been disrupted. No, domain.com.au isn’t disruption - look who owns it?

I was mildly surprised to find out about HomeSpace.sg from the e27 unconference I attended a few weeks back. Its focus currently is only for homes that are for sale, but they focus on the important aspects - like is it near an MRT, what kind of shopping malls are nearby, if you’re buying a property and have kids in mind, what zone to head to and so on.

They’re mashing it up with Google Maps. Pity there isn’t Street View in Singapore, huh?

Street View does 360° views as well. Nifty, if you ask me. See the surrounds. Does anyone know of a real estate disruptor in Australia, yet? Otherwise, there’s definitely room to start coding one…

by byte at August 19, 2008 07:34 AM

Lewis Cunningham

Last Week For Database Survey

This is the last week to participate in a database usage survey. If you haven't already done so, please take a few minutes to answer 25 questions.

LewisC


by LewisC (noreply@blogger.com) at August 19, 2008 08:15 AM

High Performance MySQL

Worse than DDOS

Today I worked on a rather interesting customer problem. The site was subject to what was considered a DDOS attack, and a solution was implemented to protect against it. However, in addition to banning the intruders’ IPs, it also banned the IPs of web services that the application used very actively. That caused even worse problems, because requests waiting on those web services consumed all the Apache slots. Here are a couple of interesting lessons one can learn from it.

Implement proper error control. In reality it took some time to find the issue, because there was no error reporting for the case of unavailable web services. If the log had been flooded with messages about web services being unavailable, the problem would have been much easier to find.

Use Curl. PHP has a lot of functions that can accept a URL as a parameter and just fetch the data transparently for you. However, they do not give you good error control and timeout management compared to the curl module. Use curl when possible; it is easy. You can implement your own class to fetch the required URL with a single call while having all the timeout handling and reporting your application needs. If you’re using the PHP functions, make sure default_socket_timeout has a proper value, or set it per session.

Set Curl timeouts. Set both TIMEOUT and CONNECT_TIMEOUT, as these apply to different connection stages and just setting TIMEOUT is not enough.
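To illustrate the timeout and error-reporting points above outside of PHP, here is a minimal Python sketch using only the standard library. The function name `fetch`, the logger name, and the default timeout values are assumptions made for illustration; none of them come from the customer setup described here. Also note that the read timeout below bounds each socket operation rather than the whole transfer, so it is only a rough analogue of curl's overall timeout.

```python
import http.client
import logging

log = logging.getLogger("webservice")  # hypothetical logger name


def fetch(host, port, path, connect_timeout=2.0, read_timeout=5.0):
    """Return the response body as bytes, or None if the service is unavailable.

    connect_timeout bounds the TCP connect; read_timeout bounds each later
    socket send/receive. Both values are illustrative defaults.
    """
    conn = http.client.HTTPConnection(host, port, timeout=connect_timeout)
    try:
        conn.connect()                      # limited by connect_timeout
        conn.sock.settimeout(read_timeout)  # applies to the request and response
        conn.request("GET", path)
        response = conn.getresponse()
        if response.status != 200:
            log.error("web service %s:%s%s returned HTTP %s",
                      host, port, path, response.status)
            return None
        return response.read()
    except (OSError, http.client.HTTPException) as exc:
        # Timeouts and refused connections are OSError subclasses; log them
        # so an unavailable web service shows up in the error log.
        log.error("web service %s:%s%s unavailable: %s", host, port, path, exc)
        return None
    finally:
        conn.close()
```

A caller would check for `None` and degrade gracefully instead of holding a web server slot while waiting on a service that never answers.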

Beware of the PHP sessions “files” handler. I already wrote about this topic, but when troubleshooting it takes on another angle. The default files handler means the session file is locked while a PHP request is being served. In this case, because of the network stall, a request could take 100+ seconds. Users are impatient and do not wait that long, pressing reload multiple times… which just adds to the list of requests waiting on the session file lock. This not only consumes Apache slots at a much higher pace, but also makes it harder to find what exactly is causing the lock, because most of the offending processes you can find from Apache “server-status” will just be waiting on the file to be unlocked. I used “gdb” to attach to a process showing a high number of seconds since start and find where it was stuck. If it is somewhere in the curl module (or in mysql, waiting on a long query to come back), this is the process we are looking for. If it is waiting on the session file lock, we can get that file and use fuser to see what other processes are using it. These will be either waiting on the lock or owning it, so one of them is the process we’re looking for. Things are much easier with, say, memcached session storage: it does not take any locks for parallel session use, so only the process that actually stalls waiting on an external resource will show a high number of seconds since request start.



by peter at August 19, 2008 06:55 AM

Colin Charles

YouTube, an MTV replacement

I think YouTube has arrived. MTV, Channel V, and the rest should start worrying. Why?

I have Channel V and MTV on cable TV. I mostly use cable for the news channels, and music, but lately, these music channels are showing more and more “reality TV” shows, news features and other garbage. Yes, Punkd/Jackass is funny (as is the one where a bunch of guys go on dates, do stupid things for cash), but the whole purpose of a music channel is to show me music videos…

I notice that in the States, there’s VH1, which seems to be mostly music centric. VH1 is interesting as a channel… every music video they have tacks on a little advert for Rhapsody. I’ve never had the privilege of trying Rhapsody, because they’re US only, but it’s a smart move.

Now why has YouTube arrived for me? Shitty consumer DSL link (1mbps/512kbps) is allowing me to stream music videos in realtime. If I tether a PC to the TV (MythTV maybe? Apple TV?) I’ve got an MTV replacement right there.

Playlist creation is something I think can be improved with YouTube. Maybe a mechanism to cache FLV files locally afterward (does Squid do this?), so that eventually I build up a music video library all thanks to YouTube.

All in all, ways to make cable TV more and more irrelevant (for example, I haven’t bothered with HBO after Sex & The City ended).


Can’t Smile Without You, go Barry Manilow (Live)


Look What You’ve Done - Jet

Good times ahead. Maybe I should just build that Myth box… Wonder how current Mikal+Stewart’s MythTV book still is… (and if there’s coverage on building your MTV killer).

by byte at August 19, 2008 05:26 AM

Arabx

VirtualBox, compiling Part 2

So I managed to find all the dependencies for compiling VirtualBox 1.6.4 under Ubuntu 8.04 after some trial and error, and then found the Linux build instructions to confirm them.

The build was not successful, however, and threw the following error:

kBuild: Compiling dyngen - dyngen.c
kBuild: Linking dyngen
kmk[2]: Leaving directory `/usr/local/VirtualBox-1.6.4/src/recompiler'
kmk[2]: Entering directory `/usr/local/VirtualBox-1.6.4/src/apps'
kmk[2]: pass_bldprogs: No such file or directory
kmk[2]: *** No rule to make target `pass_bldprogs'. Stop.
kmk[2]: Leaving directory `/usr/local/VirtualBox-1.6.4/src/apps'
kmk[1]: *** [pass_bldprogs_before] Error 2
kmk[1]: Leaving directory `/usr/local/Virtu

After more searching, I found I needed to add two more files manually. Read More Here.

After a long wait (compiling took 20+ minutes) and a necessary reboot, as upgraded images threw another error, I got 1.6.4 running and was able to boot a Fedora Core 9 image created under 1.5.6.

But the real test, and the need for this version was to install Intrepid.

This also failed, with a kernel panic during boot. More info: this is reported as both an Ubuntu Bug and a VirtualBox Bug.

More work still needed.

by Ronald at August 19, 2008 01:06 AM

August 18, 2008

Pythian

mysqlbinlog Tips and Tricks

So, you have a binlog. You want to find out something specific that happened inside of it. What to do? mysqlbinlog has some neat features, which I thought we would look at here.

I should first explain what mysqlbinlog really is. It is a tool that lets you analyze and view the binlogs/relaylogs from mysql, which are stored in binary format. This tool converts them to plaintext, so that they’re human-readable.

For the first tip, let’s start with the --read-from-remote-server option, which allows you to examine a binlog on a master server in order, perhaps, to dump it onto your slave and compare master/slave logs for potential problems*.

$ mysqlbinlog --read-from-remote-server -uwesterlund -p mysql-bin.000001 -h 127.0.0.1 -P 3306 | head -5
Enter password:
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#080815 19:25:23 server id 101  end_log_pos 107 	Start: binlog v 4, server v 6.0.5-alpha-log created 080815 19:25:23 at startup

Pretty useful!
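Once the binlog is plain text, the event header comments are easy to pick apart with a script. Below is a minimal Python sketch that parses a header line like the last one in the output above. The regular expression is an assumption based only on that single sample line, and the names `EVENT_HEADER` and `parse_event_header` are made up for illustration; other server versions may format these lines differently.

```python
import re
from datetime import datetime

# Pattern inferred from one sample line such as:
#   #080815 19:25:23 server id 101  end_log_pos 107  Start: binlog v 4, ...
EVENT_HEADER = re.compile(
    r"^#(?P<date>\d{6})\s+(?P<time>\d{1,2}:\d{2}:\d{2})\s+"
    r"server id\s+(?P<server_id>\d+)\s+"
    r"end_log_pos\s+(?P<end_log_pos>\d+)\s+(?P<rest>.*)$"
)


def parse_event_header(line):
    """Return a dict describing an event header line, or None for other lines."""
    match = EVENT_HEADER.match(line.rstrip("\n"))
    if match is None:
        return None
    timestamp = datetime.strptime(
        match["date"] + " " + match["time"], "%y%m%d %H:%M:%S"
    )
    return {
        "timestamp": timestamp,
        "server_id": int(match["server_id"]),
        "end_log_pos": int(match["end_log_pos"]),
        "rest": match["rest"],  # event type and details, left unparsed
    }
```

Feeding it each line of the mysqlbinlog output and skipping the `None` results gives a list of events that can be filtered by time or server id.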

Now, let’s assume we have a binlog that is 94 lines long*:

(more…)

by Nicklas Westerlund at August 18, 2008 08:48 PM

Kaj

Running with Alexander Stubb

On Saturday, I spent a couple of hours running with Alexander Stubb. No, Alex is not our newest recruit to the MySQL Support Team, he’s the Foreign Minister of Finland (the guy in the yellow T-shirt below).

Alex Stubb tying his shoelaces

But let me rewind to the beginning. I have been increasing my running to over 1200 km a year, and when I heard that the Münchner Stadtlauf half-marathon doesn’t clash with Finnish midsummer (something never to be missed), I registered for it and finished it in 1:45:58. And I calculated I would have a chance at going below four hours at a full marathon, so I registered for the Helsinki City Marathon, which took place last Saturday on 16.8.2008 in my native Finland.

At a restaurant close to the Olympic Stadium just before start, I met with my long-time MySQL colleagues Patrik Backman and Giuseppe Maxia, as well as our fresh Sun colleague Peter Eisentraut, of PostgreSQL fame. When the time for the start (fairly late, three o’clock in the afternoon) approached, I went for the start area, and was pleasantly surprised to hear the race moderator interview our Foreign Minister.

To continue my pleasant surprise, Alex said he was not only going to be the starter of the marathon, but he was going to run himself. The interview was started in Finnish, and then went on to Swedish. “What’s your target time?” the interviewer asked. “Oh, I have one, but I haven’t published it.” He evidently had the same marathon goal communication policy as I. The interviewer swapped away from our two domestic languages to foreign ones, going over English and German to French. And Alex continued smoothly and fluently in his colloquial, youthful jargon in all of the five languages. He had a targeted and clearly unscripted message in all languages. In German, for instance, Alex shared how Frank-Walter Steinmeier (Germany’s Foreign Minister) the day before had found him to be slightly mad for going for a marathon, right after a week full of intense Georgia-related negotiations.

The interview formed a very good pep talk for us runners, and an opportunity for me to no longer have to be ashamed of the linguistic disabilities of our Foreign Minister (as used to be the case when I grew up, leading to a whole genre of jokes). I share Alex’s view that one can show respect by speaking the language of the audience (and Helsinki City Marathon has a very international audience).

So off we went, and the start went well. The weather conditions were perfect: Drizzle. And 16-18 degrees. Not too cold, not too hot. A few km after the start, I saw Patrik, Giuseppe and Peter amongst the spectators, and they took some pictures of me.


The race passed by the Parliament Building, went out through familiar parts of Helsinki (Mejlans, Tölö, Munksnäs) to Esbo close to where MySQL’s first Finnish offices were, and continued via the Nokia House to the absolute centre of Helsinki. That was the most enjoyable part of the running experience. I kept to my pace, and I felt like being on a train. I just went with the flow and didn’t feel any effort. But after a while, my left knee started to remind me of its existence. Past the 21,1 km half way mark, it evened out as my right knee also begged for some attention. But basically, everything went fine until a while beyond 30 km, when the energy reserves of my body were depleted.

“So why didn’t I drink more of the Gatorade offered?”, the running reader will ask. Well, I had prepared a perfect excuse for the scenario in which I would have to stop: When in India for the MySQL Camp in Bangalore, I had caught salmonellosis. Salmonella bacteria are not a nice companion if you run, but also not total inhibitors for running. That said, I chose not to upset my stomach any further by consuming potent, unfamiliar drinks. And the effect was that I dropped from 1606th position at 30 km to 1970th position in the goal (out of 5436 participants).

So this meant that the last few km weren’t all that enjoyable. But I still didn’t get cramps like the 55 year old male athlete who was screaming “aijaijai” less than a km from the goal, at as many decibels as his lung capacity allowed. Humbling.

I finished at 3:55:22. That’s my best marathon ever, about 45 min faster than my previous personal record. I didn’t feel well in the goal, rolling back a few of my latest drinking transactions just after crossing the finish line (honest, it was just water).

After the run, I went for something very typically Finnish: A sauna. And there can hardly be a better timing for a nice, warm bath than right after getting into the goal of a cool (at least temperature-wise) marathon run. My body is often out of balance in many ways after a marathon, including but not limited to shivering out of cold. Ah, was it nice to relax and do some joint bragging together with fellow runners in the Olympic Swimming Stadium. And after the sauna, there were still plenty of runners coming to the goal area. I felt zero superiority over them, as I still vividly remember my own first marathon with a time of 5:41. A marathon for an amateur must strictly be about competing against yourself. And if you (like me) start lousy enough, you’ve got an easy target to beat.

And, what happened with Alex Stubb?

Well, he had shared with us that he had finished a previous Helsinki marathon in 3:59 and a Brussels marathon a bit below 3:40. I suspect he might have targeted 3:30, and like me, he lost positions towards the end of the race. At any rate, he finished 688th with a fabulous time of 3:31:25. Extremely impressive for anyone, especially for a Foreign Minister!

Alex Stubb running

(Yes, Alex did wear something slightly more appropriate than a suit for the marathon).

by kaj at August 18, 2008 08:22 PM

Arabx

Interacting with BuildBot using IRC

Using BuildBot for Drizzle has been a great way to help in the verification of the sometimes rapid code changes that are being committed.

I was curious why the IRC notifier within BuildBot only joined and exited the #drizzle channel in IRC. Some further investigation of the IRC documentation led to more information to share.

By default, the following configuration is not much help in any automated notification.

from buildbot.status import words
c['status'].append(words.IRC(host="irc.freenode.net", nick="drizzle_buildbot", channels=["#drizzle"]))

However, within IRC you can query using several commands. My first trials.

rbradfor: drizzle_buildbot: list builders
[3:10pm] drizzle_buildbot: Configured builders: centos5.64.1 centos5.64.1-mt debian4.32.1[offline] debian5.32.1 debian5.32.2 debian5.64.1 doxygen fedora8.32.1[offline] fedora8.64.1 gentoo8.32.1 gentoo8.64.1 osx105.32.1 osx105.32.1-mt osx105.64.1[offline] osx105.64.1-mt[offline] suse11.32.1[offline] ubuntu804.32.1[offline] ubuntu804.32.2[offline] ubuntu804.32.3[offline] ubuntu804.32.4 ubuntu804.32.4-mt ubuntu804.32.5 ubuntu804.32.6[offline] ubuntu804.32.7[offline] ubuntu804
[3:10pm] rbradfor: drizzle_buildbot: status all
[3:10pm] drizzle_buildbot left the chat room. (Excess Flood)
[3:11pm] drizzle_buildbot joined the chat room.
[3:11pm] rbradfor: drizzle_buildbot: notify on
[3:11pm] drizzle_buildbot: The following events are being notified: ['started', 'finished']
[3:13pm] drizzle_buildbot: build #484 of centos5.64.1 started including []
[3:18pm] drizzle_buildbot: build #484 of centos5.64.1 is complete: Success [build successful]  Build details are at http://drizzlebuild.42sql.com/builders/centos5.64.1/builds/484
[3:25pm] rbradfor: drizzle_buildbot: notify off
[3:25pm] drizzle_buildbot: The following events are being notified: []
[3:26pm] rbradfor: drizzle_buildbot: watch centos5.64.1
[3:26pm] drizzle_buildbot: there are no builds currently running
[3:34pm] rbradfor: drizzle_buildbot: notify on failed
[3:34pm] drizzle_buildbot: The following events are being notified: ['failed']
[4:09pm] rbradfor: drizzle_buildbot: help
[4:09pm] drizzle_buildbot: Get help on what? (try 'help foo', or 'commands' for a command list)
[4:09pm] rbradfor: drizzle_buildbot: help commands
[4:09pm] drizzle_buildbot: Usage: commands - List available commands
[4:09pm] rbradfor: drizzle_buildbot: commands
[4:09pm] drizzle_buildbot: buildbot commands: commands, dance, destroy, excited, force, hello, help, join, last, leave, list, notify, source, status, stop, version, watch

The docs list the following commands.

To use the service, you address messages at the buildbot, either normally (botnickname: status) or with private messages (/msg botnickname status). The buildbot will respond in kind.

Some of the commands currently available:

list builders
    Emit a list of all configured builders
status BUILDER
    Announce the status of a specific Builder: what it is doing right now.
status all
    Announce the status of all Builders
watch BUILDER
    If the given Builder is currently running, wait until the Build is finished and then announce the results.
last BUILDER
    Return the results of the last build to run on the given Builder.
join CHANNEL
    Join the given IRC channel
leave CHANNEL
    Leave the given IRC channel
notify on|off|list EVENT
    Report events relating to builds. If the command is issued as a private message, then the report will be sent back as a private message to the user who issued the command. Otherwise, the report will be sent to the channel. Available events to be notified are:

    started
        A build has started
    finished
        A build has finished
    success
        A build finished successfully
    failed
        A build failed
    exception
        A build generated an exception
    successToFailure
        The previous build was successful, but this one failed
    failureToSuccess
        The previous build failed, but this one was successful 

help COMMAND
    Describe a command. Use help commands to get a list of known commands.
source
    Announce the URL of the Buildbot's home page.
version
    Announce the version of this Buildbot. 

If the allowForce=True option was used, some additional commands will be available:

force build BUILDER REASON
    Tell the given Builder to start a build of the latest code. The user requesting the build and REASON are recorded in the Build status. The buildbot will announce the build's status when it finishes.
stop build BUILDER REASON
    Terminate any running build in the given Builder. REASON will be added to the build status to explain why it was stopped. You might use this if you committed a bug, corrected it right away, and don't want to wait for the first build (which is destined to fail) to complete before starting the second (hopefully fixed) build.

I don’t want to flood the IRC channel with messages, so delving deeper into the documentation via the following commands gives me more tips.

$ cd buildbot-0.7.8
$ pydoc buildbot.status.words

By defining categories against the IRC notification, and assigning builders to a given category, in theory you will get notifications just for these builders. This didn’t seem to produce the desired results for me, so for now it needs to be manual interaction until I get additional time to investigate.

b00 = {'name': "centos5.64.1", 'slavename': "centos5_64", 'builddir': "build00", 'factory': f1, 'category': "irc" }
...
from buildbot.status import words
c['status'].append(words.IRC(host="irc.freenode.net", nick="drizzle_buildbot", channels=["#drizzle"], categories=["irc"]))
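To make the intended behaviour concrete, here is a small Python sketch of the selection rule I expected from the categories option: a notifier with a category list should only report on builders whose category is in that list. This is plain Python written for illustration, not BuildBot code, and the function name `builders_for_notifier` and the second builder entry are hypothetical.

```python
def builders_for_notifier(builders, categories=None):
    """Return the names of builders a notifier would report on.

    builders   -- list of dicts shaped like the builder config above
    categories -- list of category names, or None to mean every builder
    """
    if categories is None:
        return [b["name"] for b in builders]
    return [b["name"] for b in builders if b.get("category") in categories]


# Hypothetical configuration: only the first builder is tagged "irc".
builders = [
    {"name": "centos5.64.1", "category": "irc"},
    {"name": "debian5.32.1"},
]

irc_builders = builders_for_notifier(builders, categories=["irc"])
all_builders = builders_for_notifier(builders)
```

With the sample data, only centos5.64.1 is selected for the "irc" category, which is the filtering I was hoping to see from the real notifier.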

by Ronald at August 18, 2008 08:18 PM

Pooteeweet

Open Source is not making enough rich people richer

I keep seeing these posts by some of the manager types on planet MySQL about how they or some other guy is worrying about open source vendors not raking in billions or not stealing billions out of the pockets of people that should not be playing on the stock market, and things along those lines. While I do agree that it's great to see open source software flourish .. actually let me clear that up, why do I even care if open source software flourishes? I care because I think open source software enables a different kind of growth for society, one that is shared, one that lowers barriers, one that I feel is more in tune with a world at peace.

Of course I want people that take part in this to be able to provide themselves and their families a decent life. But the fact of the matter is, these people do not need millions, the people that use this open source as an enabler do not need millions in marketing budgets either to realize the usefulness. Of course market capital can help in funding boring tasks like QA and documentation or full time developers etc. But the show will go on even without that.

And guess what? Those small companies that make a buck with open source, they foster a culture where people go home happy at the end of the day instead of being bitter like most others. This is of course not something easily valued in monetary terms, but it's nice that the pool of would-be homicidal maniacs is reduced by these companies. At the same time even big guys are making plenty of money and giving back a bit here and there. New companies are popping up and slowly making it to big bucks too. So what's the point of this blog post? Let's get over this obsession with these companies that are supposed to make a few people insanely rich by selling their lives (aka modern slaves) to VCs that are owned by people that haven't yet figured out better things to do in their lives than to stack their millions higher and higher.

Reminds me of the irritation I see with these acquisition rumors: Of course it matters to people who they are getting gobbled up by, and so it's not trivial that a Microsoft goes knocking at the doors of Yahoo (not that Yahoo is a small shop or entirely dependent on open source, but from the outside it seems nonetheless sufficiently open sourcy for a fair share of the tech staff). Do they really believe that key people will stick around after they are required to use their new "@microsoft.com" email address? Especially since these key people have plenty of options. Then again I guess these guys know their risk-analysis 101 and they are mostly after buying users.

While I am trying to save the world, I might also want to mention that it's time we give the system a serious jolt. I like standards, but the outdated processes by ISO and the likes are making a mockery of the idea. Seems like a little less corruption in such public services is too much to ask. Anyways, let's end this post not well formed </rant>

by Lukas Kahwe Smith at August 18, 2008 05:38 PM

Xaprb

How Maatkit benefits from test-driven development
Over in Maatkit-land, Daniel Nichter and I practice test-first programming, AKA test-driven development. That is, we write tests for each new feature or to catch regressions on each bug we fix. And -- this is crucial -- we write the tests before we write the code.* The ...

August 18, 2008 05:30 PM

Kevin Burton

burtonator

I’m not sure why REST needs defending but apparently it does

Dare steps in and provides a solid background and Tim follows up

What is really interesting about REST from my perspective (and not everyone will agree) is that since it’s REST you can actually solve real problems without getting permission from a standards board.

It’s pretty easy to distrust standards bodies. Especially new standards bodies. The RSS wars were a joke. Atom took far too long to become standardized.

At Wordcamp this weekend one of the developers was complaining about brain damage in XMLRPC - god only knows why we’re still using this train wreck.

I noted that WordPress should abandon XMLRPC and just use REST. They seem to be headed in that direction anyway (by their own admission).

Their feedback was that they didn’t want to go through the Atom standardization process to extend the format to their specific needs.

You know what? You don’t need permission. Just write a documented protocol taking into consideration current REST best practices and ship it.

If people are using the spec and find value then it will eventually become a standard.

I’m a bit biased of course because Spinn3r is based on REST.

We burn about 40-50Mbit 24/7 indexing RSS and HTML via GET. We have tuned our own protocol stack within our crawler to be fast as hell and just as efficient.

Could I do this with SOAP? No way! I’ve had to gut and rewrite our HTTP and XML support a number of times now and while it’s not fun at least with REST it’s possible.

REST is complicated enough as it is… UTF-8 is not as straightforward as you would like. XML encoding issues do arise in production when you’re trying to squeeze maximum performance out of your code.

… and while REST is easy we STILL have customers who have problems getting up and running with Spinn3r.

We had to ship a reference client about six months ago that implements REST for our customers directly.

These guys are smart too… if they’re having problems with REST then SOAP/XMLRPC would be impossible.

by burtonator at August 18, 2008 05:16 PM

High Performance MySQL

The ultimate tool for generating optimal my.cnf files for MySQL

There are quite a few “tuning primers” and “my.cnf generators” and “sample my.cnf files” online. The ultimate tool for generating an optimal my.cnf is not a tool. It’s a human with many years of experience, deep knowledge of MySQL and the full application stack, and familiarity with your application and your data.

I don’t know exactly the percentage, but quite a few of the servers I take a look at have been “optimized” with some tuning primer or question-and-answer script that spits out “optimal” parameters for my.cnf.

Most of the time these servers are far from optimal. Sometimes the my.cnf parameters are extremely wrong, to the point of causing a severe performance penalty.

If it were as easy as writing a tool to do this, don’t you think Maatkit would have mk-optimal-mycnf already? In my opinion — as someone who knows very well the complexity of creating a good my.cnf — it’s practically impossible. Much harder than syncing data, or manipulating a replication hierarchy, or any of the other things Maatkit can do already. And I doubt I’ll ever even feel motivated to try creating such a tool.

Don’t bother with scripts. Don’t waste your time with most of the advice you see on the web in forums — much of it is fundamentally wrong, even when it seems to come from an informed source. Don’t put too much faith in the my.cnf samples that come with your operating system; many of them have very bad advice in the comments, such as instructing you on how to set up replication in ways that guarantee breakage.

If you want solid advice, ask someone who knows what they’re doing (and can prove it). Or buy our book.

But even more fundamentally, you should not focus so much on my.cnf. It is not the be-all and end-all of performance. Tuning your server settings has far less impact on performance than tuning your schema, indexing, queries and — you guessed it — thinking deeply about your application architecture. Server settings are a distraction and a waste of time for most people.

Most my.cnf files I see only need minor tweaks, which give only so-so performance improvements. Tuning my.cnf only helps a lot when my.cnf has extremely bad parameters. The kind you’ll get from tuning primers and automated my.cnf optimization scripts.


by Baron Schwartz at August 18, 2008 03:15 PM

Arjen Lentz

Results for "which MySQL version used in Dev" poll
Here are the results for this poll, as described in my post some weeks back.

So in a nutshell (see the original post for more info), the question was what MySQL version people currently use in development. Turns out that 64% use 5.0, 19% use 5.1, and the rest is small fry. It's a fairly small sample anyway (like most polls), but still I find the roughly 3:1 ratio fairly significant.

Traditionally, it's been the going thing to just use the latest MySQL development version for development, even in its very first alpha versions. MySQL 4.0.x was a fab example of that, as Jeremy Zawodny (then Yahoo), Peter Zaitsev, and quite a few others will remember. Where has that "we'll try it and report the bugs" gone? Is it just a matter of the market growing up and taking fewer risks, or is there something else going on, driving those choices?

Ronald Bradford wrote why he recommends 5.1. While I haven't yet gone as far as that, I do agree with his general analysis. I find the new engine plugins the most interesting, the rest of the new features (like partitioning) not so much. And the remaining serious bugs may be more related to the new features.

August 18, 2008 01:03 PM

Lenz Grimmer

2008 Open Source CMS Award: two more weeks to submit your nomination!

Just to remind you that Packt Publishing is running their Open Source CMS Award again:

The Packt Open Source Content Management System Award is designed to encourage, support, recognize and reward Open Source Content Management Systems (CMS) that have been selected by a panel of judges and visitors to www.PacktPub.com. Now entering its third year, the Award has established itself as an important measure for quality and the popularity of Open Source Content Management Systems.

You have two more weeks to submit your favourite CMS in the following categories:

As for the last two years, I'll be a member of the team of judges that have to choose from the finalists that received the most nominations during the nomination stage.

I look forward to the list of finalists - it's always interesting to find out about new developments in this area and how the established projects in this market have developed over the course of the year!

by nospam@example.com (Lenz Grimmer) at August 18, 2008 10:56 AM

Pet Projects

The Olympics BSOD Incident Discussed
Well, after I talked about the Blue Screen of Death during the Olympics, Ars Technica mentioned it.

The more important point, though, is the discussion in the comments.
Here are the theories as to why the BSOD happened:
1) Hard disk failure
2) Pirated copy of Windows with not all the right updates
3) Air particles due to the pollution in Beijing
4) Bad configuration of Windows (at least failing to tell it to restart after crashing)
5) Not using Vista
6) Failing to use a Mac
7) Failing to use Linux
8) An act of terrorism, deliberately trying to embarrass China

by Jonathan (noreply@blogger.com) at August 18, 2008 03:03 AM

August 17, 2008

Arabx

Virtual Box, a world of hurt

I successfully installed Virtual Box via a few simple apt-get commands under Ubuntu 8.04 via these instructions.

It started fine, after two small, annoying messages (install this module, add this group). I was even able to install Ubuntu Intrepid from an .iso. But from here it was downhill.

Attempting to start the VM gives this error:

This kernel requires the following features not present on the CPU:
pae
Unable to boot - please use a kernel appropriate for the CPU

Some digging around confirmed that the currently packaged version of Virtual Box doesn’t support PAE. You’d think they could tell you before successfully installing an OS. I’m running 1.5.6; I need 1.6.x.

$ dpkg -l | grep virtualbox
ii  virtualbox-ose                             1.5.6-dfsg-6ubuntu1                      x86 virtualization solution - binaries
ii  virtualbox-ose-modules-2.6.24-19-generic   24.0.4                                   virtualbox-ose module for linux-image-2.6.24
ii  virtualbox-ose-source                      1.5.6-dfsg-6ubuntu1                      x86 virtualization solution - kernel module
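
One quick sanity check before blaming the Virtual Box version is whether the host CPU itself advertises PAE. The sketch below is in Python and is not from the original post; it assumes a Linux x86 host whose /proc/cpuinfo contains "flags" lines, and the name cpu_flags is made up for illustration.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" lines list features such as pae, sse, vmx.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # assumption: Linux host
    if os.path.exists(path):
        with open(path) as f:
            print("host CPU has pae:", "pae" in cpu_flags(f.read()))
    else:
        print("no", path, "on this system")
```

If the host reports pae and the guest kernel still refuses to boot, the limit is in the virtual CPU the hypervisor exposes, which is consistent with the version problem described here.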

Off to the Virtual Box downloads page to get 1.6.4.
Don’t make the same mistake I did and use the first download link; that’s the commercial version, which doesn’t install what you expect. You need the OSE. Of course this is not packaged; it’s source only.

$ ./configure
Checking for environment: Determined build machine: linux.x86, target machine: linux.x86, OK.
Checking for kBuild: found, OK.
Checking for gcc: found version 4.2.3, OK.
Checking for as86:
  ** as86 (variable AS86) not found!

OK, well, I went through this step about four times, installing one package at a time. I wish they could do a pre-check and give you all the missing requirements at once. I installed bin86, bcc, and iasl.
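
The kind of pre-check wished for here is easy to approximate for command-line tools. This is only a sketch in Python, not anything Virtual Box ships: the helper name missing_tools is hypothetical, and the tool list is an assumption based on the packages mentioned above. It does not cover libraries such as libxml2, which configure detects differently.

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Assumed list: binaries provided by the gcc, bin86, bcc and iasl packages.
    wanted = ["gcc", "as86", "bcc", "iasl"]
    missing = missing_tools(wanted)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all build tools found")
```

Reporting every missing tool in one pass means one round of package installs instead of four.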

Then I got to the following error.

$ ./configure
...
Checking for libxml2:
  ** not found!

Well, it’s installed, so this is all too hard. Throw Virtual Box away as virtualization software. And why am I using it anyway? Because VMware Server doesn’t work under Ubuntu 8.04 either, because of some ancient gcc dependency. Seems I may have to go back to that. I just want working virtualization software on the most popular Linux distro so I can install other current distros. It’s not a difficult request.

$ dpkg -l | grep libxml
ii  libxml-parser-perl                         2.34-4.3                                 Perl module for parsing XML files
ii  libxml-twig-perl                           1:3.32-1                                 Perl module for processing huge XML document
ii  libxml2                                    2.6.31.dfsg-2ubuntu1                     GNOME XML library
ii  libxml2-utils                              2.6.31.dfsg-2ubuntu1                     XML utilities
ii  python-libxml2                             2.6.31.dfsg-2ubuntu1                     Python bindings for the GNOME XML library

by Ronald at August 17, 2008 10:34 PM

Domas Mituzas

mmap()

I’ve seen quite some work done on implementing mmap() in various places, including MySQL.
mmap() is also used for malloc()’ing huge blocks of memory.
The mmap() data cache is part of the VM cache, not the file cache (though the two are tightly coupled inside kernels, their priorities still differ).

If a small program with a low memory footprint maps a file, it will probably make file access faster (as it will be cached more aggressively in memory, and will put pressure on other cached file data, though that’s cheating).

If a large program with lots and lots of allocated memory maps a file, that will pressure the filesystem cache to flush pages, and then… will pressure existing VM pages of the very same large program to be swapped out. That’s certainly bad.

For now MySQL is using mmap() just for compressed MyISAM files. Vadim wrote a patch to do more of mmap()ing.

If there’s less data than RAM, mmap() may provide somewhat more efficient CPU cycles. If there’s more data than RAM, mmap() will kill the system.
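
For readers who haven’t used it, this is roughly what mapping a file looks like from user space. It is a minimal sketch in Python using the standard mmap module, not MySQL code; the helper name read_mapped and the temporary file are made up for illustration. The mapping is read-only, so it sidesteps the mix of mmap() and write() discussed below.

```python
import mmap
import os
import tempfile


def read_mapped(path, offset, length):
    """Read `length` bytes at `offset` through a read-only memory mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing touches the mapped pages; no explicit read() call is made.
            return mapped[offset:offset + length]


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello mmap world")
    try:
        print(read_mapped(tmp.name, 6, 4))  # b'mmap'
    finally:
        os.remove(tmp.name)
```

The pages behind the slice are brought in by the VM subsystem on first access, which is why they compete with the program’s own memory in the way described above.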

Interestingly, a few months ago there was a discussion on lkml where Linus wrote:

Because quite frankly, the mixture of doing mmap() and write() system calls is quite fragile - and I’m not saying that just because of this particular bug, but because there are all kinds of nasty cache aliasing issues with virtually indexed caches etc that just fundamentally mean that it’s often a mistake to mix mmap with read/write at the same time.

So, simply, don’t.

Update: Oh well, 5.1: --myisam_use_mmap option… Argh.
Update on update: after a few minutes of internal testing all mmap()ed MyISAM tables went fubar.

by Domas Mituzas at August 17, 2008 09:51 PM

Arabx

Drizzle has its own dedicated feed

For those that have been using Planet MySQL to follow the progress of Drizzle, we now have our own Planet Drizzle.

You can also get an RSS feed directly from http://feeds.feedburner.com/drizzle

by Ronald at August 17, 2008 04:38 PM

Brian Moon

doughboy

So, I wrote about the beginning of our wild database issues. Since then, I have been fighting a cold, coaching little league football and trying to help out in getting our backup solutions working in top shape. That does not leave much time for blogging.

Never again will we have ONLY a cold backup of anything. We were moving nightly full database dumps and hourly backups of critical tables over to that box all day long. Well, when the filesystem fails on both the primary database server and your cold backup server, you question everything. A day after my marathon drive to fix the backup server and get it up and running, the backup MySQL server died again with RAID errors. I guess that was the problem all along. In the end, we had to have a whole new RAID subsystem in our backup database server. So, my coworker headed over to the data center to pull an all-nighter to get the original, main database server up and running. The filesystem was completely shot. ReiserFS failed us miserably. It is no longer to be used at dealnews.

Well, today at 6:12PM, the main database server stops responding again. ARGH!! Input/Output errors. Based on last week’s experience, that means RAID. We reboot it. It reports memory or battery errors on the RAID card. So, I call Dell. Our warranty on these servers includes 4 hour, onsite service. They are important. While on the phone with Dell, I run the Dell diagnostic tool on the box. During the diagnostic test, the box shuts down. Luckily, the Dell service tech had heard enough. He orders a whole new RAID subsystem for this one as well.

There is one cool thing about the PERC4 (aka, LSI Megaraid) RAID cards in these boxes.  They write the RAID configuration to the drives as well as on the card.  So, when a new blank RAID card is installed, it finds the RAID config on the drives and boots the box up.  Neato.  I am sure all the latest cards do it.  It was just nice to see it work.

So, the box came up, but this time we had InnoDB corruption. XFS did a fine job in keeping the filesystem intact. So, we had to go from backups. But, this time we had a live replicated database that we could just dump and restore. We should have had it all along, but in the past (i.e. before widespread InnoDB) we were gun shy about replication. We had large MyISAM tables that would constantly get corrupted on the master or slave and would halt replication on a weekly basis. It was just not worth the hassle. But, we have used it for over a year now in our front end database servers with an all-InnoDB data set. As of now, only two tables in our main database are not InnoDB. And I am trying to drop the need for a Full-Text index on those right now.

So, here is to hoping our database problems are behind us. We have replaced almost everything in one server except the chassis. The other has had all its internal parts replaced except the motherboard. Kudos to Dell’s service. The tech was done with the repair in under 4 hours. Glad to have that service. I recommend it to anyone that needs it.

by Brian Moon at August 17, 2008 06:58 AM

Morgan Tocker

How fast (or slow) is MySQL Stored Procedure language?
I had a long flight from Sydney to Edinburgh this weekend, and wanted to answer a common training question: how fast (or slow) is the stored procedure language in MySQL? To do this, I started by stealing an example exercise from one of our training courses:


DELIMITER //
CREATE FUNCTION fibonacci(n INT)
RETURNS DOUBLE
NO SQL
BEGIN
  -- f1 and f2 hold the two most recent terms; result holds their sum
  DECLARE f1, result DOUBLE DEFAULT 0.0;
  DECLARE f2 DOUBLE DEFAULT 1.0;
  DECLARE cnt INT DEFAULT 1;
  WHILE cnt <= n DO
    SET result = f1 + f2;
    SET f1 = f2;
    SET f2 = result;
    SET cnt = cnt + 1;
  END WHILE;
  RETURN result;
END //
DELIMITER ;


If I run this a few times, here are the results:

mysql> select benchmark(100, fibonacci(40000));
+----------------------------------+
| benchmark(100, fibonacci(40000)) |
+----------------------------------+
|                                0 |
+----------------------------------+
1 row in set (17.94 sec)


Then if I write a simple PHP script that does the same (without any further optimization)...

..
function fibonacci ($n) {
    $f1 = 0.0;
    $result = 0.0;
    $f2 = 1.0;
    $cnt = 1;

    while ($cnt <= $n) {
        $result = $f1 + $f2;
        $f1 = $f2;
        $f2 = $result;
        $cnt = $cnt + 1;
    }

    return $result;
}
..

How long does it take?

$ php fib.php 40000 100
Finding fib 40000, 100 times
Took 1.7208609580994 seconds


Conclusion: 17.94 seconds versus 1.72 seconds, so MySQL is roughly ten times slower in this test!

There's a small amount of overhead added to MySQL because the procedure has to load up/deconstruct 100 times and build a result to return, but by another test I think this only accounts for 0.19 seconds.
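
To make the arithmetic explicit, and to give a third data point you can run yourself, here is a small Python sketch. It is not part of the original benchmark: the fibonacci port mirrors the loop above, while the slowdown helper and the timing harness are my own assumptions about how one might measure this.

```python
import time


def fibonacci(n):
    """Same loop as the SQL and PHP versions, using floating-point doubles."""
    f1, f2, result = 0.0, 1.0, 0.0
    cnt = 1
    while cnt <= n:
        result = f1 + f2
        f1 = f2
        f2 = result
        cnt += 1
    return result


def slowdown(mysql_secs, php_secs, overhead_secs=0.0):
    """How many times slower MySQL was, after removing fixed call overhead."""
    return (mysql_secs - overhead_secs) / php_secs


if __name__ == "__main__":
    start = time.perf_counter()
    for _ in range(100):
        fibonacci(40000)
    print("100 runs took %.2f seconds" % (time.perf_counter() - start))

    # Figures quoted in the post: 17.94 s (MySQL), 1.72 s (PHP), 0.19 s overhead.
    print("raw ratio:      %.1f" % slowdown(17.94, 1.72))
    print("minus overhead: %.1f" % slowdown(17.94, 1.72, 0.19))
```

With the numbers above the ratio comes out at about 10.4, or about 10.3 once the 0.19 seconds of overhead is subtracted, so the overhead barely changes the conclusion. Note that in Python the double overflows to infinity long before n = 40000, so only the timing is meaningful, not the returned value.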

August 17, 2008 03:52 AM