Aug 132007
 

It’s very nice that the LOAD XML patch finally got some attention. I wasn’t expecting it anymore. I worked on this with Erik a year ago. My C programming skill were to rusty at that time, that only my BNF code contribution was used and a lot of issues were never solved. With my C prog skills back up to a decent level, I;m planning to pick up the project. Now seems to be a better time than ever.

The current solution works well in certain situations, but definitely not in all. To insert referential data, you need to specify the id value at forehand, look at project_id in the example. That doesn’t seem like a real world scenario. You should be able to insert the whole XML tree at once creating the reference as you go. This unfortunately doesn’t fit in the current LOAD DATA scheme, on which LOAD XML is based, and therefor requires a lot of additional work.
Continue reading »

Aug 082007
 

I often read articles saying to combine statements and send less queries. But you seldom see advise about splitting queries to improve performance. Here is what I came across just the other day:

SELECT IF(`uid_to`=@user, `uid_from`, `uid_to`) as `uid`, `message`, `date`, `uid_to`=@user as received
  FROM `beepme_msg`
  WHERE (`uid_from`=@user and `uid_to`!=@user) or (`uid_to`=@user and `uid_from`!=@user)
  ORDER BY `uid_to`, `date`;

The query gets all messages send and received by the user, filtering our message the user send to himself. This looks like a good query since you’ll get all you data in one call. In reality it will mess up your performance, since it can’t use any indexes and will therefor use a table scan. And as we know, tables scans are slow.

The ‘OR’ statement basically messes this up, since only one index can be used per table. We can see what happens a bit better if we rewrite the query (not changing behaviour):

SELECT IF(`uid_to`=@user, `uid_from`, `uid_to`) as `uid`, `message`, `date`, `uid_to`=@user as received
 FROM `beepme_msg`
 WHERE (`uid_from`=@user or `uid_to`=@user) and not (`uid_from`=@user and `uid_to`=@user)
 ORDER BY `uid_to`, `date`;

We see that (`uid_from`=@user and `uid_to`=@user) is just an additional filter and doesn’t have to much impact. It is clear that neither an index on uid_from or uid_to can be used, because neither can get a set with all records which fit within the criteria. If an index gets to many records, the selection is filtered to get the correct result. But if an index gets to few records, it simply can not be used.

To solve this we need to split up this query and use a UNION SELECT:

SELECT `uid_to` as `uid`, `message`, `date`, 0 AS `received` FROM `beepme_msg` WHERE `uid_from`=@user and `uid_to`!=@user
UNION SELECT `uid_to` as `uid`, `message`, `date`, 1 AS `received` FROM `beepme_msg` WHERE `uid_to`=@user and `uid_from`!=@user
ORDER BY `uid_to`, `date`;

This is a nice example of how common rules like ‘avoid subqueries and union selects’ are not alway true. So keep using your head (and EXPLAIN) while writing queries.

May 312007
 

Roland Bouman just gave me a call. He’s setting up a MySQL UDF repository, which sounds worth while joining. So I’ve decided to move the libmyxql project there. The lib will be renamed to lib_mysqludf_xql to follow the naming convention. You can find the repository on http://www.xcdsql.org/MySQL/UDF/index.html.

He also gave me a few good tips, including a way to get the name from the column or alias to be used as tagname. This should make the lib a bit more like SQL/XML. The downside is that the API will change again, but it the end it will be much better, so…
He also got his UDFs working for Windows, which I will implement as well. So good news for all you Billy lovers out there.

More about this soon….