Jan 092009
 

Simple replacement
String replacement is often used as a way to apply templating. You might replace “%a:test” with “~~test~~” using the regexp: %a:(\w+), replacing it with “~~$1~~”.

Trying to escape
The only problem now, is that I can’t use “%a:” any more within my string. This could be solved by allowing escaping using the backslash. In the regexp we can use a negative lookbehind to see if the character before the % isn’t a backslash: (?<!\\)%a:(\w+).

Escaping the escaping
Now we’re close, however now it’s not possible to use “\%:a” anywhere. We need to be able to escape the backslash as well. We could state the problem as needing to match %a if there isn’t an uneven number of backslashes in front of it. Checking for an uneven number in a negative lookbehind isn’t possible unfortunately, so we need to get the backslashes into the match. We can say: match 0 or more pairs of backslashes, followed by “%a:”, if there is no backslash in front of it. This results in the regexp:
(?<!\\)((?:\\{2})*+)%a:(\w+), replacing it for “$1~~$2~~”.

To finish up
To only thing is that \% and \\ will still be displayed as that. This can simply be solved with a str_replace.

Oct 312008
 

Data stored in a database is often also useful for statistical purposes. If you own a web-shop you want to be able to create a report about turnover. You can get statistical information by using GROUP BY, eg.

SELECT DATE_FORMAT(invoice.date, '%M') AS `month`, COUNT(*) AS `invoice_count`, SUM(`invoice`.`amount`) AS `turnover`
FROM `invoice`
WHERE `date` BETWEEN '2008-01-01' AND '2008-12-31'
GROUP BY MONTH(`invoice`.`date`)
month     invoice_count  turnover
January   84             9532.26
February  141            20857.61
March     91             10922.71
April     112            15044.48
May       101            9676.60
June      137            12860.88
July      281            34291.20
August    191            26377.66
September 103            16324.78
October   99             12873.23

If you are selling a wide variety of products, you might like to see the turnover for each product category. You could do this with a simple GROUP BY as:

SELECT DATE_FORMAT(`invoice`.`date`, '%M') AS `month`, `category`.`description` AS `category`, COUNT(*) AS `product_count`, SUM(`invoice_product`.`amount`) AS `turnover`
FROM `invoice` INNER JOIN `invoice_product` ON `invoice`.`id` = `invoice_product`.`invoice_id` LEFT JOIN `product` ON `invoice_product`.`product_id` = `product`.`id` LEFT JOIN `category` ON `product`.`category_id` = `category`.`id`
WHERE `date` BETWEEN '2008-01-01' AND '2008-12-31'
GROUP BY MONTH(`invoice`.`date`), `category`.`id`
month     category   product_count  turnover
January   Hardware   62             4821.31
January   Software   51             4419.41
January   Cables     12             291.54
February  Hardware   71             8408.93
February  Software   101            11726.36
February  Cables     17             312.32
February  Other      2              410.00
March     Hardware   21             2371.58
March     Software   81             8238.81
March     Cables     13             312.32
...

This would give you each category in a different row, ordered by month. Though this contains all the information the format is far from nice. Instead you would like to have 1 row per month with each category as a column as the information about the invoices as well.

SELECT DATE_FORMAT(`invoice`.`date`, '%M') AS `month`, COUNT(DISTINCT `invoice`.`id`) AS `product_count`, COUNT(*) AS `invoice_count`, SUM(`invoice_product`.`amount`) AS `turnover`,
  SUM(`product`.`category_id`=1) AS `hardware_count`, SUM(IF(`product`.`category_id`=1, `invoice_product`.`amount`, 0)) AS `hardware_turnover`,
  SUM(`product`.`category_id`=2) AS `software_count`, SUM(IF(`product`.`category_id`=2, `invoice_product`.`amount`, 0)) AS `software_turnover`,
  SUM(`product`.`category_id`=3) AS `cables_count`, SUM(IF(`product`.`category_id`=3, `invoice_product`.`amount`, 0)) AS `cables_turnover`,
  SUM(`product`.`category_id`=4) AS `other_count`, SUM(IF(`product`.`category_id`=4, `invoice_product`.`amount`, 0)) AS `other_turnover`
FROM `invoice` INNER JOIN `invoice_product` ON `invoice`.`id` = `invoice_product`.`invoice_id` LEFT JOIN `product` ON `invoice_product`.`product_id` = `product`.`id` LEFT JOIN `category` ON `product`.`category_id` = `category`.`id`
WHERE `date` BETWEEN '2008-01-01' AND '2008-12-31'
GROUP BY MONTH(`invoice`.`date`), `category`.`id`
month     invoice_count  turnover    hardware_count  hardware_turnover  software_count  software_turnover  cables_count  cables_turnover  other_count  other_turnover
January   84             9532.26     62              4821.31            51              4419.41            12            291.54           0            0
February  141            20857.61    71              8408.93            101             11726.36           17            312.32           2            410.00
March     91             10922.71    21              2371.58            81              8238.81            13            312.32           0            0
...

The big downside of this method is that you need to modify the query if a category is added. This can be solved though by dynamically creating the query in a PHP script.

If you want to do advanced statistical you should have a look at OLAP cubes. Pentaho is an open-source reporting app which supports MySQL. http://mondrian.pentaho.org/

All data is fictional. The SQL queries are untested.

Jul 292008
 

I was reading this month’s php|architect. It has a nice article about EAV modeling. I had seen this db structure in other project, but didn’t know that it was called EAV. For those who don’t read php|architect, EAV describes a method of saving a large set of attributes, only some of which apply to an individual entity. Normally you would create a table, with a row for each entity and save each attribute in a column. With EAV you save each attribute as a row.

This makes selecting the data quite tedious. If you can life with some constraints, there is an easier way to do this.
Continue reading »

Jun 302008
 

Just a short one today, because I’m really busy (un)fortunately.

You can take out and check each argument at a time:

1
2
3
4
  $search = isset($_GET['search']) ? $_GET['search'] : null;
  $page = isset($_GET['page']) ? $_GET['page'] : 1;
  $limit = isset($_GET['limit']) ? $_GET['limit'] : 15;
  // etc
  $search = isset($_GET['search']) ? $_GET['search'] : null;
  $page = isset($_GET['page']) ? $_GET['page'] : 1;
  $limit = isset($_GET['limit']) ? $_GET['limit'] : 15;
  // etc

But to do it in one go, just do:

1
  $args = $_GET + array('search'=>null, 'page'=>1, 'limit'=>15);
  $args = $_GET + array('search'=>null, 'page'=>1, 'limit'=>15);

Note that in some cases it is important to filter your input, that is not done here.

May 312008
 

How to get a file extension in PHP:

1
$ext = pathinfo($file_name, PATHINFO_EXTENSION);
$ext = pathinfo($file_name, PATHINFO_EXTENSION);

How to get a file extension in Perl:

1
my $ext = ($file_name =~ m/([^.]+)$/)[0];
my $ext = ($file_name =~ m/([^.]+)$/)[0];

How to get a file extension in Ruby:

1
ext = File.extname(file_name)
ext = File.extname(file_name)

How to get a file extension in Bash:

ext=${file_name##*.}
name=${file_name%.*}

How to get a file extension in Python (thanks to Jensen):

import os
ext = os.path.splitext(file_name)[1]


Got more? Please post a comment.

Mar 072008
 

It’s time for a new article in the ‘How I PHP’ series, which are about methods I use, but aren’t commonly known. This time it is about an Apache module ‘mod_xsendfile’.

Normally when want to let a user download a file, you simply stick it in a dir under the document root and let Apache do the rest.

However in some cases that is not good enough. You might need to do some authenticate first or you need to lookup the actual file name. In that case you would use PHP, which would result in a script looking like this:

1
2
3
4
5
6
7
8
9
authenticate(); # authenticate and authorize, redirect/exit if failed
$file = determine_file();
 
if (!file_exists($file)) trigger_error("File '$file' doesn't exist.", E_USER_ERROR);
 
header("Content-type: application/octet-stream");
header('Content-Disposition: attachment; filename="' . basename($file) . '"');
header("Content-Length: ". filesize($file));
readfile($file);
authenticate(); # authenticate and authorize, redirect/exit if failed
$file = determine_file();

if (!file_exists($file)) trigger_error("File '$file' doesn't exist.", E_USER_ERROR);

header("Content-type: application/octet-stream");
header('Content-Disposition: attachment; filename="' . basename($file) . '"');
header("Content-Length: ". filesize($file));
readfile($file);

This means PHP has to read in the file, which goes through the output buffer, is flushed to Apache and processed before send to client. In this small I didn’t specify any other headers Apache normally sends like last-modified. If I want to actually make the caching based on last-modified work, I need to check the if-modified-since request header, check the mtime of the file and send a 304 result header. (I’m to lazy right now to write a code example for that, sorry)

Wouldn’t it be nicer to tell Apache, please send that file, and be done with it. Well, you can. When you enable mod_xsendfile in Apache, you can send an X-SendFile header, which is processed by Apache.

1
2
3
4
5
6
authenticate(); # authenticate and authorize, redirect/exit if failed
$file = determine_file();
 
header("X-Sendfile: $somefile");
header("Content-type: application/octet-stream");
header('Content-Disposition: attachment; filename="' . basename($file) . '"');
authenticate(); # authenticate and authorize, redirect/exit if failed
$file = determine_file();

header("X-Sendfile: $somefile");
header("Content-type: application/octet-stream");
header('Content-Disposition: attachment; filename="' . basename($file) . '"');

Note that this technique was copied from Lighttp, so if you’re using that it will also work.

Mar 042008
 

The normal way to save a state, like the fact that a user is logged in, is to use sessions. Session work really nice, you can save all kind of data on the server (in a temp file) which is identified by a hash which is known on the client in a cookie or by url rewriting.

However if you only need to save a small bit of login info, like a username and timestamp, using sessions is a bit of an overkill. Also, if your system would grow beyond to a size where you need load balancing over multiple servers, having a solution with sessions would be somewhat of a burden. You would need to store them in a centralized place, like a DB server.

Another way to do this, is to create an authentication hash. The idea is fairly simple: Join the information you want to known into a single string and append an md5 key of all the info + a secret word. On each request, check if the hash is available and correct, otherwise redirect the user to a login form. You can use the hash the same way PHP uses the session hash, using cookies or URL rewriting.

Feb 162008
 

Sometimes you want things that simply look impossible in PHP. A few years ago I was using the DB library from PEAR. I wanted to add some functionality to the DB_Result class, however since the result object is created in the DB class and the class used for that isn’t configurable, I was kind of stuck there.

I solved this problem by writing a class extending the DB_Result class. The class had which used reflection to pull all the info out of the DB_Result class and created a new DB_Result_Ext object. In the end I decided that I didn’t like DB class at all (nor MDB2 for that mather), so I threw away this dirty code, however I always felt that there had to be a better way in solving these kind of issues.

The new solution for this problem I’m about to show, is really a dark corner of PHP and perhaps even dirtier than the other dirty solution. However the method can get you out of a jam when this situation arises.

PHP has a function serialize, which can create a hash from any type of variable, scalars, array, but objects as well. Using the unserialize function, PHP can recreate the variable from the serialized hashed. If we look at how an object is serialized, we see only the properties and the class name are stored. Serializing an object of the ‘mother’ class, with a protected variables ‘$myvar’, would give the following hash:

O:6:"mother":1:{s:8:"�*�myvar";i:5;}

We see an ‘O’ for object, the string length of the class name, the quoted class name and the properties. What would happen if we would change the class name and unserialize?
Continue reading »

Jan 242008
 

Before I start
I’ve always stayed away from writing a performance topic. There seems to be a lot of performance enthusiasts, benchmarking anything they can, in order to scoop off a few more percent of performance time. To my opinion, if your system is already reacting within an acceptable fashion, there is no need to try to further improve performance. Maintainability is a much more important topic to look at, at that point.

Finding out what is the problem
So you have a script which is not performing the way you want to. The first thing you should do it try to find out what the problem is. There are some tools out there that can help you. If you’re using Zend Studio, there is a profiler which shows you a tree of files and functions with the amount of processing time. (Warning: With Zend Studio 5, you need to close your project and all open files, otherwise you’ll get incorrect values). There is also a profiler in XDebug, which is a bit harder to set up, but works just as well (or even a bit better actually).

If you don’t have the option to install any of these tools, you can echo (or use FirePHP to output) the execution times, using the following script:

1
2
3
4
5
$_ENV['exec_start_time'] = microtime(true);
....
echo "<!-- ", "After doing XYZ: ", (microtime(true) - $_ENV['exec_start_time']) * 1000, " ms", " -->\n"
....
echo "<!-- ", "After doing SOMETHING: ", (microtime(true) - $_ENV['exec_start_time']) * 1000, " ms", " -->\n"
$_ENV['exec_start_time'] = microtime(true);
....
echo "<!-- ", "After doing XYZ: ", (microtime(true) - $_ENV['exec_start_time']) * 1000, " ms", " -->\n"
....
echo "<!-- ", "After doing SOMETHING: ", (microtime(true) - $_ENV['exec_start_time']) * 1000, " ms", " -->\n"

Most likely the problem will be one of these three (in order of likeliness):

  1. A slow database query. Solving this is outside the scope of this article, but read this PDF about Query tuning for starters.
  2. To much is executed within a loop and/or the number of items for the loop is to big. Read the rest of the article.
  3. To much code is executed. This is probably cause of an abstraction layer that is simply to big or of to many abstraction layers. Try to do some refactoring, removing unnecessary abstraction or bypass abstraction on bottleneck positions.

Continue reading »