It can be a tough slug sometimes.  Whether it's gaming, work, something you do for fun, for money or any other reason.  Sometimes things really drag, but once in a while there's the moment that reminds you why you do it.  The paycheque, the achievement, the breakthrough.

Sometimes, it's just plain excitement and satisfaction of a job well done.  It's a bit of a quandary, did I gravitate towards programming because of the excitement, or was that why I gravitated towards programming.  Either way, while it can be downright painful to drudge through writing a lengthy form, or plowing through tedious bugs, there are the moments of personal academic breakthrough that turn it around.

Now to the lecture.  I've mentioned previously about having a large update coming to The VG Press.  Despite moving to a much improved server, part of this is improving code efficiency.  When it comes to performance on the web, it's almost laughable how dependent it is on database queries.  It's so heavily tilted, that you may as well completely ignore everything else that isn't a query when it comes time to worry about performance.

Thusly that's where it begins.  By having my database handler class output every query when a page loads, I was able to pick out inefficiency.  A significant issue was repeated queries per forum post for votes and stars.  The reason was that previously I was unable to optimize this to include it in the initial query.  Break in attempt number two.

You see, GROUP functions have multiple places where they can be used, but one of them is so terrible in performance, it ought to be eliminated altogether.

SELECT a.id, SUM(b.num) AS total 
FROM a, b
WHERE a.id = b.aId
GROUP BY a.id


Selecting a GROUP value directly requires it run the function on every row, which is horribly inefficient and will cause queries on large tables to take several seconds or even minutes to complete.

Alternatively, and far more efficiently, you can JOIN onto a subquery.

SELECT IFNULL(total, 0)
FROM a
LEFT JOIN
    (SELECT aId, SUM(num)
     FROM b
     GROUP BY aId) AS bTotals
     ON a.id = bTotals.aId


(disclaimer: I have not tested these specific queries, so apologies for any errors)

Each individual query has overhead as well, so doing many small queries is worse than the sum of their execution times.

Consequently, when the database is so determinant of performance, and multiple queries or direct selections of group functions are so terrible, it's a big win to combine complicated logic into a single efficient query.

No one may ever actually be able to see what happens, but that's not the point.  The sum of such efforts are the fruit of the labour, and oh how sweet it is.
Posted by Ellyoda Tue, 10 Aug 2010 02:55:34 (comments: 2)
 
Tue, 10 Aug 2010 05:29:02
Well, it may not be seen, but it is appreciated.
 
Tue, 10 Aug 2010 11:57:37
^Absolutely.
Log in or Register for free to comment
Recently Spotted:
Dvader (6m)
Login @ The VG Press
Username:
Password:
Remember me?