Do you want a scalable, high-performance PHP application that logs everything in an intelligible manner, one you can use now, tomorrow and three weeks from now? It's entirely possible. In this series of posts, I am going to show you how to put it together using Zend Framework, ZeroMQ and Hadoop.
In my recent discussions (rants, some would say), I've been really digging into what makes a good developer, what makes good code and, generally, what constitutes a professional approach to software development and maintenance.
What is High Performance?
Recently I started looking specifically at logging. A former colleague pointed me to an excellent article on highly scalable application development, which discussed just how much logging you can and should do in an application while still maintaining high performance, even in an embedded system.
The points that stood out for me the most were these:
- Store log messages in a circular queue to limit resource usage.
- Write log messages to disk in big sequential blocks for efficiency.
- Every object in your system should be dumpable to a log message.
- Make log messages directly queueable to the log task so queuing doesn't require additional memory allocations.
- Log to a separate task and let that task push out log data when it can.
- Use a preallocated buffer pool for log messages so memory allocation is just a pop and a push.
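To make the circular-queue idea above concrete, here is a minimal sketch in PHP (the class name, capacity and method names are my own illustrative choices, not from the article): a fixed-size buffer that overwrites the oldest entry when full, so memory usage stays bounded no matter how chatty the application gets.

```php
<?php
// Minimal circular log queue: bounded memory, oldest entries overwritten.
// Class name and capacity are illustrative choices.
class CircularLogQueue
{
    private $buffer;
    private $head = 0;   // next write position
    private $count = 0;  // entries currently held

    public function __construct($capacity)
    {
        $this->buffer = new SplFixedArray($capacity);
    }

    public function push($message)
    {
        $this->buffer[$this->head] = $message;
        $this->head = ($this->head + 1) % $this->buffer->getSize();
        if ($this->count < $this->buffer->getSize()) {
            $this->count++;
        }
    }

    // Return the retained messages, oldest first, and empty the queue;
    // a real logger would hand these to the write task in one block.
    public function drain()
    {
        $size  = $this->buffer->getSize();
        $start = ($this->head - $this->count + $size) % $size;
        $out   = array();
        for ($i = 0; $i < $this->count; $i++) {
            $out[] = $this->buffer[($start + $i) % $size];
        }
        $this->count = 0;
        return $out;
    }
}
```

Draining in one go is also what enables the "big sequential blocks" point: the write task can concatenate everything `drain()` returns and flush it to disk in a single write.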
It really got me thinking, because all too often logging is an afterthought, or at the very least not well thought out from a longer-term perspective. So I wondered: what if you could log massive amounts of information in an intelligible manner, and still do it with high performance? How would you do it?
How will it work? Zend / ZeroMQ / Hadoop
Given my propensity for Zend Framework, I turned to it straight away as the core base (ok, it's not, out of the box, the fastest, but this is a proof of concept), combining it with ZeroMQ via Mikko Koppanen's excellent PHP extension. If you're not familiar with ZeroMQ, check out the superb talk given by my friend Ian Barber at this year's PHP London.
With ZeroMQ, we could set up a publish/subscribe configuration, where appropriately written objects could publish the relevant information to the pipe, such as:
- event timestamp
- retrieved data (from a SOAP/REST request)
- configuration/settings information
- event/class/function information
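The fields above might be packed into a JSON envelope and pushed out over a PUB socket. Here is a sketch using the zmq extension's real `ZMQContext`/`ZMQSocket` API; the field names, the JSON format and the endpoint address are my own illustrative assumptions, and the publish step only runs when the extension is actually loaded.

```php
<?php
// Build a structured log message carrying the fields described above.
// Field names and the JSON envelope are illustrative choices.
function buildLogMessage($event, $data = array(), $config = array())
{
    return json_encode(array(
        'timestamp' => microtime(true), // event timestamp
        'event'     => $event,          // event/class/function information
        'data'      => $data,           // e.g. data from a SOAP/REST call
        'config'    => $config,         // relevant configuration/settings
    ));
}

// Publish via a ZeroMQ PUB socket when the zmq extension is available.
// The endpoint is an arbitrary example address.
if (extension_loaded('zmq')) {
    $context   = new ZMQContext();
    $publisher = $context->getSocket(ZMQ::SOCKET_PUB);
    $publisher->bind('tcp://127.0.0.1:5555');
    $publisher->send(buildLogMessage('UserService::login', array('userId' => 42)));
}
```

A PUB socket simply drops messages when no subscriber is connected, which is exactly the fire-and-forget behaviour we want from the application's side.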
Messages would be published asynchronously, so as not to slow down the core PHP app. That way we could, theoretically, have a high-performance application with comprehensive logging, handled by another PHP process (or a process written in a different language entirely), logging to anything from the filesystem to a NoSQL store, Amazon S3 or a relational database, as your imagination, time and budget allow.
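The separate logging process could be as simple as a SUB socket that subscribes to everything and hands each message to a writer. This is a hedged sketch: the endpoint and the file destination are assumptions, and the writer could just as easily target S3, a NoSQL store or a database. The receive loop only runs when the zmq extension is loaded.

```php
<?php
// Append one received log message to a stream, skipping anything
// that isn't valid JSON. The destination is interchangeable.
function writeLogLine($json, $stream)
{
    if (json_decode($json, true) === null) {
        return false; // skip malformed messages
    }
    fwrite($stream, $json . "\n");
    return true;
}

// The subscriber side of the pipe, using the zmq extension's API.
if (extension_loaded('zmq')) {
    $context    = new ZMQContext();
    $subscriber = $context->getSocket(ZMQ::SOCKET_SUB);
    $subscriber->connect('tcp://127.0.0.1:5555');
    $subscriber->setSockOpt(ZMQ::SOCKOPT_SUBSCRIBE, ''); // no topic filter
    $log = fopen('/tmp/app.log', 'a');
    while (true) {
        writeLogLine($subscriber->recv(), $log);
    }
}
```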
So in the rest of this series, we're going to look at integrating a combination of Zend Framework, ZeroMQ and Hadoop to store the log information. We're going to cover:
- Designing objects to be dumpable quickly and effectively
- A publisher/subscriber configuration of ZeroMQ that's contactable from ZF
- A PHP process to subscribe to the ZeroMQ server, retrieve the dumped objects and store them in Hadoop
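As a preview of the first item, one way to make an object "dumpable" quickly and effectively is a small interface that every loggable class implements. The interface and class names below are my own illustration, not part of any Zend Framework component:

```php
<?php
// One possible shape for dumpable objects: each class decides what it
// exposes, and a single helper turns any of them into a log payload.
interface LogDumpable
{
    public function toLogArray();
}

class ApiResponse implements LogDumpable
{
    private $endpoint;
    private $payload;

    public function __construct($endpoint, array $payload)
    {
        $this->endpoint = $endpoint;
        $this->payload  = $payload;
    }

    public function toLogArray()
    {
        return array(
            'class'    => get_class($this),
            'endpoint' => $this->endpoint,
            'payload'  => $this->payload,
        );
    }
}

// Serialise any dumpable object for publishing to the log pipe.
function dumpForLog(LogDumpable $object)
{
    return json_encode($object->toLogArray());
}
```

Type-hinting on the interface means the publishing code never needs to know what it is logging, which is what lets "every object in your system" be dumpable.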
Now that's a pretty lofty set of goals, as you can imagine. So we're not going to cover security or MapReduce. Maybe later, but not for the time being. However, if you're keen to contribute, I'm keen to hear from you.
If you liked what you read and would like to see more, please retweet it, give it a like on Facebook, or even give it some Digg love. And we always value your feedback and comments.