CakePHP offers a good selection of tools to help you retrieve the data. Recently, I came into a situation where I needed to find and paginate results based on a single, distinct column in the table. Distinct data can be tricky, especially if the tools do not allow you to select the distinct based on a column. Distinct will check all columns returned, and coupling in time stamps, 99% of the time all rows will be distinct. So how do you grab the data? Well, first lets examine the sample data that is needed to be extracted first.
The sample data is in a MS SQL Server database. The table contains a record ID, title id, author id, genre, type, last check out date, and edit date. It is possible to have duplicate title, author IDs in the table. We need to extract all DISTINCT title IDs, along with the other information listed where the type is not a paperback, and provide a paginated list. I am sure this would be better architected if needed in the real world. Paginate will only get us so far, as this would only show all records.
class BooksController extends AppController { var $paginate = array( 'order' => array('Book.id' => 'desc'), 'fields' => array('Book.id', 'Book.title_id', 'Book.author_id', 'Book.genre_id', 'Book.type', 'Book.check_date', 'Book.edit_date'), 'limit' => 15, ); . . . function index(){ $this->set('hardback_books', $this->paginate()); } }
We need to use more to build a conditional query so the paginate will query against this. We can use CakePHP’s data source to help in this. Now, we could also just write this query out ourselves, but this is helpful to know so when you have to build sub-queries for other items. All data is in MS SQL Server, and we can use normal SQL expressions, but we need to grab DISTINCT data, which goes by rows, not columns, which means we will need to do 2 sub-queries in addition to the main one. So we first need to grab a list of the TOP 1 items. This will be our inner query.
SELECT TOP 1 * FROM [books] AS [bk_inner] WHERE [bk_inner].[title_id] = [Book].[title_id] AND [bk_inner].[type] <> 'paperback'
Next, we need to encapsulate that query with an outer one that will select all items which match up to the main query ID.
SELECT * FROM ( SELECT TOP 1 * FROM [books] AS [bk_inner] WHERE [bk_inner].[title_id] = [Book].[title_id] AND [bk_inner].[type] <> 'paperback' ) AS [bk_outer] WHERE bk_outer.[title_id] = Book.[title_id]
So we have the queries, and it needs the main query needs to constrain the results that exists in the sub-queries.
SELECT TOP 15 [Book].[id], [Book].[title_id], [Book].[author_id], [Book].[genre_id], [Book].[type], CONVERT(VARCHAR(20), [Book].[check_date], 20) CONVERT(VARCHAR(20), [Book].[edit_date], 20) FROM [books] AS [Book] WHERE EXISTS ( SELECT * FROM ( SELECT TOP 1 * FROM [books] AS [bk_inner] WHERE [bk_inner].[title_id] = [Book].[title_id] AND [bk_inner].[type] <> 'paperback' ) AS [bk_outer] WHERE bk_outer.[title_id] = Book.[title_id] ) ORDER BY [Book].[id] desc
We have the final full query. Now how do we get that? First, we need to invoke the getDataSource() method.
class Book extends AppModel { . . . function getHardbackBooks(){ $dbo = $this->getDataSource();
Next we need to use the buildStatement() to build each statement. Since CakePHP will build a sub query with this, we have to do this twice: once for the inner query, and once for the outer query. The “table” for subquery2 will actually be subquery1, so we need to add that as a “table” in the array.
$subquery1 = $dbo->buildStatement( array( 'fields' => array('TOP 1 *'), 'table' => $dbo->fullTableName($this), 'alias' => 'bk_inner', 'limit' => null, 'offset' => null, 'joins' => array(), 'conditions' => 'bk_inner.title_id = Book.title_id AND bk_inner.type <> \'paperback\'', 'order' => null, 'group' => null ), $this ); $subQuery2 = $dbo->buildStatement( array( 'fields' => array('*'), 'table' => '(' . $subquery1 . ')', 'alias' => 'bk_outer', 'limit' => null, 'offset' => null, 'joins' => array(), 'conditions' => 'bk_outer.[title_id] = Book.[title_id]', 'order' => null, 'group' => null ), $this );
Now, we need to make sure we add an EXISTS:
$subQuery = ' EXISTS (' . $subQuery2 . ') '; return $subQuery;
Return the data from the model to the controller. In the controller function we need to add a new condition to the paginate. In the conditions, we do not need to use a paired item value to set it, we can use the straight SQL returned from the model.
class BooksController extends AppController { var $paginate = array( 'order' => array('Book.id' => 'desc'), 'fields' => array('Book.id', 'Book.title_id', 'Book.author_id', 'Book.genre_id', 'Book.type', 'Book.check_date', 'Book.edit_date'), 'limit' => 15, ); . . . function index(){ $data = $this->Book->getHardbackBooks(); // Set to the paginate object conditions $this->paginate['conditions'] = array($data); $this->set('hardback_books', $this->paginate()); } }
And it returns the items based on the paginate parameters, ready to use in the view. It provides a DISTINCT list. And yes, I know I used more than 400 words in this one. It was closer to 500 without the code. Oh well, maybe tomorrow will be shorter.