Sorting arrays in PHP without using sort, ksort, array_multisort, etc.

Posted: 2010/09/27 in Development, PHP

I had an interview the other day at a pretty intense company that uses LAMP stack(s) to do MASSIVE amounts of data sorting, filtering, etc. Interesting, to me, was the use of PHP and MySQL… although they are finally getting around to using some C/C++ for some of their heavy lifting.

The interview was, well, intimidating. They stated at the beginning, “Don’t get nervous, we just have to see how far you knowledge goes. We have to get you to a place where you cannot answer the question/scenario.” Great learning experience, and a bit humbling, too.

One scenario posed was this: sort this ‘two column’ array without using PHP’s built-in function nor any libraries. This was pretty normal: create your own version of a PHP function. Honestly, I couldn’t do it. Don’t know if it was nerves, lack of knowledge, fatigue, or the mental ‘trick’ that I only had about 4 feet square of white-board to draw out my plan – no way I could figure it out or try alternatives in a 2′ x 2′ space 🙂  So I came home and figured it out.

The gist of the code is this: there is a class that has an actually sorting method, an initial sort method (gotta start with one ‘column’ getting sorted), and a ‘grouping’ method. First round is to sort by a given ‘column‘ (a key-name or an index). The subsequent rounds sort based on, typically, the last sort column sorted-by.

In the example in the code below, the first ’round’ is a sort by State (index of 1). Then sort of Cities (index 0) per State (all of Arizona is sorted alphabetically, then California, etc.). Lastly, a sort of ‘zip’ inside each City. I did a few things different in the last (zip) sort:
1) used a key, not an index to sort by (still used an index for ‘grouping’ since the just-previous sort was on an indexed ‘column’)
2) included a few identical zip-codes in a given city (Tacoma) jut to see what happens

Code notes-o’-interest:
1) I always check that the key/index exists by just looking inside the first ‘row’ ($inArray[0]). If it’s not there, then return FALSE (which I don’t ‘handle’ for brevity of this post!)
2) $Final____[] is the array that, ultimately, is fully sorted as desired ($FinalArray[] and $FinalTemp[])
3) First and Last rows are often handled directly – I don’t like that I have to do that, but the overhead seemed minimal (ex. a quick if count==0). If you have an idea of how to get around this, please show me up!
4) I used the presumably less-efficient ‘strcasecmp()‘ in the ->oneColumnSort method rather than sort integers and strings differently (notice the ‘casting’ of (string) in each part of the strcasecmp() function).
5) I use a variable $allDone (in ->oneColumnSort method) to indicate that a given row has been inserted into the $Temp array – so now just throw the remaining rows onto the end of $Temp and go to the next $inArray row.  This was for ‘speed’, but seems pretty hack-ish to me.
6) As noted above, the ->furtherSort method requires that the ‘grouping‘ key/index column already be sorted since this method compares the ‘current’ row’s ‘grouping column‘ to the next row’s grouping column in an effort to find out if there is a group of values that need to be sorted by the $nowSortBy key/index.

To take it further:
1) I’d like to send in a list-of-columns (indexes and/or keys) and the array to be sorted and have the class iterate through the list-of-columns, sorting/grouping in the order of the list, and return a finished array.
2) I’d like to pull out the if (count$FinalArray) == 0 check (beginning of foreach loop in ->oneColumnSort if possible; I was trying to not use deep-reaching PHP functions like array_shift() but I’m not really liking how that if {} has to be run every time.
2.1) I’m using array_merge at the end of ->furtherSort.  That is ‘cheating’ according to the rules I set for myself, but I get so frustrated concatenating arrays!  And since I can’t guarantee that the array will be keyed (which is one place where ‘ + ‘ works… kind’a), and I didn’t want to foreach the rest of the array onto the temporary one… I just array_merge()‘d.
3) I need better variable names than $i and $j in the for{} series.  Some days I just can’t think of good names – team collaboration needed.

Here’s the code!

<?php
// Sorting arrays

/*
 * This code is free software; you can use it, redistribute it, and/or modify as you wish.
 * This code is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY - implied or otherwise.
 * This code is distributed "as is."  All risk and cost are assumed by the user of the code and not the creator thereof.
 * If you want to give attribution to the original creator of the original code, his name is David Malouf and he is, probably, available at EmailTheDavid@gmail.com
 */

class multiSortAnArray {

  private $theArray = array();

  public function __construct($initialIncomingArray) {
    $this->theArray = $initialIncomingArray;
  }

  public function getArray() {
    return $this->theArray;
  }

  /**
   * Sort based on a give column/key/position-in-the-array
   * @param array $inArray The INcoming multi-level or multi-column array to be sorted
   * @param string|num $col Indicates which key/column is to be used for sorting
   * @return array
   */
  private function oneColumnSort($inArray, $col) {

    if (!array_key_exists($col, $inArray[0])) {
      return FALSE;
    }

    $FinalArray = array();

    foreach ($inArray as $passedInRow) {
      // if first row, throw into FinalArray[]
      if (count($FinalArray) == 0) {
        $FinalArray[] = $passedInRow;
        continue;
      }

      $allDone = 'No';
      $Temp = array();
      foreach ($FinalArray as $fromFinal) {
        // if $passedInRow has been 'inserted', put remaining rows of $FinalArray onto the end
        if ($allDone == 'Yes') {
          $Temp[] = $fromFinal;
          continue;
        }

        // 0 means ==, < 0 means 1st < 2nd ( > 0 means 1st > 2nd)
        $comparison = strcasecmp((string) $passedInRow[$col], (string) $fromFinal[$col]);
        if ($comparison <= 0) {
          $Temp[] = $passedInRow;
          $Temp[] = $fromFinal;
          $allDone = 'Yes';
        } else {
          $Temp[] = $fromFinal;
        }
      }

      // if $allDone STILL == 'No', this means $passedInRow[$col] > all remaining rows in $FinalArray
      if ($allDone == 'No') {
        $Temp[] = $passedInRow;
      }
      $FinalArray = $Temp;
    }
    return $FinalArray;
  }

  public function initialSort($col) {
    $this->theArray = $this->oneColumnSort($this->theArray, $col);
  }

  /**
   * Groups rows that have the same value in a PREVIOUSLY SORTED array and has the group sorted itself
   * @param string|num $primarySortCol Sorting occurs when there is more than one value in THIS key/col.
   * @param string|num $nowSortBy  Indicates which key/column is to be used for sorting
   * @return array
   */
  public function furtherSort($primarySortCol, $nowSortBy) {
    if (!array_key_exists($primarySortCol, $this->theArray[0]) || !array_key_exists($nowSortBy, $this->theArray[0])) {
      return FALSE;
    }

    $cnt = count($this->theArray) - 1;
    $FinalTemp = array();

    for ($i = 0; $i <= $cnt; $i++) {
      // if [$i][$primarySortCol] != next one, then just add to FinalTemp and move on
      if ($this->theArray[$i][$primarySortCol] != $this->theArray[$i + 1][$primarySortCol]) {
        $FinalTemp[] = $this->theArray[$i];
        continue;
      }
      // if there ARE two rows with same primarySort, then get the 'group' that THIS
      //  primarySort value in a temp array.  Then sort temp array and add to $FinalArray
      $temp = array();
      for ($j = $i; $j <= $cnt; $j++) {
        if ($this->theArray[$j][$primarySortCol] == $this->theArray[$j + 1][$primarySortCol]) {
          $temp[] = $this->theArray[$j];
          continue;
        } else {
          // pickup last row as it will NOT compare to be == with the next row!!
          $temp[] = $this->theArray[$j];
          break;
        }
      }
      $sortedTemp = $this->oneColumnSort($temp, $nowSortBy);
      $FinalTemp = array_merge($FinalTemp, $sortedTemp);
      $i = $j;
    }
    $this->theArray = $FinalTemp;
  }

}

$initialArray = array(
    array('Tacoma', 'Washington', 'zip' => 98156),
    array('Tacoma', 'Washington', 'zip' => 98155),
    array('Phoenix', 'Arizona', 'zip' => 85260),
    array('Seattle', 'Washington', 'zip' => 98001),
    array('Federal Way', 'Washington', 'zip' => 98005),
    array('Federal Way', 'Washington', 'zip' => 98003),
    array('Federal Way', 'Washington', 'zip' => 98004),
    array('Anaheim', 'California', 'zip' => 93206),
    array('Dallas', 'Texas', 'zip' => 58226),
    array('Los Angeles', 'California', 'zip' => 43221),
    array('Tucson', 'Arizona', 'zip' => 85250),
    array('Flagstaff', 'Arizona', 'zip' => 85266),
    array('Winslow', 'Arizona', 'zip' => 85999),
    array('Sacramento', 'California', 'zip' => 43220),
    array('Tacoma', 'Washington', 'zip' => 98155)
);

$sortMe = new multiSortAnArray($initialArray);
$sortMe->initialSort(1);
$sortMe->furtherSort(1, 0);
$sortMe->furtherSort(0, 'zip');

echo "<pre>\n";
print_r($sortMe->getArray());
?>

Final outcome:

  City           State           zip
  ====           =====          =====
Flagstaff       Arizona         85266
Phoenix         Arizona         85260
Tucson          Arizona         85250
Winslow         Arizona         85999
Anaheim         California      93206
Los Angeles     California      43221
Sacramento      California      43220
Dallas          Texas           58226
Federal Way     Washington      98003
Federal Way     Washington      98004
Federal Way     Washington      98005
Seattle         Washington      98001
Tacoma          Washington      98155
Tacoma          Washington      98155
Tacoma          Washington      98156
Advertisements
Comments
  1. kamlesh says:

    i wanna sort an array that contains integer no like 4,1,8,2,6,9,3,45,12,23,89,8,44,163,474,656,47,465,66,45,28,5 etc.

    • Kamlesh,

      PHP has excellent sorting (this post was about trying to emulate PHP’s sorting). You can use sort() and asort() in your situation…
      – sort() simply sorts the numbers
      – asort() sorts the numbers but if the array uses keys (ex. array ( “apple”=>4, “lemon”=>1, “orange”=>8, “pear”=>2)), the numbers will be sorted but they won’t loose their connection with their key (e.g. “asort-ed” = array (“lemon”=>1, “pear”=>2, “apple”=>4, “orange”=>8).

      IF, however!, you are wanting to use the above class and there is a column that has the integers you show in your array, then know that the above class will work not work. This is because the class uses the function strcasecmp() with it’s parameters cast as strings. Thus, 15 is greater than 100 because strcasecmp() looks at the first digit (not the one’s digit!) and compares – in this case both compare equal with “1”. Then it compares the next digit – in this case the 5 in 15 is greater than the 0 in 100 so strcasecmp() returns the value 5 (essentially 5 – 0) and ends comparison.
      – In the case of the class above, this works fine because I am using ZipCodes which are exactly the same number of digits every time.

      A solution would be to pass in a “String or Integer” flag. If it’s “String”, use the strcasecmp() function. If “Integer”, subtract the the second number FROM the first number to get the same ‘output’ as strcasecmp() and continue.

      Another solution would be to check if both numbers are integers and let that comparison ‘set’ the “String or Integer” flag but this has issues if both numbers happen to be integers but other values in the arrays are not… perhaps 15 needs to be greater than 100 because the arrays contain strings that just happen to be integers at those points. This would be very specific to the situation the class is being used in.

      Hope that helps a little!
      David

  2. Brett says:

    Another solution would be to create a class whose constructor takes an array and a comparison function (or object). The class would then only have one member function “sort”. Your compare function would return 1 for two rows and look something like:

    function compare($a, $b) {
    $ret = strcasecmp($a[1], $b[1]);
    if ($ret == 0) $ret = strcasecmp($a[0], $b[0]);
    if ($ret == 0) $ret = $a[‘zip’] – $b[‘zip’];
    }

    If passing an object (an even better solution) it would have one function “compare”. The great flexibility there is that it could have any number of member vars and funcs for setting the sort parameters. For example, a setSortOrder function could take three inputs: $sortLevel, $column, $comparisonMethod (string, numeric, whatever) and store them in the object. Decoupling the sort from the comparison means your “sort” function can implement any sort method need for the job (quick, bubble, merge, etc.) and your compare can do whatever it wants/needs.

    I like your implementation that copies from the original to the final array as it sorts. I do wonder though what size of array would be required that this would save you a lot of time over an array copy. Isn’t an array copy just a low level memcopy? That should be blazing fast and then you can sort in place, no?

    –Brett

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s