Working with large datasets is common practice. For example, we might retrieve data from an API endpoint and store it in an array so that we can later iterate over the array and manipulate the data as needed. Common as this is, the dataset can sometimes grow so large that the script exhausts the memory available to it.
In cases like this, when we have a big dataset or expect the data to grow in the future, generators can come to the rescue. A generator in PHP is a function that allows us to iterate over data without needing to build an array in memory. Unlike a standard function, which returns a single value and then ends, a generator can yield as many values as it needs to.
When a generator function is called, it returns a Generator object that can be iterated over. For example, when we loop over that object with foreach, PHP runs the generator function's code each time it needs a value, pauses the function and saves its state when it yields a value, and resumes from that point when the next value is required. Once there are no more values to yield, the loop ends just as it would when an array has run out of elements.
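The pause-and-resume behavior described above can be sketched with a tiny generator. The function name countToThree is hypothetical and exists only for this illustration; the echo statements are there to make the order of execution visible.

```php
<?php
// Hypothetical generator used only to illustrate pause-and-resume behavior.
function countToThree(): Generator
{
    echo "start\n";
    yield 1;            // execution pauses here until the next value is requested
    echo "resumed\n";
    yield 2;
    yield 3;
    echo "done\n";      // runs when the loop asks for a value after 3
}

$generator = countToThree();   // no code inside the function has run yet

foreach ($generator as $value) {
    echo $value, "\n";
}
// Prints, in order: start, 1, resumed, 2, 3, done
```

Note that calling countToThree() prints nothing by itself; the body only starts running once the loop asks for the first value.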
2 Array Iteration vs. Generators
To give a practical example of what generators can do, we will compare them with a function that populates an array with a custom range of values supplied by the user.
The function is as follows:
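The listing itself is not reproduced in this text, so what follows is a minimal sketch of what such a function might look like. The name getRange and its parameters are assumptions made for illustration, not the author's original code.

```php
<?php
// Hypothetical array-based range function (name and signature are assumptions).
function getRange(int $start, int $end): array
{
    $values = [];
    for ($i = $start; $i <= $end; $i++) {
        $values[] = $i;   // every value is stored, so memory grows with the range
    }
    return $values;
}

// Small ranges such as 1-500 fit comfortably in memory.
foreach (getRange(1, 500) as $number) {
    echo $number, PHP_EOL;
}
```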
If we invoke the function to populate the array with the range 1-500, the function executes without any problem.
This approach works as expected and, as mentioned previously, is widely used. The problem occurs when the range is large and populating the array consumes a lot of memory.
In the next example we will use PHP_INT_MAX as the end of the range. This constant holds the largest integer supported by the PHP build in use.
After the execution, we get the error that the allowed memory size is exhausted.
Let's try the same example again, but this time using a generator to produce the values.
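A generator-based version of the range function might look like the sketch below. The name getRangeLazy is an assumption, and the loop stops after five values on purpose: the generator does not run out of memory, but walking the whole range up to PHP_INT_MAX would take impractically long.

```php
<?php
// Hypothetical generator-based range function (name is an assumption).
function getRangeLazy(int $start, int $end): Generator
{
    for ($i = $start; $i <= $end; $i++) {
        yield $i;   // hand one value to the caller; nothing is accumulated
    }
}

foreach (getRangeLazy(1, PHP_INT_MAX) as $number) {
    if ($number > 5) {
        break;      // stop early; only the values actually requested were produced
    }
    echo $number, PHP_EOL;
}
// Prints 1 through 5, one per line
```

Only the loop counter and the generator's saved state live in memory at any moment, regardless of how large the requested range is.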
When we execute the code, we no longer get the error, which means we are not running out of memory. Each value is produced by yield only when it is needed, so the entire dataset is never held in memory at once.
There are other possible solutions to this particular problem, such as raising memory_limit in php.ini, but it is questionable whether that approach is effective and whether we want our code to be able to use all of the server's memory.
3 Generator Features
3.1 Yielding values with keys
When working with generators, it is possible to yield key-value pairs using a syntax very similar to the one used to define an associative array.
As input data, we will take a set of users with different skills and as output, we will yield only those users who have PHP in their skill set.
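One way this could look is sketched below. The user names, skills, and the function name usersWithSkill are hypothetical sample data invented for this example.

```php
<?php
// Hypothetical sample data: user name => list of skills.
$users = [
    'alice' => ['PHP', 'JavaScript'],
    'bob'   => ['Python'],
    'carol' => ['Go', 'PHP'],
];

// Yields only the users whose skill list contains $skill, keyed by user name.
function usersWithSkill(array $users, string $skill): Generator
{
    foreach ($users as $name => $skills) {
        if (in_array($skill, $skills, true)) {
            yield $name => $skills;   // same "key => value" form as an array entry
        }
    }
}

foreach (usersWithSkill($users, 'PHP') as $name => $skills) {
    echo $name, ': ', implode(', ', $skills), PHP_EOL;
}
// Prints:
// alice: PHP, JavaScript
// carol: Go, PHP
```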
3.2 Sending Values to Generator
Generators can also accept values: if there is a particular need, we can inject a value into the generator function while it is running by calling the send() method on the Generator object. The injected value can be used in different ways; for example, it can act as a command that the generator responds to.
To illustrate this, we will use the first example with the custom range function.
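A minimal sketch of this idea follows, assuming a hypothetical range generator that stops when it receives the string 'stop'. The function name and the command value are assumptions chosen for illustration.

```php
<?php
// Hypothetical range generator that reacts to a command sent by the caller.
function getRangeWithCommand(int $start, int $end): Generator
{
    for ($i = $start; $i <= $end; $i++) {
        // The yield expression evaluates to the value passed to send(),
        // or to null when the caller simply advances the generator.
        $command = yield $i;
        if ($command === 'stop') {
            return;   // end the generator early
        }
    }
}

$generator = getRangeWithCommand(1, 10);
$seen = [];

foreach ($generator as $number) {
    $seen[] = $number;
    if ($number === 3) {
        $generator->send('stop');   // inject the command into the generator
    }
}

echo implode(', ', $seen), PHP_EOL;   // 1, 2, 3
```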
3.3 Returning Values from Generator
Just as we injected a value into the generator function in the previous example, in the next one the generator will return a value once it has finished executing. The returned value can be read with the getReturn() method.
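As a sketch of returning a value, the hypothetical generator below yields each number in a range and returns their sum at the end. The name sumRange is an assumption. Note that getReturn() is only valid after the generator has finished; calling it earlier throws an exception.

```php
<?php
// Hypothetical generator that yields each number and returns the running total.
function sumRange(int $start, int $end): Generator
{
    $total = 0;
    for ($i = $start; $i <= $end; $i++) {
        $total += $i;
        yield $i;
    }
    return $total;   // available through getReturn() once iteration is complete
}

$generator = sumRange(1, 5);

foreach ($generator as $number) {
    echo $number, PHP_EOL;
}

echo 'Sum: ', $generator->getReturn(), PHP_EOL;   // Sum: 15
```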
If we want the first element from a generator, we cannot access it the way we normally would with an array (for example, $array[0]); doing so produces an error on execution. Instead, we can use the generator methods current() and next() for that purpose.
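The following sketch shows current() and next() on a hypothetical three-value generator named letters, which is invented for this example.

```php
<?php
// Hypothetical generator with three values.
function letters(): Generator
{
    yield 'a';
    yield 'b';
    yield 'c';
}

$generator = letters();

// $generator[0] is not possible: a Generator object cannot be indexed like an array.
echo $generator->current(), PHP_EOL;   // a  (the first yielded value)

$generator->next();                    // advance to the next yield
echo $generator->current(), PHP_EOL;   // b
```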
Beyond the uses mentioned above, generators are useful for data import and export, for example reading a CSV file with many rows and processing the raw data according to the requirements.
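A CSV reader built on a generator might look like the sketch below. The function name readCsvRows, the sample file contents, and the choice of delimiter, enclosure, and escape characters are all assumptions; real data may need different settings.

```php
<?php
// Hypothetical helper that yields one parsed CSV row at a time.
function readCsvRows(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // Length 0 means no line-length limit; the other arguments are passed explicitly.
        while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            yield $row;   // only the current row is held in memory
        }
    } finally {
        fclose($handle);  // runs even if the caller stops iterating early
    }
}

// Usage with a small temporary file standing in for a large import.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "id,name\n1,Ana\n2,Ben\n");

foreach (readCsvRows($path) as $row) {
    echo implode(' | ', $row), PHP_EOL;
}

unlink($path);
```

Because rows are yielded one by one, the same loop should work for a file with millions of rows without loading the whole file into an array.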
Laravel, one of the most popular PHP frameworks, has used generators for its lazy collections since version 6.
Generators can significantly reduce memory usage when processing a large set of data. With performance and optimization, however, improving one aspect of an application usually comes at the cost of another. Using generators means we will not hit a memory exhaustion error, but we need to be careful about the execution time required to obtain all the data. The point is that we should make the most of the powerful features of generators while still considering all the other aspects of the application and choosing the appropriate solution for our problem.