A very simple problem we often give to little kids in school is comparing numbers, “If Bob has 5 apples and Anna has 6 apples, who has more apples?”. Most kids will answer Anna, which is the correct answer.
Believe it or not, this is a problem that we often also give to our computers to solve, some very complex operations performed at our most highly evaluated Ecommerce/FinTech… etc.
SaaS companies almost always contain many problems like the one given.
But in real life development, sometimes due to either developer error or using inconsistent and unchecked data, this question can get more difficult…
A much more difficult problem…
Now let’s ask, what would happen if we were to change the above-mentioned problem to “If Bob has ‘ABC’ apples and Anna has 6 apples, who has more apples?” Notice how now this problem makes no sense, how do we compare text ‘ABC’ to the number 6? It is not possible, and if you were to encounter this problem in an exam, you would probably be confused and either skip the problem or try to guess a solution.
Just like you, when computers encounter this problem, they often either fail or produce invalid results that can later result in serious consequences. This is what is known as a type error, when we attempt to perform an operation with two incompatible operands. To prevent this from occurring, all programming languages have some functionality built in to enforce what is known as ‘type safety’.
This is a concept that describes the degree to which a language enforces correctness in data types. But to understand this, we need to very briefly look at data types, type errors and type systems.
Data types: How computers store data and why do we need data types
Computer memory at its simplest approximation works like a light switch. One bit of memory is one light switch, it’s either on (1) or off (0). A bank of 8 switches is one byte. A kilobyte of memory is 1000 of these switch banks joined together and a megabyte is one million.
Due to this, computers can natively store only binary numbers. But problems in the real world use various other types of data. Because of this we have come up with different standardized ways to represent other forms of data with binary numbers, such as ASCII and later Unicode encodings for text characters where we give each symbol of each alphabet in the world (as well as various other symbols used in math and science as well as emojis) a specific binary code which we then standardize for all computers in the world.
Another problem is how to use this data when writing code, storing and declaring kilobytes of data by changing the values of each individual bit would be a massive undertaking, we need a higher level abstraction that will allow us to set the actual data (number, letter… etc) instead of the memory locations themselves. Let’s look at a brief example of how this is done.
An example of how real-world data is stored in memory
Even though in memory all values are represented with binary numbers, depending on the type of data we want to read from a specific section of the computer’s memory we can interpret those binary numbers in different ways.
When declared as a letter or a char (short for character), the binary value 1010 is interpreted as the letter A (using ASCII, in Unicode the letter A is 01000001) when declared as an integer (a whole number) it is interpreted as the number 10. When using the IEEE-754 standard for floating point arithmetic to store a decimal in memory, the 32 bit long binary number 00111111110000001000001100010010 will be interpreted as the decimal number 1,504.
Here the number is represented as: sign * 2^exp * fraction. Bit 32 is the sign 0 means positive, 1 means negative, bits 31-24 are the exponent and the rest are the fraction with a value between 1.0 and 2.
Data types
To accurately and sensibly represent real data in memory when writing code we use data types, in code representations of segments of bytes in memory. They allow the compiler, or any other tool that analyzes the code, for example static analysis tools, to know what the developer intends to do with the declared variables.
At the most basic level we have two types of data types, primitive and composite. Primitives are the most basic types of data, numbers, letters and truth values (char, integer, float, double, Boolean .. etc). Composite types use or combine the primitive types to form new data types that make development easier. A very important member to mention here is the array, a long uninterrupted block of memory that contains multiple members of one primitive type.
Type errors
A type error is the result of performing an operation with unexpected data types, for example you cannot divide the values ‘A’ and ‘B’ (or any other characters), nor can you perform any other math operations on them. You cannot compare a number to a string or an object to a floating-point value.
These errors will usually result in an error and a crash of the affected program, but in some cases, they can have more subtle and dangerous consequences like faulty data being used later in another part of the program.
The best way to prevent type errors is for developers to pay special attention while working to ensure that no conditions exist where an operation can receive incompatible operands, but programming languages also have various ways to prevent type errors or deal with them if conditions arise. The part of every programming language responsible for preventing type errors is the type system.
Type system: What is a type system and what it does
In simple terms a type system is a part of every programming language that is responsible for binding the terms used in code for specific data types (int, bool, float..etc) to actual enforced data types. It is also responsible for assigning and enforcing a set of allowed operations on those types. The type system can also show errors or warnings if the developer introduces a condition for a type error.
Types of type systems
Type systems are a very advanced topic, but luckily, we do not need to cover a lot of it to understand the issue ahead.
There are multiple ways to categorize programming languages, depending on how their type system works.
Some type systems check for type errors before the program even runs and any damage can occur, this is usually done during compiling where the program will either fail to compile outright or will complete with warnings.
This is called static typing, where the type system assigns types to each variable and checks if the assigned type is followed by the developer in the rest of the code. If well implemented this can easily prevent most type errors since the user will be warned if a condition for a type error even exists, before the error occurs.
If the type system only checks the data during runtime, this is called dynamic typing. In this case the types are ‘assigned’ instead, to the data in memory. This allows for type checks that are not possible by static checking, for example in the case of object inheritance, if we’re dealing with an object of type A that is a subtype of type B, if we want to convert a value of type A to type B, this is only allowed if the original object is of type B.
This operation is called downcasting and is only supported on languages that have dynamic typing, regardless is it’s used by itself or together with static typing.
Some typing systems require the developer to explicitly state what type each variable should have, this is known as explicit typing (or manifest, since the types become manifest in the code), while there are also type systems where the type of variables is ‘inferred’ deduced during runtime, this is known as implicit typing.
Strong and weak typed languages
Depending on how strict the type system is with its checks a language can be ‘strong’ typed or ‘weak’ typed. There is no clear definition for these terms but in general a weak typed language will be dynamically and implicitly typed and will silently coerce values from their original type to their expected type when performing operations instead of failing.
A very (in)famous example of this is JavaScript where any type can be coerced into any other type depending on the performed operation. So, you can perform some very bad operations. Adding the number 5 and the letter ‘A’, the result will be a string “5A”, you can do the same with the Boolean value true, true+5 will result in a string “true5”. Adding 5 + ‘5’ will result in ‘55’ whilst adding 5 + any object will result in the string ‘5[Object TypeOfObject]’.
None of these operations will result in an error and the program will continue to run normally, but with faulty values. This ‘chaotic’ behavior is only resolved as of recently with the help of Typescript.
Where does PHP stand?
An important example for today’s story is PHP. This is a language that started out as a quick templating and scripting language and could perform similar magic operations to Javascript. Later it evolved and it slowly added functionality to ensure type safety, amongst other things, to the point where modern PHP 8+ is similarly safe to other modern languages.
This is good news since over 70% of web apps run on it and amongst them, many enterprise applications. Although today it supports specifying types for function arguments and class properties and enforcing them both statically (with tools such as phpstan) and on runtime, it still only supports implicit typing for variables, so they cannot enforce a type for their contents and can be reassigned on the fly. In some cases, this can still cause trouble as we will see later.
A double-edged programming sword
This was a lot of information, and it could be confusing to readers who are not familiar with the subject, but in short, a programming language’s type system determines the ‘strictness’ of the language when it comes to enforcing rules on the developer.
This strictness ensures the code will operate consistently and safely but also that the developer will spend a lot more time covering all cases during development. Setting up a new project where the requirements are still not clear, using a statically typed language, can often be a real pain.
You can easily restrict yourself to only one way of doing things and when a sudden change in requirements hits (which often happens in the early days of a startup) you will have found yourself in a very deep hole that you need to code yourself out of.
On the other hand the less strict the type system is the easier it is to type out a simple MVP or prototype of an application, you just define the business logic and the language will silently coerce data to cover for your oversights, mishaps and sudden changes in direction, that is if you’re okay with seeing, using and storing faulty data every time your program hits unaccounted for conditions.
It does not matter if a certain variable used to hold an object but now needs to hold a JSON string, if all methods using it can work with or tolerate JSON data, you’re fine.
Speed is very often a requirement when building a prototype, especially during the early days of a software startup, where seeing any results is more important than seeing 100% accurate results.
But that startup will eventually grow, and its customers will slowly grow tired of dealing with incorrect results. This is where complaints will start and developers will have to start caring more and more for type safety, among other things. Code will need to be refactored; tech debt will need to be resolved.
How we crashed our app with a single comparison: A strange error
I recently encountered a very confusing but serious error while working on our client’s application.
The application runs on PHP with Symfony framework. During our regular development cycle, right before releasing a new version of the application, all developers are tasked with working on pre-release bug tickets. I was given a ticket that simply showed that a new search field that uses AJAX to fetch results while you’re typing, is instead showing nothing and in the network tab there are error messages with a 500 HTTP code for each request.
These messages mean that the requests are failing due to an issue with our backend PHP code.
Digging deeper
My first step in debugging was to see the actual error message, I fully expected that when I copy the AJAX request’s URI and open it in a new browser tab to see our custom error page that we show for all errors and to me it means that whatever the error was it is logged and I can see the full error message in those logs.
Instead, the error was a PHP error stating that the allowed memory limit was exhausted. This is a very serious error that you do not expect to see usually, because this likely means that a developer did not account for the size of some data that they were working with and this will require a thorough investigation, and the fix will likely require a larger rework, it will not be a one-line fix.
Furthermore, the content retrieved will vary between environments and the affected code runs only in specific conditions where certain features are enabled.
The affected searching functionality is performed mostly on Elastic search, a special type of database, optimized for searching and filtering text data. Our code requests specific data per the user’s search query and then performs various operations on it before returning it to the back to the user.
The failure occurred in a part of the code immediately after retrieving the data. Retrieval was done in a single request and the data then passed as an argument to other functions and this meant to me at the time that the only possible way to exhaust the memory limit is if this retrieved data is unusually large and working with it as one large chunk of data, passing it as an argument to methods and copying it in other variables, exhausts the allowed memory limit for the application and triggers this crash.
This could happen because every time a function is invoked, it and all passed parameters are stored in the call stack, until it completes, after which it is popped out. If the parameters are unusually large, or the memory limit is unusually small, this will eventually exceed said limit for the application. But after a lot of time investigating this lead, it led to nowhere.
The cause of the issue
I was getting ready to call in for reinforcements from one of the more senior engineers on staff, when on one final look over the problematic source code, I noticed a few unusual lines of code. For NDA reasons I cannot show the actual source code here nor the performed operations with the data but I will provide a pseudocode example of how the bug occurred.
resultsCount = array_get(searchResults, ‘hits’) for i=0 ; i<resultsCount ; i++ searchResults[i][‘result_type’] = some_function(searchResults, I, …arguments); … return searchResults;
And this answered all my questions. So, what happened exactly? Well, the answer lies in how PHP handles comparisons between incompatible types. What PHP does is similar to what Javascript does in this case, it attempts to convert incompatible types to compatible types when possible, and if not possible returns a fixed result. In our case this is a big issue because of how this comparison is handled when one operand is an array and the other is anything else (in our case an integer).
This is a case where PHP just outputs a fixed result, an array is ALWAYS greater than anything else that it is compared to. This is different than what happens during casting, where casting an array to an integer, results in PHP essentially checking if the array is empty and returning a 0 if yes and 1 if no.
With this knowledge, if we now follow the execution of the above for loop, we will notice that:
- Even though the developer likely intended resultsCount to be an integer, it is now an array.
- I is always smaller than resultsCount, this means the loop will never reach the condition of resultsCount is equal to i that is required to stop the loop.
- Each step of the loop adds a value to an existing array in memory.
- In PHP it is valid syntax to do $array[‘non-existing-key’] = ‘value’, this essentially adds the non-existing-key to the array and adds ‘value’ as its value. It does not make a difference if the original array’s keys are strings or integers since all PHP arrays work as hashmaps in the background.
Regardless of the size of the initial result data, this infinite adding of values in memory will result in the memory limit exceeded error, described in the ticket.
Unlike my original assumption, this turned out to be a very simple bug to fix. Checking other places where the original developer implemented searching, I noticed that they are counting the results of the search query like this:
resultsCount = count(array_get(searchResults, ‘hits’))
It seems in this case; they forgot to add the count. But this is a bad idea even without making any mistakes. There is no need to calculate data that already exists. Elastic search provides a count of returned data in the result itself and using this would have avoided even introducing the conditions where a bug like this would be possible to make.
After refactoring and testing, I’m pleased to say that the searching works flawlessly… for now.
Conclusion
When writing code in a weak typed language it is very important to pay special attention to what data you are assigning to your variables.
The lack of compiler warnings and the language’s ability to ‘coerce’ data, means that you can easily forget a casting or conversion step and your code will work without errors or warnings, there will be no obvious tell that you are making a mistake. In the case of PHP, after adding property and method argument types to your code and a static analysis tool and enabling strict mode, this concern will be greatly reduced, but not eliminated.
As shown by the above-described bug it is still possible to assign an invalid data type to a variable in your code which introduces unwanted behavior and adds billable development hours on the project. This also highlights the importance of thorough manual testing on the side of the individual developer, before completing the ticket.