Symfony bootstrap optimization

A complex product like Nuvola can slowly degrade in performance over the years. Many small inefficiencies have no noticeable effect when analyzed individually, so their combined cost can grow very large without anyone noticing.
We noticed that our backend instances had become very CPU-hungry, so we set out to find the sources of inefficiency.

Strategy

Given the size of the project, an exhaustive analysis of the codebase would be both infeasible and cost-ineffective. Therefore, keeping the Pareto principle in mind, we decided to analyze only the most-used pages, confident that this would still yield significant results with limited effort. In fact, the best place to start looking for inefficiencies is the framework initialization, because that code runs on every request.

We used the Xdebug trace feature to profile requests on a local machine, running Symfony in the “prod” environment of course, and then processed the trace files with the Flamegraph tool. The result is an interactive graph of “flames”: the height of each flame is proportional to the depth of the call stack, and its width to the time spent in the method call at its base. This allowed us to quickly spot the biggest inefficiencies.
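To make the width rule concrete, here is a minimal PHP sketch of the aggregation step behind a flame graph. It assumes the call stacks and their exclusive times have already been extracted from the trace (parsing Xdebug's trace format is left out), and the sample array shape and frame names are hypothetical. It produces the collapsed-stack format used by Brendan Gregg's FlameGraph scripts, one "frame;frame;frame weight" line per unique stack, and computes a frame's width as the total weight of every stack that passes through it.

```php
<?php
declare(strict_types=1);

/**
 * Folds call-stack samples into collapsed stacks: frames joined by ';',
 * identical stacks merged by summing their weight (here: microseconds of
 * exclusive time spent in the innermost frame).
 *
 * @param list<array{stack: list<string>, us: int}> $samples
 * @return array<string, int>
 */
function foldStacks(array $samples): array
{
    $folded = [];
    foreach ($samples as $sample) {
        $key = implode(';', $sample['stack']);
        $folded[$key] = ($folded[$key] ?? 0) + $sample['us'];
    }
    ksort($folded, SORT_STRING);
    return $folded;
}

/**
 * Width of a frame in the graph: the summed weight of every folded stack
 * whose call path is $path or goes through it.
 */
function frameWidth(array $folded, string $path): int
{
    $width = 0;
    $prefix = $path . ';';
    foreach ($folded as $stack => $weight) {
        $stack = (string) $stack;
        if ($stack === $path || strncmp($stack, $prefix, strlen($prefix)) === 0) {
            $width += $weight;
        }
    }
    return $width;
}

// Hypothetical samples from one request.
$folded = foldStacks([
    ['stack' => ['main', 'Kernel::boot', 'Container::get'], 'us' => 300],
    ['stack' => ['main', 'Kernel::boot'], 'us' => 50],
    ['stack' => ['main', 'Kernel::handle'], 'us' => 120],
    ['stack' => ['main', 'Kernel::boot', 'Container::get'], 'us' => 200],
]);
foreach ($folded as $stack => $us) {
    echo $stack, ' ', $us, PHP_EOL;
}
```

Here Kernel::boot is 550 µs wide even though only 50 µs are spent in its own body: the rest belongs to the calls stacked on top of it.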

One thing to bear in mind is that the method at the base of the big flame you want to extinguish is usually just a victim of one of its dependencies. If you remove that call, the flame simply moves to another point on the graph. For example, if the service at the base of the flame has a dependency that is costly to instantiate, removing the call merely postpones that work until another service depending on it is instantiated.
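The effect is easy to reproduce with a toy memoizing container (hypothetical and far simpler than Symfony's): whoever requests a shared service first pays for building it, so a profiler charges the cost to that caller. The service and caller names are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Toy memoizing container. It records which caller triggered the
 * construction of each service, i.e. the frame a profiler would charge.
 */
final class ToyContainer
{
    /** @var array<string, callable(): object> */
    private array $factories;
    /** @var array<string, object> */
    private array $instances = [];
    /** @var array<string, string> service id => caller that paid for it */
    public array $paidBy = [];

    public function __construct(array $factories)
    {
        $this->factories = $factories;
    }

    public function get(string $id, string $caller): object
    {
        if (!isset($this->instances[$id])) {
            // Only the first requester pays; later callers get the cached instance.
            $this->paidBy[$id] = $caller;
            $this->instances[$id] = ($this->factories[$id])();
        }
        return $this->instances[$id];
    }
}

/** Simulates one request, with or without the first consumer. */
function simulateRequest(bool $loginControllerUsesClient): array
{
    $container = new ToyContainer([
        // Imagine a slow constructor here.
        'expensive_client' => static fn (): object => new stdClass(),
    ]);
    if ($loginControllerUsesClient) {
        $container->get('expensive_client', 'LoginController');
    }
    $container->get('expensive_client', 'AuditListener');
    return $container->paidBy;
}
```

Dropping the LoginController dependency does not make the cost disappear: simulateRequest(false) simply charges it to AuditListener instead. The fix has to target the expensive dependency itself.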

Preloading

One easy win is enabling OPcache preloading, if it is not enabled already.
Symfony generates a preload file for you, containing all the classes used by the cache warmers and by the services tagged with container.preload.
We just needed to customize it by adding this line

ini_set('memory_limit', '1024M');

before including the AppKernelProdContainer.preload.php file, because during preloading the limit set in the php.ini file is ignored.
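Put together, the customized preload script might look like the following sketch. The script location, the var/cache/prod directory and the php.ini values in the comments are assumptions based on a default Symfony layout; the generated file name is the one mentioned in the text.

```php
<?php
// config/preload.php (assumed location). php.ini would point to it, e.g.:
//   opcache.preload=/path/to/project/config/preload.php
//   opcache.preload_user=www-data   ; needed when PHP runs as root

// In our experience the php.ini memory_limit does not apply while
// preloading runs, so raise it before loading the generated file.
ini_set('memory_limit', '1024M');

// Assumed cache path; adjust it to your kernel and environment.
$generated = dirname(__DIR__) . '/var/cache/prod/AppKernelProdContainer.preload.php';
if (file_exists($generated)) {
    require $generated;
}
```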
Our flame graphs showed that autoloading the Symfony voters was one of our biggest computational costs, so we implemented a compiler pass that automatically tags all of them for preloading.
We then added a number of class_exists() calls to the preload file to make sure that all of our Doctrine types, Doctrine filters and Symfony authenticators are preloaded.
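A compiler pass along these lines is one way to do it. It is a sketch rather than our exact code: it relies on voters carrying Symfony's standard security.voter tag (which autoconfiguration adds) and on the class attribute of the container.preload tag; the class and namespace names are our own choice.

```php
<?php
declare(strict_types=1);

namespace App\DependencyInjection\Compiler;

use Symfony\Component\DependencyInjection\Compiler\CompilerPassInterface;
use Symfony\Component\DependencyInjection\ContainerBuilder;

/**
 * Adds the container.preload tag to every service tagged as a security
 * voter, so that their classes end up in the generated preload file.
 */
final class PreloadVotersPass implements CompilerPassInterface
{
    public function process(ContainerBuilder $container): void
    {
        foreach (array_keys($container->findTaggedServiceIds('security.voter')) as $id) {
            $definition = $container->getDefinition($id);
            // Without an explicit class, Symfony uses the service id as class name.
            $class = $definition->getClass() ?? $id;
            if (!$definition->hasTag('container.preload')) {
                $definition->addTag('container.preload', ['class' => $class]);
            }
        }
    }
}
```

The pass would then be registered from the kernel's build() method with $container->addCompilerPass(new PreloadVotersPass()).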
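Instead of a long list of bare class_exists() calls, a small helper such as the one sketched below (the function name and the class names are hypothetical) can autoload each symbol and also report names that no longer exist, which would otherwise fail silently. Every file loaded this way during the preload script is compiled by OPcache and, where its dependencies can be resolved, preloaded.

```php
<?php
declare(strict_types=1);

/**
 * Autoloads each class, interface or trait name and returns the ones that
 * could not be found, so that renamed or removed classes are noticed.
 *
 * @param list<string> $symbols
 * @return list<string>
 */
function preloadSymbols(array $symbols): array
{
    $missing = [];
    foreach ($symbols as $symbol) {
        // These checks trigger the registered autoloaders by default.
        if (!class_exists($symbol) && !interface_exists($symbol) && !trait_exists($symbol)) {
            $missing[] = $symbol;
        }
    }
    return $missing;
}

// Hypothetical names: replace them with your own types, filters and authenticators.
$missing = preloadSymbols([
    'App\\Doctrine\\Type\\MoneyType',
    'App\\Doctrine\\Filter\\TenantFilter',
    'App\\Security\\LoginFormAuthenticator',
]);
if ($missing !== []) {
    error_log('Not preloaded: ' . implode(', ', $missing));
}
```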

While doing this work, we discovered that one of our libraries had a very inefficient autoloader: it checked for a matching file in its own source directory for every class, even for classes outside the library's namespace. We ended up writing a simple autoloader for that library instead of using the one it provides, so it is worth checking whether any of your libraries register a custom autoloader.
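A replacement autoloader can be as small as the sketch below. It assumes a PSR-4-like layout in which each class lives in a file named after it; the Acme\SomeLib\ prefix and the directory are placeholders for the real library. The important part is the early return: classes outside the library's namespace are rejected by a string comparison, without touching the filesystem.

```php
<?php
declare(strict_types=1);

// Placeholders: the real library's namespace prefix and source directory.
const SOME_LIB_PREFIX = 'Acme\\SomeLib\\';
const SOME_LIB_DIR = __DIR__ . '/vendor/acme/some-lib/src/';

/** Maps a class name to its file, or returns null when the class is not ours. */
function someLibClassFile(string $class): ?string
{
    // Cheap string check first: no filesystem access for foreign classes.
    if (strncmp($class, SOME_LIB_PREFIX, strlen(SOME_LIB_PREFIX)) !== 0) {
        return null;
    }
    $relative = substr($class, strlen(SOME_LIB_PREFIX));
    return SOME_LIB_DIR . str_replace('\\', '/', $relative) . '.php';
}

spl_autoload_register(static function (string $class): void {
    $file = someLibClassFile($class);
    // Only classes inside the library's namespace ever reach the filesystem.
    if ($file !== null && is_file($file)) {
        require $file;
    }
});
```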

Lazy services

Some of the biggest inefficiencies we found came from unnecessary service initialization. These services were dependencies of core services, so they were instantiated, recursively along with all of their own dependencies, even though only a small and rarely used part of the system needed them.
The Symfony container provides a lazy flag to overcome this problem, but it is up to the developer to use it where needed. However, don’t just flag every service as lazy. Flagging services that would not be instantiated anyway brings no benefit and adds extra classes to the preload file for nothing, because Symfony has to generate a proxy for each lazy service.
Be especially wary of services that open network connections. We found that a library we use to upload customer files to S3 was sending requests during the Symfony bootstrap, even for the login page, so we had to explicitly mark that service as lazy in a compiler pass.
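To see where that proxy overhead comes from, here is a hand-written virtual proxy in plain PHP. It illustrates the idea behind lazy services, not the code Symfony actually generates, and all class names are hypothetical: the proxy implements the same interface and only builds the real object on first use.

```php
<?php
declare(strict_types=1);

interface ReportExporter
{
    public function export(array $rows): string;
}

/** Stand-in for a service with an expensive constructor. */
final class PdfReportExporter implements ReportExporter
{
    public static int $instances = 0;

    public function __construct()
    {
        self::$instances++; // imagine heavy setup here
    }

    public function export(array $rows): string
    {
        return 'pdf:' . count($rows);
    }
}

/** Virtual proxy: same interface, real object created on the first call. */
final class LazyReportExporter implements ReportExporter
{
    private ?ReportExporter $real = null;
    private Closure $factory;

    public function __construct(Closure $factory)
    {
        $this->factory = $factory;
    }

    public function export(array $rows): string
    {
        $this->real ??= ($this->factory)();
        return $this->real->export($rows);
    }
}

$exporter = new LazyReportExporter(static fn (): ReportExporter => new PdfReportExporter());
// Nothing expensive has happened yet; it happens on the first export() call.
```

In a Symfony application you would not write this by hand: setting lazy: true on the service definition lets the container generate a comparable class, which is the extra class that then ends up in the preload file.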
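Such a compiler pass can be very short. In the sketch below the service id is a placeholder for whatever id the S3 library's bundle registers; findDefinition() is used so that the pass also works when that id is an alias.

```php
<?php
declare(strict_types=1);

namespace App\DependencyInjection\Compiler;

use Symfony\Component\DependencyInjection\Compiler\CompilerPassInterface;
use Symfony\Component\DependencyInjection\ContainerBuilder;

/**
 * Marks third-party services as lazy, so they are only built (and only
 * open network connections) when something actually calls them.
 */
final class LazyThirdPartyServicesPass implements CompilerPassInterface
{
    /** Hypothetical ids; use the ids registered by your bundles. */
    private const LAZY_IDS = ['acme_s3_uploader.client'];

    public function process(ContainerBuilder $container): void
    {
        foreach (self::LAZY_IDS as $id) {
            if ($container->has($id)) {
                $container->findDefinition($id)->setLazy(true);
            }
        }
    }
}
```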

Voters

Symfony iterates over the voters every time a permission is checked, which can consume a lot of CPU in a large application. We found that IS_AUTHENTICATED_FULLY was being checked in extremely frequently executed methods, so we decided to call its underlying service, AuthenticationTrustResolverInterface, directly in those hot spots.
Additionally, if you are not doing so already, make sure your voters take advantage of the caching that Symfony offers for them through CacheableVoterInterface.
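A sketch of such a hot-path check, assuming Symfony 5.4 or later with the standard TokenStorageInterface and AuthenticationTrustResolverInterface services injected; the class name is our own. It mirrors the check AuthenticatedVoter performs for IS_AUTHENTICATED_FULLY.

```php
<?php
declare(strict_types=1);

namespace App\Security;

use Symfony\Component\Security\Core\Authentication\AuthenticationTrustResolverInterface;
use Symfony\Component\Security\Core\Authentication\Token\Storage\TokenStorageInterface;

/**
 * Hot-path replacement for isGranted('IS_AUTHENTICATED_FULLY'): asks the
 * trust resolver directly instead of going through every registered voter.
 */
final class FullyAuthenticatedChecker
{
    private TokenStorageInterface $tokenStorage;
    private AuthenticationTrustResolverInterface $trustResolver;

    public function __construct(
        TokenStorageInterface $tokenStorage,
        AuthenticationTrustResolverInterface $trustResolver
    ) {
        $this->tokenStorage = $tokenStorage;
        $this->trustResolver = $trustResolver;
    }

    public function isFullyAuthenticated(): bool
    {
        $token = $this->tokenStorage->getToken();
        // No token means nobody is authenticated; remember-me tokens are not "full fledged".
        return $token !== null && $this->trustResolver->isFullFledged($token);
    }
}
```

The trade-off is that this bypasses the access decision manager: custom voters that also vote on IS_AUTHENTICATED_FULLY, and the configured decision strategy, are ignored. That is why it is best kept to the few places where profiling shows it matters.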
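In recent Symfony versions (5.4 and later), voters extending the Voter base class can override supportsAttribute() and supportsType() from CacheableVoterInterface. The access decision manager caches their answers and skips the voter entirely for attributes or subject types it does not support. A sketch with a hypothetical Invoice subject, both classes in one file only for brevity:

```php
<?php
declare(strict_types=1);

namespace App\Security\Voter;

use Symfony\Component\Security\Core\Authentication\Token\TokenInterface;
use Symfony\Component\Security\Core\Authorization\Voter\Voter;

/** Hypothetical domain object used only for this example. */
final class Invoice
{
    public string $ownerIdentifier;

    public function __construct(string $ownerIdentifier)
    {
        $this->ownerIdentifier = $ownerIdentifier;
    }
}

final class InvoiceVoter extends Voter
{
    public const VIEW = 'INVOICE_VIEW';

    // Cached per attribute: never consulted for any other attribute.
    public function supportsAttribute(string $attribute): bool
    {
        return $attribute === self::VIEW;
    }

    // Cached per subject type (the class name for objects).
    public function supportsType(string $subjectType): bool
    {
        return is_a($subjectType, Invoice::class, true);
    }

    protected function supports(string $attribute, mixed $subject): bool
    {
        return $attribute === self::VIEW && $subject instanceof Invoice;
    }

    protected function voteOnAttribute(string $attribute, mixed $subject, TokenInterface $token): bool
    {
        return $subject->ownerIdentifier === $token->getUserIdentifier();
    }
}
```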

Results

After implementing these changes, and before deploying them to production, we benchmarked the login page locally with ApacheBench (ab), sending 1000 requests at a concurrency level of 3:

Before

Concurrency Level:      3
Time taken for tests:   21.809 seconds
Complete requests:      1000
Failed requests:        970
   (Connect: 0, Receive: 0, Length: 970, Exceptions: 0)
Total transferred:      5303561 bytes
HTML transferred:       4959561 bytes
Requests per second:    45.85 [#/sec] (mean)
Time per request:       65.426 [ms] (mean)
Time per request:       21.809 [ms] (mean, across all concurrent requests)
Transfer rate:          237.49 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    50   65   9.7     62      97
Waiting:       49   65   9.7     62      97
Total:         50   65   9.7     62      97

Percentage of the requests served within a certain time (ms)
  50%     62
  66%     67
  75%     71
  80%     73
  90%     81
  95%     85
  98%     88
  99%     91
 100%     97 (longest request)

After

Concurrency Level:      3
Time taken for tests:   10.566 seconds
Complete requests:      1000
Failed requests:        962
   (Connect: 0, Receive: 0, Length: 962, Exceptions: 0)
Total transferred:      5303389 bytes
HTML transferred:       4959389 bytes
Requests per second:    94.64 [#/sec] (mean)
Time per request:       31.698 [ms] (mean)
Time per request:       10.566 [ms] (mean, across all concurrent requests)
Transfer rate:          490.17 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    24   32   5.0     30      52
Waiting:       24   31   5.0     30      52
Total:         24   32   5.0     30      52

Percentage of the requests served within a certain time (ms)
  50%     30
  66%     33
  75%     35
  80%     37
  90%     39
  95%     41
  98%     42
  99%     44
 100%     52 (longest request)

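The “failed” requests are not errors: ab reports a Length failure for every response whose size differs from that of the first response, and no connection, receive or exception failures occurred. Locally, throughput roughly doubled (from 45.85 to 94.64 requests per second) and the mean time per request dropped from about 65 ms to about 32 ms. After deploying the changes to production, we observed a reduction in CPU consumption of at least 30%.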

We hope some of these suggestions will be useful to you. Happy coding!