Details
Assignee
Unassigned
Reporter
Brad Wood
Labels
Priority
Created 1 December 2022 at 18:56
Updated 30 October 2024 at 12:21
While working on , I have noticed a performance bottleneck when adding rows and setting data into a query object concurrently.
Doing some final performance tests for my blog with large QoQ, I noticed that the worst part of the design is that query columns use an internal native array that is grown as needed when you add rows. This means:

- All set() operations must be synchronized (to prevent setting data while the internal array is being grown/copied).
- All set() operations therefore cause contention with each other, even if the array is not being resized! Only one thread can set data into a query column at a time.
- The constant re-growing of the array also causes bottlenecks while it copies data over to the new array, blocking all set() operations in the meantime (this gets slower as the query gets bigger).

In a test where I ran a 1 million row QoQ 100 times, 10 seconds out of the total 30 seconds of the test was literally spent adding rows to the array and waiting for locks. If I removed the synchronization and pre-populated the query with all rows, it shaved 30% off the run time! Of course, I can't use that because it's no longer thread safe, but it demonstrates the bottleneck of locking under concurrent access.
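To make the contention concrete, here is a minimal sketch of the kind of grow-as-needed column being described; the class and method names are illustrative, not Lucee's actual code:

// Minimal sketch of a grow-as-needed column (illustrative names, not
// Lucee's actual code). Every set() takes the same monitor as growTo(),
// so setters serialize against each other even when no resize happens.
public class GrowingColumn {
    private Object[] data = new Object[128];

    public synchronized void set(int row, Object value) {
        data[row] = value; // only one thread at a time, always
    }

    public synchronized void growTo(int minSize) {
        if (data.length < minSize) {
            Object[] bigger = new Object[Math.max(minSize, data.length * 2)];
            System.arraycopy(data, 0, bigger, 0, data.length); // O(n) copy blocks every set()
            data = bigger;
        }
    }
}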
I even tried using a ReentrantReadWriteLock so two set() operations wouldn't conflict with each other but would still be blocked by a growTo() call; however, the overhead of acquiring and releasing the lock under load is so high that it's actually just as slow as a synchronized block!
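For reference, that attempt looked roughly like this (again a sketch with illustrative names, not the actual patch): setters share the read lock while growing takes the exclusive write lock.

import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the ReentrantReadWriteLock attempt (illustrative names).
// set() takes the shared read lock so setters can run concurrently;
// growTo() takes the exclusive write lock so no set() can run mid-copy.
public class RWLockColumn {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Object[] data = new Object[128];

    public void set(int row, Object value) {
        lock.readLock().lock();
        try {
            data[row] = value;
        } finally {
            lock.readLock().unlock();
        }
    }

    public void growTo(int minSize) {
        lock.writeLock().lock();
        try {
            if (data.length < minSize) {
                Object[] bigger = new Object[Math.max(minSize, data.length * 2)];
                System.arraycopy(data, 0, bigger, 0, data.length);
                data = bigger;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}

Even though setters no longer logically block each other, the per-call cost of acquiring and releasing the lock under heavy load ate up the gains.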
I'd like to look into changing the native array that gets copied as it grows to a List of arrays, or an array of arrays, where we just add new "chunks" as needed to allocate more slices without touching the existing chunks/arrays. I've worked with a caching mechanism in the JBoss Undertow source code which uses this approach. This would allow set() operations to not require any locking at all and would greatly improve growTo() performance, since no data would need to be copied. So if the chunk size were 1000 rows, the 1001st piece of data would go in the first position of the second chunk, and so on. Note, it would be possible to have the chunks grow larger with each one, but that would greatly complicate the logic of finding the correct chunk for a piece of data.

If we want to use a native array to store the slices, we could initialize it with 1000 or 5000 slots. Each null value in a native array takes around as many bits as your CPU arch, so a native array with 1000 nulls would essentially be 64 bits times 1000, or 8 KB, which is very small. Then if each slice were initialized as a native array of 5000, each empty slice would consume 40 KB. The only real downside of a native array here is that it would place an upper bound on the number of rows that could be stored in a query object: 5000 slices with 5000 items each would allow us to store 25 million rows. If we traded for a List of arrays, we could grow forever, but we'd sacrifice a bit of the raw performance we get from native arrays.
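Here is a minimal sketch of that chunked design, assuming a fixed-size native array of slices and my own illustrative names; a real implementation would still need to decide how newly allocated chunks are published to writer threads:

// Sketch of the chunked column (illustrative names, fixed chunk size).
// Existing chunks are never copied or moved, so set() on an
// already-allocated row needs no locking at all; only growTo() is
// synchronized, and it merely appends new chunks.
public class ChunkedColumn {
    private static final int CHUNK_SIZE = 5000;  // rows per slice
    private static final int MAX_CHUNKS = 5000;  // upper bound: 25 million rows

    private final Object[][] chunks = new Object[MAX_CHUNKS][];
    private int allocatedChunks;

    public ChunkedColumn() {
        chunks[0] = new Object[CHUNK_SIZE];
        allocatedChunks = 1;
    }

    // Lock-free: assumes growTo() for this row happened-before this call
    // (e.g., the thread adding rows grows the column first).
    public void set(int row, Object value) {
        chunks[row / CHUNK_SIZE][row % CHUNK_SIZE] = value;
    }

    public Object get(int row) {
        return chunks[row / CHUNK_SIZE][row % CHUNK_SIZE];
    }

    // No copying: just allocate any missing chunks.
    public synchronized void growTo(int rows) {
        int needed = (rows + CHUNK_SIZE - 1) / CHUNK_SIZE;
        while (allocatedChunks < needed) {
            chunks[allocatedChunks++] = new Object[CHUNK_SIZE];
        }
    }
}

With the 1000-row chunk size from the example above, zero-based row index 1000 (the 1001st piece of data) would map to chunks[1000 / 1000][1000 % 1000], i.e. the first slot of the second chunk.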
Either way, being able to set into the column without any locking would be totally worth it for high-concurrency usage of query objects. I don't think this is a tremendous amount of effort to implement; I just wanted to talk through it and get feedback if possible.