arrayEach has too much memory overhead from duplicating pageContext

Description

The following code uses no more than 20 threads at a time to process 50k items in an array, but the used heap on a 64-bit Windows machine jumps from 54 MB to 2.9 GB! A GC will clear out the heap, but that is just way too much memory.

myArr = [];
for ( i = 1; i <= 50000; i++ ) {
    myArr.append( i );
}
// third argument turns on parallel execution, fourth caps the thread count at 20
myArr.each( (i) => i++, true, 20 );

Without parallel, the loop runs in 312 ms, but enabling threading slows the request down to 15 seconds. The FusionReactor profiler shows that nearly 80% of the request was spent cloning the page context.

lucee.runtime.thread.ThreadUtil.clonePageContext(ThreadUtil.java)

A heap dump of the server's memory also shows that there are a huge number of page context objects in the heap which were expensive to create.

  • Lucee appears to create 50,000 new page contexts which is very wasteful. Since there are no more than 20 threads running at a time, no more than 20 page contexts should have been created and pooled.

  • I actually don't think Lucee should be cloning the pc at all! Unlike a cfthread which can be a daemon, an arrayEach() or structEach() thread by design will never outlive the parent request and runs "inside" of the parent request. Therefore, why are we cloning the pc at all? This is just wasted resources and time. The threads should be able to share the same pc in this instance.
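The first suggestion above (one context per worker rather than one per item) can be sketched in plain Java. This is only an illustration of the idea, not Lucee's code: `ContextPerWorkerSketch`, `contextsCreated`, and the `Object` standing in for a page context are all hypothetical names, and a real page context has lifecycle and scope concerns that this ignores.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: none of these names exist in Lucee.
final class ContextPerWorkerSketch {

    // Runs itemCount no-op tasks on at most maxThreads workers and returns
    // how many stand-in "contexts" were created along the way.
    static int contextsCreated(int itemCount, int maxThreads) {
        AtomicInteger created = new AtomicInteger();
        // Each worker thread lazily builds one context on first use, then reuses it.
        ThreadLocal<Object> context = ThreadLocal.withInitial(() -> {
            created.incrementAndGet();
            return new Object(); // stand-in for an expensive cloned context
        });

        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        for (int i = 0; i < itemCount; i++) {
            // Every item touches the worker's context; only the first touch creates one.
            pool.execute(() -> context.get());
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return created.get();
    }

    public static void main(String[] args) {
        // 50,000 items on 20 workers: the count is bounded by the worker count.
        System.out.println(contextsCreated(50_000, 20));
    }
}
```

With this shape the number of contexts is bounded by the thread limit and does not grow with the size of the array, which is the behaviour the ticket asks for.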

I've run into this myself and had it cause out-of-memory errors, and I also have clients whose servers consume large amounts of memory while processing large batches of items with arrayEach() or queryEach().

Environment

None

Activity

Zac Spitzer 
14 September 2020 at 17:28
(edited)

The following change has been made to 5.3.7.42 and 5.3.8.71, which drastically reduces memory usage. Previously, it was creating a pageContext for each array element, which exploded memory usage for large arrays.

https://github.com/lucee/Lucee/commit/f0a45566b78362ec493ccb2d286993498e4c66c4

https://github.com/lucee/Lucee/commit/ef8ce8e4bc58b324d8675160a0a57b85ee24145f

Samuel W. Knowlton 
27 July 2020 at 17:31

Memory usage on arrayEach is still quite high compared to a for loop in the 5.3.7 RC. We have a couple of CPU- and memory-intensive tasks that loop over 2,000+ structs. If we use .each(), we’ve seen memory usage go from 4 GB to 24 GB. With for loops doing exactly the same thing, we haven’t seen anything like that.

Michael Offner 
3 April 2020 at 14:59
(edited)

We got the memory usage down and also the time spent; I now see the same numbers for both cases. BUT in contrast to, for example, a simple for loop, it is still much slower! So I created another ticket addressing the overall performance of this:

LDEV-2824

Samuel W. Knowlton 
20 March 2020 at 11:10

Hi - Brad enlightened me that it’s not just ++. It’s not just a question of writing that longhand. Many operations are not thread safe and the example of locking that he gave is probably what should go in the docs at least until Lucee can do that behind the scenes.
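Brad's example is not reproduced in this ticket, so the following is only a generic Java sketch of the same concern: an increment on shared state is a read-modify-write, and it needs a lock when many threads run it at once. `LockedCounterSketch` and `countInParallel` are hypothetical names. In CFML, a named exclusive `lock` block would play the role that `synchronized` plays here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustration only: a shared counter guarded by a lock.
final class LockedCounterSketch {
    private final Object lock = new Object();
    private int total = 0;

    // total++ is read, add, write; the lock makes those three steps indivisible.
    void increment() {
        synchronized (lock) {
            total++;
        }
    }

    int total() {
        synchronized (lock) {
            return total;
        }
    }

    // Increments the shared counter once per item from up to maxThreads workers.
    static int countInParallel(int itemCount, int maxThreads) {
        LockedCounterSketch counter = new LockedCounterSketch();
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        for (int i = 0; i < itemCount; i++) {
            pool.execute(counter::increment);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.total();
    }

    public static void main(String[] args) {
        System.out.println(countInParallel(50_000, 20));
    }
}
```

Without the `synchronized` blocks, concurrent increments can overwrite each other and the final total may come out lower than the number of items.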

Fixed

Created 6 November 2019 at 16:47
Updated 20 March 2024 at 09:56
Resolved 3 April 2020 at 15:01