Lucee server crashes when 1 db connection hangs/fails for a clustered session/client scope

Description

I had an unreachable Lucee server today, on a machine which is barely used. About 2 requests per minute come in, from pingdom.
After investigation, I found the following errors in requesttimeout.log (see attached)

The log says a request timeout occurred at
{{lucee.runtime.db.DatasourceConnectionPool.getDatasourceConnection(DatasourceConnectionPool.java:77)
at lucee.runtime.type.scope.storage.IKHandlerDatasource.loadData(IKHandlerDatasource.java:36)
at lucee.runtime.type.scope.storage.IKStorageScopeSupport.hasInstance(IKStorageScopeSupport.java:213)
at lucee.runtime.type.scope.ScopeContext.hasExistingCFSessionScope(ScopeContext.java:549)}}
After this timeout, any next request also times out at
at lucee.runtime.type.scope.storage.IKStorageScopeSupport.hasInstance(IKStorageScopeSupport.java:213)
Which makes sense, because the function at IKStorageScopeSupport.java:213 is a synchronized function, so the follow-up requests are waiting forever.

After a restart of Tomcat, the same thing happened again 70 minutes later! Again, see the (same) attached log.

The web application in question has this.sessionstorage = "datasource Name"; and this.sessionCluster=true; set in Application.cfc.

Environment

None

Activity

Show:
Paul Klinkenberg
January 24, 2018, 8:06 AM

The stack traces I added to this ticket before, are good enough, right? Or do you really want to get the same results again?

Michael Offner
January 24, 2018, 9:16 AM

so far yes

analysing stacktrace

in version 5.2.4.37 it seems that this code (https://github.com/lucee/Lucee/blob/5.2.4.37/core/src/main/java/lucee/runtime/type/scope/storage/IKStorageScopeSupport.java#L213) should be blocked on line 211 and not 213. are you sure this is the right version this stacktrace was produced in? i just want to be sure that i'm not haunting a ghost.

Michael Offner
January 24, 2018, 9:18 AM

none of the implementations of IKHandler.loadData methods are synchronised so entering this methods should never be blocked in 5.2.4.37

Michael Offner
January 24, 2018, 9:26 AM

candidate causing the issue
https://github.com/lucee/Lucee/commit/9a03d7f9417c8b57c0ff474e4c42ccc24837a7bc#diff-b87da572293ebd1811dbfefc7e7a6747

version 5.2.2.68-SNAPSHOT

sadly this change is not based on a ticket, i will investigate why this change was made and how we can improve it.
we stopped using synchronised with the method declaration anyway, we now only use synchronised code blocks, to limit the code in synchronised state.
we will for sure improve that code in that way.

Michael Offner
January 24, 2018, 8:12 PM
Edited
Fixed

Assignee

Michael Offner

Reporter

Paul Klinkenberg

Priority

Critical

Labels

None

Fix versions

Sprint

None

Affects versions