datasource connections not reliably returned to connection pool

Description

(see attached code)
NOTE: This may be related to https://luceeserver.atlassian.net/browse/LDEV-119#icft=LDEV-119

Connections obtained via Java method calls are not returned to the connection pool when closed. This can exhaust the connection pool and require a restart of the Tomcat instance (requests hang, timeouts are ignored).

The test case in the attached code is simplified to the bare minimum; the 'real world' case would, of course, actually use the connection to do something useful (PreparedStatement >> ResultSet).

The attached code works as expected with ACF 2016
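The failure mode can be modeled with a toy bounded pool. This is a hypothetical sketch in plain Java (the names `tryGetConnection`/`leakyClose` are illustration only, not Lucee's actual pool code): each getConnection() consumes a slot, and a close() that never hands the slot back eventually exhausts the pool, at which point further callers block or time out.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy model of a bounded connection pool (hypothetical, not Lucee code).
// Each getConnection() takes a permit; a close() that fails to hand the
// permit back eventually exhausts the pool.
public class PoolExhaustionDemo {
    static final int CONNECTION_LIMIT = 10;           // matches the repro steps
    static final Semaphore pool = new Semaphore(CONNECTION_LIMIT);

    // Stands in for dsn.getConnection(): waits up to timeoutMs for a free slot.
    static boolean tryGetConnection(long timeoutMs) throws InterruptedException {
        return pool.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Stands in for the buggy conn.close(): the slot is never released.
    static void leakyClose() { /* no pool.release() -- this is the bug */ }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 1; i <= CONNECTION_LIMIT; i++) {
            tryGetConnection(100); // succeeds for the first 10 iterations
            leakyClose();          // bug: connection never returned to the pool
        }
        // With the pool exhausted, the next caller cannot get a connection;
        // with an unbounded wait this is where a request would hang.
        System.out.println("11th acquire succeeded: " + tryGetConnection(100));
    }
}
```

A correct close() would call pool.release(); with that one change the loop runs to completion, which is the behavior the attached test case expects.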

To reproduce...

  1. Edit the attached code to use a valid datasource for 'datasource_name'

  2. Update the 'connection limit' for the selected datasource to 10

  3. Browse to check-connection-release.cfm

After running this code, any use of the affected datasource will hang, including the 'verify' action in the administrator.

If left running long enough (10 minutes is the default request timeout), the following error may be returned (but... not reliably):

Lucee 5.2.1.9 Error (expression)
Message request /issues/check-connection-release.cfm (/media/sf_shared/issues/check-connection-release.cfm) has run into a timeout (600 seconds) and has been stopped.
Stacktrace The Error Occurred in
/media/sf_shared/issues/check-connection-release.cfm: line 10

8: for (i = 1; i LTE limit; i = 1 + i)
9: {
10: conn = dsn.getConnection();
11: conn.close();
12: WriteOutput("#i#:closed<br />");

Java Stacktrace lucee.runtime.exp.RequestTimeoutException: request /issues/check-connection-release.cfm (/media/sf_shared/issues/check-connection-release.cfm) has run into a timeout (600 seconds) and has been stopped.
at java.lang.Object.wait(Native Method)
at lucee.commons.io.SystemUtil.wait(SystemUtil.java:688)
at lucee.runtime.db.DatasourceConnectionPool.getDatasourceConnection(DatasourceConnectionPool.java:73)
at lucee.runtime.db.DatasourceManagerImpl.getConnection(DatasourceManagerImpl.java:73)
at lucee.runtime.db.DatasourceManagerImpl.getConnection(DatasourceManagerImpl.java:67)
at lucee.runtime.services.DataSourceImpl.getConnection(DataSourceImpl.java:51)
at lucee.runtime.services.DataSourceImpl.getConnection(DataSourceImpl.java:43)
at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at lucee.runtime.reflection.pairs.MethodInstance.invoke(MethodInstance.java:55)
at lucee.runtime.reflection.Reflector.callMethod(Reflector.java:846)
at lucee.runtime.util.VariableUtilImpl.callFunctionWithoutNamedValues(VariableUtilImpl.java:799)
at lucee.runtime.PageContextImpl.getFunction(PageContextImpl.java:1701)
at issues.check_connection_release_cfm1035$cf.call(/issues/check-connection-release.cfm:10)
at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:928)
at lucee.runtime.PageContextImpl._doInclude(PageContextImpl.java:842)
at lucee.runtime.listener.ClassicAppListener._onRequest(ClassicAppListener.java:63)
at lucee.runtime.listener.MixedAppListener.onRequest(MixedAppListener.java:44)
at lucee.runtime.PageContextImpl.execute(PageContextImpl.java:2408)
at lucee.runtime.PageContextImpl._execute(PageContextImpl.java:2398)
at lucee.runtime.PageContextImpl.executeCFML(PageContextImpl.java:2366)
at lucee.runtime.engine.Request.run(Request.java:34)

==============
This is a stack dump of another thread which is blocked because it's unable to get a datasource connection - the connection timeout is 10 minutes, but the thread has not been aborted:

Thread-392 (TIMED_WAITING - [Thread[Thread-392,1,main]]):
Thread 'Thread-392' [ID=438, State='TIMED_WAITING']
... request started at 2017-08-04 16:29:17.496[running 13 minutes and 29 seconds]
... Request (GET) from 192.168.1.59
... Server: qcs105luceeauth Port: 443 Script: '/commonspot/build/schema/all-updates.cfm'
[Request made from console]
...request mode: : undefined
minimized stack with 22 entries
0: ........java.lang.Object.wait[Object.java:-2]
1: ........lucee.commons.io.SystemUtil.wait[SystemUtil.java:688]
2: ........lucee.runtime.db.DatasourceConnectionPool.getDatasourceConnection[DatasourceConnectionPool.java:73]
3: ........lucee.runtime.db.DatasourceManagerImpl.getConnection[DatasourceManagerImpl.java:73]
4: ........lucee.runtime.tag.Query.executeDatasoure[Query.java:847]
5: ........lucee.runtime.tag.Query.doEndTag[Query.java:622]
6: CFM....commonspot.build.schema.all_updates_cfm180$cf.udfCall[/commonspot/build/schema/all-updates.cfm:76]
13: CFM....commonspot.build.schema.all_updates_cfm180$cf.call[/commonspot/build/schema/all-updates.cfm:294]

=============
So the bottom line is that there are two separate problems here:

  1. The call to getDatasourceConnection() appears to have no protection against the 'no available connections' case. It probably shouldn't fail instantly, but if a connection can't be obtained within a few seconds, the request is probably dead anyway and should fail with a clear error.

  2. The 'close()' method on the connection object returned via coldfusion.server.ServiceFactory needs to return the connection to the pool (or, at a minimum, remove it from the pool entirely so it no longer counts toward the limit).
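One possible shape for the fix to problem 1, as a sketch under assumed names (this is not Lucee's API): bound the wait for a pooled connection and fail fast with a descriptive error instead of blocking until the request timeout.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of a fail-fast bounded pool (hypothetical, not Lucee's code).
public class BoundedPool<T> {
    private final BlockingQueue<T> idle;

    public BoundedPool(java.util.Collection<T> items) {
        this.idle = new ArrayBlockingQueue<>(items.size(), false, items);
    }

    // Waits at most timeoutMs for a free connection, then gives up with a
    // clear error instead of waiting indefinitely.
    public T borrow(long timeoutMs) throws InterruptedException {
        T conn = idle.poll(timeoutMs, TimeUnit.MILLISECONDS);
        if (conn == null)
            throw new IllegalStateException(
                "no connection available after " + timeoutMs + " ms");
        return conn;
    }

    // close()/release must always hand the connection back to the pool;
    // this is what the fix to problem 2 amounts to.
    public void release(T conn) { idle.offer(conn); }
}
```

The 'within a few seconds' suggestion in problem 1 maps to the timeoutMs argument here, and release() is the behavior problem 2 asks of close().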

Environment

CFML Engine: Lucee
CFML Engine Version: Lucee 5.2.1.9 (Apache Tomcat/8.5.14)
JVM version: 1.8.0_144-b01 - [64 bits, Linux]
JVM memory (MB): max:494.94 total:247.56 free:178.2

(also tested with 5.2.2.71 on Windows - same result)

Attachments

  • 04 Aug 2017, 08:21 pm

Activity


Dominic Watson 14 June 2019 at 14:11

I have just experienced the same symptoms on a live environment, albeit on the latest 4.5, and using datasources in the "regular" way. We had > 100 threads stuck on:


// lucee-java/lucee-core/src/lucee/runtime/db/DatasourceConnectionPool.java
69. synchronized (stack) {
70.     while(max!=-1 && max<=_size(datasource)) {
71.         try {
72.             //stack.inc();
73.             stack.wait(10000L); // << code stuck here

Checking the database showed zero active connections.

In our case, we had disabled request timeouts which meant that the server would stay up and serve other requests not using this datasource quite happily. With the default 50 second timeout, we have been seeing system-wide hangs with no apparent cause and I wonder if these threads had all been killed but locks remained open.
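The quoted loop waits 10 seconds, wakes, re-checks the pool size, and waits again; if closed connections are never counted back out of the pool, it never exits. A deadline-bounded variant (a hypothetical sketch, not a patch against Lucee) would eventually give up instead of looping indefinitely:

```java
// Sketch of a wait loop with an overall deadline (hypothetical names).
// Even if notifications are lost or the pool count never drops, the
// waiter gives up once the deadline passes.
public class DeadlineWaitDemo {
    static final Object lock = new Object();
    static int inUse = 10;        // pool already exhausted
    static final int max = 10;    // connection limit

    static boolean waitForSlot(long deadlineMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + deadlineMs;
        synchronized (lock) {
            while (inUse >= max) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) return false; // give up: pool exhausted
                lock.wait(Math.min(remaining, 1000L)); // bounded, re-checked
            }
            inUse++; // slot acquired
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(waitForSlot(200)); // pool full, so prints "false"
    }
}
```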

Tim Parker 27 September 2017 at 02:50

Michael... please elaborate on the 'ways to get and release connections in a clean way'. We're using the ServiceFactory approach for two reasons:

1) the same code works with ACF
2) we don't want to be storing credentials for the database connections

We can live without #1 (that's a 'nice to have', and it's easy enough to build a wrapper), but #2 is critical for us...

Andrew Myers 27 September 2017 at 01:42

I'm trying to debug some server issues we are having, where things seem to be hanging while obtaining a new connection, and I found this particular issue.

Are you saying that connections created through the Lucee Web or Server Administrator are not susceptible to this? I didn't quite understand if that was the intention of your comment.

Thanks,
Andrew.

Michael Offner 14 August 2017 at 20:15

Lucee provides ways to get and release connections in a clean way, so this issue is limited to "coldfusion.server.ServiceFactory".

Pothys - MitrahSoft 8 August 2017 at 05:50

I've analyzed this ticket & confirmed the issue on the latest version, lucee-express-5.2.2.70-RC.
Steps to reproduce:
Create a datasource with a connection limit of 10.
Run the attached file check-connection-release.cfm.
It throws a timeout exception.
Trying to verify the datasource also throws an error.

Work-around
If we set the connection limit for the datasource to 'inf' (unlimited), the issue goes away and connections can be opened for that datasource.

Details

Created 4 August 2017 at 20:50
Updated 2 April 2025 at 16:06