datasource connections not reliably returned to connection pool
Description
Environment
CFML Engine: Lucee
CFML Engine Version: Lucee 5.2.1.9 (Apache Tomcat/8.5.14)
JVM version: 1.8.0_144-b01 - [64 bits, Linux]
JVM memory (MB): max:494.94 total:247.56 free:178.2
(also tested with 5.2.2.71 on Windows - same result)
Attachments
- 04 Aug 2017, 08:21 pm
Activity

Dominic Watson 14 June 2019 at 14:11 (edited)
I have just experienced the same symptoms on a live environment, albeit on the latest 4.5. This was using datasources in the “regular” way. We had more than 100 threads stuck on:
// lucee-java/lucee-core/src/lucee/runtime/db/DatasourceConnectionPool.java, lines 69-73
synchronized (stack) {
    while (max != -1 && max <= _size(datasource)) {
        try {
            //stack.inc();
            stack.wait(10000L); // << code stuck here
Checking the database showed zero active connections.
In our case, we had disabled request timeouts which meant that the server would stay up and serve other requests not using this datasource quite happily. With the default 50 second timeout, we have been seeing system-wide hangs with no apparent cause and I wonder if these threads had all been killed but locks remained open.
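The loop quoted above wakes every 10 seconds and re-checks the pool size, but has no overall deadline, so a slot that is never released keeps the thread waiting indefinitely. As a rough illustration only (this is not Lucee's code; `BoundedPool`, `acquire`, `release` and `inUse` are hypothetical names), a deadline-bounded version of the same wait pattern could look like this:

```java
/** Hypothetical bounded pool; illustrates a deadline on acquisition only. */
final class BoundedPool {
    private final Object lock = new Object();
    private final int max;   // -1 means unlimited, mirroring the quoted check
    private int inUse = 0;

    BoundedPool(int max) {
        this.max = max;
    }

    /** Takes a slot, or throws if none frees up within timeoutMillis. */
    void acquire(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (lock) {
            while (max != -1 && inUse >= max) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    // Give up instead of waiting forever on a slot that may never return.
                    throw new IllegalStateException(
                        "no connection available within " + timeoutMillis + " ms");
                }
                try {
                    lock.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            inUse++;
        }
    }

    /** Frees a slot and wakes any waiting threads. */
    void release() {
        synchronized (lock) {
            if (inUse > 0) {
                inUse--;
            }
            lock.notifyAll();
        }
    }

    int inUse() {
        synchronized (lock) {
            return inUse;
        }
    }

    public static void main(String[] args) {
        BoundedPool pool = new BoundedPool(1);
        pool.acquire(100);
        try {
            pool.acquire(100); // pool is full, so this gives up after about 100 ms
        } catch (IllegalStateException e) {
            System.out.println("second acquire failed: " + e.getMessage());
        }
        pool.release();
        pool.acquire(100);     // succeeds once the slot has been released
        System.out.println("in use: " + pool.inUse());
    }
}
```

With a deadline like this, a leaked slot turns into a visible error on the calling request rather than a silently parked thread.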

Tim Parker 27 September 2017 at 02:50
Michael.. please elaborate on the 'ways to get and release clean connections'. We're using the ServiceFactory approach for two reasons:
1) the same code works with ACF
2) we don't want to be storing credentials for the database connections
We can live without #1 - that's a 'nice to have', but it's easy enough to build a wrapper.. but... #2 is critical for us...

Andrew Myers 27 September 2017 at 01:42
I'm trying to debug some server issues we are having, where things seem to be hung up obtaining a new connection, and I found this particular issue.
@Michael Offner Are you saying that connections created through the Lucee Web or Server Administrator are not susceptible to this? I didn't quite understand if that was the intention of your comment.
Thanks,
Andrew.

Michael Offner 14 August 2017 at 20:15
Lucee provides ways to get and release connections in a clean way, so this issue is limited to "coldfusion.server.ServiceFactory".

Pothys - MitrahSoft 8 August 2017 at 05:50
I've analyzed this ticket and confirmed that the issue occurs on the latest Lucee version, lucee-express-5.2.2.70-RC.
Steps to reproduce:
1) Create a datasource with a connection limit of 10.
2) Run the attached file check-connection-release.cfm.
3) It throws a timeout exception.
4) Trying to verify the datasource also throws an error.
Workaround:
If we set the connection limit to "- inf -" for the datasource, the issue goes away and connections can be opened for that datasource again.
Details
Assignee: Michael Offner
Reporter: Tim Parker
Priority: Major
(see attached code)
NOTE: This may be related to https://luceeserver.atlassian.net/browse/LDEV-119#icft=LDEV-119
Connections passed to Java methods are not returned to the connection pool. This can exhaust the connection pool and require a restart of the Tomcat instance (hung requests, timeouts ignored).
The test case in the attached code is simplified to the bare minimum - the 'real world' case, of course, would actually use the connection to do something useful (PreparedStatement >> ResultSet)
The attached code works as expected with ACF 2016
To reproduce:
1) Edit the attached code to use a valid datasource for 'datasource_name'
2) Update the 'connection limit' for the selected datasource to 10
3) Browse to check-connection-release.cfm
After running this code, any use of the affected datasource will hang, including the 'verify' action in the administrator.
If left running long enough (10 minutes is the default request timeout), the following error may be returned (but... not reliably):
==============
This is a stack dump of another thread which is blocked because it's unable to get a datasource connection - the connection timeout is 10 minutes, but the thread has not been aborted:
=============
So the bottom line is that there are two separate problems here:
1) The call to getDatasourceConnection() appears to have no protection against the 'no available connections' case (this probably shouldn't fail instantly, but if you can't get a connection within a few seconds... you're probably dead anyway)
2) The 'close()' method on the object returned by coldfusion.server.ServiceFactory needs to return the object to the connection pool (or, at a minimum, remove it from the pool completely so it no longer counts toward the limit)
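To make the second point concrete, here is a minimal Java sketch of a handle whose close() gives its slot back to the pool. It is an illustration under stated assumptions, not the ServiceFactory or Lucee implementation: `LeasingPool`, `Lease`, `borrow` and `available` are hypothetical names, and a real wrapper would delegate to a java.sql.Connection rather than hold nothing.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical pool that counts slots only; no real database connection is opened. */
final class LeasingPool {
    private final Semaphore slots;

    LeasingPool(int limit) {
        this.slots = new Semaphore(limit, true);
    }

    /** Borrows a slot, failing after the timeout instead of blocking forever. */
    Lease borrow(long timeout, TimeUnit unit) {
        try {
            if (!slots.tryAcquire(timeout, unit)) {
                throw new IllegalStateException("no connection available in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new Lease();
    }

    int available() {
        return slots.availablePermits();
    }

    /** Handle given to callers; closing it returns the slot exactly once. */
    final class Lease implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        @Override
        public void close() {
            // Guard against double close so the slot count cannot exceed the limit.
            if (closed.compareAndSet(false, true)) {
                slots.release();
            }
        }
    }

    public static void main(String[] args) {
        LeasingPool pool = new LeasingPool(10); // same limit as the reproduction steps
        for (int i = 0; i < 25; i++) {
            try (Lease lease = pool.borrow(100, TimeUnit.MILLISECONDS)) {
                // A real caller would run a PreparedStatement here.
            }
        }
        System.out.println("available after 25 borrows: " + pool.available());
    }
}
```

Because every lease is closed, borrowing 25 times against a limit of 10 never exhausts the pool; if close() did not release the slot, the eleventh borrow would fail.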