Fixed
Details
Assignee
Brad WoodBrad WoodReporter
Brad WoodBrad WoodPriority
NewLabels
Fix versions
New Issue warning screen
Before you create a new Issue, please post to the mailing list first https://dev.lucee.org
Once the issue has been verified, one of the Lucee team will ask you to file an issue
Details
Details
Assignee
Brad Wood
Brad WoodReporter
Brad Wood
Brad WoodPriority
Labels
Fix versions
New Issue warning screen
Before you create a new Issue, please post to the mailing list first https://dev.lucee.org
Once the issue has been verified, one of the Lucee team will ask you to file an issue
Created 28 October 2020 at 17:46
Updated 2 days ago
Resolved 30 October 2020 at 20:55
The Controler thread that checks every JDBC connection with isValid() does so inside of a synchronized method that locks the entire DCStack object (which represents the connection pool). When one of the isValid() checks hangs for a long time, it completely kills the server since no other threads are allowed to get or return a connection to the pool.
Find a way to check out each connection and test them OUTSIDE of a synchronized method. It's simply too dangerous to have the entire pool tied up while you wait for a network connection to a remote resource, which could be slow, or never return. I realize this means that the pool may be changing while the timeout checks are running, but honestly I think this can be worked around.
If a new connection has been added, we know it's good because it's brand new.
And if a connection was removed from the pool while we're checking, then we know it's good as well because it's being used!
More information: I've run into this same issue (and brought it up) years ago and it's really biting a current client of mine using an Azure managed database. Azure uses an internal failover mode that can switch the physical machines around on the backend and is documented to have intermittent network drops that your application needs to work around. The isValid() timeout also seems to have documented bugs in MS SQL's JDBC driver where it can hang forever:
https://github.com/microsoft/mssql-jdbc/issues/1128
My client has updated to the very latest 8.4.1 version of MS's JDBC driver and it does not fix their issue. Below is what the stack trace looks like when an isValid() check goes out to lunch and never comes back and it takes down the entire server by keeping the connection pool locked. Ultimately, this is partially a problem with the underlying JDBC driver, but Lucee needs to do more to prevent a reset network connection from taking down the entire server.
My client has over 100 datasources defined, and by default Lucee's controller thread is checking every connection on every datasource once a minute. That's potentially hundreds to thousands of isValid() calls being made to the DB every minute!
"Thread-99468" Id=101598 RUNNABLE (in native) java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.socketRead(Unknown Source) at java.net.SocketInputStream.read(Unknown Source) at java.net.SocketInputStream.read(Unknown Source) at com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.readInternal(IOBuffer.java:1015) at com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.read(IOBuffer.java:1005) at sun.security.ssl.InputRecord.readFully(Unknown Source) at sun.security.ssl.InputRecord.read(Unknown Source) at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source) - locked java.lang.Object@254e1ded at sun.security.ssl.SSLSocketImpl.readDataRecord(Unknown Source) at sun.security.ssl.AppInputStream.read(Unknown Source) - locked sun.security.ssl.AppInputStream@69eb7bec at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:2054) at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6596) - locked com.microsoft.sqlserver.jdbc.TDSReader@605bbee0 at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7760) at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:870) at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:767) at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7375) at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3206) - locked java.lang.Object@74765a62 at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:247) at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:222) at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeQueryInternal(SQLServerStatement.java:699) at com.microsoft.sqlserver.jdbc.SQLServerConnection.isValid(SQLServerConnection.java:5795) at lucee.runtime.db.DCStack.isValidEL(DCStack.java:197) at lucee.runtime.db.DCStack.clear(DCStack.java:154) at lucee.runtime.db.DCStack.clear(DCStack.java:172) at lucee.runtime.db.DCStack.clear(DCStack.java:139) - locked lucee.runtime.db.DCStack@156d0c4e at lucee.runtime.db.DatasourceConnectionPool.clear(DatasourceConnectionPool.java:189) at lucee.runtime.engine.Controler.control(Controler.java:325) at lucee.runtime.engine.Controler.control(Controler.java:236) at lucee.runtime.engine.Controler.access$000(Controler.java:60) at lucee.runtime.engine.Controler$ControlerThread.run(Controler.java:113)