Lucee Queue Timeouts don't respond with any meaningful indication of timeout

Description

If Lucee is configured with the request queue, either in the administrator or via the LUCEE_QUEUE_ENABLE=true environment variable, it correctly times out a request that goes over the queue size and has been sitting in the queue past the timeout value.

This option is awesome and is a far easier way to deal with load shedding than Tomcat's maxThreads and similar settings, which are pretty confusing when a server gets too many incoming requests and starts to be unable to fulfill them.

When a request times out of the queue, it returns with a status code of 200 and a content length of 0. That could technically be a valid response from the server, although it is unlikely a real request would have a content length of 0. This is the only way we can "detect" that we have hit this queue overload so we can retry the request or otherwise alert staff that we are filling the queues.
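For reference, the detection described above boils down to something like the following on the client side. This is a minimal sketch assuming Java 11's built-in java.net.http client; the URL, retry count, and backoff values are just placeholders.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class QueueTimeoutProbe {

    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Heuristic: a 200 with an empty body is probably a queue timeout,
    // but it could also be a legitimate empty response.
    static boolean looksLikeQueueTimeout(HttpResponse<String> resp) {
        return resp.statusCode() == 200 && resp.body().isEmpty();
    }

    // Retry a GET a few times when the response looks like a queue timeout.
    static HttpResponse<String> getWithRetry(String url, int maxRetries) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> resp = CLIENT.send(req, HttpResponse.BodyHandlers.ofString());
        int attempt = 0;
        while (looksLikeQueueTimeout(resp) && attempt++ < maxRetries) {
            Thread.sleep(500L * attempt); // simple linear backoff before retrying
            resp = CLIENT.send(req, HttpResponse.BodyHandlers.ofString());
        }
        return resp;
    }
}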

application.log does get a message to the effect of the one below. With tools like Splunk I can detect these entries and alert on them, but it is harder to drive automated behavior off log lines.

application.log:"ERROR","http-nio-8888-exec-8","02/28/2022","10:24:47","controller","Concurrent request timeout (10) [10 ms] has occurred, server is too busy handling other requests. This timeout setting can be changed in the server administrator.;Concurrent request timeout (10) [10 ms] has occurred, server is too busy handling other requests. This timeout setting can be changed in the server administrator.;

This is driven by the logic in ThreadQueueImpl.java ~68

This then gets caught by CFMLEngineImpl.java ~1212

which logs it to application.log

Then it returns a 200 with a content length of zero, which is what gets written to the Tomcat access logs.

It would be nice if it would instead return a status code of 429, or a configurable status code, so that upstream (or is it downstream?) servers can react by either retrying the request or, even better, by scaling the fleet.

Thanks for your consideration. If the proper way to handle this would be to change the IOException to something like a RequestQueueTimeoutException and catch it the way RequestTimeoutException is caught, I could try to throw together a PR, but dealing with configuration and setting a status code in this part of the code is not clear to me.
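To make the suggestion concrete, here is a rough sketch of the shape I have in mind. It is not based on the actual Lucee internals; the exception class, the config value, and the catch site are all hypothetical, and the status code would default to something like 503.

import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

// Hypothetical exception type replacing the plain IOException that is
// currently thrown when a request times out of the queue.
class RequestQueueTimeoutException extends IOException {
    RequestQueueTimeoutException(String message) {
        super(message);
    }
}

// Functional interface standing in for the actual request servicing call.
interface ServicingCall {
    void service() throws IOException;
}

class QueueTimeoutHandling {

    // Hypothetical: read from the administrator / env config, defaulting to 503.
    private final int queueTimeoutStatusCode;

    QueueTimeoutHandling(int queueTimeoutStatusCode) {
        this.queueTimeoutStatusCode = queueTimeoutStatusCode;
    }

    void serviceRequest(HttpServletResponse rsp, ServicingCall call) throws IOException {
        try {
            call.service();
        } catch (RequestQueueTimeoutException e) {
            // Instead of falling through to a 200 with an empty body, report the
            // overload with a configurable status code and a hint to retry.
            rsp.setHeader("Retry-After", "5");
            rsp.sendError(queueTimeoutStatusCode, e.getMessage());
        }
    }
}

With something like that in place, a load balancer or client could key off the status code (or a header) instead of guessing from an empty 200.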

Activity


eric twilegar 2 March 2022 at 23:25

Yah, 503 is a pretty good match. I do get them from other layers such as load balancers when a server's health check fails.

I could really live with anything as long as it was consistent. Preferably it would be configurable so I could use something like 456 and know 100% that it was the queue timeout.

The more I think about it, my server’s health checks will fail if I don’t get a 200, which is both what I want and not what I want. I’d rather the server just start shedding load than have the load balancer take it out of service, but I could see different places wanting different things depending on what the load balancer is capable of. For me I tend to take care of it at the app level so I get retries etc.

Setting a header and/or status code would make it fit most use cases.

Brad Wood 2 March 2022 at 23:06

I don’t think 429 is the correct status code, as that implies the client itself has sent too many requests. While that may be the case, it’s also possible the client has only sent a single request and the queue was filled by other clients. I do agree it should NOT be a 200. I think a 503 is probably the most appropriate one.

Also, Lucee should allow the configuration of a static HTML page to be returned, with the default being some generic “Server too busy” text, just so it’s not a white page. Adobe has similar settings, which you can see here

The main difference is Adobe returns a 500, which I don’t think is quite as appropriate as a 503.

Details

Created 2 March 2022 at 22:07
Updated 3 March 2022 at 10:48