Corrupt search index file causes whole site to fail

Description

We have experienced an issue where the search index file for a site became corrupt, and therefore invalid XML, which caused the whole site to fail and become unavailable. The file in question is search/search.xml in the web configuration directory. The corrupt XML file caused the following error to be reported in the catalina.out log:

[Fatal Error] :2576:56: XML document structures must start and end within the same entity.
Oct 13, 2017 6:00:17 AM org.apache.catalina.core.StandardContext loadOnStartup
SEVERE: Servlet [CFMLServlet] in web application [] threw load() exception
javax.servlet.ServletException: XML document structures must start and end within the same entity.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at lucee.runtime.search.SearchEngineSupport.init(SearchEngineSupport.java:83)
    at lucee.runtime.config.ConfigWebFactory.loadSearch(ConfigWebFactory.java:3842)
    at lucee.runtime.config.ConfigWebFactory.load(ConfigWebFactory.java:395)
    at lucee.runtime.config.ConfigWebFactory.newInstance(ConfigWebFactory.java:270)
    at lucee.runtime.engine.CFMLEngineImpl.loadJSPFactory(CFMLEngineImpl.java:227)
    at lucee.runtime.engine.CFMLEngineImpl.addServletConfig(CFMLEngineImpl.java:181)
    at lucee.loader.engine.CFMLEngineFactory.getInstance(Unknown Source)
    at lucee.loader.servlet.CFMLServlet.init(Unknown Source)
    at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1269)
    at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1182)
    at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:1072)
    at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:5362)
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5660)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:145)
    at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1694)
    at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1684)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

A similar error was shown on the site when visiting it in a web browser. While this is a Tomcat error and outside the scope of Lucee, it feels like Lucee should be built in such a way that a corrupt search index file doesn't stop the whole site from functioning. Why does Tomcat even care about this file? Can the Lucee process be changed so that a corrupt search XML file doesn't bring down the whole site?
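
For anyone trying to reproduce this class of failure outside Lucee, here is a minimal sketch using the JDK's built-in DOM parser. It only shows that an XML document cut off before its closing tags is rejected with a SAXParseException carrying a line and column, like the ":2576:56" position in the log above. The class name, method name, and element names are made up for illustration and are not the real search.xml schema.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; not part of Lucee.
final class TruncatedXmlDemo {

    /** Returns a description of the parse error, or null if the XML is well-formed. */
    static String parseError(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException e) {
            // Malformed input: report where the parser gave up.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // Anything else is unexpected for an in-memory string.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up element names, for illustration only.
        System.out.println(parseError("<index><entry name=\"site\"/></index>"));
        // Same document, truncated as if the write had been interrupted.
        System.out.println(parseError("<index><entry name=\"site\">"));
    }
}
```

The second call is the interesting one: a file cut off part-way through a write would look like this to the parser.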

Environment

None

Activity


Bruce Kirkpatrick 1 December 2018 at 17:49
Edited

I looked at the Lucee 4.5 source code and found that it is not designed to ignore the error. The master branch of Lucee 5 doesn't have this fix either. The master branch is currently at Lucee 5.2.9, which is the highest release-quality version of Lucee.

What you are waiting for seems to be a quick fix to the master branch, so that you don't have to incur the risk of the 5.3 branch, in which this problem appears to have been fixed already, in October. A lot of bugs in 5.3 are actively being looked at during this time, which makes it hard to use any version but the newest snapshots. The snapshots are available in the Lucee admin, and they usually work OK, but some of the admin changes have been annoying; that is less of a problem if you are not using Lucee-exclusive features.

5.2:
https://github.com/lucee/Lucee/blob/master/core/src/main/java/lucee/runtime/config/XMLConfigWebFactory.java

5.3:
https://github.com/lucee/Lucee/blob/5.3/core/src/main/java/lucee/runtime/config/XMLConfigWebFactory.java

The log object is not being passed to the function in Lucee 5.2, so it can't run the same code, and you'd have to implement a lot more changes:
try {
    // loadxml
}
catch (Exception e) {
    log(config, log, e);
}
I looked at the commit history, and he has this massive commit of over 400 changes required to get logs centralized.
https://github.com/lucee/Lucee/commit/9eda810c5a87a10707ff2e5f5895ead5cdae7256#diff-7d71424455db44db33dd5c5ce06d5315

Then, a few months later, he came up with this commit, which fixes your problem by making a lot of those things optional at startup when they fail:
https://github.com/lucee/Lucee/commit/0ed3937f7b22fe9112b56e1430fd1b07656a58fa#diff-7d71424455db44db33dd5c5ce06d5315

It looks like a lot to ask to pull all those changes into the stable 5.2 branch.

If you really want a quick fix for your problem without using 5.3 (corrected), you'd have to modify loadSearch in that file to just silently fail, or do some custom logging to keep it simple. It could be weeks or months until 5.3 is officially release quality and stable. I run 5.3.2.16-snapshot in production, and other than the bugs I reported, it is better than any other Lucee release.
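
To make the "silently fail or do some custom logging" idea concrete, here is a rough standalone sketch. It is not Lucee's actual loadSearch code: the class, the method loadSearchOrNull, the writeTemp helper, and the use of java.util.logging are all assumptions for illustration. The point is only the shape of the change: catch the parse failure, log it, and return a fallback so the caller can carry on with search disabled.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical stand-in for a loadSearch-style startup step; not Lucee code.
final class TolerantSearchLoader {
    private static final Logger LOG = Logger.getLogger("search-loader");

    /** Returns the parsed document, or null when the file cannot be parsed. */
    static Document loadSearchOrNull(File searchXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(searchXml);
        } catch (Exception e) {
            // Log and continue instead of letting the exception abort startup.
            LOG.log(Level.SEVERE, "Ignoring unreadable search index " + searchXml, e);
            return null;
        }
    }

    /** Test helper: writes the given content to a temporary file. */
    static File writeTemp(String content) {
        try {
            File f = File.createTempFile("search", ".xml");
            f.deleteOnExit();
            Files.write(f.toPath(), content.getBytes(StandardCharsets.UTF_8));
            return f;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File corrupt = writeTemp("<root><open>"); // truncated on purpose
        Document doc = loadSearchOrNull(corrupt);
        System.out.println(doc == null ? "search disabled, startup continues" : "search loaded");
    }
}
```

Catching Exception broadly mirrors the try/catch quoted above. A real patch would still have to decide what the rest of the engine does when the search configuration is missing.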

Of course, any number of other issues could affect you if you choose to run the latest snapshot, so do so at your own risk. I would definitely use a test environment first.

Andrew Dixon 16 November 2018 at 06:53
Edited

This is still happening, and it has just happened two days in a row, which isn't good for our client. Any chance someone can take a look? If you think it might be different in 5.x, as the search engine has been moved out into an extension and maybe updated (????), then let me know, as it might be possible to update the server from 4.5.x to 5.x.

Patrick Quinn 21 November 2017 at 16:37

Thanks again for the update, Andrew. Still on my watch list, so I'll be sure to keep an eye on it, priority-wise, as we wrap up the development year, and as we're planning the 2018 development schedule. Stay tuned.

Andrew Dixon 20 November 2017 at 12:26

We have just had this again, and this time it wasn't associated with a Tomcat restart. If the file becomes corrupt at any time, the whole site becomes unavailable. If you then go into the directory and delete search.xml, the site starts functioning again without having to restart Tomcat. This time the corruption coincided with an update of the search index, so I assume it is the search index process that corrupts the file. On the previous occurrences, it was probably already an issue before the Tomcat restart; we just noticed it after the restart because we checked some sites. We now have monitoring on this specific site, which is how we noticed it when Tomcat had not been restarted. I've updated the description to reflect this.
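
As a stopgap alongside monitoring, the manual "delete search.xml" step described above could be automated with something like the sketch below. It checks whether a given XML file is well-formed and, if not, moves it aside instead of deleting it, so the corrupt copy is kept for diagnosis. Everything here is an assumption for illustration: the class and method names, the ".corrupt-" suffix, and the idea of running it from a scheduled job against the search.xml path. It relies only on the behaviour reported above, that the site recovers once the corrupt file is gone.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper; not part of Lucee or Tomcat.
final class SearchIndexGuard {

    /** True if the file parses as XML; I/O problems are reported, not treated as corruption. */
    static boolean isWellFormed(Path xml) {
        try (InputStream in = Files.newInputStream(xml)) {
            DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not check " + xml, e);
        }
    }

    /** Moves a malformed file aside; returns the new path, or null if nothing was moved. */
    static Path quarantineIfCorrupt(Path xml) {
        if (!Files.exists(xml) || isWellFormed(xml)) {
            return null;
        }
        Path target = xml.resolveSibling(
                xml.getFileName() + ".corrupt-" + System.currentTimeMillis());
        try {
            return Files.move(xml, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Test helper: creates a temp directory containing search.xml with the given content. */
    static Path writeTemp(String content) {
        try {
            Path dir = Files.createTempDirectory("search-guard");
            return Files.write(dir.resolve("search.xml"), content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTemp("<root><open>"); // truncated on purpose
        System.out.println("moved to: " + quarantineIfCorrupt(file));
    }
}
```

Note that this only treats the symptom. If the index update is what corrupts the file, as suspected above, the file may simply be corrupted again on the next update.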

Patrick Quinn 2 November 2017 at 22:09

Thanks for these updates. That should help us as we work on this one.

Won't Fix

Details

Affects versions

Created 14 October 2017 at 09:29
Updated 26 July 2024 at 09:08
Resolved 26 July 2024 at 09:08
