Lucee Development / LDEV-1640

Java heap memory exhaustion, pegged CPU, and unresponsive server

    Details

    • Type: Bug
    • Status: Deployed
    • Priority: New
    • Resolution: Fixed
    • Affects Version/s: 5.2.2.71, 5.2.4.37, 5.2.5.20
    • Fix Version/s: 5.2.7.21
    • Environment:

      Ubuntu 16.04
      Java 1.8.0_151
      1.5GB RAM
      MSSQL Datasource
      JAVA_OPTS="-Xms256m -Xmx512m -XX:MaxPermSize=128m"

    • Sprint:
      March 2018

      Description

      After upgrading to 5.2.2.71, my production Lucee server would become unresponsive within about 24 hours. Lucee was still running, but it was using 95%+ of the CPU, and all HTTP requests to the server hung. This is a serious issue that prevents some users from upgrading beyond version 5.2.1.9.

      With some difficulty, I have been able to reproduce this locally in 5.2.2.71 and later releases. Version 5.2.1.9 does not have this problem and is stable in production.

      Repeatedly capturing the output of getMemoryUsage() shows the following pattern of memory usage in the problematic Lucee versions:

      1. Tenured generation space increases fairly rapidly, with only small portions being reclaimed by garbage collection runs.
      2. Tenured generation space becomes completely full.
      3. Used Eden space starts increasing.
      4. Eden space becomes completely full.
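
For anyone who wants to watch the same pools from Java rather than CFML, here is a minimal sketch using the standard `java.lang.management` API. It is not how getMemoryUsage() is implemented and not the reporter's logging code. The class name `HeapPoolLogger`, the tab-separated line format, and the one-second interval are assumptions made for illustration. The code reports on whichever JVM runs it, so to observe Lucee it would have to be loaded into the Lucee JVM or adapted to a remote JMX connection.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucee: builds one log line per heap memory pool.
final class HeapPoolLogger {

    // Percentage of the pool maximum currently in use, or -1 if no maximum is defined.
    static double percentUsed(long used, long max) {
        if (max <= 0) {
            return -1.0; // MemoryUsage.getMax() returns -1 when the maximum is undefined
        }
        return 100.0 * used / max;
    }

    // One line per heap pool (pool names such as Eden or Old Gen depend on the collector).
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        long now = System.currentTimeMillis();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as Metaspace and the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(String.format(Locale.ROOT, "%d\t%s\tused=%d\tmax=%d\tpct=%.1f",
                    now, pool.getName(), usage.getUsed(), usage.getMax(),
                    percentUsed(usage.getUsed(), usage.getMax())));
        }
        return lines;
    }

    public static void main(String[] args) throws InterruptedException {
        // Number of samples to print; the default of 3 is arbitrary.
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        for (int i = 0; i < samples; i++) {
            snapshot().forEach(System.out::println);
            if (i < samples - 1) {
                Thread.sleep(1_000L);
            }
        }
    }
}
```

Appending these lines to a file over the course of a crawl would give a time series in which the pattern above (tenured space filling first, then Eden) should be visible as the `pct` column approaching 100 for each pool in turn.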

      A little while after this, Lucee starts using a high percentage of CPU (presumably attempting repeated GC runs) and the server becomes unresponsive. By contrast, in version 5.2.1.9, tenured generation memory usage increases very slowly, and never maxes out even after weeks of uptime in production. See the attached charts for memory and CPU usage in the last good version, the first bad version, and the current version.

      5.2.5.20 (bad)

      5.2.2.71 (bad)

      5.2.1.9 (good)
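
The presumption above, that the high CPU usage comes from repeated GC runs, could be checked by measuring how much wall-clock time the JVM spends in garbage collection. The following is a rough sketch under stated assumptions: the class name `GcOverhead` and the five-second sampling window are hypothetical, the collection times reported by the JVM are approximate, and the code must run inside the JVM being examined.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Lucee: estimates the share of time spent in GC.
final class GcOverhead {

    // Approximate milliseconds spent in GC across all collectors since JVM start.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Fraction of a wall-clock interval that was spent in GC.
    static double gcFraction(long gcMillisBefore, long gcMillisAfter, long intervalMillis) {
        if (intervalMillis <= 0) {
            return 0.0;
        }
        return (double) (gcMillisAfter - gcMillisBefore) / intervalMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMillis = 5_000L; // arbitrary sampling window
        long before = totalGcMillis();
        Thread.sleep(intervalMillis);
        long after = totalGcMillis();
        System.out.printf("GC fraction over %d ms: %.3f%n",
                intervalMillis, gcFraction(before, after, intervalMillis));
    }
}
```

A fraction that stays close to 1 while the heap pools remain full would be consistent with the JVM spending nearly all of its time collecting; a low fraction would point to some other cause of the CPU load.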

      At this time, I unfortunately do not have a reproducible test case that I can share. Reproducing the problem involves crawling my site while capturing memory stats to a log file. Each test takes about an hour, and currently requires my application's custom code, which I am not at liberty to share. However, from the discussion boards, this seems to be a common complaint from other Lucee users, and affects applications other than mine (such as Mura CMS), so I wanted to create an authoritative place to collect information and discuss the issue.

      References to other reports that I suspect describe this issue:

            People

            • Assignee:
              21solutions Igal Sapir
            • Reporter:
              sb_leon Leon Miller-Out
            • Votes:
              10
            • Watchers:
              9
