ColdFusion MX hanging condition investigation
The cases below provide details on several types of reported "hanging" conditions with ColdFusion MX, along with fix status and suggested workarounds. If you believe your situation is not reflected in these cases, please contact Macromedia Support with a reproducible case.
ColdFusion MX hanging conditions investigated so far fall into these categories:
- ColdFusion hangs - all requests for database connections stack up but HTML pages and ColdFusion pages with no database interaction work fine
- ColdFusion MX hangs - long running queries (any database type)
- Attempting to clean up idle database connections causes hanging with any Oracle driver
- Apache connector issues causing instability on Linux and Unix
The following diagnostic steps can be used to determine which condition you might be experiencing:
- Run cfstat to see how many templates are active
- Obtain a stack trace at the moment the system is hung
- Check all logs for errors: web server error logs, cf_root/logs, cf_root/runtime/logs, and cf_root/runtime/lib/wsconfig/[N]/jrun020303.log (IIS)
- Which JVM is being used? Has the jvm.config been modified?
For best results, the following configuration is recommended:
- If using one database, create separate data sources for batch operations and the general application. Do not pool the batch data source.
- Use a separate data source for client variables. When the cfapplication tag is invoked, client variable storage then does not compete for pooled connections with queries running in other templates, which lessens the need for a pooled connection in the application's data source.
Pooled connections are used for data sources that have the Maintain Connections option enabled. The number of connections in the pool is determined by the "Restrict connections to" setting.
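To make the pooling terms concrete, here is a minimal Java sketch of a bounded pool. It assumes a simplified model in which "Restrict connections to" caps how many connections exist at once and Maintain Connections keeps returned connections for reuse. BoundedPool and its methods are hypothetical illustrations, not ColdFusion MX or JRun APIs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical bounded pool; not the ColdFusion MX implementation. */
final class BoundedPool<T> {
    private final Semaphore permits;              // one permit per allowed connection
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;            // opens a new "hard" connection

    BoundedPool(int maxConnections, Supplier<T> factory) {
        this.permits = new Semaphore(maxConnections, true);
        this.factory = factory;
    }

    /** Waits up to timeoutMillis for a free slot; returns null if none frees up. */
    T checkOut(long timeoutMillis) {
        try {
            if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return null;                      // every allowed connection is in use
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        synchronized (idle) {
            T cached = idle.pollFirst();
            if (cached != null) {
                return cached;                    // reuse a maintained connection
            }
        }
        try {
            return factory.get();                 // created outside the idle-list lock
        } catch (RuntimeException e) {
            permits.release();                    // do not leak the slot on failure
            throw e;
        }
    }

    /** Returns a connection to the pool and frees its slot. */
    void checkIn(T conn) {
        synchronized (idle) {
            idle.addFirst(conn);
        }
        permits.release();
    }
}
```

When every permit is taken, checkOut() waits and then gives up; that queueing is what makes requests stack up when something holds connections, or the pool itself, for a long time.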
ColdFusion hangs and all requests for database connections stack up. HTML pages and ColdFusion pages with no database interaction work fine.
When using an IP address for the database server, either in a JDBC URL or in ODBC data source settings, the driver does a reverse lookup to find the server name for new connections. On some systems that contact remote database servers, DNS issues can cause the connection pool to lock for an extended period of time waiting for the reverse lookup to complete, bringing ColdFusion to a standstill.
This is due to the way the ODBC driver (SQLSrv32.DLL) and the DataDirect JDBC driver are coded. Microsoft uses the same scheme for its ODBC driver, as described in Q300420: Connection to SQL Server Database using IP Address is Unusually Slow. This affects a wide range of users with remote database servers (servers not on the same machine as ColdFusion MX).
NOTE: A template with no database interaction will need a connection if client variable storage is using the same data source.
Additional details
In ColdFusion MX, client variables are purged on an hourly basis. If the client variables are stored in a database, this condition may occur on an hourly basis, as ColdFusion connects to the database to perform the purge.
In this case, a stack trace shows requests stuck in the cfapplication tag, in the template referenced as "Application.cfm", at the checkOut() call. If client variables are not stored in the database, the stack trace shows a number of templates, mostly stuck at checkOut().
Example stack trace fragment:
"jrpp-0" prio=5 tid=0x457c39e0 nid=0x6ec runnable [0x4863f000..0x4863fdbc]
    at java.net.InetAddressImpl.getHostByAddr(Native Method)
    at java.net.InetAddress.getHostName(Unknown Source)
    at java.net.InetAddress.getHostName(Unknown Source)
    at macromedia.jdbc.sqlserver.tds.TDSConnection.getServerHostname(Unknown Source)
    at macromedia.jdbc.sqlserver.tds.TDSLoginRequest.submitRequest(Unknown Source)
    at macromedia.jdbc.sqlserver.SQLServerImplConnection.open(Unknown Source)

"jrpp-9" prio=1 tid=0x83e63b8 nid=0x1cfd waiting for monitor entry [0xbabff000..0xbabff890]
    at jrun.sql.pool.JDBCPool.checkOut(JDBCPool.java:437) <-- trying to check a connection out of the pool
    at jrun.sql.pool.JDBCPool.requestConnection(JDBCPool.java:740)
    at jrun.sql.pool.JDBCManager.requestConnection(JDBCManager.java:126)
- One thread stuck at getHostByAddr(), locking the entire data source connection pool against checkins and checkouts.
- Many threads waiting at jrun.sql.pool.JDBCPool.checkOut() trying to get a connection from the pool.
- A few threads waiting at jrun.sql.pool.JDBCPool.checkIn() trying to return a connection to the pool.
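When a thread dump has hundreds of threads, counting these patterns by hand is tedious. The following Java sketch tallies threads by their top stack frame, using the frame names from the example above. The class name PoolHangScanner is hypothetical, and the parsing assumes the usual dump layout (each thread starts on a line beginning with a double quote, frames on lines beginning with "at").

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: counts threads whose top frame matches a pool-hang pattern. */
final class PoolHangScanner {
    private static final String[] FRAMES = {
        "InetAddressImpl.getHostByAddr",  // the thread holding the pool during reverse lookup
        "JDBCPool.checkOut",              // requests waiting for a connection
        "JDBCPool.checkIn"                // requests waiting to return a connection
    };

    /** Returns the first "at ..." frame of one thread section, or "" if there is none. */
    static String topFrame(String thread) {
        for (String line : thread.split("\n")) {
            String t = line.trim();       // also strips a trailing '\r' from Windows dumps
            if (t.startsWith("at ")) {
                return t;
            }
        }
        return "";
    }

    /** Splits a dump into thread sections and counts each pattern at the top of the stack. */
    static Map<String, Integer> scan(String dump) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String f : FRAMES) {
            counts.put(f, 0);
        }
        // Zero-width split before every line that starts with a double quote.
        for (String thread : dump.split("(?m)^(?=\")")) {
            String top = topFrame(thread);
            for (String f : FRAMES) {
                if (top.contains(f)) {
                    counts.put(f, counts.get(f) + 1);
                    break;
                }
            }
        }
        return counts;
    }
}
```

Classifying by the top frame only avoids double-counting the lookup thread, whose deeper frames may also pass through the pool code.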
This bug causes a cascade effect. For example, with 100 requests for connections (checkOut()), 10 requests for checkIn(), and no connections available in the pool, a new hard connection to the database is required. The reverse lookup takes time, during which more checkOut() threads pile up; when it completes, the connection is made and used. The next thread to lock the pool is probably another checkOut(), which requires another new hard connection and starts the cycle over again.
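The serialization at the heart of this cascade can be reproduced in a few lines. This Java sketch assumes a simplified pool whose checkOut() is synchronized and opens a new connection while still holding the monitor; Thread.sleep() stands in for the slow reverse lookup. All names are hypothetical, and the driver's real code is not shown in this document.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CountDownLatch;

/** Hypothetical simulation: a slow connect performed while the pool monitor is held. */
final class LockedPoolSimulation {
    private final Deque<Object> idle = new ArrayDeque<>();
    private final long lookupMillis;

    LockedPoolSimulation(long lookupMillis) {
        this.lookupMillis = lookupMillis;
    }

    synchronized Object checkOut() throws InterruptedException {
        Object c = idle.pollFirst();
        if (c == null) {
            Thread.sleep(lookupMillis);   // stands in for getHostByAddr() inside open()
            c = new Object();             // the new "hard" connection
        }
        return c;
    }

    synchronized void checkIn(Object c) {
        idle.addFirst(c);                 // blocked, too, while a lookup is in progress
    }

    /** Starts n requests that each need a new connection; returns total elapsed ms. */
    static long run(int n, long lookupMillis) {
        LockedPoolSimulation pool = new LockedPoolSimulation(lookupMillis);
        CountDownLatch done = new CountDownLatch(n);
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            new Thread(() -> {
                try {
                    pool.checkOut();
                } catch (InterruptedException ignored) {
                    // exit quietly in this sketch
                }
                done.countDown();
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Ten requests with a 200 ms lookup each: elapsed time is at least about 2000 ms.
        System.out.println("elapsed ms: " + run(10, 200));
    }
}
```

Because every lookup happens inside the same monitor, the total wait grows with the number of new connections needed, not with the work any single request does.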
On Windows, the maximum amount of time allowed for a connection to complete is based on the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpTimeWaitDelay registry setting. The default value for this setting is 4 minutes. If the connection, delayed by the IP reverse lookup, takes longer than this value, the operating system closes all requests that are trying to check a connection out of the pool.
Status
Macromedia bug 52537 and DataDirect bug 15007412 are logged for this issue. Most often seen with SQL Server data sources, although the issue applies to all DataDirect 3.1 drivers shipped with ColdFusion MX.
Workaround
Confirm the IP lookup problem by running the DNSTest program on the ColdFusion system.
- Download DNSTest.
- Unzip DNStest.zip into a directory on the ColdFusion MX system.
- From a command prompt, run DNSTest (the command is case-sensitive):
java DNSTest {IP address of database server system}
Example: java DNSTest 10.2.1.55
- Note the time it takes between "got InetAddress" and "got host name". For example:
java DNSTest 10.2.1.55
got InetAddress for 10.2.1.55
got host name 10.2.1.55 for 10.2.1.55
The amount of time between the two lines is how long the driver locks the pool each time it needs a new connection that isn't already in the pool.
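If the DNSTest download is not available, a similar measurement can be made with standard java.net calls. This is a hypothetical re-creation written for illustration, not the Macromedia program; it assumes the driver's delay comes from the same InetAddress.getHostName() reverse lookup seen in the stack trace above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical DNSTest-style check: times the reverse lookup for an IP address. */
final class ReverseLookupTimer {
    static String reverseLookup(String ip) {
        InetAddress addr;
        try {
            // For an IP literal, only the address format is checked; no DNS query is made.
            addr = InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a usable address: " + ip, e);
        }
        System.out.println("got InetAddress for " + ip);

        long start = System.nanoTime();
        // Reverse lookup through the OS resolver (hosts file and/or DNS).
        // If no name is found, Java returns the IP text itself.
        String host = addr.getHostName();
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("got host name " + host + " for " + ip + " (" + ms + " ms)");
        return host;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java ReverseLookupTimer <database server IP>");
            return;
        }
        reverseLookup(args[0]);
    }
}
```

Run it as java ReverseLookupTimer 10.2.1.55. A result such as "got host name 10.2.1.55 for 10.2.1.55", as in the DNSTest output above, means the reverse lookup found no name, and the elapsed time shows how long it spent trying.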
Add the IP address and server name of the database server to the hosts file:
Windows:
- Open \WINDOWS\system32\drivers\etc\hosts in a text editor.
- Add the IP address and server name of the database server.
- Save the changes to the file.
Linux and Unix:
- Open /etc/hosts in a text editor.
- Add the IP address and server name of the database server.
- Save the changes to the file.
- Open /etc/nsswitch.conf in a text editor.
- Move files to the beginning of the hosts list, if it is not already there. For example:
hosts: files dns nis [NOTFOUND=continue]
- Save the changes to the file.
ColdFusion MX hangs - long running queries (any database type)
Macromedia has found incorrect behavior related to the internal pooling of database connections. Two database pooling maintenance threads, the "skimmer" and the "lifeguard", could cause deadlocks for other request threads waiting for database connections from the pool. Because this issue is specifically related to database connections, HTML pages and ColdFusion pages with no queries still run properly.
Types of maintenance threads
The lifeguard thread tries to rescue dormant connections. The default interval is 20 minutes.
The skimmer thread evaluates all connections in the pool to determine whether to return them to the pool or destroy them if they are timed out. The default interval is 20 minutes.
When a connection has been in use LONGER than the data source timeout (20 minute default), as in a long batch operation, it holds up all other requests for database connections on that data source if the data source is pooled.
Additional details
Example stack trace fragments:
"obj-lifeguard" daemon prio=5 tid=0x00e92308 nid=0x39 waiting for monitor entry [2a781000..2a781998]
    at oracle.jdbc.driver.OracleConnection.setAutoCommit(OracleConnection.java:1207)
    - waiting to lock <0x9278ecb0> (a oracle.jdbc.driver.OracleConnection)
    at coldfusion.server.j2ee.sql.JRunConnection.setAutoCommit(Unknown Source)
    at coldfusion.server.j2ee.sql.JRunConnection.clean(Unknown Source)
    at coldfusion.server.j2ee.pool.ObjectPool.checkTimeout(Unknown Source)
    - locked <0x9146d978> (a coldfusion.server.j2ee.sql.pool.JDBCPool)
    at coldfusion.server.j2ee.pool.LifeGuardThread.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
- One thread holding a lock on an oracle.jdbc.driver.OracleConnection while executing a prepared statement.
- One thread (the lifeguard) holding the coldfusion.server.j2ee.sql.pool.JDBCPool lock while waiting for the oracle.jdbc.driver.OracleConnection lock in order to call setAutoCommit().
"jrpp-23" prio=1 tid=0x83e63b8 nid=0x1cfd waiting for monitor entry [0xbabff000..0xbabff890]
    at jrun.sql.pool.JDBCPool.checkOut(JDBCPool.java:437) <-- trying to check a connection out of the pool
    at jrun.sql.pool.JDBCPool.requestConnection(JDBCPool.java:740)
    at jrun.sql.pool.JDBCManager.requestConnection(JDBCManager.java:126)
- Many threads waiting for the coldfusion.server.j2ee.sql.pool.JDBCPool lock to get a connection out of the pool.
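The lock chain in these traces can be modeled with two plain Java monitors. In this hypothetical sketch, connectionLock stands in for the OracleConnection monitor held by the running query, and poolLock stands in for the JDBCPool monitor that the lifeguard takes before calling setAutoCommit(). The names and timings are illustrative assumptions, not ColdFusion MX code.

```java
/** Hypothetical model of the lifeguard lock chain using two monitors. */
final class LifeguardBlockDemo {
    // Stands in for the monitor on oracle.jdbc.driver.OracleConnection.
    static final Object connectionLock = new Object();
    // Stands in for the monitor on the JDBCPool object.
    static final Object poolLock = new Object();

    /** Returns how many ms a checkOut-style request waited for the pool lock. */
    static long run(long queryMillis) {
        Thread query = new Thread(() -> {
            synchronized (connectionLock) {        // long-running statement on the connection
                sleepQuietly(queryMillis);
            }
        });
        Thread lifeguard = new Thread(() -> {
            synchronized (poolLock) {              // checkTimeout() locks the pool...
                synchronized (connectionLock) {    // ...then clean() -> setAutoCommit() needs the connection
                    // The connection would be reset here.
                }
            }
        });
        query.start();
        sleepQuietly(100);    // let the query take the connection lock first
        lifeguard.start();
        sleepQuietly(100);    // let the lifeguard take the pool lock

        long start = System.nanoTime();
        synchronized (poolLock) {                  // a request thread entering checkOut()
            // A connection would be handed out here.
        }
        long waitedMillis = (System.nanoTime() - start) / 1_000_000;

        join(query);
        join(lifeguard);
        return waitedMillis;
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("request waited " + run(800) + " ms for the pool lock");
    }
}
```

In this simplified model the request is released once the query finishes. The sketch only shows how one busy connection can stall requests for every other connection in the pool.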
Status
It has been determined that the lifeguard thread hangs when it evaluates connections used by long-running queries, which it mistakes for dormant connections. This locks the entire connection pool. Macromedia bug 52485 is logged for this issue and is fixed in ColdFusion MX 6.1. It is strongly recommended that customers apply this free update to ColdFusion MX.
Workaround
Use a separate data source for long-running (batch) operations. DO NOT POOL THIS DATA SOURCE.
Attempting to clean up idle database connections causes hanging with any Oracle driver.
When an unused connection to an Oracle data source is closed by the skimmer (20 minute intervals), the connection pooling code in ColdFusion MX performs an operation that requires a connection to the database. If the connection is no longer active on the Oracle side (for example, it was dropped, or closed because of a "CONNECTION IDLE TIMEOUT" setting), the server hangs, and requests requiring a connection from the pool back up until this operation times out. This occurs with all Oracle drivers, such as the DataDirect driver shipped with ColdFusion MX or the Oracle thin driver.
Additional details
Example stack trace fragments:
"obj-skimmer" daemon prio=1 tid=0x815a028 nid=0x2e1 runnable [0xbbbff000..0xbbbff890]
    at java.net.SocketInputStream.socketRead(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:85)
    at macromedia.util.UtilSocketDataProvider.getArrayOfBytes(Unknown Source)
    at macromedia.util.UtilBufferedDataProvider.cacheNextBlock(Unknown Source)
    at macromedia.util.UtilBufferedDataProvider.getArrayOfBytes(Unknown Source)
    at macromedia.jdbc.oracle.OracleDepacketizingDataProvider.receive(Unknown Source)
    at macromedia.util.UtilByteOrderedDataReader.receive(Unknown Source)
    at macromedia.jdbc.oracle.net8.OracleNet8NSPTDAPacket.sendRequest(Unknown Source)
    at macromedia.jdbc.oracle.OracleImplConnection.setTransactionIsolation(Unknown Source)
    at macromedia.jdbc.base.BaseConnection.setTransactionIsolation(Unknown Source)
    at jrun.sql.JRunConnection.setTransactionIsolation(JRunConnection.java:532) <-- Connection is already closed
    at jrun.sql.JRunConnection.clean(JRunConnection.java:238)
    at jrun.sql.JRunConnection.close(JRunConnection.java:466)
    at jrun.sql.pool.JDBCPool.expire(JDBCPool.java:721)
    at jrunx.pool.ObjectPool.cleanUp(ObjectPool.java:357)
    at jrunx.pool.PoolSkimmerThread.run(PoolSkimmerThread.java:44)
    at java.lang.Thread.run(Thread.java:479)
- One thread stuck at jrun.sql.JRunConnection.setTransactionIsolation()
- Many threads stuck at jrun.sql.pool.JDBCPool.checkOut()
All pooled connections are in use. When the pooling code tries to expire a connection that no longer exists on the database side, it calls setTransactionIsolation(). Because the database connection is already gone, this call hangs while holding the connection pool lock. All other requests for connections back up, blocking both checkIn() calls that would free used connections and checkOut() calls for cached connections. The stack trace may contain dozens of "jrpp-" threads with checkOut() calls at the top (cfquery, etc.) trying to get a connection from the LOCKED data source connection pool.
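A general technique for avoiding this class of problem is to keep slow network work out of the pool's critical section: remove expired connections while holding the lock, then close them after releasing it. The Java sketch below illustrates that design under hypothetical names (SkimmerSketch, PooledConn). It is a swapped-in technique for illustration; how ColdFusion MX 6.1 actually fixed bug 52567 is not described here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical pool whose skimmer closes expired connections outside the lock. */
final class SkimmerSketch {
    /** Hypothetical pooled connection: last-use time plus a close action. */
    static final class PooledConn {
        final long lastUsedMillis;
        final Runnable closer;   // a real pool would close the JDBC connection here

        PooledConn(long lastUsedMillis, Runnable closer) {
            this.lastUsedMillis = lastUsedMillis;
            this.closer = closer;
        }
    }

    private final List<PooledConn> idle = new ArrayList<>();

    synchronized void checkIn(PooledConn c) {
        idle.add(c);
    }

    synchronized PooledConn checkOut() {
        return idle.isEmpty() ? null : idle.remove(idle.size() - 1);
    }

    synchronized int idleCount() {
        return idle.size();
    }

    /** Runs periodically, like the skimmer thread (default interval: 20 minutes). */
    void skim(long nowMillis, long timeoutMillis) {
        List<PooledConn> expired = new ArrayList<>();
        synchronized (this) {
            // Only cheap bookkeeping happens while the pool is locked.
            for (Iterator<PooledConn> it = idle.iterator(); it.hasNext(); ) {
                PooledConn c = it.next();
                if (nowMillis - c.lastUsedMillis >= timeoutMillis) {
                    it.remove();
                    expired.add(c);
                }
            }
        }
        // Cleanup that may block on a dead socket runs with the lock released,
        // so checkOut() and checkIn() keep working in the meantime.
        for (PooledConn c : expired) {
            try {
                c.closer.run();
            } catch (RuntimeException e) {
                // The connection is discarded whether or not the close succeeds.
            }
        }
    }
}
```

The trade-off is that a connection being closed is briefly counted nowhere, so a real pool would also need to account for it when enforcing its connection limit.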
"jrpp-71" prio=1 tid=0x83e63b8 nid=0x1cfd waiting for monitor entry [0xbabff000..0xbabff890]
    at jrun.sql.pool.JDBCPool.checkOut(JDBCPool.java:437) <-- trying to check a connection out of the pool
    at jrun.sql.pool.JDBCPool.requestConnection(JDBCPool.java:740)
    at jrun.sql.pool.JDBCManager.requestConnection(JDBCManager.java:126)
    at jrun.sql.JRunDataSource.getConnection(JRunDataSource.java:235)
    at jrun.sql.JRunDataSource.getConnection(JRunDataSource.java:175)
    at coldfusion.sql.DataSrcImpl.getCachedConnection(Unknown Source)
Status
When a connection in the pool remains unused for the default 20 minute data source timeout, the skimmer code tries to set attributes on it after it has been closed on the database side. This locks the pool trying to run setTransactionIsolation() on a bad connection. No other connections can be checked out from the pool during this time. Macromedia bug 52567 is logged for this issue and is fixed in ColdFusion MX 6.1. It is strongly recommended that customers apply this free update to ColdFusion MX.
Workaround
Increasing the Oracle connection IDLE TIMEOUT setting to a value equal to or higher than the 20 minute default connection timeout in the ColdFusion MX data source settings may resolve the issue in some cases.
Apache connector issues causing instability on Linux and Unix.
See "ColdFusion MX support on Linux and Unix with Apache" and "FAQ for ColdFusion MX connector configuration" for details regarding instability issues caused by the Apache connector.
Workaround
Upgrade to at least ColdFusion MX Updater 3.
Additional Information
- ColdFusion MX 6.1
- ColdFusion MX 6.1: 100% CPU utilization and other issues using DataDirect 3.2 JDBC drivers
- Debugging stack traces in ColdFusion MX
- ColdFusion MX support on Linux and Unix with Apache
- FAQ for ColdFusion MX connector configuration
