While monitoring the MS SQL Server I suggest keeping an eye on Buffer Manager/Memory Manager counter which can help us in identifying the memory contention issues in the MSSQL Database. The memory contention issues are quite easier to find in case we have single database hosted on the box and it becomes really hard job when the box is shared by multiple database and memory consumed by each database varies. Below are some of the steps which Microsoft recommends while working with Memory issues in MS SQL Database.
The following steps should help you troubleshoot memory errors.
-
Verify if the server is operating under external memory pressure. If external pressure is present, try resolving it first, and then see if the problem/errors still exist.
-
Start collecting performance monitor counters for SQL Server: Buffer Manager, SQL Server: Memory Manager.
-
Verify the memory configuration parameters (sp_configure), min memory per query, min/max server memory, awe enabled, and the Lock Pages in Memory privilege. Watch for unusual settings. Correct them as necessary. Account for increased memory requirements for SQL Server 2005.
-
Check for any non-default sp_configure parameters that might indirectly affect the server.
-
Check for internal memory pressures.
-
Observe DBCC MEMORYSTATUS output and the way it changes when you see memory error messages.
-
Check the workload (number of concurrent sessions, currently executing queries).
Now since in most cases performance team never has access to database, so I suggest monitoring the database specific counter to identify memory contention issues. Below is short description of Buffer Manager Performance Object as given in MSDN Site,
The Buffer Manager object provides counters to monitor how SQL Server uses:
-
Memory to store data pages, internal data structures, and the procedure cache.
-
Counters to monitor the physical I/O as SQL Server reads and writes database pages.
Monitoring the memory and the counters used by SQL Server helps you determine:
- If bottlenecks exist from inadequate physical memory. If it cannot store frequently accessed data in cache, SQL Server must retrieve the data from disk.
- If query performance can be improved by adding more memory, or by making more memory available to the data cache or SQL Server internal structures.
- How often SQL Server needs to read data from disk. Compared with other operations, such as memory access, physical I/O consumes a lot of time. Minimizing physical I/O can improve query performance.
We need to keep in mind that Database operations are most often are memory intensive and at OS level there exists lot pull/push efforts done by various processes in case if the box is short in memory or is shared database server.
In addition to Available Bytes, Page/second, Paging file usage patterns, below counters also helps in some analysis,
SQLServer: Buffer Manager — Page Life Expectancy: It is number of seconds a page will stay in the buffer pool without references. The longer the page life expectancy, the healthier the server looks from a memory perspective. A server suffering from memory pressure will typically see page life expectancy values of 200 seconds or below. If we find this, we’ll have reasonable grounds to suspect a low memory condition.
SQLServer: Buffer Manager — Buffer cache hit ratio: Percentage of pages found in the buffer cache (Physical Memory) without having to read from disk. The ratio is the total number of cache hits divided by the total number of cache lookups over the last few thousand page accesses. After a long period of time, the ratio moves very little. Because reading from the cache is much less expensive than reading from disk, you want this ratio to be high. Generally, you can increase the buffer cache hit ratio by increasing the amount of memory available to SQL Server. If data pages are not found in the buffer, SQL Server must read them into the buffer from disk. This is usually a slow process because of disk latency and seek times. Even on a fastest storage drives, the time to read a page from disk compared with time to read a page from memory is many multiples greater. Buffer cache hit ratio less than 90% is often considered as some issues on the database side and needs to investigated.
SQLServer: Buffer Manager: Stolen pages: Stolen pages are those pages in memory which are stolen by another process. Servers which are experiencing memory pressure will typically show high quantities of stolen pages relative to the total target pages.
SQLServer: Memory Manager — Memory Grants Pending: Memory grants pending is effectively a queue of processes awaiting a memory grant. In general, if you have any processes queuing waiting for memory, you should expect degraded performance. The ideal situation for a healthy server is no outstanding memory grants.
SQLServer: Buffer Manager — Checkpoint pages/sec: It is number of pages flushed to disk per second by a checkpoint or other operation that require all dirty pages to be flushed.The SQL Server checkpoint operation requires all dirty pages to be written to disk. The checkpoint process is expensive in terms of disk input/output (I/O). When a server is running low on memory the checkpoint process will occur more frequently than usual as SQL Server attempts to create space in the buffer pool. If you observer sustained high checkpoint pages/second compared with normal rates for your server, it’s a good indication of a low memory condition.
SQLServer: Buffer Manager — Lazy writes/sec: Number of buffers written per second by the buffer manager’s lazy writer. The lazy writer is a system process that flushes out batches of dirty, aged buffers (buffers that contain changes that must be written back to disk before the buffer can be reused for a different page) and makes them available to user processes. The lazy writer eliminates the need to perform frequent checkpoints in order to create available buffers. This counter records the number of times per second that SQL Server relocates dirty pages from the buffer pool (in memory) to disk. Again, disk I/O is expensive and you should attempt to provide SQL Server with enough space for the buffer pool that lazy writes are as close to zero as possible. If you’re seeing lazy writes of 20 per second or more, then you can be sure the buffer pool isn’t big enough.
SQLServer: Buffer Manager — Total Pages: It is number of pages in the buffer pool (includes database, free, and stolen pages).This Buffer Manager/Total Pages counter will expose the total number of pages acquired by SQL Server.
SQLServer: Buffer Manager — Target Pages: Ideal number of pages in the buffer pool. This Buffer Manager/Target Pages counter records the ideal number of pages required for the SQL Server Buffer Pool.
If the target and total pages values are the same, SQL Server has sufficient memory. If the target is greater than the total, it’s likely that some external memory pressure (usually another Windows process) is preventing SQL Server acquiring as much memory as it would like to operate.
All these information often helps in troubleshooting hard to find performance issues specially in environments where performance team do not have access to query DMV’s or system tables to gather direct and precise information from the database.
Sometimes I feel monitoring and making sense of all these data is so boring, but again most of the time as a performance engineer we don’t have a choice or any other option given that getting an direct access to system tables is almost impossible in most projects for various reasons.