Exadata provides a lot of useful metrics to monitor the Cells.
The Metrics can be of various types:
- Cumulative: Cumulative statistics since the metric was created.
- Instantaneous: Value at the time that the metric is collected.
- Rate: Rates computed by averaging statistics over observation periods.
- Transition: Collected at the time when the value of the metric has changed; typically captures important transitions in hardware status.
You can find some information on how to exploit those metrics in these posts:
But I think those types of metrics are not enough to answer all the basic questions.
Let me explain why with 2 examples:
Let’s have a look at the metrics GD_IO_RQ_W_SM and GD_IO_RQ_W_SM_SEC (restricted to one Grid Disk for readability):
dcli -c cell1 cellcli -e "list metriccurrent attributes name,metricType,metricObjectName,metricValue where name like \'.*GD_IO_RQ_W_SM.*\' and metricObjectName ='data_CD_disk01_cell'"
cell1: GD_IO_RQ_W_SM Cumulative data_CD_disk01_cell 2,930 IO requests
cell1: GD_IO_RQ_W_SM_SEC Rate data_CD_disk01_cell 0.3 IO/sec
So we can observe that this “cumulative” metric shows the number of small write I/O requests, while its associated “rate” metric shows the number of small write I/O requests per second.
or
Let’s have a look at the metrics CD_IO_TM_W_SM and CD_IO_TM_W_SM_RQ (restricted to one Cell Disk for readability):
dcli -c cell1 cellcli -e "list metriccurrent attributes name,metricType,metricObjectName,metricValue where name like \'.*CD_IO_TM_W.*SM.*\' and metricobjectname='CD_disk07_cell'"
cell1: CD_IO_TM_W_SM Cumulative CD_disk07_cell 1,512,939 us
cell1: CD_IO_TM_W_SM_RQ Rate CD_disk07_cell 168 us/request
So we can observe that this “cumulative” metric shows the total small write I/O latency in us, while its associated “rate” metric shows the small write I/O latency in us per request.
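To work with these values programmatically, each output line first has to be split into its fields. The following is a minimal sketch, assuming the output keeps the layout shown above (cell, name, metric type, object name, value, unit); the function name `parse_metric_line` is made up for illustration and is not taken from any Oracle tool.

```python
import re

# Layout assumed from the output shown above:
# "<cell>: <name> <metricType> <metricObjectName> <value> <unit...>"
LINE_RE = re.compile(
    r"^(?P<cell>\S+):\s+(?P<name>\S+)\s+(?P<type>\S+)\s+"
    r"(?P<object>\S+)\s+(?P<value>[\d,.]+)\s*(?P<unit>.*)$"
)


def parse_metric_line(line):
    """Split one output line into a dict of fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Values are printed with thousands separators, e.g. "1,512,939".
    record["value"] = float(record["value"].replace(",", ""))
    record["unit"] = record["unit"].strip()
    return record


sample = "cell1: CD_IO_TM_W_SM Cumulative CD_disk07_cell 1,512,939 us"
print(parse_metric_line(sample))
```

Stripping the thousands separator before converting matters here, because the cumulative values quickly grow past 999.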
But how can I answer those questions:
- How many small write I/O requests have been done during the last 80 seconds? (Unfortunately, 0.3 * 80 will not necessarily provide the right answer, as it depends on the “observation period” of the rate metric.)
- What was the small write I/O latency during the last 80 seconds?
You could ask the same kind of questions about all cumulative metrics.
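To illustrate why multiplying a rate by a duration can mislead, here is a small sketch with made-up numbers. The per-second request counts and the 20-second observation period are assumptions chosen for the example, not actual Exadata behaviour; only the starting value 2,930 is borrowed from the output above.

```python
# Hypothetical per-second counts of small write requests over 80 seconds:
# 60 quiet seconds followed by a 20-second burst of 1 request per second.
per_second = [0] * 60 + [1] * 20

# Build the cumulative counter, one sample per second.
cumulative = [2930]
for count in per_second:
    cumulative.append(cumulative[-1] + count)

window = 80  # the period we are actually interested in, in seconds
exact = cumulative[-1] - cumulative[-1 - window]

# A rate metric averaged over an assumed 20-second observation period,
# read at the end of the window.
period = 20
rate = (cumulative[-1] - cumulative[-1 - period]) / period
estimate = rate * window

print(f"exact delta over {window}s: {exact}")
print(f"rate * {window}s estimate : {estimate}")
```

With these numbers the cumulative difference is 20 requests, while the rate-based estimate is 80, because the rate only reflects the last 20 seconds.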
To answer all those questions, I created a Perl script, exadata_metrics.pl (click on the link and then on the view source button to copy/paste the source code), that extracts real-time Exadata metric information based on cumulative metrics.
That is to say, the script works with all the cumulative metrics (the following command lists all of them):
cellcli -e "list metriccurrent attributes name,metricType where metricType='Cumulative'"
To extract real-time information the script takes a snapshot of cumulative metrics each second (default interval) and computes the differences with the previous snapshot.
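The snapshot-and-difference idea can be sketched in a few lines. This is an illustration of the technique only, not the Perl script's actual code; the function names and the sample values are hypothetical, and in real use each snapshot would come from running the cellcli command above at every interval.

```python
def diff_snapshots(previous, current):
    """Return current minus previous for every metric present in both snapshots."""
    return {
        key: current[key] - previous[key]
        for key in current
        if key in previous
    }


def deltas(snapshots):
    """Yield the differences between each snapshot and the one before it."""
    previous = None
    for snapshot in snapshots:
        if previous is not None:
            yield diff_snapshots(previous, snapshot)
        previous = snapshot


# Hypothetical snapshots keyed by (cell, metric name, object name).
key = ("cell1", "GD_IO_RQ_W_SM", "data_CD_disk01_cell")
first = {key: 2930.0}
second = {key: 2950.0}

for delta in deltas([first, second]):
    for (cell, name, obj), value in delta.items():
        print(f"{cell} {name} {obj} {value:.2f}")
```

Metrics that appear in only one of the two snapshots are skipped here, since no difference can be computed for them.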
So, to get the answer to our first question:
./exadata_metrics.pl 80 cell=cell1 name='GD_IO_RQ_W_SM' metricobjectname='data_CD_disk01_cell'
04:30:38 CELL NAME OBJECTNAME VALUE
04:30:38 cell1 GD_IO_RQ_W_SM data_CD_disk01_cell 0.00 IO requests
--------------------------------------> NEW
04:31:58 CELL NAME OBJECTNAME VALUE
04:31:58 cell1 GD_IO_RQ_W_SM data_CD_disk01_cell 20.00 IO requests
As you can see, 20 small write I/O requests have been generated during the last 80 seconds (which is different from 0.3*80).
To get the answer to our second question:
./exadata_metrics.pl 80 cell=cell1 name_like='.*CD_IO_TM_W.*SM.*' metricobjectname='CD_disk07_cell'
06:48:33 CELL NAME OBJECTNAME VALUE
06:48:33 cell1 CD_IO_TM_W_SM CD_disk07_cell 0.00 us
--------------------------------------> NEW
06:49:53 CELL NAME OBJECTNAME VALUE
06:49:53 cell1 CD_IO_TM_W_SM CD_disk07_cell 3613.00 us
As you can see, the small write I/O latency has been 3613 us during the last 80 seconds.
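Note that this figure is the latency accumulated over the interval, not a per-request value. A per-request average for the same interval could be derived by dividing it by the number of small write requests served in that interval. The sketch below assumes such a request count is available from a companion cumulative metric; the count of 20 used here is purely hypothetical and was not measured on this cell disk.

```python
def avg_latency_per_request(delta_time_us, delta_requests):
    """Average latency per request over an interval, or None if no request was served."""
    if delta_requests == 0:
        return None
    return delta_time_us / delta_requests


# 3613 us comes from the output above; the 20 requests are a made-up figure.
print(avg_latency_per_request(3613.0, 20))
```

Returning None when no request was served avoids a division by zero on idle disks.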
Let’s see the help of the script:
./exadata_metrics.pl help
Usage: ./exadata_metrics.pl [Interval [Count]] [cell=] [top=] [name=] [metricobjectname=] [name_like=] [metricobjectname_like=]
Default Interval : 1 second.
Default Count : Unlimited
Parameter                 Comment                                   Default
---------                 -------                                   -------
CELL=                     comma separated list of cell to display
TOP=                      Number of rows to display                 10
NAME=                     ALL - Show all cumulative metrics         ALL
NAME_LIKE=                ALL - Show all cumulative metrics         ALL
METRICOBJECTNAME=         ALL - Show all objects                    ALL
METRICOBJECTNAME_LIKE=    ALL - Show all objects                    ALL
Example: ./exadata_metrics.pl cell=cell1,cell2 name_like='.*FC.*'
Example: ./exadata_metrics.pl cell=cell1,cell2 name='CD_IO_BY_W_LG'
Example: ./exadata_metrics.pl cell=cell1,cell2 name='CD_IO_BY_W_LG' metricobjectname_like='.*disk.*'
The script is based on the dcli and cellcli commands and their regular expressions (which are described in Kerry Osborne’s post).
- You can choose the number of snapshots to display and the time to wait between snapshots.
- You can choose to filter on name and metricobjectname based on like or equal predicates.
- You can work on all the cells or a subset thanks to the mandatory CELL parameter.
- A cell OS user allowed to run dcli without a password (celladmin, for example) can launch the script (ORACLE_HOME must be set).
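As an illustration of how the equal and like filters could map onto a cellcli query, here is a minimal sketch. It only assembles the query text, following the where-clause syntax of the commands shown earlier; the function name `build_query` is hypothetical, this is not the script's actual code, and the shell escaping needed when the query is passed through dcli is deliberately left out.

```python
def build_query(name="ALL", name_like="ALL",
                objectname="ALL", objectname_like="ALL"):
    """Assemble a cellcli 'list metriccurrent' query restricted to cumulative metrics.

    A value of "ALL" means no filter, mirroring the defaults in the help output.
    """
    clauses = ["metricType='Cumulative'"]
    if name != "ALL":
        clauses.append(f"name='{name}'")
    if name_like != "ALL":
        clauses.append(f"name like '{name_like}'")
    if objectname != "ALL":
        clauses.append(f"metricObjectName='{objectname}'")
    if objectname_like != "ALL":
        clauses.append(f"metricObjectName like '{objectname_like}'")
    return (
        "list metriccurrent attributes name,metricObjectName,metricValue where "
        + " and ".join(clauses)
    )


print(build_query(name="CD_IO_BY_W_LG", objectname_like=".*disk.*"))
```

Keeping the metricType filter as the first clause means every other predicate can simply be appended with "and".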
Please don’t hesitate to tell me if this is useful for you and if you find any issues with this script.
Updates:
- New features have been added, please see this post.
- You should read this post for a better interpretation of the utility: Exadata Cell metrics: collectionTime attribute, something that matters