Wednesday, April 17, 2019

dcli vs cellcli

Using CELLCLI

Each Exadata storage cell can be monitored, configured, and maintained using the CellCLI command-line interface. To invoke the CLI, simply log in to the Exadata cell as cellmonitor, celladmin, or root, and type “cellcli”. Within CellCLI, a wide range of commands can be invoked to monitor the cell.
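A few representative CellCLI commands (output omitted; the available objects and attributes vary by software release):

```
CellCLI> LIST CELL DETAIL
CellCLI> LIST GRIDDISK
CellCLI> LIST METRICCURRENT WHERE objectType = 'CELLDISK'
CellCLI> LIST ALERTHISTORY
```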

Using dcli
The cellcli commands in the previous section operate on the Cell server you’re logged in to. To obtain a cross-cell view of cell server configuration, metrics, alerts, and other information, you can use dcli (Distributed Command Line Interface). dcli is a Python script installed on each Exadata node that allows you to run Cell CLI commands on multiple cell server nodes at once.
In order for dcli to function, SSH user equivalency should be established to eliminate the need to supply the “celladmin” (or root/cellmonitor) Linux password on each cell server.
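The core idea behind dcli can be sketched in a few lines of Python: run the same command once per target and prefix each output line with the target name. Real dcli executes the command on each cell over SSH; this sketch runs it locally so it is self-contained, and the cell names are invented.

```python
import subprocess

# Minimal sketch of the dcli idea: one command, many targets, output lines
# prefixed with the target name. Real dcli runs the command over SSH on each
# cell; here the command runs locally so the sketch is self-contained.
def run_on_cells(cells, command):
    results = []
    for cell in cells:
        out = subprocess.run(command, shell=True, capture_output=True, text=True)
        for line in out.stdout.splitlines():
            results.append(f"{cell}: {line}")
    return results

# A real invocation would look like:
#   dcli -g cell_group -l celladmin "cellcli -e list griddisk"
print(run_on_cells(["cell01", "cell02"], "echo ONLINE"))
# → ['cell01: ONLINE', 'cell02: ONLINE']
```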

Grid disk vs ASM Disk group



[Figure: Oracle ASM disk groups built on Oracle Exadata Storage Server grid disks.] It represents a typical, but simplified, configuration that can be used as a model for building larger Oracle Exadata Storage Server grids with additional storage cells and disks.





This Oracle Exadata Storage Server grid illustrates the following:
  • The storage cells in the grid use an InfiniBand network to connect to the database servers that have a single-instance database or Oracle Real Application Clusters (Oracle RAC) database installation.
  • Each storage cell is composed of physical disks.
  • Each cell disk is built on a LUN, which corresponds to a physical disk.
  • Each cell disk is partitioned into grid disks.
  • Oracle ASM disk groups are set up to include the grid disks.
Oracle ASM failure groups are created so that file extents are never mirrored within the same storage cell, allowing the grid to tolerate the failure of an entire cell. The number of failure groups equals the number of Exadata cells. Each failure group is composed of the subset of grid disks in the Oracle ASM disk group that belong to a single storage cell.
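The placement rule can be sketched in a few lines of Python. The grid disk names below are invented for illustration; real extent placement is performed internally by ASM.

```python
# Sketch of the ASM failure-group rule: the mirror copy of an extent must
# land on a grid disk whose failure group (i.e. storage cell) differs from
# the primary copy's. Disk and cell names are hypothetical.
grid_disks = {
    "DATA_CD_00_cell01": "cell01",
    "DATA_CD_01_cell01": "cell01",
    "DATA_CD_00_cell02": "cell02",
    "DATA_CD_01_cell02": "cell02",
}

def mirror_candidates(primary_disk):
    """Return grid disks eligible to hold the mirror of an extent whose
    primary copy lives on primary_disk (i.e. disks on any other cell)."""
    primary_cell = grid_disks[primary_disk]
    return [d for d, cell in grid_disks.items() if cell != primary_cell]

# A primary extent on cell01 can only be mirrored on cell02's grid disks,
# so the loss of an entire cell never takes out both copies.
print(mirror_candidates("DATA_CD_00_cell01"))
# → ['DATA_CD_00_cell02', 'DATA_CD_01_cell02']
```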

Tuesday, April 16, 2019

Bloom Filters

In Oracle Database 11g, the performance of partition pruning has been enhanced by using Bloom filtering instead of subquery pruning. While subquery pruning was activated on a cost-based decision and consumed internal (recursive) resources, pruning based on Bloom filtering is activated all the time without consuming additional resources.

A Bloom filter is a probabilistic data structure that quickly tests membership in a large set by applying multiple hash functions to a single array of bits. A join filter can be used when an execution plan fetches all of the rows of one table before fetching any rows from another table that joins to it:

The join filter builds an array of bits, turning on bits for each row of the first table that matches the search conditions.

When scanning the second table, rows that could not possibly join with the first table are rejected.
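The two steps above can be sketched in Python. This is a toy implementation: real implementations size the bit array and the number of hash functions from the expected row count and a target false-positive rate.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash functions setting bits in one bit array.
    Lookups can return false positives but never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, value):
        # Derive k bit positions from k seeded hashes of the value.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, value):
        # Build phase: turn on bits for each matching row of the first table.
        for pos in self._positions(value):
            self.bits[pos] = True

    def might_contain(self, value):
        # Probe phase: if any bit is off, the value definitely never joined.
        return all(self.bits[pos] for pos in self._positions(value))

# Build the filter from the join keys of the first table...
bf = BloomFilter()
for deptno in (10, 20, 30):
    bf.add(deptno)

# ...then, while scanning the second table, rows whose key fails the test
# cannot possibly join and are discarded immediately.
print(bf.might_contain(10))   # True: key was added, never a false negative
```

Note that a `True` answer is only “possibly present” (hash collisions can produce false positives), which is why the discarded rows are guaranteed safe to drop but the surviving rows still need the real join.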

As already mentioned, in a parallel hash join with inter-node parallelism the interconnect can become a bottleneck. Pre-filtering with a Bloom filter can reduce this communication overhead.

Example of a query whose execution plan uses a Bloom filter:

SELECT COUNT(*) FROM empp e, deptp d WHERE e.deptno = d.deptno;
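The original plan output is not reproduced here; a sketch of the shape such a parallel plan typically takes is shown below. The operation names (`PX JOIN FILTER CREATE` / `PX JOIN FILTER USE`) are as reported by DBMS_XPLAN, while the Id numbers and table queue names are illustrative. The filter is created on the build (dept) side of the hash join and used on the probe (emp) side.

```
----------------------------------------------------------
| Id | Operation                    | Name     |
----------------------------------------------------------
|  0 | SELECT STATEMENT             |          |
|  1 |  SORT AGGREGATE              |          |
|  2 |   PX COORDINATOR             |          |
|  3 |    PX SEND QC (RANDOM)       | :TQ10001 |
|  4 |     SORT AGGREGATE           |          |
|  5 |      HASH JOIN               |          |
|  6 |       PX JOIN FILTER CREATE  | :BF0000  |
|  7 |        PX RECEIVE            |          |
|  8 |         PX SEND BROADCAST    | :TQ10000 |
|  9 |          PX BLOCK ITERATOR   |          |
| 10 |           TABLE ACCESS FULL  | DEPTP    |
| 11 |       PX JOIN FILTER USE     | :BF0000  |
| 12 |        PX BLOCK ITERATOR     |          |
| 13 |         TABLE ACCESS FULL    | EMPP     |
----------------------------------------------------------
```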


In Oracle Database 11g, the Bloom Filter is enabled by default.

Smart Scan

Smart Scan is a subset of Cell Offload Processing and is perhaps the feature that is most highly associated with Exadata. Smart Scan allows Oracle databases to push query information down to the storage level, specifically:
  • Filtration (predicate information)
  • Column Projection
  • Join Filtering
To understand what this means, think of a very simple example where a table with ten million rows is the subject of a full table scan by an Oracle database. In a non-Exadata environment this data has to be read from disk over the IO subsystem (most commonly fibre channel) using large, sequential multi-block IOs. During the time of this IO request the database process waiting for the data will be left in a wait state (known as db file scattered read) whilst the blocks from disk are scattered to available slots in the data cache (in memory on the database server). This will inevitably cause many useful blocks to be aged out of the cache, with adverse implications for subsequent performance. Once the read completes, Oracle will apply a number of filters to the data based on the query being run. For example:
SELECT     invoice_amount, sales_rep       <- Column projection
FROM       invoice_data
WHERE      country_code = 'UK'             <- Filtration
AND        invoice_quarter = '2011Q3';     <- Filtration
In this query we are only interested in a subset of the total number of columns (and there may be a significant number of additional columns on the INVOICE_DATA table). Likewise we are only interested in a subset of the rows, those which match the filters of being related to the UK and the third quarter of 2011. If this table contains rows for the last seven years this represents a significant amount of data which is not going to be part of the final result. To put this in different terms, an often significant percentage of the work undertaken is wasted effort.
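The effect can be modelled in a few lines of Python. The table, rows, and column names below are invented stand-ins for the INVOICE_DATA example; the point is simply that projection plus filtration in the cell shrinks what crosses the storage network.

```python
# Toy model of Smart Scan offload. Rows are invented; in the conventional
# path every column of every row crosses the IO path, while the offloaded
# path applies filtration and projection before anything is shipped.
invoice_data = [
    {"invoice_amount": 100, "sales_rep": "A", "country_code": "UK",
     "invoice_quarter": "2011Q3", "notes": "x" * 50},
    {"invoice_amount": 250, "sales_rep": "B", "country_code": "US",
     "invoice_quarter": "2011Q3", "notes": "y" * 50},
    {"invoice_amount": 175, "sales_rep": "C", "country_code": "UK",
     "invoice_quarter": "2011Q2", "notes": "z" * 50},
]

def conventional_scan(table):
    """Non-Exadata path: all rows and all columns are shipped to the
    database server, which applies the WHERE clause and projection."""
    return table

def smart_scan(table, columns, predicate):
    """Offloaded path: the storage cell applies the predicate (filtration)
    and keeps only the requested columns (projection) before returning."""
    return [{c: row[c] for c in columns} for row in table if predicate(row)]

print(len(conventional_scan(invoice_data)))  # 3 rows, every column shipped
shipped = smart_scan(
    invoice_data,
    columns=("invoice_amount", "sales_rep"),
    predicate=lambda r: r["country_code"] == "UK"
                        and r["invoice_quarter"] == "2011Q3",
)
print(shipped)  # only the UK / 2011Q3 row, only the two requested columns
```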
In the world of data warehousing, where this type of activity is commonplace, the traditional strategies to avoid this situation are partitioning and parallel processing. The partitioning feature allows segments such as tables to be broken up into partitions which can be read in place of the whole table – to use the example above, each financial year could be placed in a different table partition, resulting in our query only needing to scan one year’s worth of data instead of seven. By using parallelism, Oracle can then create multiple slave processes which divide up the work of scanning the partition and run concurrently. This also has an additional advantage in that parallel reads are “direct path”, which means they bypass the data cache, avoiding the impact on the useful blocks located there.
Oracle Exadata makes considerable use of parallel processing in order to achieve better performance – the obvious example being the 14 storage cells in a full rack which divide up the work accordingly (not through any managed process, but simply because ASM randomly scatters data blocks across all storage cells). Partitioning is still a separately licensed cost option but, in comparison to the cost of an Exadata system, one that most customers would be expected to purchase.
However, Exadata’s unique selling point – arguably the most important feature of the product – is its ability to offload the column projection and row filtration to the storage cells. This means that the details of SQL being executed are available to the storage cells and they can use them to reduce the volume of data being returned across the storage network to the database servers.
In the example above the cells are able to discard all column data other than those requested (the invoice amount and the details of the sales rep) as well as discarding all data which does not conform to the predicates of being in the UK and in the third quarter of 2011. Only the small subset of data which still fits these criteria will be returned over the storage network to the database servers for processing. Note that results are returned to the calling session’s private memory and not the data cache.
In addition to column projection and row filtration Exadata is also able to offload some join filtration to the storage cells. Since Oracle Database 11g the possibility has existed to use Bloom Filters to reduce the amount of work undertaken when joining sets of data together (see page 21 of white paper). Exadata is able to use this method to examine two sets of data being returned from different tables which need to be joined on the database server – in a normal join only intersecting data from each data set will be retained. A Bloom filter is a probabilistic algorithm which can quickly allow for a large number of the non-intersecting rows to be discarded. By offloading this work to the storage cells Exadata can further reduce the amount of data being sent back to the database processes. Oracle claims that a “10x reduction in data sent to DB servers is common” (see page 24 of this slide deck).
[A possible misconception about Bloom filters is that they can only be employed by Exadata when joining two tables – however this is incorrect, and Exadata can apply multiple Bloom filters to a single table scan.]
An important fact to understand about Smart Scan is that the full scans of segments still take place at the storage level. The reduction in workload (and consequential performance increase) comes from the ability to avoid sending data over the storage network to the database servers and then process it. Exadata does have another feature, “Storage Indexes”, aimed at avoiding the need to read from disk at all – as well as the Smart Flash Cache which attempts to keep useful data blocks on flash storage, but the basic Smart Scan operation does not avoid the need to perform reads.

Scope of Smart Scan

Whilst Smart Scan is a feature which Oracle is alone in being able to introduce (as the owner of the database code), the actual impact of Smart Scan is limited by the database workload. Although Oracle claims that a “10x reduction in data sent to DB servers is common”, this only applies to data returned via Smart Scan and so is restricted to full table scans and fast full index scans.
Full table scans and fast full index scans are common in data warehousing environments but almost non-existent in a typical OLTP environment.
A further restriction on Smart Scan is that only the standard “heap” table in Oracle is a suitable candidate; objects such as clusters and index organised tables cannot be Smart Scanned.
The decision on whether to use Smart Scan is taken by Oracle at run time, i.e. when a query is executed. This means that it is impossible to predict with accuracy whether a smart scan will take place when writing code and designing applications. Traditionally developers and performance architects would use the explain plan method within Oracle to find out what type of execution plan the optimizer would use for a given piece of SQL. This allowed for SQL tuning and hinting to take place as well as the redesigning of a schema’s indexing strategy.
Since Smart Scan is a run-time decision, the execution plan is only able to offer an indication of whether a query might be able to use Smart Scan or not.