In this week, let me take an opportunity to write more about the DAX concepts. The previous posts include – Installing SQL Server 2012 Analysis Services–Tabular mode and Introduction to DAX in Excel.
To help understand how DAX functions work, particularly the ones that are used in the complex calculation scenarios, we need to spend some time on some of the key concepts that are the foundation for DAX functionality. This blog post talks about the terminologies used with DAX and how they function inside Excel or SQL Server Analysis Services.
Up to the current SQL Server 2012 release, DAX only works against tabular SSAS database. The tabular database can be a PowerPivot workbook inside Excel, a sandbox database loaded to SSAS instance running in SharePoint integration mode, or a tabular database deployed from a BISM model. In SQL Server 2008 R2 PowerPivot release, the entire tabular database needs to be stored in memory or error would be returned. In SQL Server 2012, paging capability is enabled so the data can be paged to disk if there is not enough memory for the entire database.
Table stores data about a dimension (entity) or the fact data. Often a table gets data from an external data source, such as SQL Server relational database. For PowerPivot running in Excel, a table can also be a local table saved in the Excel file and linked to the PowerPivot workbook through the PowerPivot linked table feature or a table copied/pasted from a regular Excel workbook in a different Excel file.
Tabular database supports many-to-one relationship between two tables. In tabular database, there can be only one relationship between any two tables. Typically two tables are related through the primary key and the foreign key constraint. From SQL Server 2012, we support multiple relationships between two tables, therefore some of the relationships would be based on non-key columns as well. When multiple relationships exist between two tables, only one relationship is active at any time and that relationship is used by default to navigate between the related tables. In order to use other relationship when evaluate a DAX expression, USERELATIONSHIP function is needed. For example, the following expression calculates the sales amount grouped by Ship Date.
=CALCULATE( FactInternetSales[TotalSalesAmount], USERELATIONSHIP( DimTime[TimeKey], FactInternetSales[ShipDateKey]))
Proper table relationship eliminates the need to maintain redundant data in multiple tables, and ensures DAX calculation returns the correct result. Table relationship also enables performing calculations involving looking up data in the related table by following the relationship. For instance, in the product table, if we create a calculated column with the following definition
For each product in the product table, we will go to the internet sales table, get the subset of rows that contain sales data for the corresponding product, sum the value in the SalesAmount column in the matching rows and return the calculation result to the calculated column for the current product in the product table.
Column is represented as ‘table name’[column name]. The data type for a column can be text, float, whole number, currency, date, and Boolean. These datatypes are same as what we discussed at Introduction to DAX in Excel.
By definition, unique column does not allow duplicate value and null value. The uniqueness is enforced by setting the cardinality of the column to one. This property does not stop a null row from being added to the query results to represent the unknown member when necessary.
When table relationship is created, it internally sets the column on the one side of the one-to-many relationship to be unique, hence preventing duplicate value and blank value. When the relationship is deleted, the unique flag is removed from the column.
There is no way to mark a column as unique in PowerPivot for Excel add-in. In BISM, under Reporting Properties, one can set Row Identifier property for a column to be True. Setting a column as Row Identifier would mark the corresponding attribute as the key attribute of the dimension, which implies uniqueness as well. Only one column in a given table can be the Row Identifier (key attribute).
Row context refers to the current row in the current table that is being evaluated. This applies primarily to the calculated column. For a complex calculation, row context may change as we switch from one DAX function to another DAX function. Row context controls the value that is returned for the column in the current function.
Here are some examples of calculated column definitions using AdventureWorksDW database as the data source to show the row context and row context change.
Calculated column in FactInternetSales table:
=’FactInternetSales’[SalesAmount] – ‘FactInternetSales’[TotalProductCost]
The above formula calculates the profit for each row in the FactInternetSales table. The row context is the current row being evaluated. As the calculation moves to the next row in the same table, the row context moves to the next row as well.
Calculated column in DimProduct table:
=SUMX( RELATEDTABLE( FactInternetSales), FactInternetSales[SalesAmount])
This calculates the total sales amount for each product in the product table. The calculation involves two row contexts, the first one is for the current row in the ‘DimProduct’ table. The second row context is the row in the ‘FactInternetSales’ table that is being scanned when we are inside the RELATEDTABLE function to retrieve the rows that have the same product key.
Filter context consists of members in row label, column label, vertical slicer, horizontal slicer and report filter in the PivotTable. It is used to evaluate the calculated measure. Filter context filters out rows in the selected workbook tables so the calculation is carried out on the selected rows after the filter.
When CALCULATE function is evaluated, if there is a row context associated, the row context is promoted to filter context, the SetFilter arguments passed to CALCULATE function will then be applied on top of the query context to produce the final context before we evaluate the expression passed to CALCULATE function.
I will try to explain these functions in future posts. This is just a start. In this post, let us just learn the concepts.
BLANK in DAX is the same as ‘blank’ in Excel and ‘empty cell value’ in MDX. PowerPivot table does not include blank row. For instance, COUNTROWS(Employee) includes the number of actual employees, not including one extra row for "null" or "unknown" employee.
A blank row will be added into the query result when we encounter referential integrity violations or null foreign keys. ALL() function will return all rows in the result including the blank row if added.
ALLNOBLANKROW will return the same result as ALL except the blank row will only appear if there is a blank row in the data – it will not appear if it is only present as the result of a referential-integrity violation or a null foreign key.
When we aggregate values (SUM, COUNT, AVERAGE, etc.) and all the values being aggregated are BLANK, the result will be BLANK. In arithmetic operations of addition and multiplication, we treat BLANK as zero when some operant is non blank value, for instance, BLANK+3+4+BLANK=7. When all the operant is BLANK, the result is BLANK, i.e. BLANK + BLANK = BLANK. In arithmetic division operation, when the nominator or the denominator is BLANK, the result is BLANK.
This behavior is different from Excel where BLANK is explicitly converted to zero when being calculated. The reason DAX do not convert BLANK value to zero is to avoid displaying members that do not have value. For instance, Cities table has 200 cities, for a particular week, only five cities have sales, when build PivotTable to show the sales for that week, and place cities on the row labels, we want to show five rows of data, instead of 200 rows with 195 of them being zero. We refer to this behavior as non-empty behavior, the same concept as in MDX.
Null has different semantic from BLANK. When arithmetic operation involves NULL operant, the result is always NULL. For example, NULL + 3 = NULL. NULL is not used in DAX. When we encounter a NULL value at import time, we convert it to BLANK.
These were some of the basic concepts we need to know when it comes to DAX. In next blog posts, we will look at the various functions available in DAX and learn more about this feature.