One of my pet peeves in Tableau is that you cannot have Dynamic Parameters, that is, parameters whose value is determined by a calculation at runtime. A common requirement I get is a dynamic date range (e.g. the last 12 whole months: if today is 15 March 2016, the range needs to be 1 March 2015 - 29 February 2016).
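For reference, the "last 12 whole months" arithmetic is simple enough to sketch. This is illustrative Python only, not part of the Tableau workbook:

```python
from datetime import date, timedelta

def last_full_12_months(today):
    """Return (start, end) covering the 12 whole months before `today`."""
    first_of_month = today.replace(day=1)
    end = first_of_month - timedelta(days=1)              # last day of the previous month
    start = first_of_month.replace(year=today.year - 1)   # same month, one year earlier
    return start, end

print(last_full_12_months(date(2016, 3, 15)))
# → (datetime.date(2015, 3, 1), datetime.date(2016, 2, 29))
```

Note the example date lands correctly on 29 February because 2016 is a leap year; the "subtract one day from the first of the month" trick handles month lengths for free.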
Standard Tableau Approach:
A standard Tableau approach (for the above range) would be to use a relative date filter set to the last 13 months, but that will also include the current month's data.
So, to exclude this month's data, we can create a calculated field that checks whether the date dimension falls in the current month, and filter those rows out:
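As a sketch, such a calculated field could look like the following (assuming a date field named [Order Date]; drop the field on the Filters shelf and keep only True):

```
// True for every row before the current month
DATETRUNC('month', [Order Date]) < DATETRUNC('month', TODAY())
```

Combined with the relative date filter set to the last 13 months, this leaves exactly the last 12 whole months.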
BUT! This is fine if the date range is static and always the "last full 12 months". What if you want your dashboard to default to the last 12 months, but then allow your users to change the date range to suit their needs? Then you'll have to show both filters and explain to the users how the combined approach works. Not very intuitive for users.
In my hack approach, I add two parameters to the Tableau workbook ([Start Date] and [End Date]), use them to filter the data, and deploy it to Tableau Server.
I then have a scheduled PowerShell script that uses the Tableau tabcmd command line utility. At a high level I:
This allows for a default date range that is updated on a schedule (monthly in my case, but it could be daily, weekly, etc.), while still allowing users to change the date range interactively to suit their needs.
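The high-level flow can be sketched with tabcmd roughly as follows. This is a sketch only: the server URL, credentials, workbook names, and paths are all placeholders, not the exact script:

```powershell
# Sketch of the approach (placeholders throughout)
tabcmd login -s "https://my-tableau-server" -u "admin" -p "password"

# Download the workbook from the server
tabcmd get "/workbooks/MyDashboard.twbx" -f "C:\temp\MyDashboard.twbx"

# (Here the script updates the [Start Date] / [End Date]
#  parameter defaults inside the workbook before republishing)

# Republish the updated workbook, overwriting the existing one
tabcmd publish "C:\temp\MyDashboard.twbx" -n "MyDashboard" --overwrite

tabcmd logout
```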
Here is the PowerShell script I used (numbers are for referencing purposes):
Breakdown of the code:
In lines 1-10 I declare my variables. This includes:
And that's it. I then schedule the script to run on a monthly basis and the workbook is updated every month with the new End Date.
How to determine the path of the workbook on the server?
Navigate to the workbook on Tableau Server and hover your mouse over the "Download" link. You'll see the path in the status bar at the lower left (in Google Chrome).
Tabcmd Publishing Gotcha
When publishing with tabcmd, by default all the dashboards and views are visible (regardless of how the workbook was previously published using Tableau Desktop). So if you want only the dashboards available to your users, and not the views, you'll need to hide the views in your Tableau workbook (right-click, Hide Sheet):
One of my goals for 2016 is to learn VFX, and one of the best (if not the best) VFX tools out there is SideFX's Houdini. Before I can get into the various in-depth VFX simulations, I first needed to get up to speed with the shading and rendering workflow in Houdini. Based on the PluralSight course, Introduction to Materials in Houdini, I shaded and rendered a 2011 Ford Mustang GT500.
I attended the 2016 Gartner Business Intelligence, Analytics & Information Management Summit that was held on 22 - 23 February in Sydney. Below are my notes on Self-Service Data Preparation. This will be my final post regarding the summit.
Self Service Data Preparation Overview
What is Self Service Data Preparation?
Data preparation is an iterative process for exploring and transforming raw data into forms suitable for data science experiments, data discovery, and analytics.
Challenges With Current (IT Based) Data Preparation Approaches
Core Features of Self-Service Data Preparation Tools
Types of Self Service Data Preparation
Note: Microsoft Power Query is also a component within Power BI
Processes to Promote Business User Generated Content to Enterprise Data Integration and Governance
I attended the 2016 Gartner Business Intelligence, Analytics & Information Management Summit that was held on 22 - 23 February in Sydney. Below are my notes on the Internet of Things (IoT).
What is the Internet of Things?
The Gartner IoT Solution Scope Reference Model
Recommendations for BI and Analytical Leaders
I attended the 2016 Gartner Business Intelligence, Analytics & Information Management Summit that was held on 22 - 23 February in Sydney. Below are my notes on Hadoop:
What is Hadoop?
Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. Hadoop is not just one thing; rather, the term “Hadoop” has come to refer to the “ecosystem”, or collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache Spark, and others.
Hadoop provides a range of data processing options that sit on top of a distributed file system with redundancy. Batch processing is Hadoop's strongest component, but it also includes interactive SQL and streaming/event data. Hadoop is ever evolving.
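As a toy illustration of the batch (MapReduce) style of processing that Hadoop popularised, here is a word count run locally in Python; on a real cluster the framework (e.g. Hadoop Streaming) would distribute the map and reduce phases across machines:

```python
from itertools import groupby

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle/sort, then reduce: sum the counts for each distinct word
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

counts = dict(reduce_phase(map_phase(["the quick fox", "the lazy dog"])))
print(counts["the"])  # → 2
```

The point of the model is that the map and reduce functions are independent per key, which is what lets the framework parallelise them over commodity hardware.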
What is Hadoop used for?
Hadoop is not the best fit for consumption of analytics.
Architecture Patterns for Analytics on Hadoop
The Logical Data Warehouse
The Logical Data Warehouse (LDW) is a data management architecture for analytics which combines the strengths of traditional repository warehouses with alternative data management and access strategies. It is a clear demarcation between centralized repository approaches and managed data services for analytics.
Hadoop adoption recommendations
I attended the 2016 Gartner Business Intelligence, Analytics & Information Management Summit that was held on 22 - 23 February in Sydney. Below are my notes on Data Lakes.
What is a Data Lake?
A collection of storage instances of various data assets additional to the originating data sources. These assets are stored in a near-exact, or even exact, copy of the source format. The purpose of a data lake is to present an unrefined view of data to only the most highly skilled analysts, to help them explore their data refinement and analysis techniques independent of any of the system-of-record compromises that may exist in a traditional analytic data store (such as a data mart or data warehouse).
Staffing Data Science Teams
Data Lake Integration
Data Lake Recommendations
I attended the 2016 Gartner Business Intelligence, Analytics & Information Management Summit that was held on 22 - 23 February in Sydney. Below are my notes from the keynote:
The keynote of the Gartner BI Summit focused on Information Yield: "the dividend received from an investment in information management and analytics". Only 15 percent of all data and analytics strategies reviewed by Gartner analysts over the past two years contained concrete business outcomes. This is because information specialists tend to focus on the structure, consistency and perfection of the data rather than the desired business outcomes. Instead, we should use information yield to tell a story, a narrative that takes everyone on a journey, rather than produce a spreadsheet. Gartner outlined 3 steps to telling a good Information Yield story:
The keynote then went on to discuss the current landscape of the BI and Analytics market:
In summary, the keynote provided the following key recommendations:
Group Manager - Data & Insights