A while back Stephen Few argued that data cannot be described as being beautiful. At the time I did not agree with this statement. I do think data can be beautiful, but as the term “smart data” pops up more and more I find myself unable to use that term to describe data. Am I being a hypocrite? Can data be beautiful, but not smart?
To me data is one of the dumbest things out there; it just exists. Data on its own creates absolutely nothing, and it wouldn’t even exist if someone hadn’t created it. So even though I find myself saying that data can be beautiful, I don’t like the term smart data. Data can be structured, organized and formatted beautifully, but smart? No!
By stating that data is smart we take the credit away from the many people who work with data each day. Smart people, really smart people! They are the ones who deserve the credit for all the amazing things data can be used for, not the data itself. Do we call trees smart just because we can do amazing things with them? No, but the inventors of paper, houses and so on were geniuses!
Give credit where credit is due. Data is lazy; it would lie on the couch all day if it could. The people who work with data, on the other hand, get it up off the couch and create amazing products and services with it. They do all the dirty work and should also get all the glory!
In a project we are using Talend to load a lot of data each night, and we are experiencing random “Connection does not exist” error messages during our data load. This can happen at any time, even during the connection phase, and so far we have been unable to find any real pattern for why and when it happens. In addition, a connection reset often leaves our data corrupt and unusable, meaning we have to start all over. We have therefore set up error handling when reading from this data source.
Setting up a try/catch in Talend
Create a context variable that we call continueLooping. This boolean will be used to end the loop when we reach our maximum number of attempts or when the connection has been successful.
Add a tJava where you initialize the variable to true.
Then add a tLoop of type While with the condition context.continueLooping.
Now we add a tJavaFlex where our try/catch block will be. Put the try block in the start code and the catch block in the end code. Mine looks something like this. Feel free to add some logging in here as well so you can keep track of where the error is happening. In the main code I make the job sleep for a little while in order to give our connection some time to come back up.
Add a tJava with an “On Component Ok” trigger on your database connection. Here, set continueLooping to false to stop the loop.
In the end it should look something like this:
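To make the flow easier to follow, here is the retry pattern those components implement, sketched in plain Java. This is a hypothetical, simplified stand-alone version, not the actual job code: openConnection() is a stand-in for the real database connect, and here it fails twice and then succeeds so the loop can be demonstrated.

```java
// Plain-Java sketch of the Talend retry loop described above.
public class RetryConnect {

    static int calls = 0;

    // Hypothetical stand-in for the real database connection;
    // fails twice with the error we keep seeing, then succeeds.
    static void openConnection() throws Exception {
        calls++;
        if (calls < 3) {
            throw new Exception("Connection does not exist");
        }
    }

    public static boolean connectWithRetry(int maxAttempts, long sleepMs) {
        boolean continueLooping = true;                     // tJava: initialize the context variable
        int attempt = 0;
        while (continueLooping && attempt < maxAttempts) {  // tLoop (While)
            attempt++;
            try {                                           // tJavaFlex start code
                openConnection();
                continueLooping = false;                    // tJava "On Component Ok": stop the loop
            } catch (Exception e) {                         // tJavaFlex end code
                System.out.println("Attempt " + attempt + " failed: " + e.getMessage());
                try {
                    Thread.sleep(sleepMs);                  // give the connection time to come back up
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        return !continueLooping;                            // true if we managed to connect
    }
}
```

The sleep between attempts matters: without it a flaky source just gets hammered with reconnects while it is still down.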
Extending the error handling
In our case we are already looping our read, using a job above this one that reads data one month at a time. The output of this job is large .csv files, which we then upload to Azure Blob Storage so we can use PolyBase to finally move the data into our Azure SQL Data Warehouse. Since we know we can lose the connection in the middle of our read, we need to clean up our .csv files before starting the loop all over for the month we are reading. This is done by adding an If trigger on the tJavaFlex that checks which iteration we are in. If we are not in the first iteration of the loop, something has gone wrong and we need to do some cleanup to make sure our data is correct in the end. We therefore remove all rows for the month we are supposed to read before we let the loop start over. Now, the only way I have been able to do this is by creating a copy of our existing file, filtering out the rows for the current month, and then writing it back as the original file. In the end it looks like this:
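The copy-filter-swap cleanup described above could look something like this in plain Java. This is a hypothetical sketch rather than the actual job code; it assumes each row starts with a date like "2017-07-01", so a month can be matched by its "yyyy-MM" prefix, and you would adapt the predicate to the real file layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the cleanup step: before the loop restarts, drop every
// row belonging to the month we were reading when the connection died.
public class CsvCleanup {

    // Keep the header plus every row that does not belong to the given month
    // (month given as a "yyyy-MM" prefix of the row's leading date column).
    public static List<String> filterMonth(List<String> lines, String month) {
        return lines.stream()
                .filter(line -> !line.startsWith(month))
                .collect(Collectors.toList());
    }

    // Read the file, filter it into a temp copy, then swap the copy
    // in as the original file.
    public static void removeMonth(Path csv, String month) throws IOException {
        List<String> kept = filterMonth(Files.readAllLines(csv), month);
        Path tmp = csv.resolveSibling(csv.getFileName() + ".tmp");
        Files.write(tmp, kept);
        Files.move(tmp, csv, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Writing to a temp file and then replacing the original means a second failure mid-cleanup leaves either the old file or the new one, never a half-written file.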
Overall it seems to work very nicely when we can’t trust our data source to keep the connection open for the whole duration.
Continuing our journey of applying Andy Kirk’s tips and tricks from his series “The Little of Visualization Design”, we are now at part 2, Clever Axis Scaling. As last time, I suggest you read his post first so we have some common ground.
Why use clever axis scaling
Clever axis scaling is a tool for creating some drama in your visualization. It can also help you highlight values and draw your consumer’s eye towards them. Things that stand out get attention; our brain is simple in this regard, and in this case that is exactly what we are after.
In the example the y-axis is set to 50, but the maximum value is 76. Now, my quick thought was “Great! This is easy, let’s just set the y-axis to 50 so the helper line is at 50”. This is easy to do; however, the chart actually gets cut off at 50. So it ends up looking like the image below, which is not what we want at all. We are now hiding the most interesting data!
So I tried setting it to 100. That works okay, but the highest line on the y-axis is now not 50, so the dramatic effect of 76 shooting way above the last helper line disappears, even though we still see the biggest increase in the chart to the left. So what is the solution? You need to try out what works best with your data. In this case it seems to work fine to choose 76, the maximum value in the chart. This will not always be the case, because we cannot control how many lines we get on the y-axis. If I make the chart taller, you can see that the numbers on the y-axis change as well.
Use it carefully
This solution also has the drawback that you are hardcoding the minimum and maximum values. So if you suddenly have a value higher than 76 you will lose it! In the end it comes down to what you want your chart to tell, how often its values will change and how dramatic you want it to be. If you have no idea how your numbers will behave in the future, I would not advise hardcoding min/max values unless you need it for a specific occasion, like a presentation. When you are done with that specific occasion, I suggest you turn them back to automatic to minimize confusion.
As with all tools, Power BI has some limitations compared to custom code, for example using something like D3.js where you can do absolutely everything you want! These limitations can make it a challenge to use all these tips and tricks going forward, but we will do the best we can! In this case we might have some problems trying to create a more dramatic effect in our storytelling, both with our axes, as we have seen here, and with data labeling, as Power BI does not let you choose which data points to highlight. So labeling the highest value without hovering over it is not possible. Or at least I didn’t manage to, but if you do, please let me know how.
Also, if anyone has a way of hiding the ESRI logo in Power BI Desktop please let me know. They are not pretty and are driving me crazy!
Isn’t the beauty of beauty that we all find different things beautiful?
The other day Stephen Few posted a blog post named “Data Is Not Beautiful”. I have strong reasons to believe this was written because we at /r/dataisbeautiful contacted him and asked if he was interested in doing an AMA, Ask Me Anything, where our users could ask questions and he could share his point of view and thoughts about data visualization. Now, I have great respect for Stephen Few and I often send people interested in data visualization his way to read his material, because he is so black and white that it is easy to grasp what he thinks good data visualization is. In this blog post, however, I think he misses the mark.
My beautiful might not be your beautiful
I have never been a fan of discussions about what is beautiful. What is beautiful is completely up to the consumer of the material, whether that is music, art, nature or even data. Everyone has different taste, and I never feel like these kinds of discussions solve anything or move a debate forward in any way. The goal often seems to be simply to put oneself above others by claiming they are not entitled to call anything beautiful. However, isn’t the beauty of beauty that we all find different things beautiful?
Data can indeed be beautiful
There are many things that can be beautiful about data, and even though I understand Few’s point that this means it is the attributes of data that are beautiful and not the data itself, I find that a weird formulation. What about music is beautiful? It’s the combination of attributes: the composition, volume, chords, etc. All of these are attributes of a piece of music, which indeed can be beautiful.
Attributes of data can be things like structure, format and how it is organized, and beyond that we can find a story to be beautiful. However, even without the story, the data itself can be in what I’d call a beautiful state. It is not often I go to a customer and find beautiful data, but when I do, it is a beautiful sight indeed! Data that is well organized, well formatted and of good quality. I have no problem using the word beautiful to describe this phenomenon.
I believe that data can indeed be beautiful, but in the end, even if the data is beautiful, it has little to no value if all it does is lie hidden inside a database or an Excel sheet. Therefore it is our job to take that data and turn it into something useful, for example through good visualizations. And when it comes to this, I believe his older blog post, “Should Data Visualizations Be Beautiful”, is a lot more interesting than his recent one.
On a last note: as the moderator who reached out to Stephen Few and asked if he was interested in doing an AMA, I still think it is a shame he declined. I personally find his point of view very interesting and I am sure a lot of users on /r/dataisbeautiful would have too. There is something refreshing about a person speaking so loudly about what he believes is right, so if he at any time changes his mind, we would be very happy to have him do an AMA at /r/dataisbeautiful.
I got some great news during my vacation: an email saying I had been selected to give a talk at SQLSaturday in Oslo this year. I was also notified by the tweet below, which I had completely forgotten about from last year. A nice surprise to see I have successfully met a goal of mine, even though I had forgotten I set it! I’ve been to every SQLSaturday held in Oslo and it has always been a great event, so I’m looking forward to contributing with a talk myself this year!
My talk is titled “Data Visualization – More Than a Hygiene Factor”, based on a quote from this Medium Post. You can read my abstract below.
"For many companies data visualization is still a hygiene factor; necessary but not crucial"
In a world where everyone wants to use data to drive their business forward, it is important to be able to communicate and speak the language of data, even though data itself can be complex. One way of doing this is by making good data visualisations. Good data visualisations are engaging, they are informative, and they let your data tell its story. Too often data visualisation gets a low priority, making the final result feel lackluster and leaving users uninspired.
In this session we look at some data visualisation principles and best practices, in order to deliver your message with a clear point of view and minimize confusion. Lastly, we will look at how you can use these practices with Power BI to communicate data to your end users in the best possible way, making them come back over and over.
SQLSaturday is a free 1-day training event for Microsoft Data Platform and SQL Server professionals, providing a variety of high-quality technical sessions. If you work on the Microsoft Data Platform SQLSaturday is a great way to get inspired and hear about new things. You can find more information about SQLSaturday, September 2nd in Oslo here!
Andy Kirk has an excellent series called “The Little of Visualization Design” where he gives small tips and tricks that can improve your data visualizations. If you have not seen it, I strongly recommend it. Now, what I am going to try to do every week after summer vacation is to show you how you can take these tricks and use them with Power BI. But let’s kick-start it now with part 1, dual labeling. I suggest that you read the original post by Andy first so we are on common ground about what we are going to look at, which is this pie chart.
Dual labeling. It is surprisingly common to see, and it generates more clutter on your data visualisation than you need. Repeating something will not make things clearer; it will just put more ink on your graph and make it harder to focus on what’s important.
Now if you punch in the data and create a pie chart in Power BI we get what is shown below.
So Power BI does not present you with a dual labeling issue up front, but it is quite easy to reproduce. In the “Format” pane you have a bunch of options, which are usually great, but you have to use them with care and have a clear vision of why you are changing the original chart; if not, you can end up with all of these different variations.
The one in the bottom left is probably the closest to the one in the original post. It has dual labeling, and quite similar colors on the pie slices. Andy Kirk’s proposed solution is to remove the labeling and place it directly onto the pie, since the colors in the original graph are so similar. Now, that doesn’t sound too far away from the default graph Power BI provides us with. However, the default is not perfect, and here is what I would do to improve it:
In Label Style choose “Category, data value”. This makes us see the actual number.
Increase font size of detail label.
Increase font size of the title. In general I think all default font sizes in Power BI are too small. I always feel like I need stronger contact lenses when creating a chart…
Sort the chart by value so the slices appear in order of size.
Note: I had originally made the font size of the detail label a bit bigger. However, this made the detail label for Canada disappear, probably because it would take up the same space as Israel’s. So I wish they could make the position of the labels a bit more dynamic.
In the end we end up with the chart below. All in all, the default chart Power BI created wasn’t too bad, but it could be improved. And be aware that not all options in the Format pane make your data visualisation better; they can make it worse!
I’m looking forward to some weeks of summer and then I’ll continue this series when I am back! Thanks for reading. If you have any questions or feedback drop me a comment, it is greatly appreciated.
I spent about five minutes in Power BI creating a remake as a bar chart, to show why that is a better chart choice for showing the ratio between different categories. However, as some other users commented, that removed some of the complexity, which is also quite interesting. The fact that more than 14 000 maps are being played in a month, in a game that is ten years old, is quite impressive.
So I started thinking about a remake and asked the user who posted the pie chart if he had the dataset he used. The internet is full of awesome people, and in about an hour I had a CSV file with 14 000 data points.
I have always gone code-first, but this time I thought I would at least try to sketch some ideas on paper. I’m not the greatest at drawing, but that is not the point either. So I made this sketch. Initially I wanted to create an overview which could then be broken down through interaction. Unfortunately I did not have both the datasets I wanted at the time I built the visualisation, but I do now, so maybe I have to revisit this some time.
With that design in mind I built the visualisation. I’m not going to go into detail about how I did that here. You can see my source code here if you want to have a look. The original is interactive, so you can hover over a bubble to see more information.
Context, context, context
Now, even though I made an interactive version, I ended up posting a static image of the chart. That was a mistake. The original pie chart had created quite a bit of discussion at /r/dataisbeautiful, so in my head I took it for granted that everyone knew what this represented. They did not. The internet is big, and not everyone lives in one subreddit. So many users did not get the background or the context, and even though I had written it in a comment, comments are not read by a lot of Reddit’s users. The chart had no context, and as of today the top-rated comment still reads: “Looks nice but no fucking idea what anything is representing”.
However, a lot of people did get what I was trying to show: how complex and how distributed the player base in Team Fortress is. I have gone through the thread and followed up on questions and such, and as long as you can filter out the trolls and take things as constructive criticism, you can get some really nice feedback. Where else would you get feedback from many, many thousands of people on your work?
Context is hard on Reddit as a platform. You basically only have the title to explain things with, as many users won’t read anything else; they only look at the image.
I should have posted the interactive version. It provides some of the detail people were missing. I had also, for some reason, removed the text saying “hover over a bubble for more information” from the static image. Including that text might have made more people go to the interactive version.
Check more colorblind simulators. I used some simulators, but I think I forgot to with the last version. There will be colorblind people when you share something with a user base this big. If you need to test yours, you can try this one.
People in general have a lower threshold for saying negative things, so you need to filter comments quite hard and decide which ones to listen to.
It is written bubble, while boubble, as I had written all over the comments, is a river in France!
All in all, sharing your work is a really good exercise. Working on a project, it is easy to go blind to it, and the internet won’t go easy on you if they see something to pick on.
If you create something you are proud of make sure to post it to /r/dataisbeautiful over at Reddit. We are always looking for original content!
Alternative title: Finding string value from a dimension with highest numeric value
The company I work for uses Yammer, and I have founded a Data Visualization group there that I am, manually, keeping some statistics about using Power BI. The other day I found myself wanting a dynamic card in Power BI to highlight which day was the most active in terms of number of posts. To do this I had to figure out how to make a measure returning not the max value, but the day that had the maximum number of posts. My use case is finding a weekday, but maybe you want to see which product is the most popular, which country has the most purchases, or which salesperson has the most sales. All these cases should be able to reuse this measure.
My initial thought was to create a table in my calculation and then slice it down to one row and one column, leaving me with a single cell holding the day with the most posts. I made a measure which used TOPN to return the row with the most posts, followed by SELECTCOLUMNS to select only the column with the weekday in it. Now, this turned out to return the overall correct day, but it did not work when I added a filter, and the Power BI visualization returned errors, so I had to start over.
I have created three measures to solve this.
1) One simple sum of [Number of Posts]:
Number of Posts = SUM(Sheet1[NumberOfPosts])
2) Finding the highest number of posts on any weekday by using MAXX:
MaxPostsPerWeekDay = MAXX(VALUES(Sheet1[WeekDay]);[Number of Posts])
3) Using FIRSTNONBLANK on my WeekDay column to return the value where the sum equals the maximum. In the end my measure looks like this:
Most Popular Day =
IF (
    ISBLANK ( [Number of Posts] );
    BLANK ();
    FIRSTNONBLANK (
        VALUES ( Sheet1[WeekDay] );
        IF ( [Number of Posts] = [MaxPostsPerWeekDay]; 1 )
    )
)
The first IF is there to remove days that have no posts; in my case there is not much activity on weekends, so they get filtered out. The beauty of this measure is that it is not limited to cards: it is also easy to use with your filters and in tables where you want it.
2017 is here and once again I’ll have a goal of writing here at least one time every month. Last year I failed this goal, but this year? Let’s give it another go.
I have a list of projects I want to do, and I’ve finally started to track what I am doing every day. I’ve thought about this for a while, as I think it would be interesting to see if I can find parameters in my life that affect things like my mood and energy level. To do this tracking I’ve made a very simple website hosted in Azure, with an Azure SQL Database behind it to store my data. This was needed to make registration as easy as possible. I had an Excel sheet for the first couple of days before I got this site up and running, and I would not recommend that approach at all if you’re going to do the same. It is too big an effort to open an Excel sheet every time you drink something or whatever you want to register. So far I am very happy with how it is working out, but I have started to build up quite a backlog of ideas for improvements. I have told myself not to change anything during January and then do an evaluation of how it is working out and which changes are absolutely needed. I’ve also decided not to look at any of the data until January is over. I don’t want to be too aware of what I am logging in the beginning, but rather try to make logging into something I just do, and then when I have some months of data I can see if anything sticks out.
So far I have split registration into two parts. The first part is activities during the day: things such as what I eat and drink, whether I work out and for how long, etc. The other part is a registration I do at the end of each day. Here I note down my mood, energy level, stress level and overall feeling, as well as some small notes about, for example, whether I hung out with friends during the day or did 100 math puzzles in an app I downloaded. As mentioned, I am hoping to see if there is something that affects my mood, so I can approach those things more actively in the future. I’m not really sure anything will pop up, but at least I’ll have some more data on myself and what I do during the day.
Ever done something like this or have ideas of things that are worth noting down during the day? Let me know. I know some people take these kind of things to the extreme and I probably won’t do that, but I’m always interested in new input or ideas.
Note at the end: I really like how easy it is to create a web app in Azure; every time I push to the master branch in my Git repository, the website updates within seconds. It is easy to set up a Visual Studio Team Services account and create the repository to start working right away.
One of the projects I’ve used Datazen in needed operational reporting. The dashboards were to be updated at least every 10 minutes and were placed on big screens around the work area where they were needed. We also created an overview, or landing page, for super users who needed quick access to all dashboards without waiting for the dashboard loop on the big screens.
In the beginning this landing page was just a static HTML site, as we didn’t want people to use the Datazen portal at that point. Every dashboard had a static thumbnail, and when a user clicked a thumbnail they were taken to the respective Datazen dashboard.
This worked fine, but it required every user to go into each dashboard to check its status, since the thumbnails did not change. So we decided to spend some time creating a dynamic landing page, or a dashboard of dashboards, that showed real-time thumbnails of the dashboards. Users couldn’t necessarily see the numbers on all of the dashboards, but they could see the status and where things were green or red.
Datazen can be embedded in web pages by use of iframes. This gave us a more dynamic page that refreshed when a user entered the site, or after a given interval if they stayed on it. What you will notice if you try this is that a click on a dashboard will not open the dashboard itself; instead you will interact with the dashboard inside the iframe. In our case this was not what we wanted, as the purpose was just to see the status; when a user clicked a dashboard, they should be sent to that specific dashboard for more details and full screen.
To solve this we created a new <div> called clickCatcher, the same size as the Datazen “thumbnail”, and made it transparent. This allowed us to display the dashboard at thumbnail size but open the full dashboard when clicked, instead of interacting with the small version. The code for one thumbnail is posted below.
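A minimal sketch of the pattern for one thumbnail looks roughly like this. Note that the dashboard URL and the sizes are placeholders, not the real values from the project:

```html
<!-- One thumbnail: the iframe renders the live dashboard small, and a
     transparent div on top catches the click and opens the full dashboard
     instead of letting the user interact with the embedded version. -->
<div style="position: relative; width: 320px; height: 240px;">
  <iframe src="https://datazen.example.com/viewer/dashboard1"
          width="320" height="240" frameborder="0" scrolling="no"></iframe>
  <div class="clickCatcher"
       onclick="window.location = 'https://datazen.example.com/viewer/dashboard1';"
       style="position: absolute; top: 0; left: 0; width: 320px; height: 240px;
              background: transparent; cursor: pointer;"></div>
</div>
```

The absolutely positioned div sits above the iframe, so all mouse events land on it rather than on the embedded dashboard.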
It’s a neat little trick that let us create a solution that was easier, and better, for the customer. To see how we auto-refreshed dashboard pages, you can take a look here.