Beyond Bar Charts: Data with Sankey, Circular Packing, and Network Graphs
Updated: 4 days ago
Unconventional visualizations: when and when not to wield their power
Post also available on Medium: https://medium.com/towards-data-science/beyond-bar-charts-data-with-sankey-circular-packing-and-network-graphs-fd1d50478b68
If you’ve delved into the world of data analysis, you’re likely familiar with the power of charts like bar graphs, line plots, scatter plots, and pie charts. These visuals not only make data more accessible but also enhance understanding for various audiences — whether they’re stakeholders, customers, or even yourself, seeking insights from the data. However, there are instances where the data complexities demand more intricate and captivating displays.
Imagine this scenario: You’re stepping into the shoes of a fledgling data scientist at our fictional company MM Awesome Data Inc. The management is grappling with the integration of new data sources into the existing data framework, and they really need to understand the big picture. While a pie chart might serve a portion of the purpose, imagine the impact and flair of presenting a flow visualization, such as a captivating Sankey diagram or a dynamic stream graph. This post revolves around such scenarios. In the vast landscape of available data visualizations, there are some hidden gems that often remain underutilized. Since we can’t cover all of these visualizations in one post, we’ll focus on three of them here. So, let’s dive in.
Let’s kick things off by exploring the fascinating world of Sankey Diagrams.
Sankey Diagrams are an incredibly cool way to visualize the flow of data. They offer a unique way to see how things are moving from one stage to another. Imagine understanding the journey of a product from its inception to the final outcome, or how different categories come together or split apart. That’s where Sankey Diagrams shine the brightest.
However, it’s worth mentioning that these diagrams are best suited for scenarios that involve tracking flows, like the examples we mentioned. In other cases, they might not be the best fit. So, let’s dive in and uncover their magic!
Before we jump into the code and explore the practical implementation of the diagram we’ve just seen, let’s discuss the use cases.
Scenarios best suited for Sankey Diagrams
Sankey diagrams are a fantastic choice if your situation falls into any of these categories:
Data Flow Analysis: This involves illustrating the distribution, transformation, and transitions of various resources, quantities, or general data. They help highlight the major contributors, pathways, and losses within a system.
Spotting Resource Bottlenecks: Sankey diagrams are invaluable tools for evaluating the efficiency of resource utilization. They excel at identifying bottlenecks and suggesting areas that could be optimized.
Scenarios not well suited for Sankey Diagrams
While Sankey diagrams are powerful tools, there are cases where they might not be the best fit:
Numerical Precision: Sankey diagrams prioritize a qualitative and relative representation of flows. If you need precise numerical values, other visualization methods might be more suitable.
Handling Extensive Data: If your dataset involves numerous nodes and connections, a Sankey diagram can quickly become cluttered and challenging to interpret. To overcome this, consider simplifying the data or exploring alternative visualizations like network graphs or hierarchical diagrams.
Alternative Visualizations for Similar Use Cases
From my experience, I’ve found that stream graphs, parallel coordinates plots, and flow charts share a similar purpose with Sankey diagrams. Additionally, some individuals might see networks as closely resembling the functionality of Sankey diagrams.
Now that we have a clearer grasp of when to embrace Sankey diagrams and when to consider other options, let’s delve into the code behind this impressive visualization!
Diving into the Python Implementation of Sankey
Let’s now shift our focus to the Python code that brings the Sankey diagram to life. My goal was to create a comprehensive Sankey diagram with a multitude of features, making it easily adaptable for various levels of complexity.
The following example delves into the analysis of an organization’s data pipeline. It traces the journey of data from its source to its eventual transformation into products or reports. It’s important to note that the data used here is entirely synthetic. Thus, while this visualization serves as an excellent guide for crafting an impressive Sankey diagram, it’s advisable not to infer more than its instructional value in creating captivating visuals.
I use HoloViews for this demonstration. What’s particularly intriguing about HoloViews is its capability to generate HTML-based interactive diagrams, a feature well worth exploring.
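The data pipeline scenario described above can be sketched roughly as follows. This is a minimal illustration, not the original code: the flow names and values are made up, and `build_links` and `render_sankey` are hypothetical helpers. It assumes the `holoviews` package with the Bokeh backend is installed; the import is kept inside the rendering function so the data preparation runs without it.

```python
from collections import defaultdict

# Synthetic, made-up pipeline flows: (source, target, records moved).
RAW_FLOWS = [
    ("CRM", "Data Lake", 120),
    ("Web Logs", "Data Lake", 300),
    ("ERP", "Warehouse", 80),
    ("Data Lake", "Warehouse", 350),
    ("Data Lake", "Discarded", 70),
    ("Warehouse", "Dashboards", 250),
    ("Warehouse", "ML Models", 180),
]


def build_links(flows):
    """Merge duplicate (source, target) pairs by summing their values."""
    totals = defaultdict(int)
    for source, target, value in flows:
        totals[(source, target)] += value
    # One [source, target, value] row per link, the shape hv.Sankey reads.
    return [[s, t, v] for (s, t), v in totals.items()]


def render_sankey(links, path="sankey.html"):
    """Draw the links with HoloViews and save an interactive HTML file."""
    import holoviews as hv  # lazy import: only needed for rendering

    hv.extension("bokeh")
    sankey = hv.Sankey(links).opts(width=800, height=500, label_position="left")
    hv.save(sankey, path)
    return sankey
```

Calling `render_sankey(build_links(RAW_FLOWS))` should write an interactive `sankey.html`. Note that a Sankey layout needs acyclic flows, so a link that loops back to an earlier stage would have to be removed or redirected first.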
Circular Packing offers a stylish twist on conventional tree maps, and truth be told, it’s incredibly easy on the eyes. This technique nests circles within circles to portray hierarchical data. Although you can go as deep as you need, things get quite intricate after two to three layers, unless you’re planning a visualization that spans an entire wall! Circular packing is an ideal choice when you’re aiming to showcase both the hierarchy and the proportions within a dataset.
Now, let’s explore some scenarios where the circular packing really makes a point:
Scenarios best suited for Circular Packing
In considering the appropriate scenarios for circular packing, it’s worth noting that it can essentially fulfill the role of a simple bar chart. However, it really is better to use the bar chart in those scenarios. Let’s uncover the contexts where circular packing serves its purpose best:
Hierarchical Data: Circular packing is a great choice for illustrating hierarchical structures of up to 2 to 3 levels. It can quite beautifully demonstrate the nesting of categories or groups within one another.
Proportional Representation: When the goal is to showcase proportions within each hierarchy level, circular packing excels. The area of each circle directly corresponds to the value it represents, offering a straightforward means to compare relative sizes.
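Because the area, not the radius, encodes the value, radii have to scale with the square root of the value. A minimal sketch of that relationship, using a hypothetical `radius_for` helper and made-up enrollment numbers:

```python
import math


def radius_for(value, max_value, max_radius=1.0):
    """Return a radius such that the circle's AREA is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)


# Hypothetical enrollment counts, for illustration only.
enrollments = {"Machine Learning": 400, "Big Data": 100}
largest = max(enrollments.values())
radii = {name: radius_for(count, largest) for name, count in enrollments.items()}
# A category with a quarter of the students gets half the radius,
# which is a quarter of the area.
```

Scaling the radius linearly with the value instead would exaggerate large categories, since area grows with the square of the radius.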
Scenarios not well suited for Circular Packing
Space Constraints: If your visualization space is limited, circular packing can become cluttered and hard to decipher, particularly with many hierarchy levels or very small values (the illustration shows this to some extent). In such cases, alternatives like tree maps or bar charts could offer clearer insights.
Exact Values: Just like Sankey diagrams, this representation is not effective for conveying precise quantitative values. Estimating exact values from circles, even with annotations, is unintuitive and laborious.
Comparing Multiple Sets: Following on from the previous point, when the objective is to compare multiple hierarchical structures side by side, circular packing can lead to confusion. While it can provide an overview, drawing meaningful conclusions might prove difficult. For these scenarios, bar charts, line charts, or grouped bar charts could be more fitting.
Alternative Visualizations for Similar Use Cases
Depending on the use case, grouped bar charts, tree maps, tree diagrams, and even pie charts can serve as alternatives to circular packing diagrams.
Now that we’ve talked about the use cases, let’s dive into the code:
Diving into the Python Implementation of Circular Packing
For the practical implementation of Circular Packing, I chose to
In this illustration, I took a specific scenario as an example. The example and its synthetic data revolve around the number of students enrolled in data science courses across domains like Machine Learning, Big Data, and Data Visualization in a given semester. There are only 3 layers in this scenario, but as you can observe, even by the third hierarchical layer circular packing becomes a little difficult to interpret, so I had to make several manual adjustments in the code to maintain visual clarity. Despite the challenges, this representation stands as an incredibly cool visualization, and I personally find it quite intuitive to grasp.
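A three-layer enrollment hierarchy like the one described could be laid out roughly as follows. The text above does not say which library was used, so this minimal sketch assumes the third-party `circlify` package together with matplotlib. The course names, counts, and both helper functions are hypothetical.

```python
# Hypothetical enrollment counts: domain -> course -> students.
ENROLLMENTS = {
    "Machine Learning": {"Deep Learning": 120, "NLP": 80, "Computer Vision": 60},
    "Big Data": {"Spark": 90, "Hadoop": 40},
    "Data Visualization": {"Dashboards": 70, "Storytelling": 30},
}


def to_hierarchy(enrollments, root_id="Data Science"):
    """Convert nested dicts into an id/datum/children hierarchy."""
    domains = []
    for domain, courses in enrollments.items():
        children = [{"id": name, "datum": count} for name, count in courses.items()]
        domains.append(
            {"id": domain, "datum": sum(courses.values()), "children": children}
        )
    total = sum(d["datum"] for d in domains)
    return [{"id": root_id, "datum": total, "children": domains}]


def draw_packing(hierarchy, path="circular_packing.png"):
    """Compute circle positions with circlify and draw them with matplotlib."""
    import circlify  # assumed third-party dependency, imported lazily
    import matplotlib.pyplot as plt

    circles = circlify.circlify(hierarchy, show_enclosure=False)
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.axis("off")
    ax.set_xlim(-1, 1)
    ax.set_ylim(-1, 1)
    ax.set_aspect("equal")
    for circle in circles:
        ax.add_patch(
            plt.Circle((circle.x, circle.y), circle.r, alpha=0.3,
                       linewidth=1.5, edgecolor="black")
        )
        # Label only the leaves (courses) to limit clutter.
        if "children" not in circle.ex:
            ax.annotate(circle.ex["id"], (circle.x, circle.y),
                        ha="center", va="center", fontsize=8)
    fig.savefig(path, dpi=150)
    return circles
```

Calling `draw_packing(to_hierarchy(ENROLLMENTS))` should save a PNG. Labelling only the innermost circles is one way to keep the third layer readable; labels for the outer layers usually need manual placement.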
Network graphs are a powerful way to depict relationships between entities, referred to as nodes, along with the connections that link them, known as edges. The scope of applications for network graphs is so vast that a single post can’t fully capture their potential. Moreover, things tend to get rather complex when dealing with network graphs.
In essence, network graphs are useful in scenarios where data relationships aren’t straightforward paths from point A to point B. Whenever there’s a web of complex connections, network graphs can reveal intricate relationships in a visually intuitive manner.
So first, let’s look at the use cases of network graphs.
Scenarios best suited for Network Graphs
Network graphs are useful in many scenarios; here are a few of them.
Social Network Analysis: They can be used to visualize relationships between various entities, such as social connections, communication networks, or collaboration networks.
Influence and Centrality: When you want to analyze the influence or centrality of nodes within a network, node attributes can help highlight key entities based on specific characteristics.
Community Clusters: Network graphs can help identify clusters or communities within a dataset. Node attributes can be used to label and distinguish different clusters.
Identifying Dependencies: In scenarios where entities represent tasks or processes, and edges represent dependencies, node attributes can provide information about the duration, resources, status or dependants of each task.
Scenarios not well suited for Network Graphs
While network graphs are powerful, they might not be the best choice for every situation. Some cases where network graphs don’t quite shine:
Simple Hierarchies: If the data involves a clear hierarchy and simple parent-child relationships, other visualization methods like hierarchical tree diagrams might offer a more straightforward representation.
Numerical Precision: As with the previous two visuals, this is not a representation for conveying precise values. You can annotate the graph with data, but reading it demands far more scrutiny than better-suited visualizations do.
Many Nodes and Edges: Large and complex networks can become cluttered and difficult to interpret. If the dataset has too many nodes and edges, representing a subset of the data will be a better choice.
Alternative Visualizations for Similar Use Cases
Several related yet not entirely synonymous visualizations include hierarchical tree diagrams, which offer greater clarity for more sequential hierarchies. Hive plots present another viable alternative, particularly suited for specific types of relationships in data. Furthermore, chord diagrams can elegantly emulate the circular layout found in networkx circular_layout visualizations.
Diving into the Python Implementation of Network Graphs
For the network graph visualization, I aimed for a social network analysis scenario with synthetically generated data. The setup involves 20 influencers, each specializing in distinct niches spanning fashion, fitness, and technology. These influencers possess varying numbers of followers and engagement levels. In the graphical representation, edges depict connections between influencers. Connections within the same niche are shaded in dark gray, while connections bridging different niches are shaded in dark orange. The objective is to assess the behavior of the most followed influencers and potentially gain insights from their interactions with their peers.
It’s important to note that the code itself won’t yield any real insights, as the connections are assigned randomly, with precise control over the density of the network.
I use the networkx package here, and I dedicated some effort to refining the layout and annotations so the code can be adapted to a range of simpler scenarios.
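The influencer setup described above might be sketched like this. It is a minimal illustration under stated assumptions, not the original code: the node names, follower ranges, and all three helper functions are hypothetical. Edges are sampled so that an exact fraction (`density`) of all possible pairs is connected. It assumes `networkx` and matplotlib are installed; both are imported only inside the drawing function.

```python
import itertools
import random

NICHES = ("fashion", "fitness", "technology")


def make_influencers(n=20, seed=42):
    """Create n synthetic influencers with a niche and a follower count."""
    rng = random.Random(seed)
    return {
        f"influencer_{i}": {
            "niche": NICHES[i % len(NICHES)],
            "followers": rng.randint(1_000, 1_000_000),
        }
        for i in range(n)
    }


def random_edges(influencers, density=0.1, seed=42):
    """Randomly connect an exact fraction `density` of all possible pairs."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(influencers, 2))
    chosen = rng.sample(pairs, round(density * len(pairs)))
    edges = []
    for a, b in chosen:
        same_niche = influencers[a]["niche"] == influencers[b]["niche"]
        # Dark gray within a niche, dark orange across niches.
        edges.append((a, b, "darkgray" if same_niche else "darkorange"))
    return edges


def draw_network(influencers, edges, path="influencers.png"):
    """Build the graph with networkx and save a static drawing."""
    import matplotlib.pyplot as plt
    import networkx as nx

    graph = nx.Graph()
    graph.add_nodes_from(influencers.items())  # (node, attribute dict) pairs
    graph.add_edges_from((a, b, {"color": color}) for a, b, color in edges)
    pos = nx.spring_layout(graph, seed=42)
    # Node size scales with follower count.
    sizes = [attrs["followers"] / 2_000 for _, attrs in graph.nodes(data=True)]
    colors = [color for _, _, color in graph.edges(data="color")]
    fig, ax = plt.subplots(figsize=(10, 8))
    nx.draw_networkx(graph, pos, ax=ax, node_size=sizes,
                     edge_color=colors, font_size=7)
    ax.axis("off")
    fig.savefig(path, dpi=150)
    return graph
```

With `people = make_influencers()`, calling `draw_network(people, random_edges(people))` should save a PNG in which the most followed influencers appear as the largest nodes. Raising `density` makes the graph cluttered quickly, which is the limitation noted in the section on unsuitable scenarios.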
As we come to the close of this story, I sincerely hope that you gained valuable insights into these visualizations and now have a clearer picture of their best use cases.
Feel free to share your thoughts and propose other visualization types that you’ve come across or would want to learn more about.