{"id":1185,"date":"2013-05-08T18:01:58","date_gmt":"2013-05-08T18:01:58","guid":{"rendered":"http:\/\/www.ralphstraumann.ch\/blog\/?p=1185"},"modified":"2014-06-17T22:27:53","modified_gmt":"2014-06-17T22:27:53","slug":"creating-a-hexagonal-cartogram","status":"publish","type":"post","link":"https:\/\/www.ralphstraumann.ch\/blog\/2013\/05\/creating-a-hexagonal-cartogram\/","title":{"rendered":"Creating a hexagonal cartogram"},"content":{"rendered":"<p><strong>Some weeks\u00a0ago I visualised the Swiss cantons (states) and their population numbers using what information visualization scientists call a\u00a0<em>linked view<\/em>.\u00a0You can click through to the actual, interactive visualization:\u00a0<a href=\"http:\/\/ralphstraumann.ch\/cartogram_cantons\/\">here in German<\/a>\u00a0or\u00a0<a href=\"http:\/\/ralphstraumann.ch\/cartogram_cantons_fr\">here in French<\/a>. In what follows I&#8217;ll describe the steps of data preparation for this visualization. I decided to keep the specifics on the implementation in D3.js for a <em>third<\/em> post in order to spare your scroll-wheel and -finger (so stay tuned for that one).\u00a0<\/strong><\/p>\n<h1>Intro<\/h1>\n<p>Welcome to the second part of this series in which I describe the production of this linked view with a population cartogram (top right):<\/p>\n<p><a href=\"http:\/\/ralphstraumann.ch\/cartogram_cantons\/\"><img loading=\"lazy\" decoding=\"async\" id=\"i-1075\" class=\" wp-image aligncenter\" src=\"http:\/\/www.ralphstraumann.ch\/blog\/wp-content\/uploads\/2013\/04\/cartogram_of_swiss_cantons1.png\" alt=\"Image\" width=\"504\" height=\"336\" \/><\/a><\/p>\n<p>In case you missed it: in the first post of this series, you can <a href=\"http:\/\/www.ralphstraumann.ch\/blog\/2013\/05\/conceptualisation-of-a-d3-linked-view-with-hexagonal-cartogram\/\">read about the conceptual thinking that went into this visualization<\/a>. 
But now let&#8217;s dive into some geodata-crunching:<\/p>\n<p><strong style=\"font-size: 1.5rem; line-height: 1.5;\">Technically<\/strong><\/p>\n<h2>GIS pre-processing<\/h2>\n<p>In what follows, I&#8217;ll try to give you a thorough description of my approach to data processing. I&#8217;ll include some screenshots of intermediate results. Obviously, I don&#8217;t know how familiar you are with <strong><a href=\"http:\/\/en.wikipedia.org\/wiki\/GIS\">GIS<\/a> <\/strong>and spatial analysis terminology, so please bear with me if my description is too exhaustive. Conversely, speak up in the comments section if I have forgotten something or something is not clear. I did all of the GIS analysis in Esri ArcGIS; however, <strong><a href=\"http:\/\/www.qgis.org\/\">any GIS<\/a>\u00a0that can handle vector data will do<\/strong>.<\/p>\n<p>I started off with the following input data:<\/p>\n<ul>\n<li>Outlines of <strong>administrative units<\/strong> (cantons and cities)<\/li>\n<li><strong>Spatially distributed population data<\/strong> from the Swiss census<\/li>\n<\/ul>\n<p>The preparation of the administrative units was quite straightforward: I applied a <strong>Union operation<\/strong> in GIS (ArcGIS Help Topic <a href=\"http:\/\/help.arcgis.com\/en\/arcgisdesktop\/10.0\/help\/index.html#\/\/00080000000s000000\">here<\/a>). Then I did some tidying of the attributes and applied a set of <strong>geometric simplifications<\/strong> (polygon outline generalisations). The purpose of these is basically <a href=\"http:\/\/bost.ocks.org\/mike\/simplify\/\">weeding out vertices from the geometries while preserving shape<\/a> as well as possible. The bigger goal is, of course, to simplify the geometries enough for a <strong>fluid web experience<\/strong> down the line.<\/p>\n<p>Swiss census data comes as a point grid at 100-meter resolution. The precise data characteristics don&#8217;t matter too much. 
And one could also use a thematic variable that comes at the same resolution as the display units \u2013 cantons and cities in this case. While the handling of canton\/city level thematic data would be much easier, the <strong>spatially distributed thematic variable <\/strong>in this case<strong> allows for a more representative cartogram<\/strong>.\u00a0If you wonder why, consider, for example, a US setting: Salt Lake City would cause a big <em>local<\/em> distortion in a cartogram using spatially distributed data, whereas its population would be spread out uniformly throughout all of Utah if you used state-level data. This effect causes visible differences in the cartogram in regions where population distribution is not spatially uniform.<\/p>\n<p>The <strong>GIS processing chain<\/strong> starts with these steps:<\/p>\n<ul>\n<li><strong>Generation of a grid<\/strong> (in my case at 5 km resolution, but that number depends somewhat on the resolution of your input data, your area of interest and maybe your application; as a rule of thumb, I&#8217;d suggest a grid resolution that is similar to the size of your hexagons). Any regular tessellation other than a rectangular grid will also do.<\/li>\n<li><strong><a href=\"http:\/\/help.arcgis.com\/en\/arcgisdesktop\/10.0\/help\/index.html#\/\/00080000000s000000\">Union operation<\/a> on the grid cells and the administrative units<\/strong>. This yields smaller spatial analysis units that follow the boundaries between administrative units.<\/li>\n<li><strong><a href=\"http:\/\/help.arcgis.com\/en\/arcgisdesktop\/10.0\/help\/index.html#\/\/00080000000q000000\">Spatial join<\/a> of the thematic variable to the new spatial units<\/strong>. A spatial join is a GIS operation where the spatial relationship of entities in two different datasets is evaluated. If a specified relationship is fulfilled, the characteristics of the features in the join dataset are joined to the features in the target dataset. 
The spatial relationship for this operation was <strong>containment <\/strong>(i.e. the criterion was:\u00a0<i>is a given census data point within the spatial unit at hand?<\/i>). The join operation summed up the values. The overall process yields the sum of the population at all census data points which fall within a given spatial analysis unit \u2013 or, without the GIS lingo: the <strong>total population per unit<\/strong>.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2>For distortions you need a Scape&#8230; toad<\/h2>\n<p>The resulting data in <a href=\"http:\/\/en.wikipedia.org\/wiki\/Shapefile\">Shapefile format<\/a> was then transferred to the <strong>cartogram software\u00a0<a href=\"http:\/\/scapetoad.choros.ch\/\">Scapetoad<\/a><\/strong>. Scapetoad is a freely available Java application developed in the <a href=\"http:\/\/choros.epfl.ch\/\">Choros Laboratory at EPFL<\/a> in Lausanne. It employs the <strong>diffusion-based cartogram algorithm by <a href=\"http:\/\/www.pnas.org\/content\/101\/20\/7499.short\">Gastner and Newman<\/a><\/strong>. I did several model runs and iteratively tuned the algorithm parameters. That mainly meant striking an acceptable balance between the subjective quality of the result and cartogram computation time. 
Unfortunately, I cannot give heuristics for this; you&#8217;ll simply have to experiment with your data.<\/p>\n<p>When I was happy with the result, I re-imported the cartogram dataset from Scapetoad into the GIS and used a\u00a0<strong><a href=\"http:\/\/help.arcgis.com\/en\/arcgisdesktop\/10.0\/help\/index.html#\/\/00170000005n000000\">Dissolve operation<\/a>\u00a0to aggregate the units back into regions<\/strong> (again, any GIS will do, but the precise name for the operation may vary).<\/p>\n<figure id=\"attachment_1180\" aria-describedby=\"caption-attachment-1180\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1180\" src=\"http:\/\/www.ralphstraumann.ch\/blog\/wp-content\/uploads\/2013\/05\/cartogram_production_part1.gif\" alt=\"Cartogram production part 1: (1) Preparation of cantons and cities dataset (2) Union of dataset with grid (3) Import into Scapetoad and distorting (4) Re-import into GIS and dissolving the geometries\" width=\"400\" height=\"273\" \/><figcaption id=\"caption-attachment-1180\" class=\"wp-caption-text\">Cartogram production part 1: (1) Preparation of cantons and cities dataset (2) Union of dataset with grid (3) Import into Scapetoad and distorting (4) Re-import into GIS and dissolving the geometries<\/figcaption><\/figure>\n<h2><!--more-->Enter the hexagons<\/h2>\n<p>After these steps, I used a\u00a0<a style=\"line-height: 1.714285714; font-size: 1rem;\" href=\"http:\/\/www.jennessent.com\/arcgis\/repeat_shapes.htm\">third-party add-on<\/a> to ArcGIS to create a <strong>hexagonal grid<\/strong> (other GISs may have built-in support for creating hexagons). I chose the resolution of this grid to be similar to the one used for creating the spatial units before the spatial join and cartogram generation. 
I think that is an okay heuristic for dealing with <strong>resolution sensitivity<\/strong>\u00a0or\u00a0<strong>scale issues<\/strong> and <strong><a style=\"line-height: 1.714285714; font-size: 1rem;\" href=\"http:\/\/en.wikipedia.org\/wiki\/Modifiable_areal_unit_problem\">MAUP<\/a><\/strong>\u00a0(each of these can spark long discussions, but I&#8217;ll spare you).<\/p>\n<p>Then I used another spatial join: this time on the distorted geometries and the hexagonal grid. Thus, I could <strong>automatically assign hexagons the respective region code<\/strong>, whenever the hexagons were located completely\u00a0inside a distorted region. I did not use automatic conflict resolution on hexagons located on borders between distorted regions. While doing this would be perfectly possible in GIS, I actually wanted the wiggle room these unassigned hexagons gave me.<\/p>\n<p>To conclude the cartogram generation, I <strong>manually assigned the border hexagons to appropriate administrative units<\/strong>. In this subjective approach I employed two important cartographic principles:<\/p>\n<ul>\n<li><strong><span style=\"line-height: 1.714285714; font-size: 1rem;\">shape preservation<\/span><\/strong><\/li>\n<li><strong><span style=\"line-height: 1.714285714; font-size: 1rem;\">topology preservation<\/span><\/strong><\/li>\n<\/ul>\n<p><span style=\"line-height: 1.714285714; font-size: 1rem;\">It may seem odd to talk of <\/span>shape preservation<span style=\"line-height: 1.714285714; font-size: 1rem;\"> in the case of a cartogram (whose whole point is distortion), but I hypothesise that, also for cartograms, <strong>preserving some key features helps people<\/strong> appreciate the geometries better. 
As an example, I maintained the small &#8220;antennae&#8221; of Grisons (near the right\/eastern edge of the graphic below), although I thus locally overestimated the population a bit.\u00a0<\/span><span style=\"line-height: 1.714285714; font-size: 1rem;\">I also <strong>overemphasised the bays of the lakes bordering some cities<\/strong>\u00a0(Geneva (bottom-left), Lucerne (below center) and\u00a0Zurich (biggest city)).\u00a0Features such as these are so well known to people familiar with the geography that they <\/span><strong style=\"line-height: 1.714285714; font-size: 1rem;\">help <\/strong><strong>those users<\/strong><strong style=\"line-height: 1.714285714; font-size: 1rem;\"> recognise the unusual geometries<\/strong><span style=\"line-height: 1.714285714; font-size: 1rem;\">.<\/span><\/p>\n<figure id=\"attachment_1181\" aria-describedby=\"caption-attachment-1181\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1181\" src=\"http:\/\/www.ralphstraumann.ch\/blog\/wp-content\/uploads\/2013\/05\/cartogram_production_part2.gif\" alt=\"Cartogram production part 2: (1) Overlay of hexagonal grid onto geometries (2) Spatial join and manual clean-up (3) Dissolving the hexagons into distorted administrative units\" width=\"400\" height=\"278\" \/><figcaption id=\"caption-attachment-1181\" class=\"wp-caption-text\">Cartogram production part 2: (1) Overlay of hexagonal grid onto geometries (2) Spatial join and manual clean-up (3) Dissolving the hexagons into distorted administrative units<\/figcaption><\/figure>\n<p>Examples of<strong> preservation of topology<\/strong>\u00a0were the cantons of Obwalden, Nidwalden and Uri (near the center). In an early iteration, these touched the border of Italy in the south, between Valais and Ticino. This is <em>not<\/em> the case in geographic space, though. 
I manually overrode that configuration in order to replicate the topology of the geographic regions closely, thus also fostering recognition.<\/p>\n<figure id=\"attachment_1182\" aria-describedby=\"caption-attachment-1182\" style=\"width: 400px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-1182\" src=\"http:\/\/www.ralphstraumann.ch\/blog\/wp-content\/uploads\/2013\/05\/cartogram_production_qc.gif\" alt=\"Comparison between the conventional (Gastner-Newman) cartogram and one with the &quot;hexagon treatment&quot;\" width=\"400\" height=\"278\" \/><figcaption id=\"caption-attachment-1182\" class=\"wp-caption-text\">Comparison between the conventional (Gastner-Newman) cartogram and one with the &#8220;hexagon treatment&#8221;<\/figcaption><\/figure>\n<h2>Quality checking and file format conversion<\/h2>\n<p><span style=\"line-height: 1.714285714; font-size: 1rem;\">Throughout these manual interventions, I <\/span><strong style=\"line-height: 1.714285714; font-size: 1rem;\">kept an eye on the quality of the representation<\/strong><span style=\"line-height: 1.714285714; font-size: 1rem;\"> using a simple, <\/span><strong style=\"line-height: 1.714285714; font-size: 1rem;\">dynamically updated scatterplot<\/strong><span style=\"line-height: 1.714285714; font-size: 1rem;\"> that related the number of hexagons per distorted region to the total population of the respective region. 
The coefficient of determination, R<\/span><sup>2<\/sup><span style=\"line-height: 1.714285714; font-size: 1rem;\">, started out high, and with the end result I achieved a value greater than 0.99; that is, the <\/span><strong style=\"line-height: 1.714285714; font-size: 1rem;\">representation in the cartogram was <em>very<\/em> close to the actual population numbers<\/strong><span style=\"line-height: 1.714285714; font-size: 1rem;\">: Nice!<\/span><\/p>\n<p>For the visualization I couldn&#8217;t use the antiquated Shapefiles and instead opted for the\u00a0<strong><a href=\"https:\/\/github.com\/mbostock\/topojson\">TopoJSON format<\/a><\/strong>\u00a0by Mike Bostock (who also happens to be the creator of D3)<strong>.<\/strong> TopoJSON of course plays well with JavaScript and thus also D3.\u00a0In my visualization, I wanted to <a href=\"http:\/\/www.ralphstraumann.ch\/blog\/2013\/05\/conceptualisation-of-a-d3-linked-view-with-hexagonal-cartogram\/\">display three datasets<\/a>: the aggregated distorted geometries of cantons and cities, the tiny hexagons which they consist of, and the\u00a0undistorted geometries for my reference map. Thus,\u00a0I converted all these datasets to TopoJSON files using\u00a0an online service called\u00a0<strong><a href=\"http:\/\/www.shpescape.com\/\">shpescape<\/a><\/strong>. But other options do exist, such as\u00a0<a href=\"http:\/\/www.gdal.org\/ogr\/\">GDAL\/OGR<\/a>\u00a0(see Mike&#8217;s approach with that tool\u00a0<a href=\"http:\/\/bost.ocks.org\/mike\/map\/#converting-data\">in his tutorial<\/a>).<\/p>\n<p><strong>With the first visualization prototype, a problem became apparent<\/strong>: For the cartogram, the numerous small hexagons were supposed to be loaded and displayed first. Only afterwards should the cantons and cities be overlaid on them, with a slight transparency. 
But every so often the considerably bigger hexagon TopoJSON file would be loaded and displayed in D3 only after the cantons and cities, and thus the hexagons were on top of the latter instead of the other way around. An easy way to avoid this was <strong>merging all data files into one<\/strong>\u00a0big file. To that end, I used the following syntax adapted from the aforementioned <a href=\"http:\/\/bost.ocks.org\/mike\/map\/#converting-data\">tutorial by Mike<\/a>\u00a0(<a href=\"https:\/\/github.com\/mbostock\/topojson\">topojson<\/a> needs to be installed at this point):<\/p>\n<p><code>topojson -o swiss_regions.json hexagons.json distorted_units.json undistorted_units.json<\/code><\/p>\n<p>&nbsp;<\/p>\n<p><strong>And with that first line of code, that&#8217;s it for data processing<\/strong>. I started out from a &#8220;normal&#8221; official cantons and cities dataset and Swiss census data. Through <strong>various GIS processing steps<\/strong>, the use of <strong>Scapetoad<\/strong> for distorting, some more GIS including <strong>manual interventions<\/strong> and conversion, I obtained the TopoJSON file that would be at the core of my visualization.<\/p>\n<p>The manual steps in the above process may seem tedious, but they took maybe an hour at most in my case. It&#8217;s really a question of your setting: complexity of the shapes you&#8217;re dealing with and the size of your hexagons, mostly (one of these you can choose ;).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Some weeks\u00a0ago I visualised the Swiss cantons (states) and their population numbers using what information visualization scientists call a\u00a0linked view.\u00a0You can click through to the actual, interactive visualization:\u00a0here in German\u00a0or\u00a0here in French. In what follows I&#8217;ll describe the steps of data preparation for this visualization. 
I decided to keep the specifics on the implementation in &hellip; <a href=\"https:\/\/www.ralphstraumann.ch\/blog\/2013\/05\/creating-a-hexagonal-cartogram\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Creating a hexagonal cartogram<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":1132,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[7],"tags":[137,138,52,55,87,125],"class_list":["post-1185","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-note","tag-cartogram","tag-d3","tag-geo","tag-gis","tag-politics","tag-visualization"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/www.ralphstraumann.ch\/blog\/wp-content\/uploads\/2013\/04\/detail_cartogram.png","jetpack_shortlink":"https:\/\/wp.me\/p3pPwF-j7","jetpack_sharing_enabled":true,"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts\/1185","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/comments?post=1185
"}],"version-history":[{"count":33,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts\/1185\/revisions"}],"predecessor-version":[{"id":1749,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/posts\/1185\/revisions\/1749"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/media\/1132"}],"wp:attachment":[{"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/media?parent=1185"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/categories?post=1185"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ralphstraumann.ch\/blog\/wp-json\/wp\/v2\/tags?post=1185"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}