Section 60 Homogeneity of paths
(Fiona’s suggestion for a metric)
Scenario 1) 20 people might have said that A links to B and another different 20 might have said that B leads to C with no overlap between the groups of people.
Scenario 2) the same 20 people say that A links to B and that B leads to C.
Conventional ways to combine the information from the different sources would produce the same diagram in both scenarios. But we want the user to see that there is some kind of weakness in scenario 1. Can’t we just show this on the arrows somehow?
In the case of isolated paths with no forks in them, which are of course very rare, this wouldn’t present a big problem. Assume that we are showing, as we do now, the total number of mentions for each section, represented for example by the width of the arrow.
We can construct a measure which we could call “homogeneity” for each path. If, in a long path, each of the sections was mentioned by more or less the same people, the path would have high homogeneity and you could for example show it unbroken. In contrast, a path whose separate sections were mentioned by different groups of sources would have low homogeneity, and you could show that for example with a very broken arrow with lots of gaps in it, or by making it almost transparent.
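One possible way to make this concrete, as a minimal sketch only: treat homogeneity as the share of sources who mentioned every section of a path, out of all sources who mentioned at least one section. The function name `path_homogeneity`, the `mentions` mapping and the source labels are assumptions made up for illustration; this is not how the app currently calculates anything.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def path_homogeneity(path: List[str], mentions: Dict[Link, Set[str]]) -> float:
    """Share of the path's sources who mentioned every section of it.

    `path` is a list of factor names, e.g. ["A", "B", "C"].
    `mentions` maps each section (from, to) to the set of sources who mentioned it.
    Returns a value between 0.0 (no shared sources) and 1.0 (identical sources).
    """
    sections = list(zip(path, path[1:]))
    source_sets = [mentions.get(section, set()) for section in sections]
    if not source_sets:
        return 0.0
    everyone = set().union(*source_sets)
    if not everyone:
        return 0.0
    shared = set.intersection(*source_sets)
    return len(shared) / len(everyone)


# Hypothetical data for the two scenarios: 20 sources per section.
group_1 = {f"s{i}" for i in range(20)}
group_2 = {f"s{i}" for i in range(20, 40)}

scenario_1 = {("A", "B"): group_1, ("B", "C"): group_2}  # no overlap
scenario_2 = {("A", "B"): group_1, ("B", "C"): group_1}  # same people

h1 = path_homogeneity(["A", "B", "C"], scenario_1)
h2 = path_homogeneity(["A", "B", "C"], scenario_2)
print(h1, h2)  # 0.0 1.0
```

Both scenarios give each section a width of 20 mentions, but the score separates them. Other definitions (for example, averaging the overlap between neighbouring sections) would be just as defensible.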
However I don’t think this works in the much more common case when paths are of course constantly diverging and rejoining.
For example, suppose you have an arrow from B to C and then an arrow from C to X and another from C to Y; suppose there was high homogeneity from B to C to X, in the sense that both of these sections were mentioned by mostly the same sources, but the section from C to Y was mentioned by a different bunch of sources. How would you mark this?
The problem is that homogeneity as I’ve described it is a metric of entire paths, and so you can’t really show it on individual sections when there are forks, i.e. when one section can have different homogeneities because it is part of different paths.
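The fork problem can be seen in a small sketch. It assumes the same overlap-based definition of homogeneity (sources who mentioned every section, divided by sources who mentioned any section); the source labels and the helper name `homogeneity` are invented for illustration.

```python
def homogeneity(source_sets):
    """Sources common to all sections, as a share of sources on any section."""
    everyone = set().union(*source_sets)
    return len(set.intersection(*source_sets)) / len(everyone) if everyone else 0.0


# Hypothetical sources for the fork at C.
mentions = {
    ("B", "C"): {"s1", "s2", "s3", "s4"},
    ("C", "X"): {"s1", "s2", "s3", "s4"},  # same people as B to C
    ("C", "Y"): {"s5", "s6", "s7", "s8"},  # a different bunch of sources
}

paths = {
    "B-C-X": [("B", "C"), ("C", "X")],
    "B-C-Y": [("B", "C"), ("C", "Y")],
}

scores = {
    name: homogeneity([mentions[section] for section in sections])
    for name, sections in paths.items()
}
print(scores)  # {'B-C-X': 1.0, 'B-C-Y': 0.0}
```

The section from B to C sits on both paths, so it would need to be drawn as fully homogeneous and not at all homogeneous at the same time.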
60.1 (Non-)solution 1)
What one could do is see whether there is any clustering within sources rather than within variables. So you might find there is a bunch of people who tend to mention many of the same arrows and another bunch of people who mention a different set of arrows. It would certainly be possible to create subgroups of respondents, automatically or manually. Then there are ways to show how different sections of paths were mentioned by these different subgroups, but it doesn’t really answer the question.
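As a rough sketch of what automatic subgrouping might look like: put two sources in the same subgroup whenever the sets of arrows they mentioned overlap enough (Jaccard similarity at or above a threshold), and let subgroups merge through chains of such pairs. The function names, the threshold of 0.5 and the example data are all assumptions for illustration; a real analysis would more likely use a proper clustering library.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two sets of arrows: shared arrows / all arrows."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_sources(links_by_source, threshold=0.5):
    """Group sources whose mentioned arrows are similar (connected components)."""
    group_of = {source: {source} for source in links_by_source}
    for s, t in combinations(links_by_source, 2):
        similar = jaccard(links_by_source[s], links_by_source[t]) >= threshold
        if similar and group_of[s] is not group_of[t]:
            merged = group_of[s] | group_of[t]
            for member in merged:
                group_of[member] = merged
    unique_groups = {id(group): group for group in group_of.values()}
    return sorted(sorted(group) for group in unique_groups.values())


# Hypothetical data: which arrows each source mentioned.
links_by_source = {
    "s1": {("A", "B"), ("B", "C")},
    "s2": {("A", "B"), ("B", "C"), ("C", "X")},
    "s3": {("C", "Y"), ("Y", "Z")},
    "s4": {("C", "Y")},
}

groups = group_sources(links_by_source)
print(groups)  # [['s1', 's2'], ['s3', 's4']]
```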
60.2 Solution 2)
You can certainly report the homogeneity of each individual path in the Report tab of the app, but that would be quite long-winded.
60.3 Solution 3)
You could show the same information interactively. For example, when you click on B in the example above, you would see the width of the section from C to Y shrink but the section from C to X would stay the same. However, I am not a big fan of information which you can only discover by twiddling.
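A minimal sketch of the filtering behind such an interaction, assuming that clicking a factor restricts the diagram to sources who mentioned at least one arrow touching that factor. The function name `widths_for_selection` and the data are hypothetical, and nothing here reflects how the app handles clicks at present.

```python
def widths_for_selection(mentions, selected):
    """Arrow widths counting only sources who mentioned an arrow touching `selected`."""
    selected_sources = set().union(
        *(sources for (start, end), sources in mentions.items() if selected in (start, end))
    )
    return {link: len(sources & selected_sources) for link, sources in mentions.items()}


# Hypothetical sources for the fork example.
mentions = {
    ("B", "C"): {"s1", "s2", "s3", "s4"},
    ("C", "X"): {"s1", "s2", "s3", "s4"},
    ("C", "Y"): {"s5", "s6", "s7", "s8"},
}

full_widths = {link: len(sources) for link, sources in mentions.items()}
after_click = widths_for_selection(mentions, "B")
print(after_click)  # {('B', 'C'): 4, ('C', 'X'): 4, ('C', 'Y'): 0}
```

Before the click every arrow has width 4; afterwards only the arrows mentioned by the people who talked about B keep their width.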
60.4 Solution 4)
You might want to summarise the most important results from Solution 3, e.g. in a legend on the diagram, where you could (automatically) mention individual paths with particularly high or low homogeneity. You could probably develop a metric for the overall homogeneity of a whole diagram, and you might want to mention if a whole diagram had unusually low homogeneity.
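To illustrate what an automatic legend could do, here is a sketch that takes per-path homogeneity scores (already computed somehow, on a 0 to 1 scale), lists only the extreme paths, and uses the plain average as one candidate for a whole-diagram figure. The cut-offs of 0.25 and 0.75, the function name `legend_lines` and the scores themselves are made-up illustration values, not results.

```python
def legend_lines(path_scores, low=0.25, high=0.75):
    """Legend text for paths with particularly low or high homogeneity."""
    lines = []
    for name, score in sorted(path_scores.items(), key=lambda item: item[1]):
        if score <= low:
            lines.append(f"{name}: low homogeneity ({score:.2f})")
        elif score >= high:
            lines.append(f"{name}: high homogeneity ({score:.2f})")
    return lines


# Hypothetical per-path scores.
path_scores = {"B-C-X": 1.0, "B-C-Y": 0.0, "A-B-C": 0.5}

lines = legend_lines(path_scores)
overall = sum(path_scores.values()) / len(path_scores)

print(lines)    # ['B-C-Y: low homogeneity (0.00)', 'B-C-X: high homogeneity (1.00)']
print(overall)  # 0.5
```

An unweighted average treats every path equally; weighting by the number of mentions might be more sensible, but that is an open design choice.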
60.5 Solution 5)
Allow the viewer to “play” the different sources one by one, if there aren’t too many of them.
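A sketch of the data behind such a playback, assuming we hold for each arrow the set of sources who mentioned it: step through the sources in order and yield the arrows to draw for each one. The generator name `play_sources` and the data are hypothetical.

```python
def play_sources(mentions):
    """Yield (source, arrows mentioned by that source), one source at a time."""
    sources = sorted(set().union(*mentions.values()))
    for source in sources:
        arrows = sorted(link for link, who in mentions.items() if source in who)
        yield source, arrows


# Hypothetical data: three sources, three arrows.
mentions = {
    ("A", "B"): {"s1", "s2"},
    ("B", "C"): {"s1"},
    ("C", "Y"): {"s3"},
}

frames = list(play_sources(mentions))
for source, arrows in frames:
    print(source, arrows)
# s1 [('A', 'B'), ('B', 'C')]
# s2 [('A', 'B')]
# s3 [('C', 'Y')]
```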