Datasets - Descriptions - Tasks
See the history of updates for all the contest materials pages.
We arbitrarily chose 3 types of trees. We could have picked many more (e.g. shallow fixed depth trees, or small trees with 100s of attributes, attributes that vary as a function of depth) but we felt that even though it would have been more complete, it would have also diluted the contest as well as made judging harder and results more difficult to explain. In other words: there is enough to run the InfoVis contest for 10 years just on trees, but we had to start somewhere!
We separated general tasks and tasks specific to particular datasets. The specific tasks are often broad goal setting tasks, but could also be instantiations of the general tasks that highlight the special needs of the datasets.
1- See the list of general tasks, i.e. tasks commonly encountered while analyzing tree data: topology tasks, attribute based tasks, and comparison tasks.
2- See the specific tasks in each domain background below.
IMPORTANT: For most questions we do not want a detailed result list but an explanation (or illustration or demonstration) of how the visualization helped you find the answer (or not). For example when we say: "Which nodes have been deleted?" we do not need to see the list of nodes... but we want enough information to judge how the tool helped you see what was deleted.
The simple XML format is specified in treeml.dtd.
We provide a small sample tree, and of course you can look at the datasets themselves as examples.
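As a quick sanity check before loading a dataset into a tool, the sketch below streams through a TreeML document and counts its nodes. It assumes the element names `tree`, `branch`, `leaf` and `attribute` (with `name` and `value` XML attributes); check these against treeml.dtd before relying on them. The inline `SAMPLE` document is made up for illustration and is not the sample tree provided with the contest.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical TreeML-like document; element names are assumptions
# to be verified against treeml.dtd.
SAMPLE = """<tree>
  <declarations>
    <attributeDecl name="name" type="String"/>
  </declarations>
  <branch>
    <attribute name="name" value="root"/>
    <branch>
      <attribute name="name" value="a"/>
      <leaf><attribute name="name" value="a1"/></leaf>
      <leaf><attribute name="name" value="a2"/></leaf>
    </branch>
    <leaf><attribute name="name" value="b"/></leaf>
  </branch>
</tree>"""


def count_nodes(source):
    """Count branch and leaf elements without keeping the whole tree in memory.

    `source` may be a file name or a binary file object.
    """
    counts = {"branch": 0, "leaf": 0}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in counts:
            counts[elem.tag] += 1
            # The element is fully parsed at its "end" event, so its
            # children can be released; this matters for the large files.
            elem.clear()
    return counts


counts = count_nodes(io.BytesIO(SAMPLE.encode("utf-8")))
print(counts)  # {'branch': 2, 'leaf': 3}
```

To run it on a contest file, pass the file name instead of the in-memory buffer, e.g. `count_nodes("path/to/file.xml")`.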
Download all data files at once: iv03contest_data(+date created as year-month-day).zip (about 7 MB).
PHYLOGENIES
Specific tree characteristics
The trees are small binary trees (60 leaf nodes). Link length is often considered important by researchers using this data. No attributes. On the other hand the analysis can be very complex, and there are no good interactive tools available today for scientists to make hypotheses about the matching of those trees.
Application Domain Background and Specific Tasks
Datasets
phylo_A_ABC(+date).xml (about 15 KB)
phylo_B_IM(+date).xml
CLASSIFICATIONS
Specific tree characteristics
The trees are very large (about 200,000 leaf nodes) with large fanouts. There are only three attributes, all nominal. Labelling, search and showing results in context are important.
Application Domain Background and Specific Tasks
Datasets
classif_A(+date).xml (about 40 MB)
classif_B(+date).xml
NOTE: If you really have to use a subset of the tree because you cannot handle so many nodes, work on the "mammal" subtree.
FILE SYSTEM AND USAGE LOGS
Specific tree characteristics
The trees are large (about 70,000 leaf nodes). Here we have more attributes available, numerical and nominal. Changes between the two trees can be topological changes and attribute value changes. Each file corresponds to a given period during which the usage logs were collected. We provide more than two trees but the focus of the contest remains on pair comparisons so pick the pairs you want.
Application Domain Background and Specific Tasks
Datasets
logs_A(+dateposted).xml Period A ending 1-19 (about 20 MB)
logs_B(+dateposted).xml Period B ending 1-25
logs_C(+dateposted).xml Period C ending 2-1
logs_D(+dateposted).xml Period D ending 2-8
NOTE: If you really have to use a subset of the tree because you cannot handle so many nodes, work on the "HCIL" subtree (i.e. everything under /projects/hcil).
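If you do restrict yourself to a subtree, a pair comparison can start from the set of node paths in each period. The sketch below is one possible way to do that, not part of the contest materials: it assumes each node carries an `attribute` child whose `name` is "name", and that joining those names with "/" reproduces paths such as /projects/hcil. The helpers `leaf` and `sample` only build tiny made-up documents for the demonstration.

```python
import xml.etree.ElementTree as ET


def node_name(elem):
    """Value of the node's 'name' attribute, or '' if it has none (assumed convention)."""
    for attr in elem.findall("attribute"):
        if attr.get("name") == "name":
            return attr.get("value", "")
    return ""


def collect_paths(elem, prefix=""):
    """Set of slash-separated paths for elem and all its descendants."""
    path = prefix + "/" + node_name(elem)
    paths = {path}
    for child in elem:
        if child.tag in ("branch", "leaf"):
            paths |= collect_paths(child, path)
    return paths


def paths_of(xml_text):
    """All node paths in a TreeML-like document given as a string."""
    paths = set()
    for top in ET.fromstring(xml_text):
        if top.tag in ("branch", "leaf"):
            paths |= collect_paths(top)
    return paths


def under(paths, subtree):
    """Keep only the subtree root and the paths below it."""
    return {p for p in paths if p == subtree or p.startswith(subtree + "/")}


# Hypothetical miniature data standing in for two log periods.
def leaf(name):
    return f'<leaf><attribute name="name" value="{name}"/></leaf>'


def sample(hcil_files):
    files = "".join(leaf(n) for n in hcil_files)
    return (
        '<tree><branch><attribute name="name" value="projects"/>'
        '<branch><attribute name="name" value="hcil"/>' + files + "</branch>"
        + leaf("other") + "</branch></tree>"
    )


period_a = under(paths_of(sample(["index.html", "old.html"])), "/projects/hcil")
period_b = under(paths_of(sample(["index.html", "new.html"])), "/projects/hcil")

deleted = sorted(period_a - period_b)  # present in A, missing in B
added = sorted(period_b - period_a)    # present in B, missing in A
print("deleted:", deleted)  # deleted: ['/projects/hcil/old.html']
print("added:", added)      # added: ['/projects/hcil/new.html']
```

This only captures topological changes by path; attribute value changes would need a per-node comparison of the remaining attributes, and a moved or renamed node shows up here as one deletion plus one addition.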