Dijkstra's algorithm

This one just keeps coming back to haunt you, doesn't it? As the mastermind of this generation of projects once said, ``you should not be awarded a diploma unless you can adequately explain how Dijkstra's works.'' Just Google it. A good resource for Dijkstra's algorithm for Java is here:

renaud.waldura.com/doc/java/dijkstra

Some students may have correctly implemented an efficient shortest/fastest path algorithm in the past. Others have implemented it, but have not made it efficient. Therefore, in order to support shortest path calculations on large data sets within a reasonable amount of time, you will be required to run your shortest path algorithm in $ O(E \log V)$ time. In order to achieve this, you may need to restructure parts of your adjacency list. Note that, in general, insertion/deletion from this structure is $ O(\log n + \log m)$, where $ n$ is the number of nodes in the graph and $ m$ is the degree of the graph (the maximum number of edges incident to a single vertex). This represents a binary search to find the correct row of the list for the starting vertex, followed by a binary search to check for existence of the ending vertex.
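For illustration only, here is one way such a structure could be sketched in Java using nested sorted maps; the class and method names are hypothetical, not part of the required design:

\begin{verbatim}
import java.util.TreeMap;

// Sketch of an adjacency structure with O(log n + log m) insertion and
// lookup, assuming vertices are identified by String names.
public class AdjacencySketch {
    // Outer map: binary search over the n vertex "rows".
    // Inner map: binary search over the (at most m) neighbors of one vertex.
    private final TreeMap<String, TreeMap<String, Double>> adj = new TreeMap<>();

    public void addVertex(String v) {
        adj.putIfAbsent(v, new TreeMap<>());          // O(log n)
    }

    public void addEdge(String from, String to, double weight) {
        addVertex(from);
        addVertex(to);
        adj.get(from).put(to, weight);                // O(log n + log m)
    }

    public Double getWeight(String from, String to) {
        TreeMap<String, Double> row = adj.get(from);  // O(log n)
        return (row == null) ? null : row.get(to);    // O(log m)
    }
}
\end{verbatim}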

Recall that, in general, graphs are traversed in depth first order by expanding a start node, PUSHing all of its children onto a stack, and choosing the next node to expand by POPping from the stack. Similarly, traversing a graph in breadth first order is performed by inserting the children of a node into a FIFO queue, and choosing the next node by dequeuing the first element. Furthermore, greedy algorithms attempt to produce a global minimum (or maximum) value by choosing the locally optimal value at every step, and not all greedy algorithms actually converge to the global optimum.
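As a reminder only, here is a minimal sketch of the two traversals in Java, assuming a graph stored as a map from each vertex to a list of its neighbors; none of these names are required by the spec:

\begin{verbatim}
import java.util.*;

// Sketch of DFS (stack) versus BFS (FIFO queue) traversal.
public class TraversalSketch {
    public static List<String> dfs(Map<String, List<String>> graph, String start) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String v = stack.pop();                   // POP: most recently PUSHed node
            if (!seen.add(v)) continue;               // skip nodes already expanded
            order.add(v);
            for (String w : graph.getOrDefault(v, Collections.emptyList()))
                stack.push(w);
        }
        return order;
    }

    public static List<String> bfs(Map<String, List<String>> graph, String start) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.addLast(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            String v = queue.removeFirst();           // dequeue the oldest element
            order.add(v);
            for (String w : graph.getOrDefault(v, Collections.emptyList()))
                if (seen.add(w)) queue.addLast(w);    // enqueue unseen children
        }
        return order;
    }
}
\end{verbatim}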

Fortunately, Dijkstra's algorithm is a convergent greedy algorithm which finds minimum length (cost, or weight, as appropriate) paths from a given start vertex to all possible destination vertices for graphs with positive edge weights. Since the algorithm converges, if one is only interested in the minimum cost to a single destination, it may be possible to stop the algorithm early. However, it will be possible to pass the stress test without making this type of optimization. Note that Dijkstra's algorithm visits or expands vertices (our loci) in priority order, where the priority for our project is the path weight.

To implement an efficient Dijkstra's algorithm you will need a priority queue, which is provided in Java version 1.5 and later (java.util.PriorityQueue). Or, in the absence of Java 1.5, one can be built through clever use of a TreeSet/Map and Comparators to implement a binary heap (the mere heap of CMSC 351); the problem with a plain set is that you can't have two elements with the same priority, so you need a way to always break ties.
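As a sketch only (the Entry class and its field names here are hypothetical), the tie-breaking idea with a TreeSet, and the simpler java.util.PriorityQueue alternative, look like this:

\begin{verbatim}
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.TreeSet;

// Sketch: two ways to get a priority queue of (vertex, distance) entries.
public class QueueSketch {
    static class Entry {
        final String vertex;
        final double dist;
        Entry(String vertex, double dist) { this.vertex = vertex; this.dist = dist; }
    }

    // Orders by distance, breaking ties by vertex name, so a TreeSet never
    // treats two distinct entries with equal priority as "the same" element.
    static final Comparator<Entry> BY_DIST_THEN_NAME =
        Comparator.comparingDouble((Entry e) -> e.dist)
                  .thenComparing(e -> e.vertex);

    public static void main(String[] args) {
        TreeSet<Entry> treeQueue = new TreeSet<>(BY_DIST_THEN_NAME);
        PriorityQueue<Entry> heapQueue =
            new PriorityQueue<>(Comparator.comparingDouble((Entry e) -> e.dist));

        treeQueue.add(new Entry("A", 2.0));
        treeQueue.add(new Entry("B", 2.0));   // same priority, kept because of the tie-break
        heapQueue.add(new Entry("A", 2.0));
        heapQueue.add(new Entry("B", 2.0));   // duplicates are fine in a heap

        System.out.println(treeQueue.pollFirst().vertex);  // "A"
        System.out.println(heapQueue.poll().vertex);       // "A" or "B"
    }
}
\end{verbatim}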

In Part 2, you'll use the graph to implement Dijkstra's algorithm to find the shortest path from a given start vertex to all other vertices reachable from it (often referred to as a single-source all-destination algorithm). The core of the algorithm uses a priority queue to supply vertices in increasing order of distance from the vertices that have already been processed. Since the algorithm works for directed graphs and in our project we will use undirected graphs, you can simply treat each undirected edge as a pair of directed edges, one in each direction.
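Concretely, that just means each undirected edge appears twice in your adjacency structure, as in this minimal sketch (hypothetical names, not the required design):

\begin{verbatim}
import java.util.HashMap;
import java.util.Map;

// Sketch: storing one undirected edge as two directed entries, so a
// directed shortest-path algorithm can be run unchanged.
public class UndirectedEdgeSketch {
    private final Map<String, Map<String, Double>> adj = new HashMap<>();

    public void addUndirectedEdge(String u, String v, double weight) {
        adj.computeIfAbsent(u, k -> new HashMap<>()).put(v, weight);  // u -> v
        adj.computeIfAbsent(v, k -> new HashMap<>()).put(u, weight);  // v -> u
    }
}
\end{verbatim}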

The execution time of the algorithm depends on the method used to implement the priority queue, as discussed briefly in the excerpt from a prior spec. Using the binary heap, the expected runtime of Dijkstra's is $ O((V+E)\log V)$, where $ V$ is the number of vertices and $ E$ is the number of edges. If we instead use a Fibonacci heap, which should appear in Part 3, the expected runtime will be (amortized) $ O(V \log V + E)$. It would be nice if you understand where this bound comes from. We've invited Kevin Conroy, 2004 Dorfman Research Prize winner, to explain:

Allow me to sketch the algorithm to explain the running time (I'm not trying to teach the algorithm here--see your book/class/newsgroup/Google). Every iteration of this algorithm you are guaranteed to find the correct shortest path to exactly one node--so we know right away there will be $ V$ iterations. At each stage your graph is split into two sections--the `solved' section, for which you know the correct distances, and the rest, which have some distance values associated with them which may or may not be accurate--this second set is stored in some kind of priority queue.

An iteration begins by selecting or dequeuing the first node (call it `$ N$') from this queue. This node is the closest to the solved set, since elements are removed from the priority queue in increasing order of distance from the solved set. We add the new node to the `solved' set, and then `relax' all its edges. Relaxing means that for each node adjacent to $ N$ we check to see whether it is faster to get to that node through $ N$ than through its current best known path (prior to the inclusion of $ N$ in the solved set). We update distances and back-pointers appropriately (so we know what the shortest path actually is when we finish), and that ends the round.

Note that if a node's distance value is changed, its position in the priority queue has to be fixed up somehow (this is where the magical Fibonacci heap beats the binary heap--it's got an advantage in this `fix up' step). Regardless of the choice of heap, an efficient way to update the priority queue to reflect the new `solved' set is to merely insert a new element into the queue with the new distance, giving multiple copies associated with the same vertex; then, if a node at the head of the queue has already been added to the `solved' set, you merely discard it and get the next node (this is the approach I will use in the explanation). The alternative is to remove the old value before reinserting it. Either works, but the binary heap implementation makes the second method difficult.
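To make that sketch concrete, here is a minimal, self-contained Java version of the loop using the multiple-copies (lazy deletion) approach with java.util.PriorityQueue. The graph representation and all names below are illustrative assumptions, not the required design; adapt them to your own classes.

\begin{verbatim}
import java.util.*;

// Sketch of Dijkstra's algorithm with lazy deletion: when a distance improves,
// a new (vertex, distance) entry is inserted and stale entries are simply
// skipped when they reach the head of the queue.
public class DijkstraSketch {
    static class Entry {
        final String vertex;
        final double dist;
        Entry(String vertex, double dist) { this.vertex = vertex; this.dist = dist; }
    }

    // graph: vertex -> (neighbor -> edge weight)
    public static Map<String, Double> shortestPaths(
            Map<String, Map<String, Double>> graph, String source) {
        Map<String, Double> dist = new HashMap<>();     // best known distances
        Map<String, String> prev = new HashMap<>();     // back-pointers (the paths)
        Set<String> solved = new HashSet<>();           // the 'solved' set
        PriorityQueue<Entry> queue =
            new PriorityQueue<>(Comparator.comparingDouble((Entry e) -> e.dist));

        dist.put(source, 0.0);
        queue.add(new Entry(source, 0.0));

        while (!queue.isEmpty()) {
            Entry head = queue.poll();
            if (!solved.add(head.vertex)) continue;     // stale copy: discard it
            // (For a single destination, you could break here once head.vertex
            //  equals that destination; the spec does not require this.)

            // Relax every edge leaving the newly solved vertex.
            Map<String, Double> edges =
                graph.getOrDefault(head.vertex, Collections.<String, Double>emptyMap());
            for (Map.Entry<String, Double> edge : edges.entrySet()) {
                String w = edge.getKey();
                if (solved.contains(w)) continue;
                double candidate = head.dist + edge.getValue();
                Double known = dist.get(w);
                if (known == null || candidate < known) {
                    dist.put(w, candidate);
                    prev.put(w, head.vertex);           // remember the path taken
                    queue.add(new Entry(w, candidate)); // new copy; old one goes stale
                }
            }
        }
        return dist;
    }
}
\end{verbatim}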

Now, how long does all this take? There are $ V$ rounds, and in every round we have to pull something off the front of the priority queue; retrieving the front element is in $ \Theta(1)$ using the binary heap, and amortized $ \Theta(1)$ using the F-heap. Rather than try to deal with how many elements are added to and removed from the queue in any single round, it is easier to think about how many such operations can occur over the life of the algorithm. Every single edge in the graph has exactly one opportunity to add a vertex to the queue (during a relax operation), so there are $ O(E)$ possible insertions.

If we allow multiple entries in the queue corresponding to a single vertex (created by inserting a new element into the queue whenever the path length from an as-yet-unknown vertex to the known set changes), then the size of the queue still grows only to $ O(E)$. Each addition to the queue takes time in $ O(\log E)$. That gives $ O(E \log E)$ running time for all queue operations during the life of the algorithm. Together that gives a running time of $ O((V + E) \log E)$, and that's the required running time of your algorithm with the binary heap for Part 1.
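Spelling out the arithmetic behind that bound (with the same $ V$ and $ E$ as above, and noting that every entry inserted into the queue is also removed or discarded at most once):

\[
\underbrace{O(E \log E)}_{\text{insertions}} \;+\; \underbrace{O(E \log E)}_{\text{removals/discards}} \;+\; \underbrace{O(V)}_{\text{rounds}} \;=\; O\bigl((V+E)\log E\bigr).
\]

Since $ E \le V^{2}$ implies $ \log E \le 2 \log V$, this is the same asymptotic class as the $ O((V+E)\log V)$ bound quoted earlier.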

As we shall see later in lecture, the Fibonacci heap improves priority queue performance for more complex graphs than those you will see in this project. But the big win is in merging queues, since merging binary heaps is in $ O(n \log n)$. However, we do not anticipate requiring you to implement a Fibonacci heap this semester; we just wanted to make sure that you knew that other structures exist for building priority queues. With the Fibonacci heap, the running time is (amortized) $ \Theta(E)$ for all queue insertions and decrease-key operations during the life of the algorithm. Together that gives a running time of $ O(V \log E + E)$, which would be the required running time if we were to replace the Java priority queue with one based on a Fibonacci heap.

Please be careful with your implementation details. In particular, do not use a linear-time priority queue if you choose to ignore the API. This nailed a lot of people last semester. We will test your project on very large inputs, and it will be clear whether your shortest path runs in $ O((V + E) \log E)$ or relies on a linear-time priority queue. If you choose to implement a Fibonacci heap or a leftist heap because you are bored with Part 1, be forewarned. As will be discussed in class, the downside of the f-heap, relative to this course, is that the f-heap is defined in a way that optimizes the f-heap operations for a manage-your-own-memory language. For example, the f-heap is often defined to be a forest of binomial heaps implemented as a circular linked list. Since the linked list is not optimal for Java, you are welcome to explore the implementation of your choice in Java, as long as the resulting structure satisfies the asymptotic performance bounds of the f-heap. One way is to use an ArrayList of ArrayLists instead of linked lists. Regardless, we'll not be checking the implementation of your priority queue, merely the logarithmic asymptotic order of complexity. So, using the priority queue from the API seems to be the most appropriate choice. Consider this just food for thought.

MM Hugue 2017-10-12