Quick look at popular search algorithms

Search Algorithms

If you think that algorithms were a dry subject you learnt way back in college, think again. Search algorithms are still an area where path-breaking research is on. In this article, we will go through some of the most common search algorithms available in the marketplace.

Simple searching algorithms

List search algorithms such as Binary Search algorithms are still in vague, as many C/C++ programmers resort this for array based data mining. There have been many improvements to the original techniques. A binary search algorithm (or binary chop) is a technique for finding a particular value in a linear array, by "ruling out" half of the data at each step.

A binary search finds the median, makes a comparison to determine whether the desired value comes before or after it, and then searches the remaining half in the same manner. Refer code 1.

Code 1

 function binarySearch(a, value, left, right)
    while left ≤ right
        mid := floor((left+right)/2)
        if a[mid] = value
            return mid
        if value < a[mid]
            right := mid-1
            left  := mid+1
    return not found

The simplest search algorithms are the depth first and breadth first algorithms. A breadth-first search (BFS) is a tree search algorithm used for traversing or searching a tree, tree structure or graph.

Check out the psuedo code of breadth-first search (code 2).

Code 2

function breadthFirstSearch (Start, Goal) { 
    while notEmpty(Queue) {
        Node := dequeue(Queue)
        if Node = Goal {
            return Node /*the code below does not get executed*/
        for each Child in Expand(Node) {
            if notVisited(Child) {
                enqueue(Queue, Child)

Depth-first search (DFS) is an algorithm for traversing or searching a tree, tree structure or graph. Intuitively, one starts at the root (selecting some node as the root in the graph case) and explores as far as possible along each branch before backtracking. The psuedo code of depth-first search is as given in code 3.

Code 3

def depthFirstSearch( start, goal ):
    stack = Stack()
    stack.push( start )
    while not stack.empty():
        node = stack.top()
        if node == goal:
            return stack # stack contains path to solution
            child = node.findUnvisitedChild()
            if child == none:
                stack.push( child )

Depth-first and breadth-first searches both have some advantages. Which is best depends on the properties of the problem you are solving. For tree search at least, depth-first search tends to require less memory, as you only need to record nodes on the `current' path. If there are lots of solutions, but all at a comparable ‘depth’ in the tree, then you may hit on a solution having only explored a very small part of the tree. On the other hand, that may not be the best solution. Also, depth-first search may get stuck exploring long (and potentially infinite) blind alleys, when there is in fact a solution path of only one or two steps (We could prevent this by setting a depth limit to our search, but then it wouldn't be exhaustive). So, depth-first is good when there are many possible solutions and you only want one (and you don't care which one). It may be less appropriate when there is only one solution, or if you want the shortest one.

Breadth-first search may use more memory, but will not get stuck in blind alleys and will always find the shortest path first (or at least, the path that involves the least number of steps). This may be ideal when exploring very large search spaces where there is an expected solution that takes a relatively small number of steps or when you are interested in all the solutions (perhaps up to some depth limit).

Best-first is an algorithm that optimizes depth-first search by expanding the most promising node chosen according to some rule.

Judea Pearl (1984) described best-first search as estimating the promise of node n by a "heuristic evaluation function f(n) which, in general, may depend on the description of n, description of the goal, information gathered by the search up to that point and, most important, on any extra knowledge about the problem domain."

Another derivative A* (pronounced “A star”) is a graph search algorithm that finds a path from a given initial node to a given goal node (or one passing a given goal test). It employs a “heuristic estimate” that ranks each node by an estimate of the best route that goes through that node. It visits the nodes in order of this heuristic estimate. The A* algorithm is therefore an example of best-first search.

A* begins at a selected node. Applied to this node is the "cost" of entering this node (usually zero for the initial node). A* then estimates the distance to the goal node from the current node. This estimate and the cost added together are the heuristic, which is assigned to the path leading to this node. The node is then added to a priority queue, often called "open".

The algorithm then removes the next node from the priority queue (because of the way a priority queue works, the node removed will have the lowest heuristic). If the queue is empty, there is no path from the initial node to the goal node and the algorithm stops. If the node is the goal node, A* reconstructs and outputs the successful path and stops. This path reconstruction from the stored closed nodes means it is not necessary to store the path-so-far within each node.

If the node is not the goal node, new nodes are created for all admissible adjoining nodes; the exact way of doing this depends on the problem at hand. For each successive node, A* calculates the "cost" of entering the node and saves it with the node. This cost is calculated from the cumulative sum of costs stored with its ancestors, plus the cost of the operation that reached this new node.

The algorithm also maintains a "closed" list of nodes that have been checked. If a newly generated node is already in this list with an equal or lower cost, no further processing is done on that node or on the path associated with it. If a node in the closed list matches the new one, but has been stored with a higher cost, it is removed from the closed list and processing continues on the new node.

Next, an estimate of the new node's distance to the goal is added to the cost to form the heuristic for that node. This is then added to the "open" priority queue, unless an identical node with lesser or equal heuristic is found there.

Once the above three steps have been repeated for each new adjoining node, the original node taken from the priority queue is added to the "closed" list. The next node is then popped from the priority queue and the process is repeated.

Adversarial search

In games such as chess, there is a game tree of all possible moves by both players and the resulting board configurations, and we can search this tree to find an effective playing strategy. This type of problem has the unique characteristic that we must account for any possible move our opponent might make. To account for this, game playing computer programs, as well as other forms of artificial intelligence like machine planning, often use search algorithms like the minimax algorithm, search tree pruning and alpha-beta pruning.

Minimax theory has been extended to decisions where there is no other player, but where the consequences of decisions depend on unknown facts. For example, deciding to prospect for minerals entails a cost that will be wasted if the minerals are not present, but will bring major rewards if they are. One approach is to treat this as a game against Nature and, using a similar mindset as Murphy's Law, take an approach that minimizes the maximum expected loss, using the same techniques as in the two-person zero-sum games.

No free lunches

The no-free-lunch theorem (NFLT) is a theorem in the field of combinatorial optimization developed by physicists David H. Wolpert and William G. Macready. It states that all algorithms that search for an extremum of a cost function perform exactly the same when averaged over all possible cost functions." Taking its name from the phrase "there ain't no such thing as a free lunch", this theorem explains why, over the set of all mathematically possible problems, each search algorithm will do on average as well as any other. This is due to the bias in each search algorithm, because sometimes the assumptions that the algorithm makes are not the correct ones.

The theorem is used as an argument against using generic searching algorithms such as genetic algorithms and simulated annealing without using as much domain knowledge as possible. It's also been applied to other generic problems.