Simpler algorithms that every developer needs to know

In this article we take a look at some of the simpler algorithms that every developer needs to know. Most of these algorithms are ideas that you learn in college. However, remembering them is of tremendous use as you grow in your career. Some of them might even come up in the next job interview you attend.

1) The Fibonacci Series

Generating the Fibonacci series is one of the simplest exercises you come across when learning any new language. After you have gone through the first few steps of getting used to the syntax and other quirks of a language, try writing a Fibonacci series program. Simple as it might sound, it is also important to find an optimal solution in the language of your choice.

Problem

The Fibonacci series is used in many practical areas of study, from architecture to anthropology. It was first noted in the West by Leonardo of Pisa, a noted mathematician of the Middle Ages, though it was later found that Indian mathematicians had studied such series many centuries before Leonardo described it. The series produces a sequence that was originally used to study the growth of populations of species such as rabbits, bees and even cattle.

In mathematics, the Fibonacci numbers form a sequence defined by the recurrence relation fib(n) = fib(n - 1) + fib(n - 2), with fib(0) = 0 and fib(1) = 1.

The numbers form the Fibonacci series: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811…

The code to generate this series is very easy to write. A concise approach uses recursion to generate the series or to find a particular member of it:

fib( n ) = fib( n - 1 ) + fib( n - 2 )

If your programming language supports generators, using one to produce members of the series is an elegant way to solve the problem. Here is a piece of Java code that generates the series using recursion.

public class FibonacciSeries {

    public static void main(String[] args) {
        FibonacciSeries fibonacciSeries = new FibonacciSeries();
        // Print the first ten members of the series.
        for (int i = 1; i < 11; i++) {
            System.out.print(fibonacciSeries.fibonacciSeries(i) + ", ");
        }
    }

    // Returns the n-th Fibonacci number; the first two members are both 1.
    private int fibonacciSeries(int n) {
        return n == 1 ? 1 : (n == 2 ? 1
                : fibonacciSeries(n - 1) + fibonacciSeries(n - 2));
    }
}
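
The recursion above is elegant but recomputes the same members many times, so it runs in exponential time. As a rough alternative sketch, an iterative loop computes the n-th member in linear time (the class and method names here are purely illustrative):

// A minimal iterative sketch: computes the n-th Fibonacci number in O(n) time
// without the repeated work of the naive recursion.
public class IterativeFibonacci {

    static long fibonacci(int n) {
        long previous = 0, current = 1;      // fib(0) = 0, fib(1) = 1
        for (int i = 1; i < n; i++) {
            long next = previous + current;  // fib(i + 1) = fib(i) + fib(i - 1)
            previous = current;
            current = next;
        }
        return n == 0 ? 0 : current;
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 10; i++) {
            System.out.print(fibonacci(i) + ", ");
        }
    }
}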

2) Generating prime numbers

While generating prime numbers is largely an academic exercise, it is worth remembering how it is done. The algorithm was devised by Eratosthenes, a Greek mathematician, more than two millennia ago. The method is known as the Sieve of Eratosthenes.

Problem: Generate a series of prime numbers or check whether a number is prime or not.

Solution:

  1. Write a list of numbers from 2 to the largest number you want to test for primality. Call this List A.
  2. Write the number 2, the first prime number, in another list for primes found. Call this List B.
  3. Strike off 2 and all multiples of 2 from List A.
  4. The first remaining number in the list is a prime number. Write this number into List B.
  5. Strike off this number and all multiples of this number from List A. The crossing-off of multiples can be started at the square of the number, as lower multiples have already been crossed out in previous steps.
  6. Repeat steps 4 and 5 until no more numbers are left in List A.

Though a faster algorithm, the Sieve of Atkin, which uses modular arithmetic and quadratic forms, has since been devised to improve on the Sieve of Eratosthenes, the ancient algorithm is good enough unless you are working on a high-performance, math-intensive project. You can check www.developeriq.com/snippets to find code for generating prime numbers.
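
For reference, here is a minimal Java sketch of the sieve described above (the class and method names are just illustrative); it uses a boolean array in place of List A and collects the primes it finds into a list that plays the role of List B.

import java.util.ArrayList;
import java.util.List;

public class Sieve {

    // Returns all primes up to and including 'limit' using the Sieve of Eratosthenes.
    static List<Integer> primesUpTo(int limit) {
        boolean[] struckOff = new boolean[limit + 1];   // plays the role of List A
        List<Integer> primes = new ArrayList<>();       // plays the role of List B
        for (int n = 2; n <= limit; n++) {
            if (struckOff[n]) continue;                 // already a multiple of a smaller prime
            primes.add(n);                              // the first remaining number is prime
            for (long m = (long) n * n; m <= limit; m += n) {
                struckOff[(int) m] = true;              // strike off multiples, starting at n*n
            }
        }
        return primes;
    }

    public static void main(String[] args) {
        System.out.println(primesUpTo(50));             // [2, 3, 5, 7, 11, 13, ..., 47]
    }
}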

3) Linear Search Algorithm

These days you rarely write search algorithms yourself. Almost all modern languages come with built-in utilities that take care of searching, and many have packages or modules dedicated to it.

The simplest of Search Algorithms is Linear Search algorithm. It operates by checking every element of a list one at a time in sequence until a match is found. Linear search runs in O(N). If the data are distributed randomly, on average N/2 comparisons will be needed. The best case is that the value is equal to the first element tested, in which case only 1 comparison is needed. The worst case is that the value is not in the list (or is the last item in the list), in which case N comparisons are needed.

Problem: Search for an item in a list

Solution: The pseudocode is self-explanatory.

For each item in the list:

    Check to see if the item you're looking for matches the item in the list.

        If it matches:
            Return the location where you found it (the index).

        If it does not match:
            Continue searching until you reach the end of the list.

If we get here, we know the item does not exist in the list. Return -1.

The linear search algorithm is rarely used directly in practice these days, but you must still know this basic algorithm.
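
For completeness, here is a minimal Java sketch of the pseudocode above (names are illustrative); in production code, a library call such as Java's List.indexOf does the same job.

public class LinearSearch {

    // Returns the index of 'target' in 'items', or -1 if it is not present.
    static int linearSearch(int[] items, int target) {
        for (int i = 0; i < items.length; i++) {
            if (items[i] == target) {
                return i;          // found a match: return its location
            }
        }
        return -1;                 // reached the end of the list without a match
    }

    public static void main(String[] args) {
        int[] items = {7, 3, 9, 42, 5};
        System.out.println(linearSearch(items, 42));   // 3
        System.out.println(linearSearch(items, 8));    // -1
    }
}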

4) Binary Search

A binary search algorithm (or binary chop) is a technique for finding a particular value in a sorted list. It makes progressively better guesses, closing in on the sought value. A binary search finds the median element of the list, compares its value to the one you are searching for, and determines whether it is greater than, less than, or equal to the one you want. A guess that turns out to be too high becomes the new top of the search range, and one that is too low becomes the new bottom. The next guess is halfway between the new top and bottom. Pursued iteratively, this strategy narrows the search by a factor of 2 each time until it finds your value. A binary search is an example of a divide and conquer algorithm (more specifically, a decrease and conquer algorithm) and a dichotomic search.

Problem: Search for an item in a list

Solution: You repeatedly divide the list in half until you reach the value you are searching for. The pseudocode is self-explanatory. You will find more code on the same at www.developeriq.com/snippets.

BinarySearch(A[0..N-1], value, low, high) {
    if (high < low)
        return not_found
    mid = (low + high) / 2
    if (A[mid] > value)
        return BinarySearch(A, value, low, mid - 1)
    else if (A[mid] < value)
        return BinarySearch(A, value, mid + 1, high)
    else
        return mid
}
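
Here is a Java sketch of the same idea, written iteratively rather than recursively (names are illustrative); Java's own Arrays.binarySearch does the same job in production code.

import java.util.Arrays;

public class BinarySearch {

    // Returns the index of 'value' in the sorted array 'a', or -1 if it is absent.
    static int binarySearch(int[] a, int value) {
        int low = 0, high = a.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;   // avoids overflow of (low + high)
            if (a[mid] > value) {
                high = mid - 1;                 // search the lower half
            } else if (a[mid] < value) {
                low = mid + 1;                  // search the upper half
            } else {
                return mid;                     // found it
            }
        }
        return -1;                              // not found
    }

    public static void main(String[] args) {
        int[] a = {2, 3, 5, 8, 13, 21, 34};
        System.out.println(binarySearch(a, 13));           // 4
        System.out.println(Arrays.binarySearch(a, 13));    // library equivalent
    }
}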

5) Hash Searching and Sorting

A hash table, or a hash map or a dictionary is a data structure that associates keys with values. The primary operation it supports efficiently is a lookup: given a key (e.g. a person's name), find the corresponding value (e.g. that person's telephone number). It works by transforming the key using a hash function into a hash, a number that is used to index into an array to locate the desired location ("bucket") where the values should be.

Hash tables support the efficient addition of new entries, and the time spent searching for the required data is independent of the number of items stored (i.e. O(1).)

A good hash function is essential for good hash table performance. A poor choice of hash function is likely to lead to clustering, in which the probability of keys mapping to the same hash bucket (i.e. a collision) is significantly greater than would be expected from a random function. A nonzero probability of collisions is inevitable in any hash implementation, but usually the number of operations needed to resolve collisions scales linearly with the number of keys mapping to the same bucket, so excess collisions will degrade performance significantly. In addition, some hash functions are computationally expensive, so the amount of time (and, in some cases, memory) taken to compute the hash may be burdensome.

Since a hash is essentially a collection of key-value pairs, searching or sorting has to be based on either the key or the value. There is no single hashing algorithm that is fastest or most efficient in all cases.

Problem: Search or Sort Hash Table

Solution:

A lookup transforms the key, via the hash function, into a bucket index and examines that bucket directly, which is why it takes constant time on average. Because a hash table maintains no ordering of its own, sorting is done by extracting the keys (or the values) into an ordered collection and sorting that.
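
In practice you rarely implement a hash table yourself. The sketch below uses Java's built-in HashMap to show both the constant-time lookup and how sorting is done by copying the entries into an ordered collection (the names and numbers are made up for illustration).

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class HashLookup {

    public static void main(String[] args) {
        Map<String, String> phoneBook = new HashMap<>();
        phoneBook.put("Alice", "555-0101");
        phoneBook.put("Bob", "555-0102");
        phoneBook.put("Carol", "555-0103");

        // Lookup: hash the key and go straight to its bucket -- O(1) on average.
        System.out.println(phoneBook.get("Bob"));        // 555-0102

        // "Sorting" a hash table: copy the entries into an ordered structure,
        // here a TreeMap that keeps its keys in sorted order.
        Map<String, String> sorted = new TreeMap<>(phoneBook);
        System.out.println(sorted);                      // {Alice=..., Bob=..., Carol=...}
    }
}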

6) Binary Tree

A binary tree is a tree data structure in which each node has at most two children. The binary tree is useful in building relationship diagrams which can be implemented through graphs. 

Problem: Searching a binary tree.

Solution: There are two common methods: depth-first search and breadth-first search. Depth-first search can be illustrated with the following piece of pseudocode. The code uses recursion.

dfs(v)
    process(v)
    mark v as visited
    for all vertices i adjacent to v not visited
        dfs(i)
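
Here is a small Java sketch of that recursive depth-first search on an adjacency-list graph (the example graph is made up for illustration).

import java.util.List;

public class DepthFirstSearch {

    static void dfs(List<List<Integer>> graph, boolean[] visited, int v) {
        System.out.println("Visiting " + v);     // process(v)
        visited[v] = true;                       // mark v as visited
        for (int i : graph.get(v)) {             // all vertices adjacent to v
            if (!visited[i]) {
                dfs(graph, visited, i);          // recurse into unvisited neighbours
            }
        }
    }

    public static void main(String[] args) {
        // 0 -> 1, 2;  1 -> 3;  2 -> 3;  3 -> (none)
        List<List<Integer>> graph = List.of(
                List.of(1, 2), List.of(3), List.of(3), List.of());
        dfs(graph, new boolean[graph.size()], 0);
    }
}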

The second option is breadth-first search. Both algorithms have their own advantages and disadvantages, but breadth-first search is more uniform and logical, and is often called a sweeping algorithm. BFS is an uninformed search method that aims to expand and examine all nodes of a graph systematically in search of a solution. In other words, it exhaustively searches the entire graph without considering the goal until it finds it. It does not use a heuristic.

From the standpoint of the algorithm, all child nodes obtained by expanding a node are added to a FIFO queue. In typical implementations, nodes that have not yet been examined for their neighbors are placed in some container (such as a queue or linked list) called "open" and then once examined are placed in the container "closed".

Here is a C++ implementation of the algorithm.

#include <queue>
#include <vector>

// Returns true if 'end' is reachable from 'start' in the adjacency-list graph.
bool BFS(const std::vector<std::vector<int> >& graph, int start, int end) {

  std::queue<int> next;
  std::vector<int> parent(graph.size(), -2);  // -2 means "not visited yet".

  parent[start] = -1;                         // The start vertex has no parent.
  next.push(start);
  while (!next.empty()) {
    int u = next.front();
    next.pop();
    // Here is the point where you can examine the u-th vertex of the graph.

    // For example:
    if (u == end) return true;

    // Look through the neighbours of u.
    for (std::vector<int>::const_iterator j = graph[u].begin(); j != graph[u].end(); ++j) {

      int v = *j;
      if (parent[v] == -2) {
        // If v is unvisited, record its parent and enqueue it.
        parent[v] = u;
        next.push(v);
      }
    }
  }
  return false;
}

7) Travelling Salesman Problem

The Travelling Salesman problem is one of the most famous graph problems, and a number of solutions to it exist. A closely related problem is the shortest path problem: finding a path between two vertices such that the sum of the weights of its constituent edges is minimized. An example is finding the quickest way to get from one location to another on a road map; in this case, the vertices represent locations and the edges represent segments of road, weighted by the time needed to travel that segment.

There are several solutions to the shortest path problem. One of them is Dijkstra's algorithm, named after its discoverer, the Dutch computer scientist Edsger Dijkstra.

Problem: To find the shortest path between two vertices in a graph

Solution: The input of the algorithm consists of a weighted directed graph G and a source vertex s in G. We will denote by V the set of all vertices in the graph G. Each edge of the graph is an ordered pair of vertices (u,v) representing a connection from vertex u to vertex v. The set of all edges is denoted E. Weights of edges are given by a weight function w: E → [0, ∞); therefore w(u,v) is the cost of moving directly from vertex u to vertex v. The cost of an edge can be thought of as (a generalization of) the distance between those two vertices. The cost of a path between two vertices is the sum of the costs of the edges in that path. For a given pair of vertices s and t in V, the algorithm finds the path from s to t with lowest cost (i.e. the shortest path). It can also be used for finding the costs of shortest paths from a single vertex s to all other vertices in the graph.

The algorithm works by keeping, for each vertex v, the cost d[v] of the shortest path found so far between s and v. Initially, this value is 0 for the source vertex s (d[s]=0), and infinity for all other vertices, representing the fact that we do not know any path leading to those vertices (d[v]=∞ for every v in V, except s). When the algorithm finishes, d[v] will be the cost of the shortest path from s to v — or infinity, if no such path exists.

The algorithm maintains two sets of vertices S and Q. Set S contains all vertices for which we know that the value d[v] is already the cost of the shortest path and set Q contains all other vertices. Set S is initially empty, and in each step one vertex is moved from Q to S. This vertex is chosen as the vertex with lowest value of d[u]. When a vertex u is moved to S, the algorithm relaxes every outgoing edge (u,v). That is, for each neighbor v of u, the algorithm checks to see if it can improve on the shortest known path to v by first following the shortest path from the source to u, and then traversing the edge (u,v). If this new path is better, the algorithm updates d[v] with the new smaller value.

The pseudocode is self-explanatory.

function Dijkstra(Graph, source):
    for each vertex v in Graph:            // Initializations
        dist[v] := infinity                // Unknown distance from source to v
        previous[v] := undefined
    dist[source] := 0                      // Distance from source to source
    Q := copy(Graph)                       // Set of all unvisited vertices
    while Q is not empty:                  // The main loop
        u := extract_min(Q)                // Remove best vertex from the priority queue
        for each neighbor v of u:
            alt := dist[u] + length(u, v)
            if alt < dist[v]:              // Relax the edge (u, v)
                dist[v] := alt
                previous[v] := u
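
As a concrete sketch, here is one way the same algorithm might be written in Java, using a priority queue for extract_min and an adjacency list of weighted edges (the Edge class and the example graph are illustrative assumptions, not part of the original description):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class Dijkstra {

    static class Edge {
        final int to;
        final int weight;
        Edge(int to, int weight) { this.to = to; this.weight = weight; }
    }

    // Returns dist[v] = cost of the shortest path from 'source' to every vertex v.
    static int[] shortestPaths(List<List<Edge>> graph, int source) {
        int[] dist = new int[graph.size()];
        Arrays.fill(dist, Integer.MAX_VALUE);            // "infinity" for unknown distances
        dist[source] = 0;

        // Queue ordered by current best distance: plays the role of extract_min(Q).
        PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        queue.add(new int[] {source, 0});

        while (!queue.isEmpty()) {
            int[] top = queue.poll();
            int u = top[0];
            if (top[1] > dist[u]) continue;              // stale entry, already improved
            for (Edge e : graph.get(u)) {                // relax every outgoing edge (u, v)
                int alt = dist[u] + e.weight;
                if (alt < dist[e.to]) {
                    dist[e.to] = alt;
                    queue.add(new int[] {e.to, alt});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // A tiny weighted graph: 0 -> 1 (4), 0 -> 2 (1), 2 -> 1 (2), 1 -> 3 (1)
        List<List<Edge>> graph = new ArrayList<>();
        for (int i = 0; i < 4; i++) graph.add(new ArrayList<>());
        graph.get(0).add(new Edge(1, 4));
        graph.get(0).add(new Edge(2, 1));
        graph.get(2).add(new Edge(1, 2));
        graph.get(1).add(new Edge(3, 1));
        System.out.println(Arrays.toString(shortestPaths(graph, 0)));   // [0, 3, 1, 4]
    }
}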

8) String Searching Algorithm

String searches are useful in almost all applications. While you usually implement these with regular expressions these days, there exist more than a dozen algorithms that do effective string searching.

Problem: To search string for matches.

Solution: The Boyer-Moore algorithm is considered the most efficient string-matching algorithm in typical applications. A simplified version of it, or the entire algorithm, is often implemented in text editors for the "search" and "substitute" commands.

The algorithm scans the characters of the pattern from right to left, beginning with the rightmost one. In case of a mismatch (or a complete match of the whole pattern) it uses two precomputed functions to shift the window to the right. These two shift functions are called the good-suffix shift (also called the matching shift) and the bad-character shift (also called the occurrence shift). The code below is an example.

/* Assumed definitions for this sketch: alphabet size, maximum pattern length,
   and the helper macros used below. */
#include <stdio.h>

#define ASIZE 256                               /* alphabet size (all byte values) */
#define XSIZE 256                               /* maximum pattern length */
#define OUTPUT(j) printf("Match at position %d\n", (j))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

void preBmBc(char *x, int m, int bmBc[]) {

   int i;

   /* Bad-character shift: by default, shift by the whole pattern length m. */
   for (i = 0; i < ASIZE; ++i)
      bmBc[i] = m;
   /* For each pattern character (except the last), record its distance from
      the end of the pattern. */
   for (i = 0; i < m - 1; ++i)
      bmBc[(unsigned char)x[i]] = m - i - 1;
}

void suffixes(char *x, int m, int *suff) {
   int f, g, i;

   suff[m - 1] = m;
   g = m - 1;
   for (i = m - 2; i >= 0; --i) {
      if (i > g && suff[i + m - 1 - f] < i - g)
         suff[i] = suff[i + m - 1 - f];
      else {
         if (i < g)
            g = i;
         f = i;
         while (g >= 0 && x[g] == x[g + m - 1 - f])
            --g;
         suff[i] = f - g;
      }
   }
}

 
void preBmGs(char *x, int m, int bmGs[]) {

   int i, j, suff[XSIZE]; 

   suffixes(x, m, suff); 

   for (i = 0; i < m; ++i)
      bmGs[i] = m;
   j = 0;

   for (i = m - 1; i >= 0; --i)
      if (suff[i] == i + 1)
         for (; j < m - 1 - i; ++j)
            if (bmGs[j] == m)
               bmGs[j] = m - 1 - i;
   for (i = 0; i <= m - 2; ++i)
      bmGs[m - 1 - suff[i]] = m - 1 - i;
}
 

void BM(char *x, int m, char *y, int n) {
   int i, j, bmGs[XSIZE], bmBc[ASIZE];

   /* Preprocessing */
   preBmGs(x, m, bmGs);
   preBmBc(x, m, bmBc);

   /* Searching: slide the pattern x over the text y */

   j = 0;
   while (j <= n - m) {
      /* Compare the pattern with the text from right to left */
      for (i = m - 1; i >= 0 && x[i] == y[i + j]; --i);
      if (i < 0) {
         OUTPUT(j);               /* full match at position j */
         j += bmGs[0];
      }
      else
         j += MAX(bmGs[i], bmBc[(unsigned char)y[i + j]] - m + 1 + i);
   }
}

   

9) Phonetic Algorithm

Problem: To match words or names that sound alike despite being spelled differently (for example, phonetically misspelled names).

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for names with the same pronunciation to be encoded to the same representation so that they can be matched despite minor differences in spelling. Soundex is the most widely known of all phonetic algorithms and is often used (incorrectly) as a synonym for "phonetic algorithm". Improvements to Soundex are the basis for many modern phonetic algorithms. 

The Soundex code for a name consists of a letter followed by three numbers: the letter is the first letter of the name, and the numbers encode the remaining consonants. Similar sounding consonants share the same number so, for example, the labial B, F, P and V are all encoded as 1. Vowels can affect the coding, but are never coded directly unless they appear at the start of the name.

The exact algorithm is as follows:

  1. Retain the first letter of the string
  2. Remove all occurrences of the following letters, unless it is the first letter: a, e, h, i, o, u, w, y
  3. Assign numbers to the remaining letters (after the first) as follows:
    • b, f, p, v = 1
    • c, g, j, k, q, s, x, z = 2
    • d, t = 3
    • l = 4
    • m, n = 5
    • r = 6
  4. If two or more letters with the same number were adjacent in the original name (before step 1), or adjacent except for any intervening h and w (American census only), then omit all but the first.
  5. Return the first four characters, right-padding with zeroes if there are fewer than four.

Using this algorithm, both "Robert" and "Rupert" return the same string "R163" while "Rubin" yields "R150".
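
Here is a minimal Java sketch of the basic algorithm above (it ignores the American-census h/w refinement in step 4 and assumes plain ASCII letters):

public class Soundex {

    // Maps 'a'..'z' to their Soundex digit; '0' marks letters that are dropped.
    private static final String CODES = "01230120022455012623010202";
    //                                   abcdefghijklmnopqrstuvwxyz

    public static String encode(String name) {
        String s = name.toLowerCase();
        StringBuilder out = new StringBuilder();
        out.append(Character.toUpperCase(s.charAt(0)));   // Step 1: keep the first letter.

        char prev = CODES.charAt(s.charAt(0) - 'a');      // Code of the previous letter.
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c < 'a' || c > 'z') continue;             // Skip anything that is not a letter.
            char code = CODES.charAt(c - 'a');
            if (code != '0' && code != prev) {            // Steps 2-4: drop vowels, collapse repeats.
                out.append(code);
            }
            prev = code;
        }
        while (out.length() < 4) out.append('0');         // Step 5: right-pad with zeroes.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Robert"));   // R163
        System.out.println(encode("Rupert"));   // R163
        System.out.println(encode("Rubin"));    // R150
    }
}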

10) The Byzantine Generals Problem

In computing, the Byzantine Generals Problem is a thought experiment meant to illustrate the pitfalls and design challenges of attempting to coordinate an action by communicating over an unreliable link. It is used extensively in the design of fault-tolerant distributed systems.

Problem: Each division of the Byzantine army is directed by its own general. The generals, some of whom are traitors, communicate with each other by messengers. The requirements are:

  1. All loyal generals decide upon the same plan of action.
  2. A small number of traitors cannot cause the loyal generals to adopt a bad plan.

The problem can be restated as follows:

  1. All loyal generals receive the same information, upon which they will somehow arrive at the same decision.
  2. The information sent by a loyal general should be used by all the other loyal generals.

Solution: The above problem can be reduced to a series of one-commanding-general, multiple-lieutenants problems (the Byzantine Generals Problem), in which:

  1. All loyal lieutenants obey the same order.
  2. If the commanding general is loyal, then every loyal lieutenant obeys the order she sends.

Reliability by Majority Voting

One way to achieve reliability is to have multiple replicas of a system (or component) and take a majority vote among them. In order for majority voting to yield a reliable system, two conditions should be satisfied:

  1. All non-faulty components must use the same input value.
  2. If the input unit is non-faulty, then all non-faulty components use the value it provides as input.

No solution exists if 2/3 or fewer of the generals are loyal.

A Solution with Oral Messages - No Signatures

Oral message requirements and their implications:

  1. A1 - Every message that is sent is delivered correctly. The failure of the communication medium connecting two components is therefore indistinguishable from a component failure; a line failure just adds one more traitorous component.
  2. A2 - The receiver of a message knows who sent it. No switched network is allowed (the later requirement, A4, nullifies this constraint).
  3. A3 - The absence of a message can be detected. A timeout mechanism is needed.

If fewer than 1/3 of the generals are traitors, this problem can be solved. The algorithm is recursive: lieutenants recursively forward orders to all the other lieutenants, and each value is decided by majority voting:

Commander's order = majority(v(c), v(1), v(2), ..., v(n))

v(i) = majority(v(i), v(i)(2), v(i)(3), ..., v(i)(n)), 1 <= i <= n

v(i)(j) = majority(v(i)(j), v(i)(j)(3), v(i)(j)(4), ...)

A Solution with Signed Messages

Additional requirements and their implications:

  1. A4 - A loyal general's signature cannot be forged, and anyone can verify the authenticity of a general's signature. The implication is that digital signatures are required.

If at least two generals are loyal, this problem can be solved. The algorithm is again recursive: lieutenants recursively augment orders with their signature and forward them to all the other lieutenants. Each lieutenant maintains the set of orders she has received; the possible sets are { attack }, { wait }, or { attack, wait }. The lieutenant takes action according to the value of the set; { attack, wait } means the commander is a traitor.

Network topology or policy could keep a general from sending or receiving messages to or from another general. This constraint makes the Byzantine problem more general.

Oral Messages

If the communication graph is 3m-regular and at most m generals are traitors, this problem can be solved. A k-regular set of neighbors of a node p is a set of neighbors of p of size k such that, for any node not in the set, there exists a disjoint path, not passing through p, from some node in the set; a k-regular graph is one in which every node has a k-regular set of neighbors. The algorithm is an extension of the oral-message algorithm: each lieutenant recursively forwards orders to all of its k regular neighbors, and the same majority rules apply:

Commander's order = majority(v(c), v(1), v(2), ..., v(n))

v(i) = majority(v(i), v(i)(2), v(i)(3), ..., v(i)(n)), 1 <= i <= n

v(i)(j) = majority(v(i)(j), v(i)(j)(3), v(i)(j)(4), ...)

Signed Messages

If the subgraph of loyal generals is connected, this problem can be solved.

11) Credit Card Number Validation

One of the most common algorithms is the one used to check whether a credit card number is valid or not.

Problem: Check whether a credit card is valid or not, by validating the number.

Solution: Luhn's algorithm is the most often discussed solution.

In 1954, Hans Luhn of IBM proposed an algorithm to be used as a validity criterion for a given set of numbers. Almost all credit card numbers are generated following this validity criterion, also called the Luhn check or the Mod 10 check. It goes without saying that the Luhn check is also used to verify a given existing card number: if a credit card number does not satisfy this check, it is not a valid number. For a 16-digit credit card number, the Luhn check can be described as follows:

  1. Starting from the check digit and moving left, double the value of every second digit (the check digit itself is never doubled). For example, in a 16-digit credit card number, double the 15th, 13th, 11th, 9th, ... digits (the digits in odd places). In all, you will need to double eight digits.
  2. If doubling of a number results in a two digit number, add up the digits to get a single digit number. This will result in eight single digit numbers.
  3. Now, replace the digits in the odd places (in the original credit card number) with these new single digit numbers to get a new 16 digit number.
  4. Add up all the digits in this new number. If the final total is perfectly divisible by 10, then the credit card number is valid (Luhn check is satisfied), else it is invalid.

When credit card numbers are generated, the same steps are followed with one minor change. First, the issuer identifier and account numbers are assigned (issuer numbers are fixed for a given financial institution, whereas the account numbers are randomly allocated - I think). Then, the check digit is assumed to be some variable, say X. After this, the above steps are followed, and during the last step, X is chosen in such a way that it satisfies the Luhn check.

Essentially the formula can be summarized as follows

The formula verifies a number against its included check digit, which is usually appended to a partial account number to generate the full account number. This account number must pass the following test:

Starting with the rightmost digit (which is the check digit) and moving left, double the value of every second digit. For any digits that thus become 10 or more, add their digits together as if casting out nines. For example, 1111 becomes 2121, while 8763 becomes 7733 (from 2×6=12 → 1+2=3 and 2×8=16 → 1+6=7).

Add all these digits together. For example, if 1111 becomes 2121, then 2+1+2+1 is 6; and 8763 becomes 7733, so 7+7+3+3 is 20.

If the total ends in 0 (put another way, if the total modulus 10 is congruent to 0), then the number is valid according to the Luhn formula; else it is not valid. So, 1111 is not valid (as shown above, it comes out to 6), while 8763 is valid (as shown above, it comes out to 20).

Consider the example identification number 446-667-651. The first step is to double every other digit, starting with the second-to-last digit and moving left, and sum the digits in the result. The following table shows this step (the Double and Reduce columns are filled in only for the doubled digits):

Digit   Double   Reduce       Sum of digits
  1                                1
  5       10      1+0              1
  6                                6
  7       14      1+4              5
  6                                6
  6       12      1+2              3
  6                                6
  4        8      0+8              8
  4                                4
                  Total Sum:      40

The sum of 40 is divided by 10; the remainder is 0, so the number is valid.

A lookup table (i.e. calculate Double, Reduce, and Sum of digits only once and for all) can be used (0123456789 is mapped to 0246813579)

Digit   Double   Reduce   Sum of digits
  0        0      0+0          0
  1        2      0+2          2
  2        4      0+4          4
  3        6      0+6          6
  4        8      0+8          8
  5       10      1+0          1
  6       12      1+2          3
  7       14      1+4          5
  8       16      1+6          7
  9       18      1+8          9

The source code in Java should help you understand this better.

public class Luhn {

    /**
     * Main entry point for the test harness.
     *
     * @param args the command line arguments.
     */
    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: Luhn number ...");
            System.exit(1);
        }

        for (int i = 0; i < args.length; i++) {
            System.out.print("Number '" + args[i] + "'");
            if (!args[i].matches("^\\d{13,19}$")) {
                System.out.println(" is not a valid credit card number (must be 13-19 digits)");
            } else if (isValidNumber(args[i])) {
                System.out.println(" is a valid credit card number");
            } else {
                System.out.println(" is not a valid credit card number");
            }
        }
    }

    /**
     * Checks whether a string of digits is a valid credit card number according
     * to the Luhn algorithm.
     *
     * 1. Starting with the second to last digit and moving left, double the
     *    value of all the alternating digits. For any digits that thus become
     *    10 or more, add their digits together. For example, 1111 becomes 2121,
     *    while 8763 becomes 7733 (from 2x6=12 -> 1+2=3 and 2x8=16 -> 1+6=7).
     *
     * 2. Add all these digits together. For example, 1111 becomes 2121, then
     *    2+1+2+1 is 6; while 8763 becomes 7733, then 7+7+3+3 is 20.
     *
     * 3. If the total ends in 0 (put another way, if the total modulus 10 is
     *    0), then the number is valid according to the Luhn formula, else it is
     *    not valid. So, 1111 is not valid (as shown above, it comes out to 6),
     *    while 8763 is valid (as shown above, it comes out to 20).
     *
     * @param number the credit card number to validate.
     * @return true if the number is valid, false otherwise.
     */
    private static boolean isValidNumber(String number) {
        int sum = 0;
        boolean alternate = false;
        for (int i = number.length() - 1; i >= 0; i--) {
            int n = Integer.parseInt(number.substring(i, i + 1));
            if (alternate) {
                n *= 2;
                if (n > 9) {
                    n = (n % 10) + 1;   // same as adding the two digits of n together
                }
            }
            sum += n;
            alternate = !alternate;
        }
        return (sum % 10 == 0);
    }
}