16 Algorithms you ought to know….

Algorithms in popular sciences is defined as a finite list of well-defined instructions for  accomplishing some task that, given an initial state, will terminate in a defined end-state.

Etymologically originating from Arabic and Latin, the word Algroithm is almost  synonymous with Computer Science. Algorithms are essential to the way computers  process information, because a computer program is essentially an algorithm that tells the  computer what specific steps to perform (in what specific order) in order to carry out a specified task, such as calculating interest on a loan or generating reports on sales  performances. Thus, an algorithm can be considered to be any sequence of operations that can be performed by a Turing-complete system.

When you as developers write code use, develop or get inspired directly or directly from,  a number of algorithms.

In this article, Developer IQ  will present 16 algorithms  that developers need to know……

In this article we take a look at some of the simpler algorithms that every developer need to know. Most of these algorithms are ideas that you learn while in college. However, remembering them is of tremendous use as you grow in your career. Some of them might even be useful in the next job interview you might attend.

1) The Fibonacci Series

 Generating Fibonacci Series is one of the simplest algorithms that you come across as you learn any new language. After you went through the first few steps in getting used to the syntaxes and other quirkiness associated with a language try writing a Fibonacci series algorithm. Simple as it might sound, it is also important that you find an optimal solution using the language of your choice to resolve the algorithm.

Problem

Fibonacci series is used in many practical areas of studies from architecture to anthropology. It was first noted by Leonardo of Piza, a noted mathematician of the middle ages, though it was later found out that Indian mathematicians had studied such series many centuries before Leonardo figured it out.  The series essentially throws up a sequence that was useful in studying the multiplication of different species such as rabbits, bees and even the bovines.

In mathematics, the Fibonacci numbers form a sequence defined by the following recurrence relation:

The numbers form the Fibonacci series: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811…

The code to solve this series using programming is very easy. However a smart programmer will use recursions to generate the series or find a member of the series.

fib( n ) = fib( n - 1 ) + fib( n – 2 )

If your programming language supports generators, finding members in the series using generators is the right way to solve the problem. Here is a piece of Java code that generates the series using recursion.

public class FibonacciSeries {

 


        public static void main(String[] args) {                FibonacciSeries fibonacciSeries = new FibonacciSeries();                for (int i = 1; i < 11; i++) {                        System.out.print(fibonacciSeries.fibonacciSeries(i) + ", ");                }         }

        private int fibonacciSeries(int n) {                return n == 1 ? 1 : (n == 2 ? 1 : fibonacciSeries(n - 1)                                + fibonacciSeries(n - 2));         } }

2. Generating prime numbers

While generating prime numbers is purely an academic algorithm, it is imperative that you remember it. The algorithm was solved by Eratosthenes a Greek Mathematician two millenniums back. The method is known as Eratosthenes’ sieve.

  Problem: Generate a series of prime numbers or check whether a number is prime or not.

  Solution:

  1. Write a list of numbers from 2 to the largest number you want to test for primality. Call this List A.
  2. Write the number 2, the first prime number, in another list for primes found. Call this List B.
  3. Strike off 2 and all multiples of 2 from List A.
  4. The first remaining number in the list is a prime number. Write this number into List B.
  5. Strike off this number and all multiples of this number from List A. The crossing-off of multiples can be started at the square of the number, as lower multiples have already been crossed out in previous steps.
  6. Repeat steps 4 through 6 until no more numbers are left in List A.

  Though a faster algorithm which involves modulo and quadratic equation called Sieve of Atkin has been evolved to better the Eratosthenes Sieve, the ancient algorithm is good enough unless you are working on high performance math specific project. You can check on www.developeriq.com/snippets to find code for generating prime numbers.

3).Linear Search Algorithms

These days you do not write any search algorithms youself. Almost all modern languages are equipped with built-in utilities which takes care of the search . Many languages have packages or modules that cater to search.

The simplest of Search Algorithms is Linear Search algorithm. It operates by checking every element of a list one at a time in sequence until a match is found. Linear search runs in O(N). If the data are distributed randomly, on average N/2 comparisons will be needed. The best case is that the value is equal to the first element tested, in which case only 1 comparison is needed. The worst case is that the value is not in the list (or is the last item in the list), in which case N comparisons are needed.

Problem: Search for an item in a list

Solution: The pseudo code is self explanatory

For each item in the list:
  Check to see if the item you're looking for matches the item in the list.

    If it matches.
      Return the location where you found it (the index).
    If it does not match.
      Continue searching until you reach the end of the list. 
If we get here, we know the item does not exist in the list. Return -1.

Linear Search algorithm is hardly useful in the current scenario, but you must still know this basic algorithm.

4) Binary Search

A binary search algorithm (or binary chop) is a technique for finding a particular value in a sorted list. It makes progressively better guesses, and closes in on the sought value, by comparing an element halfway with what has been determined to be an element too low in the list and one too high in the list. A binary search finds the median element in a list, compares its value to the one you are searching for, and determines if it’s greater than, less than, or equal to the one you want. A guess that turns out to be too high becomes the new top of the list, and one too low the new bottom of the list. The binary search's next guess is halfway between the new list's top and bottom. Pursuing this strategy iteratively, it narrows the search by a factor 2 each time, and finds your value. A binary search is an example of a divide and conquer algorithm (more specifically a decrease and conquer algorithm) and a dichotomic search

Problem: Search for an item in a list

Solution: You continuously divide a list till you reach the value you are searching for. The spuedocode is selfexplanatory. You will find more code on the same on www.developeriq.com/snippets

BinarySearch(A[0..N-1], value, low, high) {
       if (high < low)
           return not_found
       mid = (low + high) / 2
       if (A[mid] > value)
           return BinarySearch(A, value, low, mid-1)
       else if (A[mid] < value)
           return BinarySearch(A, value, mid+1, high)
       else
           return mid
   }

5) Hash Searching and Sorting

A hash table, or a hash map or a dictionary is a data structure that associates keys with values. The primary operation it supports efficiently is a lookup: given a key (e.g. a person's name), find the corresponding value (e.g. that person's telephone number). It works by transforming the key using a hash function into a hash, a number that is used to index into an array to locate the desired location ("bucket") where the values should be.

 

Hash tables support the efficient addition of new entries, and the time spent searching for the required data is independent of the number of items stored (i.e. O(1).)

A good hash function is essential for good hash table performance. A poor choice of a hash function is likely to lead to clustering, in which probability of keys mapping to the same hash bucket (i.e. a collision) is significantly greater than would be expected from a random function. Nonzero probability of collisions is inevitable in any hash implementation, but usually the number of operations to resolve collisions scales linearly with the number of keys mapping to the same bucket, so excess collisions will degrade performance significantly. In addition, some hash functions are computationally expensive, so the amount of time (and, in some cases, memory) taken to compute the hash may be burdensome.

Since hashes are essentially key value pairs, searching or sorting has to be based on either key or value. They are no clear fast or efficient hashing algorithms which  works best always.

Problem: Search or Sort Hash Table

Solution:

The primary operation it supports efficiently is a lookup: given a key (e.g. a person’s name), find the corresponding value (e.g. that person’s telephone number). It works by transforming the key using a hash function into a hash, a number that is used to index into an array to locate the desired location (”bucket”) where the values should be.  

6) Binary Tree

 A binary tree is a tree data structure in which each node has at most two children. The binary tree is useful in building relationship diagrams which can be implemented through graphs.

Problem :Searching a  binary tree.

Solution: There are two common methods the DepthFirst and BreadthFirst. The depthfirst can be illustrated with the following piece of pusedo code. The code uses recursion.

dfs(v)
    process(v)
    mark v as visited
    for all vertices i adjacent to v not visited
        dfs(i)

The second option is breadth first. Both algorithms have their own advantages and disadvantages. But Bread First is more uniform and logical, and is often called a sweeping algorithm. BFS is a uninformed search method that aims to expand and examine all nodes of a graph systematically in search of a solution. In other words, it exhaustively searches the entire graph without considering the goal until it finds it. It does not use a heuristic.

From the standpoint of the algorithm, all child nodes obtained by expanding a node are added to a FIFO queue. In typical implementations, nodes that have not yet been examined for their neighbors are placed in some container (such as a queue or linked list) called "open" and then once examined are placed in the container "closed".

 Here is a C implementation of the code.

bool BFS(const std::vector<Vertex>& graph, int start, int end) {
  std::queue<int> next;
  std::vector<int> parent(graph.size(), 0); // 0 means filled with zeros.
  parent[start] = -1;
  next.push(start);
  while (!next.empty()) {
    int u = next.front();
   next.pop();
    // Here is the point where you can examine the u th vertex of graph
    // For example:
    if (u == end) return true;
    for (std::vector<int>::const_iterator j = graph[u].out.begin(); j != graph[u].out.end(); ++j) {
      // Look through neighbors.
      int v = *j;
      if (parent[v] == 0) {
        // If v is unvisited.
        parent[v] = u;
        next.push(v);
     }
    }
  }
  return false;
}

Travelling Sales Man

Travelling Sales Man problem is one of the famous problems where a number of solution exist. 
 The shortest path problem is the problem of finding a path between two vertices such that 
the sum of the weights of its constituent edges is minimized. An example is finding the quickest
way to get from one location to another on a road map; in this case, the vertices represent locations
and the edges represent segments of road and are weighted by the time needed to travel that segment.
 There are several solutions to the problem. One of them is Dijkstra's algorithm, named after 
its discoverer, Dutch computer scientist Edsger Dijkstra.
Problem: To find the shortest path between two vertices in a graph

Solution: The input of the algorithm consists of a weighted directed graph G and a source vertex s in G. We will denote V the set of all vertices in the graph G. Each edge of the graph is an ordered pair of vertices (u,v) representing a connection from vertex u to vertex v. The set of all edges is denoted E. Weights of edges are given by a weight function w: E → [0, ∞); therefore w(u,v) is the cost of moving directly from vertex u to vertex v. The cost of an edge can be thought of as (a generalization of) the distance between those two vertices. The cost of a path between two vertices is the sum of costs of the edges in that path. For a given pair of vertices s and t in V, the algorithm finds the path from s to t with lowest cost (i.e. the shortest path). It can also be used for finding costs of shortest paths from a single vertex s to all other vertices in the graph.

The algorithm works by keeping, for each vertex v, the cost d[v] of the shortest path found so far between s and v. Initially, this value is 0 for the source vertex s (d[s]=0), and infinity for all other vertices, representing the fact that we do not know any path leading to those vertices (d[v]=∞ for every v in V, except s). When the algorithm finishes, d[v] will be the cost of the shortest path from s to v — or infinity, if no such path exists.

The algorithm maintains two sets of vertices S and Q. Set S contains all vertices for which we know that the value d[v] is already the cost of the shortest path and set Q contains all other vertices. Set S is initially empty, and in each step one vertex is moved from Q to S. This vertex is chosen as the vertex with lowest value of d[u]. When a vertex u is moved to S, the algorithm relaxes every outgoing edge (u,v). That is, for each neighbor v of u, the algorithm checks to see if it can improve on the shortest known path to v by first following the shortest path from the source to u, and then traversing the edge (u,v). If this new path is better, the algorithm updates d[v] with the new smaller value.

The pusedocode is self explantory.

 function Dijkstra(Graph, source):

        for each vertex v in Graph:           // Initializations
           dist[v] := infinity               // Unknown distance function from s to v
       previous[v] := undefined
            dist[source] := 0                     // Distance from s to s
        Q := copy(Graph)                      // Set of all unvisited vertices
        while Q is not empty:                 // The main loop
           u := extract_min(Q)               // Remove best vertex from priority queue                          
           for each neighbor v of u:
               alt = dist[u] + length(u, v)
               if alt < dist[v]              // Relax (u,v)
                   dist[v] := alt
                   previous[v] := u

8) String Searching Algorithm

String searches are more than useful for almost all applications. While you use regular expressiions to implement these thisdays, there exists more than 12 algorithms that does effective string searching.

Problem: To search string for matches.

Solutiion:The Boyer-Moore algorithm is considered as the most efficient string-matching algorithm in usual applications. A simplified version of it or the entire algorithm is often implemented in text editors for the «search» and «substitute» commands.

The algorithm scans the characters of the pattern from right to left beginning with the rightmost one. In case of a mismatch (or a complete match of the whole pattern) it uses two precomputed functions to shift the window to the right. These two shift functions are called the good-suffix shift (also called matching shift and the bad-character shift (also called the occurrence shift). The below code is an example.

void preBmBc(char *x, int m, int bmBc[]) {   
int i;
   for (i = 0; i < ASIZE; ++i)
      bmBc[i] = m;
   for (i = 0; i < m - 1; ++i)
      bmBc[x[i]] = m - i - 1;
} 
void suffixes(char *x, int m, int *suff) {
   int f, g, i;
   suff[m - 1] = m;
   g = m - 1;
   for (i = m - 2; i >= 0; --i) {
      if (i > g && suff[i + m - 1 - f] < i - g)
         suff[i] = suff[i + m - 1 - f];

      else {
         if (i < g)
            g = i;
         f = i;
         while (g >= 0 && x[g] == x[g + m - 1 - f])
            --g;
         suff[i] = f - g;
      }
   }
}
void preBmGs(char *x, int m, int bmGs[]) {
   int i, j, suff[XSIZE];
   suffixes(x, m, suff);
   for (i = 0; i < m; ++i)
      bmGs[i] = m;
   j = 0;
   for (i = m - 1; i >= 0; --i)
      if (suff[i] == i + 1)
         for (; j < m - 1 - i; ++j)
            if (bmGs[j] == m)
               bmGs[j] = m - 1 - i;

   for (i = 0; i <= m - 2; ++i)
      bmGs[m - 1 - suff[i]] = m - 1 - i;
}
void BM(char *x, int m, char *y, int n) {
   int i, j, bmGs[XSIZE], bmBc[ASIZE];
   /* Preprocessing */
   preBmGs(x, m, bmGs);
   preBmBc(x, m, bmBc);
   /* Searching */
   j = 0;
   while (j <= n - m) {
      for (i = m - 1; i >= 0 && x[i] == y[i + j]; --i);
      if (i < 0) {
         OUTPUT(j);
         j += bmGs[0];
      }
      else
         j += MAX(bmGs[i], bmBc[y[i + j]] - m + 1 + i);
   }
} 

9) Phonetic Algorithm

Problem: To provide accurate spellings on phonetically mispelled words.

Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for names with the same pronunciation to be encoded to the same representation so that they can be matched despite minor differences in spelling. Soundex is the most widely known of all phonetic algorithms and is often used (incorrectly) as a synonym for "phonetic algorithm". Improvements to Soundex are the basis for many modern phonetic algorithms. 

The Soundex code for a name consists of a letter followed by three numbers: the letter is the first letter of the name, and the numbers encode the remaining consonants. Similar sounding consonants share the same number so, for example, the labial B, F, P and V are all encoded as 1. Vowels can affect the coding, but are never coded directly unless they appear at the start of the name.

The exact algorithm is as follows:

  1. Retain the first letter of the string
  2. Remove all occurrences of the following letters, unless it is the first letter: a, e, h, i, o, u, w, y
  3. Assign numbers to the remaining letters (after the first) as follows:
    • b, f, p, v = 1
    • c, g, j, k, q, s, x, z = 2
    • d, t = 3
    • l = 4
    • m, n = 5
    • r = 6
  4. If two or more letters with the same number were adjacent in the original name (before step 1), or adjacent except for any intervening h and w (American census only), then omit all but the first.
  5. Return the first four characters, right-padding with zeroes if there are fewer than four.

Using this algorithm, both "Robert" and "Rupert" return the same string "R163" while "Rubin" yields "R150".

10) Byzantine General Problem

In computing, the Byzantine is a thought experiment meant to illustrate the pitfalls and design challenges of attempting to coordinate an action by communicating over an unreliable link. Used extensively in

Each division of Byzantine army are directed its own general.  Generals, some of which are traitors, communicate each other by messengers. All loyal generals decide upon the same plan of action  A small number of traitors cannot cause the loyal generals to adopt a bad plan

 The problem can be restated as  All loyal generals receive the same information upon which they will somehow get to the same decision

 The information sent by a loyal general should be used by all the other loyal generals

Solution:  The above problem can be reduced into a series of one commanding general and multiple lieutenants problem - Byzantine Generals Problem :  All loyal lieutenants obey the same order

 If the commanding general is loyal, then every loyal lieutenant obeys the order she sends

Reliability by Majority Voting

 One way to achieve reliability is to have multiple replica of system (or component) and take the majority voting among them

 In order for the majority voting to yield a reliable system, the following two conditions should be satisfied:  All non-faulty components must use the same input value

 If the input unit is non-faulty, then all non-faulty components use the value it provides as input  No solution exists if less than or equal to 2/3 generals are loyal

A Solution with Oral Messages - No Signature

Oral Message Requirements and their Implications

 A1 - Every message that is sent is delivered correctly  The failure of communication medium connecting two components is indistinguishable from component failure

 Line failure just adds one more traitor component

 A2 - The receiver of a message knows who sent it  No switched network is allowed

 The later requirement -- A4 nullifies this constraint

 A3 - The absence of a message can be detected  Timeout mechanism is needed

If less than 1/3 generals are traitors, this problem can be solved

 Algorithm - recursive  Lieutenants recursively forward orders to all the other lieutenants

 Commander's order = majority (v(c), v(1), v(2), ..., v(n))

v(i) = majority (v(i), v(i)(2), v(i)(3), ..., v(i)(n)), 1<= i <= n

 v(i)(j) = majority (v(i)(j), v(i)(j)(3), v(i)(j)(4), ...)

A Solution with Signed Messages

Additional Requirements and their Implications

 A4:  A loyal general's signature cannot be forged

 Anyone can verify the authenticity of a general's signature

 Implication  Digital signature is required

 If at least two generals are loyal, this problem can be solved

 Algorithm - recursive  Lieutenants recursively augment orders with their signature and forward them to all the other lieutenants

 Each lieutenant maintains a set of orders she has received, i.e., the possible sets are:  { attack },

 { wait }, or
 { attack, wait }

 Lieutenant takes action according to the value of the set  { attack, wait } means the commander is a traitor

Network topology or policy could keep a general sending/receiving messages to/from another generalThis constraint makes Byzantine problem more general

Oral Message

 If the communication graph is 3m-regular and less than or equal to m generals are traitors, this problem can be solved  k regular set of neighbors of a node p  the set of all neighbors of p, whose size is k   for any node not in the set, there exists a disjoint path, not passing through the node p, from a node in the set

 k regular graph - every node has k regular set of neighbors

 Algorithm - extension of oral message Lieutenants recursively forward orders to all its k regular neighbors

 Commander's order = majority (v(c), v(1), v(2), ..., v(n))

 v(i) = majority (v(i), v(i)(2), v(i)(3), ..., v(i)(n)), 1<= i <= n
 v(i)(j) = majority (v(i)(j), v(i)(j)(3), v(i)(j)(4), ...)

Signed Message

 If the subgraph of loyal generals is connected, this problem can be solved.

11) Credit Card Number Validation

One of the common algorithms is used to check whether a credit card is valid or not.

Problem: Check whether a credit card is valid or not, by validating the number.

Solution: The Luhn’s algorithm is the most often discussed solution.

In 1954, Hans Luhn of IBM proposed an algorithm to be used as a validity criterion for a given set of numbers. Almost all credit card numbers are generated following this validity criterion…also called as the Luhn check or the Mod 10 check. It goes without saying that the Luhn check is also used to verify a given existing card number. If a credit card number does not satisfy this check, it is not a valid number. For a 16 digit credit card number, the Luhn check can be described as follows:

  1. Starting with the check digit, double the value of every second digit (never double the check digit). For example, in a 16 digit credit card number, double the 15th, 13th, 11th, 9th…digits (digits in odd places). In all, you will need to double eight digits.
  2. If doubling of a number results in a two digit number, add up the digits to get a single digit number. This will result in eight single digit numbers.
  3. Now, replace the digits in the odd places (in the original credit card number) with these new single digit numbers to get a new 16 digit number.
  4. Add up all the digits in this new number. If the final total is perfectly divisible by 10, then the credit card number is valid (Luhn check is satisfied), else it is invalid.

When credit card numbers are generated, the same steps are followed with one minor change. First, the issuer identifier and account numbers are assigned (issuer numbers are fixed for a given financial institution, whereas the account numbers are randomly allocated - I think). Then, the check digit is assumed to be some variable, say X. After this, the above steps are followed, and during the last step, X is chosen in such a way that it satisfies the Luhn check.

Essentially the formula can be summarized as follows

The formula verifies a number against its included check digit, which is usually appended to a partial account number to generate the full account number. This account number must pass the following test:

Starting with the rightmost digit (which is the check digit) and moving left, double the value of every second digit. For any digits that thus become 10 or more, add their digits together as if casting out nines. For example, 1111 becomes 2121, while 8763 becomes 7733 (from 2×6=12 → 1+2=3 and 2×8=16 → 1+6=7).

Add all these digits together. For example, if 1111 becomes 2121, then 2+1+2+1 is 6; and 8763 becomes 7733, so 7+7+3+3 is 20.

If the total ends in 0 (put another way, if the total modulus 10 is congruent to 0), then the number is valid according to the Luhn formula; else it is not valid. So, 1111 is not valid (as shown above, it comes out to 6), while 8763 is valid (as shown above, it comes out to 20).

Consider the example identification number 446-667-651. The first step is to double every other digit, starting with the second-to-last digit and moving left, and sum the digits in the result. The following table shows this step (highlighted rows indicating doubled digits):

Digit

Double

Reduce

Sum of digits

1

 

1

1

5

10

1+0

1

6

 

6

6

7

14

1+4

5

6

 

6

6

6

12

1+2

3

6

 

6

6

4

8

0+8

8

4

 

4

4

Total Sum:

40

The sum of 40 is divided by 10; the remainder is 0, so the number is valid.

A lookup table (i.e. calculate Double, Reduce, and Sum of digits only once and for all) can be used (0123456789 is mapped to 0246813579)

Digit

Double

Reduce

Sum of digits

0

0

0+0

0

1

2

0+2

2

2

4

0+4

4

3

6

0+6

6

4

8

0+8

8

5

10

1+0

1

6

12

1+2

3

7

14

1+4

5

8

16

1+6

7

9

18

1+8

9

The source code in Java should help you understand this better.

public class Luhn {
        /**
        * Main entry point for the test harness.         *
         * @param args the command line arguments.
        */
        public static void main(String[] args) {
               if (args.length < 1) {                        System.err.println("Usage: Luhn number ...");                        System.exit(1);                }
                for (int i = 0; i < args.length; i++) {
                       System.out.print("Number '" + args[i] + "'" );
                       if (!args[i].matches("^\\d{13,19}$")) {
                               System.out.println(" is not a valid credit card number (must be 13-19 digits)");
                       } else if (isValidNumber(args[i])) {
                               System.out.println(" is a valid credit card number");
                       } else {
                               System.out.println(" is not a valid credit card number");
                       }
               }
        }
         /**
         * Checks whether a string of digits is a valid credit card number according
         * to the Luhn algorithm.
         *
         * 1. Starting with the second to last digit and moving left, double the
        *    value of all the alternating digits. For any digits that thus become
        *    10 or more, add their digits together. For example, 1111 becomes 2121,         *    while 8763 becomes 7733 (from (1+6)7(1+2)3).         *
         * 2. Add all these digits together. For example, 1111 becomes 2121, then
        *    2+1+2+1 is 6; while 8763 becomes 7733, then 7+7+3+3 is 20.         *
         * 3. If the total ends in 0 (put another way, if the total modulus 10 is
        *    0), then the number is valid according to the Luhn formula, else it is
         *    not valid. So, 1111 is not valid (as shown above, it comes out to 6),
        *    while 8763 is valid (as shown above, it comes out to 20).         *
         * @param number the credit card number to validate.
        * @return true if the number is valid, false otherwise.         */
        private static boolean isValidNumber(String number) {
               int sum = 0; 
               boolean alternate = false;
               for (int i = number.length() - 1; i >= 0; i--) {                        int n = Integer.parseInt(number.substring(i, i + 1));                        if (alternate) {                                n *= 2;                                if (n > 9) {                                       n = (n % 10) + 1;                                }                        }
                       sum += n;
                       alternate = !alternate;               
               return (sum % 10 == 0);
        }
}

12 & 13 Sorting Algorithms

The idea is simple. You need to sort a list of numbers. There are 10-12 major algorithms, and several variants of these algorithms.

Problem: Sort a List of Numbers.

Solution1:

Pigeonhole sorting, also known as count sort , is a sorting algorithm that takes (Θ(n + N)) time, where n is the number of numbers to be sorted and N is the number of "pigeonholes", or possible values of the numbers. When N is O(n), the algorithm runs in linear time (i.e., O((n)), which the best possible performance for a sorting algorithm since one must inevitably look at each of the elements being sorted at least once, regardless of the sorting algorithm.

The pigeonhole algorithm works as follows.

Set up an array of initially empty "pigeonholes", one pigeonhole for each value in the range of keys.
Go over the original array, putting each object in its pigeonhole.

Iterate over the pigeonhole array in order, and put elements from non-empty holes back into the original array.
The hidden constant for this algorithm depends critically on the density of the elements in the pigeonhole array. If there are many more array elements than items to be sorted, steps 1 and 3 will be relatively slow.

Pigeonhole sort is rarely used as the requirements are rarely met and other, more flexible, and almost as fast, sorting algorithms are easier to use. In particular, the bucket sort is a more practical variation on the pigeonhole sort.

A well-known variant of the pigeonhole sort is the tally sort; while it is only applicable to a very limited number of problems, it is widely familiar because of its use in the book Programming Pearls as an example of an unconventional solution to a particular set of limitations.
Bucket sort works as follows:

  1. Set up an array of initially empty "buckets" the size of the range.
  2. Go over the original array, putting each object in its bucket.
  3. Sort each non-empty bucket.
  4. Put elements from non-empty buckets back into the original array
function bucket-sort(array, n) is
  buckets ← new array of n empty lists
  for i = 0 to (length(array)-1) do
    insert array[i] into buckets[msbits(array[i], k)]
  for i = 0 to n - 1 do
    next-sort(buckets[i])
  return the concatenation of buckets[0], ..., buckets[n-1]

Solution2: QuickSort

Quicksort is a well-known sorting algorithm developed by C. A. R. Hoare that, on average, makes  (big O notation) comparisons to sort n items. However, in the worst case, it makes Θ(n2) comparisons. Typically, quicksort is significantly faster in practice than other  algorithms, because its inner loop can be efficiently implemented on most architectures, and in most real-world data it is possible to make design choices which minimize the possibility of requiring quadratic time.

The steps are:
1. Pick an element, called a pivot, from the list.

2. Reorder the list so that all elements which are less than the pivot come before the pivot and so that all elements greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.

3. Recursively sort the sub-list of lesser elements and the sub-list of greater elements.

The base case of the recursion are lists of size zero or one, which are always sorted. The algorithm always terminates because it puts at least one element in its final place on each iteration (the loop invariant).

function quicksort(q)  
var list less, pivotList, greater
if length(q) ≤ 1 
return q 
select a pivot value pivot from q
for each x in q
if x < pivot then add x to less
if x = pivot then add x to pivotList
if x > pivot then add x to greater
return concatenate(quicksort(less), pivotList, quicksort(greater))

14,15,&16

Cryptographical Algorithms

Problem: To hide data

Solution1: RSA

Quicksort is a well-known sorting algorithm developed by C. A. R. Hoare that, on average, makes  (big O notation) comparisons to sort n items. However, in the worst case, it makes Θ(n2) comparisons. Typically, quicksort is significantly faster in practice than other  algorithms, because its inner loop can be efficiently implemented on most architectures, and in most real-world data it is possible to make design choices which minimize the possibility of requiring quadratic time.

RSA involves a public key and a private key. The public key can be known to everyone and is used for encrypting messages. Messages encrypted with the public key can only be decrypted using the private key. The keys for the RSA algorithm are generated the following way:


Solution 2: SHA Algorithm

The SHA (Secure Hash Algorithm) hash functions refer to five FIPS-approved algorithms for computing a condensed digital representation (known as a message digest) that is, to a high degree of probability, unique for a given input data sequence (the message). These algorithms are called “secure” because (in the words of the standard), “for a given algorithm, it is computationally infeasible to:

  1. find a message that corresponds to a given message digest, or
  2. find two different messages that produce the same message digest.

The five algorithms, denoted SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512, are cryptographic hash functions designed by the National Security Agency (NSA) and published by the NIST as a U. S. government standard. The latter four variants are sometimes collectively referred to as SHA-2.

SHA-1 is employed in several widely used security applications and protocols, including TLS and SSL, PGP, SSH, S/MIME, and IPsec. It was considered to be the successor to MD5, an earlier, widely-used hash function.

The psuedocode of SHA-1 is as follows

Note: All variables are unsigned 32 bits and wrap modulo 232 when calculating 
Initialize variables: 
h0 := 0x67452301
h1 := 0xEFCDAB89
h2 := 0x98BADCFE
h3 := 0x10325476
h4 := 0xC3D2E1F0
Pre-processing: 
append the bit '1' to the message
append k bits '0', where k is the minimum number >= 0 such that the resulting message
    length (in bits) is congruent to 448 (mod 512)
append length of message (before pre-processing), in bits, as 64-bit big-endian integer
Process the message in successive 512-bit chunks: 
break message into 512-bit chunks
for each chunk
    break chunk into sixteen 32-bit big-endian words w[i], 0 ≤ i ≤ 15
    Extend the sixteen 32-bit words into eighty 32-bit words:
    for i from 16 to 79
        w[i] := (w[i-3] xor w[i-8] xor w[i-14] xor w[i-16]) leftrotate 1
    Initialize hash value for this chunk:
    a := h0
    b := h1
    c := h2
    d := h3
    e := h4
    Main loop:
    for i from 0 to 79
        if 0 ≤ i ≤ 19 then
            f := (b and c) or ((not b) and d)
            k := 0x5A827999
        else if 20 ≤ i ≤ 39
            f := b xor c xor d
            k := 0x6ED9EBA1
        else if 40 ≤ i ≤ 59
            f := (b and c) or (b and d) or (c and d)
            k := 0x8F1BBCDC
        else if 60 ≤ i ≤ 79
            f := b xor c xor d
            k := 0xCA62C1D6
        temp := (a leftrotate 5) + f + e + k + w[i]
        e := d
        d := c
        c := b leftrotate 30
        b := a
        a := temp
    Add this chunk's hash to result so far:
    h0 := h0 + a
    h1 := h1 + b
    h2 := h2 + c
    h3 := h3 + d
    h4 := h4 + e
Produce the final hash value (big-endian): 
digest = hash = h0 append h1 append h2 append h3 append h4

Solution 3: MD5

The algorithm takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input. It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given prespecified target message digest. The MD5 algorithm is intended for digital signature applications, where a large file must be "compressed" in a secure manner before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA.

MD5 processes a variable-length message into a fixed-length output of 128 bits. The input message is broken up into chunks of 512-bit blocks; the message is padded so that its length is divisible by 512. The padding works as follows: first a single bit, 1, is appended to the end of the message. This is followed by as many zeros as are required to bring the length of the message up to 64 bits fewer than a multiple of 512. The remaining bits are filled up with a 64-bit integer representing the length of the original message.

The main MD5 algorithm operates on a 128-bit state, divided into four 32-bit words, denoted A, B, C and D. These are initialized to certain fixed constants. The main algorithm then operates on each 512-bit message block in turn, each block modifying the state. The processing of a message block consists of four similar stages, termed rounds; each round is composed of 16 similar operations based on a non-linear function F, modular addition, and left rotation.

Psuedo Code is as follows

//Note: All variables are unsigned 32 bits and wrap modulo 2^32 when calculating 
var int[64] r, k
//r specifies the per-round shift amounts 
r[ 0..15] := {7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22} 
r[16..31] := {5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20}
r[32..47] := {4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23}
r[48..63] := {6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21}
//Use binary integer part of the sines of integers as constants: 
for i from 0 to 63
    k[i] := floor(abs(sin(i + 1)) × (2 pow 32))
//Initialize variables: 
var int h0 := 0x67452301
var int h1 := 0xEFCDAB89
var int h2 := 0x98BADCFE
var int h3 := 0x10325476
//Pre-processing: 
append "1" bit to message
append "0" bits until message length in bits ≡ 448 (mod 512)
append bit (bit, not byte) length of unpadded message as 64-bit little-endian integer to message
//Process the message in successive 512-bit chunks: 
for each 512-bit chunk of message
    break chunk into sixteen 32-bit little-endian words w[i], 0 ≤ i ≤ 15
    //Initialize hash value for this chunk:
    var int a := h0
    var int b := h1
    var int c := h2
    var int d := h3
    //Main loop:
    for i from 0 to 63
        if 0 ≤ i ≤ 15 then
            f := (b and c) or ((not b) and d)
            g := i
        else if 16 ≤ i ≤ 31
            f := (d and b) or ((not d) and c)
            g := (5×i + 1) mod 16
        else if 32 ≤ i ≤ 47
            f := b xor c xor d
            g := (3×i + 5) mod 16
        else if 48 ≤ i ≤ 63
            f := c xor (b or (not d))
            g := (7×i) mod 16
        temp := d
        d := c
        c := b
        b := b + leftrotate((a + f + k[i] + w[g]) , r[i])
        a := temp
    //Add this chunk's hash to result so far:
    h0 := h0 + a
    h1 := h1 + b
    h2 := h2 + c
    h3 := h3 + d
var int digest := h0 append h1 append h2 append h3 //(expressed as little-endian)

 








}