Java Iteration - An evolution

When we have a collection of elements in a language code, one of the most basic things is to iterate the collection, i.e. go through the collections elements one by one. Over the years, Java has evolved in this field and has come a really long from Enumerators in 1.0 to the state-of-the-art forEach loop in 8.0, which implements internal iterations based on Lambda expressions for the first time in the history of Java. The article traces this evolution.

The Iterator pattern

An iterator is a mechanism that permits all elements of a collection to be accessed sequentially, with some operation being performed on each element. In essence, an iterator provides a means of looping over an encapsulated collection of objects. Examples of using iterators include:

  • Visit each file in a directory and display its name.
  • Visit each node in a graph and determine whether it is reachable from a given node.
  • Visit each customer in a queue (for instance, simulating a line in a bank) and find out how long he or she has been waiting.
  • Visit each node in a compiler's abstract syntax tree (which is produced by the parser) and perform semantic checking or code generation.

Properties of Iterator

In general, a good iterator implementation should have these properties:

  • We should be able to have multiple traversals in progress at the same time; that is, an iterator should allow for the concept of nested looping.
  • An iterator should also be non-destructive in the sense that the act of iteration should not, by itself, change the collection. But the operation being performed on the elements in a collection could possibly change some of the elements.
  • An iterator should support removing an element from a collection or inserting a new element at a particular point in the collection, but such changes should be explicit within the program and not a byproduct of the iteration.
  • We will also need to have iterators with different traversal methods; for instance, pre-order and post-order traversal of a tree, or depth-first and breadth-first traversal of a graph.

It is not easy to provide all these features but API writers in Java have done a reasonably good job with the iterators like List iterators.

Fail-Fast vs Fail-Safe Iterators

Based on the ability of an iterator implementation to allow or disallow concurrent updates to the collection data, these can be divided into two categories:

Fail-Fast

Fail fast iterators while iterating through the collection, instantly throw ConcurrentModificationException if there is structural modification of the collection. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future. When one or more thread is iterating over the collection and, in between, one thread changes the structure of the collection (either adding the element to the collection or by deleting the element in the collection or by updating the value at particular position in the collection) it is known as Concurrent Modification. A fail-fast iterator can throw ConcurrentModificationException in two scenarios: 

Single Threaded environment: After the creation of the iterator, structure is modified at any time by any method other than iterator's own remove method.

Multiple Threaded environment: If one thread is modifying the structure of the collection while other thread is iterating over it.
As per Oracle's own words, the fail-fast behavior of an iterator cannot be guaranteed as it is and it should be used only to detect bugs and one should not depend on it for program flow. It means we should not try to catch the exception and then decide alternate flow in the catch block.

Iterator reads internal data structure (object array) directly. The internal data structure (i.e object array) should not be modified while iterating through the collection. To ensure this it maintains an internal flag "mods". Iterator checks the "mods" flag whenever it gets the next value (using hasNext() method and next() method). Value of the mods flag changes whenever there is a structural modification, thus indicating iterator to throw ConcurrentModificationException.  HashMap, Vector, ArrayList and HashSet provide fail-fast iterators.

Code snippet 1:

  Map<String,String>  premiumPhone = new HashMap<String,String>(); 
  premiumPhone.put("HTC", "HTC  Nine"); 
  premiumPhone.put("Samsung","S6");   
Iterator<String> iterator = premiumPhone.keySet().iterator();
while (iterator.hasNext())
{ System.out.println(premiumPhone.get(iterator.next())); premiumPhone.put("Sony", "Xperia Z"); }

Output:

  iPhone 
  Exception in thread  "main" java.util.ConcurrentModificationException 
  at  java.util.HashMap$HashIterator.nextNode(Unknown Source) 
  at  java.util.HashMap$KeyIterator.next(Unknown Source) 
  at  FailFastExample.main(FailFastExample.java:17) 

Fail-Safe

Fail Safe Iterator makes copy of the internal data structure (object array) and iterates over the copied data structure. Any structural modification done to the iterator affects the copied data structure.  So the original data structure remains structurally unchanged and no ConcurrentModificationException is thrown.
Two issues associated with a Fail-Safe Iterator are:

  • Overhead of maintaining the copied data structure i.e. memory.
  • Fail-safe iterator does not guarantee that the data being read is the data currently in the original data structure.

As per Oracle docs, fail-safe iterator is usually costly because of the overhead and should be used when concurrency is truly desired but synchronization is not.

CopyOnWriteArrayList and ConcurrentHashMap provide fail-safe iterators. In the above example, replacing HashMap with ConcurrentHashMap will make it fail-safe. It will not throw exception but the results will not show the new Sony phone added while iterating.

Iterators and the Gang of Four design patterns

According to the Gang of Four, the Iterator design pattern is a behavioral pattern, whose key idea is "to take the responsibility for access and traversal out of the list [ed. think collection] object and put it into an iterator object."

External vs Internal Iterators

There are two general approaches to implementing an iterator depending on who controls the iteration. For an external iterator - also known as explicit iterator or external iterator - the client code controls the iteration in the sense that the client creates the iterator, tells it when to advance to the next element, tests to see if every element has been visited, and so on. Although iterators in Java have taken different forms, using an external iterator was essentially the only viable option prior to Java 8.

For an internal iterator, the iterator itself controls the iteration. The client essentially says to the iterator, "perform this operation on the elements in the collection." This approach is common in languages like LISP that provide anonymous functions or closures. With the release of Java 8, this approach to iteration is now a reasonable alternative for Java programmers.

Note: this distinction between iterators is not introduced by Java. It is one of the basic principles of data collections in any general purpose programming language.

Java 1.0

Java began iteration with the Enumeration class in 1.0 version. Vector was the only collection available at that time so it returned an Enumeration of elements. Code typically used to be:

  Code snippet 2:
Vector languages = new Vector(); // ... add some values to the collection Enumeration e = languages.elements();
while (e.hasMoreElements()) { String language = (String) e.nextElement(); System.out.println(language); }

Java 1.2

Java 1.2 introduced the collection classes that we all know and the Iterator design pattern was implemented in a class appropriately named Iterator. Because we didn't yet have generics in Java 1.2, casting an object returned from an Iterator was still necessary.

Code snippet 3:

  List  languages = new LinkedList(); 
  // ... add  some languages to the collection 
  Iterator  i = languages.iterator(;  
while (i.hasNext()) { String language = (String) i.next(); System.out.println(language);
}

Iterator differ from enumerations in two ways:

 

  • Iterators allowed the caller to remove elements from the underlying collection during the iteration with well-defined semantics. This is of course only for fail-safe Iterators.
  • Method names were improved.

Also introduced was the ListIterator interface which extends Iterator. This allows the programmer to traverse the list in either direction, modify the list during iteration, and obtain the iterator's current position in the list.

A ListIterator has no current element; its cursor position always lies between the element that would be returned by a call to previous() and the element that would be returned by a call to next(). An iterator for a list of length n has n+1 possible cursor positions, as illustrated by the carets (^) below:

Element(0)   Element(1)   Element(2)   ... Element(n-1)
Cursor positions:     ^            ^            ^            ^                  

Java 5

With Java 5, generics was introduced along with the Iterable interface and the enhanced for-loop. The enhanced for-loop is perhaps one of landmark yet small additions to Java. The creation of the iterator and calls to its hasNext() and next() methods are not expressed explicitly in the code, but they still take place behind the scenes. Thus, even though the code is more compact, we are still using an external iterator. Using Java 5, the example would look something like this:

Code snippet 4:

  List<String>  languages = new LinkedList<String>(); 
  // ... add  some names to the collection
  
  for (String name : names) 
  System.out.println(name);

For user-defined classes to be supported in this new for-loop, they had to implement the Iterable interface. It has only one method Iterator<T> iterator() which is basically returns the underlying iterator implementation. For user-defined collection classes to be used in the for-loop:

  1. The class needs to implement Iterable.
  2. We need to provide an implementation of the Iterator in a separate class. This class would be the return type of the iterator() method of Iterable above.

Code snippet 5:

  public class MyCollection<E> implements Iterable<E> { // Our own collection class 
  public Iterator<E> iterator() { 
  return new MyIterator<E>(); 
  } 
  }
public class MyIterator <T> implements Iterator<T> { // Our own Iterator public boolean hasNext() {    //implement... }
public T next() { //implement...; }
public void remove() { //implement... if supported. } }
public static void main(String[]  args) { 
  MyCollection<String> stringCollection  = new MyCollection<String>();
  
for(String string : stringCollection){  // MyCollection used in for-loop } }

Java 7 gave us the diamond operator, which reduces the verbosity of generics. Gone were the days of having to repeat the type used to instantiate the generic class after invoking the new operator. In Java 7 we could simplify the first line in snippet 4 to the following:

List<String> names = new LinkedList<>();

Java 8 — the forEach() method
Java 8 introduces internal iteration to the language. Here the new forEach() method added to the Iterable interface provides internal iteration. The client need not write any for-loop but provide a Lamdba expression stating the intent of the iteration like printing the elements. Rest is taken care internally by the method implementation.

The major new features in Java 8 center on lambda expressions, along with related features such as streams, method references, and functional interfaces (an interface with exactly one abstract method).  A default method, another new feature in Java 8, is a method in an interface with a default implementation. Default methods had to be introduced to make the existing collections work with Lambdas without having to modify all concrete classes. Here, since forEach() has been added to Iterable, all concrete classes will by default get the same implementation and rework by API writers is avoided.

In this case, the forEach() method is actually implemented using an active iterator.  Collection classes that implement Iterable (for example, all list and set classes) now have a forEach() method. This method takes a single parameter that is a functional interface. Therefore the actual parameter passed to the forEach() method is a candidate for a Lambda expression. Using the features of Java 8, the code would be:

Code snippet 6:

List<String>  languages = new LinkedList<>(); 

 

      // ... add some names to the collection       languages.forEach(language -> System.out.println(language));

Here the value passed to the method is a Lambda expression which actually is an implicit implementation of a functional interface having one method which takes String parameter and prints the value. Lambdas are sort of anonymous functions that eliminate the need of anonymous inner classes. They provide the class on-the-fly.

Using anon. classes it would be:

languages.forEach(new Consumer<String>() { 
public void accept(String language) { 
System.out.println(language); 
} 
}); 
Here is the definition of the brand new forEach() in the Iterable interface.
@FunctionalInterface public interface Iterable<T> { Iterator<T> iterator();   default void forEach(Consumer<? super T> action) {             Objects.requireNonNull(action);             for (T t : this) {                         action.accept(t);             }      } }

Consumer is one of the Functional Interfaces introduced with the java.util.function package. It has one abstract method void accept(T t) which represents an operation that accepts a single input argument and returns no result. Our Lambda also provides the String name and needs no result, so it can be used as an implicit instance of a Consumer. Lambdas and default methods are detailed topics which we will not cover here.

And what about fail-fast/fail-safe?  Well, internally forEach() uses the same iterator provided by LinkedList so it will also throw ConcurrentModificationException.

Code snippet 7:

List<String> languages = new LinkedList<>(); 
languages.add("C++"); 
languages.add("Java"); 
languages.add("Perl"); 
languages.add("Python"); 
languages.forEach(language -> { System.out.println(language);  languages.add("C");   } );

Output :
C++
Exception in thread "main" java.util.ConcurrentModificationException
at java.util.LinkedList$ListItr.checkForComodification(Unknown Source)
at java.util.LinkedList$ListItr.next(Unknown Source)
at java.lang.Iterable.forEach(Unknown Source)
at FailFastExample.main(FailFastExample.java:30)

In this case, using CopyOnWriteArrayList in place of LinkedList will not throw exception.

Difference between the external iterator in Java 7 and the internal iterator is that in the Java 7 iterators the loop structure controls the iteration, and during each pass through the loop, an object is retrieved from the list and then printed. In code 6,  there is no explicit loop. We simply tell the forEach() method what to do with the objects in the list — in this case we simply print the object. Control of the iteration resides within the forEach() method.

Where external iteration mixes the “what” (printing) and the “how” (for loop/ iterator), internal iteration lets the client to provide only the “what” but lets the library control the “how”. This offers several potential benefits: client code can be clearer because it need only focus on stating the problem, not the details of how to go about solving it, and we can move complex optimization code into libraries where it can benefit all users.

 

Streams

Java 8 also introduced the concept of Streams. A stream is a mechanism for carrying a sequence of values from a source through a pipeline of operations. A stream pipeline consists of the stream source, intermediate operations that transform the stream and produce a new stream, and a terminal operation that either produces a result or calls the forEach() method.  A stream carries intermediate live data from a source but does not store data itself. For example, we want to count the number of languages that begin with the letter C. We can write:

Code snippet 8:

List<String> languages = new LinkedList<>(); 
// ... add some names to the  collection 
            long count = languages.stream() 
  .filter(language -> language.startsWith("C")) 
  .count();

Streams output can be send to a forEach() as terminal expression, i.e. one that is at the end of the pipeline and produces a final non-stream result.

Code snippet 9:
String[] stringArray = new String[]{"Streams", "can", "be", "created", "from", "arrays"};
Arrays.stream(stringArray).forEach(token -> System.out.println(token));

Conclusion: Java has moved from simple external iterators to more complex Lambda and Streams-supporting forEach() internal iterator. It also now supports parallel stream which we have not covered here as streams in itself is advanced topic and would require detailed explanation along with default methods and functional interfaces.

 

About Author

Author (Shekhar Mitra) is working at Accenture Services Pvt. Ltd. as System Analyst and main interest is in the field of Java/J2EE and related technologies.








}