Stream Operations

Streams, introduced in Java 8, provide a powerful and expressive way to process collections of data. They enable functional-style operations on collections, allowing for concise and readable code while supporting parallel execution.

A stream is a sequence of elements supporting sequential and parallel aggregate operations. Streams are not data structures; they take input from collections, arrays, or I/O channels and don’t modify the original data source.

  • Not a data structure: Streams don’t store elements; they convey elements from a source through a pipeline of operations
  • Functional in nature: Operations produce results without modifying the source
  • Laziness-seeking: Many stream operations are lazy, allowing for optimization
  • Possibly unbounded: Streams can represent infinite sequences
  • Consumable: Elements are visited only once during the life of a stream
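
The "consumable" property can be observed directly: once a terminal operation has run, any further use of the same stream object throws an IllegalStateException. A minimal sketch (the class name is just for illustration):

```java
import java.util.Arrays;
import java.util.stream.Stream;

public class StreamReuseDemo {
    public static void main(String[] args) {
        Stream<String> names = Arrays.asList("a", "b", "c").stream();
        System.out.println(names.count()); // terminal operation consumes the stream
        try {
            names.forEach(System.out::println); // second use is illegal
        } catch (IllegalStateException e) {
            System.out.println("stream already consumed");
        }
    }
}
```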

A stream pipeline consists of:

  1. Source: Where the stream comes from (collection, array, generator function, I/O channel)
  2. Intermediate operations: Transform the stream into another stream (filter, map, sorted, etc.)
  3. Terminal operation: Produces a result or side-effect (collect, forEach, reduce, etc.)
// Stream pipeline example
List<String> result = people.stream()             // Source
    .filter(p -> p.getAge() > 18)                 // Intermediate operation
    .map(Person::getName)                         // Intermediate operation
    .collect(Collectors.toList());                // Terminal operation
List<String> list = Arrays.asList("apple", "banana", "cherry");
// Create a stream from a collection
Stream<String> stream = list.stream();
// Create a parallel stream
Stream<String> parallelStream = list.parallelStream();
String[] array = {"apple", "banana", "cherry"};
// Create a stream from an array
Stream<String> arrayStream = Arrays.stream(array);
// Create a stream from a slice of the array (start inclusive, end exclusive)
Stream<String> partialStream = Arrays.stream(array, 1, 3); // banana, cherry
// Empty stream
Stream<String> emptyStream = Stream.empty();
// Stream with explicit values
Stream<String> valueStream = Stream.of("apple", "banana", "cherry");
// Infinite streams
Stream<Integer> infiniteIntegers = Stream.iterate(0, n -> n + 1);
Stream<Double> infiniteRandoms = Stream.generate(Math::random);
// Bounded streams from infinite sources
Stream<Integer> first10Integers = Stream.iterate(0, n -> n + 1).limit(10);
// From a file (lines as stream)
try (Stream<String> lines = Files.lines(Paths.get("file.txt"))) {
    // Process lines
}
// From a string (characters as stream)
IntStream charStream = "hello".chars();
// From a regex pattern
Pattern pattern = Pattern.compile(",");
Stream<String> patternStream = pattern.splitAsStream("a,b,c");
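
For numeric data, the primitive stream types (IntStream, LongStream, DoubleStream) have their own factory methods that avoid boxing. A small self-contained sketch:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class PrimitiveStreamDemo {
    public static void main(String[] args) {
        // Half-open range: 0, 1, 2, 3, 4
        int[] range = IntStream.range(0, 5).toArray();
        System.out.println(Arrays.toString(range)); // [0, 1, 2, 3, 4]
        // Closed range: 1 through 5 inclusive
        System.out.println(IntStream.rangeClosed(1, 5).sum()); // 15
        // boxed() converts to a Stream<Integer> when an object stream is needed
        System.out.println(IntStream.range(0, 3).boxed().count()); // 3
    }
}
```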

Intermediate operations transform a stream into another stream. They are lazy and only executed when a terminal operation is invoked.

List<Person> people = getPeople(); // Assume this returns a list of Person objects
// Filter by predicate
Stream<Person> adults = people.stream()
    .filter(person -> person.getAge() >= 18);
// Limit number of elements (note: a stream can be consumed only once,
// so each example builds a fresh pipeline rather than reusing `adults`)
Stream<Person> first10Adults = people.stream()
    .filter(person -> person.getAge() >= 18)
    .limit(10);
// Skip elements
Stream<Person> skippedAdults = people.stream()
    .filter(person -> person.getAge() >= 18)
    .skip(5);
// Remove duplicates
Stream<String> uniqueNames = people.stream()
    .map(Person::getName)
    .distinct();
// Transform each element
Stream<String> names = people.stream()
    .map(Person::getName);
// Transform to primitives (avoids boxing/unboxing)
IntStream ages = people.stream()
    .mapToInt(Person::getAge);
// Flatten nested streams
List<List<String>> nestedList = Arrays.asList(
    Arrays.asList("a", "b"),
    Arrays.asList("c", "d")
);
Stream<String> flattenedStream = nestedList.stream()
    .flatMap(Collection::stream); // [a, b, c, d]
// Natural-order sorting
Stream<String> sortedNames = people.stream()
    .map(Person::getName)
    .sorted();
// Custom comparator
Stream<Person> sortedByAge = people.stream()
    .sorted(Comparator.comparingInt(Person::getAge));
// Reverse order
Stream<Person> sortedByAgeDesc = people.stream()
    .sorted(Comparator.comparingInt(Person::getAge).reversed());
// Multiple criteria
Stream<Person> sortedComplex = people.stream()
    .sorted(Comparator.comparing(Person::getLastName)
        .thenComparing(Person::getFirstName)
        .thenComparingInt(Person::getAge));
// peek is useful for debugging (note: after map the element type is String)
Stream<String> peekStream = people.stream()
    .filter(p -> p.getAge() > 30)
    .peek(p -> System.out.println("Filtered: " + p.getName()))
    .map(Person::getName)
    .peek(name -> System.out.println("Mapped: " + name));
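
Java 9 added two more intermediate operations, takeWhile and dropWhile, which cut an ordered stream at the first element that fails the predicate. A minimal sketch:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TakeDropDemo {
    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3, 10, 4, 5);
        // takeWhile: keeps elements until the predicate first fails
        List<Integer> taken = nums.stream()
            .takeWhile(n -> n < 5)
            .collect(Collectors.toList());
        System.out.println(taken);   // [1, 2, 3]
        // dropWhile: drops that prefix, keeps everything after it
        List<Integer> dropped = nums.stream()
            .dropWhile(n -> n < 5)
            .collect(Collectors.toList());
        System.out.println(dropped); // [10, 4, 5]
    }
}
```

Unlike filter, both stop evaluating the predicate at the first failure, so later elements (4 and 5 here) pass through dropWhile untouched.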

Terminal operations produce a result or side-effect and end the stream pipeline.

// Collect to a List
List<String> nameList = people.stream()
    .map(Person::getName)
    .collect(Collectors.toList());
// Collect to a Set
Set<String> nameSet = people.stream()
    .map(Person::getName)
    .collect(Collectors.toSet());
// Collect to a Map
Map<String, Integer> nameToAgeMap = people.stream()
    .collect(Collectors.toMap(
        Person::getName, // Key mapper
        Person::getAge   // Value mapper
    ));
// Handle duplicate keys
Map<String, Person> lastNameToPerson = people.stream()
    .collect(Collectors.toMap(
        Person::getLastName,
        Function.identity(),
        (existing, replacement) -> existing // Keep first occurrence on duplicate
    ));
// Collect to a specific collection type
LinkedList<String> linkedList = people.stream()
    .map(Person::getName)
    .collect(Collectors.toCollection(LinkedList::new));
// Count
long count = people.stream().count();
// Any/All/None match
boolean anyAdults = people.stream().anyMatch(p -> p.getAge() >= 18);
boolean allAdults = people.stream().allMatch(p -> p.getAge() >= 18);
boolean noAdults = people.stream().noneMatch(p -> p.getAge() >= 18);
// Find operations
Optional<Person> anyPerson = people.stream().findAny();
Optional<Person> firstPerson = people.stream().findFirst();
// Min/Max
Optional<Person> oldest = people.stream()
    .max(Comparator.comparingInt(Person::getAge));
Optional<Person> youngest = people.stream()
    .min(Comparator.comparingInt(Person::getAge));
// Reduce
Optional<Integer> totalAge = people.stream()
    .map(Person::getAge)
    .reduce(Integer::sum);
// Reduce with an identity value
int totalAgeWithIdentity = people.stream()
    .map(Person::getAge)
    .reduce(0, Integer::sum);
// Reduce with a combiner (required when the result type differs from the
// element type; the combiner merges partial results in parallel execution)
int totalAgeParallel = people.stream()
    .reduce(0,
        (subtotal, person) -> subtotal + person.getAge(), // Accumulator
        Integer::sum                                      // Combiner
    );
// ForEach (side-effect operation)
people.stream().forEach(System.out::println);
// ForEachOrdered (maintains encounter order in parallel streams)
people.parallelStream().forEachOrdered(System.out::println);
// ToArray
Person[] personArray = people.stream().toArray(Person[]::new);
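
The reduce overloads above can be exercised with plain values; a self-contained sketch (the class name is illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ReduceDemo {
    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(25, 30, 45);
        // Without an identity: result is an Optional (empty for an empty stream)
        Optional<Integer> maybeTotal = ages.stream().reduce(Integer::sum);
        System.out.println(maybeTotal.orElse(0)); // 100
        // With an identity: result is a plain value, 0 for an empty stream
        int total = ages.stream().reduce(0, Integer::sum);
        System.out.println(total); // 100
        // Three-argument form: result type (int) differs from element type (String)
        int chars = Arrays.asList("ab", "cde").stream()
            .reduce(0, (sum, s) -> sum + s.length(), Integer::sum);
        System.out.println(chars); // 5
    }
}
```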

The Collectors utility class provides many powerful collectors for common reduction operations.

// Group by a classifier
Map<String, List<Person>> peopleByLastName = people.stream()
    .collect(Collectors.groupingBy(Person::getLastName));
// Group by with a downstream collector
Map<String, Long> countByLastName = people.stream()
    .collect(Collectors.groupingBy(
        Person::getLastName,
        Collectors.counting()
    ));
// Multi-level grouping
Map<String, Map<Integer, List<Person>>> peopleByLastNameAndAge = people.stream()
    .collect(Collectors.groupingBy(
        Person::getLastName,
        Collectors.groupingBy(Person::getAge)
    ));
// Partition by a boolean predicate (a special case of groupingBy)
Map<Boolean, List<Person>> adultPartition = people.stream()
    .collect(Collectors.partitioningBy(p -> p.getAge() >= 18));
// Access partitioned results
List<Person> adults = adultPartition.get(true);
List<Person> minors = adultPartition.get(false);
// Counting
long personCount = people.stream()
    .collect(Collectors.counting());
// Summing
int totalAge = people.stream()
    .collect(Collectors.summingInt(Person::getAge));
// Averaging
double averageAge = people.stream()
    .collect(Collectors.averagingInt(Person::getAge));
// Min/Max by comparator
Optional<Person> oldestPerson = people.stream()
    .collect(Collectors.maxBy(Comparator.comparingInt(Person::getAge)));
// Statistics summary
IntSummaryStatistics ageStats = people.stream()
    .collect(Collectors.summarizingInt(Person::getAge));
System.out.println("Count: " + ageStats.getCount());
System.out.println("Sum: " + ageStats.getSum());
System.out.println("Min: " + ageStats.getMin());
System.out.println("Max: " + ageStats.getMax());
System.out.println("Average: " + ageStats.getAverage());
// Simple joining
String allNames = people.stream()
    .map(Person::getName)
    .collect(Collectors.joining());
// Joining with a delimiter
String commaSeparatedNames = people.stream()
    .map(Person::getName)
    .collect(Collectors.joining(", "));
// Joining with a delimiter, prefix, and suffix
String formattedNames = people.stream()
    .map(Person::getName)
    .collect(Collectors.joining(", ", "[", "]"));
// collectingAndThen: post-process a collector's result (ImmutableList is from Guava)
ImmutableList<Person> immutablePeople = people.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.toList(),
        ImmutableList::copyOf
    ));
// toMap with Function.identity() to index objects by a property
Map<String, Person> nameToPersonMap = people.stream()
    .collect(Collectors.toMap(
        Person::getName,
        Function.identity()
    ));
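
Another frequently useful downstream collector is Collectors.mapping, which transforms elements before they reach the downstream collector. A self-contained sketch using strings instead of the Person class:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MappingCollectorDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "cherry");
        // groupingBy with a mapping downstream: keep only word lengths per group
        Map<Character, List<Integer>> lengthsByInitial = words.stream()
            .collect(Collectors.groupingBy(
                w -> w.charAt(0),
                Collectors.mapping(String::length, Collectors.toList())
            ));
        System.out.println(lengthsByInitial.get('a')); // [5, 7]
        System.out.println(lengthsByInitial.get('b')); // [6]
    }
}
```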

Parallel streams leverage the fork/join framework to execute operations in parallel across multiple threads.

// Create a parallel stream from a collection
List<String> result = people.parallelStream()
    .filter(p -> p.getAge() > 18)
    .map(Person::getName)
    .collect(Collectors.toList());
// Convert a sequential stream to parallel
Stream<Person> parallelStream = people.stream().parallel();
// Convert a parallel stream to sequential
Stream<Person> sequentialStream = people.parallelStream().sequential();

Good candidates for parallel streams:

  • Large data sets
  • Operations that are computationally intensive
  • Operations where elements can be processed independently
  • When you have enough cores available

Poor candidates for parallel streams:

  • Small data sets (overhead may exceed benefits)
  • Operations with side effects
  • Operations that rely on encounter order
  • When using sources or sinks that are inherently sequential
// Example: CPU-intensive calculation on a large dataset
long sum = IntStream.range(0, 10_000_000)
    .parallel()
    .map(n -> performExpensiveComputation(n))
    .sum();
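
Because summation is associative, a parallel reduction produces the same result as the sequential one; a quick, runnable sanity check:

```java
import java.util.stream.LongStream;

public class ParallelSumDemo {
    public static void main(String[] args) {
        long sequential = LongStream.rangeClosed(1, 1_000_000).sum();
        long parallel = LongStream.rangeClosed(1, 1_000_000).parallel().sum();
        System.out.println(sequential == parallel); // true
        System.out.println(sequential);             // 500000500000
    }
}
```

This equivalence holds only for associative, non-interfering operations; order-dependent logic would not survive parallelization unchanged.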
// Find all unique, non-empty, lowercase words
List<String> words = Arrays.asList("Hello", "World", "", "Java", "hello", "world");
List<String> uniqueLowercase = words.stream()
    .filter(s -> !s.isEmpty())
    .map(String::toLowerCase)
    .distinct()
    .collect(Collectors.toList());
// Result: [hello, world, java]
// Find the first person over 30 with a specific last name
Optional<Person> match = people.stream()
    .filter(p -> p.getAge() > 30)
    .filter(p -> "Smith".equals(p.getLastName()))
    .findFirst();
// Check if any person is from a specific city
boolean anyFromNewYork = people.stream()
    .anyMatch(p -> "New York".equals(p.getCity()));
// Calculate total salary by department
Map<String, Double> totalSalaryByDept = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        Collectors.summingDouble(Employee::getSalary)
    ));
// Find the highest-paid employee by department
Map<String, Optional<Employee>> topEarnerByDept = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        Collectors.maxBy(Comparator.comparingDouble(Employee::getSalary))
    ));
// Convert a list of employees to a map of employee ID to employee
Map<Long, Employee> employeeMap = employees.stream()
    .collect(Collectors.toMap(
        Employee::getId,
        Function.identity()
    ));
// Extract a specific property from all objects
List<String> allEmails = employees.stream()
    .map(Employee::getEmail)
    .collect(Collectors.toList());
// Find the average salary by department for employees older than 30
Map<String, Double> avgSalaryByDept = employees.stream()
    .filter(e -> e.getAge() > 30)
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        Collectors.averagingDouble(Employee::getSalary)
    ));
// Count employees by department and position
Map<String, Map<String, Long>> countByDeptAndPosition = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        Collectors.groupingBy(
            Employee::getPosition,
            Collectors.counting()
        )
    ));
// Collect stream results into a custom summary object
class DepartmentSummary {
    private final String name;
    private final long employeeCount;
    private final double totalSalary;
    private final double averageSalary;
    // Constructor, getters...
}
List<DepartmentSummary> summaries = employees.stream()
    .collect(Collectors.groupingBy(Employee::getDepartment))
    .entrySet().stream()
    .map(entry -> {
        String dept = entry.getKey();
        List<Employee> deptEmployees = entry.getValue();
        long count = deptEmployees.size();
        double totalSalary = deptEmployees.stream()
            .mapToDouble(Employee::getSalary)
            .sum();
        return new DepartmentSummary(dept, count, totalSalary, totalSalary / count);
    })
    .collect(Collectors.toList());

1. What is the difference between intermediate and terminal operations in streams?

Answer:

  • Intermediate operations (like filter, map, sorted) transform a stream into another stream, are lazy (not executed until a terminal operation is invoked), and can be chained together.
  • Terminal operations (like collect, forEach, reduce) produce a result or side-effect, trigger the execution of the stream pipeline, and end the stream (it cannot be reused after).

2. Explain the difference between map and flatMap operations.

Answer:

  • map transforms each element of a stream into exactly one element of another stream using a one-to-one mapping function.
  • flatMap transforms each element of a stream into zero or more elements of another stream using a one-to-many mapping function, and then flattens these multiple streams into a single stream.
// map: Stream<T> -> Stream<R> (one-to-one)
Stream<String> upperCaseNames = names.stream()
    .map(String::toUpperCase);
// flatMap: Stream<T> -> Stream<Stream<R>> -> Stream<R> (one-to-many + flattening)
Stream<String> allWords = sentences.stream()
    .flatMap(sentence -> Arrays.stream(sentence.split(" ")));
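
The snippet above assumes existing `names` and `sentences` lists; a fully self-contained version that makes the nesting difference visible:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapVsFlatMapDemo {
    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("hello world", "java streams");
        // map: one sentence -> one String[] (the result is still nested)
        List<String[]> arrays = sentences.stream()
            .map(s -> s.split(" "))
            .collect(Collectors.toList());
        System.out.println(arrays.size()); // 2 (two arrays, not four words)
        // flatMap: one sentence -> many words, flattened into one stream
        List<String> words = sentences.stream()
            .flatMap(s -> Arrays.stream(s.split(" ")))
            .collect(Collectors.toList());
        System.out.println(words); // [hello, world, java, streams]
    }
}
```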

3. What is the purpose of the collect operation?

Answer: The collect operation is a terminal operation that transforms elements of a stream into a different form by accumulating them into a collection, summarizing them, or performing some other aggregation operation. It uses a Collector which encapsulates the functions needed to accumulate elements into a mutable result container and optionally transform the result.

4. How do parallel streams work?

Answer: Parallel streams leverage the fork/join framework to execute operations in parallel across multiple threads. When a parallel stream is created, the stream is split into multiple substreams, operations are performed on these substreams in parallel, and then the results are combined. This can lead to performance improvements for large data sets and computationally intensive operations, but introduces complexities like non-deterministic ordering and thread-safety concerns.

5. What are the potential issues with using parallel streams?

Answer:

  • Thread safety: Operations must be stateless and non-interfering
  • Overhead: For small data sets, the overhead of parallelization may exceed benefits
  • Non-deterministic ordering: Results may come in different order than the input
  • Common pool contention: Uses the common ForkJoinPool which might be used by other parts of the application
  • Cost of merging results: Some operations are expensive to merge across threads

6. Explain the difference between findFirst and findAny.

Answer:

  • findFirst() returns an Optional describing the first element of the stream, respecting the encounter order if defined.
  • findAny() returns any element of the stream, without guaranteeing which one. In parallel streams, findAny() may be more efficient as it doesn’t require coordination to respect encounter order.
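
A small runnable illustration of the distinction (the class name is illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FindDemo {
    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3, 4);
        // findFirst: always the first match in encounter order
        Optional<Integer> first = nums.stream().filter(n -> n > 2).findFirst();
        System.out.println(first.get()); // 3
        // findAny makes no ordering promise; on a parallel stream it may return 3 or 4
        Optional<Integer> any = nums.parallelStream().filter(n -> n > 2).findAny();
        System.out.println(any.isPresent()); // true
    }
}
```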

7. How would you implement a custom collector?

Answer: A custom collector can be implemented by providing the four functions required by the Collector interface:

public static <T> Collector<T, ?, CustomResult> toCustomResult() {
    return Collector.of(
        CustomAccumulator::new,         // Supplier: creates the accumulator
        CustomAccumulator::add,         // Accumulator: adds an element
        CustomAccumulator::combine,     // Combiner: combines two accumulators
        CustomAccumulator::finishResult // Finisher: final transformation
    );
}

Alternatively, you can use Collectors.collectingAndThen() to transform the result of an existing collector.
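
As a concrete (if simplified) instance of the Collector.of pattern above, here is a runnable collector that joins strings with a comma, with all four functions spelled out:

```java
import java.util.Arrays;
import java.util.stream.Collector;

public class CustomCollectorDemo {
    // Joins elements with a comma; equivalent in effect to Collectors.joining(",")
    static Collector<String, StringBuilder, String> commaJoiner() {
        return Collector.of(
            StringBuilder::new,                // Supplier: new mutable container
            (sb, s) -> {                       // Accumulator: append one element
                if (sb.length() > 0) sb.append(',');
                sb.append(s);
            },
            (left, right) -> {                 // Combiner: merge two containers
                if (left.length() > 0 && right.length() > 0) left.append(',');
                return left.append(right);
            },
            StringBuilder::toString            // Finisher: container -> result
        );
    }

    public static void main(String[] args) {
        String joined = Arrays.asList("a", "b", "c").stream().collect(commaJoiner());
        System.out.println(joined); // a,b,c
    }
}
```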

8. What is the difference between reduce and collect operations?

Answer:

  • reduce combines elements into a single result by repeatedly applying a combining operation. It’s best for operations where the result type is the same as the element type or when combining elements in a simple way.
  • collect is more versatile and designed for mutable reduction operations that accumulate elements into a mutable container. It’s better for complex aggregations, especially when the result type differs from the element type.

9. How can you avoid ConcurrentModificationException when using streams with collections?

Answer: Streams do not operate on a snapshot of the collection; they require a non-interfering source. For non-concurrent collections, structurally modifying the source while the pipeline is executing can itself throw a ConcurrentModificationException or produce unpredictable results. If you need to modify a collection based on stream results, separate the query from the modification:

// Incorrect - may cause ConcurrentModificationException
list.forEach(item -> {
    if (someCondition(item)) {
        list.remove(item); // Modifying the collection during iteration
    }
});
// Correct - collect the elements to remove first, then remove them
List<Item> itemsToRemove = list.stream()
    .filter(this::someCondition)
    .collect(Collectors.toList());
list.removeAll(itemsToRemove);
// Alternatively, use removeIf (Java 8+)
list.removeIf(this::someCondition);
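
The snippet above relies on hypothetical Item and someCondition definitions; a fully self-contained demonstration of the removeIf approach:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveIfDemo {
    public static void main(String[] args) {
        List<Integer> nums = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        nums.removeIf(n -> n % 2 == 0); // safe in-place conditional removal
        System.out.println(nums); // [1, 3, 5]
    }
}
```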

10. How would you debug a stream pipeline?

Answer: The peek() operation is useful for debugging streams as it allows you to perform an action on each element without modifying the stream:

List<String> result = people.stream()
    .filter(p -> p.getAge() > 30)
    .peek(p -> System.out.println("After filter: " + p))
    .map(Person::getName)
    .peek(name -> System.out.println("After map: " + name))
    .collect(Collectors.toList());

Other debugging approaches include:

  • Breaking complex pipelines into smaller parts
  • Using logging instead of System.out.println in peek
  • Using a debugger with conditional breakpoints
  • Converting intermediate results to collections for inspection