Stream Operations

Streams, introduced in Java 8, provide a powerful and expressive way to process collections of data. They enable functional-style operations on collections, allowing for concise and readable code while supporting parallel execution.

A stream is a sequence of elements supporting sequential and parallel aggregate operations. Streams are not data structures; they take input from collections, arrays, or I/O channels and don’t modify the original data source.

  • Not a data structure: Streams don’t store elements; they convey elements from a source through a pipeline of operations
  • Functional in nature: Operations produce results without modifying the source
  • Laziness-seeking: Many stream operations are lazy, allowing for optimization
  • Possibly unbounded: Streams can represent infinite sequences
  • Consumable: Elements are visited only once during the life of a stream
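
The "consumable" property can be observed directly: once a terminal operation has run, any further use of the same stream object throws an IllegalStateException. A minimal sketch (the class name is just for illustration):

```java
import java.util.Arrays;
import java.util.stream.Stream;

public class StreamReuseDemo {
    public static void main(String[] args) {
        Stream<String> names = Arrays.asList("a", "b", "c").stream();
        System.out.println(names.count()); // terminal operation consumes the stream
        try {
            names.forEach(System.out::println); // second use is illegal
        } catch (IllegalStateException e) {
            System.out.println("stream already consumed");
        }
    }
}
```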

A stream pipeline consists of:

  1. Source: Where the stream comes from (collection, array, generator function, I/O channel)
  2. Intermediate operations: Transform the stream into another stream (filter, map, sorted, etc.)
  3. Terminal operation: Produces a result or side-effect (collect, forEach, reduce, etc.)
// Stream pipeline example
List<String> result = people.stream()             // Source
    .filter(p -> p.getAge() > 18)                 // Intermediate operation
    .map(Person::getName)                         // Intermediate operation
    .collect(Collectors.toList());                // Terminal operation
List<String> list = Arrays.asList("apple", "banana", "cherry");
// Create a stream from a collection
Stream<String> stream = list.stream();
// Create a parallel stream
Stream<String> parallelStream = list.parallelStream();
String[] array = {"apple", "banana", "cherry"};
// Create a stream from an array
Stream<String> arrayStream = Arrays.stream(array);
// Create a stream from a slice of the array (start inclusive, end exclusive)
Stream<String> partialStream = Arrays.stream(array, 1, 3); // banana, cherry
// Empty stream
Stream<String> emptyStream = Stream.empty();
// Stream with explicit values
Stream<String> valueStream = Stream.of("apple", "banana", "cherry");
// Infinite streams
Stream<Integer> infiniteIntegers = Stream.iterate(0, n -> n + 1);
Stream<Double> infiniteRandoms = Stream.generate(Math::random);
// Bounded streams from infinite sources
Stream<Integer> first10Integers = Stream.iterate(0, n -> n + 1).limit(10);
// From a file (lines as stream)
try (Stream<String> lines = Files.lines(Paths.get("file.txt"))) {
    // Process lines
}
// From a string (characters as stream)
IntStream charStream = "hello".chars();
// From a regex pattern
Pattern pattern = Pattern.compile(",");
Stream<String> patternStream = pattern.splitAsStream("a,b,c");
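
For numeric data, the primitive stream types (IntStream, LongStream, DoubleStream) have their own factory methods that avoid boxing. A small self-contained sketch:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class PrimitiveStreamDemo {
    public static void main(String[] args) {
        // Half-open range: 0, 1, 2, 3, 4
        int[] range = IntStream.range(0, 5).toArray();
        System.out.println(Arrays.toString(range)); // [0, 1, 2, 3, 4]
        // Closed range: 1 through 5 inclusive
        System.out.println(IntStream.rangeClosed(1, 5).sum()); // 15
        // boxed() converts to a Stream<Integer> when an object stream is needed
        System.out.println(IntStream.range(0, 3).boxed().count()); // 3
    }
}
```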

Intermediate operations transform a stream into another stream. They are lazy and only executed when a terminal operation is invoked.

List<Person> people = getPeople(); // Assume this returns a list of Person objects
// Filter by predicate
Stream<Person> adults = people.stream()
    .filter(person -> person.getAge() >= 18);
// Limit number of elements (note: a stream can be consumed only once,
// so each example builds a fresh pipeline rather than reusing `adults`)
Stream<Person> first10Adults = people.stream()
    .filter(person -> person.getAge() >= 18)
    .limit(10);
// Skip elements
Stream<Person> skippedAdults = people.stream()
    .filter(person -> person.getAge() >= 18)
    .skip(5);
// Remove duplicates
Stream<String> uniqueNames = people.stream()
    .map(Person::getName)
    .distinct();
// Transform each element
Stream<String> names = people.stream()
    .map(Person::getName);
// Transform to primitives (avoids boxing/unboxing)
IntStream ages = people.stream()
    .mapToInt(Person::getAge);
// Flatten nested streams
List<List<String>> nestedList = Arrays.asList(
    Arrays.asList("a", "b"),
    Arrays.asList("c", "d")
);
Stream<String> flattenedStream = nestedList.stream()
    .flatMap(Collection::stream); // [a, b, c, d]
// Natural-order sorting
Stream<String> sortedNames = people.stream()
    .map(Person::getName)
    .sorted();
// Custom comparator
Stream<Person> sortedByAge = people.stream()
    .sorted(Comparator.comparingInt(Person::getAge));
// Reverse order
Stream<Person> sortedByAgeDesc = people.stream()
    .sorted(Comparator.comparingInt(Person::getAge).reversed());
// Multiple criteria
Stream<Person> sortedComplex = people.stream()
    .sorted(Comparator.comparing(Person::getLastName)
        .thenComparing(Person::getFirstName)
        .thenComparingInt(Person::getAge));
// peek is useful for debugging (note: after map the element type is String)
Stream<String> peekStream = people.stream()
    .filter(p -> p.getAge() > 30)
    .peek(p -> System.out.println("Filtered: " + p.getName()))
    .map(Person::getName)
    .peek(name -> System.out.println("Mapped: " + name));
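
Java 9 added two more intermediate operations, takeWhile and dropWhile, which cut an ordered stream at the first element that fails the predicate. A minimal sketch:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TakeDropDemo {
    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3, 10, 4, 5);
        // takeWhile: keeps elements until the predicate first fails
        List<Integer> taken = nums.stream()
            .takeWhile(n -> n < 5)
            .collect(Collectors.toList());
        System.out.println(taken);   // [1, 2, 3]
        // dropWhile: drops that prefix, keeps everything after it
        List<Integer> dropped = nums.stream()
            .dropWhile(n -> n < 5)
            .collect(Collectors.toList());
        System.out.println(dropped); // [10, 4, 5]
    }
}
```

Unlike filter, both stop evaluating the predicate at the first failure, so later elements (4 and 5 here) pass through dropWhile untouched.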

Terminal operations produce a result or side-effect and end the stream pipeline.

// Collect to a List
List<String> nameList = people.stream()
    .map(Person::getName)
    .collect(Collectors.toList());
// Collect to a Set
Set<String> nameSet = people.stream()
    .map(Person::getName)
    .collect(Collectors.toSet());
// Collect to a Map
Map<String, Integer> nameToAgeMap = people.stream()
    .collect(Collectors.toMap(
        Person::getName, // Key mapper
        Person::getAge   // Value mapper
    ));
// Handle duplicate keys
Map<String, Person> lastNameToPerson = people.stream()
    .collect(Collectors.toMap(
        Person::getLastName,
        Function.identity(),
        (existing, replacement) -> existing // Keep first occurrence on duplicate
    ));
// Collect to a specific collection type
LinkedList<String> linkedList = people.stream()
    .map(Person::getName)
    .collect(Collectors.toCollection(LinkedList::new));
// Count
long count = people.stream().count();
// Any/All/None match
boolean anyAdults = people.stream().anyMatch(p -> p.getAge() >= 18);
boolean allAdults = people.stream().allMatch(p -> p.getAge() >= 18);
boolean noAdults = people.stream().noneMatch(p -> p.getAge() >= 18);
// Find operations
Optional<Person> anyPerson = people.stream().findAny();
Optional<Person> firstPerson = people.stream().findFirst();
// Min/Max
Optional<Person> oldest = people.stream()
    .max(Comparator.comparingInt(Person::getAge));
Optional<Person> youngest = people.stream()
    .min(Comparator.comparingInt(Person::getAge));
// Reduce
Optional<Integer> totalAge = people.stream()
    .map(Person::getAge)
    .reduce(Integer::sum);
// Reduce with an identity value
int totalAgeWithIdentity = people.stream()
    .map(Person::getAge)
    .reduce(0, Integer::sum);
// Reduce with a combiner (required when the result type differs from the
// element type; the combiner merges partial results in parallel execution)
int totalAgeParallel = people.stream()
    .reduce(0,
        (subtotal, person) -> subtotal + person.getAge(), // Accumulator
        Integer::sum                                      // Combiner
    );
// ForEach (side-effect operation)
people.stream().forEach(System.out::println);
// ForEachOrdered (maintains encounter order in parallel streams)
people.parallelStream().forEachOrdered(System.out::println);
// ToArray
Person[] personArray = people.stream().toArray(Person[]::new);
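
The reduce overloads above can be exercised with plain values; a self-contained sketch (the class name is illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ReduceDemo {
    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(25, 30, 45);
        // Without an identity: result is an Optional (empty for an empty stream)
        Optional<Integer> maybeTotal = ages.stream().reduce(Integer::sum);
        System.out.println(maybeTotal.orElse(0)); // 100
        // With an identity: result is a plain value, 0 for an empty stream
        int total = ages.stream().reduce(0, Integer::sum);
        System.out.println(total); // 100
        // Three-argument form: result type (int) differs from element type (String)
        int chars = Arrays.asList("ab", "cde").stream()
            .reduce(0, (sum, s) -> sum + s.length(), Integer::sum);
        System.out.println(chars); // 5
    }
}
```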

The Collectors utility class provides many powerful collectors for common reduction operations.

// Group by a classifier
Map<String, List<Person>> peopleByLastName = people.stream()
    .collect(Collectors.groupingBy(Person::getLastName));
// Group by with a downstream collector
Map<String, Long> countByLastName = people.stream()
    .collect(Collectors.groupingBy(
        Person::getLastName,
        Collectors.counting()
    ));
// Multi-level grouping
Map<String, Map<Integer, List<Person>>> peopleByLastNameAndAge = people.stream()
    .collect(Collectors.groupingBy(
        Person::getLastName,
        Collectors.groupingBy(Person::getAge)
    ));
// Partition by a boolean predicate (a special case of groupingBy)
Map<Boolean, List<Person>> adultPartition = people.stream()
    .collect(Collectors.partitioningBy(p -> p.getAge() >= 18));
// Access partitioned results
List<Person> adults = adultPartition.get(true);
List<Person> minors = adultPartition.get(false);
// Counting
long personCount = people.stream()
    .collect(Collectors.counting());
// Summing
int totalAge = people.stream()
    .collect(Collectors.summingInt(Person::getAge));
// Averaging
double averageAge = people.stream()
    .collect(Collectors.averagingInt(Person::getAge));
// Min/Max by comparator
Optional<Person> oldestPerson = people.stream()
    .collect(Collectors.maxBy(Comparator.comparingInt(Person::getAge)));
// Statistics summary
IntSummaryStatistics ageStats = people.stream()
    .collect(Collectors.summarizingInt(Person::getAge));
System.out.println("Count: " + ageStats.getCount());
System.out.println("Sum: " + ageStats.getSum());
System.out.println("Min: " + ageStats.getMin());
System.out.println("Max: " + ageStats.getMax());
System.out.println("Average: " + ageStats.getAverage());
// Simple joining
String allNames = people.stream()
    .map(Person::getName)
    .collect(Collectors.joining());
// Joining with a delimiter
String commaSeparatedNames = people.stream()
    .map(Person::getName)
    .collect(Collectors.joining(", "));
// Joining with a delimiter, prefix, and suffix
String formattedNames = people.stream()
    .map(Person::getName)
    .collect(Collectors.joining(", ", "[", "]"));
// collectingAndThen: post-process a collector's result (ImmutableList is from Guava)
ImmutableList<Person> immutablePeople = people.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.toList(),
        ImmutableList::copyOf
    ));
// toMap with Function.identity() to index objects by a property
Map<String, Person> nameToPersonMap = people.stream()
    .collect(Collectors.toMap(
        Person::getName,
        Function.identity()
    ));
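
Another frequently useful downstream collector is Collectors.mapping, which transforms elements before they reach the downstream collector. A self-contained sketch using strings instead of the Person class:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MappingCollectorDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "cherry");
        // groupingBy with a mapping downstream: keep only word lengths per group
        Map<Character, List<Integer>> lengthsByInitial = words.stream()
            .collect(Collectors.groupingBy(
                w -> w.charAt(0),
                Collectors.mapping(String::length, Collectors.toList())
            ));
        System.out.println(lengthsByInitial.get('a')); // [5, 7]
        System.out.println(lengthsByInitial.get('b')); // [6]
    }
}
```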

Parallel streams leverage the fork/join framework to execute operations in parallel across multiple threads.

// Create a parallel stream from a collection
List<String> result = people.parallelStream()
    .filter(p -> p.getAge() > 18)
    .map(Person::getName)
    .collect(Collectors.toList());
// Convert a sequential stream to parallel
Stream<Person> parallelStream = people.stream().parallel();
// Convert a parallel stream to sequential
Stream<Person> sequentialStream = people.parallelStream().sequential();

Good candidates for parallel streams:

  • Large data sets
  • Operations that are computationally intensive
  • Operations where elements can be processed independently
  • When you have enough cores available

Poor candidates for parallel streams:

  • Small data sets (overhead may exceed benefits)
  • Operations with side effects
  • Operations that rely on encounter order
  • When using sources or sinks that are inherently sequential
// Example: CPU-intensive calculation on a large dataset
long sum = IntStream.range(0, 10_000_000)
    .parallel()
    .map(n -> performExpensiveComputation(n))
    .sum();
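
Because summation is associative, a parallel reduction produces the same result as the sequential one; a quick, runnable sanity check:

```java
import java.util.stream.LongStream;

public class ParallelSumDemo {
    public static void main(String[] args) {
        long sequential = LongStream.rangeClosed(1, 1_000_000).sum();
        long parallel = LongStream.rangeClosed(1, 1_000_000).parallel().sum();
        System.out.println(sequential == parallel); // true
        System.out.println(sequential);             // 500000500000
    }
}
```

This equivalence holds only for associative, non-interfering operations; order-dependent logic would not survive parallelization unchanged.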
// Find all unique, non-empty, lowercase words
List<String> words = Arrays.asList("Hello", "World", "", "Java", "hello", "world");
List<String> uniqueLowercase = words.stream()
    .filter(s -> !s.isEmpty())
    .map(String::toLowerCase)
    .distinct()
    .collect(Collectors.toList());
// Result: [hello, world, java]
// Find the first person over 30 with a specific last name
Optional<Person> match = people.stream()
    .filter(p -> p.getAge() > 30)
    .filter(p -> "Smith".equals(p.getLastName()))
    .findFirst();
// Check if any person is from a specific city
boolean anyFromNewYork = people.stream()
    .anyMatch(p -> "New York".equals(p.getCity()));
// Calculate total salary by department
Map<String, Double> totalSalaryByDept = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        Collectors.summingDouble(Employee::getSalary)
    ));
// Find the highest-paid employee by department
Map<String, Optional<Employee>> topEarnerByDept = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        Collectors.maxBy(Comparator.comparingDouble(Employee::getSalary))
    ));
// Convert a list of employees to a map of employee ID to employee
Map<Long, Employee> employeeMap = employees.stream()
    .collect(Collectors.toMap(
        Employee::getId,
        Function.identity()
    ));
// Extract a specific property from all objects
List<String> allEmails = employees.stream()
    .map(Employee::getEmail)
    .collect(Collectors.toList());
// Find the average salary by department for employees older than 30
Map<String, Double> avgSalaryByDept = employees.stream()
    .filter(e -> e.getAge() > 30)
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        Collectors.averagingDouble(Employee::getSalary)
    ));
// Count employees by department and position
Map<String, Map<String, Long>> countByDeptAndPosition = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        Collectors.groupingBy(
            Employee::getPosition,
            Collectors.counting()
        )
    ));
// Collect stream results into a custom summary object
class DepartmentSummary {
    private final String name;
    private final long employeeCount;
    private final double totalSalary;
    private final double averageSalary;
    // Constructor, getters...
}
List<DepartmentSummary> summaries = employees.stream()
    .collect(Collectors.groupingBy(Employee::getDepartment))
    .entrySet().stream()
    .map(entry -> {
        String dept = entry.getKey();
        List<Employee> deptEmployees = entry.getValue();
        long count = deptEmployees.size();
        double totalSalary = deptEmployees.stream()
            .mapToDouble(Employee::getSalary)
            .sum();
        return new DepartmentSummary(dept, count, totalSalary, totalSalary / count);
    })
    .collect(Collectors.toList());

1. What is the difference between intermediate and terminal operations in streams?

Answer:

  • Intermediate operations (like filter, map, sorted) transform a stream into another stream, are lazy (not executed until a terminal operation is invoked), and can be chained together.
  • Terminal operations (like collect, forEach, reduce) produce a result or side-effect, trigger the execution of the stream pipeline, and end the stream (it cannot be reused after).

2. Explain the difference between map and flatMap operations.

Answer:

  • map transforms each element of a stream into exactly one element of another stream using a one-to-one mapping function.
  • flatMap transforms each element of a stream into zero or more elements of another stream using a one-to-many mapping function, and then flattens these multiple streams into a single stream.
// map: Stream<T> -> Stream<R> (one-to-one)
Stream<String> upperCaseNames = names.stream()
    .map(String::toUpperCase);
// flatMap: Stream<T> -> Stream<Stream<R>> -> Stream<R> (one-to-many + flattening)
Stream<String> allWords = sentences.stream()
    .flatMap(sentence -> Arrays.stream(sentence.split(" ")));
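
The snippet above assumes existing `names` and `sentences` lists; a fully self-contained version that makes the nesting difference visible:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapVsFlatMapDemo {
    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("hello world", "java streams");
        // map: one sentence -> one String[] (the result is still nested)
        List<String[]> arrays = sentences.stream()
            .map(s -> s.split(" "))
            .collect(Collectors.toList());
        System.out.println(arrays.size()); // 2 (two arrays, not four words)
        // flatMap: one sentence -> many words, flattened into one stream
        List<String> words = sentences.stream()
            .flatMap(s -> Arrays.stream(s.split(" ")))
            .collect(Collectors.toList());
        System.out.println(words); // [hello, world, java, streams]
    }
}
```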

3. What is the purpose of the collect operation?

Answer: The collect operation is a terminal operation that transforms elements of a stream into a different form by accumulating them into a collection, summarizing them, or performing some other aggregation operation. It uses a Collector which encapsulates the functions needed to accumulate elements into a mutable result container and optionally transform the result.

4. How do parallel streams work?

Answer: Parallel streams leverage the fork/join framework to execute operations in parallel across multiple threads. When a parallel stream is created, the stream is split into multiple substreams, operations are performed on these substreams in parallel, and then the results are combined. This can lead to performance improvements for large data sets and computationally intensive operations, but introduces complexities like non-deterministic ordering and thread-safety concerns.

5. What are the potential issues with using parallel streams?

Answer:

  • Thread safety: Operations must be stateless and non-interfering
  • Overhead: For small data sets, the overhead of parallelization may exceed benefits
  • Non-deterministic ordering: Results may come in different order than the input
  • Common pool contention: Uses the common ForkJoinPool which might be used by other parts of the application
  • Cost of merging results: Some operations are expensive to merge across threads

6. Explain the difference between findFirst and findAny.

Answer:

  • findFirst() returns an Optional describing the first element of the stream, respecting the encounter order if defined.
  • findAny() returns any element of the stream, without guaranteeing which one. In parallel streams, findAny() may be more efficient as it doesn’t require coordination to respect encounter order.
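
A small runnable illustration of the distinction (the class name is illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class FindDemo {
    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3, 4);
        // findFirst: always the first match in encounter order
        Optional<Integer> first = nums.stream().filter(n -> n > 2).findFirst();
        System.out.println(first.get()); // 3
        // findAny makes no ordering promise; on a parallel stream it may return 3 or 4
        Optional<Integer> any = nums.parallelStream().filter(n -> n > 2).findAny();
        System.out.println(any.isPresent()); // true
    }
}
```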

7. How would you implement a custom collector?

Answer: A custom collector can be implemented by providing the four functions required by the Collector interface:

public static <T> Collector<T, ?, CustomResult> toCustomResult() {
    return Collector.of(
        CustomAccumulator::new,         // Supplier: creates the accumulator
        CustomAccumulator::add,         // Accumulator: adds an element
        CustomAccumulator::combine,     // Combiner: combines two accumulators
        CustomAccumulator::finishResult // Finisher: final transformation
    );
}

Alternatively, you can use Collectors.collectingAndThen() to transform the result of an existing collector.
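
As a concrete (if simplified) instance of the Collector.of pattern above, here is a runnable collector that joins strings with a comma, with all four functions spelled out:

```java
import java.util.Arrays;
import java.util.stream.Collector;

public class CustomCollectorDemo {
    // Joins elements with a comma; equivalent in effect to Collectors.joining(",")
    static Collector<String, StringBuilder, String> commaJoiner() {
        return Collector.of(
            StringBuilder::new,                // Supplier: new mutable container
            (sb, s) -> {                       // Accumulator: append one element
                if (sb.length() > 0) sb.append(',');
                sb.append(s);
            },
            (left, right) -> {                 // Combiner: merge two containers
                if (left.length() > 0 && right.length() > 0) left.append(',');
                return left.append(right);
            },
            StringBuilder::toString            // Finisher: container -> result
        );
    }

    public static void main(String[] args) {
        String joined = Arrays.asList("a", "b", "c").stream().collect(commaJoiner());
        System.out.println(joined); // a,b,c
    }
}
```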

8. What is the difference between reduce and collect operations?

Answer:

  • reduce combines elements into a single result by repeatedly applying a combining operation. It’s best for operations where the result type is the same as the element type or when combining elements in a simple way.
  • collect is more versatile and designed for mutable reduction operations that accumulate elements into a mutable container. It’s better for complex aggregations, especially when the result type differs from the element type.

9. How can you avoid ConcurrentModificationException when using streams with collections?

Answer: Streams do not operate on a snapshot of the collection; they require a non-interfering source. For non-concurrent collections, structurally modifying the source while the pipeline is executing can itself throw a ConcurrentModificationException or produce unpredictable results. If you need to modify a collection based on stream results, separate the query from the modification:

// Incorrect - may cause ConcurrentModificationException
list.forEach(item -> {
    if (someCondition(item)) {
        list.remove(item); // Modifying the collection during iteration
    }
});
// Correct - collect the elements to remove first, then remove them
List<Item> itemsToRemove = list.stream()
    .filter(this::someCondition)
    .collect(Collectors.toList());
list.removeAll(itemsToRemove);
// Alternatively, use removeIf (Java 8+)
list.removeIf(this::someCondition);
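
The snippet above relies on hypothetical Item and someCondition definitions; a fully self-contained demonstration of the removeIf approach:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveIfDemo {
    public static void main(String[] args) {
        List<Integer> nums = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));
        nums.removeIf(n -> n % 2 == 0); // safe in-place conditional removal
        System.out.println(nums); // [1, 3, 5]
    }
}
```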

10. How would you debug a stream pipeline?

Answer: The peek() operation is useful for debugging streams as it allows you to perform an action on each element without modifying the stream:

List<String> result = people.stream()
    .filter(p -> p.getAge() > 30)
    .peek(p -> System.out.println("After filter: " + p))
    .map(Person::getName)
    .peek(name -> System.out.println("After map: " + name))
    .collect(Collectors.toList());

Other debugging approaches include:

  • Breaking complex pipelines into smaller parts
  • Using logging instead of System.out.println in peek
  • Using a debugger with conditional breakpoints
  • Converting intermediate results to collections for inspection