Apache Storm Interview Questions And Answers

Here, Coding Compiler shares a list of 35 interview questions on Apache Storm. These questions were asked in job interviews at top multinational companies and were prepared by Storm experts. This list of Apache Storm interview questions and answers will help you crack your next Storm job interview. All the best, and happy learning.

Apache Storm Interview Questions

  1. What is Apache Storm?
  2. What are “spouts” and “bolts” in Storm?
  3. What is a directed acyclic graph (DAG) in Storm?
  4. What are the main components in Storm architecture?
  5. What are Nodes?
  6. What are the Components of Storm?
  7. What are Storm Topologies?
  8. What is the TopologyBuilder class?
  9. How can you kill a topology in Storm?
  10. What happens when Storm kills a topology?

Apache Storm Interview Questions And Answers

1) What is Apache Storm?

A) Apache Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language.

2) What are “spouts” and “bolts” in Storm?

A) Spouts and bolts are the custom components you write for a Storm application. Spouts define the information sources and bolts define the manipulations applied to the data; together they allow batch, distributed processing of streaming data.

3) What is a directed acyclic graph (DAG) in Storm?

A) A Storm application is designed as a “topology” in the shape of a directed acyclic graph (DAG), with spouts and bolts acting as the graph vertices. The edges of the graph are named streams and direct data from one node to another. Together, the topology acts as a data transformation pipeline.
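The graph idea can be sketched in plain Java without any Storm dependency. The class below is a hypothetical model, not part of the Storm API: it records which component feeds which, and uses Kahn's algorithm to list the components in dataflow order, failing if the graph has a cycle. The component names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a topology graph; not part of the Storm API.
final class TopologySketch {
    // component id -> ids of the components that consume its stream
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    void addComponent(String id) {
        edges.putIfAbsent(id, new ArrayList<>());
    }

    // Declares a stream flowing from one component to another.
    void addStream(String from, String to) {
        addComponent(from);
        addComponent(to);
        edges.get(from).add(to);
    }

    // Kahn's algorithm: returns the components in dataflow order,
    // or throws if the graph contains a cycle.
    List<String> dataflowOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String id : edges.keySet()) {
            inDegree.put(id, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String target : targets) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : inDegree.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String id = ready.remove();
            order.add(id);
            for (String target : edges.get(id)) {
                if (inDegree.merge(target, -1, Integer::sum) == 0) {
                    ready.add(target);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("topology graph has a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        TopologySketch topology = new TopologySketch();
        topology.addStream("sentence-spout", "split-bolt");
        topology.addStream("split-bolt", "count-bolt");
        System.out.println(topology.dataflowOrder());
    }
}
```

In this model a spout is simply a vertex with no incoming edges, which is why it always appears first in the returned order.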

4) What are the main components in Storm architecture?

A) An Apache Storm cluster is usually described in terms of two things: its nodes (the machines and the daemons they run) and its components (the logical building blocks of a topology).

5) What are Nodes?

A) There are two types of nodes in a Storm cluster: the master node and the worker nodes.

The master node runs a daemon called Nimbus, which distributes code around the cluster, assigns tasks to machines, and monitors for failures.

Each worker node runs a daemon called Supervisor, which listens for work assigned to its machine by Nimbus and starts and stops worker processes as necessary.

6) What are the Components of Storm?

A) Storm’s critical components are the topology, the stream, the spout, and the bolt. A topology is a network of spouts and bolts connected by streams.

A stream is an unbounded sequence of tuples. A spout is the source of a stream: it converts incoming data into tuples and sends them to the bolts to be processed.

7) What are Storm Topologies?

A) The logic for a real-time application is packaged into a Storm topology. A Storm topology is analogous to a MapReduce job. One key difference is that a MapReduce job eventually finishes, whereas a topology runs forever (or until you kill it, of course). A topology is a graph of spouts and bolts that are connected with stream groupings.

8) What is the TopologyBuilder class?

A) TopologyBuilder is a public class in the org.apache.storm.topology package that extends java.lang.Object.

TopologyBuilder exposes the Java API for specifying a topology for Storm to execute. Topologies are Thrift structures in the end, but since the Thrift API is so verbose, TopologyBuilder greatly eases the process of creating topologies.

The template for creating and submitting a topology looks something like:

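// The classes used here come from org.apache.storm (Config, StormSubmitter),
// org.apache.storm.topology, org.apache.storm.tuple and org.apache.storm.testing.
TopologyBuilder builder = new TopologyBuilder();

// Two spouts with ids "1" and "2", with parallelism hints 5 and 3.
builder.setSpout("1", new TestWordSpout(true), 5);
builder.setSpout("2", new TestWordSpout(true), 3);
// Bolt "3" reads both spout streams, partitioned by the "word" field.
builder.setBolt("3", new TestWordCounter(), 3)
    .fieldsGrouping("1", new Fields("word"))
    .fieldsGrouping("2", new Fields("word"));
// Bolt "4" reads the whole stream of spout "1" in a single task.
builder.setBolt("4", new TestGlobalCount())
    .globalGrouping("1");

Map<String, Object> conf = new HashMap<>();
conf.put(Config.TOPOLOGY_WORKERS, 4); // run the topology in 4 worker processes

StormSubmitter.submitTopology("mytopology", conf, builder.createTopology());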

9) How can you kill a topology in Storm?

A) To kill a topology, run the following command, giving the same name you used when submitting the topology:

storm kill {stormname}

10) What happens when Storm kills a topology?

A) Storm won’t kill the topology immediately. Instead, it deactivates all the spouts so that they don’t emit any more tuples, and then Storm waits Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds before destroying all the workers. This gives the topology enough time to complete any tuples it was processing when it got killed.

Top Apache Storm Interview Questions

11) How can you update a running topology?

A) To update a running topology, the only option currently is to kill the current topology and resubmit a new one.

12) What does the storm swap command do?

A) A planned feature is to implement a storm swap command that swaps a running topology with a new one, ensuring minimal downtime and no chance of both topologies processing tuples at the same time.

13) How can you monitor topologies?

A) The best place to monitor a topology is using the Storm UI. The Storm UI provides information about errors happening in tasks and fine-grained stats on the throughput and latency performance of each component of each running topology.

14) What are Streams?

A) The stream is the core abstraction in Storm. A stream is an unbounded sequence of tuples that is processed and created in parallel in a distributed fashion. Streams are defined with a schema that names the fields in the stream’s tuples.

15) What can tuples contain in Storm?

A) By default, tuples can contain integers, longs, shorts, bytes, strings, doubles, floats, booleans, and byte arrays. You can also define your own serializers so that custom types can be used natively within tuples.

16) What is Kryo?

A) Storm uses Kryo for serialization. Kryo is a flexible and fast serialization library that produces small serializations.

17) What are Spouts?

A) A spout is a source of streams in a topology. Generally, spouts will read tuples from an external source (e.g. a Kestrel queue or the Twitter API) and emit them into the topology.

18) What are reliable or unreliable Spouts?

A) Spouts can either be reliable or unreliable. A reliable spout is capable of replaying a tuple if it failed to be processed by Storm, whereas an unreliable spout forgets about the tuple as soon as it is emitted.
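The difference can be sketched in plain Java. The class below is a hypothetical stand-in that mimics the nextTuple, ack and fail callbacks of a spout without depending on Storm: a reliable spout keeps every emitted message under a message id until it is acked, and puts it back for replay when it fails. An unreliable spout would simply skip the pending map.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a reliable spout; it does not depend on Storm.
final class ReliableSpoutSketch {
    private final Deque<String> source = new ArrayDeque<>();
    // Messages that were emitted but not yet acked, keyed by message id.
    private final Map<Long, String> pending = new HashMap<>();
    private long nextId = 0;

    ReliableSpoutSketch(String... messages) {
        for (String message : messages) {
            source.add(message);
        }
    }

    // Emits the next message and remembers it under a fresh message id.
    // Returns the id, or -1 if there is nothing to emit.
    long nextTuple() {
        String message = source.poll();
        if (message == null) {
            return -1;
        }
        long id = nextId++;
        pending.put(id, message);
        return id;
    }

    // The tuple was fully processed, so the message can be forgotten.
    void ack(long id) {
        pending.remove(id);
    }

    // Processing failed: queue the message again so it is replayed.
    void fail(long id) {
        String message = pending.remove(id);
        if (message != null) {
            source.addFirst(message);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    String pendingMessage(long id) {
        return pending.get(id);
    }

    public static void main(String[] args) {
        ReliableSpoutSketch spout = new ReliableSpoutSketch("a", "b");
        long id = spout.nextTuple();
        spout.fail(id);                  // "a" goes back to the front of the queue
        long replayed = spout.nextTuple();
        System.out.println(spout.pendingMessage(replayed));
    }
}
```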

19) What are Bolts?

A) All processing in topologies is done in bolts. Bolts can do anything from filtering, functions, aggregations, joins, talking to databases, and more.

Bolts can do simple stream transformations. Doing complex stream transformations often requires multiple steps and thus multiple bolts.

20) What is Stream grouping?

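A) Part of defining a topology is specifying, for each bolt, which streams it should receive as input. A stream grouping defines how each of those streams should be partitioned among the bolt’s tasks.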

Advanced Apache Storm Interview Questions

21) What are the built-in stream groupings in Storm?

A) There are eight built-in stream groupings in Storm:

Shuffle grouping: Tuples are randomly distributed across the bolt’s tasks in such a way that each of the bolt’s tasks is guaranteed to get an equal number of tuples.

Fields grouping: The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the “user-id” field, tuples with the same “user-id” will always go to the same task, but tuples with different “user-id” values may go to different tasks.

Partial Key grouping: The stream is partitioned by the fields specified in the grouping, like the Fields grouping, but is load balanced between two downstream bolts, which provides better utilization of resources when the incoming data is skewed. The Storm documentation links to a paper that explains how it works and the advantages it provides.

All grouping: The stream is replicated across all the bolt’s tasks. Use this grouping with care.

Global grouping: The entire stream goes to a single one of the bolt’s tasks. Specifically, it goes to the task with the lowest id.

None grouping: This grouping specifies that you don’t care how the stream is grouped. Currently, none groupings are equivalent to shuffle groupings. Eventually, though, Storm will push down bolts with none groupings to execute in the same thread as the bolt or spout they subscribe from (when possible).

Direct grouping: This is a special kind of grouping. A stream grouped this way means that the producer of the tuple decides which task of the consumer will receive this tuple. Direct groupings can only be declared on streams that have been declared as direct streams.

Local or shuffle grouping: If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks. Otherwise, this acts like a normal shuffle grouping.
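To make the routing rules concrete, here is a minimal sketch of how three of these groupings could choose a target task. It is plain Java with hypothetical names, not Storm's implementation: the shuffle grouping is approximated by round-robin instead of random selection, and the fields grouping is assumed to hash the grouping values modulo the number of tasks.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers that imitate how three groupings pick a target task.
final class GroupingSketch {
    private int shuffleCursor = 0;

    // Shuffle grouping, approximated by round-robin so that every task
    // receives the same number of tuples. Returns a task index.
    int shuffle(int numTasks) {
        int task = shuffleCursor;
        shuffleCursor = (shuffleCursor + 1) % numTasks;
        return task;
    }

    // Fields grouping: equal grouping-field values always map to the same
    // task index, different values may map to different ones.
    static int fields(List<?> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    // Global grouping: every tuple goes to the task with the lowest id.
    static int global(int[] taskIds) {
        return Arrays.stream(taskIds).min().getAsInt();
    }

    public static void main(String[] args) {
        System.out.println(fields(List.of("user-1"), 4) == fields(List.of("user-1"), 4));
        System.out.println(global(new int[] {7, 3, 5}));
    }
}
```

Using floorMod rather than the % operator keeps the task index non-negative even when the hash code is negative.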

22) What are Tasks?

A) Each spout or bolt executes as a number of tasks across the cluster. Each task corresponds to one thread of execution, and stream groupings define how to send tuples from one set of tasks to another set of tasks. You set the parallelism for each spout or bolt in the setSpout and setBolt methods of TopologyBuilder.

23) What are Workers?

A) Topologies execute across one or more worker processes. Each worker process is a physical JVM and executes a subset of all the tasks for the topology.

24) How many types of built-in schedulers are there in Storm?

A) Storm now has 4 kinds of built-in schedulers:

  • DefaultScheduler,
  • IsolationScheduler,
  • MultitenantScheduler,
  • ResourceAwareScheduler.

25) Where are the default configurations stored?

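A) Every configuration has a default value defined in defaults.yaml in the Storm codebase. You can override these defaults by defining a storm.yaml in the classpath of Nimbus and the supervisors.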

26) What happens when a worker dies?

A) When a worker dies, the supervisor will restart it. If it continuously fails on startup and is unable to heartbeat to Nimbus, Nimbus will reschedule the worker.

27) What happens when a node dies?

A) The tasks assigned to that machine will time out and Nimbus will reassign those tasks to other machines.

28) What happens when Nimbus or Supervisor daemons die?

A) The Nimbus and Supervisor daemons are designed to be fail-fast (process self-destructs whenever any unexpected situation is encountered) and stateless (all state is kept in Zookeeper or on disk).

The Nimbus and Supervisor daemons must be run under supervision using a tool like daemontools or monit. So if the Nimbus or Supervisor daemons die, they restart like nothing happened.

Most notably, no worker processes are affected by the death of Nimbus or the Supervisors. This is in contrast to Hadoop, where if the JobTracker dies, all the running jobs are lost.

29) Is Nimbus a single point of failure?

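A) If you lose the Nimbus node, the workers will still continue to function. Additionally, supervisors will continue to restart workers if they die. However, without Nimbus, workers won’t be reassigned to other machines when necessary (like if you lose a worker machine). In that sense Nimbus is only partly a single point of failure: nothing catastrophic happens when it dies, but reassignment stops until it is back.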

Apache Storm Interview Questions And Answers For Experienced

30) How does Storm guarantee data processing?

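A) Storm provides mechanisms to guarantee that every tuple emitted by a spout is fully processed, even if nodes die or messages are lost. Storm tracks the tree of tuples triggered by each spout tuple. When the whole tree has been acked, the spout is told that the tuple succeeded; if any tuple in the tree fails, or the tree is not completed within the message timeout, the spout is told that it failed and a reliable spout can replay it.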
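One way to picture the bookkeeping behind this guarantee is the XOR trick that Storm's documentation describes for its acker tasks: the ids of all tuples created and acked in a tree are XOR-ed into one value, which returns to zero once every created tuple has been acked. The class below is a simplified sketch with hypothetical names, not Storm's acker; real tuple ids are random 64-bit numbers, while small constants are used here for readability.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical sketch of XOR-based tuple tree tracking.
final class AckerSketch {
    // spout tuple id -> XOR of every tuple id created or acked so far
    private final Map<Long, Long> ackVals = new HashMap<>();

    // The spout emitted a new tuple, which starts a tuple tree.
    void track(long rootId, long firstTupleId) {
        ackVals.put(rootId, firstTupleId);
    }

    // A bolt acks tupleId and, in the same step, reports the ids of any new
    // tuples it anchored to it. Returns true once the whole tree is complete,
    // that is, once every id has been XOR-ed in exactly twice.
    boolean ack(long rootId, long tupleId, long... newChildIds) {
        long delta = tupleId;
        for (long child : newChildIds) {
            delta ^= child;
        }
        long val = ackVals.merge(rootId, delta, (a, b) -> a ^ b);
        if (val == 0L) {
            ackVals.remove(rootId);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AckerSketch acker = new AckerSketch();
        acker.track(1L, 0xAL);
        System.out.println(acker.ack(1L, 0xAL, 0xBL, 0xCL)); // two children still open
        System.out.println(acker.ack(1L, 0xBL));
        System.out.println(acker.ack(1L, 0xCL));             // tree complete
    }
}
```

Reporting the acked tuple and its new children in one step matters: it keeps the value from reaching zero while part of the tree is still unprocessed.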

31) What makes a running topology: worker processes, executors and tasks?

A) Storm distinguishes between the following three main entities that are used to actually run a topology in a Storm cluster:

  • Worker processes
  • Executors (threads)
  • Tasks

A worker process executes a subset of a topology. A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology. A running topology consists of many such processes running on many machines within a Storm cluster.

An executor is a thread that is spawned by a worker process. It may run one or more tasks for the same component (spout or bolt).

A task performs the actual data processing: each spout or bolt that you implement in your code executes as a number of tasks across the cluster. The number of tasks for a component is always the same throughout the lifetime of a topology, but the number of executors (threads) for a component can change over time.
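The relationship between a fixed number of tasks and a changeable number of executors can be illustrated with a small sketch. It assumes tasks are spread as evenly as possible over the executors; the class and method names are hypothetical, and Storm's actual assignment logic may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of spreading a component's tasks over executors.
final class ExecutorLayoutSketch {
    // Splits task ids 0..numTasks-1 into numExecutors contiguous groups whose
    // sizes differ by at most one. Requires 1 <= numExecutors <= numTasks.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= executors <= tasks");
        }
        List<List<Integer>> executors = new ArrayList<>();
        int base = numTasks / numExecutors;   // minimum tasks per executor
        int extra = numTasks % numExecutors;  // executors that get one more
        int nextTask = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(nextTask++);
            }
            executors.add(tasks);
        }
        return executors;
    }

    public static void main(String[] args) {
        // The same 4 tasks, first on 2 executors, then on 4.
        System.out.println(assign(4, 2));
        System.out.println(assign(4, 4));
    }
}
```

Changing the executor count only regroups the same task ids, which matches the statement that the number of tasks stays fixed while the number of threads can change.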

32) What rules of thumb can you give me for configuring Storm+Trident?

A) Some rules of thumb:

  • Make the number of workers a multiple of the number of machines, the parallelism a multiple of the number of workers, and the number of Kafka partitions a multiple of the spout parallelism.
  • Use one worker per topology per machine.
  • Start with fewer, larger aggregators, one per machine with workers on it.
  • Use the isolation scheduler.
  • Use one acker per worker. Version 0.9 makes that the default, but earlier versions do not.
  • Enable GC logging; you should see very few major GCs if things are in reasonable shape.
  • Set the Trident batch millis to about 50% of your typical end-to-end latency.
  • Start with a max spout pending that is for sure too small (one for Trident, or the number of executors for Storm) and increase it until you stop seeing changes in the flow. You’ll probably end up with something near 2*(throughput in recs/sec)*(end-to-end latency), which is 2x the Little’s law capacity.
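The max spout pending estimate above is simple arithmetic, sketched here in Java. The helper names are hypothetical and the numbers in main are made-up inputs; the result is only a starting point for tuning, not a setting that Storm computes for you.

```java
// Hypothetical helpers for the sizing rules of thumb.
final class SpoutPendingSketch {
    // Little's law: records in flight = throughput * latency.
    static long littlesLawCapacity(double recordsPerSec, double latencySec) {
        return Math.round(recordsPerSec * latencySec);
    }

    // The rule of thumb doubles the Little's law capacity.
    static long suggestedMaxSpoutPending(double recordsPerSec, double latencySec) {
        return 2 * littlesLawCapacity(recordsPerSec, latencySec);
    }

    // Checks rules such as "number of workers a multiple of number of machines".
    static boolean isMultiple(int value, int of) {
        return of > 0 && value % of == 0;
    }

    public static void main(String[] args) {
        // 2000 records per second with 0.25 s end-to-end latency:
        // 500 records are in flight on average, so the estimate is 1000.
        System.out.println(suggestedMaxSpoutPending(2000, 0.25));
        System.out.println(isMultiple(8, 4)); // 8 workers on 4 machines
    }
}
```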

33) What are some of the best ways to get a worker to mysteriously and bafflingly die?

A) Check the following:

  • Do you have write access to the log directory?
  • Are you blowing out your heap?
  • Are all the right libraries installed on all of the workers?
  • Is the zookeeper hostname still set to localhost?
  • Did you supply a correct, unique hostname (one that resolves back to the machine) to each worker, and put it in the storm conf file?
  • Have you opened firewall/security group permissions bidirectionally among a) all the workers, b) the storm master, c) zookeeper? Also, from the workers to any kafka/kestrel/database/etc that your topology accesses? Use netcat to poke the appropriate ports and be sure.

34) Can a Trident topology have Multiple Streams?

A) The question is whether a Trident topology can work like a workflow with conditional paths (if-else). For example, a spout (S1) connects to a bolt (B0) which, based on certain values in the incoming tuple, routes it to either bolt (B1) or bolt (B2), but not both.

Yes. A Trident “each” operator returns a Stream object, which you can store in a variable. You can then run multiple eaches on the same Stream to split it, e.g.:

Stream s = topology.each(…).groupBy(…).aggregate(…)
Stream branch1 = s.each(…, FilterA)
Stream branch2 = s.each(…, FilterB)

You can join streams with join, merge or multiReduce.

35) Why am I getting a NotSerializableException/IllegalStateException when my topology is being started up?

A) Within the Storm lifecycle, the topology is instantiated and then serialized to byte format to be stored in ZooKeeper, prior to the topology being executed.

Within this step, if a spout or bolt within the topology has an initialized unserializable property, serialization will fail. If there is a need for a field that is unserializable, initialize it within the bolt or spout’s prepare method, which is run after the topology is delivered to the worker. Source: Apache Storm Documentation
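The failure and the fix can be reproduced with plain Java serialization. The classes below are hypothetical stand-ins, not Storm's bolt interface: Connection plays the role of an unserializable resource, EagerBolt initializes it in a field and so cannot be serialized, while LazyBolt keeps the field transient and creates it in a prepare method.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-ins; none of these classes come from Storm.
final class SerializationSketch {
    // Plays the role of an unserializable resource, e.g. a database client.
    static final class Connection {}

    // Fails to serialize: the field already holds an unserializable object.
    static final class EagerBolt implements Serializable {
        private final Connection conn = new Connection();
    }

    // Serializes fine: the field is transient and is only created in
    // prepare(), which in Storm would run on the worker.
    static final class LazyBolt implements Serializable {
        private transient Connection conn;

        void prepare() {
            conn = new Connection();
        }

        boolean isPrepared() {
            return conn != null;
        }
    }

    // Returns true if Java serialization of the object succeeds.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) { // NotSerializableException is an IOException
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new EagerBolt()));
        System.out.println(canSerialize(new LazyBolt()));
    }
}
```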

