ZooKeeper Interview Questions And Answers. Here Coding compiler shares a list of 30 interview questions on ZooKeeper. These questions were asked in various interviews conducted by MNCs and were prepared by ZooKeeper experts. All the best for your future and happy learning.
ZooKeeper Interview Questions
- What is ZooKeeper?
- What are the prime features of Apache Zookeeper?
- What is ZooKeeper Atomic Broadcast (ZAB) protocol?
- What are the key elements in ZooKeeper Architecture?
- What are the Design Goals of ZooKeeper?
- What is the Data model, and the hierarchical namespace?
- What are Nodes and ephemeral nodes?
- What are Znodes?
- What are Watches in Zookeeper?
- What is org.apache.jute package?
ZooKeeper Interview Questions And Answers
1) What is ZooKeeper?
A) ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.
2) What are the prime features of Apache Zookeeper?
A) Some of the prime features of Apache ZooKeeper are:
Reliable System: This system is very reliable as it keeps working even if a node fails.
Simple Architecture: The architecture of ZooKeeper is quite simple as there is a shared hierarchical namespace which helps to coordinate the processes.
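Fast Processing: ZooKeeper is especially fast in “read-dominant” workloads (i.e., workloads in which reads are much more common than writes).
Scalable: Read performance can be improved by adding servers to the ensemble. Write throughput does not grow the same way, because every update has to be agreed on by a quorum of servers.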
3) What is ZooKeeper Atomic Broadcast (ZAB) protocol?
A) The ZooKeeper Atomic Broadcast (ZAB) protocol is the core of the system. ZooKeeper can be viewed as an atomic broadcast system, through which updates are totally ordered.
4) What are the key elements in ZooKeeper Architecture?
A) The key elements in the Zookeeper architecture are:
Node: A system (server) installed in the cluster
ZNode: A data node in the ZooKeeper namespace, whose status is updated by the nodes in the cluster
Client Applications: The tools that interact with the distributed applications
Server Applications: The applications that allow the client applications to interact using a common interface
5) What are the Design Goals of ZooKeeper?
A) ZooKeeper is simple. ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace which is organized similarly to a standard file system.
The name space consists of data registers, called znodes in ZooKeeper parlance, and these are similar to files and directories. Unlike a typical file system, which is designed for storage, ZooKeeper data is kept in-memory, which means ZooKeeper can achieve high throughput and low latency numbers.
The ZooKeeper implementation puts a premium on high performance, highly available, strictly ordered access. The performance aspects of ZooKeeper mean it can be used in large, distributed systems. The reliability aspects keep it from being a single point of failure. The strict ordering means that sophisticated synchronization primitives can be implemented at the client.
6) What is the Data model, and the hierarchical namespace?
A) The name space provided by ZooKeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash (/). Every node in ZooKeeper’s name space is identified by a path.
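As a small illustration of this naming scheme, the sketch below splits a path into its elements and lists the ancestors a znode would sit under. The helpers `split_path` and `ancestors` are hypothetical names invented for this example; they are not part of any ZooKeeper client, and the validation rules (absolute paths, no trailing slash, no empty elements) are a simplified model.

```python
def split_path(path):
    """Split an absolute, slash-separated path into its elements."""
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    if path == "/":
        return []  # the root has no path elements
    if path.endswith("/"):
        raise ValueError("path must not end with '/'")
    parts = path[1:].split("/")
    if any(part == "" for part in parts):
        raise ValueError("path contains an empty element")
    return parts


def ancestors(path):
    """Return the paths of all nodes above `path`, starting at the root."""
    parts = split_path(path)
    if not parts:
        return []  # the root has no ancestors
    return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


print(split_path("/app/config/db"))
print(ancestors("/app/config/db"))
```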
7) What are Nodes and ephemeral nodes?
A) Unlike in standard file systems, each node in a ZooKeeper namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory.
(ZooKeeper was designed to store coordination data: status information, configuration, location information, etc., so the data stored at each node is usually small, in the byte to kilobyte range.)
8) What are Znodes?
A) Znodes maintain a stat structure that includes version numbers for data changes, ACL changes, and timestamps, to allow cache validations and coordinated updates. Each time a znode’s data changes, the version number increases. For instance, whenever a client retrieves data it also receives the version of the data.
The data stored at each znode in a namespace is read and written atomically. Reads get all the data bytes associated with a znode and a write replaces all the data. Each node has an Access Control List (ACL) that restricts who can do what.
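The version number is what makes coordinated updates possible: a client can read the data together with its version and later write only if the version is unchanged. The following is a toy in-memory model of that idea, not the ZooKeeper client API. `ToyZnode` and `BadVersionError` are hypothetical names, and treating `-1` as "match any version" is an assumption of the sketch.

```python
class BadVersionError(Exception):
    """Raised when a write names a version that is no longer current."""


class ToyZnode:
    def __init__(self, data=b""):
        self.data = data
        self.version = 0

    def get(self):
        # A read returns all the data together with its version.
        return self.data, self.version

    def set(self, data, expected_version=-1):
        # A write replaces all the data; -1 means "do not check the version".
        if expected_version != -1 and expected_version != self.version:
            raise BadVersionError(
                f"expected version {expected_version}, current is {self.version}"
            )
        self.data = data
        self.version += 1
        return self.version


node = ToyZnode(b"v0")
data, version = node.get()
node.set(b"v1", expected_version=version)      # succeeds, version becomes 1
try:
    node.set(b"stale", expected_version=version)  # still names version 0
except BadVersionError as err:
    print("rejected:", err)
```

The second write is rejected because another update happened between the read and the write, which is how a client can detect that its cached copy is out of date.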
ZooKeeper also has the notion of ephemeral nodes. These znodes exist as long as the session that created the znode is active. When the session ends, the znode is deleted. Ephemeral nodes are useful when you want an entry to disappear automatically together with its owner, for example a membership or liveness marker.
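The lifetime rule for ephemeral nodes can be sketched with a toy in-memory model. `ToyServer` and its methods are hypothetical names for this illustration only; a real ensemble tracks sessions with timeouts and heartbeats, which are left out here.

```python
class ToyServer:
    def __init__(self):
        # Maps path -> owning session id, or None for a persistent znode.
        self.znodes = {}

    def create(self, path, session_id=None):
        """Create a znode; passing a session id makes it ephemeral."""
        self.znodes[path] = session_id

    def close_session(self, session_id):
        """End a session and delete every ephemeral znode it created."""
        owned = [
            path for path, owner in self.znodes.items()
            if owner is not None and owner == session_id
        ]
        for path in owned:
            del self.znodes[path]
        return owned


server = ToyServer()
server.create("/config")                         # persistent
server.create("/members/worker-1", session_id=1)  # ephemeral, owned by session 1
server.create("/members/worker-2", session_id=2)  # ephemeral, owned by session 2
server.close_session(1)
print(sorted(server.znodes))
```

After session 1 ends, only `/config` and `/members/worker-2` remain, so any process listing `/members` sees that worker-1 is gone.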
9) What are Watches in Zookeeper?
A) ZooKeeper supports the concept of watches. Clients can set a watch on a znode. A watch will be triggered and removed when the znode changes. When a watch is triggered, the client receives a packet saying that the znode has changed. If the connection between the client and one of the ZooKeeper servers is broken, the client will receive a local notification.
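The "triggered and removed" behavior means a watch fires at most once and has to be set again if the client wants further notifications. A minimal sketch of that one-shot behavior, using a hypothetical `WatchedZnode` class rather than the real client API:

```python
class WatchedZnode:
    def __init__(self, data=b""):
        self.data = data
        self._watchers = []

    def get(self, watcher=None):
        """Read the data and optionally leave a one-shot watch behind."""
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set(self, data):
        """Change the data, then fire and remove every pending watch."""
        self.data = data
        fired, self._watchers = self._watchers, []
        for watcher in fired:
            watcher("changed")


events = []
node = WatchedZnode(b"0")
node.get(watcher=events.append)  # set a watch while reading
node.set(b"1")                   # fires the watch and removes it
node.set(b"2")                   # no watch is set any more, nothing fires
print(events)
```

Only one notification is recorded for the two changes. A client that needs to follow every change typically reads the znode again inside its callback and sets a new watch at the same time.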
10) What is org.apache.jute package?
A) org.apache.jute – Hadoop record I/O contains classes and a record description language translator for simplifying serialization and deserialization of records in a language-neutral manner.
Apache ZooKeeper Interview Questions
11) What are barriers?
A) A barrier is a primitive that enables a group of processes to synchronize the beginning and the end of a computation. The general idea of this implementation is to have a barrier node that serves the purpose of being a parent for individual process nodes. Suppose that we call the barrier node “/b1”. Each process “p” then creates a node “/b1/p”. Once enough processes have created their corresponding nodes, joined processes can start the computation.
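The scheme described above can be sketched with an in-memory stand-in for the children of the barrier node. `ToyBarrier` is a hypothetical class for illustration; a real implementation would create znodes under "/b1" and use watches to wait instead of returning a flag.

```python
class ToyBarrier:
    def __init__(self, root, size):
        self.root = root       # barrier node, e.g. "/b1"
        self.size = size       # number of processes that must join
        self.children = set()  # stands in for the children of the barrier node

    def enter(self, name):
        """Create the process node; return True once enough have joined."""
        self.children.add(f"{self.root}/{name}")
        return len(self.children) >= self.size

    def leave(self, name):
        """Remove the process node; return True once everyone has left."""
        self.children.discard(f"{self.root}/{name}")
        return not self.children


barrier = ToyBarrier("/b1", size=3)
print(barrier.enter("p1"))  # not enough processes yet
print(barrier.enter("p2"))
print(barrier.enter("p3"))  # third process arrives, computation can start
```

Leaving works the same way in reverse: each process removes its node when it finishes, and the computation as a whole is over when the barrier node has no children left.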
12) What are Producer-Consumer Queues?
A) A producer-consumer queue is a distributed data structure that a group of processes uses to generate and consume items. Producer processes create new elements and add them to the queue. Consumer processes remove elements from the queue and process them.
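One way to picture such a queue is as a parent node whose children are named with an increasing counter, so that the lowest-numbered child is always the oldest item. The sketch below is an in-memory model under that assumption. `ToyQueue`, the "/queue" root and the "element" prefix are names chosen for this example.

```python
class ToyQueue:
    def __init__(self, root="/queue"):
        self.root = root
        self.items = {}    # child name -> data
        self._counter = 0  # stands in for a server-assigned sequence number

    def produce(self, data):
        """Add an element under a zero-padded, increasing name."""
        name = f"{self.root}/element{self._counter:010d}"
        self._counter += 1
        self.items[name] = data
        return name

    def consume(self):
        """Remove and return the oldest element, or None if the queue is empty."""
        if not self.items:
            return None
        oldest = min(self.items)  # zero padding makes string order match numeric order
        return self.items.pop(oldest)


queue = ToyQueue()
queue.produce(b"first")
queue.produce(b"second")
print(queue.consume())
print(queue.consume())
```

In a real deployment several consumers may race for the same element, so removing an item has to tolerate the case where another consumer deleted it first. The single-process sketch above does not model that.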
13) What is CONNECTION_LOSS error?
A) CONNECTION_LOSS means the link between the client and server was broken. It doesn’t necessarily mean that the request failed. If you are doing a create request and the link was broken after the request reached the server and before the response was returned, the create request will succeed.
If the link was broken before the packet went onto the wire, the create request failed. Unfortunately, there is no way for the client library to know, so it returns CONNECTION_LOSS.
The programmer must figure out if the request succeeded or needs to be retried. Usually, this is done in an application-specific way. Examples of success detection include checking for the presence of a file to be created or checking the value of a znode to be modified.
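The "check, then retry" pattern can be sketched against a toy server that loses the connection either before or after applying the first create. Everything here is hypothetical and for illustration only: `FlakyServer`, `ConnectionLoss` and `create_with_check` are not ZooKeeper APIs, and the presence check assumes that no other client creates the same path.

```python
class ConnectionLoss(Exception):
    """Stands in for the CONNECTION_LOSS error."""


class FlakyServer:
    def __init__(self, lose="after"):
        self.znodes = {}
        # "before", "after" or None; affects only the first create call.
        self._lose = lose

    def create(self, path, data):
        lose, self._lose = self._lose, None
        if lose == "before":
            raise ConnectionLoss(path)  # request never reached the server
        if path in self.znodes:
            raise KeyError(f"{path} already exists")
        self.znodes[path] = data        # request is applied on the server
        if lose == "after":
            raise ConnectionLoss(path)  # response never reached the client

    def exists(self, path):
        return path in self.znodes


def create_with_check(server, path, data, attempts=3):
    """Create a znode, using a presence check to resolve a lost connection."""
    for _ in range(attempts):
        try:
            server.create(path, data)
            return "created"
        except ConnectionLoss:
            if server.exists(path):
                return "confirmed"      # the earlier request had succeeded
            # otherwise the request was lost; loop and try again
    raise ConnectionLoss(path)


print(create_with_check(FlakyServer(lose="after"), "/job", b"x"))
print(create_with_check(FlakyServer(lose="before"), "/job", b"x"))
```

Both calls end with exactly one "/job" znode on the server. If several clients could create the same path, the presence check alone would not be enough, and the application would need something in the data to tell its own znode apart.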
14) How should you handle SESSION_EXPIRED?
A) SESSION_EXPIRED automatically closes the ZooKeeper handle. In a correctly operating cluster, you should never see SESSION_EXPIRED. It means that the client was partitioned off from the ZooKeeper service for more than the session timeout and ZooKeeper decided that the client died. Because the ZooKeeper service is ground truth, the client should consider itself dead and go into recovery.
If the client is only reading state from ZooKeeper, recovery means just reconnecting. In more complex applications, recovery means recreating ephemeral nodes, vying for leadership roles, and reconstructing published state.
15) Is there an easy way to expire a session for testing?
A) Yes, a ZooKeeper handle can take a session id and password. This constructor is used to recover a session after total application failure. For example, an application can connect to ZooKeeper, save the session id and password to a file, terminate, restart, read the session id and password, and reconnect to ZooKeeper without losing the session and the corresponding ephemeral nodes.
It is up to the programmer to ensure that the session id and password aren’t passed around to multiple instances of an application, otherwise problems can result. For testing, that problem is exactly what you want: connect to ZooKeeper, save the session id and password, create a second ZooKeeper handle with the same id and password, and then close the second handle. Because both handles refer to the same session, closing the second one invalidates the session and causes SESSION_EXPIRED on the first.
16) What are the options for upgrading ZooKeeper?
A) There are two primary ways of doing this: 1) full restart or 2) rolling restart.
In the full restart case you can stage your updated code/configuration/etc…, stop all of the servers in the ensemble, switch code/configuration, and restart the ZooKeeper ensemble.
If you do this programmatically (typically with scripts, i.e., not by hand) the restart can be done on the order of seconds. As a result, the clients will lose connectivity to the ZooKeeper cluster during this time; however, it looks to the clients just like a network partition.
All existing client sessions are maintained and re-established as soon as the ZooKeeper ensemble comes back up. Obviously, one drawback to this approach is that if you encounter any issues (it’s always a good idea to test/stage these changes on a test harness) the cluster may be down for longer than expected.
The second option, preferable for many users, is to do a “rolling restart”. In this case you upgrade one server in the ZooKeeper ensemble at a time; bring down the server, upgrade the code/configuration/etc…, then restart the server.
The server will automatically rejoin the quorum, update its internal state with the current ZK leader, and begin serving client sessions. As a result of doing a rolling restart, rather than a full restart, the administrator can monitor the ensemble as the upgrade progresses, perhaps rolling back if any issues are encountered.
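A rolling restart only keeps the service available if the servers that stay up can still form a quorum, that is, a strict majority of the ensemble. The arithmetic is easy to check before taking a server down. `quorum_size` and `safe_to_restart_one` are hypothetical helper names for this sketch, not part of any ZooKeeper tooling.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def safe_to_restart_one(ensemble_size, servers_down=0):
    """True if a quorum remains after taking one more server down."""
    remaining = ensemble_size - servers_down - 1
    return remaining >= quorum_size(ensemble_size)


for size in (3, 5):
    print(size, quorum_size(size), safe_to_restart_one(size))
```

With three servers, one can be restarted at a time, but not if another server is already down. With five servers, the ensemble tolerates one unplanned failure during the rolling restart.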
17) Can I run an ensemble cluster behind a load balancer?
A) There are two types of server failures in a distributed system from a socket I/O perspective.
Server down due to hardware failure, OS panic/hang, ZooKeeper daemon hang, temporary/permanent network outage, network switch anomaly, etc.: the client cannot detect the failure immediately since there is no entity left to respond. As a result, ZooKeeper clients must rely on a timeout to identify such failures.
Dead ZooKeeper process (daemon): since the OS will respond on the closed TCP port, the client will get “connection refused” on socket connect or “peer reset” on socket I/O. The client immediately notices that the other end has failed.
18) What happens to ZK sessions while the cluster is down?
A) Imagine that a client is connected to ZK with a 5 second session timeout, and the administrator brings the entire ZK cluster down for an upgrade. The cluster is down for several minutes, and then is restarted.
In this scenario, the client is able to reconnect and refresh its session. Because session timeouts are tracked by the leader, the session starts counting down again with a fresh timeout when the cluster is restarted. So, as long as the client connects within the first 5 seconds after a leader is elected, it will reconnect without an expiration, and any ephemeral nodes it had prior to the downtime will be maintained.
The same behavior is exhibited when the leader crashes and a new one is elected. In the limit, if the leader is flip-flopping back and forth quickly, sessions will never expire since their timers are getting constantly reset.
19) What are the different ZKClientBindings?
A) ZooKeeper ships with C, Java, Perl and Python client bindings. Client bindings available from the community include:
Scala, C#, Node.js, Twisted/Python, Python (no C dependency), Erlang, Haskell, Ruby, Go, Lua.
20) Can you list some useful Zookeeper tools?
A) zkconf – generate configuration for a ZooKeeper ensemble
zk-smoketest – smoketest or latencytest a ZooKeeper ensemble (uses zkpython)
zookeeper_dashboard – web dashboard for ZooKeeper ensemble (uses zkpython & django)
zktop – monitor ZooKeeper in realtime
zkexamples – phunt’s “random examples of useful bits of ZooKeeper ephemera”
SPM for ZooKeeper – Performance Monitoring and Alerting for ZooKeeper
Advanced ZooKeeper Interview Questions For Experienced
21) What is Cages?
A) Cages is a Java library of distributed synchronization primitives built on the Apache ZooKeeper system. If you can run a ZooKeeper machine or cluster, then you can use Cages to synchronize and coordinate data access, data manipulation and data processing, configuration changes, and more esoteric things like cluster membership across multiple machines.
22) What is BookKeeper?
A) BookKeeper is a system to reliably log streams of records. It is designed to store write-ahead logs, such as those found in database or database-like applications. In fact, the Hadoop NameNode inspired BookKeeper. The NameNode logs changes to the in-memory namespace data structures to the local disk before they are applied in memory. However, logging the changes locally means that if the NameNode fails the log will be inaccessible. We found that by using BookKeeper, the NameNode can log to distributed storage devices in a way that yields higher availability and performance. Although it was designed for the NameNode, BookKeeper can be used for any application that needs strong durability guarantees with high performance and has a single writer.
23) What are the bookkeeper elements and concepts?
A) BookKeeper uses four basic elements:
- Ledger
- BookKeeper Client
- Bookie
- Metadata Storage Service
24) What is a Ledger in BookKeeper?
A) Ledger: A ledger is a sequence of entries, and each entry is a sequence of bytes. Entries are written sequentially to a ledger and at most once. Consequently, ledgers have append-only semantics.
25) What is a BookKeeper client in BookKeeper?
A) BookKeeper client: A client runs along with a BookKeeper application, and it enables applications to execute operations on ledgers, such as creating a ledger and writing to it.
26) What is a Bookie in BookKeeper?
A) Bookie: A bookie is a BookKeeper storage server. Bookies store the content of ledgers. For any given ledger L, we call an ensemble the group of bookies storing the content of L. For performance, we store on each bookie of an ensemble only a fragment of a ledger. That is, we stripe when writing entries to a ledger such that each entry is written to a subgroup of bookies of the ensemble.
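Striping can be pictured as choosing, for each entry, a subgroup of the ensemble to write to. The sketch below assumes a simple round-robin choice based on the entry id. The text above only says that each entry goes to a subgroup, so the placement rule, the `write_set` helper and the bookie names are assumptions made for illustration.

```python
def write_set(entry_id, ensemble, write_quorum):
    """Pick the subgroup of bookies that stores a given entry (round-robin)."""
    if not 1 <= write_quorum <= len(ensemble):
        raise ValueError("write_quorum must be between 1 and the ensemble size")
    size = len(ensemble)
    return [ensemble[(entry_id + k) % size] for k in range(write_quorum)]


ensemble = ["bookie-0", "bookie-1", "bookie-2"]
for entry_id in range(4):
    print(entry_id, write_set(entry_id, ensemble, write_quorum=2))
```

With three bookies and two copies per entry, each bookie ends up holding roughly two thirds of the ledger, which is the "only a fragment" property described above.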
27) What is Metadata storage service in BookKeeper?
A) Metadata storage service: BookKeeper requires a metadata storage service to store information related to ledgers and available bookies. We currently use ZooKeeper for such a task.
28) Why not just use ZooKeeper for everything?
A) There are a number of reasons:
1. ZooKeeper’s log is only exposed through a tree-like interface. It can be hard to shoehorn your application into this.
2. A ZooKeeper ensemble of multiple machines is limited to one log. You may want one log per resource, which will become expensive very quickly.
3. Adding extra machines to a ZooKeeper ensemble does not increase capacity or throughput.
BookKeeper can be viewed as a means of exposing ZooKeeper’s replicated log to applications in a scalable fashion. However, we still use ZooKeeper to maintain consistency guarantees.
29) What is closing out ledgers?
A) The process of closing out the ledger and finding the last entry is difficult due to the durability guarantees of BookKeeper:
If an entry has been successfully recorded, it must be readable.
If an entry is read once, it must always be available to be read.
If the ledger was closed gracefully, ZooKeeper will have the last entry and everything will work well. But, if the BookKeeper client that was writing the ledger dies, there is some recovery that needs to take place.
30) Describe how HBase uses ZooKeeper?
A) HBase clients find the cluster to connect to by asking ZooKeeper. The only configuration a client needs is the ZooKeeper quorum to connect to. Masters and HBase slave nodes (regionservers) all register themselves with ZooKeeper. If their znode evaporates, the master or regionserver is considered lost and repair begins.