What is NoSQL - NoSQL Introduction Tutorial

What is NoSQL (Not Only SQL)? This guide from Coding compiler covers the core NoSQL concepts: non-relational databases, why NoSQL is used, RDBMS vs NoSQL, the CAP theorem, and the advantages and disadvantages of NoSQL. Let’s start learning about NoSQL databases.

What is NoSQL – Introduction to NoSQL

NoSQL is usually read as “Not Only SQL”, meaning “not just SQL.”

Modern computing systems generate and move huge amounts of data over networks every day.

A large part of this data is handled by relational database management systems (RDBMS). E. F. Codd’s 1970 paper “A Relational Model of Data for Large Shared Data Banks” introduced the relational model, which made data modelling and application programming easier.

In practice, the relational model has proved well suited to client-server programming, with benefits far beyond what was originally expected. Today it is the leading technology for structured data storage in network and business applications.

NoSQL is a database movement whose ideas were raised early on and which gained real momentum in 2009. NoSQL advocates promote non-relational data storage, a fresh way of thinking compared with the overwhelmingly dominant relational databases.

Relational databases follow ACID rules

A database transaction is similar to a transaction in the real world. It has the following four characteristics:

1. A (Atomicity)

Atomicity means that the operations in a transaction are either all performed or not performed at all. A transaction succeeds only if every operation in it succeeds. If any one operation fails, the entire transaction fails and must be rolled back.

For example, a bank transfer of 100 yuan from account A to account B is divided into two steps: 1) withdraw 100 yuan from account A; 2) deposit 100 yuan into account B. These two steps must either both happen or both not happen. If only the first step completes and the second fails, 100 yuan inexplicably disappears.
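The transfer example can be sketched with Python’s built-in sqlite3 module. This is only an illustration under assumptions: the accounts table, the starting balances, and the transfer helper are hypothetical and not part of the original example. The sketch relies on the connection’s context manager, which commits when the block succeeds and rolls back when an exception escapes it.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            cur = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            if cur.rowcount != 1:
                # Step 2 failed: raising here undoes step 1 as well.
                raise ValueError(f"unknown account: {dst}")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

ok = transfer(conn, "A", "B", 100)      # both steps succeed and commit together
failed = transfer(conn, "A", "Z", 100)  # step 2 fails, so step 1 is rolled back
print(ok, failed, balances(conn))
```

After the failed transfer, account A still holds the balance it had before the attempt, because the withdrawal was rolled back together with the failed deposit.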

2. C (Consistency)

Consistency means that the database is always in a consistent state: a transaction must not violate the database’s integrity constraints.

For example, suppose there is an integrity constraint a+b=10. If a transaction changes a, then it must also change b so that a+b=10 still holds after the transaction ends; otherwise the transaction fails.
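One way to sketch the a+b=10 constraint is a CHECK constraint in SQLite, again using Python’s sqlite3 module. The pair table and the update_pair helper are hypothetical names chosen for illustration. Note one assumption: SQLite evaluates CHECK constraints per statement rather than at commit time, so this sketch changes a and b in a single UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The table-level CHECK encodes the integrity constraint a + b = 10.
conn.execute(
    "CREATE TABLE pair ("
    " id INTEGER PRIMARY KEY, a INTEGER, b INTEGER,"
    " CHECK (a + b = 10))"
)
conn.execute("INSERT INTO pair VALUES (1, 4, 6)")
conn.commit()

def update_pair(conn, new_a, new_b):
    """Try to set (a, b); return False if the constraint rejects the change."""
    try:
        with conn:  # rolls back automatically when the UPDATE is rejected
            conn.execute(
                "UPDATE pair SET a = ?, b = ? WHERE id = 1", (new_a, new_b)
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = update_pair(conn, 7, 3)   # 7 + 3 = 10, constraint holds
rejected = update_pair(conn, 9, 3)   # 9 + 3 = 12, constraint violated
current = conn.execute("SELECT a, b FROM pair WHERE id = 1").fetchone()
print(accepted, rejected, current)
```

The rejected update leaves the row exactly as the last successful transaction left it.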

3. I (Isolation)

Isolation means that concurrent transactions do not affect each other. If one transaction reads data that another transaction is modifying, it does not see those modifications until the other transaction commits.

For example, suppose a transaction is transferring 100 yuan from account A to account B. If B queries its own account before the transaction has completed, it will not see the newly added 100 yuan.

4. D (Durability)

Durability means that once a transaction is committed, its changes are stored permanently in the database, even if the system goes down afterwards.
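A rough way to see durability is to commit one change, leave another uncommitted, close the connection, and reopen the database file. Closing the connection is only a stand-in for real downtime, and the ledger table is a hypothetical name used for this sketch.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "ledger.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE ledger (entry TEXT)")
conn.execute("INSERT INTO ledger VALUES ('committed transfer')")
conn.commit()  # from here on, this row must survive a restart

conn.execute("INSERT INTO ledger VALUES ('uncommitted transfer')")
conn.close()   # closing without commit discards the pending transaction

# Reopen the file, as an application would after a restart.
reopened = sqlite3.connect(path)
entries = [row[0] for row in reopened.execute("SELECT entry FROM ledger")]
reopened.close()
print(entries)
```

Only the committed row is present after reopening; the uncommitted insert never became part of the database.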

Distributed Systems

A distributed system consists of multiple computers and communicating software components connected by a computer network (a local area network or a wide area network).

A distributed system is a software system built on top of a network. Because of the nature of software, distributed systems are highly cohesive and transparent.

Therefore, the difference between a network and a distributed system lies more in the high-level software (especially the operating system) than in the hardware.

Distributed systems can run on different platforms such as PCs, workstations, local area networks and wide area networks.

Advantages of distributed computing

Reliability (fault tolerance):

An important advantage of distributed computing systems is reliability. A system crash on one server does not affect the other servers.

Scalability:

In distributed computing systems you can add more machines as needed.

Resource Sharing:

Sharing data is essential for applications such as banking and booking systems.

Flexibility:

Since the system is very flexible, it is easy to install, implement and debug new services.

Faster speed:

A distributed computing system can combine the computing power of multiple computers, making it faster than a single machine.

Open system:

Since it is an open system, it can be accessed locally or remotely.

Higher performance:

Higher performance (and better price/performance) can be achieved compared to a centralized computer cluster.

Disadvantages of distributed computing

Troubleshooting:

Troubleshooting and diagnosing problems is more difficult.

Software:

Limited software support is a major drawback of distributed computing systems.

Network:

Problems with the network infrastructure, including transmission problems, high load, and loss of information.

Security:

The openness of distributed computing systems creates data security and sharing risks.

What is NoSQL?

NoSQL refers to non-relational databases. The name is sometimes expanded as Not Only SQL, and it is a general term for database management systems that differ from the traditional relational database.

NoSQL is used to store very large volumes of data. (For example, Google or Facebook collects trillions of bits of data per day for their users.) These data stores do not require a fixed schema and can scale out horizontally without unnecessary operations.

Why use NoSQL?

Today we can easily access and crawl data through third-party platforms (such as Google, Facebook, etc.). The volume of users’ personal information, social networks, geographic locations, user-generated data, and user action logs has multiplied. If we want to mine this user data, a SQL database is often no longer a good fit, whereas NoSQL databases were developed to handle such large data sets well.

Examples

Social network:

Each record: UserID1, UserID2
Separate records: UserID, first_name, last_name, age, gender,…
Task: Find all friends of friends of friends of … friends of a given user.
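The friends-of-friends task is a graph traversal. In a relational schema each additional hop typically means another self-join on the friendship table, while a graph-style representation can simply walk the edges. The following breadth-first search is a minimal sketch of that idea; the friendship pairs and the function names are hypothetical sample data, not taken from any particular database.

```python
from collections import deque

# Each friendship record is a (UserID1, UserID2) pair, as in the example above.
friendships = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 6)]

def build_adjacency(pairs):
    """Turn friendship pairs into an undirected adjacency map."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def friends_within(adjacency, start, max_hops):
    """Return every user reachable from `start` in at most `max_hops` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the requested depth
        for friend in adjacency.get(user, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    seen.discard(start)
    return seen

adjacency = build_adjacency(friendships)
print(sorted(friends_within(adjacency, 1, 1)))  # direct friends
print(sorted(friends_within(adjacency, 1, 3)))  # up to friends of friends of friends
```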

Wikipedia page:

Large collection of documents
Combination of structured and unstructured data
Task: Retrieve all pages regarding athletics at the Summer Olympics before 1950.
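The page-retrieval task can be sketched with documents modelled as plain Python dictionaries that mix structured fields with free text. The field names (title, year, categories, text), the sample pages, and the find_pages helper are all assumptions made for illustration. Note that one document has no year field at all, which a schema-less store allows.

```python
# Hypothetical sample documents: structured fields plus unstructured text.
pages = [
    {"title": "Athletics at the 1936 Summer Olympics", "year": 1936,
     "categories": ["athletics", "summer olympics"],
     "text": "Track and field events held in Berlin."},
    {"title": "Athletics at the 1948 Summer Olympics", "year": 1948,
     "categories": ["athletics", "summer olympics"],
     "text": "Track and field events held in London."},
    {"title": "Athletics at the 1952 Summer Olympics", "year": 1952,
     "categories": ["athletics", "summer olympics"],
     "text": "Track and field events held in Helsinki."},
    {"title": "Swimming at the 1948 Summer Olympics", "year": 1948,
     "categories": ["swimming", "summer olympics"],
     "text": "Swimming events held in London."},
    {"title": "History of athletics",  # no "year" field on this document
     "categories": ["athletics"],
     "text": "General overview of the sport."},
]

def find_pages(pages, category, event, before_year):
    """Return titles tagged with both `category` and `event` before `before_year`."""
    titles = []
    for page in pages:
        year = page.get("year")           # the field may be missing entirely
        tags = page.get("categories", [])
        if year is not None and year < before_year \
                and category in tags and event in tags:
            titles.append(page["title"])
    return titles

matches = find_pages(pages, "athletics", "summer olympics", 1950)
print(matches)
```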

RDBMS vs NoSQL

RDBMS
– Highly organized, structured data
– Structured Query Language (SQL)
– Data and its relationships are stored in separate tables
– Data manipulation language, data definition language
– Strict consistency
– ACID transactions

NoSQL
– Stands for Not Only SQL
– No declarative query language
– No predefined schema
– Key-value storage, column storage, document storage, graph databases
– Eventual consistency rather than ACID properties
– Unstructured and unpredictable data
– CAP theorem
– High performance, high availability and scalability

A brief history of NoSQL

The term NoSQL first appeared in 1998 as the name of a lightweight, open source relational database developed by Carlo Strozzi that did not offer an SQL interface.

In 2009, Johan Oskarsson of Last.fm launched a discussion on distributed open source databases [2]. Eric Evans from Rackspace reintroduced the term NoSQL. By then, NoSQL mainly referred to database designs that are non-relational, distributed, and do not provide ACID guarantees.

The “no:sql(east)” conference held in Atlanta in 2009 was a milestone, with the slogan “select fun, profit from real_world where relational=false;”. The most common interpretation of NoSQL is therefore “non-relational”, emphasizing the advantages of key-value stores and document databases rather than simply opposing RDBMS.

CAP theorem

In computer science, the CAP theorem, also known as Brewer’s theorem, states that a distributed computing system cannot satisfy all three of the following guarantees at the same time:

Consistency (all nodes see the same data at the same time)
Availability (every request receives a response indicating success or failure)
Partition tolerance (the system continues to operate even when messages between nodes are lost or part of the system fails)

The core of the CAP theorem is that a distributed system cannot meet the three requirements of consistency, availability and partition tolerance simultaneously; it can satisfy at most two of them.

Accordingly, NoSQL databases are often divided into three categories under the CAP principle: those satisfying CA, those satisfying CP, and those satisfying AP:

CA – A single-site cluster that satisfies consistency and availability, and is usually weaker in terms of scalability.
CP – A system that satisfies consistency and partition tolerance, and usually does not perform particularly well.
AP – A system that satisfies availability and partition tolerance, and usually accepts weaker consistency.
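The trade-off between the CP and AP categories can be illustrated with a toy simulation. This is a deliberately simplified sketch, not a model of any real database: the Replica and Cluster classes, the two-replica layout, and the partitioned flag are all hypothetical. During a simulated partition, the CP cluster refuses writes and stays consistent, while the AP cluster accepts writes and lets the replicas diverge.

```python
class Replica:
    """One node holding its own copy of the data."""
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    """Two replicas joined by a link that can be cut to simulate a partition."""
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.left = Replica("left")
        self.right = Replica("right")
        self.partitioned = False

    def write(self, replica, key, value):
        """Write through `replica`; return True if the write was accepted."""
        other = self.right if replica is self.left else self.left
        if self.partitioned:
            if self.mode == "CP":
                return False               # give up availability, stay consistent
            replica.data[key] = value      # stay available, replicas now differ
            return True
        replica.data[key] = value          # healthy link: update both copies
        other.data[key] = value
        return True

    def consistent(self):
        return self.left.data == self.right.data

cp = Cluster("CP")
cp.write(cp.left, "x", 1)
cp.partitioned = True
cp_accepted = cp.write(cp.left, "x", 2)

ap = Cluster("AP")
ap.write(ap.left, "x", 1)
ap.partitioned = True
ap_accepted = ap.write(ap.left, "x", 2)

print(cp_accepted, cp.consistent(), ap_accepted, ap.consistent())
```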

Advantages/disadvantages of NoSQL

Advantages:

High scalability
Distributed computing
Low cost
Architecture flexibility, semi-structured data
No complex relationships

Disadvantages:

No standardization
Limited query capabilities (so far)
Eventual consistency is not intuitive to program against

BASE

BASE: Basically Available, Soft-state, Eventually Consistent. Defined by Eric Brewer.

BASE describes the weaker availability and consistency requirements that NoSQL databases typically adopt:

Basically Available – the system is available most of the time.
Soft state – soft state / flexible transactions. “Soft state” can be understood as “connectionless”, while “hard state” is “connection-oriented”.
Eventual Consistency – replicas converge to a consistent state over time. Consistency is also the ultimate goal of ACID.
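Eventual consistency can be sketched with a primary copy that accepts writes immediately and a replica that catches up later. The EventuallyConsistentStore class and its explicit replicate step are hypothetical simplifications; real systems replicate in the background and must also handle conflicts, which this sketch ignores.

```python
class EventuallyConsistentStore:
    """A primary that accepts writes at once and a replica that lags behind."""
    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # changes not yet applied to the replica

    def write(self, key, value):
        """Accept the write on the primary and queue it for replication."""
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, key):
        """Read from the replica, which may return stale data."""
        return self.replica.get(key)

    def replicate(self):
        """Apply all queued changes, in order, to the replica."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.write("likes", 10)
stale = store.read_replica("likes")   # the replica has not caught up yet
store.replicate()
fresh = store.read_replica("likes")   # the replica has converged
print(stale, fresh)
```

Between the write and the replication step the two copies disagree, which is the “soft state” window that BASE accepts in exchange for availability.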

ACID vs BASE

ACID	BASE
Atomicity	Basically Available
Consistency	Soft state / flexible transactions
Isolation	Eventual consistency
Durability

NoSQL database classification

Type	Representative products	Characteristics
Column storage	HBase, Cassandra, Hypertable	As the name suggests, data is stored by column. Its main strengths are convenient storage of structured and semi-structured data, easy data compression, and a large I/O advantage for queries against one or a few columns.
Document storage	MongoDB, CouchDB	Documents are generally stored in a JSON-like format. Because the stored content is a document, certain fields can be indexed and some relational database features can be implemented.
Key-value storage	Tokyo Cabinet / Tyrant, Berkeley DB, MemcacheDB, Redis	A value can be looked up quickly by its key. In general the store does not care about the format of the value. (Redis includes additional features.)
Graph storage	Neo4J, FlockDB	The best fit for storing graph relationships. Handling such data in a traditional relational database performs poorly and is inconvenient to design for.
Object storage	db4o, Versant	The database is operated on with syntax similar to an object-oriented language, and data is accessed as objects.
XML database	Berkeley DB XML, BaseX	Efficiently stores XML data and supports XML query languages such as XQuery and XPath.
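To make the first four categories more concrete, the sketch below models the same small data set in four shapes using plain Python structures. None of this is the API of the products listed above; the variable names, the sample users, and the find_by_field helper are hypothetical, and the column layout follows the simplified description in the table rather than the internals of any real store.

```python
import json

# Key-value: the store only sees a key and an opaque value (here a JSON string).
kv_store = {"user:1": json.dumps({"name": "Ada", "age": 36})}

# Document: JSON-like records whose fields can differ and can be queried.
documents = [
    {"_id": 1, "name": "Ada", "age": 36, "tags": ["admin"]},
    {"_id": 2, "name": "Lin"},  # no age or tags, no fixed schema required
]

def find_by_field(documents, field, value):
    """Return all documents whose `field` equals `value`."""
    return [doc for doc in documents if doc.get(field) == value]

# Column: values grouped by column, so reading one column skips the others.
columns = {
    "name": {1: "Ada", 2: "Lin"},
    "age": {1: 36},
}

# Graph: relationships stored directly as edges between user ids.
follows = {1: {2}, 2: {1}}

ada = json.loads(kv_store["user:1"])        # value is decoded by the client
lin_docs = find_by_field(documents, "name", "Lin")
known_ages = list(columns["age"].values())  # touches only the age column
print(ada["name"], lin_docs, known_ages, 2 in follows[1])
```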

Who is using NoSQL databases?

There are already many companies using NoSQL:

Google
Facebook
Mozilla
Adobe
Foursquare
LinkedIn
Digg
McGraw-Hill Education
Vermont Public Radio
