DEFINITION :-
"A database management system is a collection of interrelated data and a set of programs to access those data. Collection of data is referred to as a database."
Primary goal of dbms is to provide a way to store and retrieve database information that is both convenient and efficient. Dbms allows us to define structure for storage of information and also provides mechanism to manipulate this information. Dbms also provides safety for the information stored despite system crashes or attempts of authorized access.
DATA ABSTRACTION :-
Major purpose of dbms is to provide users with abstract view of data i.e. the system hides certain details of how the data are stored and maintained. Since database system users are not computer trained, developers hide the complexity from users through 3 levels of abstraction, to simplify user’s interaction with the system.
1) Physical level of data abstraction :
This s the lowest level of abstraction which describes how data are actually stored.
2) Logical level of data abstraction :
This level hides what data are actually stored in the database and what relationship exists among them.
3) View Level of data abstraction :
View provides security mechanism to prevent user from accessing certain parts of database.
SCHEMAS :-
1. The internal level has an internal schema, which describes the physical storage structure of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.
2. The conceptual level has a conceptual schema, which describes the structure of the whole database for a community of users. The conceptual schema hides the details of physical storage structures and concentrates on describing entities, data types, relationships, user operations, and constraints. A high-level data model or an implementation data model can be used at this level.
3. The external or view level includes a number of external schemas or user views. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. A high-level data model or an implementation data model can be used at this level.
4 TYPES OF DATABASE USERS :-
1) Naive users:
These are the unsophisticated users who interact with the system by invoking one of the application programs that have been written previously.
E.g. consider a user who checks for account balance information over the World Wide Web.
2) Application programmers:
These are computer professionals who write application programs, used to develop user interfaces. The application programmer uses Rapid Application Development (RAD) toolkit or special type of programming languages which include special features to facilitate generation of forms and display of date on screen.
3) Sophisticated users:
These users interact with the database using database query language. They submit their query to the query processor. Then Data Manipulation Language (DML) functions are performed on the database to retrieve the data. Tools used by these users are OLAP(Online Analytical Processing) and data mining tools.
4) Specialized users:
These users write specialized database applications to retrieve data. These applications can be used to retrieve data with complex data types e.g. graphics data and audio data.
ENTITY RELATIONSHIP MODEL :-
1. Entity :
• The basic object that the ER model represents is an entity, which is a "thing" in the real world with a independent existence.
• An entity may be an object with a physical existence—a particular person, car, house, or employee—or it may be an object with a conceptual existence—a company, a job, or a university course.
2. ATTRIBUTE :
• Each entity has attributes—the particular properties that describe it.
• For example, an employee entity may be described by the employee’s name, age, address, salary, and job.
• A particular entity will have a value for each of its attributes.
• The attribute values that describe each entity become a major part of the data stored in the database.
• Several types of attributes occur in the ER model: simple versus composite; single-valued versus multi-valued; and stored versus derived.
3. KEY :
• An important constraint on the entities of an entity type is the key or uniqueness constraint on attributes.
• A key is an attribute (also known as column or field) or a combination of attribute that is used to identify records.
• Sometimes we might have to retrieve data from more than one table, in those cases we require to join tables with the help of keys.
• The purpose of the key is to bind data together across tables without repeating all of the data in every table
• Such an attribute is called a key attribute, and its values can be used to identify each entity uniquely.
• For example, the Name attribute is a key of the COMPANY entity type because no two companies are allowed to have the same name.
• For the PERSON entity type, a typical key attribute is SocialSecurityNumber.
• Sometimes, several attributes together form a key, meaning that the combination of the attribute values must be distinct for each entity.
• If a set of attributes possesses this property, we can define a composite attribute that becomes a key attribute of the entity type.
- Types Of Keys :-
i) Super Key – An attribute or a combination of attribute that is used to identify the records uniquely is known as Super Key. A table can have many Super Keys.
ii) Candidate Key – It can be defined as minimal Super Key or irreducible Super Key. In other words an attribute or a combination of attribute that identifies the record uniquely but none of its proper subsets can identify the records uniquely.
iii) Primary Key – A Candidate Key that is used by the database designer for unique identification of each row in a table is known as Primary Key. A Primary Key can consist of one or more attributes of a table.
iv) Foreign Key – A foreign key is an attribute or combination of attribute in one base table that points to the candidate key (generally it is the primary key) of another table. The purpose of the foreign key is to ensure referential integrity of the data i.e. only values that are supposed to appear in the database are permitted.
v) Composite Key – If we use multiple attributes to create a Primary Key then that Primary Key is called Composite Key (also called a Compound Key or Concatenated Key).
vi) Alternate Key – Alternate Key can be any of the Candidate Keys except for the Primary Key.
vii) Secondary Key – The attributes that are not even the Super Key but can be still used for identification of records (not unique) are known as Secondary Key.
4. An E-R diagram can express the overall logical structure of a database graphically. E-R diagrams are simple and clear— qualities that may well account in large part for the widespread use of the E-R model. Such a diagram consists of the following major components:
• Rectangles, which represent entity sets
• Ellipses, which represent attributes
• Diamonds, which represent relationship sets
• Lines, which link attributes to entity sets and entity sets to relationship sets
• Double ellipses, which represent multi valued attributes
• Dashed ellipses, which denote derived attributes
• Double lines, which indicate total participation of an entity in a relationship set
• Double rectangles, which represent weak entity sets
RELATIONAL MODEL :-
The relational model represents the database as a collection of relations. Informally, each relation resembles a table of values or, to some extent, a "flat" file of records. When a relation is thought of as a table of values, each row in the table represents a collection of related data values. In the relational model, each row in the table
represents a fact that typically corresponds to a real world entity or relationship. The table name and column names are used to help in interpreting the meaning of the values in each row. In the formal relational model terminology, a row is called a tuple, a column header is called an attribute, and the table is called a relation. The
data type describing the types of values that can appear in each column is called a domain.
To preserve the consistency and correctness of its stored data, a relational DBMS typically imposes one or more data integrity constraints. These constraints restrict the data values that can be inserted into the database or created by a database update. Several different types of data integrity constraints are commonly found in relational databases, including:
• Required data :
Some columns in a database must contain a valid data value in every row; they are not allowed to contain
missing or NULL values. In the sample database, every order must have an associated customer who placed the order. The DBMS can be asked to prevent NULL values in this column.
• Validity checking :
Every column in a database has a domain, a set of data values that are legal for that column. The DBMS can be
asked to prevent other data values in these columns.
• Entity integrity :
The primary key of a table must contain a unique value in each row, which is different from the values in all other
rows. Duplicate values are illegal, because they wouldn't allow the database to distinguish one entity from another. The DBMS can be asked to enforce this unique values constraint.
• Referential integrity :
A foreign key in a relational database links each row in the child table containing the foreign key to the row of the parent table containing the matching primary key value. The DBMS can be asked to enforce this foreign key/primary key constraint.
• Other data relationships :
The real-world situation modeled by a database will often have additional constraints that govern the legal data values that may appear in the database. The DBMS can be asked to check modifications to the tables to make sure that their values are constrained in this way.
• Business rules :
Updates to a database may be constrained by business rules governing the real-world transactions that are
represented by the updates.
• Consistency :
Many real-world transactions cause multiple updates to a database. The DBMS can be asked to enforce this
type of consistency rule or to support applications that implement such rules.
DATABASE LANGUAGES :-
- Data definition language - defines data types and the relationships among them
- Data manipulation language - performs tasks such as inserting, updating, or deleting data occurrences
- Query language - allows searching for information and computing derived information
Advantages of DBMS :-
Redundancy is controlled.
Unauthorized access is restricted.
Providing multiple user interfaces.
Enforcing integrity constraints.
- Providing backup and recovery.
Disadvantages of File Processing System :-
Data redundancy and inconsistency.
Difficult in accessing data.
Data isolation.
Data integrity.
Concurrent access is not possible.
- Security Problems.
Three levels of data abstraction :-
Physical level: The lowest level of abstraction describes how data are stored.
Logical level: The next higher level of abstraction, describes what data are stored in database and what relationship among those data.
- View level: The highest level of abstraction describes only part of entire database.
Integrity Rules :-
There are two Integrity rules.
NORMAL FORMS :-
The domain of attribute must include only atomic (simple, indivisible) values.
A relation schema R is in 2NF if it is in 1NF and every non-prime attribute A in R is fully functionally dependent on primary key.
A relation schema R is in 3NF if it is in 2NF and for every FD X A either of the following is true
X is a Super-key of R.
A is a prime attribute of R.
In other words, if every non prime attribute is non-transitively dependent on primary key.
A relation schema R is said to be in 4NF if for every Multivalued dependency X Y that holds over R, one of following is true.
1. X is subset or equal to (or) XY = R.
2. X is a super key.
A Relation schema R is said to be 5NF if for every join dependency {R1, R2, ..., Rn} that holds R, one the following is true
1. Ri = R for some i.
2. The join dependency is implied by the set of FD, over R in which the left side is key of R.
BCNF (Boyce-Codd Normal Form) :-
A relation schema R is in BCNF if it is in 3NF and satisfies an additional constraint that for every FD X A, X must be a candidate key.
Domain-Key Normal Form :-
A relation is said to be in DKNF if all constraints and dependencies that should hold on the the constraint can be enforced by simply enforcing the domain constraint and key constraint on the relation.
ACID Properties :-
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.
Jim Gray defined these properties of a reliable transaction system in the late 1970s and developed technologies to achieve them automatically. In 1983, Andreas Reuter and Theo Härder coined the acronym ACID to describe them.
- Atomicity requires that each transaction is "all or nothing": if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes. To the outside world, a committed transaction appears (by its effects on the database) to be indivisible ("atomic"), and an aborted transaction does not happen.
- The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including but not limited to constraints, cascades, triggers, and any combination thereof. This does not guarantee correctness of the transaction in all ways the application programmer might have wanted (that is the responsibility of application-level code) but merely that any programming errors do not violate any defined rules.
- The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e. one after the other. Providing isolation is the main goal of concurrency control. Depending on concurrency control method, the effects of an incomplete transaction might not even be visible to another transaction.
- Durability that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently (even if the database crashes immediately thereafter). To defend against power loss, transactions (or their effects) must be recorded in a non-volatile memory.