ICT: 10/27/17

Friday, 27 October 2017

DBMS - Normalization

DBMS - Normalization

Functional Dependency

Functional dependency (FD) is a set of constraints between two attributes in a relation. Functional dependency says that if two tuples have same values for attributes A1, A2,..., An, then those two tuples must have to have same values for attributes B1, B2, ..., Bn.

Functional dependency is represented by an arrow sign (→) that is, X→Y, where X functionally determines Y. The left-hand side attributes determine the values of attributes on the right-hand side.

Armstrong's Axioms

If F is a set of functional dependencies then the closure of F, denoted as F+, is the set of all functional dependencies logically implied by F. Armstrong's Axioms are a set of rules, that when applied repeatedly, generates a closure of functional dependencies.

Reflexive rule − If alpha is a set of attributes and beta is_subset_of alpha, then alpha holds beta.

Augmentation rule − If a → b holds and y is attribute set, then ay → by also holds. That is adding attributes in dependencies, does not change the basic dependencies.

Transitivity rule − Same as transitive rule in algebra, if a → b holds and b → c holds, then a → c also holds. a → b is called as a functionally that determines b.

Trivial Functional Dependency

Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial FDs always hold.

Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial FD.

Completely non-trivial − If an FD X → Y holds, where x intersect Y = Φ, it is said to be a completely non-trivial FD.

Normalization

If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator. Managing a database with anomalies is next to impossible.

Update anomalies − If data items are scattered and are not linked to each other properly, then it could lead to strange situations. For example, when we try to update one data item having its copies scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.

Deletion anomalies − We tried to delete a record, but parts of it was left undeleted because of unawareness, the data is also saved somewhere else.

Insert anomalies − We tried to insert data in a record that does not exist at all.

Normalization is a method to remove all these anomalies and bring the database to a consistent state.

First Normal Form

First Normal Form is defined in the definition of relations (tables) itself. This rule defines that all the attributes in a relation must have atomic domains. The values in an atomic domain are indivisible units.

unorganized relation

We re-arrange the relation (table) as below, to convert it to First Normal Form.

Relation in 1NF

Each attribute must contain only a single value from its pre-defined domain.

Second Normal Form

Before we learn about the second normal form, we need to understand the following −

Prime attribute − An attribute, which is a part of the prime-key, is known as a prime attribute.

Non-prime attribute − An attribute, which is not a part of the prime-key, is said to be a non-prime attribute.

If we follow second normal form, then every non-prime attribute should be fully functionally dependent on prime key attribute. That is, if X → A holds, then there should not be any proper subset Y of X, for which Y → A also holds true.

Relation not in 2NF

We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID. According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent upon both and not on any of the prime key attribute individually. But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is called partial dependency, which is not allowed in Second Normal Form.

Relation in 2NF

We broke th

database users

Users of Database management system:-

1. Database administrators: responsible for authorizing access to the database, for coordinating and monitoring its use, acquiring software, and hardware resources, controlling its use and monitoring efficiency of operations.

2. Database Designers: Responsible to define the content, the structure, the constraints, and functions or transactions against the database. They must communicate with the end-users and understand their needs.

3. End users : End users are the people whose jobs require access to the database for querying, updating, and generating reports; the database primarily exists for their use. There are several categories of end users:

Casual End User: access database occasionally when needed. But they may need different information each time.

Naïve or Parametric End user : they make up a large section of the end-user population. They use previously well-defined functions in the form of canned transactionsÂ” against the database. Examples are bank-tellers or reservation clerks who do this activity for an entire shift of operations.

Sophisticated End User : These include business analysts, scientists, engineers, others thoroughly familiar with the system capabilities. Many use tools in the form of software packages that work closely with the stored database.

Stand-alone End User : Mostly maintain personal databases using ready-to-use packaged applications. An example is a tax program user that creates his or her own internal database.

function of dbms

The functions performed by a typical DBMS are the following:

1. Data Definition

The DBMS provides functions to define the structure of the data in the application. These include defining and modifying the record structure, the type and size of fields and the various constraints/conditions to be satisfied by the data in each field.

2. Data Manipulation

Once the data structure is defined, data needs to be inserted, modified or deleted. The functions which perform these operations are also part of the DBMS. These function can handle planned and unplanned data manipulation needs. Planned queries are those which form part of the application. Unplanned queries are ad-hoc queries which are performed on a need basis.

3. Data Security & Integrity

The DBMS contains functions which handle the security and integrity of data in the application. These can be easily invoked by the application and hence the application programmer need not code these functions in his/her programs.

4. Data Recovery & Concurrency

Recovery of data after a system failure and concurrent access of records by multiple users are also handled by the DBMS.

5. Data Dictionary Maintenance

Maintaining the Data Dictionary which contains the data definition of the application is also one of the functions of a DBMS.

6. Performance

Optimizing the performance of the queries is one of the important functions of a DBMS. Hence the DBMS has a set of programs forming the Query Optimizer which evaluates the different implementations of a query and chooses the best among them.

Thus the DBMS provides an environment that is both convenient and efficient to use when there is a large volume of data and many transactions to be processed.

ADVANTAGE DATABASE SYSTEM

As shown in the figure, the DBMS is a central system which provides a common interface between the data and the various front-end programs in the application. It also provides a central location for the whole data in the application to reside.

Due to its centralized nature, the database system can overcome the disadvantages of the file-based system as discussed below.

Minimal Data Redundancy:- Since the whole data resides in one central database, the various programs in the application can access data in different data files. Hence data present in one file need not be duplicated in another. This reduces data redundancy.

However, this does not mean all redundancy can be eliminated. There could be business or technical reasons for having some amount of redundancy. Any such redundancy should be carefully controlled and the DBMS should be aware of it.

Data Consistency:- Reduced data redundancy leads to better data consistency.

Data Integration:- Since related data is stored in one single database, enforcing data integrity is much easier. Moreover, the functions in the DBMS can be used to enforce the integrity rules with minimum programming in the application programs.

Data Sharing:- Related data can be shared across programs since the data is stored in a centralized manner. Even new applications can be developed to operate against the same data.

· Enforcement of Standards:- Enforcing standards in the organization and structure of data files is required and also easy in a Database System, since it is one single set of programs which is always interacting with the data files.

· Application Development Ease:- The application programmer need not build the functions for handling issues

like concurrent access, security, data integrity, etc. The programmer only needs to implement the application business rules. This brings in application development ease. Adding additional functional modules is also easier than in file-based systems.

· Better Controls:-Better controls can be achieved due to the centralized nature of the system.

· Data Independence:-The architecture of the DBMS can be viewed as a 3-level system comprising the following:

- The internal or the physical level where the data resides.

- The conceptual level which is the level of the DBMS functions

- The external level which is the level of the application programs or the end user.

Data Independence is isolating an upper level from the changes in the organization or structure of a lower level. For example, if changes in the file organization of a data file do not demand for changes in the functions in the DBMS or in the application programs, data independence is achieved. Thus Data Independence can be defined as immunity of applications to change in physical representation and access technique. The provision of data independence is a major objective for database systems.

· Reduced Maintenance:-Maintenance is less and easy, again, due to the centralized nature of the system.

history of database system

Data processing drives the growth of computers, as it has from the earliest days of commercial computers. In fact, automation of data processing tasks predates computers. Punched cards, invented by Hollerith, were used at the very beginning of the twentieth century to record U.S. census data, and mechanicalsystems were used to process the cards and tabulate results. Punched cards were later widely used as a means of entering data into computers.

Techniques for data storage and processing have evolved over the years:

• 1950s and early 1960s: Magnetic tapes were developed for data storage. Data processing tasks such as payroll were automated, with data stored on tapes. Processing of data consisted of reading data from one or more tapes and writing data to a new tape. Data could also be input from punched card decks, and output to printers. For example, salary raises were processed by entering the raises on punched cards and reading the punched card deck in synchronization with a tape containing the master salary details. The records had to be in the same sorted order. The salary raises would be added to the salary read from the master tape, and written to a new tape; the new tape would become the new master tape.

• Late 1960s and 1970s:Widespread use of hard disks in the late 1960s changed the scenario for data processing greatly, since hard disks allowed direct access to data. The position of data on disk was immaterial, since any location on disk could be accessed in just tens of milliseconds. Data were thus freed from the tyranny of sequentiality.With disks, network and hierarchical databases could be created that allowed data structures such as lists and trees to be stored on disk. Programmers could construct and manipulate these data structures.

A landmark paper by Codd [1970] defined the relational model, and nonprocedural ways of querying data in the relational model, and relational databases were born.The simplicity of the relational model and the possibility of hiding implementation details completely from the programmer were enticing indeed. Codd later won the prestigious Association of Computing Machinery Turing Award for his work.

1980s: Although academically interesting, the relational model was not used in practice initially, because of its perceived performance disadvantages; relational databases could not match the performance of existing network and hierarchical databases. That changed with System R, a groundbreaking project at IBM Research that developed techniques for the construction of an efficient relational database system. Excellent overviews of System R are provided by Astrahan [1976] and Chamberlin [1981]. The fully functional System R prototype led to IBM’s first relational database product, SQL/DS. Initial commercial relational database systems, such as IBM DB2, Oracle, Ingres, and DEC Rdb, played a major role in advancing techniques for efficient processing of declarative queries. By the early 1980s, relational databases had become competitive with network and hierarchical database systems even in the area of performance. Relational databases were so easy to use that they eventually replaced network/hierarchical databases; programmers using such databases were forced to deal with many low-level implementation details, and had to code their queries in a procedural fashion.
• Early 1990s: The SQL language was designed primarily for decision support applications, which are query intensive, yet the mainstay of databases in the1980s was transaction processing applications, which are update intensive. Decision support and querying re-emerged as a major application area for databases. Tools for analyzing large amounts of data saw large growths in usage.

Many database vendors introduced parallel database products in this period. Database vendors also began to add object-relational support to their databases.

• Late 1990s: The major event was the explosive growth of the World Wide Web. Databases wer

view of data

A database system is a collection of interrelated files and a set of programs that allow users to access and modify these files. A major purpose of a database system is to provide users with an abstract view of the data. That is, the system hides certain details of how the data are stored and maintained.

Data Abstraction

For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to use complex data structures to represent data in the database. Since many database-systems users are not computer trained, developers hide the complexity from users through several levels of abstraction, to simplify users’ interactions with the system:

• Physical level. The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail.

• Logical level. The next-higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The logical level thus describes the entire database in terms of a small number of relatively simple structures. Although implementation of the simple structures at the logical level may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity. Database administrators, who must decide what information to keep in the database, use the logical level of abstraction.

• View level. The highest level of abstraction describes only part of the entire database. Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of the database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system. The system may provide many views for the same database.

Instances and Schemas

Databases change over time as information is inserted and deleted. The collection of information stored in the database at a particular moment is called an instance of the database. The overall design of the database is called the database schema. Schemas are changed infrequently, if at all.

The concept of database schemas and instances can be understood by analogy to a program written in a programming language. A database schema corresponds to the variable declarations (along with associated type definitions) in a program. Each variable has a particular value at a given instant. The values of the variables in a program at a point in time correspond to an instance of a database schema.
Database systems have several schemas, partitioned according to the levels of abstraction. The physical schema describes the database design at the physical level, while the logical schema describes the database design at the logical level.A database may also have several schemas at the view level, sometimes called subschemas, that describe different views of the database. Application programs are said to exhibit physical data independence if they do not depend on the physical schema, and thus need not be rewritten if the physical schema changes.

SQL Overview

SQL Overview

SQL is a programming language for Relational Databases. It is designed over relational algebra and tuple relational calculus. SQL comes as a package with all major distributions of RDBMS.

SQL comprises both data definition and data manipulation languages. Using the data definition properties of SQL, one can design and modify database schema, whereas data manipulation properties allows SQL to store and retrieve data from database.

Data Definition Language

SQL uses the following set of commands to define database schema −

CREATE

Creates new databases, tables and views from RDBMS.

For example −

Create database tutorialspoint;
Create table article;
Create view for_students;
DROP

Drops commands, views, tables, and databases from RDBMS.

For example−

Drop object_type object_name;
Drop database tutorialspoint;
Drop table article;
Drop view for_students;
ALTER

Modifies database schema.

Alter object_type object_name parameters;
For example−

Alter table article add subject varchar;
This command adds an attribute in the relation article with the namesubject of string type.

Data Manipulation Language

SQL is equipped with data manipulation language (DML). DML modifies the database instance by inserting, updating and deleting its data. DML is responsible for all froms data modification in a database. SQL contains the following set of commands in its DML section −

SELECT/FROM/WHERE
INSERT INTO/VALUES
UPDATE/SET/WHERE
DELETE FROM/WHERE
These basic constructs allow database programmers and users to enter data and information into the database and retrieve efficiently using a number of filter options.

SELECT/FROM/WHERE

SELECT − This is one of the fundamental query command of SQL. It is similar to the projection operation of relational algebra. It selects the attributes based on the condition described by WHERE clause.

FROM − This clause takes a relation name as an argument from which attributes are to be selected/projected. In case more than one relation names are given, this clause corresponds to Cartesian product.

WHERE − This clause defines predicate or conditions, which must match in order to qualify the attributes to be projected.

For example −

Select author_name
From book_author
Where age > 50;
This command will yield the names of authors from the relationbook_author whose age is greater than 50.

INSERT INTO/VALUES

This command is used for inserting values into the rows of a table (relation).

Syntax−

INSERT INTO table (column1 [, column2, column3 ... ]) VALUES (value1 [, value2, value3 ... ])
Or

INSERT INTO table VALUES (value1, [value2, ... ])
For example −

INSERT INTO tutorialspoint (Author, Subject) VALUES ("anonymous", "computers");
UPDATE/SET/WHERE

This command is used for updating or modifying the values of columns in a table (relation).

Syntax −

UPDATE table_name SET column_name = value [, column_name = value ...] [WHERE condition]
For example −

UPDATE tutorialspoint SET Author="webmaster" WHERE Author="anonymous";
DELETE/FROM/WHERE

This command is used for removing one or more rows from a table (relation).

Syntax −

DELETE FROM table_name [WHERE condition];
For example −

DELETE FROM tutorialspoints
WHERE Author="unknown";

extended ER features

The basic E-R concepts can model most database features, some aspects of a database may be more aptly expressed by certain extensions to the basic E-R model. The extended E-R features are specialization, generalization, higher- and lower-level entity sets, attribute inheritance, and aggregation.

1. Specialization

An entity set may include subgroupings of entities that are distinct in some way from other entities in the set. For instance, a subset of entities within an entity set may have attributes that are not shared by all the entities in the entity set. The E-R model provides a means for representing these distinctive entity groupings.

Consider an entity set person, with attributes name, street, and city. A person may be further classified as one of the following:

• customer

• employee

Each of these person types is described by a set of attributes that includes all the attributes of entity set person plus possibly additional attributes. For example, customer entities may be described further by the attribute customer-id, whereas employee entities may be described further by the attributes employee-id and salary. The process of designating subgroupings within an entity set is called specialization. The specialization of person allows us to distinguish among persons according to whether they are employees or customers.

2.Generalization

The refinement from an initial entity set into successive levels of entity subgroupings represents a top-down design process in which distinctions are made explicit. The design process may also proceed in a bottom-up manner, in which multiple entity sets are synthesized into a higher-level entity set on the basis of common features. The database designer may have first identified a customer entity set with the attributes name, street, city, and customer-id, and an employee entity set with the attributes name, street, city, employee-id, and salary.

There are similarities between the customer entity set and the employee entity set in the sense that they have several attributes in common. This commonality can be expressed by generalization, which is a containment relationship that exists between a higher-level entity set and one or more lower-level entity sets. In our example, person is the higher-level entity set and customer and employee are lower-level entity sets. Higher- and lower-level entity sets also may be designated by the terms superclass and subclass, respectively. The person entity set is the superclass of the customer and employee subclasses.

For all practical purposes, generalization is a simple inversion of specialization.

3.Attribute Inheritance

A crucial property of the higher- and lower-level entities created by specialization and generalization is attribute inheritance. The attributes of the higher-level entity sets are said to be inherited by the lower-level entity sets. For example, customer and employee inherit the attributes of person. Thus, customer is described by its name, street, and city attributes, and additionally a customer-id attribute; employee is described by its name, street, and city attributes, and additionally employee-id and salary attributes.

A lower-level entity set (or subclass) also inherits participation in the relationship sets in which its higher-level entity (or superclass) participates. The officer, teller, and secretary entity sets can participate in the works-for relationship set, since the superclass employee participates in the works-for relationship. Attribute inheritance applies through all tiers of lower-level entity sets. The above entity sets can participate in any relationships in which the person entity set participates. Whether a given portion of an E-R model was arrived at by specialization or generalization,the outcome is basically the same:

• A higher-level entity set with attributes and relationships that apply to all of its lower-level entity sets

• Lower-level entity sets with distinctive features that

hierarchical data model

In the hierarchical data model, information is organized as a collection of inverted trees of records. The inverted trees may be of arbitrary depth. The record at the root of a tree has zero or more child records; the child records, in turn, serve as parent records for their immediate descendants. This parent-child relationship recursively continues down the tree.

The records consists of fields, where each field may contain simple data values (e.g. integer, real, text), or a pointer to a record. The pointer graph is not allowed to contain cycles. Some combinations of fields may form the key for a record relative to its parent. Only a few hierarchical DBMSs support null values or variable-length fields.

Applications can navigate a hierarchical database by starting at a root and successively navigate downward from parent to children until the desired record is found. Applications can interleave parent-child navigation with traversal of pointers. Searching down a hierarchical tree is very fast since the storage layer for hierarchical databases use contiguous storage for hierarchical structures. All other types of queries require sequential search techniques.

A DDL for hierarchical data model must allow the definition of record types, fields types, pointers, and parent-child relationships. And the DML must support direct navigation using the parent-child relationships and through pointers. Programs therefore navigate very close to the physical data structure level, implying that the hierarchical data model offers only very limited data independence.

The hierarchical data model is impoverished for expressing complex information models. Often a natural hierarchy does not exist and it is awkward to impose a parent-child relationship. Pointers partially compensate for this weakness, but it is still difficult to specify suitable hierarchical schemas for large models.

Database Relations

Database Relations

1. Relation schema

Named relation defined by a set of attribute and domain name pairs.

2. Relational database schema

Set of relation schemas, each with a distinct name.

Properties of Relations

Relation name is distinct from all other relation names in relational schema.
Each cell of relation contains exactly one atomic (single) value.
Each attribute has a distinct name.
Values of an attribute are all from the same domain.
Each tuple is distinct; there are no duplicate tuples.
Order of attributes has no significance.
Order of tuples has no significance, theoretically.
Relational Keys

Superkey

An attribute, or a set of attributes, that uniquely identifies a tuple within a relation.

Candidate Key

Superkey (K) such that no proper subset is a superkey within the relation.
In each tuple of R, values of K uniquely identify that tuple (uniqueness).
No proper subset of K has the uniqueness property (irreducibility).

Primary Key

Candidate key selected to identify tuples uniquely within relation.

Alternate Keys

Candidate keys that are not selected to be primary key.

Foreign Key

Attribute, or set of attributes, within one relation that matches candidate key of some (possibly same) relation.

Relational Integrity

Null

Represents value for an attribute that is currently unknown or not applicable for tuple.
Deals with incomplete or exceptional data.
Represents the absence of a value and is not the same as zero or spaces, which are values.

Entity Integrity

In a base relation, no attribute of a primary key can be null.

Referential Integrity

If foreign key exists in a relation, either foreign key value must match a candidate key value of some tuple in its home relation or foreign key value must be wholly null.

Enterprise Constraints

Additional rules specified by users or database administrators.

ER Diagram Representation

ER Diagram Representation

Let us now learn how the ER Model is represented by means of an ER diagram. Any object, for example, entities, attributes of an entity, relationship sets, and attributes of relationship sets, can be represented with the help of an ER diagram.

Entity

Entities are represented by means of rectangles. Rectangles are named with the entity set they represent.

Entities in a school database

Attributes

Attributes are the properties of entities. Attributes are represented by means of ellipses. Every ellipse represents one attribute and is directly connected to its entity (rectangle).

Simple Attributes

If the attributes are composite, they are further divided in a tree like structure. Every node is then connected to its attribute. That is, composite attributes are represented by ellipses that are connected with an ellipse.

Composite Attributes

Multivalued attributes are depicted by double ellipse.

Multivalued Attributes

Derived attributes are depicted by dashed ellipse.

Derived Attributes

Relationship

Relationships are represented by diamond-shaped box. Name of the relationship is written inside the diamond-box. All the entities (rectangles) participating in a relationship, are connected to it by a line.

Binary Relationship and Cardinality

A relationship where two entities are participating is called a binary relationship. Cardinality is the number of instance of an entity from a relation that can be associated with the relation.

One-to-one − When only one instance of an entity is associated with the relationship, it is marked as '1:1'. The following image reflects that only one instance of each entity should be associated with the relationship. It depicts one-to-one relationship.

One-to-one

One-to-many − When more than one instance of an entity is associated with a relationship, it is marked as '1:N'. The following image reflects that only one instance of entity on the left and more than one instance of an entity on the right can be associated with the relationship. It depicts one-to-many relationship.

One-to-many

Many-to-one − When more than one instance of entity is associated with the relationship, it is marked as 'N:1'. The following image reflects that more than one instance of an entity on the left and only one instance of an entity on the right can be associated with the relationship. It depicts many-to-one relationship.

Many-to-one

Many-to-many − The following image reflects that more than one instance of an entity on the left and more than one instance of an entity on the right can be associated with the relationship. It depicts many-to-many relationship.

Many-to-many

Participation Constraints

Total Participation − Each entity is involved in the relationship. Total participation is represented by double lines.

Partial participation − Not all entities are involved in the relationship. Partial participation is represented by single lines.

Participation Constraints

ER Model - Basic Concepts

ER Model - Basic Concepts

The ER model defines the conceptual view of a database. It works around real-world entities and the associations among them. At view level, the ER model is considered a good option for designing databases.

Entity

An entity can be a real-world object, either animate or inanimate, that can be easily identifiable. For example, in a school database, students, teachers, classes, and courses offered can be considered as entities. All these entities have some attributes or properties that give them their identity.

An entity set is a collection of similar types of entities. An entity set may contain entities with attribute sharing similar values. For example, a Students set may contain all the students of a school; likewise a Teachers set may contain all the teachers of a school from all faculties. Entity sets need not be disjoint.

Attributes

Entities are represented by means of their properties, called attributes. All attributes have values. For example, a student entity may have name, class, and age as attributes.

There exists a domain or range of values that can be assigned to attributes. For example, a student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be negative, etc.

Types of Attributes

Simple attribute − Simple attributes are atomic values, which cannot be divided further. For example, a student's phone number is an atomic value of 10 digits.

Composite attribute − Composite attributes are made of more than one simple attribute. For example, a student's complete name may have first_name and last_name.

Derived attribute − Derived attributes are the attributes that do not exist in the physical database, but their values are derived from other attributes present in the database. For example, average_salary in a department should not be saved directly in the database, instead it can be derived. For another example, age can be derived from data_of_birth.

Single-value attribute − Single-value attributes contain single value. For example − Social_Security_Number.

Multi-value attribute − Multi-value attributes may contain more than one values. For example, a person can have more than one phone number, email_address, etc.

These attribute types can come together in a way like −

simple single-valued attributes
simple multi-valued attributes
composite single-valued attributes
composite multi-valued attributes
Entity-Set and Keys

Key is an attribute or collection of attributes that uniquely identifies an entity among entity set.

For example, the roll_number of a student makes him/her identifiable among students.

Super Key − A set of attributes (one or more) that collectively identifies an entity in an entity set.

Candidate Key − A minimal super key is called a candidate key. An entity set may have more than one candidate key.

Primary Key − A primary key is one of the candidate keys chosen by the database designer to uniquely identify the entity set.

Relationship

The association among entities is called a relationship. For example, an employee works_at a department, a student enrolls in a course. Here, Works_at and Enrolls are called relationships.

Relationship Set

A set of relationships of similar type is called a relationship set. Like entities, a relationship too can have attributes. These attributes are called descriptive attributes.

Degree of Relationship

The number of participating entities in a relationship defines the degree of the relationship.

Binary = degree 2
Ternary = degree 3
n-ary = degree

Mapping Cardinalities

Cardinality defines the number of entities in one entity set, which can be associated with the number of entities of other set via relationship set.

One-to-one − One entity from entity set A can be associated with at most one entity of entity set B and vice versa.

One-to-one relation

One-to-many − One entity from entity set A can be associated with more than one entities of entity set B however an entity from entity set B, ca

DBMS - Data Independence

DBMS - Data Independence

If a database system is not multi-layered, then it becomes difficult to make any changes in the database system. Database systems are designed in multi-layers as we learnt earlier.

Data Independence

A database system normally contains a lot of data in addition to users’ data. For example, it stores data about data, known as metadata, to locate and retrieve data easily. It is rather difficult to modify or update a set of metadata once it is stored in the database. But as a DBMS expands, it needs to change over time to satisfy the requirements of the users. If the entire data is dependent, it would become a tedious and highly complex job.

Data independence

Metadata itself follows a layered architecture, so that when we change data at one layer, it does not affect the data at another level. This data is independent but mapped to each other.

Logical Data Independence

Logical data is data about database, that is, it stores information about how data is managed inside. For example, a table (relation) stored in the database and all its constraints, applied on that relation.

Logical data independence is a kind of mechanism, which liberalizes itself from actual data stored on the disk. If we do some changes on table format, it should not change the data residing on the disk.

Physical Data Independence

All the schemas are logical, and the actual data is stored in bit format on the disk. Physical data independence is the power to change the physical data without impacting the schema or logical data.

For example, in case we want to change or upgrade the storage system itself − suppose we want to replace hard-disks with SSD − it should not have any impact on the logical data or schemas.

DBMS - Data Schemas

DBMS - Data Schemas

Database Schema

A database schema is the skeleton structure that represents the logical view of the entire database. It defines how the data is organized and how the relations among them are associated. It formulates all the constraints that are to be applied on the data.

A database schema defines its entities and the relationship among them. It contains a descriptive detail of the database, which can be depicted by means of schema diagrams. It’s the database designers who design the schema to help programmers understand the database and make it useful.

A database schema can be divided broadly into two categories −

Physical Database Schema − This schema pertains to the actual storage of data and its form of storage like files, indices, etc. It defines how the data will be stored in a secondary storage.

Logical Database Schema − This schema defines all the logical constraints that need to be applied on the data stored. It defines tables, views, and integrity constraints.

Database Instance

It is important that we distinguish these two terms individually. Database schema is the skeleton of database. It is designed when the database doesn't exist at all. Once the database is operational, it is very difficult to make any changes to it. A database schema does not contain any data or information.

A database instance is a state of operational database with data at any given time. It contains a snapshot of the database. Database instances tend to change with time. A DBMS ensures that its every instance (state) is in a valid state, by diligently following all the validations, constraints, and conditions that the database designers have imposed.

DBMS - Data Models

DBMS - Data Models

Data models define how the logical structure of a database is modeled. Data Models are fundamental entities to introduce abstraction in a DBMS. Data models define how data is connected to each other and how they are processed and stored inside the system.

The very first data model could be flat data-models, where all the data used are to be kept in the same plane. Earlier data models were not so scientific, hence they were prone to introduce lots of duplication and update anomalies.

Entity-Relationship Model

Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships among them. While formulating real-world scenario into the database model, the ER Model creates entity set, relationship set, general attributes and constraints.

ER Model is best used for the conceptual design of a database.

ER Model is based on −

Entities and their attributes.

Relationships among entities.

These concepts are explained below.

Entity − An entity in an ER Model is a real-world entity having properties called attributes. Every attribute is defined by its set of values called domain. For example, in a school database, a student is considered as an entity. Student has various attributes like name, age, class, etc.

Relationship − The logical association among entities is calledrelationship. Relationships are mapped with entities in various ways. Mapping cardinalities define the number of association between two entities.

Mapping cardinalities −

one to one
one to many
many to one
many to many
Relational Model

The most popular data model in DBMS is the Relational Model. It is more scientific a model than others. This model is based on first-order predicate logic and defines a table as an n-ary relation.

Relational Model Table

The main highlights of this model are −

Data is stored in tables called relations.
Relations can be normalized.
In normalized relations, values saved are atomic values.
Each row in a relation contains a unique value.
Each column in a relation contains values from a same domain.

DBMS

Data bases and database systems have become an essential component of everyday life in modern society.

Examples for database Applications:

Purchases from the supermarket
Purchases using your credit card
Booking a holiday at the travel agents
Using the local library
Taking out insurance
Using the Internet
Studying at university
Need to store data :

Data originates at one time and used later(i.e.) Store registrations for grading later, Store for future information needs, Governmental regulations requires access to past data, Data used later for auditing, evaluation purpose, Used more than once : save for future use.

Limitations of manual methods:

Problems of speed, Problems of accuracy, Problems of consistency and reliability, Problems of poor response time, Problems of work-load handling capability, Problems of meeting ad hoc information needs, Problems of cost, Problems due to human frailties: (misplaced) loyalty, inconsistency, irregularity, difficulties in handling big tasks .

Why computerized data processing? Advantage of speed, Advantage of accuracy, Advantage of reliability and consistency, Advantage of storage and retrieval efficiency, Advantage of on-line-access to meet ad-hoc needs, Advantage of cost

Database: Collection of related data. By data, we mean known facts that can be recorded and that have implicit meaning.

Definition of DBMS: A data base management system(DBMS) is a collection of programs that enables users to create and maintain a database. The DBMS is hence a general purpose software system that facilitate the process of defining, constructing , manipulating and sharing databases among the various users and applications.

ICT

Translate