Chapter – 3 Introduction to Data Base Management System DEPT. OF INFORMATION

Chapter – 3
Introduction to Data Base
Management System
Data Processing System
• A data processing system takes raw data
and, through the power of computer
automation, produces information that a
set of program applications has validated.
The information may include text, arithmetic
calculations, formulas and other types of
data, depending on the computer system.
A data processing
system is also called an automated data
processing (ADP) unit or an electronic data
processing (EDP) unit.
Serial processing is a system
in which only one step happens at a time (and so the steps go in a series).
Batch processing is used when there are many transactions affecting a high percentage of master
file records and an immediate response is not needed; results are often not required until the end of the week or
month. A good example in a large, national business is payroll processing, where
nearly every master file record will be affected. The data is collected over a period of time, then
input and verified by clerks (verified means the data is input a second time by someone else and both inputs are
compared by computer) and processed centrally.
The transactions are entered in batches by keyboard and stored in transaction files. These
batches consist of thirty or so records, which are given a batch control ID. The batches are then
run through a validation process and, to make sure each batch balances, a computed total is
compared with a manually produced total. This helps to ensure that all data is entered without
error or omission. The actual updating of master files only takes place after verification and
validation are complete. This means batch processing is often run overnight, unattended. A new
master file is produced as a result of a batch-processing run. The original master file is kept along
with a previous version.
After processing, the output is produced, usually on printed media such as payslips or invoices,
although this is changing with the advent of the web.
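The control-total check and the "new master file" rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Transaction` fields, the use of hours worked as the control total, and the dictionary standing in for a master file are all hypothetical and not taken from any particular payroll system.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    employee_id: int   # hypothetical key into the master file
    hours: float       # hypothetical field used for the control total


def batch_balances(batch, manual_total, tolerance=1e-9):
    """Compare the computed control total with the manually produced one."""
    computed_total = sum(t.hours for t in batch)
    return abs(computed_total - manual_total) <= tolerance


def run_batch(old_master, batch, manual_total):
    """Validate the batch first, then build a NEW master file.

    The old master is copied, not modified, so it can be kept as a
    previous version.
    """
    if not batch_balances(batch, manual_total):
        raise ValueError("batch control total does not balance")
    new_master = dict(old_master)
    for t in batch:
        new_master[t.employee_id] = new_master.get(t.employee_id, 0.0) + t.hours
    return new_master


old_master = {101: 120.0, 102: 80.0}
batch = [Transaction(101, 40.0), Transaction(102, 37.5)]
new_master = run_batch(old_master, batch, manual_total=77.5)
print(new_master)
```

If the manual total had been mistyped, `run_batch` would stop before any update, which mirrors the rule that master files are only updated once validation is complete.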
• Real-time processing. The waiting time from input to response is minimal.
Such fast systems are used in critical systems that control
aircraft or the manufacture of sensitive or dangerous compounds.
Online processing means users directly enter information online (usually, online, in
this case, means online to a central processor, rather than its modern connotation of
the Internet, but it could mean both!). The information is validated and updated directly onto the
master file. No new file is created in this case. Therefore, input, processing and output are
near immediate. Imagine a cash dispenser transaction or booking a holiday
at a travel agent's or over the Internet. Compared with batch processing, the number
of transactions will be few.
Centralized processing is processing performed in one computer or in a cluster of
coupled computers in a single location. Access to the computer is via "dumb
terminals," which send only input and receive output or "smart terminals," which add
screen formatting. All data processing is performed in the central computer.
• Distributed processing. The distribution of applications
and business logic across multiple processing platforms.
Distributed processing implies that processing will
occur on more than one processor in order for a
transaction to be completed. In other words, processing
is distributed across two or more machines and the
processes are most likely not running at the same time,
i.e. each process performs part of an application in a
sequence. Often the data used in a distributed
processing environment is also distributed across
Advantages of DBMS (Database
Management Systems) are as follows:
• A true DBMS offers several advantages over file processing. The principal advantages
of a DBMS are the following:
• Flexibility: Because programs and data are independent, programs do not have to
be modified when types of unrelated data are added to or deleted from the database,
or when physical storage changes.
• Fast response to information requests: Because data are integrated into a single
database, complex requests can be handled much more rapidly than if the data were
located in separate, non-integrated files. In many businesses, faster response means
better customer service.
• Multiple access: Database software allows data to be accessed in a variety of ways
(such as through various key fields) and often, by using several programming
languages (both 3GL and nonprocedural 4GL programs).
• Lower user training costs: Users often find it easier to learn such systems and
training costs may be reduced. Also, the total time taken to process requests may be
shorter, which would increase user productivity.
• Less storage: Theoretically, all occurrences of data items need be stored only once,
thereby eliminating the storage of redundant data. System developers and database
designers often use data normalization to minimize data redundancy.
• Concept: File organization is the methodology applied primarily to
the logical arrangement of data (which can itself be organized in a system
of records with correlation between the fields/columns) in a file system. It
should not be confused with the physical storage of the file on some type
of storage medium. There are certain basic types of computer file, which
include files stored as blocks of data and files stored as streams of data, where the
information streams out of the file while it is being read until the end of the
file is encountered.
We will look at two components of file organization here:
• The way the internal file structure is arranged, and
• The external file as it is presented to the O/S or program that calls it.
Files are presented to the application as a stream of bytes followed by an EOF
(end of file) condition. A program that uses a file needs to know the
structure of the file and needs to interpret its contents. There are four
methods of organizing files: sequential, relative, indexed-sequential, and direct or hashed access organization.
Sequential Organization
• A sequential file contains records organized in the order they were
entered. The order of the records is fixed. The records are stored
and sorted in physical, contiguous blocks; within each block the
records are in sequence.
• Records in these files can only be read or written sequentially.
• Once stored in the file, the record cannot be made shorter, or
longer, or deleted. However, the record can be updated if the length
does not change. (This is done by replacing the records by creating
a new file.) New records will always appear at the end of the file.
• If the order of the records in a file is not important, sequential
organization will suffice, no matter how many records you may
have. Sequential output is also useful for report printing or
sequential reads which some programs prefer to do.
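A minimal Python sketch of these rules follows. It assumes a hypothetical comma-separated record format and uses an in-memory `io.StringIO` object in place of a real file; the function names are illustrative only. New records are appended at the end, reads run from the first record to EOF, and an "update" is carried out by copying every record into a new file.

```python
import io


def append_record(f, record):
    """New records always go at the end of the file."""
    f.seek(0, io.SEEK_END)
    f.write(record + "\n")


def read_sequentially(f):
    """Yield records in the order they were entered, until EOF."""
    f.seek(0)
    for line in f:
        yield line.rstrip("\n")


def rewrite_with_update(old_file, key, new_record):
    """Replace one record by copying all records into a new file.

    The key is assumed to be the first comma-separated field.
    """
    new_file = io.StringIO()
    for record in read_sequentially(old_file):
        if record.split(",")[0] == key:
            record = new_record
        new_file.write(record + "\n")
    return new_file


seq_file = io.StringIO()
for rec in ["3,carol", "1,alice", "2,bob"]:
    append_record(seq_file, rec)

in_entry_order = list(read_sequentially(seq_file))
updated_file = rewrite_with_update(seq_file, "1", "1,alicia")
after_update = list(read_sequentially(updated_file))
print(in_entry_order)
print(after_update)
```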
Relative Organization
• A relative record file contains records ordered by their relative key, that is, the record
number that represents the record location relative to where the file begins. For
example, the first record in the file has a relative record number of 1, the tenth
record has a relative record number of 10, and so forth. The records can have fixed
length or variable length.
The record transmission modes allowed for relative files are sequential, random, or
dynamic. When relative files are read or written sequentially, the sequence is that of
the relative record number.
In this file organization, the records of the file are stored one after another both
physically and logically. That is, the record with sequence number 16 is located just after
the 15th record.
• Relative files are quite easy to process.
• If you know the key value of the record that you need to find, there is no need
for a search and you can access the record almost instantaneously.
• They can only be used in conjunction with consecutive numerical keys. This disadvantage
(only numerical and consecutive values for the key value) is overcome with a
completely different file structure, namely the indexed sequential file.
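For fixed-length records, the reason no search is needed is that the byte offset can be computed directly from the relative record number. The sketch below assumes a hypothetical 16-byte record layout (a 4-byte employee number and a 12-byte name) and uses `io.BytesIO` as a stand-in for a disk file.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte employee number + 12-byte name.
RECORD = struct.Struct("<I12s")


def offset_of(record_number):
    """Relative record numbers start at 1, so record 1 is at offset 0."""
    if record_number < 1:
        raise ValueError("relative record numbers start at 1")
    return (record_number - 1) * RECORD.size


def write_record(f, record_number, emp_no, name):
    f.seek(offset_of(record_number))
    f.write(RECORD.pack(emp_no, name.encode("ascii")))


def read_record(f, record_number):
    """Jump straight to the record; no scan of earlier records is needed."""
    f.seek(offset_of(record_number))
    chunk = f.read(RECORD.size)
    if len(chunk) < RECORD.size:
        raise IndexError("no record with that relative record number")
    emp_no, raw_name = RECORD.unpack(chunk)
    return emp_no, raw_name.rstrip(b"\x00").decode("ascii")


rel_file = io.BytesIO()
for n in range(1, 11):
    write_record(rel_file, n, 1000 + n, f"emp{n}")

tenth = read_record(rel_file, 10)
print(tenth)
```

The same arithmetic also shows the limitation noted above: the key must be a consecutive number, because it is used directly as a position.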
Indexed Sequential Organization
• Key searches are improved by this system too. The single-level indexing
structure is the simplest one: the index is a file whose records are
(key, pointer) pairs, where the pointer is the position in the data file of the record with
the given key. A subset of the records, evenly spaced along the
data file, is indexed in order to mark intervals of data records.
This is how a key search is performed: the search key is compared with the
index keys to find the highest index key that does not exceed the search key,
then a linear search is performed from the record that the index key points
to, until the search key is matched or until the record pointed to by the next
index entry is reached. Despite the double file access (index + data)
required by this sort of search, the access time reduction is significant
compared with sequential file searches.
Primary Area: Contains file records stored by key or ID numbers.
Overflow Area: Contains records that cannot be placed in the primary area.
Index Area: Contains keys of records and their locations on the disk.
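The single-level index search described above can be sketched as follows. This is a simplified in-memory illustration: the data file is assumed to be a Python list of (key, value) records sorted by key, the index step of 4 is arbitrary, and the overflow area is left out.

```python
from bisect import bisect_right


def build_sparse_index(records, step):
    """Index every `step`-th record as a (key, position) pair."""
    return [(records[pos][0], pos) for pos in range(0, len(records), step)]


def indexed_search(records, index, search_key):
    """Find the interval via the index, then scan linearly inside it."""
    index_keys = [key for key, _ in index]
    # Highest index key that does not exceed the search key.
    slot = bisect_right(index_keys, search_key) - 1
    if slot < 0:
        return None  # search key is smaller than every key in the file
    start = index[slot][1]
    # Stop at the record pointed to by the next index entry (or at EOF).
    end = index[slot + 1][1] if slot + 1 < len(index) else len(records)
    for pos in range(start, end):
        if records[pos][0] == search_key:
            return records[pos]
    return None


# Hypothetical data file: keys 10, 20, ..., 190 in sorted order.
records = [(k, f"record-{k}") for k in range(10, 200, 10)]
index = build_sparse_index(records, step=4)

found = indexed_search(records, index, 120)
missing = indexed_search(records, index, 125)
print(index)
print(found, missing)
```

Only one interval of at most four records is scanned per lookup, instead of the whole file.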
Direct or Hashed Access
• With direct or hashed access a portion of disk space
is reserved and a “hashing” algorithm computes the
record address, so additional space is required for
this kind of file in the store. Records are placed randomly
throughout the file and are accessed by addresses
that specify their disk location. This type of file
organization therefore requires disk storage rather than tape. It
has excellent search and retrieval performance, but care
must be taken to maintain the indexes. If the indexes
become corrupt, what is left may as well go to the bit bucket, so it is wise to have regular backups of this
kind of file, just as for all valuable stored data!
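As a rough illustration of how a hashing algorithm turns a key into a record address, the sketch below reserves a fixed number of slots and uses `key % slots` as the hash. Both the hash function and the linear-probing rule for collisions are assumptions made for this example; the text does not say which scheme a given system uses.

```python
class HashedFile:
    """Toy hashed file: a reserved, fixed-size area of record slots."""

    def __init__(self, slots):
        self.slots = [None] * slots  # space is reserved up front

    def _address(self, key):
        # Hypothetical hashing algorithm: remainder after division.
        return key % len(self.slots)

    def put(self, key, value):
        """Store a record and return the slot address actually used."""
        home = self._address(key)
        for probe in range(len(self.slots)):
            i = (home + probe) % len(self.slots)
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return i
        raise OverflowError("reserved area is full")

    def get(self, key):
        """Compute the address from the key and probe from there."""
        home = self._address(key)
        for probe in range(len(self.slots)):
            i = (home + probe) % len(self.slots)
            if self.slots[i] is None:
                return None
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None


hashed = HashedFile(slots=7)
addr_10 = hashed.put(10, "record A")   # 10 % 7 == 3
addr_17 = hashed.put(17, "record B")   # 17 % 7 == 3, collides, moves on
print(addr_10, addr_17, hashed.get(17), hashed.get(24))
```

The unused slots are the "additional space" this organization pays for in exchange for near-direct access.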
Hierarchical Model
The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and
child data segments. This structure implies that a record can have repeating information,
generally in the child data segments. Data is held in a series of records, each of which has a set of field values
attached to it. The model collects all the instances of a specific record together as a record type. These
record types are the equivalent of tables in the relational model, with the individual records
being the equivalent of rows. To create links between these record types, the hierarchical model
uses parent-child relationships, which are 1:N mappings between record types. This is done by
using trees, a structure "borrowed" from maths just as the relational model borrows set theory.
For example, an organization might store information about an employee, such as name,
employee number, department, salary. The organization might also store information about an
employee's children, such as name and date of birth. The employee and children data form a
hierarchy, where the employee data represents the parent segment and the children data
represents the child segment. If an employee has three children, then there would be three child
segments associated with one employee segment. In a hierarchical database the parent-child
relationship is one to many. This restricts a child segment to having only one parent segment.
Hierarchical DBMSs were popular from the late 1960s, with the introduction of IBM's Information
Management System (IMS) DBMS, through the 1970s.
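The employee and children example can be sketched as a small tree in Python. The `Segment` class, its field names and the sample values are all hypothetical; this is not how IMS stores data, only an illustration of the one-parent, many-children rule.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Segment:
    record_type: str
    fields: dict
    # Exactly one parent per segment (None only for the root).
    parent: Optional["Segment"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)

    def add_child(self, record_type, fields):
        """Attach a child segment; its single parent is this segment."""
        child = Segment(record_type, fields, parent=self)
        self.children.append(child)
        return child


# Illustrative data only.
employee = Segment("EMPLOYEE", {"name": "R. Mehta", "emp_no": 17,
                                "department": "Accounts", "salary": 42000})
for child_name, born in [("Asha", "2010-04-02"),
                         ("Ravi", "2012-09-15"),
                         ("Nina", "2016-01-30")]:
    employee.add_child("CHILD", {"name": child_name, "date_of_birth": born})

child_names = [c.fields["name"] for c in employee.children]
print(len(employee.children), child_names)
```

Because each `Segment` holds a single `parent` reference, a child cannot belong to two employees, which is the restriction described above.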
Network Model
• The basic data modeling construct in the network model is the set
construct. A set consists of an owner record type, a set name, and a
member record type. A member record type can have that role in
more than one set, hence the multiparent concept is supported. An
owner record type can also be a member or owner in another set.
The data model is a simple network, and link and intersection record
types (called junction records by IDMS) may exist, as well as sets
between them. Thus, the complete network of relationships is
represented by several pairwise sets; in each set some (one) record
type is owner (at the tail of the network arrow) and one or more
record types are members (at the head of the relationship arrow).
Usually, a set defines a 1:M relationship, although 1:1 is permitted.
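A rough Python sketch of the set construct follows. The record types (SUPPLIER, PART, SHIPMENT), the set names and the helper `owners_of` are hypothetical examples, not IDMS syntax. The point shown is that one member record can take part in two sets and so have two owners.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Record:
    record_type: str
    name: str


@dataclass
class SetOccurrence:
    set_name: str
    owner: Record
    members: list = field(default_factory=list)

    def connect(self, member):
        """Add a member record to this 1:M set occurrence."""
        self.members.append(member)


def owners_of(member, set_occurrences):
    """List (set name, owner name) for every set the record is a member of."""
    return [(s.set_name, s.owner.name)
            for s in set_occurrences
            if any(m is member for m in s.members)]


supplier = Record("SUPPLIER", "Acme")
part = Record("PART", "Bolt")
# A link (junction) record that is a member of two different sets.
shipment = Record("SHIPMENT", "Shipment 1")

supplier_set = SetOccurrence("SUPPLIER-SHIPMENT", owner=supplier)
part_set = SetOccurrence("PART-SHIPMENT", owner=part)
supplier_set.connect(shipment)
part_set.connect(shipment)

shipment_owners = owners_of(shipment, [supplier_set, part_set])
print(shipment_owners)
```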
Relational Model
(RDBMS - relational database management system) A database based on the relational model developed by E.F.
Codd. A relational database allows the definition of data structures, storage and retrieval operations and integrity
constraints. In such a database the data and relations between them are organised in tables. A table is a
collection of records and each record in a table contains the same fields.
Properties of Relational Tables:
· Values Are Atomic
· Each Row is Unique
· Column Values Are of the Same Kind
· The Sequence of Columns is Insignificant
· The Sequence of Rows is Insignificant
· Each Column Has a Unique Name
Certain fields may be designated as keys, which means that searches for specific values of that field will use
indexing to speed them up. Where fields in two different tables take values from the same set, a join operation
can be performed to select related records in the two tables by matching values in those fields. Often, but not
always, the fields will have the same name in both tables. For example, an "orders" table might contain
(customer-ID, product-code) pairs and a "products" table might contain (product-code, price) pairs, so to calculate
a given customer's bill you would sum the prices of all products ordered by that customer by joining on the
product-code fields of the two tables. This can be extended to joining multiple tables on multiple fields. Because
these relationships are only specified at retrieval time, relational databases are classed as dynamic database
management systems. The relational database model is based on relational algebra.
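The orders and products example can be tried with Python's built-in `sqlite3` module. The column names and the sample rows below are assumptions made for illustration; only the table names and the idea of joining on the product code come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_code TEXT PRIMARY KEY,
        price        REAL
    );
    CREATE TABLE orders (
        customer_id  INTEGER,
        product_code TEXT REFERENCES products(product_code)
    );
    -- Illustrative rows only.
    INSERT INTO products VALUES ('P1', 2.50), ('P2', 10.00), ('P3', 4.00);
    INSERT INTO orders   VALUES (1, 'P1'), (1, 'P2'), (2, 'P3');
""")


def customer_bill(conn, customer_id):
    """Sum the prices of everything a customer ordered, via a join."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(p.price), 0)
        FROM orders AS o
        JOIN products AS p ON o.product_code = p.product_code
        WHERE o.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]


bill_1 = customer_bill(conn, 1)
bill_2 = customer_bill(conn, 2)
print(bill_1, bill_2)
```

The relationship between the two tables appears only in the `JOIN ... ON` clause of the query, which is what "specified at retrieval time" means here.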
Object/Relational Model
• Object/relational database management systems (ORDBMSs) add new
object storage capabilities to the relational systems at the core of modern
information systems. These new facilities integrate management of
traditional fielded data, complex objects such as time-series and geospatial
data and diverse binary media such as audio, video, images, and applets.
By encapsulating methods with data structures, an ORDBMS server can
execute complex analytical and data manipulation operations to search
and transform multimedia and other complex objects.
As an evolutionary technology, the object/relational (OR) approach has
inherited the robust transaction- and performance-management features of
its relational ancestor and the flexibility of its object-oriented cousin.
Database designers can work with familiar tabular structures and data
definition languages (DDLs) while assimilating new object-management
possibilities. Query and procedural languages and call interfaces in
ORDBMSs are familiar: SQL3, vendor procedural languages, and ODBC,
JDBC, and proprietary call interfaces are all extensions of RDBMS
languages and interfaces. And the leading vendors are, of course, quite well
known: IBM, Informix, and Oracle.