Sunday, September 19, 2010

E-R Diagram


What are Entity Relationship Diagrams?

Entity Relationship Diagrams (ERDs) illustrate the logical structure of databases.
An ER Diagram

Entity Relationship Diagram Notations


Peter Chen developed ERDs in 1976. Since then, Charles Bachman and James Martin have added some slight refinements to the basic ERD principles.
Entity
An entity is an object or concept about which you want to store information.
Weak Entity
A weak entity is an entity that must be defined by a foreign key relationship with another entity, because it cannot be uniquely identified by its own attributes alone.
Key attribute
A key attribute is the unique, distinguishing characteristic of the entity. For example, an employee's social security number might be the employee's key attribute.
Multivalued attribute
A multivalued attribute can have more than one value. For example, an employee entity can have multiple skill values.
Derived attribute
A derived attribute is based on another attribute. For example, an employee's monthly salary is based on the employee's annual salary.
Relationships
Relationships illustrate how two entities share information in the database structure.
To draw a relationship, first connect the two entities, then drop the relationship notation on the line.
Cardinality
Cardinality specifies how many instances of an entity relate to one instance of another entity.
Ordinality is closely linked to cardinality. While cardinality specifies the occurrences of a relationship, ordinality describes the relationship as either mandatory or optional. In other words, cardinality specifies the maximum number of relationships and ordinality specifies the minimum number of relationships.
Recursive relationship
In some cases, entities can be self-linked. For example, employees can supervise other employees.
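One way to make the notation above concrete is to map it onto relational tables. The following is a minimal sketch using Python's built-in sqlite3 module. The table and column names (employee, employee_skill, dependent, supervisor_id) are illustrative assumptions and do not come from SmartDraw.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Entity with a key attribute (emp_id) and a recursive relationship:
-- supervisor_id refers back to the same table. The column is nullable,
-- so the relationship is optional (minimum 0), and each employee has
-- at most one supervisor (maximum 1).
CREATE TABLE employee (
    emp_id        INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    annual_salary REAL NOT NULL,
    supervisor_id INTEGER REFERENCES employee(emp_id)
);

-- Multivalued attribute: one row per (employee, skill) pair.
CREATE TABLE employee_skill (
    emp_id INTEGER NOT NULL REFERENCES employee(emp_id),
    skill  TEXT NOT NULL,
    PRIMARY KEY (emp_id, skill)
);

-- Weak entity: a dependent is identified only together with its employee,
-- so the owner's key is part of the dependent's primary key.
CREATE TABLE dependent (
    emp_id INTEGER NOT NULL REFERENCES employee(emp_id),
    name   TEXT NOT NULL,
    PRIMARY KEY (emp_id, name)
);

-- Derived attribute: monthly salary is computed from annual salary,
-- not stored.
CREATE VIEW employee_pay AS
SELECT emp_id, name, annual_salary / 12.0 AS monthly_salary
FROM employee;
""")

conn.execute("INSERT INTO employee VALUES (1, 'Ada', 120000, NULL)")
conn.execute("INSERT INTO employee VALUES (2, 'Bob', 60000, 1)")
conn.executemany("INSERT INTO employee_skill VALUES (?, ?)",
                 [(2, "SQL"), (2, "Python")])
conn.execute("INSERT INTO dependent VALUES (2, 'Carol')")

monthly = conn.execute(
    "SELECT monthly_salary FROM employee_pay WHERE emp_id = 2").fetchone()[0]
skills = [row[0] for row in conn.execute(
    "SELECT skill FROM employee_skill WHERE emp_id = 2 ORDER BY skill")]
supervisor = conn.execute(
    "SELECT s.name FROM employee e JOIN employee s "
    "ON e.supervisor_id = s.emp_id WHERE e.emp_id = 2").fetchone()[0]
print(monthly, skills, supervisor)
```

Other mappings are possible; for instance, a derived attribute could be stored and kept up to date by the application instead of being computed in a view.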


Drawing ER Diagrams in SmartDraw

You can draw E-R diagrams in SmartDraw easily and quickly.

ERD Library
Use the SmartDraw explorer to find the ERD library located under Software Design and Software.
The SmartDraw ERD library
To open the library, double-click on it or drag it to the drawing area.

Adding an ERD Symbol to Your Page
To add an ERD symbol to your page, click on it and, without releasing the mouse button, drag it to the page.
Dragging an ERD symbol to the page
Connecting Entities
To connect two entities, select one of the line tools on the Toolbar by clicking on it once. Your cursor should now look like a pencil. Touch the pencil to the edge of the first shape, click down, and, without releasing the mouse button, stretch the line to the edge of the next shape until you see the anchor symbol.
Drawing the line between two entities
Connecting two symbols

If you don't see the anchor symbol, go to the Arrange menu and make sure Allow Lines to Link is turned on.

http://www.smartdraw.com/downloads

Watch the video:

http://www.smartdraw.com/specials/sd/watch-video.htm?id=35545&gclid=CLr24brBk6QCFdRA6wodwhnhHg



Tips for Effective ER Diagrams

1) Make sure that each entity appears only once per diagram.
2) Name every entity, relationship, and attribute on your diagram.
3) Examine relationships between entities closely. Are they necessary? Are there any relationships missing? Eliminate any redundant relationships. Don't connect relationships to each other.
4) Use colors to highlight important portions of your diagram.
Using colors can help you highlight important features in your diagram
5) SmartDraw makes it easy to share your software design diagram with others in a business presentation or on the web.
  • Export as GIF or JPEG for the web
  • Publish to your free SmartDrawNet web space with just one click. Any hyperlinks in your drawings become working hyperlinks on the web, and interlinked pages become interlinked web sites!
  • Export as HTML with working hyperlinks
  • Copy & Paste into PowerPoint® or other Microsoft Office® Suite programs.
When you paste your diagram into another program (such as PowerPoint®), you can edit it by simply double-clicking on the diagram in the other program. This opens the diagram in SmartDraw for editing, and your changes will be updated in the other program automatically.
6) Create a polished diagram by adding shadows and color. You can choose from a number of ready-made styles in the Edit menu under Colors and Shadows, or you can create your own.
An ER diagram created using the Blues style in SmartDraw
To hide the shadow on an individual object, go to the Arrange menu and select Hide Shadow.

Saturday, September 18, 2010

Database Definition


A database consists of an organized collection of data for one or more uses, typically in digital form. One way of classifying databases involves the type of their contents, for example: bibliographic, document-text, statistical. Digital databases are managed using database management systems, which store database contents, allowing data creation and maintenance, and search and other access.


Architecture

Database architecture consists of three levels: external, conceptual, and internal. Clearly separating the three levels was a major feature of the relational database model that dominates 21st century databases.[1]
The external level defines how users understand the organization of the data. A single database can have any number of views at the external level. The internal level defines how the data is physically stored and processed by the computing system. Internal architecture is concerned with cost, performance, scalability and other operational matters. The conceptual level is a level of indirection between internal and external. It provides a common view of the database that is uncomplicated by details of how the data is stored or managed, and that can unify the various external views into a coherent whole.[1]
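The three levels can be illustrated with a small sketch using Python's built-in sqlite3 module. The table plays the role of the conceptual level, each view is one external view, and the internal level (how rows are laid out in storage) is left entirely to SQLite. The table, view, and column names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: one shared definition of the data.
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    department TEXT NOT NULL,
    salary     REAL NOT NULL
);

-- External level: each view shows one audience only what it needs.
CREATE VIEW staff_directory AS
SELECT name, department FROM employee;

CREATE VIEW payroll_summary AS
SELECT department, SUM(salary) AS total_salary
FROM employee
GROUP BY department;
""")

conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (1, "Ada", "Engineering", 100.0),
    (2, "Bob", "Engineering", 80.0),
    (3, "Cy", "Sales", 70.0),
])

# Two different external views over the same conceptual data.
directory = conn.execute(
    "SELECT name, department FROM staff_directory ORDER BY name").fetchall()
payroll = conn.execute(
    "SELECT department, total_salary FROM payroll_summary "
    "ORDER BY department").fetchall()
print(directory)
print(payroll)
```

Neither view mentions how or where the rows are stored, which is the separation the three-level architecture aims for.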

Database management systems

A database management system (DBMS) consists of software that operates databases, providing storage, access, security, backup and other facilities. Database management systems can be categorized according to the database model that they support, such as relational or XML; the type(s) of computer they support, such as a server cluster or a mobile phone; the query language(s) that access the database, such as SQL or XQuery; and performance trade-offs, such as maximum scale or maximum speed. Some DBMS cover more than one entry in these categories, e.g., supporting multiple query languages. Examples of commonly used DBMS are MySQL, PostgreSQL, Microsoft Access, SQL Server, FileMaker, Oracle, dBASE, Clipper, and FoxPro. Most database products come with an Open Database Connectivity (ODBC) driver that allows other software to connect to the database.

Components of DBMS

Most DBMS as of 2009 implement a relational model.[2] Other DBMS, such as Object DBMS, offer specific features for more specialized requirements. Their components are similar, but not identical.

RDBMS components

  • Sublanguages: Relational DBMS (RDBMS) include a Data Definition Language (DDL) for defining the structure of the database, a Data Control Language (DCL) for defining security/access controls, and a Data Manipulation Language (DML) for querying and updating data.
  • Interface drivers: These drivers are code libraries that provide methods to prepare statements, execute statements, fetch results, etc. Examples include ODBC, JDBC, MySQL/PHP, and Firebird/Python.
  • SQL engine: This component interprets and executes the DDL, DCL, and DML statements. It includes three major components (compiler, optimizer, and executor).
  • Transaction engine: Ensures that multiple SQL statements either succeed or fail as a group, according to application dictates.
  • Relational engine: Relational objects such as Table, Index, and Referential integrity constraints are implemented in this component.
  • Storage engine: This component stores and retrieves data from secondary storage, as well as managing transaction commit and rollback, backup and recovery, etc.
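To show how the sublanguages and an interface driver fit together, here is a small sketch that uses Python's built-in sqlite3 module as the driver. The part table and its columns are assumptions made for illustration. DCL statements such as GRANT and REVOKE are not shown because SQLite does not implement them.

```python
import sqlite3

# The interface driver opens a connection to the database.
conn = sqlite3.connect(":memory:")

# DDL: define the structure of the database.
conn.execute(
    "CREATE TABLE part (part_no TEXT PRIMARY KEY, qty INTEGER NOT NULL)")

# DML (updates): the "?" placeholders let the driver bind values safely.
conn.executemany("INSERT INTO part VALUES (?, ?)", [("A-1", 5), ("B-2", 0)])
conn.execute("UPDATE part SET qty = qty + ? WHERE part_no = ?", (10, "B-2"))
conn.commit()

# DML (query): execute a statement, then fetch the results.
cursor = conn.execute(
    "SELECT part_no, qty FROM part WHERE qty > ? ORDER BY part_no", (0,))
in_stock = cursor.fetchall()
print(in_stock)
```

Behind these calls, the SQL engine compiles and executes each statement and the storage engine reads and writes the data; the application only sees the driver's methods.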

ODBMS components

Object DBMS (ODBMS) has transaction and storage components that are analogous to those in an RDBMS. Some ODBMS handle DDL, DCL and update tasks differently. Instead of using sublanguages, they provide APIs for these purposes. They typically include a sublanguage and accompanying engine for processing queries with interpretive statements analogous to but not the same as SQL. Example object query languages are OQL, LINQ, JDOQL, JPAQL and others. The query engine returns collections of objects instead of relational rows.

Types

Operational database

These databases store detailed data about the operations of an organization. They are typically organized by subject matter and process relatively high volumes of updates using transactions. Essentially every major organization on earth uses such databases. Examples include customer databases that record contact, credit, and demographic information about a business's customers; personnel databases that hold information such as salary, benefits, and skills data about employees; manufacturing databases that record details about product components and parts inventory; and financial databases that keep track of the organization's money, accounting and financial dealings.

Data warehouse

Data warehouses archive data from operational databases and often from external sources such as market research firms. Often operational data undergoes transformation on its way into the warehouse, getting summarized, anonymized, reclassified, etc. The warehouse becomes the central source of data for use by managers and other end-users who may not have access to operational data. For example, sales data might be aggregated to weekly totals and converted from internal product codes to UPC codes so that it can be compared with ACNielsen data. Basic components of data warehousing include retrieving, analyzing, transforming, loading, and managing data so as to make it available for further use.
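The weekly-totals example can be sketched in a few lines of Python. The product codes, UPC values, and sales rows below are invented for illustration, and grouping by ISO week is just one possible definition of "weekly".

```python
from collections import defaultdict
from datetime import date

# Hypothetical mapping from internal product codes to UPC codes.
INTERNAL_TO_UPC = {"P-100": "012345678905", "P-200": "036000291452"}

# Operational rows: (sale date, internal product code, amount).
sales = [
    (date(2010, 9, 13), "P-100", 20.0),
    (date(2010, 9, 15), "P-100", 5.0),
    (date(2010, 9, 19), "P-200", 7.5),
    (date(2010, 9, 20), "P-100", 1.0),
]


def weekly_totals(rows):
    """Summarize sales per (ISO year, ISO week, UPC code)."""
    totals = defaultdict(float)
    for day, code, amount in rows:
        iso_year, iso_week, _weekday = day.isocalendar()
        totals[(iso_year, iso_week, INTERNAL_TO_UPC[code])] += amount
    return dict(totals)


warehouse_rows = weekly_totals(sales)
for key in sorted(warehouse_rows):
    print(key, warehouse_rows[key])
```

The individual sales stay in the operational database; only the summarized, recoded rows would be loaded into the warehouse.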

Analytical database

Analysts may do their work directly against a data warehouse, or create a separate analytic database for Online Analytical Processing. For example, a company might extract sales records for analyzing the effectiveness of advertising and other sales promotions at an aggregate level.

Distributed database

These are databases of local work-groups and departments at regional offices, branch offices, manufacturing plants and other work sites. These databases can include segments of both common operational and common user databases, as well as data generated and used only at a user’s own site.

End-user database

These databases consist of data developed by individual end-users. Examples include collections of documents, spreadsheets, word processing files and downloaded files, or even a record of a personal baseball card collection.

External database

These databases contain data collected for use across multiple organizations, either freely or via subscription. The Internet Movie Database is one example.

Hypermedia databases

The World Wide Web can be thought of as a database, albeit one spread across millions of independent computing systems. Web browsers "process" this data one page at a time, while web crawlers and other software provide the equivalent of database indexes to support search and other activities.

Models

Post-relational database models

Products offering a more general data model than the relational model are sometimes classified as post-relational.[3] Alternate terms include "hybrid database", "Object-enhanced RDBMS" and others. The data model in such products incorporates relations but is not constrained by E.F. Codd's Information Principle, which requires that
all information in the database must be cast explicitly in terms of values in relations and in no other way[4]
Some of these extensions to the relational model integrate concepts from technologies that pre-date the relational model. For example, they allow representation of a directed graph with trees on the nodes.
Some post-relational products extend relational systems with non-relational features. Others arrived in much the same place by adding relational features to pre-relational systems. Paradoxically, this allows products that are historically pre-relational, such as PICK and MUMPS, to make a plausible claim to be post-relational.

Object database models

In recent years, the object-oriented paradigm has been applied in areas such as engineering and spatial databases, telecommunications and in various scientific domains. The combination of object-oriented programming and database technology led to this new kind of database. These databases attempt to bring the database world and the application-programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.
A variety of approaches have been tried for storing objects in a database. Some products have approached the problem from the application-programming side, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not provide language-level functionality for finding objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

Storage structures

Databases may store relational tables and indexes in memory or on hard disk in one of many forms. The most commonly used are B+ trees and ISAM.
Object databases use a range of storage mechanisms. Some use virtual memory-mapped files to make the native language (C++, Java, etc.) objects persistent. This can be highly efficient but it can make multi-language access more difficult. Others disassemble objects into fixed- and varying-length components that are then clustered in fixed-size blocks on disk and reassembled into the appropriate format on either the client or server address space. Another popular technique involves storing the objects in tuples (much like a relational database) which the database server then reassembles into objects for the client.
Other techniques include clustering by category (such as grouping data by month or location), storing pre-computed query results (known as materialized views), and partitioning data by range (e.g., a date range) or by hash.
Memory management and storage topology can be important design choices for database designers as well. Just as normalization is used to reduce storage requirements and improve database designs, conversely denormalization is often used to reduce join complexity and reduce query execution time.[5]

Indexing

Indexing is a technique for improving database performance. The many types of index share the common property that they eliminate the need to examine every entry when running a query. In large databases, this can reduce query time/cost by orders of magnitude. The simplest form of index is a sorted list of values, each with an adjacent reference to the location of the entry, that can be searched using a binary search. This is analogous to the index in the back of a book. The same data can have multiple indexes; for example, an employee database could be indexed by last name and by hire date.
Indexes affect performance, but not results. Database designers can add or remove indexes without changing application logic, reducing maintenance costs as the database grows and database usage evolves.
Given a particular query, the DBMS' query optimizer is responsible for devising the most efficient strategy for finding matching data. The optimizer decides which index or indexes to use, how to combine data from different parts of the database, how to provide data in the order requested, etc.
Indexes can speed up data access, but they consume space in the database and must be updated each time the data are altered, which slows data maintenance. The balance between these two effects determines whether a given index is worth the cost.
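The "sorted list plus binary search" form of index described above can be sketched with Python's standard bisect module. The employee rows and the helper names build_index and lookup are assumptions made for illustration; production systems typically use structures such as B+ trees instead.

```python
import bisect

# Table rows in arrival order; the list position stands in for the
# row's physical location.
rows = [
    {"last_name": "Nguyen", "hired": "2008-03-01"},
    {"last_name": "Adams", "hired": "2010-01-15"},
    {"last_name": "Lopez", "hired": "2005-07-09"},
    {"last_name": "Adams", "hired": "2001-11-30"},
]


def build_index(table, column):
    """Return a sorted list of (value, row position) pairs."""
    return sorted((row[column], pos) for pos, row in enumerate(table))


def lookup(index, value):
    """Binary-search the index and return positions of matching rows."""
    # (value,) sorts before every (value, pos) pair, so bisect_left
    # lands on the first entry whose key is >= value.
    start = bisect.bisect_left(index, (value,))
    positions = []
    for key, pos in index[start:]:
        if key != value:
            break
        positions.append(pos)
    return positions


# The same data can carry several indexes.
by_name = build_index(rows, "last_name")
by_hire_date = build_index(rows, "hired")

print(lookup(by_name, "Adams"))
print(lookup(by_hire_date, "2005-07-09"))
```

Note the maintenance cost: every insert or update to rows would also require updating both by_name and by_hire_date.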

Transactions

Most DBMS provide some form of support for transactions, which allow multiple data items to be updated in a consistent fashion, such that updates that are part of a transaction succeed or fail in unison. The so-called ACID rules, summarized here, characterize this behavior:
  • Atomicity: Either all the data changes in a transaction must happen, or none of them. The transaction must be completed, or else it must be undone (rolled back).
  • Consistency: Every transaction must preserve the declared consistency rules for the database.
  • Isolation: Two concurrent transactions cannot interfere with one another. Intermediate results within one transaction must remain invisible to other transactions. The most extreme form of isolation is serializability, meaning that transactions that take place concurrently could instead be performed in some series, without affecting the ultimate result.
  • Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) DBMS restarts.
In practice, many DBMSs allow the selective relaxation of these rules to balance perfect behavior with optimum performance.
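Atomicity and consistency can be demonstrated with a short sketch using Python's built-in sqlite3 module. The account table, the CHECK rule, and the transfer helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is the declared consistency rule: no overdrafts.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # debit breaks the CHECK rule
print(ok, failed, balances(conn))
```

In the second call the credit to bob has already been applied when the debit fails, and the rollback undoes it, so no partial transfer is ever visible afterwards.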

Replication

Database replication involves maintaining multiple copies of a database on different computers, to allow more users to access it, or to allow a secondary site to immediately take over if the primary site stops working. Some DBMS piggyback replication on top of their transaction logging facility, applying the primary's log to the secondary in near real-time. Database clustering is a related concept for handling larger databases and user communities by employing a cluster of multiple computers to host a single database that can use replication as part of its approach.[6][7]
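The idea of applying the primary's log to a secondary can be shown with a toy model in plain Python. The Primary and Secondary classes are hypothetical and ignore networking, failures, and concurrency; they only illustrate replaying an ordered log.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered record of every write

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Secondary:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries already replayed

    def catch_up(self, log):
        """Replay only the log entries not yet applied."""
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


primary, secondary = Primary(), Secondary()
primary.write("x", 1)
primary.write("y", 2)
secondary.catch_up(primary.log)

primary.write("x", 3)
lagging = dict(secondary.data)   # the secondary has not yet seen x = 3
secondary.catch_up(primary.log)  # now it has
print(lagging, secondary.data)
```

The gap between the two catch_up calls is the replication lag; "near real-time" replication aims to keep that gap small.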

Security

Database security denotes the system, processes, and procedures that protect a database from unauthorized activity.
DBMSs usually enforce security through access control, auditing, and encryption:
  • Access control manages who can connect to the database via authentication and what they can do via authorization.
  • Auditing records information about database activity: who, what, when, and possibly where.
  • Encryption protects data at the lowest possible level by storing and possibly transmitting data in an unreadable form. The DBMS encrypts data when it is added to the database and decrypts it when returning query results. This process can occur on the client side of a network connection to prevent unauthorized access at the point of use.

Confidentiality

Laws and regulations govern the release of information from some databases, protecting medical history, driving records, telephone logs, etc.
In the United Kingdom, database privacy regulation falls under the Office of the Information Commissioner. Organizations based in the United Kingdom and holding personal data in digital format such as databases must register with the Office.[8]

Locking

When a transaction modifies a resource, the DBMS stops other transactions from also modifying it, typically by locking it. Locks also provide one method of ensuring that data does not change while a transaction is reading it or even that it doesn't change until a transaction that once read it has completed.

Granularity

Locks can be coarse, covering an entire database; fine-grained, covering a single data item; or intermediate, covering a collection of data such as all the rows in an RDBMS table.

Lock types

Locks can be shared[9] or exclusive, and can lock out readers and/or writers. Locks can be created implicitly by the DBMS when a transaction performs an operation, or explicitly at the transaction's request.
Shared locks allow multiple transactions to lock the same resource. The lock persists until all such transactions complete. Exclusive locks are held by a single transaction and prevent other transactions from locking the same resource.
Read locks are usually shared, and prevent other transactions from modifying the resource. Write locks are exclusive, and prevent other transactions from modifying the resource. On some systems, write locks also prevent other transactions from reading the resource.
The DBMS implicitly locks data when it is updated, and may also do so when it is read. Transactions explicitly lock data to ensure that they can complete without a deadlock or other complication. Explicit locks may be useful for some administrative tasks.[10][11]
Locking can significantly affect database performance, especially with large and complex transactions in highly concurrent environments.
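The compatibility rules for shared and exclusive locks can be sketched as a tiny lock table in Python. The LockManager class is hypothetical; a real DBMS would typically make a blocked transaction wait in a queue rather than simply refuse the request.

```python
class LockManager:
    def __init__(self):
        self.locks = {}  # resource -> (mode, set of holding transactions)

    def acquire(self, txn, resource, mode):
        """Request a "shared" or "exclusive" lock; return True if granted."""
        held = self.locks.get(resource)
        if held is None:
            self.locks[resource] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "shared" and held_mode == "shared":
            holders.add(txn)  # many readers may share the lock
            return True
        if holders == {txn}:
            # The sole holder may re-request or upgrade its own lock.
            if mode == "exclusive":
                self.locks[resource] = ("exclusive", holders)
            return True
        return False  # conflicts with a lock held by another transaction

    def release(self, txn, resource):
        held = self.locks.get(resource)
        if held is None:
            return
        _mode, holders = held
        holders.discard(txn)
        if not holders:
            del self.locks[resource]


lm = LockManager()
r1 = lm.acquire("T1", "row42", "shared")     # granted
r2 = lm.acquire("T2", "row42", "shared")     # granted: shared locks coexist
r3 = lm.acquire("T3", "row42", "exclusive")  # refused while readers hold it
lm.release("T1", "row42")
lm.release("T2", "row42")
r4 = lm.acquire("T3", "row42", "exclusive")  # granted once readers are gone
r5 = lm.acquire("T1", "row42", "shared")     # refused: the writer excludes readers
print(r1, r2, r3, r4, r5)
```

In this sketch an exclusive lock blocks readers as well as writers, which corresponds to the stricter of the behaviors described above.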

Isolation

Isolation refers to the ability of one transaction to see the results of other transactions. Greater isolation typically reduces performance and/or concurrency, leading DBMSs to provide administrative options to reduce isolation. For example, in a database that analyzes trends rather than looking at low-level detail, increased performance might justify allowing readers to see uncommitted changes ("dirty reads").

Deadlocks

Deadlocks occur when two transactions each require data that the other has already locked exclusively. Deadlock detection is performed by the DBMS, which then aborts one of the transactions and allows the other to complete.
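One common way to detect deadlocks, sketched below, is to build a "waits-for" graph (transaction A points to transaction B when A is waiting for a lock that B holds) and look for a cycle. The function name and the sample transactions are assumptions made for illustration; actual DBMSs differ in how they detect deadlocks and choose which transaction to abort.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle, or None."""
    visiting = []  # current depth-first search path
    done = set()   # transactions already proven cycle-free

    def visit(txn):
        if txn in done:
            return None
        if txn in visiting:
            # Reached a transaction already on the path: that is a cycle.
            return visiting[visiting.index(txn):]
        visiting.append(txn)
        for other in waits_for.get(txn, ()):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        done.add(txn)
        return None

    for txn in list(waits_for):
        cycle = visit(txn)
        if cycle:
            return cycle
    return None


# T1 waits for a lock held by T2, and T2 waits for a lock held by T1.
deadlocked = find_deadlock({"T1": ["T2"], "T2": ["T1"]})
# T1 waits for T2, but T2 is not waiting for anyone.
healthy = find_deadlock({"T1": ["T2"], "T2": []})
print(deadlocked, healthy)
```

Once a cycle is found, aborting any one transaction in it breaks the deadlock and lets the others proceed.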

References

  1. Date 1990, pp. 31–32.
  2. "Design of Main Memory Database System/Overview of DBMS". En.wikibooks.org. Retrieved 2010-08-29.
  3. Introducing databases by Stephen Chu, in Conrick, M. (2006) Health informatics: transforming healthcare with technology, Thomson, ISBN 0-17-012731-1, p. 69.
  4. Date, C. J. (June 1, 1999). "When's an extension not an extension?" Intelligent Enterprise 2 (8).
  5. Lightstone, Teorey & Nadeau 2007.
  6. "MySQL Cluster". Mysql.com. 2010-08-25. Retrieved 2010-08-29.
  7. "Oracle Real Application Cluster (RAC)". Oracle.com. 2010-03-23. Retrieved 2010-08-29.
  8. "Information Commissioner's Office". ICO. Retrieved 2010-08-29.
  9. "Information on Shared Locks". Methodsandtools.com. Retrieved 2010-08-29.
  10. "Locking tables and databases" (IBM documentation). Publib.boulder.ibm.com. Retrieved 2010-08-29.
  11. "Routine Database Maintenance". Postgresql.org. Retrieved 2010-08-29.

Further reading

  • Ling Liu and Tamer M. Özsu (Eds.) (2009). Encyclopedia of Database Systems, 4100 p. 60 illus. ISBN 978-0-387-49616-0. Table of contents available at http://refworks.springer.com/mrw/index.php?id=1217
  • Beynon-Davies, P. (2004). Database Systems. 3rd Edition. Palgrave, Houndmills, Basingstoke.
  • Connolly, Thomas and Carolyn Begg. Database Systems. New York: Harlow, 2002.
  • Date, C. J. An Introduction to Database Systems, Eighth Edition, Addison Wesley, 2003.
  • Date, C. J. (1990). An Introduction to Database Systems, Fifth Edition. Addison Wesley. ISBN 0-201-51381-1.
  • Galindo, J.; Urrutia, A.; Piattini, M. Fuzzy Databases: Modeling, Design and Implementation (FSQL guide). Idea Group Publishing Hershey, USA, 2006.
  • Galindo, J., Ed. Handbook on Fuzzy Information Processing in Databases. Hershey, PA: Information Science Reference (an imprint of Idea Group Inc.), 2008.
  • Gray, J. and Reuter, A. Transaction Processing: Concepts and Techniques, 1st edition, Morgan Kaufmann Publishers, 1992.
  • Kroenke, David M. Database Processing: Fundamentals, Design, and Implementation (1997), Prentice-Hall, Inc., pages 130-144.
  • Kroenke, David M. and David J. Auer. Database Concepts. 3rd ed. New York: Prentice, 2007.
  • Lightstone, S.; Teorey, T.; Nadeau, T. (2007). Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more. Morgan Kaufmann Press. ISBN 0-12-369389-6.
  • O'Brien, James. "Management Information Systems". New York 1999
  • Shih, J. "Why Synchronous Parallel Transaction Replication is Hard, But Inevitable?", white paper, 2007.
  • Teorey, T.; Lightstone, S. and Nadeau, T. Database Modeling & Design: Logical Design, 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5
  • Tukey, John W. Exploratory Data Analysis. Reading, MA: Addison Wesley, 1977.
  • Manovich, Lev. Database as a Symbolic Form, Cambridge: MIT Press, 2001.