What is the data dictionary's function in database design?

Your Organization

Kelly C. Bourne, in Application Administrators Handbook, 2014

19.11 Data Dictionary

A data dictionary is a centralized repository of metadata. Metadata is data about data. Some examples of what might be contained in an organization’s data dictionary include:

The names of fields contained in all of the organization’s databases

What table(s) each field exists in

What database(s) each field exists in

The data types, e.g., integer, real, character, and image, of all fields in the organization’s databases

The sizes, e.g., LONG INT, DOUBLE, and CHAR(64), of all fields in the organization’s databases

An explanation of what each database field means

The source of the data for each database field

A list of applications that reference each database field

The relationship between fields in all of the organization’s databases

Default values that exist for all fields in all of the organization’s databases

Who has access to each field
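
To make the list above concrete, here is a minimal Python sketch of what a single field-level entry might look like. The structure and every example value are hypothetical; a real data dictionary product will organize this differently.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FieldEntry:
    """One field-level entry in a hypothetical organizational data dictionary."""
    name: str                    # field name
    databases: List[str]         # database(s) the field exists in
    tables: List[str]            # table(s) the field exists in
    data_type: str               # e.g., integer, real, character, image
    size: str                    # e.g., LONG INT, DOUBLE, CHAR(64)
    meaning: str                 # explanation of what the field means
    source: str                  # where the data for the field comes from
    applications: List[str]      # applications that reference the field
    related_fields: List[str]    # relationships to other fields
    default_value: Optional[str] = None
    authorized_users: List[str] = field(default_factory=list)  # who has access

# A hypothetical entry for one field.
cust_postcode = FieldEntry(
    name="cust_postcode",
    databases=["crm"],
    tables=["customer_address"],
    data_type="character",
    size="CHAR(10)",
    meaning="Postal code of the customer's mailing address",
    source="Entered by order-entry staff",
    applications=["OrderEntry", "Billing"],
    related_fields=["cust_country"],
    authorized_users=["order_entry_role"],
)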

Does your organization maintain a data dictionary? If so, does it impact the application that you support? Some ways in which it might impact you include:

Does information applicable to your application exist in the data dictionary?

Are you required to periodically review and validate data dictionary entries related to your application?

Are you required to update the data dictionary if your application’s database details change? For example, if new fields are added or the size of existing fields is changed, do these changes need to be reflected in the data dictionary?

If new users are added to your application, does their access need to be reflected in the data dictionary?

URL: https://www.sciencedirect.com/science/article/pii/B9780123985453000194

Beyond the corporate library: information management in organisations

Michael Middleton, in Libraries in the Twenty-First Century, 2007

Quality control

Quality control procedures range from software support for data processing at the technical level through to scrutiny and performance review of management processes. In libraries, the data processing quality control may be through the medium of authority files that support cataloguing processes; the performance review may be of a task such as average time to undertake reference queries.

Each of these procedures has its equivalent in the information management world outside libraries. Many data dictionaries, for instance, provide for data elements to have validation lists – that is, the data instances for a particular data element, such as person’s name, may have only certain allowable values. Query answer throughput is a significant aspect of performance review in call centres.

Data dictionaries provide for formalising and controlling the naming of entities, attributes and their relationships within databases, for example by inclusion of:

Data entities such as elements, tables, rows and keys.

System entities such as programs and modules.

External entities such as description of people, documents and devices.

Identification attributes such as naming along with synonyms or aliases.

Representation attributes such as data type or number of characters in an element.

Control attributes such as ownership – who is allowed to change data instances for an element.

Cardinality relationships – the number of instances in one entity that may be related to instances in a related entity, for example, the number of rows in a table.

Subtype or subsumption relationships that indicate whether one entity is a part of another (a sedan, for example, is a subtype of car).
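
As a rough illustration of these categories (all names and values invented for the example), a dictionary entry might record identification, representation and control attributes for a data element, plus cardinality and subtype relationships:

# Hypothetical entries illustrating the categories above.
person_name = {
    "identification": {"name": "person_name", "aliases": ["customer_name"]},  # naming and synonyms
    "representation": {"data_type": "character", "max_length": 64},           # type and length
    "control": {"owner": "customer_services"},                                # who may change instances
}

relationships = [
    # Cardinality: one customer row may be related to many order rows.
    {"kind": "cardinality", "from": "customer", "to": "order", "ratio": "1:many"},
    # Subtype/subsumption: a sedan is a subtype of car.
    {"kind": "subtype", "subtype": "sedan", "supertype": "car"},
]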

When put into effect, data dictionaries support quality control of data as the data are entered. When an operator is required to enter the postcode for an address into a database, for instance, a data dictionary may be used to:

Validate the operator as a user who is allowed to enter postcodes.

Have a postcode data element of a limited number of characters.

Allow postcodes to appear only within the numerical range associated with the country of instance.

Provide a picklist of allowed postcodes from a scrollable dialogue box for the data element.

Provide alternative names to be used for the element (for example, zipcode) by operators in different countries.

Maintain a history of versions of naming and allowed values provided for any picklists.
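
A minimal sketch of how such dictionary-driven checks might be applied at data entry follows; the entry layout, operator list and postcode values are invented for illustration rather than taken from any real system.

# Hypothetical dictionary entry for the postcode element of an Australian address.
postcode_entry = {
    "name": "postcode",
    "aliases": {"US": "zipcode"},            # alternative name used by operators elsewhere
    "max_length": 4,                         # limited number of characters
    "allowed_range": (200, 9999),            # numerical range for the country of instance
    "picklist": ["4000", "4001", "4059"],    # allowed values offered in a scrollable dialogue box
    "authorized_operators": {"op_smith", "op_jones"},
}

def validate_postcode(operator: str, value: str, entry: dict = postcode_entry) -> bool:
    """Apply the dictionary-driven checks listed above to one entered value."""
    if operator not in entry["authorized_operators"]:
        return False                                  # operator may not enter postcodes
    if len(value) > entry["max_length"] or not value.isdigit():
        return False                                  # too long, or not numeric
    low, high = entry["allowed_range"]
    if not low <= int(value) <= high:
        return False                                  # outside the national range
    return value in entry["picklist"]                 # must be one of the allowed values

print(validate_postcode("op_smith", "4000"))   # True
print(validate_postcode("op_smith", "99999"))  # False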

Although dictionaries help to control data, they have limitations when it comes to fields that are more difficult to validate such as name and address. Data entry operators inevitably make keyboard transcription errors – they may be unable to differentiate forenames from family names or the same customer may have his or her name recorded in different ways in the same organisation (with initials, with full forenames, with slight spelling variations in family name or with family name changing over time). These present problems with identity tracking, or with matching, say, a purchase order and a complaint by the same person.

Standards authorities attempt to provide assistance in this area, for example, Standards Australia has a standard for client interchange information that is presently under revision. Nevertheless, large corporations, even if they heed standards, find it necessary to carry out monitoring of their large data sets. Similarly, smaller organisations responsible for key information used by larger ones must have many data quality checking approaches. An example would be a credit reference agency such as Baycorp Advantage, which sells crucial credit checking information to businesses. The businesses themselves will have supplied much of the information that the agency uses. However, it can maintain data quality using highly structured data and validating data with source bodies (for example, address data with Australia Post) or by using specialist software such as comparators (comparing strings of data for likeness) or soundex (making phonetic matches) in order to identify element occurrences that are effectively the same even if they are recorded differently.
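
As a hedged sketch of the two techniques just mentioned, the fragment below pairs a simplified Soundex encoding with Python's difflib string comparator to flag recorded names that probably refer to the same person; the threshold and example names are arbitrary.

from difflib import SequenceMatcher

def soundex(name: str) -> str:
    """Simplified Soundex: similar-sounding names map to the same 4-character code."""
    mapping = {}
    for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in letters:
            mapping[ch] = digit
    name = name.lower()
    if not name:
        return ""
    digits = []
    prev = mapping.get(name[0], "")
    for ch in name[1:]:
        code = mapping.get(ch, "")       # vowels and other characters encode to nothing
        if code and code != prev:        # drop immediately repeated codes
            digits.append(code)
        prev = code
    return (name[0].upper() + "".join(digits) + "000")[:4]

def likely_same(a: str, b: str, threshold: float = 0.85) -> bool:
    """Comparator: flag two recorded names as probably referring to the same person."""
    phonetic_match = soundex(a.split()[-1]) == soundex(b.split()[-1])  # compare family names
    string_match = SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
    return phonetic_match or string_match

print(soundex("Smith"), soundex("Smyth"))               # S530 S530
print(likely_same("Katherine Jones", "Kathryn Jones"))  # True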

It is worth turning from databases to websites because, since the advent of the web, much has been written about maintaining the quality of web pages. Relevant advice appears in the many style guides that include recommendations about site quality. Corresponding guidance is provided in the checklists that support approaches to website evaluation. Queensland University of Technology’s FAVORS (http://www.favors.fit.qut.edu.au/) is one such list, maintained online with examples and references. A summary of the website evaluation criteria that it illustrates is shown in Table 11.1. Information quality is maintained as much as possible at the information acquisition stage for databases, but attention must also be paid to the forms of presentation, typically through websites.

Table 11.1. Website evaluation criteria based upon FAVORS

Criterion | Factors
Functionality | Active links; errors in mark-up; help facilities; layout; search facilities; site maps; alternate text for images.
Authority | Affiliations indicated; copyright indications; creator responsibility; credentials; editorial oversight; funding source indication; viability.
Validity | Feedback; ratings and awards; refereed content; referring links; reviews of site; usage figures.
Obtainability | Cost of access; format support; load factors; metadata; naming mnemonic; security protection; speed.
Relevance | Audience; balance; breadth; controversial content; currency; depth.
Substance | Accuracy; coverage; detail; evidence; explanation; readability.

URL: https://www.sciencedirect.com/science/article/pii/B9781876938437500119

Generic data management software

Stuart Ferguson, Rodney Hebels, in Computers for Librarians (Third Edition), 2003

Data dictionaries

A data dictionary is at the heart of any database management system. The data dictionary contains important information, such as what files are in the database and descriptions (called attributes) of the data contained in the files. This information is used by the system to assess whether or not a particular process can be accomplished and whether or not a particular user is authorised to carry it out. Information stored in the data dictionary could normally be expected to include:

what data are available

where on the storage device they are located

data attributes (for instance, a data type such as numerical or alphanumerical)

how the data are used

definitions of data security requirements (who is allowed to access the data, who is allowed to update/amend them)

relationships to other pieces of data

definitions of data integrity requirements.
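
Much of this information can be read straight out of the DBMS's own catalog. The sketch below uses SQLite simply as a convenient stand-in for any database management system; the table and column names are made up.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE borrower (id INTEGER PRIMARY KEY, name TEXT NOT NULL, joined DATE)")

# What data are available: the catalog lists every table in the database.
tables = [row[0] for row in
          conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# Data attributes: name, type, nullability and default value of each column.
for table in tables:
    for cid, name, col_type, notnull, default, pk in conn.execute(f"PRAGMA table_info({table})"):
        print(table, name, col_type,
              "NOT NULL" if notnull else "nullable",
              "default:", default,
              "primary key" if pk else "")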

URL: https://www.sciencedirect.com/science/article/pii/B9781876938604500112

Enterprise-Level Data Architecture Practices

Charles D. Tupper, in Data Architecture, 2011

Data Dictionary–Metadata Repository

A data dictionary represents a compendium of all data definitions at the lowest level. That is, it consists of data attribute names and the definitions and characteristics associated with them. Normally it is established at the enterprise level but sometimes at the application level on an exception basis. While it is not necessary to compile this, it can be used as a guideline or source of new data names.

The enterprise level lets the pool of data attributes be reused throughout the enterprise, ensuring integrity of output while fostering understanding of the data. While it is critical to have a data dictionary of some kind, how it is implemented matters less. As long as it contains or references the procedures and policies that ensure all development is assisted or implemented by way of the data dictionary, it will promote success and data sharing.

Dictionary policies and procedures must be defined and publicized, because the developer, the modeler, and the client must all agree on how to encode the requirement in the dictionary. It must be sponsored by IT management as well as client management, since it is often seen by the client as unnecessary overhead. But, as we have seen, once it is defined for the transaction system, it becomes available for the reporting and EIS systems that will follow later on. It will also provide a basis for data sourcing for the data warehouse that will eventually be designed.

URL: https://www.sciencedirect.com/science/article/pii/B9780123851260000036

The Relational Data Model

Jan L. Harrington, in Relational Database Design and Implementation (Fourth Edition), 2016

Sample Data Dictionary Tables

The precise tables that make up a data dictionary depend somewhat on the DBMS. In this section, you will see one example of a typical way in which a DBMS might organize its data dictionary.

The linchpin of the data dictionary is actually a table that documents all the data dictionary tables (often named syscatalog, the first few rows of which can be found in Figure 5.4). From the names of the data dictionary tables, you can probably guess that there are tables to store data about base tables, their columns, their indexes, and their foreign keys.

Figure 5.4. A portion of a syscatalog table.

The syscolumn table describes the columns in each table (including the data dictionary tables). In Figure 5.5, for example, you can see a portion of a syscolumn table that describes the Antique Opticals merchandise item table.

Figure 5.5. Selected rows from a syscolumn table.

Keep in mind that these data dictionary tables have the same type of structure as all other tables in the database, and must adhere to the same rules as base tables. They must have non-null unique primary keys; they must also enforce referential integrity among themselves.
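
The following sketch, using SQLite as a stand-in DBMS, shows hypothetical syscatalog and syscolumn tables defined with the same discipline as base tables: non-null unique primary keys and referential integrity between them. The column layouts are simplified assumptions, not the actual catalog structure of any product.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # have SQLite enforce referential integrity
conn.executescript("""
    -- Hypothetical data dictionary tables, built like any other base tables.
    CREATE TABLE syscatalog (
        table_name TEXT NOT NULL PRIMARY KEY,   -- non-null, unique primary key
        creator    TEXT NOT NULL
    );
    CREATE TABLE syscolumn (
        table_name  TEXT NOT NULL REFERENCES syscatalog(table_name),
        column_name TEXT NOT NULL,
        data_type   TEXT NOT NULL,
        PRIMARY KEY (table_name, column_name)
    );
""")
conn.execute("INSERT INTO syscatalog VALUES ('item', 'dba')")
conn.execute("INSERT INTO syscolumn VALUES ('item', 'item_numb', 'INTEGER')")

# Referential integrity applies to the dictionary itself: no 'customer' row exists in syscatalog.
try:
    conn.execute("INSERT INTO syscolumn VALUES ('customer', 'customer_numb', 'INTEGER')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)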

URL: https://www.sciencedirect.com/science/article/pii/B9780128043998000053

Users and Access Rights

Jan L. Harrington, in SQL Clearly Explained (Third Edition), 2010

Storing Access Rights

Access rights to tables and views are stored in the data dictionary. Although the details of the data dictionary tables vary from one DBMS to another, you will usually find access rights split between two system tables named something like SYSTABLEPERM and SYSCOLPERM.

The first table is used when access rights are granted to entire tables or views; the second is used when rights are granted to specific columns within a table or view.

A SYSTABLEPERM table has a structure similar to the following:

Systableperm (table_id, grantee, grantor, selectauth, insertauth, deleteauth, updateauth, updatecols, referenceauth)

The columns represent

TABLE_ID: An identifier for the table or view.

GRANTEE: The user ID to which rights have been granted.

GRANTOR: The user ID granting the rights.

SELECTAUTH: The grantee's SELECT rights.

INSERTAUTH: The grantee's INSERT rights.

DELETEAUTH: The grantee's DELETE rights.

UPDATEAUTH: The grantee's UPDATE rights.

UPDATECOLS: Indicates whether rights have been granted to specific columns within the table or view. When this value is Y (yes), the DBMS must also look in SYSCOLPERM to determine whether a user has the rights to perform a specific action against the database.

REFERENCEAUTH: The grantee's REFERENCE rights.

The columns that hold the access rights take one of three values: Y (yes), N (no), or G (yes with grant option).

Whenever a user makes a request to the DBMS to manipulate data, the DBMS first consults the data dictionary to determine whether the user has the rights to perform the requested action. (SQL-based DBMSs are therefore said to be data dictionary driven.) If the DBMS cannot find a row with a matching user ID and table identifier, then the user has no rights at all to the table or view. If a row with a matching user ID and table identifier exists, then the DBMS checks for the specific rights that the user has to the table or view and—based on the presence of Y, N, or G in the appropriate column—either permits or disallows the requested database access.
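
A rough sketch of that lookup, using the SYSTABLEPERM layout shown above (the SYSCOLPERM check for column-level rights is simplified away), might look like this; the user and table identifiers are invented.

# Access-rights rows keyed by (table_id, grantee), mirroring the SYSTABLEPERM layout above.
systableperm = {
    ("orders", "jsmith"): {
        "selectauth": "Y", "insertauth": "N", "deleteauth": "N",
        "updateauth": "G", "updatecols": "N", "referenceauth": "N",
    },
}

def has_right(user: str, table_id: str, action: str) -> bool:
    """Data-dictionary-driven check: may `user` perform `action` on `table_id`?"""
    row = systableperm.get((table_id, user))
    if row is None:
        return False                                   # no row: no rights at all to the table or view
    return row.get(action + "auth") in ("Y", "G")      # G means yes, with the grant option

print(has_right("jsmith", "orders", "select"))   # True
print(has_right("jsmith", "orders", "delete"))   # False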

URL: https://www.sciencedirect.com/science/article/pii/B9780123756978500121

Overview of Auditing

In SAP Security Configuration and Deployment, 2009

Auditing Customized Programs

SAP allows a great deal of customization to the system via changes to the data dictionary and ABAP code. In order to track this customization, SAP has established the SAP Software Change Registration (SSCR) procedure, which registers all manual changes to ABAP code and the data dictionary. When ABAP code or the data dictionary is changed, the system automatically requests an SSCR key. In auditing customized programs it is important to keep in mind that custom programs can circumvent key controls in the system. Here are some other items to look at when auditing custom programs:

Verify programs are requested by approved individuals and are adequately documented.

Verify that adequate security has been created for a program before it is moved to Production.

Verify that only authorized individuals maintain programs.

Verify programs cannot be created in the Production environment.

Verify obsolete transaction codes are blocked.

Here are some reports and configuration settings that can be used to review program changes:

Use transaction code SE16 to look at table TRDIR to review custom programs and verify that:

The title of the program sufficiently describes the purpose of the program.

Custom programs are assigned to an appropriate authorization group.

The person who created the custom program is authorized to create programs.

Use transaction code SE16 to look at table TSTC to review a list of custom transaction codes, the programs assigned to the transaction code, and a description of the transaction code.

Use transaction code SM01 to view locked transaction codes. Verify any transaction codes that are no longer used have been blocked.

Use SE38 to review whether a custom program is adequately documented. In SE38, type in the name of a custom program and click the source code button. There should be wording that indicates the purpose of the custom program, when it was created, who requested the change, and under what logged issue ticket the change was requested.

Verify ABAP code cannot be updated in Production by disabling repository and client-independent changes in table T000, the client settings table.

URL: https://www.sciencedirect.com/science/article/pii/B9781597492843000077

Using CASE Tools for Database Design

Jan L. Harrington, in Relational Database Design and Implementation (Fourth Edition), 2016

The Data Dictionary

From a database designer’s point of view, the ER diagram and its associated data dictionary are the two most important parts of CASE software. Since you were introduced to several types of ER diagrams in Chapter 4, we will not repeat them here, but instead focus on the interaction of the diagrams and the data dictionary.

A data dictionary provides a central repository for documenting entities, attributes, and domains. In addition, by linking entries in the ER diagram to the data dictionary you can provide enough information for the CASE tool to generate the SQL CREATE statements needed to define the structure of the database.
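
To illustrate the idea, here is a small hypothetical sketch of how dictionary entries could be turned into DDL; the entry format is invented and is not how any particular CASE tool stores its data dictionary.

# Hypothetical dictionary entry: an entity, its attributes, and its primary key.
customer = {
    "entity": "customer",
    "attributes": [
        {"name": "customer_numb", "type": "INTEGER",  "constraints": "NOT NULL"},
        {"name": "first_name",    "type": "CHAR(30)", "constraints": "NOT NULL"},
        {"name": "last_name",     "type": "CHAR(30)", "constraints": "NOT NULL"},
    ],
    "primary_key": ["customer_numb"],
}

def create_statement(entry: dict) -> str:
    """Generate a CREATE TABLE statement from one data dictionary entry."""
    columns = [f"  {a['name']} {a['type']} {a['constraints']}".rstrip()
               for a in entry["attributes"]]
    columns.append(f"  PRIMARY KEY ({', '.join(entry['primary_key'])})")
    return f"CREATE TABLE {entry['entity']} (\n" + ",\n".join(columns) + "\n);"

print(create_statement(customer))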

The layout of a data dictionary varies with the specific CASE tool, as does the way in which entries are configured. In the CASE tool used for examples in this chapter, entities are organized alphabetically, with the attributes following the entity name. Entity names are red; attributes are blue. (Of course, you can’t see the colors in this black-and-white book, so you’ll have to take my word for it.) Domain names appear alphabetically among the entities. Each relationship in the related ERD also has an entry. Because each item name begins with “Relation,” all relationship entries sort together in the data dictionary.

When you select an entity name, the display shows the entity’s name, composition (the attributes in the entity), definition (details needed to generate SQL, and so on), and type of database element (in the References section). Figure 12.7, for example, shows the information stored in the data dictionary for Antique Opticals’ customer relation. All of the information about the entity (and all other entries, for that matter) is editable, but because the format is specific to the CASE tool, be careful when making changes unless you know exactly how entries should appear.

Figure 12.7. Definition of an entity in a data dictionary window.

Attribute entries (Figure 12.8) are similar to entity entries, but have no data in the composition section. Attribute definitions can include the attribute’s data type, a default value, and any constraints that have been placed on that attribute. In most cases, these details are entered through a dialog box, relieving the designer of worrying about specific SQL syntax.

Figure 12.8. Definition of an attribute in a data dictionary window.

Relationships (Figure 12.9) are named by the CASE tool. Notice that the definition indicates which entities the relationship relates, as well as which is at the “many” end of the relationship (the child) and which is at the “one” end (the parent).

Figure 12.9. Data dictionary entry for a relationship between two entities in an ERD.

Many relational DBMSs now support the definition of custom domains. Such domains are stored in the data dictionary (Figure 12.10), along with their definitions. Once a domain has been created and is part of the data dictionary, it can be assigned to attributes. If a database administrator needs to change a domain, it can be changed once in the data dictionary and propagated automatically to all attributes entries that use it.

Figure 12.10. Data dictionary entry for custom domain.

The linking of data dictionary entries to an ER diagram has another major benefit: The data dictionary can examine its entries and automatically identify foreign keys. This is yet another way in which the consistency of attribute definitions enforced by a CASE tool’s data dictionary can support the database design process.
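
One plausible way such a check could work, sketched here with invented entity entries rather than any tool's real algorithm, is to look for attribute sets in one entity that match another entity's primary key, concatenated keys included:

# Hypothetical entity entries: attribute names and declared primary keys.
entities = {
    "customer":   {"attributes": ["customer_numb", "first_name", "last_name"],
                   "primary_key": ["customer_numb"]},
    "order":      {"attributes": ["order_numb", "customer_numb", "order_date"],
                   "primary_key": ["order_numb"]},
    "order_line": {"attributes": ["order_numb", "item_numb", "quantity"],
                   "primary_key": ["order_numb", "item_numb"]},
    "item":       {"attributes": ["item_numb", "title"],
                   "primary_key": ["item_numb"]},
}

def find_foreign_keys(entities: dict) -> list:
    """Report attribute sets that duplicate another entity's (possibly concatenated) primary key."""
    found = []
    for child, child_def in entities.items():
        for parent, parent_def in entities.items():
            if child == parent:
                continue
            pk = parent_def["primary_key"]
            if all(column in child_def["attributes"] for column in pk):
                found.append((child, pk, parent))
    return found

for child, key, parent in find_foreign_keys(entities):
    print(child, "references", parent, "via", ", ".join(key))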

Note: Mac A&D is good enough at identifying foreign keys to pick up concatenated foreign keys.

Keep in mind that a CASE tool is not linked dynamically with a DBMS. Although data definitions in the data dictionary are linked to diagrams, changes made to the CASE tool’s project will not affect the DBMS. It is up to the database administrator to make the actual changes to the database.

URL: https://www.sciencedirect.com/science/article/pii/B9780128043998000120

Digital preservation

Iris Xie PhD, Krystyna K. Matusiak PhD, in Discover Digital Libraries, 2016

Standards

The development and adoption of open standards proved to be critical to progress in digital preservation. Conceptual frameworks and metadata standards provide a theoretical foundation for developing reliable preservation systems and services. Two standards that have been recognized as particularly influential are the OAIS reference model and PREMIS metadata standard.

The OAIS reference model is a high-level standard that provides a conceptual framework and consistent terminology for developing and maintaining archival information systems (Lee, 2010). The major purpose of the model is “to facilitate a much wider understanding of what is required to preserve and access information for the long term” (CCSDS, 2012, p. 2.1). It was developed by researchers at the Consultative Committee for Space Data Systems (CCSDS) in 2001 and became an ISO standard in 2002. The model identifies the key players in the information environment, including Producers, Managers, and Consumers. In defining Information Object, it makes a distinction between Data Object (sequence of bits) and Representation Information. Data Object is interpreted with the associated Representation Information, yielding a useful and meaningful Information Object. This distinction is important in the context of archival information systems that need to support preservation of bits as well as the maintenance of Representation Information.

In addition to defining informational concepts, the Reference Model provides a functional layout of an archival system, identifying six main entities (Preservation Planning, Administration, Ingest, Data Management, Archival Storage, and Access), and the way that information flows between them (see Fig. 9.2). It addresses both the access and preservation aspects of ingesting digital objects and associated descriptive information into a repository for long-term storage. Lee (2010) notes that many aspects of the model rest on the distinction between the Submission Information Packages (SIP) received from Producers, the Archival Information Package (AIP) generated from SIPs upon ingest and managed by archives, and the Dissemination Information Package (DIP) accessed by Consumers. The OAIS model provides a foundation for building and implementing standard and interoperable repository systems.

Figure 9.2. OAIS Functional Entities (CCSDS, 2012)

PREMIS (Preservation Metadata: Implementation Strategies) is the international standard for metadata to support the preservation of digital objects and ensure their long-term usability (Library of Congress, 2015). It specifies the metadata units that a repository needs in order to maintain core preservation functions. The standard was developed by the OCLC/RLG working group in 2005. Its current development is managed by the Library of Congress in conjunction with the PREMIS Editorial Committee. The standard consists of a Data Model and Data Dictionary. An XML schema is also available to support the implementation of the data dictionary in digital repository systems. Version 2.2 of the PREMIS Data Dictionary is currently available through the Library of Congress (PREMIS Editorial Committee, 2012).

The Data Dictionary defines preservation metadata as “the information a repository uses to support the digital preservation process” (PREMIS Editorial Committee, 2012, p. 3). Preservation metadata spans a number of metadata types, including descriptive, structural, technical, and administrative. The Data Dictionary places a strong emphasis on the documentation of digital provenance (the history of an object) and the documentation of relationships, especially relationships among different objects within the preservation repository.

The PREMIS standard provides a simple data model to organize the semantic units defined in the Data Dictionary and to encourage a shared way to organize preservation metadata (Dappert and Enders, 2010). The following entities are defined in the Data Model:

Intellectual Entity: a set of content that is considered a single intellectual unit for purposes of management and description, for example, a particular book, map, photograph, or database.

Object (or Digital Object): a discrete unit of information in digital form.

Event: an action that involves or impacts at least one Object or Agent associated with or known by the preservation repository.

Agent: person, organization, or software program/system associated with Events in the life of an Object or with Rights attached to an Object.

Rights: assertions of one or more rights or permissions pertaining to an Object and/or Agent (PREMIS Editorial Committee, 2012, p. 6).

Fig. 9.3 demonstrates the entities in the PREMIS data model and the relationships between them.

Figure 9.3. The PREMIS Data Model (PREMIS Editorial Committee, 2012)

The PREMIS Data Dictionary defines semantic units, not metadata elements. As Caplan (2009) explains, PREMIS does not specify how metadata should be represented or implemented in a repository system; it only defines what the system needs to know and should be able to export to other systems. Semantic units describe properties of digital objects and their contexts or the relationships between them. Each semantic unit defined in the Data Dictionary is mapped to one of the entities in the Data Model. For example, the Object entity is described by a number of semantic units, such as objectIdentifierType or objectIdentifierValue, defined as mandatory (M) and nonrepeatable (NR). Semantic units are presented in a hierarchical structure.
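
Purely as an illustration of that hierarchical shape (not a valid PREMIS serialization), an Object might be described like this, using the semantic units named above:

# Illustrative only: one Object described by hierarchically organized semantic units.
premis_object = {
    "objectIdentifier": {                      # container semantic unit
        "objectIdentifierType": "URI",         # mandatory (M), non-repeatable (NR)
        "objectIdentifierValue": "https://example.org/repository/objects/1234",
    },
    "objectCategory": "file",
}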

PREMIS may be implemented in a variety of ways, which offers the potential of broad application across a wide range of preservation contexts. Guenther (2010) explores using PREMIS within a METS container and points to the benefits of using the two metadata standards together. A number of research studies investigate the implementation of PREMIS in practical digital library settings. Alemneh (2009) examined the barriers to adopting PREMIS in cultural heritage institutions. Donaldson and Conway (2010) present a case study in which PREMIS is implemented in the Florida Digital Archive. Findings point to the iterative nature of the implementation process and to the necessity of adopting the standard in the local repository. Donaldson and Yakel (2013) investigated the adoption of PREMIS by several organizations registered with the Library of Congress PREMIS Implementers Group. The researchers confirm the findings of the earlier studies, indicating that many institutions have made the decision to adopt PREMIS, but few have fully implemented it.

URL: https://www.sciencedirect.com/science/article/pii/B9780124171121000090

Metadata

David Loshin, in Business Intelligence (Second Edition), 2013

Structural Metadata

Structural metadata comprises most of what is traditionally considered metadata: the information that is organized as the data dictionary and is derivable from database catalogs. This type of metadata can include:

Data element information, like data element names, types, lengths, definitions, and other usage information;

Table information, including table names; the description of what is modeled by each table; the database in which the table is stored; the physical location, size, and growth rate of the table; the data sources that feed each table; update histories (including the date of last update and of last refresh); the results of the last update; candidate keys; foreign keys; the degrees of the foreign key cardinality (e.g., 1:1 versus 1: many); referential integrity constraints; functional dependencies; and indexes;

Record structure information, which describes the structure of the record; overall record size; whether the record is a variable or static length; all column names, types, descriptions, and sizes; source of values that populate each column; whether a column is an automatically generated unique key; null status; domain restrictions; and validity constraints.
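
As a hedged sketch, a table-level record of structural metadata along these lines might be kept as follows; every name and figure is invented for the example.

# Hypothetical table-level structural metadata for one warehouse table.
orders_table_info = {
    "table_name": "orders",
    "database": "sales_dw",
    "description": "One row per order accepted by the web store",
    "sources": ["order_entry.orders"],
    "size_rows": 1_250_000,
    "growth_rate_rows_per_month": 40_000,
    "last_update": "2013-06-30",
    "candidate_keys": [["order_numb"]],
    "foreign_keys": [{"columns": ["customer_numb"],
                      "references": "customer",
                      "cardinality": "1:many"}],
    "indexes": ["ix_orders_customer_numb"],
}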

URL: https://www.sciencedirect.com/science/article/pii/B9780123858894000090

What is the data dictionary's function in database design?

The function of the data dictionary is to define data attributes, entities, relationships and data types so other developers can use it.

What is the function of a data dictionary?

A data dictionary is used to catalog and communicate the structure and content of data, and provides meaningful descriptions for individually named data objects.

What is a data dictionary in a database?

A Data Dictionary is a collection of names, definitions, and attributes about data elements that are being used or captured in a database, information system, or part of a research project.

What is a data dictionary in system design?

A data dictionary is a collection of data about data. It maintains information about the definition, structure, and use of each data element that an organization uses. There are many attributes that may be stored about a data element.