Quantcast
Viewing all articles
Browse latest Browse all 79

Method for determining candidate keys and highest normal form of a relation based on functional dependencies

For my advanced database systems course I needed to learn how to take a given relation and functional dependencies, tell the highest normal form and then normalize it up to BCNF. It’s actually not that hard, but there are a lot of pitfalls to watch out for. Here I’m going to show the methods I learned to solve the exam questions. I need to give credit to the mentors and StinaQ for their help and tutorials!

Visualizing the dependencies

You start with a given relation and its dependencies. For example:

R(A,B,C)
A → B
B → {A,C}

The trick is to draw boxes in order to visualize the dependencies (ignore the right hand side CK/NP box for now):

Note that B->{A,C} was broken down to B->A and B->C thanks to the decomposition rule.

Let’s try a more tricky one:

R(A,B,C,D,E,F)
A → B
B → A
{B,C} → D
C → E

Note that B and C in this relation is a composite key that together determine D. Thus we draw a box around both of them to illustrate this dependency. And don’t forget F, even though nothing is pointing to it!

Candidate Keys & Non-Primes

It’s time to determine the candidate keys (CK:s). Recall that a CK is one or more attributes that with which you can determine all other attributes in the relation. (In the end, one of the CK:s will be chosen to be the primary key.)

To find the candidate keys:

  • Begin by selecting a start box (this can be a composite key too). If there is a box at which no arrow is pointing, this is usually a good place start. Follow the outwards arrows to the next box, and from this box to the next and so forth. The goal is to find find a way that covers all boxes.
  • If you can’t cover all boxes, try another starting box. If no single attribute (i.e. box) is able to cover all boxes by itself, the CK needs to include one or more additional attributes.
  • There can be many candidate keys, so repeat this until you are sure you haven’t missed anything.
  • If no arrow points at a box, the rule is that it must be part of all CK:s. This includes stand-alone boxes like F in example 2.
  • A candidate key is a minimal super key. Minimal means that you can’t remove any attribute from the key and still determine all other attributes. Note that you can still have keys of different sizes. For example, {A} and {B,C} in Example 3 below can both be candidate keys. But {A,B,C} would not be minimal since you could remove either {A} or {B,C} from it and still determine the relation.

Now that all CK:s are determined, also write down the non-primary attributes (NP). The Non-Primes are simply all attributes that are not part of any CK.

Finding the highest normal form

It’s time to determine the highest normal form of the relation. I will not use the exact formal definitions here, but you should learn them too in order to understand what’s happening here, especially if you need to motivate your answer on an exam!

  • Is the relation in 1NF? Yes, always!
  • Is the relation in 2NF? Is there a composite CK? If not, 2NF is automatically achieved. Otherwise, check your boxes. In any of your composite CKs: does a part of it (i.e. a box inside the composite box) by itself determine (point to) a non-prime attribute? If yes, NF2 does not pass (and the relation is thus in 1NF).
  • Is the relation in 3NF? Does the relation have any non-primes? If no, the relation is automatically in 3NF! Otherwise: Is there a non-prime that determines (point to) another non-prime? If yes, 3NF does not pass (and the relation is thus in 2NF).
  • Is the relation in BCNF? Are all the determinants also candidate keys? If yes, BCNF pass. This is easiest to check by looking at the dependencies you were given (e.g. A->B and B->{A,C} in example 1) and check that each key (left hand side of the arrow) is in your list of CKs.

Let’s try this! Here is another example:

R(A,B,C,D,E)
A → {B,C}
{B,C} → A,D
D → E

First find the CK:s. We can easily see that {B,C} together determines A and D. Since D also determines E, starting with {B,C} can determine all other attributes and is thus a CK. But wait! Since A determines {B,C} it must be a CK on its own!

Now that we have the CK:s, we see that D and E are not part of any CK, so they are non-primary attributes.

All relations are already in 1NF, so we note this. How about 2NF? There is a composite key {B,C} so we need to check if part of it can determine a non-prime on it’s own? Not a problem; this relation is in 2NF! 3NF say that a non-prime must not determine another non-prime. Here is a problem, since D determines E. Thus 3NF does not pass, and the relation only achieves 2NF!

Note the circle I draw to indicate the highest NF, e.g. 2NF in the above example. Since 3NF did not pass, the highest NF is 2NF. If you don’t do this, it is easy to make a mistake and pick the last one as the highest NF.

Tips

  • When you have finished drawing the boxes, do a final count of all attributes in R() and make sure they are all present as boxes. It’s very easy to forget the ones without any dependencies otherwise!
  • Remember that for example a 3NF relation is also in 2NF. If an exam question is “is the relation in 2NF?” and you determine that the highest NF is 3NF, then the answer to the question is true. Don’t mix up “highest” vs “is in”.
  • This page allows you to check that you have correctly determined the CKs and normal forms. It’s a bit quirky to use, but very convenient!

In the next post I will continue this tutorial and show how to normalize the relations to make sure they are in BCNF!


Viewing all articles
Browse latest Browse all 79

Trending Articles