{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Library/Python/2.7/site-packages/IPython/config.py:13: ShimWarning: The `IPython.config` package has been deprecated. You should import from traitlets.config instead.\n",
" \"You should import from traitlets.config instead.\", ShimWarning)\n",
"/Library/Python/2.7/site-packages/IPython/utils/traitlets.py:5: UserWarning: IPython.utils.traitlets has moved to a top-level traitlets package.\n",
" warn(\"IPython.utils.traitlets has moved to a top-level traitlets package.\")\n"
]
},
{
"data": {
"text/plain": [
"'Connected: None@None'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%load_ext sql\n",
"%sql sqlite://"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Utility functions...\n",
"from IPython.core.display import display_html, HTML\n",
"def to_html_table(res, style=None):\n",
" html = '
'\n",
" html += ' | '.join(res.keys) + ' |
---|
'\n",
" html += ' |
'.join([' | '.join([str(cell) for cell in row]) for row in list(res)])\n",
" return html + ' |
'\n",
"def display_side_by_side(l, r):\n",
" s = \"display: inline-block;\"\n",
" html = to_html_table(l, style=s) + ' ' + to_html_table(r, style=s)\n",
" display_html(HTML(data=html))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"MVDs\n",
"===\n",
"\n",
"MVDs. Wow. Everyone seems to be talking about them these days. Is this because they appear in PS2 and we skipped going over them formally in lecture? Or... because they might just be the next big thing?\n",
"\n",
"Either way, this notebook is meant to go into a little more depth about them.\n",
"\n",
"**Note that there are some nice viualizations in the lecture slides that we won't reproduce here, so please look those over as well! Also check out the Activity of lecture 8...**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Formal definition\n",
"\n",
"Given a relation $R$ having attributes $A$, and two sets of attributes $X,Y\\subseteq A$, the **_multi-value dependency (MVD)_** $X\\twoheadrightarrow Y$ holds on $R$ if, for **any** tuples $t_1,t_2\\in R$ s.t. $t_1[X] = t_2[X]$, there exists a tuple $t_3\\in R$ s.t.:\n",
"\n",
"* $t_3[X] = t_1[X] = t_2[X]$\n",
"* $t_3[Y] = t_1[Y]$\n",
"* $t_3[A\\setminus Y] = t_2[A\\setminus Y]$\n",
"\n",
"where $A\\setminus B$ = all elements of $A$ not in $B$.\n",
"\n",
"So let's consider a toy example at this point:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n",
"Done.\n",
"1 rows affected.\n",
"1 rows affected.\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" A | \n",
" B | \n",
" C | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
"
"
],
"text/plain": [
"[(1, 1, 0), (1, 0, 1)]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS R; CREATE TABLE R (A INT, B INT, C INT);\n",
"INSERT INTO R VALUES (1, 1, 0);\n",
"INSERT INTO R VALUES (1, 0, 1);\n",
"SELECT * FROM R;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's consider the first two rows as $t_1$ and $t_2$ respectively; what is the $t_3$ that the definition tells us we must add if we want the MVD $\\{A\\}\\twoheadrightarrow\\{B\\}$ to hold? Let's add it in:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 rows affected.\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" A | \n",
" B | \n",
" C | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
"
"
],
"text/plain": [
"[(1, 1, 0), (1, 0, 1), (1, 1, 1)]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql INSERT INTO R VALUES (1,1,1); SELECT * FROM R;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However what about if we consider the first two rows as $t_2$ and $t_1$ respectively? What row would the definition tell us we'd have to add in?"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 rows affected.\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" A | \n",
" B | \n",
" C | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
"
"
],
"text/plain": [
"[(1, 1, 0), (1, 0, 1), (1, 1, 1), (1, 0, 0)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql INSERT INTO R VALUES (1,0,0); SELECT * FROM R;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, if we checked every other pair like this, we'd see that we are done- the MVD $\\{A\\}\\twoheadrightarrow\\{B\\}$ holds!\n",
"\n",
"## A second definition\n",
"\n",
"We see that this suggests another (equivalent) definition, which we'll phrase slightly less formally:\n",
"\n",
"**For an MVD $\\{A\\}\\twoheadrightarrow\\{B\\}$ to hold, for any pair of tuples that agree on attributes $A$, we also must find the corresponding _\"swapped\"_ pair: a pair of tuples that also have the same value of $A$ but have their $B$ and $(A\\cup B)^C$ attributes swapped.**\n",
"\n",
"(*where $(A\\cup B)^C$ just means the attributes not in $A$ or $B$*)\n",
"\n",
"This definition should make sense, and might even feel more intuitive, given the toy example above!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Addressing one point of confusion:\n",
"\n",
"Remember, an MVD holds over a **relation** (not a single tuple, not a single pair of tuples $t_1,t_2$...). Again, look at the definition- an MVD, say $\\{X\\}\\twoheadrightarrow\\{Y\\}$, holds over a relation $R$ if **_for any possible_** pair of tuples $t_1,t_2$ in $R$ such that $t_1[A]=t_2[A]$, the condition in the definition holds...\n",
"\n",
"\n",
"### Another clarification of terminology:\n",
"So how can we ever _test_ if an MVD holds over a relation $R$, based just on one _instance_ of $R$? Couldn't someone always add in more tuples and violate it? Aren't we being a little sloppy when we say an MVD 'holds' on an _instance_ of a relation just based on checking the instance..?\n",
"\n",
"This is correct. Recall that we have the same situation with FDs- we need to have _external information_ (such as a set of functional dependencies) to _prove_ that an MVD holds in general over a relation. We **can** prove that it is violated or not violated by an instance of $R$, and that it thus could or could not potentially hold in general.\n",
"\n",
"So when we say an MVD (or an FD) _holds_ on an instance, based only on checking the instance, we implictly mean the above- that based on the instance we see, it _could_ hold in general."
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### A third definition...\n",
"\n",
"At this point (especially if you're still reading) you may suspect a connection to cross-products and joins. Consider decomposing our toy example in its **original state (before we added in the two rows to make the MVD hold)**. Let's decompose it into two tables, split by the attribute sets of the MVD (recall it was $\\{A\\}\\twoheadrightarrow\\{B\\}$):"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n",
"Done.\n",
"1 rows affected.\n",
"1 rows affected.\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" A | \n",
" B | \n",
" C | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
"
"
],
"text/plain": [
"[(1, 1, 0), (1, 0, 1)]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"DROP TABLE IF EXISTS R; CREATE TABLE R (A INT, B INT, C INT);\n",
"INSERT INTO R VALUES (1, 1, 0);\n",
"INSERT INTO R VALUES (1, 0, 1);\n",
"SELECT * FROM R;"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n",
"Done.\n",
"Done.\n",
"Done.\n",
"Done.\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
" "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%sql DROP TABLE IF EXISTS R1; CREATE TABLE R1 AS SELECT A,B FROM R GROUP BY A,B;\n",
"%sql DROP TABLE IF EXISTS R2; CREATE TABLE R2 AS SELECT A,C FROM R GROUP BY A,C;\n",
"r1 = %sql SELECT * FROM R1;\n",
"r2 = %sql SELECT * FROM R2;\n",
"display_side_by_side(r1,r2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's take the join to recompose:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" A | \n",
" B | \n",
" C | \n",
"
\n",
" \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
"
"
],
"text/plain": [
"[(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT r1.A AS A, r1.B AS B, r2.C AS C\n",
"FROM R1 r1, R2 r2\n",
"WHERE r1.A = r2.A;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Woah! The MVD that we decomposed along holds for the join of the decomposition! We see that this is another definition for an MVD, in terms of the _'local joins'_: an MVD $\\{A\\}\\twoheadrightarrow\\{B\\}$ holds if whenever we take a block of rows with the same value for the $A$ attributes, decompose it into $R_1(A,B)$ and $R_2(A,(A\\cup B)^C)$, and then take the join of these tables, we end up with the same rows we had before!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.13"
}
},
"nbformat": 4,
"nbformat_minor": 1
}