\documentclass[a4paper, 10pt, pagesize, smallheadings]{article}
\usepackage{graphicx}
%\usepackage[latin1]{inputenc}
\usepackage{amsmath, amsthm, amssymb}
\usepackage{typearea}
\usepackage{algorithm}
\usepackage{algorithmic}
\usepackage{fullpage}
\usepackage{mathtools}
\usepackage[all]{xy}
\usepackage{tikz}
\usepackage{tikz-qtree}
\addtolength{\voffset}{-40pt}
\title{Machine Learning Exercise 01 Solution}
\author{Eugen Sawin}
\renewcommand{\familydefault}{\sfdefault}
\newcommand{\E}{\mathcal{E}}
\newcommand{\R}{\mathcal{R}}

%\include{pythonlisting}

\pagestyle{empty}
\begin{document}
\maketitle

\section*{Exercise 1.1}
(a) Possible Pacman tasks that could be solved using machine learning are:
\begin{itemize}
  \item Survival: learn to escape the ghosts.
  \item Fight back: learn to eat ghosts when they are vulnerable.
  \item Eat: maximise the number of dots eaten per unit of time.
\end{itemize}
(b) I would apply supervised learning to:
\begin{itemize}
  \item Predict tomorrow's price of a particular stock
  \item Predict your life expectancy
  \item Predict if it is going to rain tomorrow
\end{itemize}
(c) Of these, the following is a classification problem:
\begin{itemize}
  \item Predict if it is going to rain tomorrow
\end{itemize}
\section*{Exercise 1.2}
(a) A version space consists of all hypotheses from a hypothesis space that are consistent with the given set of examples. A partial order on the hypotheses is induced by the relation \emph{more-general-than} ($>_g$) and its inverse \emph{more-specific-than} ($<_g$).

$S$ is the set of the most specific consistent hypotheses (a summary of all positive examples) and $G$ is the set of the most general consistent hypotheses (a summary of all negative examples). All other consistent hypotheses lie implicitly between these two boundaries.

For inconsistent examples, $S$ and $G$ become empty and so does the version space; the same happens if the language used to define the hypothesis space is insufficient to describe the target concept.\\\\
(b) Just imagine a beautiful-looking graph here, depicting the ordered hypothesis space.\\\\
(c)
\begin{tabular}{r c c}
sawine@0
    53
$i$ & $S_i$ & $G_i$\\\hline
sawine@0
    54
$0$ & $\{\langle \emptyset,\emptyset,\emptyset \rangle\}$
sawine@0
    55
    & $\{\langle ?,?,? \rangle\}$\\
sawine@0
    56
$1$ & $\{\langle +,+,+ \rangle\}$
sawine@0
    57
    & $\{\langle ?,?,? \rangle\}$\\
sawine@0
    58
$2$ & $\{\langle +,+,? \rangle\}$
sawine@0
    59
    & $\{\langle ?,?,? \rangle\}$\\
sawine@0
    60
$3$ & $\{\langle +,+,? \rangle\}$
sawine@0
    61
    & $\{\langle ?,+,? \rangle\}$\\
sawine@0
    62
$4$ & $\{\langle +,+,? \rangle\}$
sawine@0
    63
    & $\{\langle +,+,? \rangle\}$\\\hline
sawine@0
    64
$5$ & $\{\langle +,+,? \rangle\}$
sawine@0
    65
    & $\{\langle +,+,? \rangle\}$\\
sawine@0
    66
$6$ & $\{\langle +,+,? \rangle\}$
sawine@0
    67
    & $\{\langle +,+,? \rangle\}$\\
sawine@0
    68
\end{tabular}\\\\
(d) No; the order of the examples does not affect the resulting version space, only the convergence speed. The version space is a systematic model of all consistent hypotheses and does not converge to local specialisations.\\\\
(e) For a concrete target concept like $\langle +,+ \rangle$, three examples are enough: one positive and two negative. For a more general concept like $\langle +,? \rangle$ we need four examples, two of each type. For the trivial concept $\langle ?,? \rangle$, two (obviously positive) examples are enough.\\\\
(f)
\begin{tabular}{r c c}
$i$ & $S_i$ & $G_i$\\\hline
$0$ & $\{\langle \emptyset,\emptyset,\emptyset,\emptyset \rangle\}$
    & $\{\langle ?,?,?,? \rangle\}$\\
$1$ & $\{\langle +,+,+,- \rangle\}$
    & $\{\langle ?,?,?,? \rangle\}$\\
$2$ & $\{\langle +,+,?,- \rangle\}$
    & $\{\langle ?,?,?,? \rangle\}$\\
$3$ & $\{\langle ?,?,?,? \rangle\}$
    & $\{\langle ?,?,?,? \rangle\}$\\
\end{tabular}\\
It converged to a wrong hypothesis. If we move example $d_3$, this results in an empty set $S$ and therefore in no consistent hypothesis, because the target concept is not part of the defined hypothesis space.

\section*{Exercise 1.3}
(a) $E(S) = -\frac{1}{7}\log_2\frac{1}{7}-\frac{2}{7}\log_2\frac{2}{7}-\frac{2}{7}\log_2\frac{2}{7}-\frac{2}{7}\log_2\frac{2}{7} \approx 1.95$\\
$E(S|fever) = \frac{2}{7}(1) + \frac{3}{7}\log_2 3 + \frac{2}{7}(1) \approx 1.25$\\
$E(S|vomiting) = \frac{4}{7}(-2\cdot\frac{1}{4}\log_2\frac{1}{4}-\frac{2}{4}\log_2\frac{2}{4}) + \frac{3}{7}(-\frac{1}{3}\log_2\frac{1}{3}-\frac{2}{3}\log_2\frac{2}{3}) \approx 1.25$\\
$E(S|diarrhea) = \frac{3}{7}(-\frac{2}{3}\log_2\frac{2}{3}-\frac{1}{3}\log_2\frac{1}{3}) + \frac{4}{7}(-\log_2\frac{2}{4}) \approx 0.96$\\
$E(S|shivering) = \frac{6}{7}(-2\cdot\frac{1}{6}\log_2\frac{1}{6}-2\cdot\frac{2}{6}\log_2\frac{2}{6}) + \frac{1}{7}\cdot 0 \approx 1.64$\\
As we can clearly see, we get the greatest gain with diarrhea: $G(S,diarrhea) = 1.95 - 0.96 = 0.99$. Since the human brain is quite good at approximating entropy, this was visible from the beginning, and I will refrain from calculating it for each step. So here is the tree.
\begin{figure}[h]
\centering
\begin{tikzpicture}[level distance=50pt, sibling distance=30pt]
\Tree
  [.\node[draw]{diarrhea}; 
      \edge node[auto=right]{no}; [.\node[draw]{fever}; 
        \edge node[auto=right]{no}; \node[draw]{H};  
        \edge node[auto=right]{avg}; \node[draw]{I};
        \edge node[auto=left]{high}; \node[draw]{I};
      ]
      \edge node[auto=left]{yes}; [.\node[draw]{fever};
        \edge node[auto=right]{no}; \node[draw]{B};
        \edge node[auto=left]{avg}; [.\node[draw]{vomiting};
          \edge node[auto=right]{no}; \node[draw]{S};
          \edge node[auto=left]{yes}; \node[draw]{B};
        ]
        \edge node[auto=left]{high}; \node[draw]{S};
      ]
  ]
\end{tikzpicture}
\caption{The learned decision tree.}
\label{searchtree}
\end{figure}\\
(b) I am only guessing at what a ``disjoint'' definition means here: no, it is not disjoint, since both average and high fever result in influenza in the no-diarrhea branch.\\\\
(c) Candidates are $A = \frac{164+169}{2}$, $B=\frac{175+176}{2}$, $C=\frac{179+180}{2}$, $D=\frac{184+185}{2}$.\\
$E(A) = \frac{2}{9}\cdot 0 + \frac{7}{9}(-\frac{4}{7}\log_2\frac{4}{7}-\frac{3}{7}\log_2\frac{3}{7}) \approx 0.77$\\
$E(B) = \frac{4}{9}\cdot 1 + \frac{5}{9}(-\frac{2}{5}\log_2\frac{2}{5}-\frac{3}{5}\log_2\frac{3}{5}) \approx 0.98$\\
$E(C) = \frac{6}{9}(-\frac{2}{6}\log_2\frac{2}{6}-\frac{4}{6}\log_2\frac{4}{6}) + \frac{3}{9}(-\frac{2}{3}\log_2\frac{2}{3}-\frac{1}{3}\log_2\frac{1}{3}) \approx 0.92$\\
$E(D) = \frac{8}{9}\cdot 1 + \frac{1}{9} \cdot 0 = \frac{8}{9} \approx 0.89$\\
Splitting at threshold $A$ gives the lowest entropy and therefore the greatest gain: $G = 0.99 - 0.77 = 0.22$.
\end{document}