exercises/solutions/sol01.tex
changeset 0 f96eb9c3b3b0
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/exercises/solutions/sol01.tex	Thu May 10 02:47:23 2012 +0200
     1.3 @@ -0,0 +1,122 @@
     1.4 +\documentclass[a4paper, 10pt, pagesize, smallheadings]{article}
     1.5 +\usepackage{graphicx}
     1.6 +%\usepackage[latin1]{inputenc}
     1.7 +\usepackage{amsmath, amsthm, amssymb}
     1.8 +\usepackage{typearea}
     1.9 +\usepackage{algorithm}
    1.10 +\usepackage{algorithmic}
    1.11 +\usepackage{fullpage}
    1.12 +\usepackage{mathtools}
    1.13 +\usepackage[all]{xy}
    1.14 +\usepackage{tikz}
    1.15 +\usepackage{tikz-qtree}
    1.16 +\addtolength{\voffset}{-40pt}
    1.17 +\title{Machine Learning Exercise 01 Solution}
    1.18 +\author{Eugen Sawin}
    1.19 +\renewcommand{\familydefault}{\sfdefault}
    1.20 +\newcommand{\E}{\mathcal{E}}
    1.21 +\newcommand{\R}{\mathcal{R}}
    1.22 +
    1.23 +%\include{pythonlisting}
    1.24 +
    1.25 +\pagestyle{empty}
    1.26 +\begin{document}
    1.27 +\maketitle
    1.28 +
    1.29 +\section*{Exercise 1.1}
    1.30 +(a) Possible Pacman tasks that could be solved using machine learning are:
    1.31 +\begin{itemize}
    1.32 +  \item Survival: learn to escape the ghosts.
    1.33 +  \item Fight back: learn to eat ghosts when they are vulnerable.
     1.34 +  \item Eat: maximise the number of dots eaten per unit of time.
    1.35 +\end{itemize}
     1.36 +(b) I would apply supervised learning to the following tasks:
    1.37 +\begin{itemize}
    1.38 +  \item Predict tomorrow’s price of a particular stock
    1.39 +  \item Predict your life expectancy
    1.40 +  \item Predict if it is going to rain tomorrow
    1.41 +\end{itemize}
    1.42 +(c) The following problems are classification problems:
    1.43 +\begin{itemize}
    1.44 +  \item Predict if it is going to rain tomorrow
    1.45 +\end{itemize}
    1.46 +
    1.47 +\section*{Exercise 1.2}
     1.48 +(a) A version space consists of all hypotheses from a hypothesis space that are consistent with the given set of examples. The space is partially ordered by the \emph{more-general-than} ($>_g$) and \emph{more-specific-than} ($<_g$) relations.
    1.49 +
     1.50 +$S$ represents the set of the most specific consistent hypotheses (a summary of all positive examples) and $G$ represents the set of the most general consistent hypotheses (a summary of all negative examples). All other consistent hypotheses lie implicitly between these two boundaries.
    1.51 +
     1.52 +For inconsistent examples, $S$ and $G$ become empty and so does the version space. This is also the case if the language used for the definition of the hypothesis space is insufficient to describe the target concept.
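          +
          +As a small illustration of this ordering (my own example, not from the exercise sheet): for two binary attributes, $\langle +,? \rangle$ is more general than $\langle +,+ \rangle$, because every instance covered by the more specific hypothesis is also covered by the more general one:
          +\[
          +\langle +,? \rangle >_g \langle +,+ \rangle
          +\quad\text{since}\quad
          +\{x \mid \langle +,+ \rangle \text{ covers } x\}
          +\subset
          +\{x \mid \langle +,? \rangle \text{ covers } x\}.
          +\]
          +Under this ordering, $S$ and $G$ are exactly the minimal and maximal elements of the version space.\\\\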
     1.53 +(b) Just imagine a beautiful-looking graph here depicting the ordered hypothesis space.\\\\
    1.54 +(c)
    1.55 +\begin{tabular}{r c c}
    1.56 +$i$ & $S_i$ & $G_i$\\\hline
    1.57 +$0$ & $\{\langle \emptyset,\emptyset,\emptyset \rangle\}$
    1.58 +    & $\{\langle ?,?,? \rangle\}$\\
    1.59 +$1$ & $\{\langle +,+,+ \rangle\}$
    1.60 +    & $\{\langle ?,?,? \rangle\}$\\
    1.61 +$2$ & $\{\langle +,+,? \rangle\}$
    1.62 +    & $\{\langle ?,?,? \rangle\}$\\
    1.63 +$3$ & $\{\langle +,+,? \rangle\}$
    1.64 +    & $\{\langle ?,+,? \rangle\}$\\
    1.65 +$4$ & $\{\langle +,+,? \rangle\}$
    1.66 +    & $\{\langle +,+,? \rangle\}$\\\hline
    1.67 +$5$ & $\{\langle +,+,? \rangle\}$
    1.68 +    & $\{\langle +,+,? \rangle\}$\\
    1.69 +$6$ & $\{\langle +,+,? \rangle\}$
    1.70 +    & $\{\langle +,+,? \rangle\}$\\
    1.71 +\end{tabular}\\\\
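          +The concrete examples $d_i$ are not repeated here, but each step can be read off the table. For instance, the step from $S_1$ to $S_2$ is the minimal generalisation of $\langle +,+,+ \rangle$ that covers the second (positive) example, which must therefore differ from it only in the third attribute:
          +\[
          +S_2 = \{\langle +,+,? \rangle\}.
          +\]
          +Conversely, each negative example replaces every $g \in G$ that covers it by the minimal specialisations of $g$ that exclude the example while remaining more general than $S$.\\\\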
     1.72 +(d) No, but it does affect the convergence speed. The version space is a systematic model of all consistent hypotheses, so the algorithm cannot converge to a local specialisation regardless of the example order.\\\\
     1.73 +(e) For a concrete target concept like $\langle +,+ \rangle$, three examples are enough: one positive and two negative. For a concept with a wildcard like $\langle +,? \rangle$, we need four examples, two of each type. For the trivial concept $\langle ?,? \rangle$, two (necessarily positive) examples are enough.
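          +For instance (my own worked check), the target $\langle +,+ \rangle$ is pinned down by the sequence $\langle +,+ \rangle$ (positive), $\langle +,- \rangle$ (negative), $\langle -,+ \rangle$ (negative): the positive example yields $S = \{\langle +,+ \rangle\}$, the first negative specialises $G$ from $\{\langle ?,? \rangle\}$ to $\{\langle ?,+ \rangle\}$, and the second specialises it further to $\{\langle +,+ \rangle\} = S$.\\\\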
    1.74 +(f)
    1.75 +\begin{tabular}{r c c}
    1.76 +$i$ & $S_i$ & $G_i$\\\hline
     1.77 +$0$ & $\{\langle \emptyset,\emptyset,\emptyset,\emptyset \rangle\}$
    1.78 +    & $\{\langle ?,?,?,? \rangle\}$\\
    1.79 +$1$ & $\{\langle +,+,+,- \rangle\}$
    1.80 +    & $\{\langle ?,?,?,? \rangle\}$\\
    1.81 +$2$ & $\{\langle +,+,?,- \rangle\}$
    1.82 +    & $\{\langle ?,?,?,? \rangle\}$\\
    1.83 +$3$ & $\{\langle ?,?,?,? \rangle\}$
    1.84 +    & $\{\langle ?,?,?,? \rangle\}$\\
    1.85 +\end{tabular}\\
     1.86 +It converged to a wrong hypothesis. If we move example $d_3$, we end up with an empty set $S$ and therefore no consistent hypothesis, because the concept to be learned is not contained in the defined hypothesis space.
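          +
          +Since all of Exercise 1.2 revolves around this update procedure, here is a minimal Python sketch of candidate elimination for this hypothesis language (my own illustration, not part of the original solution; attribute values are $+$/$-$ with $?$ as wildcard, and $S$ is kept as a single hypothesis, which suffices for conjunctive concepts because the minimal generalisation is unique):
          +\begin{verbatim}
          +def covers(h, x):
          +    """True if hypothesis h matches instance x."""
          +    return all(a == '?' or a == v for a, v in zip(h, x))
          +
          +def more_general_eq(h1, h2):
          +    """True if h1 is more general than or equal to h2."""
          +    return h2 is None or all(a == '?' or a == b for a, b in zip(h1, h2))
          +
          +def min_generalise(s, x):
          +    """Minimal generalisation of s that covers the positive instance x."""
          +    if s is None:                  # empty hypothesis: adopt x itself
          +        return tuple(x)
          +    return tuple(a if a == v else '?' for a, v in zip(s, x))
          +
          +def min_specialise(g, x, s):
          +    """Minimal specialisations of g that exclude the negative instance x
          +    while staying more general than the specific boundary s."""
          +    out = []
          +    for i, a in enumerate(g):
          +        if a == '?':               # constrain one wildcard at a time
          +            c = g[:i] + ('-' if x[i] == '+' else '+',) + g[i + 1:]
          +            if more_general_eq(c, s):
          +                out.append(c)
          +    return out
          +
          +def candidate_elimination(examples, n):
          +    s, g = None, {('?',) * n}
          +    for x, positive in examples:
          +        if positive:
          +            g = {h for h in g if covers(h, x)}
          +            s = min_generalise(s, x)
          +        else:
          +            if s is not None and covers(s, x):
          +                return None, set() # inconsistent data: space collapses
          +            g = {c for h in g
          +                   for c in (min_specialise(h, x, s)
          +                             if covers(h, x) else [h])}
          +    return s, g
          +
          +# Replays the three-example sequence from (e) for the concept <+,+>:
          +print(candidate_elimination(
          +    [(('+', '+'), True), (('+', '-'), False), (('-', '+'), False)], 2))
          +# -> (('+', '+'), {('+', '+')})
          +\end{verbatim}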
    1.87 +
    1.88 +\section*{Exercise 1.3}
    1.89 +(a) $E(S) = -\frac{1}{7}\log_2\frac{1}{7}-\frac{2}{7}\log_2\frac{2}{7}-\frac{2}{7}\log_2\frac{2}{7}-\frac{2}{7}\log_2\frac{2}{7} \approx 1.95$\\
     1.90 +$E(S|fever) = \frac{2}{7}(1) + \frac{3}{7}\log_2 3 + \frac{2}{7}(1) \approx 1.25$\\
     1.91 +$E(S|vomiting) = \frac{4}{7}(-2\cdot\frac{1}{4}\log_2\frac{1}{4}-\frac{2}{4}\log_2\frac{2}{4}) + \frac{3}{7}(-\frac{1}{3}\log_2\frac{1}{3}-\frac{2}{3}\log_2\frac{2}{3}) \approx 1.25$\\
    1.92 +$E(S|diarrhea) = \frac{3}{7}(-\frac{2}{3}\log_2\frac{2}{3}-\frac{1}{3}\log_2\frac{1}{3}) + \frac{4}{7}(-\log_2\frac{2}{4}) \approx 0.96$\\
     1.93 +$E(S|shivering) = \frac{6}{7}(-2\cdot\frac{1}{6}\log_2\frac{1}{6}-2\cdot\frac{2}{6}\log_2\frac{2}{6}) + \frac{1}{7}\cdot 0 \approx 1.64$\\
     1.94 +As we can clearly see, we get the greatest gain with diarrhea: $G(S,diarrhea) = 1.95 - 0.96 = 0.99$. Since the human brain is quite good at approximating entropy, this was visible from the beginning, and I will refrain from calculating it for each step. So here is the tree.
    1.95 +\begin{figure}[h]
    1.96 +\centering
    1.97 +\begin{tikzpicture}[level distance=50pt, sibling distance=30pt]
    1.98 +\Tree
    1.99 +  [.\node[draw]{diarrhea}; 
   1.100 +      \edge node[auto=right]{no}; [.\node[draw]{fever}; 
   1.101 +        \edge node[auto=right]{no}; \node[draw]{H};  
   1.102 +        \edge node[auto=right]{avg}; \node[draw]{I};
   1.103 +        \edge node[auto=left]{high}; \node[draw]{I};
   1.104 +      ]
   1.105 +      \edge node[auto=left]{yes}; [.\node[draw]{fever};
   1.106 +        \edge node[auto=right]{no}; \node[draw]{B};
   1.107 +        \edge node[auto=left]{avg}; [.\node[draw]{vomiting};
   1.108 +          \edge node[auto=right]{no}; \node[draw]{S};
   1.109 +          \edge node[auto=left]{yes}; \node[draw]{B};
   1.110 +        ]
   1.111 +        \edge node[auto=left]{high}; \node[draw]{S};
   1.112 +      ]
   1.113 +  ]
   1.114 +\end{tikzpicture}
    1.115 +\caption{The complete decision tree for the diagnosis data.}
   1.116 +\label{searchtree}
   1.117 +\end{figure}\\
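          +As a sanity check on the entropy arithmetic above, a few lines of Python (my own addition; the class counts per branch are read off the fractions in the expressions and the tree) reproduce the values:
          +\begin{verbatim}
          +from math import log2
          +
          +def entropy(counts):
          +    """Shannon entropy of a class-count distribution."""
          +    total = sum(counts)
          +    return -sum(c / total * log2(c / total) for c in counts if c)
          +
          +def cond_entropy(partition):
          +    """Weighted entropy after splitting into the given count lists."""
          +    total = sum(sum(p) for p in partition)
          +    return sum(sum(p) / total * entropy(p) for p in partition)
          +
          +print(round(entropy([1, 2, 2, 2]), 2))                  # E(S)      = 1.95
          +print(round(cond_entropy([[1,1], [1,1,1], [1,1]]), 2))  # fever     = 1.25
          +print(round(cond_entropy([[1,1,2], [1,2]]), 2))         # vomiting  = 1.25
          +print(round(cond_entropy([[2,1], [2,2]]), 2))           # diarrhea  = 0.96
          +print(round(cond_entropy([[1,1,2,2], [1]]), 2))         # shivering = 1.64
          +\end{verbatim}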
    1.118 +(b) I'm just assuming what a disjoint definition is, but no, it does not, since both average and high fever result in influenza in the case of no diarrhea.\\\\
    1.119 +(c) Candidates are $A = \frac{164+169}{2} = 166.5$, $B=\frac{175+176}{2} = 175.5$, $C=\frac{179+180}{2} = 179.5$, $D=\frac{184+185}{2} = 184.5$.\\
   1.120 +$E(A) = \frac{2}{9}\cdot 0 + \frac{7}{9}(-\frac{4}{7}\log_2\frac{4}{7}-\frac{3}{7}\log_2\frac{3}{7}) \approx 0.77$\\
   1.121 +$E(B) = \frac{4}{9}\cdot 1 + \frac{5}{9}(-\frac{2}{5}\log_2\frac{2}{5}-\frac{3}{5}\log_2\frac{3}{5}) \approx 0.98$\\
   1.122 +$E(C) = \frac{6}{9}(-\frac{2}{6}\log_2\frac{2}{6}-\frac{4}{6}\log_2\frac{4}{6}) + \frac{3}{9}(-\frac{2}{3}\log_2\frac{2}{3}-\frac{1}{3}\log_2\frac{1}{3}) \approx 0.92$\\
    1.123 +$E(D) = \frac{8}{9}\cdot 1 + \frac{1}{9} \cdot 0 = \frac{8}{9} \approx 0.89$\\
    1.124 +Splitting threshold $A$ gives the lowest entropy and therefore the greatest gain $G = 0.99 - 0.77 = 0.22$, where $0.99$ is the entropy of the unsplit set: $-\frac{4}{9}\log_2\frac{4}{9}-\frac{5}{9}\log_2\frac{5}{9} \approx 0.99$.
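          +Continuing the Python snippet from part (a) (reusing its \verb|cond_entropy| helper), the threshold entropies check out:
          +\begin{verbatim}
          +# class counts left/right of each threshold, read off the terms above
          +for name, split in [('A', [[2], [4, 3]]), ('B', [[2, 2], [2, 3]]),
          +                    ('C', [[2, 4], [2, 1]]), ('D', [[4, 4], [1]])]:
          +    print(name, round(cond_entropy(split), 2))
          +# -> A 0.77, B 0.98, C 0.92, D 0.89
          +\end{verbatim}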
   1.125 +\end{document}