On Using Hyperopt: Advanced Machine Learning

In machine learning, one of the biggest problems practitioners face is choosing the right set of hyperparameters, and tuning them to squeeze out better accuracy numbers takes a lot of time.

For instance, take SVC from the well-known library Scikit-Learn. This class implements the Support Vector Machine algorithm for classification and has more than ten hyperparameters; adjusting all of them to minimize the loss by trial and error is very difficult. Scikit-Learn does provide Grid Search and Random Search, but these approaches are brute-force and exhaustive, whereas Hyperopt implements distributed, asynchronous algorithms for hyperparameter optimization.
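For context, here is a minimal sketch of what an exhaustive Grid Search over a couple of SVC hyperparameters looks like in Scikit-Learn (the grid values below are illustrative, not recommendations); every combination in the grid is evaluated:

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()

# illustrative grid: every combination is tried exhaustively
param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1],
              'kernel': ['rbf', 'poly']}

grid = GridSearchCV(SVC(), param_grid, cv=3)
grid.fit(iris.data, iris.target)
print(grid.best_params_)

With 4 x 4 x 2 = 32 combinations and 3-fold cross-validation, this already means 96 model fits, and the cost grows multiplicatively with every hyperparameter you add.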

Introducing SMBO, Sequential Model-Based Global Optimization. SMBO is one of the earlier families of hyperparameter optimization algorithms; it chooses the next configuration to evaluate based on the history of previous observations.

The GP (Gaussian Process) approach is another such method.

One of the more modern optimization algorithms is the TPE (Tree-structured Parzen Estimator) approach, an advanced tree-based hyperparameter optimization algorithm.

Follow this link to know more on the former.

In this article I will be highlighting Hyperopt, a module that implements TPE, supports MongoDB-backed trials, and takes the optimization to the next level.

Let's first see what needs to be done while optimizing hyperparameters. An ML algorithm generally has a loss/cost function that needs to be minimized with respect to the values of the algorithm's parameters and hyperparameters. Weights and biases are decided by the algorithm's own optimization routine; selecting good hyperparameters further reduces the loss and makes the model more robust and accurate, increasing its overall efficiency. For instance, in Gradient Descent it is very important to select a good learning rate to reach the convergence point in the shortest time: if it is too large the cost function will overshoot and never find the minimum, and if it is too small it will take forever to get there.
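To make the learning-rate point concrete, here is a tiny sketch (the function f(x) = x**2 and the rates below are illustrative) showing how the choice of learning rate decides whether plain gradient descent converges at all:

# minimal 1-D gradient descent on f(x) = x**2, whose minimum is at x = 0
def gradient_descent(lr, steps=50, x=5.0):
    for _ in range(steps):
        grad = 2 * x        # derivative of x**2
        x = x - lr * grad   # step size is controlled by the learning rate
    return x

print(gradient_descent(lr=0.1))   # small enough: ends up very close to 0
print(gradient_descent(lr=1.1))   # too large: every step overshoots and x diverges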

So, Hyperopt enables us to minimize a loss function of any number of variables. Let's work on a toy example; let the function be:

f(x, y) = x^2 - y^2

We have to minimize it, such that x and y each belong to a certain range, R1 and R2.

To begin, let's get started with installation. I am assuming that you have Python and sklearn installed; Hyperopt is a simple pip installation:

pip install hyperopt

One of the most common errors you'll face when using Hyperopt for the first time is caused by an incompatible version of networkx; downgrade it to version 1.11 and you are good to go:

sudo pip install networkx==1.11

Now that you have installed Hyperopt, let's see a code sample that minimizes the above function:

from hyperopt import hp, tpe, fmin
# tpe  : the optimization algorithm
# fmin : the function that performs the minimization
# hp   : used to create the search space

# creating the objective function
def function(args):
    x, y = args
    f = x**2 - y**2
    return f  # returns a numerical value

# defining the search space, we'll explore this more later
space = [hp.uniform('x', -1, 1), hp.uniform('y', -2, 3)]

# calling the hyperopt function
best = fmin(function, space, algo=tpe.suggest, max_evals=10)
# fmin's first argument is the objective function
# the second argument is the hyperopt search space
# the third is the algorithm to be used for optimization
# max_evals is the maximum number of evaluations
# fmin returns a dictionary with the best hyperparameters
print(best)

Now let's go through how search spaces are created; then we will minimize the loss for a scikit-learn classifier.

Hyperopt Search Space: Unlike Scikit-Learn's Grid Search, a Hyperopt search space is not restricted to a dictionary. As can be seen in the example, I have used a list to define the spaces for the two arguments x and y.

fmin() passes only one argument to the objective function, so all the different spaces can be joined into one using either a list, tuple, or dictionary. You can see in the above example that I extracted the two search spaces from the args parameter. More about the different hyperopt space expressions can be found here: https://github.com/hyperopt/hyperopt/wiki/FMin#21-parameter-expressions
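For instance, the toy example above could just as well use a dictionary as its space; this is only a sketch with the same illustrative x and y variables:

from hyperopt import hp, tpe, fmin

# the whole search space is one dictionary, so fmin still passes
# a single argument to the objective function
space = {'x': hp.uniform('x', -1, 1),
         'y': hp.uniform('y', -2, 3)}

def function(args):
    x, y = args['x'], args['y']   # unpack the values from the dictionary
    return x**2 - y**2

best = fmin(function, space, algo=tpe.suggest, max_evals=10)
print(best)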

I am mentioning a few here that we will be using later:

hp.choice(label, options): label is a string that names the hyperparameter, and options is a list; one element of the list is returned for that label.

hp.uniform(label, low, high): Again, label is the string naming the hyperparameter; a value is returned uniformly between low and high. When optimizing, this variable is constrained to a two-sided interval.

hp.lognormal(label, mu, sigma): Returns a value drawn according to exp(normal(mu, sigma)) so that the logarithm of the return value is normally distributed. When optimizing, this variable is constrained to be positive.
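To get a feel for what these expressions return, you can draw random samples from them with hyperopt.pyll.stochastic.sample (the labels below are just illustrative):

from hyperopt import hp
from hyperopt.pyll.stochastic import sample

print(sample(hp.choice('kernel', ['rbf', 'poly', 'sigmoid'])))  # one element of the list
print(sample(hp.uniform('gamma', 0.001, 1)))                    # a float between 0.001 and 1
print(sample(hp.lognormal('C', 0, 1)))                          # a positive float, log-normally distributed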

Now we'll use a more complex search space with sklearn classifiers and the iris dataset. Remember, all you have to do is extract the correct hyperparameter values from the list/tuple/dictionary; we'll use a dictionary here:

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from hyperopt import tpe, hp, fmin
from sklearn.metrics import mean_squared_error

iris = datasets.load_iris()
x = iris.data
y = iris.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

def objective_func(args):
    if args['model'] == KNeighborsClassifier:
        n_neighbors = args['param']['n_neighbors']
        algorithm = args['param']['algorithm']
        leaf_size = args['param']['leaf_size']
        metric = args['param']['metric']
        clf = KNeighborsClassifier(n_neighbors=n_neighbors,
                                   algorithm=algorithm,
                                   leaf_size=leaf_size,
                                   metric=metric)
    elif args['model'] == SVC:
        C = args['param']['C']
        kernel = args['param']['kernel']
        degree = args['param']['degree']
        gamma = args['param']['gamma']
        clf = SVC(C=C, kernel=kernel, degree=degree, gamma=gamma)

    clf.fit(x_train, y_train)
    y_pred_train = clf.predict(x_train)
    loss = mean_squared_error(y_train, y_pred_train)
    print("Test Score:", clf.score(x_test, y_test))
    print("Train Score:", clf.score(x_train, y_train))
    print("\n=================")
    return loss

space = hp.choice('classifier', [
    {'model': KNeighborsClassifier,
     'param': {'n_neighbors': hp.choice('n_neighbors', range(3, 11)),
               'algorithm': hp.choice('algorithm', ['ball_tree', 'kd_tree']),
               'leaf_size': hp.choice('leaf_size', range(1, 50)),
               'metric': hp.choice('metric', ['euclidean', 'manhattan',
                                              'chebyshev', 'minkowski'])}},
    {'model': SVC,
     'param': {'C': hp.lognormal('C', 0, 1),
               'kernel': hp.choice('kernel', ['rbf', 'poly', 'sigmoid']),
               'degree': hp.choice('degree', range(1, 15)),
               'gamma': hp.uniform('gamma', 0.001, 10000)}}
])

best_classifier = fmin(objective_func, space,
                       algo=tpe.suggest, max_evals=100)
print(best_classifier)

See how I created the space: first I used hp.choice so that one of the two dictionaries is returned. Since the parameters depend on which classifier was chosen, I used a simple if/else statement to check which classifier it is, assigned the sampled hyperparameter values to variables accordingly, defined the clf model, and fit it on x_train and y_train. The fmin function iterates over different algorithms and their sets of hyperparameters and returns the set on which the loss is minimum. Note that I have minimized the loss on x_train and y_train; you can use cross-validation to prevent overfitting, as sketched below.
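For example, a minimal sketch of such an objective (simplified here to a KNeighborsClassifier with a single illustrative hyperparameter, reusing x_train and y_train from above) could score with cross-validation instead of the training error:

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from hyperopt import hp, tpe, fmin

def objective_cv(args):
    clf = KNeighborsClassifier(n_neighbors=args['n_neighbors'])
    # 5-fold cross-validated accuracy on the training split;
    # return the negative mean so that minimizing the loss maximizes accuracy
    return -cross_val_score(clf, x_train, y_train, cv=5).mean()

space_cv = {'n_neighbors': hp.choice('n_neighbors', list(range(3, 11)))}
best_cv = fmin(objective_cv, space_cv, algo=tpe.suggest, max_evals=20)
print(best_cv)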

Another wrapper built on top of Hyperopt is Hyperopt-Sklearn; check it out here: https://github.com/hyperopt/hyperopt-sklearn
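As a rough sketch of how Hyperopt-Sklearn is typically used (based on the project's README; check the repository for the current API), it wraps the whole model search behind an estimator-like interface:

from hpsklearn import HyperoptEstimator, any_classifier
from hyperopt import tpe

# searches over several sklearn classifiers and their hyperparameters at once
estim = HyperoptEstimator(classifier=any_classifier('clf'),
                          algo=tpe.suggest,
                          max_evals=50,
                          trial_timeout=60)
estim.fit(x_train, y_train)
print(estim.score(x_test, y_test))
print(estim.best_model())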

However, TPE can be exploited further when we evaluate models with different sets of hyperparameters in parallel instead of serially; the tree-structured form of TPE makes it easy to draw many candidates at once and evaluate them based on maximization of Expected Improvement. Refer to https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf to check out the mathematics. Note that Hyperopt-Sklearn does not currently support MongoDB trials.

Installation of MongoDB is pretty easy; just go here https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/#import-the-public-key-used-by-the-package-management-system and follow the instructions.

Let's start using Mongo with an example and minimize a simple sine function.

The code will be:

import math
from hyperopt import fmin, tpe, hp
from hyperopt.mongoexp import MongoTrials

trials = MongoTrials('mongo://localhost:27017/foo_db/jobs',
 exp_key='exp1')
best = fmin(math.sin, 
 hp.uniform('x', -2, 2),trials=trials,
 algo=tpe.suggest, max_evals=10)

Only one line needed to change, i.e. we added a trials parameter. Now let's go step by step through how to execute this code.

Start a local MongoDB server. To start a MongoDB server as a daemon, use the command:

sudo service mongod start

This will get you the port number on which the server is running:

lsof -i | grep mongo

Or you can check the log files, but you'll need root access:

tail -f /var/log/mongodb/mongod.log

By default, the server usually runs on localhost:27017.

Configure and compile the file. As can be seen in the above example, a line calling MongoTrials was added:

trials = MongoTrials('mongo://localhost:27017/foo_db/jobs',
                     exp_key='exp1')
# the first argument is the address of the local server;
# edit the port number as per your server
# the second argument is exp_key, the key given to a
# certain trial in the database; if you want to run
# the program again on the same server in the same database,
# remember to change the exp_key

Make sure you don't remove jobs at the end of the path; the current implementation requires jobs at the end of the database name. Compile the Python file; it will hang until a worker connects to the MongoDB server.

Running Hyperopt Mongo Worker

hyperopt-mongo-worker --mongo=localhost:27017/foo_db --poll-interval=0.1

This will start the worker, and the jobs assigned by MongoTrials will be fetched from the server, which parallelizes the evaluations.

So, it was an easy three-step process for a simple sin() function; however, problems occur when you create your own objective function, so in this final section we'll discuss the errors usually seen while running the worker. Again, let's consider an example where we want to minimize an objective function:

from hyperopt import hp, tpe, fmin
from hyperopt.mongoexp import MongoTrials

def function(args):
    x, y = args
    f = x**2 - y**2
    return f

space = [hp.uniform('x', -1, 1), hp.uniform('y', -2, 3)]
trials = MongoTrials('mongo://localhost:27017/foo_db/jobs',
                     exp_key='exp1')
best = fmin(function, space,
            trials=trials, algo=tpe.suggest, max_evals=10)
print(best)

So, start the Mongo server and compile the file, but when you execute the hyperopt-mongo-worker command you might face this kind of error:

INFO:hyperopt.mongoexp:PROTOCOL mongo
INFO:hyperopt.mongoexp:USERNAME None
INFO:hyperopt.mongoexp:HOSTNAME localhost
INFO:hyperopt.mongoexp:PORT 27017
INFO:hyperopt.mongoexp:PATH /foo_db/jobs
INFO:hyperopt.mongoexp:DB foo_db
INFO:hyperopt.mongoexp:COLLECTION jobs
INFO:hyperopt.mongoexp:PASS None
INFO:hyperopt.mongoexp:Error while unpickling. Try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.mongoexp:job exception: 'module' object has no attribute 'obj'
Traceback (most recent call last):
 File "/home/greatskull/anaconda2/bin/hyperopt-mongo-worker", line 6, in <module>
 sys.exit(hyperopt.mongoexp.main_worker())
 File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1302, in main_worker
 return main_worker_helper(options, args)
 File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1249, in main_worker_helper
 mworker.run_one(reserve_timeout=float(options.reserve_timeout))
 File "/home/greatskull/anaconda2/lib/python2.7/site-packages/hyperopt/mongoexp.py", line 1064, in run_one
 domain = pickle.loads(blob)
AttributeError: 'module' object has no attribute 'obj'
INFO:hyperopt.mongoexp:exiting with N=9223372036854775803 after some consecutive exceptions

To solve this error, make a separate file (e.g. objective.py) for the objective function:

def function(args):
    x, y = args
    f = x**2 - y**2
    return f

and import it inside the file where you are using hyperopt (don't name that file hyperopt.py, or it will shadow the hyperopt package; something like run_hyperopt.py works):

from hyperopt import hp, tpe, fmin
from hyperopt.mongoexp import MongoTrials
from objective import function

space = [hp.uniform('x', -1, 1), hp.uniform('y', -2, 3)]
trials = MongoTrials('mongo://localhost:27017/foo_db/jobs',
                     exp_key='exp1')
best = fmin(function, space,
            trials=trials, algo=tpe.suggest, max_evals=10)
print(best)

Still one more step to go: copy objective.py to the directory where hyperopt-mongo-worker is present; mine was /home/greatskull/anaconda2/bin/, as you can see in the error above.

Now compile run_hyperopt.py and connect it to the worker again using the same worker command:

hyperopt-mongo-worker --mongo=localhost:27017/foo_db --poll-interval=0.1

It should work now.

If you plan to save results from objective.py, for example on each iteration, check the current working directory while the program executes:

import os
print(os.getcwd())

Doubts? Ask in comments.

Tanay Agrawal
tanay_agrawal@hotmail.com
Machine Learning/Deep Learning Enthusiast