## For a binary classification model, is there a way to impose an asymmetric cost function during the training process? - scikit-learn

I am trying to build a neural network in scikit-learn where a Type I error (false positive) is more costly than a Type II error (false negative). Is there a way to impose this during the training process, i.e. by inputting a cost matrix?
Thanks
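scikit-learn's MLPClassifier does not accept a cost matrix, class weights, or sample weights, but one common cost-sensitive workaround is to shift the decision threshold after training so that a positive prediction is only made when it is worth the higher false-positive cost. A minimal sketch (the 5:1 cost ratio and the toy data are assumptions for illustration, not from the question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Hypothetical asymmetric costs: a false positive costs 5x a false negative.
COST_FP, COST_FN = 5.0, 1.0

X, y = make_classification(n_samples=500, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                    random_state=0).fit(X, y)

# For a two-class problem, expected cost is minimised by predicting positive
# only when P(positive) >= COST_FP / (COST_FP + COST_FN).
threshold = COST_FP / (COST_FP + COST_FN)
proba = clf.predict_proba(X)[:, 1]
y_pred = (proba >= threshold).astype(int)
```

This adjusts the operating point rather than the loss itself; to change the loss during training you would need a framework that exposes the objective (or per-sample weights), which MLPClassifier does not.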

## Related

### Neural networks for an imbalanced dataset

I have a very imbalanced dataset consisting of 186219 rows of data across 6 dimensions, comprising 132 true positives against 186087 false positives. What types of neural network would you recommend trying? This spreadsheet in my Google Drive, IPDC_algorithm_training_dataset, contains my training dataset. If the value in the output tab is 100, that feature is a true positive; if it is 0, that feature is a false positive. I am tied to MATLAB at the moment, so it would be more convenient for me to use MATLAB for this problem.

With a dataset that imbalanced you have limited options. If you trained a neural network on the entire dataset, it'd achieve 99.9% accuracy just by always predicting the majority class. You need to deal with that imbalance in some way, such as discarding vast swathes of the false-positive samples or weighting your loss function to account for the imbalance. With an imbalance as extreme as this, you'd probably need to apply both (along with regularisation to prevent overfitting the remaining data). In terms of what network type to use, you could try a basic MLP (multi-layer perceptron), at least as a baseline – there's no point in building a complicated architecture, with more parameters to train, for a very limited dataset. In reality, you'd probably be better off using a shallow learning algorithm, such as boosted trees or naive Bayes, or getting more data to enable the use of a neural network. If new data is likely to remain as imbalanced, you'd need a very large amount of it.
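The loss-weighting option above can be made concrete with scikit-learn's "balanced" class-weight heuristic, which assigns each class the weight n_samples / (n_classes * n_class_samples). A sketch using the exact counts from the question (the MATLAB equivalent would be passing these as error weights):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Labels mirroring the question's imbalance: 132 positives in 186219 rows.
y = np.array([1] * 132 + [0] * 186087)

weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y)
# "balanced" gives n_samples / (n_classes * n_class_samples) per class:
# roughly 0.5 for the majority class and roughly 705 for the rare class,
# so each positive example counts about 1400x as much in the loss.
```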

### Output from a Neural Network model

I have created a couple of models in the MATLAB Neural Network Toolbox with a hundred inputs and 10 outputs that correspond to different classes. At the end, the NN gives a plot of its performance, which is a single number. What does this measure correspond to? Is it the sum of the errors from each output? How can I tell whether the NN is classifying well?

The performance metric depends on the performance function set in the Neural Network Toolbox parameters. For instance, if performFcn is 'mse' then it will use the mean squared error as the performance metric. See http://uk.mathworks.com/help/nnet/ug/analyze-neural-network-performance-after-training.html for more information on how MATLAB sets these parameters. In general, when using anything like neural networks, it is important to understand what the algorithm is trying to optimise, and how, to avoid problems such as overfitting. There are a lot of parameters to tune! Have a look at this answer for more detailed information.
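To make the 'mse' case concrete: the single number reported is the squared error averaged over every output unit and every sample. A small Python illustration with made-up targets and outputs for a one-hot classification problem (the values are invented for the example):

```python
import numpy as np

# Two samples of a 3-class one-hot problem: rows are samples,
# columns are output units, as in the toolbox's target matrices.
targets = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
outputs = np.array([[0.8, 0.1, 0.1],
                    [0.2, 0.7, 0.1]])

# 'mse' averages the squared error over all 6 entries, not per sample.
mse = np.mean((targets - outputs) ** 2)
```

A low MSE on the training set alone does not prove the network classifies well; check the confusion matrix on held-out data as well.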

### MATLAB idnlgrey multidimensional pem: parallelization

I am trying to do a parameter estimation of a nonlinear multidimensional dynamical model specified as an idnlgrey object. In particular, I'm using the 'lsqnonlin' estimator with the pem function. I'm satisfied with both the accuracy and the performance when fitting a model of up to 8 dimensions. The performance problems start arising as the dimensionality grows (my objective would be scaling up to some hundreds of dimensions). From the documentation I wasn't able to get a clear idea of whether pem itself can be run in parallel, nor is it clear whether it should be considered a memory- or CPU-bound function. I wonder if I can take advantage of the Parallel Computing Toolbox.

### Finding best neural network structure using optimization algorithms and cross-validation in MATLAB

I'm using an optimization algorithm to find the best structure and inputs of a patternnet neural network in MATLAB R2014a, using 5-fold cross-validation. Where should I initialize the weights of my neural network?

    % Position_1 (for weight initialization)
    for i = 1:num_of_loops
        % Position_2 (for weight initialization)
        % repeating cross-validation
        for j = 1:num_of_kfolds
            % Position_3 (for weight initialization)
            % cross-validation loop
        end
    end

I'm repeating the 5-fold cross-validation (because of the random selection in cross-validation) to get more reliable outputs (the average of the neural network outputs). Which position is better for weight initialization (Position_1, Position_2 or Position_3), and why? Thanks.

Weights are usually randomised when the neural network is constructed for training. I do not fully understand your question, but I believe what you are asking is 'When should the weights be initialised, and why?'. I am also assuming that you are creating five different neural networks with different fold subsets of the training data, with the results averaged to estimate the generalisation error. If the above is true, I believe that each individual neural net should be assigned different random weights (as defined by your weight-range parameter), and that these should be assigned just before training commences on each network.
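The advice above (fresh random weights per network, assigned just before each fold's training) can be sketched in Python with scikit-learn standing in for patternnet; the data and network size are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for fold, (train_idx, test_idx) in enumerate(cv.split(X, y)):
    # Constructing a new estimator inside the CV loop gives each fold
    # freshly randomised weights (the 'Position_3' choice in the question).
    net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=300,
                        random_state=fold)
    net.fit(X[train_idx], y[train_idx])
    scores.append(net.score(X[test_idx], y[test_idx]))

avg_score = sum(scores) / len(scores)
```

Re-using one initialisation across folds (the outer positions) would couple the fold results, making the averaged score a less honest estimate of generalisation.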

### Batch training of a very large data set using matlab neural network toolbox

I want to use the MATLAB Neural Network Toolbox for speech recognition on the KTH data set. My training data is now so large that I cannot load it into one large matrix for batch training all at once. One solution I found is incremental training, using adapt with a chunk of data at a time, but as far as I know this may reduce accuracy. I was using the NICO toolkit earlier; in it, you can give the input as the names of files containing the training data, and it will read the files and do batch training. I couldn't find such an option in MATLAB. Is there a way to do batch training on such large data sets in MATLAB?

I would not recommend adaptation for very large datasets; adaptive learning is best for datasets that represent a relationship changing over time. If you have access to the Parallel Computing Toolbox and MATLAB Distributed Computing Server, you can use the Neural Network Toolbox to spread calculations and data across multiple machines. To increase the size of dataset that can be trained within a given amount of RAM, use TRAINSCG (scaled conjugate gradient), which uses less memory than Jacobian methods, or alternatively use TRAINBR (Bayesian regularisation, a Jacobian method) with memory reduction, which trades time for memory. Assuming you still don't have enough RAM, one possible solution is to train multiple networks on different random subsets of the data using TRAINBR with memory reduction. TRAINBR attempts to find the simplest possible solution, yet each training session will likely find a quite different one. After training several dozen (or more) neural networks on different subsets of the data, the outputs of the networks can be averaged: essentially, a lot of simple solutions are combined to find a complex relationship. Be sure to hold out some data that is not used to train any of the TRAINBR networks, so that the generality of their combined (averaged) outputs can be measured independently.
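The subset-and-average scheme described above is essentially bagging: each network only ever sees a memory-sized random subset, and the ensemble mean is used as the prediction. A Python sketch with scikit-learn (the data, subset size, and network shape are placeholders, not values from the answer):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=1000, n_features=5, noise=5.0,
                       random_state=0)

rng = np.random.default_rng(0)
models = []
for i in range(5):
    # Each network trains on a random subset small enough to fit in RAM,
    # standing in for one TRAINBR session on a data chunk.
    idx = rng.choice(len(X), size=200, replace=False)
    m = MLPRegressor(hidden_layer_sizes=(16,), max_iter=1000, random_state=i)
    m.fit(X[idx], y[idx])
    models.append(m)

# Averaging the member predictions combines many simple fits into one
# estimate, as the answer suggests for the subset-trained networks.
y_hat = np.mean([m.predict(X) for m in models], axis=0)
```

As the answer notes, evaluate the averaged output on data that none of the members trained on, otherwise the ensemble score is optimistic.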