From 太極
Jump to navigation Jump to search


Neural network

Types of artificial neural networks

neuralnet package

nnet package

OneR package

So, what is AI really?

h2o package


Explainable 'AI' using Gradient Boosted randomized networks Pt2 (the Lasso)

Survival data


GitHub: The top 10 programming languages for machine learning

Keras (high level library)

Keras is a model-level library, providing high-level building blocks for developing deep-learning models. It doesn’t handle low-level operations such as tensor manipulation and differentiation. Instead, it relies on a specialized, well-optimized tensor library to do so, serving as the backend engine of Keras.

Currently, the three existing backend implementations are the TensorFlow backend, the Theano backend, and the Microsoft Cognitive Toolkit (CNTK) backend.

On Ubuntu, we can install required packages by

$ sudo apt-get install build-essential cmake git unzip \
                  pkg-config libopenblas-dev liblapack-dev
$ sudo apt-get install python-numpy python-scipy python- matplotlib python-yaml
$ sudo apt-get install libhdf5-serial-dev python-h5py
$ sudo apt-get install graphviz
$ sudo pip install pydot-ng
$ sudo apt-get install python-opencv

$ sudo pip install tensorflow  # CPU only
$ sudo pip install tensorflow-gpu # GPU support

$ sudo pip install theano

$ sudo pip install keras
$ python -c "import keras; print keras.__version__"
$ sudo pip install --upgrade keras  $ Upgrade Keras

To configure the backend of Keras, see Introduction to Python Deep Learning with Keras.

TensorFlow (backend library)


Some terms

Machine Learning Glossary from

Dense layer and dropout layer

In Keras, what is a "dense" and a "dropout" layer?

Fully-connected layer (= dense layer). You can choose "relu" or "sigmoid" or "softmax" activation function.

Activation function

  • Artificial neural network -> Neural networks as functions [math]\displaystyle{ \textstyle f (x) = K \left(\sum_i w_i g_i(x)\right) }[/math] where K (commonly referred to as the activation function) is some predefined function, such as the hyperbolic tangent or sigmoid function or softmax function or rectifier function.
  • Rectifier/ReLU f(x) = max(0, x).
  • Sigmoid. Binary problem. Logistic function and hyperbolic tangent tanh(x) are two examples of sigmoid functions.
  • Softmax. Multiclass classification.


Convolutional network

Deep Learning with Python

Jupyter notebooks for the code samples of the book "Deep Learning with Python"

sudo apt install python3-pip python3-dev

sudo apt install build-essential cmake git unzip \
   pkg-config libopenblas-dev liblapack-dev
sudo apt-get install python3-numpy python3-scipy python3-matplotlib \
sudo apt install libhdf5-serial-dev python3-h5py
sudo apt install graphviz
sudo pip3 install pydot-ng

# sudo apt-get install python-opencv

sudo pip3 install keras

Keras using R

  • R Markdown Notebooks for "Deep Learning with R"
  • R interface to Keras
  • Python vs R
  • Derivative of a tensor operation: the gradient
    • Define loss_value = f(W) = dot(W, x)
    • W1 = W0 - step * gradient(f)(W0)
  • Stochastic gradient descent
  • Tensor operations:
    • relu(x) = max(0, x)
    • Each neural layer from our first network example transforms its input data:output = relu(dot(W, input) + b) where W and b are the weights or trainable parameters of the layer.

Training process:

  1. Draw a batch of X and Y
  2. Run the network on x (a step called the forward pass) to obtain predictions y_pred.
    • How many layers to use.
    • How many “hidden units” to chose for each layer.
  3. Compute the loss of the network on the batch
    • loss
    • optimizer: determines how learning proceeds (how the network will be updated based on the loss function). It implements a specific variant of stochastic gradient descent (SGD).
    • metrics
  4. Update all weights of the network in a way that slightly reduces the loss on this batch.
    • batch_size
    • epochs (=iteration over all samples in a batch_size of samples)

Keras (in order to use Keras, you need to install TensorFlow or CNTK or Theano):

  1. Define your training data: input tensors and target tensors.
  2. Define a network of layers (or model). Two ways to define a model:
    1. using the keras_model_sequential() function (only for linear stacks of layers, which is the most common network architecture by far) or
      model <- keras_model_sequential() %>%
        layer_dense(units = 32, input_shape = c(784)) %>%
        layer_dense(units = 10, activation = "softmax")
    2. the functional API (for directed acyclic graphs of layers, which let you build completely arbitrary architectures)
      input_tensor <- layer_input(shape = c(784))
      output_tensor <- input_tensor %>%
        layer_dense(units = 32, activation = "relu") %>%
        layer_dense(units = 10, activation = "softmax")
      model <- keras_model(inputs = input_tensor, outputs = output_tensor)
  3. Compile the learning process by choosing a loss function, an optimizer, and some metrics to monitor.
    model %>% compile(
      optimizer = optimizer_rmsprop(lr = 0.0001),
      loss = "mse",
      metrics = c("accuracy")
  4. Iterate on your training data by calling the fit() method of your model.
    model %>% fit(input_tensor, target_tensor, batch_size = 128, epochs = 10)

Custom loss function

Custom Loss functions for Deep Learning: Predicting Home Values with Keras for R


Docker RStudio IDE

Assume we are using rocker/rstudio IDE, we need to install some packages first in the OS.

$ docker run -d -p 8787:8787 -e USER=XXX -e PASSWORD=XXX --name rstudio rocker/rstudio

$ docker exec -it rstudio bash
# apt update
# apt install python-pip python-dev
# pip install virtualenv

And then in R,

install_keras(tensorflow = "1.5")

Use your own Dockerfile

Data Science for Startups: Containers Building reproducible setups for machine learning

Some examples

See Tensorflow for R from RStudio for several examples.

Binary data (Chapter 3.4)

  • The final layer will use a sigmoid activation so as to output a probability (a score between 0 and 1, indicating how likely the sample is to have the target “1”.
  • A relu (rectified linear unit) is a function meant to zero-out negative values, while a sigmoid “squashes” arbitrary values into the [0, 1] interval, thus outputting something that can be interpreted as a probability.
imdb <- dataset_imdb(num_words = 10000)
c(c(train_data, train_labels), c(test_data, test_labels)) %<-% imdb

# Preparing the data
vectorize_sequences <- function(sequences, dimension = 10000) {...}
x_train <- vectorize_sequences(train_data)
x_test <- vectorize_sequences(test_data)
y_train <- as.numeric(train_labels)
y_test <- as.numeric(test_labels)

# Build the network
## Two intermediate layers with 16 hidden units each
## The final layer will output the scalar prediction
model <- keras_model_sequential() %>% 
  layer_dense(units = 16, activation = "relu", input_shape = c(10000)) %>% 
  layer_dense(units = 16, activation = "relu") %>% 
  layer_dense(units = 1, activation = "sigmoid")
model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("accuracy")
model %>% fit(x_train, y_train, epochs = 4, batch_size = 512)
## Error in py_call_impl(callable, dots$args, dots$keywords) : MemoryError: 
## 10.3GB memory is necessary on my 16GB machine

# Validation
results <- model %>% evaluate(x_test, y_test)

# Prediction on new data
model %>% predict(x_test[1:10,])

Multi class data (Chapter 3.5)

  • Goal: build a network to classify Reuters newswires into 46 different mutually-exclusive topics.
  • You end the network with a dense layer of size 46. This means for each input sample, the network will output a 46-dimensional vector. Each entry in this vector (each dimension) will encode a different output class.
  • The last layer uses a softmax activation. You saw this pattern in the MNIST example. It means the network will output a probability distribution over the 46 different output classes: that is, for every input sample, the network will produce a 46-dimensional output vector, where outputi is the probability that the sample belongs to class i. The 46 scores will sum to 1.
reuters <- dataset_reuters(num_words = 10000)
c(c(train_data, train_labels), c(test_data, test_labels)) %<-% reuters

model <- keras_model_sequential() %>% 
  layer_dense(units = 64, activation = "relu", input_shape = c(10000)) %>% 
  layer_dense(units = 64, activation = "relu") %>% 
  layer_dense(units = 46, activation = "softmax")
model %>% compile(
  optimizer = "rmsprop",
  loss = "categorical_crossentropy",
  metrics = c("accuracy")
history <- model %>% fit(
  epochs = 9,
  batch_size = 512,
  validation_data = list(x_val, y_val)
results <- model %>% evaluate(x_test, one_hot_test_labels)
# Prediction on new data
predictions <- model %>% predict(x_test)

Regression data (Chapter 3.6)

  • Because so few samples are available, we will be using a very small network with two hidden layers. In general, the less training data you have, the worse overfitting will be, and using a small network is one way to mitigate overfitting.
  • Our network ends with a single unit, and no activation (i.e. it will be linear layer). This is a typical setup for scalar regression (i.e. regression where we are trying to predict a single continuous value). Applying an activation function would constrain the range that the output can take. Here, because the last layer is purely linear, the network is free to learn to predict values in any range.
  • We are also monitoring a new metric during training: mae. This stands for Mean Absolute Error.
dataset <- dataset_boston_housing()
c(c(train_data, train_targets), c(test_data, test_targets)) %<-% dataset

build_model <- function() {
  model <- keras_model_sequential() %>% 
    layer_dense(units = 64, activation = "relu", 
                input_shape = dim(train_data)[[2]]) %>% 
    layer_dense(units = 64, activation = "relu") %>% 
    layer_dense(units = 1) 
  model %>% compile(
    optimizer = "rmsprop", 
    loss = "mse", 
    metrics = c("mae")
# K-fold CV
k <- 4
indices <- sample(1:nrow(train_data))
folds <- cut(1:length(indices), breaks = k, labels = FALSE) 
num_epochs <- 100
all_scores <- c()
for (i in 1:k) {
  cat("processing fold #", i, "\n")
  # Prepare the validation data: data from partition # k
  val_indices <- which(folds == i, arr.ind = TRUE) 
  val_data <- train_data[val_indices,]
  val_targets <- train_targets[val_indices]
  # Prepare the training data: data from all other partitions
  partial_train_data <- train_data[-val_indices,]
  partial_train_targets <- train_targets[-val_indices]
  # Build the Keras model (already compiled)
  model <- build_model()
  # Train the model (in silent mode, verbose=0)
  model %>% fit(partial_train_data, partial_train_targets,
                epochs = num_epochs, batch_size = 1, verbose = 0)
  # Evaluate the model on the validation data
  results <- model %>% evaluate(val_data, val_targets, verbose = 0)
  all_scores <- c(all_scores, results$mean_absolute_error)


An R Shiny app to recognize flower species

Google Cloud Platform


Notebooks from the Practical AI Workshop 2019

R interface to