Neural Networks - University of Washington


Geoff Hulten

The Human Brain (According to a Computer Scientist)
Network of ~100 billion neurons
Each neuron has ~1,000-10,000 connections
Neurons send electro-chemical signals
Activation time is ~10 ms, so a chain of at most ~100 neurons can fire in 1 second
[Image from Wikipedia]

Artificial Neuron (Sigmoid Unit)
An artificial neuron computes a weighted sum of its inputs plus a bias, then squashes it with the sigmoid function: $o = \sigma(w_0 + \sum_i w_i x_i)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$.
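A minimal sketch of a sigmoid unit in Python; the weights, bias, and inputs below are made-up values for illustration:

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_unit(weights, bias, inputs):
    # Weighted sum of the inputs plus the bias, passed through the sigmoid.
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(z)

# Hypothetical example: two inputs with arbitrary weights.
print(sigmoid_unit([0.5, -1.0], 1.0, [0.25, 0.75]))  # ~0.59
```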

Artificial Neural Network
A grossly simplified approximation of how the brain works:
Features are used as input to an initial set of artificial neurons
The output of those artificial neurons is used as input to other artificial neurons
The output of the network is used as the prediction
Mid-2010s image processing networks used ~50-100 layers and ~10-60 million artificial neurons

Example Neural Network
Fully connected network with a single hidden layer; 2,313 weights to learn
Input: 576 pixels (normalized)
Hidden layer: each hidden unit has 1 connection per pixel + a bias (2,308 weights in total)
Output layer: one unit predicting $P(y=1)$, with 1 connection per hidden unit + a bias (5 weights)
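The slide's weight counts pin down the hidden layer size: 2,308 hidden weights at 577 per unit means 4 hidden units, and 5 output weights means 4 hidden connections plus a bias. A quick arithmetic check:

```python
# Fully connected: 576 inputs -> 4 hidden units -> 1 output unit.
n_inputs, n_hidden, n_outputs = 576, 4, 1

hidden_weights = n_hidden * (n_inputs + 1)   # +1 bias per hidden unit: 4 * 577 = 2,308
output_weights = n_outputs * (n_hidden + 1)  # +1 bias on the output unit: 5

print(hidden_weights, output_weights, hidden_weights + output_weights)  # 2308 5 2313
```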

Example of Predicting with a Neural Network
A worked forward pass through a small example network (most of the diagram's edge weights, e.g. 0.5, -1.0, 1.0, are only partially recoverable here). The hidden units produce activations of ~0.5 and ~0.75, and the output unit combines them with a bias weight of 0.25 and a weight of 1.0 on each hidden response:
$P(y=1) = \sigma(0.25 + 1.0 \cdot 0.5 + 1.0 \cdot 0.75) = \sigma(1.5) \approx 0.82$
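That output computation is easy to check directly; a sketch using the hidden activations and output weights read off the slide:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

h1, h2 = 0.5, 0.75             # hidden-unit activations from the slide
bias, w1, w2 = 0.25, 1.0, 1.0  # output-unit weights read off the diagram

p_y_equals_1 = sigmoid(bias + w1 * h1 + w2 * h2)  # sigmoid(1.5)
print(round(p_y_equals_1, 2))  # 0.82
```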

What's Going On?
Very limited feature engineering on the input
The hidden nodes learn useful features instead
[Figure: the normalized input image next to visualizations of the weights from Hidden Node 1 and Hidden Node 2, with positive and negative weights highlighted]
The output unit is effectively logistic regression with the hidden nodes' responses as its input

Another Example Neural Network
Fully connected network with two hidden layers; 2,333 weights to learn
Input: 576 pixels (normalized)
First hidden layer: 1 connection per pixel + a bias per unit (2,308 weights)
Second hidden layer: 1 connection per first-layer unit + a bias per unit (20 weights)
Output layer: one unit predicting $P(y=1)$, with 1 connection per second-layer unit + a bias (5 weights)

Single Network (Training Run), Multiple Tasks
One network can feed several output nodes, one per task, from the same 576 normalized pixels and shared hidden layers
The hidden nodes learn generally useful filters
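A sketch of that multi-task idea: one shared hidden layer computed once, feeding one output unit per task. The sizes and random weights are illustrative, not taken from the slide:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(weights, inputs):
    # weights: one row per unit; row[0] is the unit's bias.
    return [sigmoid(row[0] + sum(w * x for w, x in zip(row[1:], inputs)))
            for row in weights]

random.seed(0)
n_inputs, n_hidden, n_tasks = 576, 4, 3  # three hypothetical tasks
hidden_w = [[random.uniform(-0.05, 0.05) for _ in range(n_inputs + 1)]
            for _ in range(n_hidden)]
output_w = [[random.uniform(-0.05, 0.05) for _ in range(n_hidden + 1)]
            for _ in range(n_tasks)]

pixels = [0.5] * n_inputs          # a stand-in normalized image
shared = layer(hidden_w, pixels)   # the shared filters are computed once
print(layer(output_w, shared))     # one prediction per task
```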

Neural Network Architectures/Concepts
Fully connected layers
Recurrent networks (LSTM & attention)
Convolutional layers
Embeddings
MaxPooling
Residual networks
Activation functions (ReLU)
Batch normalization
Softmax
Dropout
We will explore these in more detail later

Loss We'll Use for Neural Networks
Example predictions paired with their labels: $\hat{y} = 0.5$ with $y = 1$; $\hat{y} = 0.1$ with $y = 0$; $\hat{y} = 0.95$ with $y = 1$
There are all sorts of options for loss functions for neural networks
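One common option, and the one consistent with the delta rule used in the backprop example below, is per-sample squared error; the pairing of the slide's predictions with its labels is my reading of the garbled table:

```python
predictions = [0.5, 0.1, 0.95]
labels      = [1,   0,   1]

# Squared error per sample: (y - y_hat)^2.
losses = [(y - p) ** 2 for p, y in zip(predictions, labels)]
print(losses)       # [0.25, 0.01, 0.0025] (up to float rounding)
print(sum(losses))  # total loss over this tiny dataset: ~0.2625
```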

Optimizing Neural Nets: Back Propagation
Gradient descent over the entire network's weight vector
Easy to adapt to different network architectures
Converges to a local minimum (it usually won't find the global minimum)
Training can be very slow! Sorry, you'll implement it for this week's assignment; for next week we'll use a package
In general, very well suited to running on a GPU

Conceptual Backprop
1. Forward propagation: figure out how much error the network makes on the sample
2. Back propagation: figure out how much each part of the network contributes to that error
3. Update weights: step each weight to reduce the error it is contributing to
[Figure: the small example network with hidden activations $h_1 \approx 0.5$ and $h_2 \approx 0.75$ and output $\approx 0.82$]

Backprop Example (learning rate $\eta = 0.1$)
1. Forward propagation: the hidden units produce $h_1 \approx 0.5$ and $h_2 \approx 0.75$, the output unit produces $o \approx 0.82$, and with target $t = 1$ the error is $t - o \approx 0.18$
2. Back propagation:
Output unit: $\delta = o(1 - o)(t - o) \approx 0.82 \times 0.18 \times 0.18 \approx 0.027$
Hidden unit 2 (connected to the output by a weight of 1.0): $\delta_{h_2} = o_{h_2}(1 - o_{h_2}) \, w \, \delta \approx 0.75 \times 0.25 \times 1.0 \times 0.027 \approx 0.005$
3. Update weights: each weight is stepped by $\Delta w = \eta \, \delta \, x$, where $x$ is the input flowing through that weight
[Figure: the example network annotated with its weights (0.5, -1.0, 1.0, 0.25, ...) and the computed deltas]
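The deltas can be verified in a few lines; the values come from the slide, and the target $t = 1$ is implied by the error of ~0.18:

```python
eta = 0.1              # learning rate from the slide
h2, o, t = 0.75, 0.82, 1.0
w_h2_to_o = 1.0        # weight from hidden unit 2 to the output unit

delta_o = o * (1 - o) * (t - o)                  # ~0.027
delta_h2 = h2 * (1 - h2) * w_h2_to_o * delta_o   # ~0.005

# Each weight steps by eta * delta * input; e.g. the h2 -> output weight:
print(delta_o, delta_h2, w_h2_to_o + eta * delta_o * h2)
```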

Backprop Algorithm
Initialize all weights to small random numbers (-0.05 to 0.05)
While it is not time to stop, repeatedly loop over the training data:
Input a single training sample to the network and calculate the activation of every neuron
Back propagate the errors from the output to every neuron
Update every weight in the network
Stopping criteria:
Number of epochs (passes through the data)
Training-set loss stops going down
Accuracy on validation data

Backprop with a Hidden Layer (or Multiple Outputs)
1. Forward propagation
2. Back propagation
3. Update weights
[Figure: a two-hidden-layer network with units $h_{1,1}, h_{1,2}, h_{2,1}, h_{2,2}$ walked through the same three steps]
The same rules apply at every layer:
Output units: $\delta = o(1 - o)(t - o)$
Hidden units: $\delta_h = o_h(1 - o_h) \sum_k w_{kh} \delta_k$, summing over the units $k$ that $h$ feeds into
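A minimal, runnable sketch of the whole algorithm for a single-hidden-layer network of sigmoid units with per-sample updates, using the delta rules above; the toy dataset and layer sizes are made up:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(hidden_w, output_w, x):
    # Each weight row stores [bias, w_1, ..., w_n].
    h = [sigmoid(r[0] + sum(w * xi for w, xi in zip(r[1:], x))) for r in hidden_w]
    o = sigmoid(output_w[0] + sum(w * hi for w, hi in zip(output_w[1:], h)))
    return h, o

def train(data, n_hidden=4, eta=0.5, epochs=2000):
    n_in = len(data[0][0])
    rnd = random.Random(0)
    # Initialize all weights to small random numbers in (-0.05, 0.05).
    hidden_w = [[rnd.uniform(-0.05, 0.05) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    output_w = [rnd.uniform(-0.05, 0.05) for _ in range(n_hidden + 1)]
    for _ in range(epochs):               # stopping criterion: a fixed epoch count
        for x, t in data:                 # one update per training sample
            h, o = forward(hidden_w, output_w, x)
            delta_o = o * (1 - o) * (t - o)                        # output delta
            deltas_h = [hi * (1 - hi) * output_w[j + 1] * delta_o  # hidden deltas
                        for j, hi in enumerate(h)]
            output_w[0] += eta * delta_o  # the bias sees a constant input of 1
            for j, hi in enumerate(h):
                output_w[j + 1] += eta * delta_o * hi
            for j, dh in enumerate(deltas_h):
                hidden_w[j][0] += eta * dh
                for i, xi in enumerate(x):
                    hidden_w[j][i + 1] += eta * dh * xi
    return hidden_w, output_w

# Toy usage: learn AND (the label is 1 only when both inputs are 1).
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
hw, ow = train(data)
print([round(forward(hw, ow, x)[1], 2) for x, _ in data])  # approaches [0, 0, 0, 1]
```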

Stochastic Gradient Descent
Gradient descent: calculate the gradient on all samples, then step
Stochastic gradient descent: calculate the gradient on a single sample (a per-sample gradient), then step
Stochastic can make progress faster (especially with a large training set)
Stochastic takes a less direct path to convergence
Mini-batch: use a batch size of N instead of 1
[Figure: convergence paths of gradient descent vs. stochastic gradient descent]
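A sketch of the difference; `grad(w, sample)` is a placeholder for whatever per-sample gradient backprop computes:

```python
import random

def gd_step(w, samples, grad, eta):
    # Batch gradient descent: average the gradient over ALL samples, step once.
    g = [sum(gs) / len(samples) for gs in zip(*(grad(w, s) for s in samples))]
    return [wi - eta * gi for wi, gi in zip(w, g)]

def sgd_step(w, samples, grad, eta, batch_size=1):
    # Stochastic / mini-batch: average the gradient over a few random samples.
    batch = random.sample(samples, batch_size)
    g = [sum(gs) / len(batch) for gs in zip(*(grad(w, s) for s in batch))]
    return [wi - eta * gi for wi, gi in zip(w, g)]

# Toy check: minimize the average of (w - s)^2 over three scattered targets.
targets = [[1.0], [2.0], [3.0]]
grad = lambda w, s: [2 * (w[0] - s[0])]
w = [0.0]
for _ in range(200):
    w = sgd_step(w, targets, grad, eta=0.05)
print(w)  # wanders noisily toward the mean of the targets, 2.0
```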

Local Optima and Momentum
Gradient descent converges to a local optimum of the loss
Why is this okay? In practice neural networks overfit, so failing to find the global minimum is rarely the real problem
Momentum can:
Power through local optima
Converge faster (?)
At the cost of extra parameters to tune
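A sketch of the classic momentum update; `beta` is the extra parameter it introduces, and the hypothetical `gradient(w)` stands in for backprop:

```python
def momentum_step(w, velocity, gradient, eta=0.1, beta=0.9):
    # Keep an exponentially decaying running sum of past gradients and step
    # along it; the accumulated velocity can carry the weights through small
    # local optima and flat regions.
    velocity = [beta * v - eta * g for v, g in zip(velocity, gradient(w))]
    w = [wi + vi for wi, vi in zip(w, velocity)]
    return w, velocity

# Toy usage on f(w) = w^2 (gradient 2w), starting at w = 5:
w, v = [5.0], [0.0]
for _ in range(50):
    w, v = momentum_step(w, v, lambda w: [2 * wi for wi in w])
print(w)  # oscillates toward the minimum at 0
```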

Dead Neurons & Vanishing Gradients
Neurons can die: large weights saturate the activation, so its derivative $o(1 - o)$ goes to 0 and the gradients vanish
Test: assert if this condition occurs
What causes this?
Poor initialization of the weights
Optimization that gets out of hand
Unnormalized input variables

What Should You Do with Neural Networks?
As a model (similar to the others we've learned):
Use fully connected networks
Use few hidden layers (1, 2, or 3)
Use a few dozen nodes per hidden layer
Tune the number of layers and the number of nodes per layer
Do some feature engineering
Be careful of overfitting
Simplify the network if it is not converging
To leverage recent breakthroughs:
Understand the standard architectures
Get some GPU acceleration
Get lots of data
Craft a network architecture
More on this next class

Summary of Artificial Neural Networks
A model that very crudely approximates the way human brains work
Neural networks learn features (which we might otherwise have hand crafted)
Each artificial neuron is similar to a linear model, with a non-linear activation function
There are many options for network architectures
Neural networks are very expressive and can learn complex concepts (and overfit)
Backpropagation is a flexible algorithm for learning neural networks
