Machine Learning Resources

2017-09-27T07:02:17+00:00

Articles and Publications
When I started this page in late 2015, I posted articles that stood out, those that were breakthroughs in the field of Machine Learning. Now, it is impossible to keep up as there are breakthroughs nearly every day, which is both incredible and incredibly overwhelming.

I do my best to keep the following links relevant, but do welcome feedback and suggestions should I have missed something obvious or new.

 

History and Foundation
Automated Design Using Darwinian Evolution and Genetic Programming (lecture) – February 18, 2009
John Koza, a pioneer of genetic programming (GP), describes an automated “What You Want Is What You Get” process for designing complex structures based on the principles of natural selection, sexual recombination, and developmental biology.

Artificial Intelligence – Thinking Allowed (interview) by Jeffrey Mishlove – November 3, 2011
An interview with John McCarthy (1927-2011), inventor of LISP, on the history of artificial intelligence and the future role that non-monotonic reasoning will play in enabling computers to simulate the human mind.

The UC Berkeley School of Information provides an introduction to Machine Learning as applied to Data Science: “Whether you knew it or not, you’ve probably been taking advantage of the benefits of machine learning for years. Most would find it hard to go a full day without using … Amazon, Facebook, Netflix, or Google …”

Before you dive into the serious stuff, here is something just too much fun to pass up: a Machine Learning A Cappella – Overfitting Thriller!

Finally, if you want to play with one of the coolest things to ever come out of Deep Learning, Inceptionism (Deep Dream) by Google is worth a look, even if you just read the article. The repository contains an IPython Notebook with sample code that complements the Google Research blog post about neural network art. See the original gallery for more examples.

 

Articles and Publications
Articles in Machine Learning are not difficult to find. Simply search for your favorite topic, and you will be engulfed in an information overload in no time. For publications, the arXiv offers a repository of scientific papers in the fields of mathematics, physics, astronomy, computer science, quantitative biology, statistics, and quantitative finance. In particular, there are 3 sub-categories which appeal to those interested in the latest in Machine Learning:

And it is always interesting to see what comes out of Google Research and Google Brain, with their list of publications.

So many more, but this will get you going …

 

Tutorials, Lectures, and Classes

Machine Learning (Coursera) by Andrew Ng, Stanford University, 2012
A MUST for anyone who dives into Machine Learning. The entire series of lectures may be downloaded in a single archive here.

Deep Learning Specialization, Master Deep Learning, and Break into AI (Coursera) by Andrew Ng, 2017
This is the follow-up to Ng’s incredibly successful original course (which launched Coursera). You will learn the foundations of Deep Learning, understand how to build neural networks, and learn how to lead successful machine learning projects … You will master not only the theory, but also see how it is applied in industry. You will practice all these ideas in Python and in TensorFlow.

Machine Learning in Python by Raj Bandyopadhyay, Springboard, current
As an experienced data scientist, Raj applies machine learning, natural language processing, text analysis, graph analysis, and other cutting-edge techniques to a variety of real-world problems, especially around detecting fraud and malicious activity in phone and network security. Raj co-founded the Data Science ATL meetup group in Atlanta and a wildly successful summer internship program at Georgia Tech, Data Science for Social Good – Atlanta, modeled after a similar program in Chicago.

Michael Nielsen, Recurse Center, current
Scientist, writer, and programmer Michael Nielsen provides a wealth of knowledge and expertise in Machine Learning. He is the author of “Reinventing Discovery”, “Quantum Computation and Quantum Information”, and “Neural Networks and Deep Learning”, a free online book explaining the core ideas behind artificial neural networks and deep learning (includes free Python code).

Neural Networks Demystified (animated) by Stephen Welch – Nov 4, 2014 with the code for each video also available.

Deep Learning TV YouTube channel presents a series of animated, professionally narrated introductions to a full range of Deep Learning topics, titled Deep Learning SIMPLIFIED.

Neural Networks and Deep Learning (free e-book) – January 2016
Neural networks and deep learning currently provide the best solutions to many problems in image recognition, speech recognition, and natural language processing. This book will teach you many of the core concepts behind neural networks and deep learning:

  • Neural networks, a beautiful biologically-inspired programming paradigm which enables a computer to learn from observational data
  • Deep learning, a powerful set of techniques for learning in neural networks
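
As a rough illustration of what "learning from observational data" means, here is a minimal sketch of a single artificial neuron (a perceptron) that learns the logical AND function from four labelled examples. It is not taken from the book; the function names, the training data, and the learning rate are assumptions chosen for illustration, and it uses only the Python standard library.

```python
# Toy perceptron: one neuron with two inputs, trained on the AND truth table.
# All names and parameter values are illustrative assumptions.

def predict(weights, bias, inputs):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train(samples, epochs=20, learning_rate=1.0):
    """Fit the weights and bias with the classic perceptron update rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # The error is 0 when the neuron is right, +1 or -1 when it is wrong.
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Observational data: every input pair with the output we want the neuron to give.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(AND_SAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_SAMPLES]
print(predictions)  # [0, 0, 0, 1]
```

Nothing in the code states the AND rule explicitly; the weights are nudged after each mistake until the four examples are classified correctly. Deep learning stacks many such units in layers and replaces this simple update rule with gradient-based training.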

 

Software
The software stacks for Machine Learning are maturing. No longer does one need to struggle for weeks or months to develop his or her own code when so many packages are readily downloaded and ready to go. A great place to start is the machine learning showcase at GitHub, and then the following …

scikit-learn
The go-to library for Machine Learning in Python: a tried and tested, industry-standard set of simple and efficient tools for data mining and data analysis, built on NumPy, SciPy, and matplotlib and released under an open source, commercially usable BSD license. The full package is available from GitHub. scikit-learn offers code examples and tutorials for:

  • Classification
  • Regression
  • Clustering
  • Dimensionality reduction
  • Model selection
  • Preprocessing

Caffe
A deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Caffe offers an expressive architecture that encourages application and innovation, extensible code that fosters active development, speed, and a community spanning academic research projects, startup prototypes, and even large-scale industrial applications in vision, speech, and multimedia.

Theano
A Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano features:

  • Tight integration with NumPy – Use numpy.ndarray in Theano-compiled functions.
  • Transparent use of a GPU – Perform data-intensive calculations up to 140x faster than with CPU (float32 only).
  • Efficient symbolic differentiation – Theano does your derivatives for functions with one or many inputs.
  • Speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
  • Dynamic C code generation – Evaluate expressions faster.
  • Extensive unit-testing and self-verification – Detect and diagnose many types of errors.
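
To see why the log(1+x) item in the list above matters, the following minimal sketch compares the naive formula with a numerically stable one. It uses Python's standard `math` module rather than Theano itself, so it only illustrates the underlying numerical problem; how Theano rewrites such expressions internally is not shown here.

```python
import math

x = 1e-20  # far smaller than double-precision machine epsilon (about 2.2e-16)

# Naive form: 1 + x rounds to exactly 1.0, so the information in x is lost.
naive = math.log(1 + x)

# Stable form: log1p computes log(1 + x) without ever forming 1 + x.
stable = math.log1p(x)

print(naive)   # 0.0
print(stable)  # very close to 1e-20, the mathematically expected value
```

For tiny x, log(1+x) is approximately x, so the naive result of zero is wrong by 100 percent, while the stable variant keeps full precision.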

TensorFlow
An open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.
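
The data flow graph idea can be sketched in a few lines of plain Python. The classes below are hypothetical and written only for illustration; they are not the TensorFlow API, and they handle one-dimensional lists rather than general tensors. The point is the separation between building the graph (nodes are operations, edges carry arrays) and evaluating it.

```python
# Toy data flow graph: nodes are operations, edges carry lists of numbers.
# Hypothetical classes for illustration only; this is not the TensorFlow API.

class Constant:
    """Source node that emits a fixed array."""

    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Operation:
    """Node that applies an element-wise function to the arrays on its incoming edges."""

    def __init__(self, function, *inputs):
        self.function = function
        self.inputs = inputs  # upstream nodes, one per incoming edge

    def evaluate(self):
        # Pull the array along each incoming edge, then combine them element by element.
        arrays = [node.evaluate() for node in self.inputs]
        return [self.function(*elements) for elements in zip(*arrays)]


# Build the graph for (a + b) * c. Nothing is computed at this stage.
a = Constant([1.0, 2.0, 3.0])
b = Constant([4.0, 5.0, 6.0])
c = Constant([2.0, 2.0, 2.0])
total = Operation(lambda x, y: x + y, a, b)
product = Operation(lambda x, y: x * y, total, c)

# Evaluation walks the graph from the output node back to the constants.
result = product.evaluate()
print(result)  # [10.0, 14.0, 18.0]
```

Because the graph is a description of the computation rather than the computation itself, a framework is free to decide where each node runs, which is what makes deployment across CPUs, GPUs, and devices possible behind a single API.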

mxnet
Lightweight, portable, flexible distributed/mobile deep learning with a dynamic, mutation-aware data flow dependency scheduler, for Python, R, Julia, Scala, Go, JavaScript, and more. http://mxnet.rtfd.org

sklearn-theano
Experiments with scikit-learn compatible estimators, transformers, and datasets for Theano.

Keras: Deep Learning library for Theano and TensorFlow
Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.

Digits by NVIDIA
If you want to start with zero know-how, NVIDIA DIGITS, an interface to the Caffe DNN environment (an industry standard), is the best option. For now, DIGITS is designed to work with image datasets, but if you know Caffe you can easily tweak the parameters to work with other kinds of datasets (time series, text, etc.). It is a very useful and functional interface for professional work. The GitHub download is available here.

OpenAI Gym Beta
A toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Go.

Tinker With a Neural Network
Tinker With a Neural Network Right Here in Your Browser. Don’t Worry, You Can’t Break It. We Promise.

Genetic Programming Implementations (at geneticprogramming.com) and my personal favorite (a bit biased), Karoo GP, a Genetic Programming suite written in Python which provides both symbolic regression and classification analysis. TensorFlow-enabled for multicore and GPU support, Karoo GP is a scalable platform designed to work readily with real-world data. No programming required.
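
For readers curious what symbolic regression by genetic programming looks like under the hood, here is a deliberately small sketch using only the Python standard library. It is not Karoo GP's code or algorithm; the primitive set, population size, mutation rate, and every function name are assumptions made for illustration. Expressions are nested tuples, fitness is the squared error against sample data, and each generation keeps the fitter half of the population and refills it by recombination and mutation.

```python
import operator
import random

# Hypothetical primitives: three arithmetic operators, the variable x, two constants.
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1.0, 2.0]

def random_tree(rng, depth=3):
    """Grow a random expression as nested tuples of the form (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(OPERATORS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Compute the value of an expression tree at the point x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPERATORS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def fitness(tree, samples):
    """Sum of squared errors over the samples; lower is better."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in samples)

def crossover(rng, mother, father):
    """Recombination: graft a branch of the father into a copy of the mother."""
    if not isinstance(mother, tuple):
        return father
    op, left, right = mother
    branch = rng.choice(father[1:]) if isinstance(father, tuple) else father
    return (op, branch, right) if rng.random() < 0.5 else (op, left, branch)

def mutate(rng, tree, rate=0.2):
    """Occasionally swap the left branch for a fresh random subtree."""
    if rng.random() >= rate or not isinstance(tree, tuple):
        return tree
    op, _, right = tree
    return (op, random_tree(rng, depth=2), right)

def evolve(samples, generations=30, population_size=60, seed=0):
    """Return the best tree found and the best error seen in each generation."""
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda tree: fitness(tree, samples))
        history.append(fitness(population[0], samples))
        survivors = population[: population_size // 2]  # selection keeps the fitter half
        children = [
            mutate(rng, crossover(rng, rng.choice(survivors), rng.choice(survivors)))
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    best = min(population, key=lambda tree: fitness(tree, samples))
    return best, history

# Target relationship to rediscover from data alone: y = x * x + x
SAMPLES = [(x, x * x + x) for x in range(-3, 4)]
best, history = evolve(SAMPLES)
print(best, fitness(best, SAMPLES))
```

Because the fitter half always survives unchanged, the best error in `history` can never get worse from one generation to the next. Whether the run finds an exact expression depends on the seed and settings, so no particular result is claimed here.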