Machine Learning Tools

Prerna Aditi
Jun 23, 2018

As we know, tools are a significant component of machine learning. They make machine learning much easier and provide capabilities that can be used at many steps of our machine learning projects.

Why should we use machine learning tools?

· Tools can automate many steps in the machine learning process. This means the time from idea to result is greatly shortened.

· We can spend our time choosing good tools instead of researching and implementing techniques ourselves.

· We can spend the extra time getting better results or working on more projects, rather than creating our own tools.

How to decide which tool is better?

It is important to know which tool suits our work best. We can decide by looking at the following properties:

· Interface: A good machine learning tool provides an intuitive interface that maps well onto the task and is suited to it.

· Best Practice: A good machine learning tool embodies best practices for process, configuration and implementation. Examples: automatic configuration of machine learning algorithms and good procedure built into the tool.

· Trusted Resource: A good machine learning tool is well maintained and regularly updated by a dedicated group of people.
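To make "automatic configuration" a little more concrete, here is a minimal sketch of what a tool might do for us behind the scenes: try several candidate settings and keep the one that scores best on held-out data. The toy data, the 1-D nearest-neighbour classifier and all function names are our own assumptions for illustration, not taken from any particular tool.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (1-D feature)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, validation, k):
    """Fraction of validation points whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in validation)
    return hits / len(validation)

# Hypothetical toy data: (feature, label) pairs with two noisy labels.
train = [(1.0, "a"), (1.5, "a"), (2.0, "b"), (4.0, "b"), (4.5, "b"), (5.0, "a")]
validation = [(1.9, "a"), (1.2, "a"), (4.9, "b"), (4.2, "b")]

# Try each candidate setting and keep the one with the best validation score.
candidates = [1, 3, 5]
scores = {k: accuracy(train, validation, k) for k in candidates}
best_k = max(scores, key=scores.get)
print("scores:", scores, "best k:", best_k)
```

Real tools usually search over many more settings and use cross-validation rather than a single validation set, but the idea is the same.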

Some machine learning tools:

1. Scikit-Learn: As we know, Python is quite popular these days for math, science and statistics because it is easy to adopt and has a huge collection of libraries. Scikit-Learn builds on Python packages for math and science such as NumPy, SciPy and Matplotlib, and it works well alongside Pandas. So, we can say it is a machine learning library for Python. With this tool we can perform data analysis. Since it is open source, it is freely accessible to everyone.
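As a rough illustration of how little code a typical Scikit-Learn workflow needs, the sketch below trains a classifier on the Iris dataset that ships with the library. The choice of dataset, model and split size is ours, made only for the example, and it assumes scikit-learn is installed.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the small Iris dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples to estimate accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a simple classifier; max_iter is raised so the solver converges.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Most Scikit-Learn estimators follow the same fit and predict pattern, so swapping in a different model usually means changing only the line that creates it.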

2. TensorFlow: It is an open source software library developed by the Google Brain team within Google’s machine intelligence research organization. It was developed as an internal framework for deep learning. Using TensorFlow we can build many types of neural networks. It supports communication and collaboration between researchers. TensorFlow offers a lot of support to developers, and to those not familiar with the platform or with Python, through documentation, tutorials and online resources. A researcher can develop an idea in one location and then share the code so that another user can run it from another location.

3. Shogun: Although it is written in C++, it can be used from Java, Python, Ruby, C# and Matlab. Shogun can process huge datasets of up to 10 million samples, as it was developed with bioinformatics applications in mind. It supports pre-calculated kernels. We can also use a combined kernel, which consists of a linear combination of arbitrary kernels over different domains.
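The idea of a combined kernel can be sketched in a few lines of plain Python. This does not use Shogun's API; the kernel functions, the weights and the sample data are all hypothetical and chosen only to show a weighted sum of one kernel over numeric features and another over sequences.

```python
from collections import Counter

def linear_kernel(a, b):
    """Dot product of two equal-length numeric vectors."""
    return sum(x * y for x, y in zip(a, b))

def spectrum_kernel(s, t, k=2):
    """Count matching substrings of length k (a simple string kernel)."""
    s_counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    t_counts = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(s_counts[sub] * t_counts[sub] for sub in s_counts)

def combined_kernel(sample_a, sample_b, weights=(0.5, 0.5)):
    """Weighted sum of one kernel per domain (numeric features, sequence)."""
    (vec_a, seq_a), (vec_b, seq_b) = sample_a, sample_b
    return (weights[0] * linear_kernel(vec_a, vec_b)
            + weights[1] * spectrum_kernel(seq_a, seq_b))

# Each sample pairs a numeric feature vector with a short sequence.
a = ([1.0, 2.0], "ACGT")
b = ([3.0, 0.5], "CGTA")
value = combined_kernel(a, b)
print(value)  # 0.5 * 4.0 + 0.5 * 2 = 3.0
```

Because each kernel only sees its own part of the sample, very different kinds of data can contribute to a single similarity value.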

4. Spark MLlib: It is the machine learning library for Apache Spark, which can also work with data stored in Apache Hadoop, and it has many algorithms and data types. Its objective is to make machine learning easy and scalable. MLlib can be used from Java, Scala and Python, and Python users can also connect MLlib with the NumPy library. It has common algorithms and utilities including classification, collaborative filtering, clustering and regression, along with low-level optimization primitives.
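For readers new to collaborative filtering, here is a minimal single-machine sketch of the idea: predict a rating for a user from the ratings of similar users. It is a simple user-based neighbourhood method written in plain Python, not MLlib's API or its distributed implementation, and the ratings and function names are made up for illustration.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "alice": {"book": 5, "film": 3, "game": 4},
    "bob": {"book": 4, "film": 2, "game": 5, "album": 4},
    "carol": {"book": 1, "film": 5, "album": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None

# Alice has not rated "album"; estimate it from Bob's and Carol's ratings.
prediction = predict_rating("alice", "album")
print(prediction)
```

Alice's tastes are closer to Bob's than to Carol's, so Bob's rating counts for more in the estimate.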

5. Accord.NET Framework: Accord is a machine learning and signal processing framework for .NET. It consists of a set of libraries that help in processing audio signals and image streams. It also has:

· Vision processing algorithms that can be used for face detection or for tracking moving objects.

· Libraries that provide a broad range of machine learning functions, from neural networks to decision-tree systems.

6. GoLearn: Made by Stephen Whitworth, GoLearn is a machine learning library for Google’s Go language. It was created with two goals:

· Simplicity: This lies in the way data is loaded and handled in the library.

· Customizability: This lies in how some of the data structures can be extended in an application.

7. Deeplearn.js: It is an open source, hardware-accelerated JavaScript library for machine learning. It brings performant machine learning building blocks to the web, allowing us to train neural networks in a browser or run pre-trained models in inference mode. Although it was developed by the Google Brain PAIR team to build powerful and interactive machine learning tools for the browser, it can also be used for education, art projects and model understanding.

We can install this library via yarn:

yarn add deeplearn

or npm:

npm install deeplearn

8. PredictionIO: Built on top of an open source stack, it allows data scientists and developers to create predictive engines for machine learning tasks. We can install it together with Apache Spark, MLlib and HBase to speed up setting up machine learning infrastructure. Once deployed as a web service, it can respond to dynamic queries in real time and can also unify data from multiple platforms in batches for comprehensive predictive analysis. It helps in creating machine learning engines by providing a template system.

9. Weka: Developed at the University of Waikato, New Zealand, Weka stands for Waikato Environment for Knowledge Analysis. Written in Java, it is a collection of machine learning algorithms made for data mining and is freely available under the GNU General Public License. It consists of a collection of visualization tools and algorithms for data analysis and predictive modeling, along with a graphical user interface. It was initially designed as a tool for analyzing agricultural data, but the recent version (fully Java based, Weka 3) is used in many application areas, particularly in education and research.

10. H2O: It is an open source machine learning platform for businesses and developers. It is written in Java and can also be used from Python and R. It is built to make it easy for developers in these languages to apply machine learning and predictive analytics. Datasets in Apache Hadoop and cloud file systems can be analyzed using H2O. It is available on operating systems such as Windows, Linux and macOS.

11. Torch: Torch is an open source machine learning library with a scripting language based on the Lua programming language. It provides a variety of algorithms for deep learning. DeepMind used Torch before the company was acquired by Google, and Torch is also used by the Facebook AI Research group.

12. Apache Mahout: Implemented on top of Apache Hadoop using the MapReduce paradigm, Apache Mahout is a free and open source project of the Apache Software Foundation. It is developed with the objective of providing scalable, distributed machine learning algorithms for areas like collaborative filtering, classification and clustering. It offers Java libraries and collections for different kinds of mathematical operations. Mahout provides tools to find meaningful patterns in big data sets. Using Apache Mahout we can do the following:

· Build a recommendation engine: Mahout offers us tools for building a recommendation engine.

· Clustering: Mahout supports several clustering algorithms, such as Canopy, k-Means, Mean-Shift, Dirichlet, etc.
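To show what a clustering algorithm such as k-Means does, here is a minimal single-machine sketch in plain Python. It is not Mahout's API and does not use Hadoop or MapReduce; the sample points, the naive initialisation and the function names are our own assumptions for the example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, iterations=10):
    """Plain k-Means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    centroids = list(points[:k])  # naive initialisation: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

A distributed implementation splits the assignment step across many machines and combines the partial sums to update the centroids, but the two alternating steps stay the same.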

13. Amazon Machine Learning (AML): It is a machine learning service with many visualization tools and wizards that help in creating sophisticated machine learning models without the need to learn complex ML algorithms and technologies. Through AML, applications can obtain predictions using simple APIs, without custom prediction generation code or complex infrastructure.