Machine learning

What is machine learning?

Machine learning algorithms learn from data to solve problems that are too complex for conventional programming.

Machine learning is a subfield of artificial intelligence that covers techniques or algorithms for constructing models automatically from data. In contrast to a system that completes a task by following explicit rules, a machine learning system acquires knowledge through experience. It can also be trained to improve its performance by exposing its algorithm to new data.

Typically, machine learning algorithms are classified as supervised (the training data are labeled with the answers) or unsupervised (any labels that may exist are not shown to the training algorithm). Supervised machine learning problems are further divided into classification (predicting non-numeric answers, such as whether a mortgage payment will be late) and regression (predicting numeric answers, such as the number of widgets that will sell next month in your Manhattan store).
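The difference between the two supervised tasks can be sketched in a few lines of plain Python. This is a minimal illustration, not a production method: the sales figures, the borrower features (debt-to-income ratio and count of past late payments), and the function names are all hypothetical, and the nearest-centroid classifier is just one simple choice among many.

```python
# Toy supervised learning: both tasks learn from labeled examples.

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_centroids(points, labels):
    """Classification: store the mean point (centroid) of each labeled class."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_label(centroids, point):
    """Predict the class whose centroid is closest to the point."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, point))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Regression: hypothetical widget sales for months 1..5, forecast month 6.
months = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]
slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # a numeric answer

# Classification: hypothetical borrowers as
# (debt-to-income ratio, number of past late payments).
borrowers = [(0.1, 0), (0.2, 0), (0.6, 3), (0.7, 4)]
outcomes = ["on_time", "on_time", "late", "late"]
centroids = fit_centroids(borrowers, outcomes)
prediction = predict_label(centroids, (0.65, 3))  # a non-numeric answer
```

In both cases the training step sees the answers (sales counts or payment outcomes), which is what makes the learning supervised; only the type of the predicted value differs.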

Unsupervised learning is further subdivided into clustering, association, and dimensionality reduction (projection, feature selection, and feature extraction).
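Clustering (grouping), the first of these categories, can be sketched with a bare-bones k-means loop. This is an illustrative sketch under simplifying assumptions: the points are made up, the function name is hypothetical, and the initial centroids are simply the first k points so that the run is deterministic (real implementations usually pick starting points more carefully).

```python
def kmeans(points, k, iterations=10):
    """Basic k-means (Lloyd's algorithm); the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


# Unlabeled, made-up 2-D points that form two loose groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, assignments = kmeans(points, k=2)
```

No labels are passed in at any point: the algorithm discovers the two groups purely from the distances between the points.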

Machine learning applications

Every day, we hear about applications of machine learning, but not all of them are complete triumphs. Self-driving cars are a good example, as their functions range from simple and reliable (parking assistance and lane-following on the highway) to sophisticated and questionable (full vehicle control in urban settings, which has led to several deaths).

Game-playing machine learning has been highly successful in checkers, chess, shogi, and Go, where programs have defeated human world champions. Machine translation is less uniform: it works better for some language pairs than others, and many machine translations can still be improved by human translators.

Automatic speech-to-text works reasonably well for people with standard accents, but less well for people with certain strong regional or national accents; performance depends on the training sets used by vendors. Automatic social media sentiment analysis has a reasonable success rate, most likely because the training sets (such as Amazon product reviews, which pair a comment with a numeric score) are huge and easily accessible.

Automatic resume screening is a contentious subject. Amazon had to withdraw its internal system because biases in the training sample caused it to downgrade applications from women.

Other resume screening systems now in use may carry training biases that lead them to favor candidates who are "similar" to current employees in ways that are not permitted by law (for example, young, white, male candidates from upscale English-speaking neighborhoods who played team sports are more likely to pass the screening). Microsoft and others are directing research efforts at removing latent biases from machine learning.

Automatic classification of pathology and radiology images has progressed to the point that it can aid (but not replace) pathologists and radiologists in detecting some types of abnormalities. Meanwhile, facial recognition technologies are contentious when they function well (because of privacy concerns) and are also less accurate for women and people of color than for white men (because of biases in the training population).