RSM318H1 Chapter Notes - Chapter 1,2: Machine Learning, Unsupervised Learning, Labeled Data
Document Summary
Machine learning is the creation of intelligence by learning from large volumes of data. In supervised learning, the model either predicts a variable that can take a continuum of values or is used for classification. The data contains features (the variables from which predictions are made) and labels (the values of the target to be predicted). In unsupervised learning, the goal is to better understand the environment represented by the data: to identify patterns, not to forecast. In semi-supervised learning, some of the data has labels and some does not. Unlabeled data can be used together with labeled data to produce clusters that help prediction, and a prediction for an unlabeled observation can be based on the cluster it belongs to. In reinforcement learning, the algorithm interacts with its environment and takes a series of decisions. There is a danger that a machine learning model works well for the data used to build it but not for other data. The data used to build the model must be representative of the situations to which the model will be applied. The model should therefore be checked on data that is different from the sample data used to determine its parameters.
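The danger of a model that works on the data used to build it but not on other data can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and does not come from the chapter: the synthetic data (a linear relation plus noise), the helper names, and the deliberately over-flexible model, which simply memorizes its sample and predicts the label of the closest stored feature value.

```python
import random


def make_data(n, rng):
    """Hypothetical data set: label = 2 * feature + Gaussian noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.5)) for x in xs]


def memorizing_model(train):
    """Return a predictor that copies the label of the nearest training feature."""
    def predict(x):
        nearest = min(train, key=lambda pair: abs(pair[0] - x))
        return nearest[1]
    return predict


def mean_squared_error(predict, data):
    """Average squared gap between predicted and actual labels."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_data(50, rng)      # sample used to build the model
test = make_data(50, rng)       # fresh data the model has never seen

model = memorizing_model(train)
train_mse = mean_squared_error(model, train)
test_mse = mean_squared_error(model, test)

print(f"in-sample MSE:     {train_mse:.3f}")
print(f"out-of-sample MSE: {test_mse:.3f}")
```

Because the model reproduces its own sample exactly, the in-sample error says nothing about how it will behave elsewhere; only the error on the held-out data gives a fair picture.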