
This is not all you can do (by far), but these are some of the common uses along with the algorithms to accomplish them. Within each of these broad categories there are often several alternative algorithms or derivatives of algorithms. Which to pick? Well, that’s a combination of mathematical background, experimentation, and knowing the data.

Man, collaborative filtering is a popularity contest. The company I work for uses this to improve search results. If enough people click on the second cat picture, it must be better than the first cat picture. In a social or e-commerce setting, if you use the likes and dislikes of various users, you can figure out which is the “best” result for most users or even for specific sets of people. This can be done on multiple properties for recommender systems. You see this on Google Maps or Yelp when you search for restaurants (you can then filter by service, food, decor, good for kids, romantic, nice view, cost). There is a lecture on collaborative filtering from the Stanford Machine Learning course, which started on July 10 (but you can still get in).
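To make the “popularity contest” idea concrete, here is a minimal sketch in plain Python. It is only vote counting over made-up data, not what Spark does internally (Spark MLlib’s collaborative filtering is based on alternating least squares), and the user names, item names, and the `rank_items` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical feedback: (user, item, vote), where vote is +1 (like/click) or -1 (dislike).
feedback = [
    ("ann", "cat_picture_1", -1),
    ("ann", "cat_picture_2", +1),
    ("bob", "cat_picture_2", +1),
    ("cho", "cat_picture_1", +1),
    ("cho", "cat_picture_2", +1),
    ("dev", "cat_picture_1", +1),
]


def rank_items(votes, users=None):
    """Rank items by net votes, optionally restricted to a set of users."""
    scores = defaultdict(int)
    for user, item, vote in votes:
        if users is None or user in users:
            scores[item] += vote
    # Highest net score first; ties are broken by item name for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


overall = rank_items(feedback)                         # "best" for most users
for_group = rank_items(feedback, users={"cho", "dev"})  # "best" for one set of people
print(overall)
print(for_group)
```

The same votes can produce a different winner for a specific group than for everyone, which is the point of filtering on “specific sets of people.”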
Clustering
If k-means clustering is the only thing out of someone’s mouth after you ask them about machine learning, you know that they just read the crib sheet and don’t know anything about it. If you take a set of attributes you may find “groups” of points that seem to be pulled together by gravity. You can “see” these clusters, but there may be clusters that are close together. There may be one big one and one small one on the side. There may be smaller clusters in the big cluster. Because of these and other complexities there are a lot of different “clustering” algorithms.

Though different from classification, clustering is often used to sort people into groups. The big difference between “clustering” and “classification” is that we don’t know the labels (or groups) up front for clustering. Customer segmentation is a very common use. There are different flavors of that, such as sorting customers into credit or retention risk groups, or into buying groups (fresh produce or prepared foods), but it is also used for things like fraud detection. Here’s a course on Coursera with a lecture series specifically on clustering and yes, they cover k-means for that next interview, but I find it slightly creepy when half the professor floats over the board (you’ll see what I mean).

Classification asks “What are you?” If you take a set of attributes, you can get the computer to sort “things” into their right category. The trick here is coming up with the attribute that matches the “class,” and there is no right answer there. If you think of someone looking through a set of forms and sorting them into categories, this is classification. You’ve run into this with spam filters, which use a list of words spam usually has. You may also be able to diagnose patients or determine which customers are likely to cancel their broadcast cable subscription (people who don’t watch live sports). Essentially, classification “learns” to label things based on labels applied to past data and can apply those labels in the future. In Coursera’s Machine Learning Specialization there is a course specifically on this that started on July 10, but I’m sure you can still get in.
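The spam-filter case can be sketched in a few lines of plain Python. This is a toy word-count classifier (a small naive Bayes with add-one smoothing) under made-up training data, not Spark’s API; the messages, the labels, and the `train` and `classify` helpers are assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical labeled history: the "labels applied to past data".
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch at noon tomorrow", "ham"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability for the text."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total)  # prior for the label
        n_words = sum(counts.values())
        for word in text.split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_counts, label_counts = train(training)
print(classify("free money", word_counts, label_counts))
print(classify("lunch meeting tomorrow", word_counts, label_counts))
```

The labels come from the past data; the classifier only learns which words go with which label and applies that to new messages.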

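Going back to clustering for a moment: since k-means is the one everybody names, here is a minimal sketch of it in plain Python, assuming two-dimensional points and hand-picked starting centroids. The customer numbers and the `kmeans` helper are hypothetical, and a real job would use Spark’s own implementation rather than this loop.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points: assign to nearest centroid, then re-average."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the closest centroid by squared Euclidean distance.
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its points; keep it if the cluster is empty.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (store visits per month, prepared-food purchases per month).
customers = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(customers, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

Nobody told the code which customers belong together; the groups fall out of the data, which is the difference from classification.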
You’re just in time to enroll in a Basic Statistics Course on Coursera.

Mainly you’ll use the basic statistics APIs for A-B testing or A-B-C testing. Frequently in business we assume that if two averages are the same then the two things are roughly equivalent. Consider a car manufacturer that replaces the seat in a car and surveys customers on how comfortable it is. At one end, the shorter customers may say the seat is much more comfortable. At the other end, taller customers will say it is really uncomfortable, to the point that they wouldn’t buy the car, and the people in the middle balance out the difference. On average the new seat might be slightly more comfortable, but if no one over 6 feet tall buys the car anymore, we’ve failed somehow. Spark’s hypothesis testing allows you to do a Pearson chi-squared or a Kolmogorov–Smirnov test to see how well something “fits” or whether the distribution of values is “normal.” This can be used most anywhere we have two series of data. That “fit” might be “did you like it” or whether the new algorithm provided “better” results than the old one.
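The car seat story is easy to reproduce with made-up numbers. The sketch below computes a Pearson chi-squared statistic by hand in plain Python; it is not Spark’s hypothesis testing API, and the survey counts, `mean_rating`, and `chi_squared` are assumptions for illustration. The critical value 9.488 is the standard 5% cutoff for 4 degrees of freedom.

```python
def mean_rating(counts):
    """Average rating, where counts[i] is the number of responses rating i + 1."""
    total = sum(counts)
    return sum(rating * n for rating, n in enumerate(counts, start=1)) / total


def chi_squared(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical survey: number of customers giving each comfort rating, 1 to 5.
old_seat = [0, 10, 80, 10, 0]    # almost everyone says "fine"
new_seat = [30, 10, 20, 10, 30]  # short customers love it, tall customers hate it

stat = chi_squared([old_seat, new_seat])
critical_5pct_df4 = 9.488  # (2 - 1) rows * (5 - 1) columns = 4 degrees of freedom

print(mean_rating(old_seat), mean_rating(new_seat))
print(stat, stat > critical_5pct_df4)
```

Both seats average out to the same rating, yet the statistic lands far above the cutoff, which is the signal that the two sets of answers are not equivalent.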
