Clustering is the task of partitioning data points into groups (clusters) such that points in a group are more ‘similar’ to each other than to points outside it. But how does one know whether one data point is similar to another? The definition of similarity is what distinguishes clustering methods from each other: K-Means treats a point as similar to the cluster whose centroid is closest to it, while DBSCAN groups together points that lie in the same dense region.
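To make the two notions of similarity concrete, here is a minimal pure-Python sketch. It is not a production implementation of either algorithm: the K-Means part only performs the nearest-centroid assignment step with hand-picked, hypothetical centroids, and the density part is a simplified DBSCAN-style procedure. The function names, the toy points, and the `eps` and `min_pts` values are all assumptions chosen for illustration.

```python
import math


def nearest_centroid(point, centroids):
    """K-Means-style similarity: index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def density_clusters(points, eps, min_pts):
    """DBSCAN-style similarity: points share a cluster when they are linked
    through dense neighbourhoods. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # not dense enough: provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a dense point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is dense too, so keep expanding
    return labels


# Toy data (hypothetical): two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
centroids = [(0.5, 0.5), (10.5, 10.5)]  # hand-picked for illustration, not fitted

kmeans_labels = [nearest_centroid(p, centroids) for p in points]
dbscan_labels = density_clusters(points, eps=1.5, min_pts=3)

print(kmeans_labels)  # [0, 0, 0, 1, 1, 1, 0]
print(dbscan_labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point (5, 5) shows the difference: under the centroid-based definition it must belong to whichever centroid is nearest, whereas under the density-based definition it belongs to no dense region and is labelled as noise.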
Given the rise of ‘big data’, effectively managing and working with stored data has become very important: you can only ask better questions if you can quickly glean insights from the data you have. Unlike smaller datasets that fit easily on a local computer’s hard disk, big data is typically too large for local storage and is instead kept in a database, a structured set of information that can be queried, accessed, and updated in various ways.
Over the past few decades, the digitization of our society has led to massive amounts of data being stored. This increase in the scale of stored information, combined with advances in hardware computational power and algorithmic innovations, has pushed the field of artificial intelligence (AI) into the spotlight, as machines seem to possess the ‘magical’ ability to learn without being told explicitly what to do.
Examples of impressive feats performed by machines include AlphaGo defeating 9-dan Go champion Lee Sedol, self-driving cars navigating city streets, and a computer learning to beat Super Mario World by itself.