As interest in Artificial Intelligence (AI), and specifically Machine Learning (ML), grows and more engineers enter this popular field, the lack of de facto standards and frameworks for how work should be done is becoming more apparent. A new focus on optimizing the ML delivery pipeline is starting to gain momentum.
A common problem that comes up in machine learning is needing to find the l2-distance between two sets of vectors. For example, in implementing the k-nearest-neighbors algorithm, we have to find the l2-distance between the a set of test vectors, held in a matrix X (MxD), and a set of training vectors, held in a matrix X_train (NxD). Our goal is to create a distance matrix D (MxN) that contains the l2-distance from every test vector to every training vector. How can we do this efficiently?
Topics: Data science