In a previous post I talked about how tools like word2vec learn numerical representations that capture the meanings of words. In this post, I'm going to continue that discussion by describing ways we can find numerical representations for whole documents. So, I'll be assuming you're already familiar with the concept of word embeddings.
Why do we need document embeddings? Many real-world applications, such as search, classification, and clustering, need to understand the content of text that is longer than just a single word.
Recently, in text mining circles, word embeddings have taken off, due in large part to the papers from Mikolov et al. and tools like word2vec [1]. Since then, many other projects have applied this concept to a wide variety of areas within data mining [2]. The natural next question is: how can we extend these word-level representations to entire documents?
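To make that question concrete, here is a minimal sketch of the simplest possible document embedding: just averaging the word2vec vectors of a document's words. The toy corpus, the parameters, and the `embed_document` helper are illustrative assumptions rather than a method from this post; gensim's `Word2Vec` (4.x API) stands in for any trained word embedding model.

```python
import numpy as np
from gensim.models import Word2Vec

# A toy corpus: each document is a list of tokens. In practice you would
# train on a much larger corpus, or load pretrained vectors instead.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["dogs", "and", "cats", "are", "popular", "pets"],
    ["word", "embeddings", "map", "words", "to", "vectors"],
]

# Train a small word2vec model (gensim 4.x keyword names).
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, epochs=100)

def embed_document(tokens, model):
    """Average the vectors of in-vocabulary tokens into one document vector."""
    vectors = [model.wv[tok] for tok in tokens if tok in model.wv]
    if not vectors:  # no known words: fall back to a zero vector
        return np.zeros(model.vector_size)
    return np.mean(vectors, axis=0)

doc_vector = embed_document(["the", "cat", "sat", "on", "the", "mat"], model)
print(doc_vector.shape)  # (50,)
```

Averaging throws away word order entirely, which is exactly the kind of limitation that motivates the more principled document embedding methods discussed in the rest of this post.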