# Posts

### Math of Neural Networks

I gave this talk both internally to my lab group and at the Clemson SIAM Graduate Student Seminar.

### BigData 2018

I’m here in Seattle, WA, attending the IEEE International Conference on Big Data, where I’ll be presenting two recent works. The first presents a new method to validate hypothesis generation systems. The second uses that method to determine the quality of input papers needed to draw good conclusions. With two papers in the same conference, I will be giving a double-length talk! If you’re around, I’ll be at the end of the L12 session Wednesday morning.

### Moliere Poster -- Google PIRC

I have the chance to present my work at the Google Ph.D. Intern Research Conference (PIRC). This poster represents all of the work we have added to the Moliere project since our original paper last year.

### Basic Iterative Numeric Optimization

Today in class, we were asked to write an iterative solver for numerical equations. Many students in the class do not have an optimization background, so for everyone’s benefit, I want to share a simple overview of this exercise and how to go about solving it.

The problem was stated as follows:

$$M(a) = 2\times a + 14$$ $$G(b) = b - 2$$

Our goal was to find some solution $x$ such that $M(x) = G(x)$. Additionally, we were supposed to do so iteratively, so simply solving the system of equations was out of the question. This is because our next exercise would have a different $M$ and $G$, so our code should support arbitrary functions.

For the sake of generalization, my solution here will assume only that $M$ and $G$ are continuous; I will not assume we know their derivatives. Additionally, I will be writing my code in Python, simply because I find it is easier for anybody to understand. Knowledge of Python, hopefully, won’t be necessary. But first, let’s go over some aspects of the problem…
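To make the setup concrete, here is a minimal sketch of one derivative-free approach: bisection on the difference $F(x) = M(x) - G(x)$, whose root is exactly where $M(x) = G(x)$. The bracket endpoints below are assumptions for illustration; any interval where $F$ changes sign works, and this is only one of several iterative schemes the exercise admits.

```python
def M(a):
    return 2 * a + 14

def G(b):
    return b - 2

def solve(M, G, lo, hi, tol=1e-8):
    """Find x with M(x) = G(x) by bisecting F(x) = M(x) - G(x).

    Requires only that M and G are continuous and that F changes
    sign on [lo, hi]; no derivatives needed.
    """
    F = lambda x: M(x) - G(x)
    assert F(lo) * F(hi) <= 0, "bracket must straddle a sign change"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep whichever half-interval still contains the sign change.
        if F(lo) * F(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

x = solve(M, G, -100, 100)
print(x)  # close to -16, since 2(-16) + 14 = -18 = (-16) - 2
```

Each iteration halves the interval, so the method converges even though we never inspect the formulas for $M$ and $G$, which is what lets the same code handle whatever functions the next exercise throws at it.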

### Moliere Software Overhaul

Over the last couple of days, I have retooled MOLIERE into a system that anyone can deploy to run their own queries. The code is over at the default repo and should be pretty straightforward; it even downloads the raw data itself! Just run `build_network.py` and point it at a big parallel file system, and in a few hours you’ll have your very own knowledge network!

### Mass Validation of Hypothesis Generation Systems

We have made our code and experimental data publicly available. Our validation information has been incorporated into THIS REPO.

Our experimental data and results can be found in THIS OTHER REPO.

But, we are still working on uploading all of the supporting data.

### Run Moliere Yourself

I have finally had time to package Moliere, our Automatic Hypothesis Generation System, into a single easy-to-use package!

Take a second to check it out at my repo.

### Document Embedding

In a previous post I talked about how tools like word2vec are used to numerically understand the meanings behind words. In this post, I’m going to continue that discussion by describing ways we can find numerical representations for whole documents, so I’ll be assuming you’re already familiar with the concept of word embeddings. Why do we need document embeddings? Many real-world applications need to understand the content of text that is longer than a single word.
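As a preview, one of the simplest baselines (not necessarily the method the full post covers) is to average a document’s word vectors. The tiny 3-dimensional vectors below are made up for illustration; real embeddings from word2vec typically have hundreds of dimensions.

```python
import numpy as np

# Toy word vectors, invented for this sketch.
word_vecs = {
    "neural":   np.array([0.9, 0.1, 0.0]),
    "network":  np.array([0.8, 0.2, 0.1]),
    "training": np.array([0.7, 0.0, 0.3]),
}

def embed_document(tokens, word_vecs):
    """Embed a document as the mean of its known words' vectors."""
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    return np.mean(vecs, axis=0)

doc = embed_document(["neural", "network", "training"], word_vecs)
```

Averaging throws away word order, which is exactly the weakness that more sophisticated document-embedding methods try to address.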

### Agile Project Management in Google Sheets

I think it’s way too hard to manage small projects. There are many project planning platforms out there, and for small teams they typically fall into one of two major pitfalls: either they are free and simplistic, e.g. Trello, or they are expensive and complicated, e.g. Jira. Of course, there are millions of people who make these systems work for them every day, but in my experience it is hard for a small, well-intentioned group to actually use them.

### Word Embedding Basics

Recently, in text mining circles, a new method of representing words has taken off. This is due in large part to recent papers from Mikolov et al. and tools like word2vec. Since then, many other projects have applied this concept to a wide variety of areas within data mining. So what is all the hype about? What are these embeddings, and why do we need them?