Recent Posts


I have finally had time to package Moliere, our Automatic Hypothesis Generation System, into a single easy-to-use package!

Take a second to check it out at my repo.


In a previous post I talked about how tools like word2vec are used to numerically understand the meanings behind words. In this post, I’m going to continue that discussion by describing ways we can find numerical representations for whole documents. So, I’ll be assuming you’re already familiar with the concept of word embeddings. Why do we need document embeddings? Many real-world applications need to understand the content of text which is longer than just a single word.


I think it's way too hard to manage small projects. There are so many project planning platforms out there, and they typically fall into one of two major pitfalls for small teams. Either they are free and simplistic, e.g. Trello, or they are expensive and complicated, e.g. Jira. Of course, there are millions of people who make these systems work for them every day, but in my experience it is hard for a small, well-intentioned group to actually use these.


Recent Publications

Hypothesis generation is becoming a crucial time-saving technique which allows biomedical researchers to quickly discover implicit connections between important concepts. We discover these connections with our tool MOLIERE.
KDD’17, 2017.

Since the Agile Manifesto, many organizations have explored agile development methods to replace traditional waterfall development. Interestingly, waterfall remains the most widely used practice, suggesting that there is something missing from the many “flavors” of agile methodologies. We examine seven of the most common practices to investigate this, and evaluate each against a series of criteria centered around product quality and adherence to agile practices. We find that no methodology entirely replaces waterfall and summarize the strengths and weaknesses of each. From this, we conclude that agile methods are, as a whole, unable to cope with the realities of technical debt and large-scale systems. Ultimately, no one methodology fits all projects.
2017.

By utilizing General Parallel File System (GPFS) policy scans, distsync finds changed files without navigating between directories. This allows our tool to synchronize large, out-of-date file systems more efficiently.
[WIP] PSDW’15, 2015.


MOLIERE: Automatic Biomedical Hypothesis Generation

We discover potential connections within existing scientific literature. Currently, we are preparing MOLIERE for large-scale public usage.

Bridge Health Classification With Automotive Sensing

We classify bridge health using Support Vector Regression and other machine learning techniques. In partnership with Clemson Civil Engineers.

Learn to Program Python

An introductory video series for people absolutely new to programming. Learn the basics of programming!

Rapid Replication of Multi-Petabyte File Systems

Distsync is a parallel storage system synchronization utility which leverages cluster computing capabilities to unify large out-of-sync distributed file systems.


  • McAdams Hall Office 224. McMillan Rd, Clemson, SC 29631
  • Email for appointment