# Selected Publications

Are Abstracts Enough for Hypothesis Generation?

MOLIERE: Automatic Biomedical Hypothesis Generation System

# Recent Posts

### Basic Iterative Numeric Optimization

Today in class, we were asked to write an iterative solver for numerical equations. Many students in the class did not have an optimization background, so for the benefit of everyone, I want to share a simple overview of this exercise and how to go about solving it.

The problem was stated as follows:

$$M(a) = 2\times a + 14$$

$$G(b) = b - 2$$

Our goal was to find some solution $x$ such that $M(x) = G(x)$. Additionally, we were supposed to do so iteratively, so just solving the system of equations by hand was out of the question. This is because our next exercise would have a different $M$ and $G$, so our code should work for arbitrary functions.

For the sake of generalization, my solution here will assume only that $M$ and $G$ are continuous; I will not assume we know their derivatives. Additionally, I will be writing my code in Python, simply because I find that it is easy for anybody to understand. Knowledge of Python, hopefully, won’t be necessary. But first, let’s go over some aspects of the problem…
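As a rough preview, here is a minimal sketch of one derivative-free way to attack the problem. It uses bisection on the difference $M(x) - G(x)$, which is my choice for illustration and not necessarily the method the full post settles on. The function name `solve_by_bisection`, the starting bracket, and the tolerance are all assumptions made for this example.

```python
def M(a):
    return 2 * a + 14


def G(b):
    return b - 2


def solve_by_bisection(m, g, lo=-1.0, hi=1.0, tol=1e-9, max_iter=200):
    """Find x with m(x) == g(x), assuming both functions are continuous."""

    def f(x):
        # A root of f is exactly a point where m and g agree.
        return m(x) - g(x)

    # Widen the bracket until f changes sign across [lo, hi].
    for _ in range(max_iter):
        if f(lo) * f(hi) <= 0:
            break
        width = hi - lo
        lo -= width
        hi += width
    else:
        raise ValueError("could not find an interval where f changes sign")

    # Repeatedly halve the bracket, keeping the half that contains the sign change.
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


x = solve_by_bisection(M, G)
print(x)  # close to -16, since 2x + 14 = x - 2 gives x = -16
```

Because the solver only ever calls `m` and `g`, swapping in a different pair of continuous functions requires no other changes, provided their difference changes sign somewhere the bracket search can reach.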

### Moliere Software Overhaul

Over the last couple of days, I have retooled MOLIERE into a system that anyone can deploy to run their own queries. The code is over at the default repo and should be pretty straightforward; it even downloads the raw data itself! Just run build_network.py and point it at a big parallel file system, and in a few hours you’ll have your very own knowledge network!

### Mass Validation of Hypothesis Generation Systems

We have publicly available code and experimental data. Our validation information has been incorporated into THIS REPO.

Our experimental data and results can be found in THIS OTHER REPO.

However, we are still working on uploading all of the supporting data.

# Recent Publications

### Are Abstracts Enough for Hypothesis Generation?

Are abstracts enough for hypothesis generation (HG), or does it need full-text papers? How many papers does an HG system need to make valuable predictions? What effect do corpus size and document length have on HG results? To answer these questions, we train multiple versions of the knowledge network-based HG system Moliere on varying corpora in order to compare challenges and tradeoffs in terms of result quality and computational requirements.
arXiv.org, 2018.

### Validation and Topic-driven Ranking for Biomedical Hypothesis Generation Systems

Literature underpins research, providing the foundation for new ideas. But as the pace of science accelerates, many researchers struggle to stay current. Some scientists leverage hypothesis generation systems, but many resort to expert analysis to validate such systems. We devise a validation challenge and adapt MOLIERE through a number of new metrics to rise to our challenge.
arXiv.org, 2018.

### MOLIERE: Automatic Biomedical Hypothesis Generation System

Hypothesis generation is becoming a crucial time-saving technique which allows biomedical researchers to quickly discover implicit connections between important concepts. We discover these connections with our tool MOLIERE.
KDD’17, 2017.

### To Agile or Not to Agile: A Comparison of Software Development Methodologies

Since the Agile Manifesto, many organizations have explored agile development methods to replace traditional waterfall development. Interestingly, waterfall remains the most widely used practice, suggesting that there is something missing from the many “flavors” of agile methodologies. To investigate this, we examine seven of the most common practices and evaluate each against a series of criteria centered around product quality and adherence to agile practices. We find that no methodology entirely replaces waterfall and summarize the strengths and weaknesses of each. From this, we conclude that agile methods are, as a whole, unable to cope with the realities of technical debt and large-scale systems. Ultimately, no one methodology fits all projects.
arXiv.org, 2017.

### Rapid Replication of Multi-Petabyte File Systems

By utilizing General Parallel File System (GPFS) policy scans, distsync finds changed files without navigating between directories. This allows our tool to synchronize large, out-of-date file systems more efficiently.
[WIP] PSDW’15, 2015.

# Projects

#### MOLIERE: Automatic Biomedical Hypothesis Generation

We discover potential connections within existing scientific literature. Currently, we are preparing MOLIERE for large-scale public usage.

#### Bridge Health Classification With Automotive Sensing

We classify bridge health using Support Vector Regression and other machine learning techniques. In partnership with Clemson Civil Engineers.

#### Learn to Program Python

An introductory video series for people absolutely new to programming. Learn the basics of programming!

#### Rapid Replication of Multi-Petabyte File Systems

Distsync is a parallel storage system synchronization utility which leverages cluster computing capabilities to unify large out-of-sync distributed file systems.

# Contact

• justin@sybrandt.com
• McAdams Hall Office 224. McMillan Rd, Clemson, SC 29631
• Email for appointment