Run Moliere Yourself

I have finally had time to package Moliere, our Automatic Hypothesis Generation System, into a single easy-to-use package!

Take a second to check it out at my repo.

System Requirements

  • gcc (something recent enough to support c++11)
  • mpich (mpich 1, NOT 2, needs to provide mpicxx)
  • python 3
  • preferred: some sort of parallel file system
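
The requirements above can be checked before building. The following is a minimal sketch, assuming the tools are exposed under the command names gcc, mpicxx, and python3. The helper missing_tools is a hypothetical name for illustration and is not part of Moliere. It only checks that the commands exist on PATH; it does not verify versions such as C++11 support.

```shell
# Print the name of every tool in the argument list that is not on PATH.
# (missing_tools is a hypothetical helper, not something Moliere ships.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Command names are assumptions based on the requirements list.
missing=$(missing_tools gcc mpicxx python3)
if [ -n "$missing" ]; then
  # $missing is left unquoted so the names print on one line.
  echo "Missing prerequisites:" $missing
else
  echo "All prerequisites found."
fi
```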

Install Process

The install process is pretty informal. The following steps are how I would set up Moliere on a Linux system.

export MOLIERE_HOME=<some install directory>
mkdir -p $MOLIERE_HOME
git clone .

Next, you just need to get the data file into $MOLIERE_HOME/data. The latest data we have available can be found at the following link:

Get Data Here (Google Drive Link)


Okay, now you have everything you need, assuming you ran make, set the $MOLIERE_HOME environment variable, and downloaded the provided data into $MOLIERE_HOME/data. To run the system, execute the run script from the repository. Feel free to move this file anywhere you like; as long as $MOLIERE_HOME is set, you're good to go.
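
The assumptions in the paragraph above can be verified with a short check. This is a sketch only: check_moliere_home is a hypothetical helper, and it assumes the downloaded data ends up as one or more files directly inside $MOLIERE_HOME/data. It does not confirm that make succeeded or that the data is complete.

```shell
# Report whether the given install directory looks ready to use.
# (check_moliere_home is a hypothetical helper, not part of Moliere.)
check_moliere_home() {
  home="$1"
  if [ -z "$home" ]; then
    echo "MOLIERE_HOME is not set"
    return 1
  fi
  if [ ! -d "$home/data" ]; then
    echo "no data directory at $home/data"
    return 1
  fi
  # ls -A prints nothing for an empty directory.
  if [ -z "$(ls -A "$home/data")" ]; then
    echo "data directory $home/data is empty"
    return 1
  fi
  echo "ok: $home/data exists and is not empty"
}

# "|| true" keeps a failed check from aborting an interactive session.
check_moliere_home "${MOLIERE_HOME:-}" || true
```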

Running the script with -h shows the options for the system. The general usage is -n $TOPIC_COUNT -m $TERM_A $TERM_B. Note that -n sets the number of topics (default 100) and -m moves the resulting topic model and evaluation files from the local cache (default /tmp) to the working directory.
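
As an illustration of the usage pattern above, the sketch below assembles the argument string and rejects obviously bad input. It does not invoke Moliere. The helper moliere_args and the placeholder terms term_a and term_b are hypothetical, and the exact format the system expects for query terms is not covered here.

```shell
# Build the argument string "-n <topics> -m <term A> <term B>".
# (moliere_args is a hypothetical helper; it only formats and validates.)
moliere_args() {
  topics="$1"
  term_a="$2"
  term_b="$3"
  # Reject an empty topic count or one containing a non-digit character.
  case "$topics" in
    ''|*[!0-9]*)
      echo "topic count must be a whole number" >&2
      return 1
      ;;
  esac
  if [ -z "$term_a" ] || [ -z "$term_b" ]; then
    echo "two query terms are required" >&2
    return 1
  fi
  printf '%s\n' "-n $topics -m $term_a $term_b"
}

# Placeholder values; 100 matches the documented default for -n.
moliere_args 100 term_a term_b
```

The printed string can then be appended to the run script's path on the command line.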


The work shown here is based primarily on this paper, so if you use our tool in your research, please include the following citation:

 author = {Sybrandt, Justin and Shtutman, Michael and Safro, Ilya},
 title = {MOLIERE: Automatic Biomedical Hypothesis Generation System},
 booktitle = {Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
 series = {KDD '17},
 year = {2017},
 isbn = {978-1-4503-4887-4},
 location = {Halifax, NS, Canada},
 pages = {1633--1642},
 numpages = {10},
 url = {},
 doi = {10.1145/3097983.3098057},
 acmid = {3098057},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {hypothesis generation, mining scientific publications, topic modeling, undiscovered public knowledge},

