At The Data Incubator, we pride ourselves on having the most up-to-date data science curriculum available. Much of our curriculum is based on feedback from corporate and government partners about the technologies they are using and learning. In addition to their feedback, we wanted to develop a data-driven approach for deciding what we should teach in our data science corporate training and in our free fellowship for master's and PhD holders looking to enter data science careers in industry. Here are the results.
Below is a ranking of 23 open-source deep learning libraries that are useful for data science, based on GitHub and Stack Overflow activity as well as Google search results. The table shows standardized scores: a value of 1 means one standard deviation above the average, and the average itself corresponds to a score of 0. For example, Caffe is one standard deviation above average in GitHub activity, while deeplearning4j is close to average. See below for methods.
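For readers unfamiliar with standardized scores, the usual z-score definition is sketched below. Here x_{ij} is the raw value of metric j (for example, GitHub stars) for library i, and n = 23 is the number of libraries. Whether the population or the sample standard deviation was used is not stated, so the population form is an assumption.

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j},
\qquad
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij},
\qquad
s_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}
```

A library whose raw value equals \bar{x}_j gets z_{ij} = 0, and one whose value equals \bar{x}_j + s_j gets z_{ij} = 1, which is how a score of 1 reads as "one standard deviation above average."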
Results and Discussion
The ranking weights its three components equally: GitHub (stars and forks), Stack Overflow (tags and questions), and Google search results (total results and quarterly growth rate). These metrics were obtained using the available APIs. Coming up with a comprehensive list of deep learning toolkits was tricky; in the end, we scraped five different lists that we thought were representative (see methods below for details). Computing standardized scores for each metric allows us to see which packages stand out in each category. The full ranking is here, while the raw data is here.
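As an illustration only, the equal-weight composite described above can be sketched in a few lines of Python. The post does not say exactly how the two metrics inside each component are combined, so this sketch assumes each metric is z-scored across libraries, the two z-scores in a component are averaged, and the three component scores are then averaged with equal weight. The library names (`lib_a`, etc.), metric keys, and numbers are hypothetical placeholders, not the actual data behind the ranking.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores (x - mean) / population std dev; all zeros if no spread."""
    mu = mean(values)
    sigma = pstdev(values)  # population std dev (an assumption; the post does not say)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw metrics for three placeholder libraries (NOT real data).
raw = {
    "github_stars":  {"lib_a": 9000,  "lib_b": 3000,  "lib_c": 1500},
    "github_forks":  {"lib_a": 2000,  "lib_b": 800,   "lib_c": 300},
    "so_tags":       {"lib_a": 500,   "lib_b": 200,   "lib_c": 50},
    "so_questions":  {"lib_a": 1200,  "lib_b": 400,   "lib_c": 100},
    "google_total":  {"lib_a": 80000, "lib_b": 30000, "lib_c": 10000},
    "google_growth": {"lib_a": 0.25,  "lib_b": 0.10,  "lib_c": 0.05},
}

# The three components and the (assumed) metrics that feed each one.
COMPONENTS = {
    "github": ["github_stars", "github_forks"],
    "stackoverflow": ["so_tags", "so_questions"],
    "google": ["google_total", "google_growth"],
}


def composite_scores(raw, components):
    """Rank libraries by the equal-weight mean of their component z-scores."""
    libs = sorted(next(iter(raw.values())).keys())

    # Step 1: z-score every metric across all libraries.
    z = {}
    for metric, by_lib in raw.items():
        z[metric] = dict(zip(libs, standardize([by_lib[lib] for lib in libs])))

    # Step 2: average metrics within a component, then weight components equally.
    result = {}
    for lib in libs:
        component_scores = [mean(z[m][lib] for m in metrics) for metrics in components.values()]
        result[lib] = mean(component_scores)

    # Highest composite score first.
    return dict(sorted(result.items(), key=lambda kv: kv[1], reverse=True))


ranking = composite_scores(raw, COMPONENTS)
for lib, score in ranking.items():
    print(f"{lib}: {score:+.2f}")
```

Because every metric's z-scores sum to zero across libraries, the composite scores also average to zero, which matches the "average = score of 0" reading of the table.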