Exploring GitHub, Docker, and Heroku

Photo by Paweł Czerwiński on Unsplash

I’ve been programming for eight years, and it wasn’t until a few months ago that I could answer a question I’ve had this whole time: “How do I share my project with someone?”

When I say “project,” I’m not talking about a single R script or a handful of bash commands; even 22-year-old me could figure out copy and paste! I mean a project that has several files, perhaps in multiple languages, with external dependencies. Do I just throw it all into a zip file? How do I deal with new versions of languages and packages…

Hands-on Tutorials

Impress your friends with SQLAlchemy and PyMongo

Artist’s interpretation of a MongoDB database. Photo by Joel Filipe on Unsplash

From ancient government, library, and medical records to present-day video and IoT streams, we have always needed ways to efficiently store and retrieve data. Yesterday’s filing cabinets have become today’s computer databases, with two major paradigms for how to best organize data: the relational (SQL) versus non-relational (NoSQL) approach.

Databases are essential for any organization, so it’s worth wrapping your head around where each type fits best. We’ll start with a brief primer on the history and theory behind SQL and NoSQL. But memorizing abstract facts can only get you so far, so we’ll then actually create each type…

How to get more likes when you share your project on Instagram

Photo by Leio McLaren (leiomclaren.com) on Unsplash

Welcome to the final post in our spam-catching saga! In the first post, we covered the theory for how to build a model to catch spam. In the last post, we built out the backend for our app by creating the spam classifier and a small Flask app to serve the model. We ended by creating an API so that our model can be invoked from Python scripts anywhere on our computer.

In this post, we’ll take it a step further by designing a nice frontend so you can interact with the model outside of Python. …

Well said! I like how you explored the "two C's" options.


How understanding their differences made me a better programmer

One of the few locations where you can avoid the R vs. Python debates. Photo by Johannes Plenio on Unsplash

My first programming language was R. I fell in love with the nuance R granted for visualizing data, and how with a little practice it was straightforward to pull off complex statistical analyses. I coded in R throughout my Ph.D., but I needed to switch to Python for my first non-academic job. The transition was… bumpy, to say the least.

This post outlines some of the major differences between R and Python, as well as why those differences exist.

How do I actually get started?

Thanks to the explosion of interest in data science over the last decade, there are tons of excellent online classes for…

Office Hours

The human side of machine learning

Photo by Erol Ahmed on Unsplash

No matter how talented you are at crunching numbers and writing code, your effectiveness as a data scientist is limited if you chase questions that don’t actually help your company, or you can’t get anyone to incorporate the results of your analyses. Similarly, how do you stay motivated and relevant in a field that’s constantly evolving?

In this post, we’ll outline the business and personal skills needed to translate your technical skills into impact.

But of course, a disclaimer: I’ve spent my data science career so far at companies with fewer than 100 employees. This post would likely look different…

Machine learning and Flask — what’s not to like?

Photo by Ian Battaglia on Unsplash

Welcome back! In the last post, we covered the theory for why we use NLP and machine learning for spam classification. In this post, we’ll actually build such a classifier, and we’ll also create a Flask micro web service to make it much easier to interact with our model. In the next post, we’ll connect a sleek frontend to our Flask app so you can use the model without needing to know Python.

If you want to skip ahead, you can check out the actual app here and the source code here. We’ll keep this post simple by excluding a…

The theory behind building a model to identify spam

Photo by Jason Richard on Unsplash

SpamCatch is a fun side project I did to bring together natural language processing, Flask, and a frontend. Classifying spam text messages is a classic machine learning problem, but I’d never seen people test their classifier on raw strings of text. I’d also never seen a spam classifier hooked up to a nice user interface, where people could use the classifier without needing to know Python or Git.

There’s a lot to cover, so I’ll split this into three posts. This post will set the stage for what spam is and how we can build a model to automatically identify…

How to leverage in-demand skills and forget the rest

Photo by Brett Jordan on Unsplash

“I could always do data science if academia doesn’t work out.” It’s a recurring thought many graduate students and postdocs experience, especially if their work involves hearty servings of programming and statistics, the core elements of data science.

Data science can be a rewarding alternative to academia, and academics do have many qualities that make them attractive candidates for data science roles. However, academics’ skill sets also often have large holes that can keep them from being hired straight off the bat.

This post will outline the skills needed to make the leap from the ivory tower to…

Matt Sosna

Data scientist. PhD from Princeton. Passionate about sustainability. Addicted to learning how things work. www.mattsosna.com
