I’ve been programming now for eight years, and it wasn’t until just months ago that I was able to answer a question I’ve had this whole time: “How do I share my project with someone?”
When I say “project,” I’m not talking about a single R script or a handful of bash commands; even 22-year-old me could figure out copy and paste! I mean a project that has several files, perhaps in multiple languages, with external dependencies. Do I just throw it all into a zip folder? How do I deal with new versions of languages and packages…
From ancient government, library, and medical records to present-day video and IoT streams, we have always needed ways to efficiently store and retrieve data. Yesterday’s filing cabinets have become today’s computer databases, with two major paradigms for how best to organize data: the relational (SQL) and non-relational (NoSQL) approaches.
Databases are essential for any organization, so it’s worth wrapping your head around where each type is useful. We’ll start with a brief primer on the history and theory behind SQL and NoSQL. But memorizing abstract facts can only get you so far, so we’ll then actually create each type…
Welcome to the final post in our spam-catching saga! In the first post, we covered the theory behind building a model to catch spam. In the last post, we built out the backend for our app by creating the spam classifier and a small Flask app to serve the model. We ended by creating an API and enabling our model to be invoked from Python scripts anywhere on our computer.
In this post, we’ll take it a step further by designing a nice frontend so you can interact with the model outside of Python. …
My first programming language was R. I fell in love with the nuance R granted for visualizing data, and how with a little practice it was straightforward to pull off complex statistical analyses. I coded in R throughout my Ph.D., but I needed to switch to Python for my first non-academic job. The transition was… bumpy, to say the least.
This post outlines some of the major differences between R and Python, as well as why those differences exist.
No matter how talented you are at crunching numbers and writing code, your effectiveness as a data scientist is limited if you chase questions that don’t actually help your company, or you can’t get anyone to incorporate the results of your analyses. Similarly, how do you stay motivated and relevant in a field that’s constantly evolving?
In this post, we’ll outline the business and personal skills needed to translate your technical skills into impact.
Welcome back! In the last post, we covered the theory for why we use NLP and machine learning for spam classification. In this post, we’ll actually build such a classifier, and we’ll also create a Flask micro web service to make it much easier to interact with our model. In the next post, we’ll connect a sleek frontend to our Flask app so you can use the model without needing to know Python.
SpamCatch is a fun side project I did to bring together natural language processing, Flask, and frontend development. Classifying spam text messages is a classic machine learning problem, but I’d never seen people test their classifier on raw strings of text. I’d also never seen a spam classifier hooked up to a nice user interface, where people could use the classifier without needing to know Python or Git.
“I could always do data science if academia doesn’t work out.” It’s a recurring thought many graduate students and postdocs experience, especially if their work involves hearty servings of programming and statistics, the core elements of data science.
Data science can be a rewarding alternative to academia, and academics do have many qualities that make them attractive candidates for data science roles. However, there are also often large holes in academics’ skill sets that can deter employers from hiring them right off the bat.