Accelerating Python functions with Numba

In this post, I will provide a brief overview of Numba, an open-source just-in-time function compiler that can speed up subsets of your Python code easily and with minimal intervention. Unlike other popular tools for accelerating Python (e.g. Cython, PyPy), Numba simply requires the addition of a function decorator, with the promise of approaching the speed of…
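As a rough illustration of the "just add a decorator" idea, here is a minimal sketch using Numba's `njit` decorator on a plain Python loop. The function name `sum_of_roots` and the fallback no-op decorator (used only if Numba is not installed) are my own assumptions for the sake of a self-contained example, not code from the post itself.

```python
import math

try:
    # Numba's decorator for compiling a function in nopython mode.
    from numba import njit
except ImportError:
    # Assumption for portability: without Numba installed, fall back to a
    # no-op decorator so the function still runs as ordinary Python.
    def njit(func):
        return func


@njit
def sum_of_roots(n):
    # Hypothetical example function: sums sqrt(i) for i in 0..n-1
    # with an explicit loop, the kind of code Numba is designed to compile.
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)
    return total


if __name__ == "__main__":
    # The first call triggers compilation (when Numba is available);
    # later calls reuse the compiled version.
    print(sum_of_roots(1_000_000))
```

The function body is unchanged Python; only the decorator line differs from an uncompiled version. Any actual speed-up depends on the workload and is not shown here.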

Harvesting the metadata of 1.5 million arXiv papers

arXiv is the world's leading online repository of scientific research in physics, mathematics, computer science and related fields. It enables scientists to share manuscripts of their work openly, easily and quickly, some of which are never published elsewhere. The metadata of the ~1.5 million manuscripts that it hosts forms an ideal dataset for many NLP and data analysis…

A beginner’s guide to running Jupyter Notebook on Amazon EC2

As a beginner in large-scale data manipulation, I quickly found the computational needs of my projects exceeding the capabilities of my personal equipment. I was therefore amazed by Amazon’s EC2 offering: renting virtual computers on which applications can be run remotely from a local machine, and for free! What I was subsequently more amazed by,…