This is a very basic example of using Luigi to build a task pipeline.
It is incredibly easy to write a script to process some data in Python. But if you have a lot of tasks that depend on each other, and you need to create a robust workflow, then thinking in terms of a data pipeline is useful.
Luigi is a framework for building data pipelines and managing workflows. The onus of setting up each unit of work is on you as the developer. Luigi takes care of resolving dependencies, managing the overall workflow, and, most importantly, handling failures. As a bonus, Luigi provides a rather nice visualization tool and a command-line interface.
In Luigi, a data pipeline is built by defining Task classes. For every Task, you declare its dependencies by implementing the requires method. Every Task can also define an output method to specify the Target where its results should go. Let's look at a simple example to get our feet wet, and gradually build up to more complex cases.
This example is rather self-explanatory. I use the MockFile class as the Target just so that I can print to the console. One can instead use luigi.LocalTarget(filename) to use the file system as the target. The main_task_cls argument specifies SimpleTask as the task to run. The actual processing is encapsulated in the run method of the SimpleTask class.
When the script is executed, you should see output that looks like this:
DEBUG: Checking if SimpleTask() is complete
INFO: Scheduled SimpleTask() (PENDING)
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 30338] Worker Worker(salt=329921834, host=G-ubuntu, username=myuser, pid=30338) running SimpleTask()
SimpleTask: Hello World!
INFO: [pid 30338] Worker Worker(salt=329921834, host=G-ubuntu, username=myuser, pid=30338) done SimpleTask()
DEBUG: 1 running tasks, waiting for next task to finish
DEBUG: Asking scheduler for work...
INFO: Done
INFO: There are no more tasks to run at this time
INFO: Worker Worker(salt=329921834, host=G-ubuntu, username=myuser, pid=30338) was stopped. Shutting down Keep-Alive thread
There you go! This was a really basic example, but it should give you some sense of how to build Luigi-based task pipelines.