Hello, my name is Preetwinder, and I am a 2nd year IT student in India. I usually program in Python, but with some practice I can find my way around in Java, C++ and Haskell. I have a great deal of interest in Information Retrieval, Programming Languages and Distributed Systems.
I have been selected to work on Python 3 support for frontera under the Google Summer of Code program, in which Scrapinghub is participating as a mentoring organization. My mentors will be Paul Tremberth (main mentor), Alexander Sibiryakov and Mikhail Korobov.
I am very excited to be a part of this program and the frontera community. I hope to make a useful contribution to frontera.
Frontera is a distributed web crawling framework which, when coupled with a fetcher (such as Scrapy), lets us store URLs and prioritize the order in which they are crawled in a scalable manner. You can read about frontera in greater detail here.
The past few weeks have been the community bonding phase of the program. During this time the candidates are supposed to get familiar with their mentors and the codebase of their organizations. I have used it to prepare a better timeline, discuss the changes to be made with my mentors, and improve my understanding of how frontera works. I have split my task into two phases. In the first phase I will focus on improving tests and bringing Python 3 support to the single process mode. In the second phase (post mid-term evaluation) I will continue improving tests and extend Python 3 support to the distributed mode. The major challenges in this project will be testing some components which are a bit tricky to test, and getting unicode/bytes handling to work correctly.
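To give a rough idea of what the unicode/bytes challenge looks like, here is a minimal sketch. In Python 3, `str` (text) and `bytes` are separate types that cannot be mixed implicitly, so code that passes URLs around has to decide at each boundary which of the two it expects. The helper names `to_bytes` and `to_unicode` are hypothetical and used only for illustration; they are not taken from frontera's codebase, and UTF-8 as the default encoding is an assumption.

```python
def to_bytes(text, encoding="utf-8"):
    """Return `text` as bytes, encoding it if it is a str.

    Hypothetical helper for illustration, not part of frontera.
    """
    if isinstance(text, bytes):
        return text
    if isinstance(text, str):
        return text.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(text).__name__)


def to_unicode(data, encoding="utf-8"):
    """Return `data` as str, decoding it if it is bytes.

    Hypothetical helper for illustration, not part of frontera.
    """
    if isinstance(data, str):
        return data
    if isinstance(data, bytes):
        return data.decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(data).__name__)


url = "http://example.com/caf\u00e9"
raw = to_bytes(url)        # bytes, suitable for storage or sending over the wire
restored = to_unicode(raw)  # back to text for use inside the program
```

Normalizing at the edges like this keeps the rest of the code working with a single type, which is usually easier to test than handling both types everywhere.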
I hope to successfully port frontera and have a productive summer.