Data team harmony lies with open source
Harmony and analytics are two terms not often found together, especially when getting a team of diverse data professionals to execute data science chores - together. A data science team is often made up of people from diverse backgrounds, with diverse skillsets - from the machine learning specialist, to the master Python coder, to the beginning data analyst. To successfully build and execute any sized data science project requires harmony across all of the team members.
Everyone needs to work effectively and efficiently, using the tools they know best. What’s more, the growing deficit of Data Scientists, along with the closed nature of many analytics tools makes building effective teams even more difficult. That said, all is not lost.
There exists a vast ecosystem of open source tools that are available to the masses, which can help to level the playing field, and bring data analytics capabilities to professionals of all stripes. Yet, much like the cola wars of the 80’s, there is an almost infinite variety of flavors and formulas that drive tastes, at least when it comes to standardising analytics tool sets.
This is a conundrum that can only be solved by creating harmony among team members and their tools of choice. However, harmony means many things to many people, in the case of data science, harmony takes on the form of people being able to interact with their tools of choice, as well as having some mechanism to orchestrate those tools.
Naturally, orchestration and harmony cannot happen without a conductor, and in the world of data science, that conductor takes the form of a software platform that fuels interoperability, and tears down barriers.
Dataiku digitises that conductor with Dataiku Data Science Studio (DSS), a platform that embraces the ideologies of open source technologies, and bridges those technologies together to give teams choices, while promoting collaboration. Dataiku DSS connects to more than 25 different data storage systems, including closed source and open source databases, such as SQL Server, HDFS, NoSQL, and so forth.
Dataiku DSS also supports numerous programing languages (Python, R, Spark, etc) allowing data professionals to work with the programming tools of their choice and still have connectivity to the data shared by the team. Critical features such as team knowledge sharing, change management, and project monitoring further fuel collaboration, while eliminating silos of operation.
Dataiku DSS’s platform approach centralises open source elements, creating an environment where team knowledge is shared, and never lost when teams are reconfigured. What’s more, integrated to-do lists, document sharing, and unified logs make it easier to onboard new team members, as well as perform forensics on previous projects.
Simply put, open source tools may become the lifeblood of enterprise data science projects, but without proper orchestration, that life blood, as well as communal knowledge is sure to be lost over a short period of time.