What makes a successful data scientist?
At the recent O’Reilly AI conference in London I got the chance to speak with Chris Hillman, Principal Data Scientist at Teradata, a company that delivers real-time answers and leverages 100% of significant data. This is done on-premises, in the cloud and anywhere in between.
Hillman has been working in analytics for around 25 years, and when he joined Teradata six years ago he officially changed his title to Data Scientist. He began by explaining: “Traditionally Teradata is an enterprise data warehousing company, so their product is a massively parallel database that has been around since the 70s when the terabyte was a big thing. But recently we have re-branded and now everything is cloud-based and available in AWS, Azure and more. Now it is more than just a database and is more of an analytics platform.”
Companies with big datasets don’t want to keep moving data out to run analytics and then putting it back in, so now it all stays in one big database.
Hillman explained that when it comes to AI projects failing, there are two types of failure.
Fail-fast methodology: A failure that actually ends up being positive. Sometimes, when failure occurs, it can move you on in another direction. “A lot of people don’t record failures, but then it can just happen again. If you record, you have the method in hardcopy of what not to do,” Hillman said, explaining that it is really important to document this. “Academics don’t publish the work that isn’t a success, and that shouldn’t always be the case.”
He added that nobody wants to admit they have spent all this time and money for something that doesn't work, but it is still valid and helpful for people to see what doesn't work.
Real failure: This is the type of failure you don’t want. Hillman said: “A lot of people won’t agree, but in my opinion failure is any project that doesn’t produce lasting value. This is a failure and a waste of everyone’s time.” He added that if you are an academic or work in research you may not agree with him on that one, but when it comes to big companies that is the definition Teradata works with.
Hillman explained that failure can occur for many reasons, but avoiding it comes down to three requirements:
- A valid business case (ROI)
- A valid hypothesis to test
- A route to production, with all teams working together (good communication)
Security is a big issue that everyone needs to address, but especially those working with data, so I asked Hillman how Teradata deals with it. “The banks we work with always have a twin system, that people don't have access to.”
Talking about security as a whole, Hillman commented: “There is an issue with integrated data, so you can take a record of transaction data and anonymise it and we wouldn't know who that person was, but if you add in factors such as time and location and other metadata around it, you can narrow it down. There is much more danger there so it is essential to have a strict system down before you even start a project.”
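The narrowing-down Hillman describes can be illustrated with a minimal Python sketch. It is not Teradata's method; the records, field names and helper functions below are all hypothetical, made up purely to show how adding time and location to anonymised transactions can single out individuals.

```python
from collections import Counter

# Hypothetical anonymised transactions: names are removed and only
# an arbitrary record number remains. All values are invented.
transactions = [
    {"record_id": 1, "amount": 12.50, "hour": 8, "postcode": "N1"},
    {"record_id": 2, "amount": 12.50, "hour": 23, "postcode": "SW9"},
    {"record_id": 3, "amount": 40.00, "hour": 8, "postcode": "N1"},
    {"record_id": 4, "amount": 40.00, "hour": 8, "postcode": "N1"},
]


def group_sizes(records, fields):
    """Count how many records share each combination of values in `fields`."""
    return Counter(tuple(r[f] for f in fields) for r in records)


def smallest_group(records, fields):
    """Size of the smallest group of records that look identical on `fields`."""
    return min(group_sizes(records, fields).values())


def unique_records(records, fields):
    """Records that are the only one with their combination of `fields`."""
    sizes = group_sizes(records, fields)
    return [r for r in records if sizes[tuple(r[f] for f in fields)] == 1]


# On amount alone, every record hides in a group of at least two.
print(smallest_group(transactions, ["amount"]))

# Adding time and location leaves some records in a group of one.
combined = ["amount", "hour", "postcode"]
print(smallest_group(transactions, combined))
print([r["record_id"] for r in unique_records(transactions, combined)])
```

A group of one means that anyone who knows roughly when and where a purchase happened could pick out that record, even though no name is attached to it.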
This is normally done by security experts. Teradata, for example, has a team of people who used to work in that industry and now cover all security issues. Hillman said: “There has always been this idea of a Data Scientist that knew everything and could do everything, but that is definitely not the case, you need a team around you.”
For Hillman, a successful data scientist needs adaptability. He said: “I have always pushed for data scientist to become a proper professional like an accountant or a lawyer, so to be able to do this we really need to be able to define what a data scientist does.”
He explained that adaptability is key because you need to be able to adapt to change and to the times: six years ago, things were very different from what they are now. “To be a good data scientist you need to have a curiosity to be looking for new stuff, and have a research-based mind in which you never stop learning.”
Teradata works across a number of industries globally, but traditionally the companies it works with mostly include banks, telcos and retail companies. Hillman said: “We find some of the regions we go to are desperate to hear advice on how they can implement machine learning and analytics.”
But data science can be used across all industries, so Teradata also works in gas, manufacturing and life sciences, and there is huge demand from people who work in IoT and smart cities, as they have the data that warrants the big parallel twin world.
Looking back at his time in the industry Hillman said: “We have seen a lot of changes, but one of the biggest has to be the volume of data increase, and the types of data there is.” He continued by explaining there has been a big shift to self-service in the data science world.
When asked what he thought the next big thing was, and how he felt about automation taking over, Hillman explained: “I think the next big thing is AI, as that is everything we do. But I don’t think people need to be worried about automation taking over jobs. I see a lot of mundane boring data science tasks being automated, jobs that would take us so much longer and would be a complete waste of time - now systems can do it, it’s so much better!”
In the long term, AI can be put to positive use by taking on long-winded jobs and doing them in more beneficial ways. Hillman added: “But we still need that creativity from humans, I have always loved technology and always thought that mostly it has been a good thing.”
Hillman definitely seemed optimistic about the whole topic. “I don't really have any reservations on it really, as historically technology and automation have always provided more jobs than it has taken away.”
To conclude Hillman said: “If we should have any reservations it should be with cyber security, as nothing has ever been made that can’t be hacked.”