Top Skills Needed to Work as a Data Scientist in iGaming

Data science jobs are at the peak of their popularity. In 2007, the volume of digital data exceeded the volume of analog data by almost 15 times: 280 exabytes of digital data against only 19 exabytes of analog data. Today, more than 90% of all information is digital. With the advent of huge data sets, known as Big Data, there is a growing need for methods and models that can process large amounts of information quickly and efficiently. The so-called “three V’s” are traditionally used as the defining characteristics of big data:

  • Volume – the physical size of the data,

  • Velocity – the growth rate of the data and the need for high-speed processing and results,

  • Variety – the ability to process different types of structured and semi-structured data simultaneously.

So here is a brief list of a data scientist’s functions: processing and systematizing data, revealing hidden relationships and patterns, structuring, visualization and more.

At the moment I’m working on the casino-now.co.uk project as an external specialist, and I would like to share my experience with first-time analysts and developers to give them the smoothest path to becoming a competent data scientist. To that end, I’ve prepared a mind map, which helped a lot in growth hacking the gambling portal.

Here is an explanation of the mind map, along with a relative grading of the knowledge levels that will help you stay competitive in data science for gambling projects:

  • hard skill – high level of difficulty.

  • middle skill – average level of complexity.

  • soft skill – fairly easy to acquire.

1. Technical skills

  • Python or R (middle skill) – knowledge of specialized programming languages. In this article, I will not dwell on the advantages of these particular languages, as this information is publicly available and well documented. I will only say that almost any customer idea can be implemented with the help of existing libraries and a minimal number of lines of code: writing your own parser, generating unique images automatically and more.

  • SQL (middle skill) – knowledge and advanced use of a database system. You should be able to store data, run simple sorts and merge (join) data tables. A solid understanding of SQL will open up wide possibilities and greatly simplify your everyday work.

  • Data science frameworks: TensorFlow (hard skill), Keras (middle skill), Apache Spark (hard skill). This knowledge is a priority for many; your task as a data researcher is not to reinvent the wheel, but to use powerful ready-made tools to implement ideas. More information about each framework is easy to find on the web.

  • GPU technology (middle skill) – the ability to organize fast parallel computation over huge data arrays. For highly parallel workloads, such as the matrix operations behind neural networks, graphics processors can be many times faster than CPUs. That makes them one of the most important tools for a data science developer: they allow you to train neural networks on large amounts of data, which is an integral part of data science. Thanks to computation on graphics cards, NLP projects have become feasible and the field is actively developing. Evaluate the computing capacity of the hardware you plan to work with realistically, and don’t try to save on RAM and the graphics card when buying hardware.

  • API functions (soft skill) – to collect and process information, you need to use third-party resources. For instance, in my field, gambling, I often deal with services such as Majestic and Ahrefs. Today APIs are provided by most major information portals and social networks: Twitter, Facebook, IMDB and others. Each service’s API is unique, but your understanding and speed grow rapidly with practice. Don’t be afraid to work with APIs, as they maximize the speed of data mining. Also pay attention to POST and GET requests with JSON: they will simplify your life!
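
As a minimal sketch of how Python, SQL and JSON data from an API fit together, the example below parses a JSON payload, stores it in an in-memory SQLite database, and then sorts and joins two tables. The payload, domain names, table names and numbers are all hypothetical and are not taken from Majestic, Ahrefs or any other real service. A real API would also require an HTTP request and authentication, which are omitted here.

```python
import json
import sqlite3

# Hypothetical JSON payload, shaped like a response an API might return.
# Field names and values are illustrative only.
payload = """
[
  {"domain": "casino-a.example", "backlinks": 1200},
  {"domain": "casino-b.example", "backlinks": 450},
  {"domain": "casino-c.example", "backlinks": 3100}
]
"""

# Hypothetical second data set: monthly visits per domain.
visits = [("casino-a.example", 80000), ("casino-c.example", 150000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (domain TEXT PRIMARY KEY, backlinks INTEGER)")
conn.execute("CREATE TABLE traffic (domain TEXT PRIMARY KEY, visits INTEGER)")

# Store: insert the parsed JSON records and the traffic rows.
records = json.loads(payload)
conn.executemany(
    "INSERT INTO links (domain, backlinks) VALUES (:domain, :backlinks)",
    records,
)
conn.executemany("INSERT INTO traffic (domain, visits) VALUES (?, ?)", visits)

# Merge and sort: join both tables and order by backlink count.
rows = conn.execute(
    """
    SELECT l.domain, l.backlinks, t.visits
    FROM links AS l
    LEFT JOIN traffic AS t ON t.domain = l.domain
    ORDER BY l.backlinks DESC
    """
).fetchall()

for row in rows:
    print(row)

conn.close()
```

The LEFT JOIN keeps domains that have no traffic data (their visits come back as None), which is usually what you want when merging data from several sources that do not cover exactly the same sites.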

2. Education

This unit is one of the most important. Not every successful modern specialist has a strong academic background or the opportunity to get a high-quality education at a leading world university. However, where there’s a will, there’s a way!

Free theoretical background with practical basis. This part includes modern educational portals such as Coursera, Udemy, Lynda and many others. Look there for courses that build a foundation for mastering the key frameworks a data science specialist works with.

Practical skills improvement. If you are still reading this article, then I think projects such as Kaggle, DrivenData, CrowdANALYTIX, crowdAI, Topcoder and TunedIT need no introduction. However, I will still say a few words. At a certain stage of my professional growth, such competitions were my main opportunity to hone my skills on large volumes of real data. Lack of practice with real data is exactly the problem most data science researchers face. Even now, working on large projects in the highly competitive gambling niche, Kaggle and DrivenData remain a kind of support for me. Often, while looking through the top solutions to a problem, I get an idea of how to modernize and adapt someone else’s solution to my own case.

3. Knowledge of the Subject Area

  • Understand the business / online casino / gambling (middle skill). Advanced knowledge of the subject area is the key to success: only if you know your way around it will you be able to set useful tasks for your employer’s business. You have to understand the types of casinos, know what RTP (return to player) is, know the needs of customers, and know the variety of brands and the products they provide.

  • Communication skills (hard/middle/soft skill). Perhaps the only skill whose mastery is difficult to assess, but you should not underestimate its importance. The most common mistake of data scientists is burying themselves in mathematical methods, models, strategies and frameworks, when in practice a set of the simplest methods and models delivered as a solid MVP often works fine. More hardcore methods can be added afterwards if they are needed.

  • Teamwork (middle skill). An important skill of any data scientist is the ability, desire and potential to work in a team. Only by enriching yourself with ideas and listening to the opinions of your teammates can you arrive at a truly clear, elegant and, most importantly, efficient solution that will be useful and may bring profit.
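
To make the RTP idea from the first point above concrete, here is a minimal Python sketch under stated assumptions. RTP (return to player) is the share of wagered money that a game pays back to players over the long run, and the house edge is what remains. The paytable and the wager totals below are invented for illustration and do not describe any real slot or casino.

```python
# Hypothetical paytable for a simple game: (probability, payout multiplier).
# The numbers are invented for illustration only.
paytable = [
    (0.50, 0.0),  # stake is lost
    (0.30, 1.0),  # stake is returned
    (0.15, 2.0),  # stake is doubled
    (0.05, 7.0),  # rare large win
]


def theoretical_rtp(table):
    """Expected payout per unit staked: sum of probability * payout."""
    total_probability = sum(p for p, _ in table)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * payout for p, payout in table)


def observed_rtp(total_wagered, total_paid_out):
    """Empirical RTP computed from aggregated bet logs."""
    if total_wagered <= 0:
        raise ValueError("total_wagered must be positive")
    return total_paid_out / total_wagered


rtp = theoretical_rtp(paytable)
house_edge = 1.0 - rtp
print(f"Theoretical RTP: {rtp:.2%}, house edge: {house_edge:.2%}")
print(f"Observed RTP: {observed_rtp(10_000, 9_480):.2%}")
```

Comparing the theoretical value with the observed one over a large number of bets is a simple sanity check: a large gap suggests either too little data or a problem in how the bets were logged.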

4. Mathematical tools: methods and models

To begin with, all tasks can be divided into four large categories: data clustering, regression (modeling dependencies), visualization and analysis. Each block of knowledge needs its own set of methods and models. However, at present, I advise you to turn your attention to machine learning (hard skill).

  • Basic models and descriptive statistics (soft skill)

  • Tree-based models (middle skill). They are an almost universal means of solving basic classes of problems. If you have mastered boosting over decision trees, solving problems with these methods becomes both fascinating and highly effective. For me, these models are of particular value in clustering problems, where I have an object-feature matrix and unlabeled data (no “teacher”), and need to form groups of similar objects. Decision trees are fairly robust to outliers, and with limited depth or in ensembles they are also harder to overfit than, for example, neural networks; validation always lets you check the quality of the tree you have built. In addition, the rules formed during training (the branches of the tree) are fairly easy to interpret and provide a qualitatively new understanding of the underlying processes.
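
The points about validation and interpretable rules can be illustrated with the smallest possible tree, a one-split “decision stump”, written in plain Python. This is a sketch, not boosting and not a production model: the player features, the labels and the single mislabeled outlier are hypothetical, and in practice you would more likely use a library implementation of decision trees.

```python
def predict(stump, row):
    """Apply a one-split rule: rows at or below the threshold get left_label."""
    feature, threshold, left_label = stump
    return left_label if row[feature] <= threshold else 1 - left_label


def fit_stump(rows, labels):
    """Return the (feature, threshold, left_label) split with the fewest training errors."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            for left_label in (0, 1):
                stump = (feature, threshold, left_label)
                errors = sum(predict(stump, r) != y for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, stump)
    return best[1]


def accuracy(stump, rows, labels):
    """Share of rows for which the stump predicts the given label."""
    return sum(predict(stump, r) == y for r, y in zip(rows, labels)) / len(rows)


# Hypothetical player data: (sessions_per_week, average_stake).
rows = [(sessions, stake) for sessions in range(11) for stake in (1, 5, 20, 50)]
# Illustrative ground truth: more than 5 sessions per week means "active" (1).
labels = [1 if sessions > 5 else 0 for sessions, _ in rows]
labels[rows.index((2, 50))] = 1  # one deliberately mislabeled outlier

# Deterministic hold-out split: every third row is kept for validation.
pairs = list(zip(rows, labels))
train = [pair for i, pair in enumerate(pairs) if i % 3 != 0]
valid = [pair for i, pair in enumerate(pairs) if i % 3 == 0]
train_rows, train_labels = zip(*train)
valid_rows, valid_labels = zip(*valid)

stump = fit_stump(train_rows, train_labels)
feature, threshold, left_label = stump
print(f"Rule: feature[{feature}] <= {threshold} -> class {left_label}, "
      f"otherwise class {1 - left_label}")
print(f"Validation accuracy: {accuracy(stump, valid_rows, valid_labels):.2f}")
```

The learned rule reads directly as a business statement about session counts, and the single outlier in the training data does not move the split, which is the kind of robustness and interpretability described above.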

Deep learning (hard skill)

  i. The Boltzmann machine (soft skill)

  ii. Convolutional neural networks (CNN) (middle skill)

  iii. Recurrent neural networks (RNN) (hard skill), which use feedback connections to model processes that unfold in time

  iv. Recursive neural networks (hard skill), which apply the same set of weights recursively over structured input such as parse trees

The modern diversity of neural networks allows us to solve a huge pool of highly complex problems. In gambling, natural language processing is a particularly promising direction: from the popular example of evaluating the sentiment of reviews for casino brands or individual slots, to semantic analysis and clustering of potential customers’ search queries in order to produce a content-writing brief that is as customer-oriented as possible and fully optimized for those queries. The scope of data science knowledge and skills is truly limitless!

In my opinion, the key quality of a data scientist is curiosity! It drives the continuous development of professional skills and makes us spend hours in front of lines of code and numbers in order to find previously hidden patterns, which in turn makes it possible to implement projects whose quality and quantity are limited only by your imagination. However, every data scientist should remember:

With great power there must also come great responsibility

After all, a correctly developed and interpreted model can lead to a big gain for the employer, but a single false regression or spurious dependency can be fatal. Let me end with the words of a classic: Dare! I wish you good luck and success in your field! Don’t forget to share your feedback: tinaward@mail.uk or TinaWard