

Vending Machine – Data Analysis – From study to action and how to improve performance

Data analysis of a vending machine can be very helpful: the information obtained by transforming and visualizing the data can improve logistics, prevent losses, and boost performance.

A vending machine is one of those machines installed in shopping malls, offices, and stores. It can sell any item stored inside it. Each item is stored in a coil and can be bought at a fixed price.

Newer models can collect useful data in CSV format. This data can then be manipulated to yield a lot of information, such as customer profiles, spending, and preferences, and to discover correlations between two or more products that are sold together.

This study collects data from a single vending machine and analyses it, searching for correlations between the items sold.

The data consists of a single file with 6445 rows and 16 columns. Each row corresponds to a single transaction, recorded between January and August. The most important columns for this study are:

  • Name
  • DateofSale: day, month, day number, year
  • Type of Food: Carbonated, Non-Carbonated, Food, Water
  • Type of Payment: credit card, cash
  • RCoil: coil number of the product
  • RPrice: price of the product in the coil
  • QtySold: quantity sold
  • TransTotal: total amount of the transaction. Normally one item is sold and paid for per transaction, but it can happen that more than one item is sold

Preprocessing

The data is loaded and the unnecessary fields are removed from the raw data.
After cleaning and transforming the data, the following table shows the entire dataset, which consists of 6445 rows and 10 columns.
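
The loading step could be sketched roughly as follows, using only Python's standard library. This is an illustration, not the code used in the study: the header names in `KEEP`, the `Machine` column, and the sample rows are all assumptions, since the real header of the CSV export is not shown here.

```python
import csv
import io

# Hypothetical header names modelled on the fields listed above;
# the real export may spell them differently.
KEEP = ["Name", "DateofSale", "TypeOfFood", "TypeOfPayment",
        "RCoil", "RPrice", "QtySold", "TransTotal"]


def load_sales(source, keep=KEEP):
    """Read CSV rows from a file-like object, keeping only the `keep` columns."""
    rows = []
    for raw in csv.DictReader(source):
        row = {col: raw[col].strip() for col in keep}
        # Convert numeric fields from text so they can be summed later.
        row["RCoil"] = int(row["RCoil"])
        row["QtySold"] = int(row["QtySold"])
        row["RPrice"] = float(row["RPrice"])
        row["TransTotal"] = float(row["TransTotal"])
        rows.append(row)
    return rows


# Invented sample standing in for the real file; "Machine" is an
# unnecessary column that gets dropped.
SAMPLE = """Name,DateofSale,TypeOfFood,TypeOfPayment,RCoil,RPrice,QtySold,TransTotal,Machine
Cola,"Tuesday, January 1, 2019",Carbonated,cash,110,1.50,1,1.50,M1
Chips,"Tuesday, January 1, 2019",Food,credit card,123,2.00,2,4.00,M1
"""

sales = load_sales(io.StringIO(SAMPLE))
print(len(sales), sorted(sales[0]))
```

For a real file, the same function can be given an open file object instead of the in-memory sample, for example one opened with `open(path, newline="")`.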

The first step is a preliminary calculation to see which categories are present in the dataset.

We can see that the two most important categories are food and carbonated drinks, which together account for 78% of the total transactions over the 8 months of sampling. The following sections go deeper into the analysis.
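
A category breakdown of this kind can be computed by counting transactions per category and dividing by the total. The sketch below assumes one category label per transaction; the sample list is invented and does not reproduce the study's figures.

```python
from collections import Counter


def category_shares(categories):
    """Return each category's share of all transactions, in percent."""
    counts = Counter(categories)
    total = sum(counts.values())
    # most_common() orders the categories from most to least frequent.
    return {cat: 100 * n / total for cat, n in counts.most_common()}


# Invented sample: one category label per transaction.
sample = ["Food", "Food", "Carbonated", "Water",
          "Food", "Carbonated", "Non-Carbonated", "Food"]

shares = category_shares(sample)
top_two = sum(list(shares.values())[:2])
print(shares, top_two)
```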

Carbonated

The following table shows the Carbonated products and the quantity sold, sorted from highest to lowest:

The top 5 positions, corresponding to 37% of the types of carbonated drink, sold 1431 units.
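
A ranking like this one can be built by summing the quantity sold per product within a category, sorting in descending order, and taking the first few entries. The sketch below is one possible way to do it; the dictionary keys (`name`, `category`, `qty`) and the product names are hypothetical.

```python
from collections import defaultdict


def top_products(rows, category, n=5):
    """Rank products of one category by quantity sold.

    Returns the top `n` (name, quantity) pairs and their share, in
    percent, of the total quantity sold in that category.
    """
    totals = defaultdict(int)
    for row in rows:
        if row["category"] == category:
            totals[row["name"]] += row["qty"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:n]
    grand_total = sum(totals.values())
    share = 100 * sum(q for _, q in top) / grand_total if grand_total else 0.0
    return top, share


# Invented sample rows; the names are placeholders, not products from the study.
rows = [
    {"name": "Cola", "category": "Carbonated", "qty": 3},
    {"name": "Lemon Soda", "category": "Carbonated", "qty": 4},
    {"name": "Cola", "category": "Carbonated", "qty": 2},
    {"name": "Orange Soda", "category": "Carbonated", "qty": 1},
    {"name": "Chips", "category": "Food", "qty": 7},
]

top, share = top_products(rows, "Carbonated", n=2)
print(top, share)
```

The same function applies unchanged to the Food category by passing a different `category` value.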

Food

The following table shows the Food products and the quantity sold, sorted from highest to lowest:

In the case of food, the top 5 positions cover only 23% of the total quantity sold. In addition, the number of categories/brands is 7 times larger than for carbonated drinks. This spreads the sales out, because the customer has more types to choose from. The short sections above show data extracted from the main dataset that is useful as an indication of trending products. This information involves no statistical inference; it is merely data that has been extracted, loaded, and transformed (ELT).

Monthly sales
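
Monthly totals can be obtained by parsing each sale date and summing the transaction amounts per month. The sketch below assumes the date is stored as text such as "Tuesday, January 1, 2019" (day, month, day number, year); that exact layout, the year, and the sample amounts are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Assumed date layout: weekday name, month name, day number, year.
DATE_FORMAT = "%A, %B %d, %Y"


def monthly_totals(rows):
    """Sum transaction amounts per (year, month), in chronological order."""
    totals = defaultdict(float)
    for date_text, amount in rows:
        day = datetime.strptime(date_text, DATE_FORMAT)
        totals[(day.year, day.month)] += amount
    return dict(sorted(totals.items()))


# Invented sample: (DateofSale, TransTotal) pairs.
sample = [
    ("Tuesday, January 1, 2019", 1.5),
    ("Friday, January 18, 2019", 2.0),
    ("Monday, February 4, 2019", 1.25),
]

by_month = monthly_totals(sample)
print(by_month)
```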

If you want to see the full study and find out whether there is a correlation between the sale of a carbonated drink and the sale of food, you can find it below.
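
As a rough illustration of what such a check might involve, the sketch below computes the Pearson correlation between the daily quantities sold in two categories. This is not the method of the full study, which is not reproduced here; the choice of daily totals, the tuple layout, and the sample rows are all assumptions.

```python
from collections import defaultdict
from math import sqrt


def daily_quantities(rows, category):
    """Sum the quantity sold per day for one category."""
    totals = defaultdict(int)
    for day, cat, qty in rows:
        if cat == category:
            totals[day] += qty
    return totals


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented sample: (day, category, quantity sold).
rows = [
    ("2019-01-01", "Carbonated", 2), ("2019-01-01", "Food", 1),
    ("2019-01-02", "Carbonated", 4), ("2019-01-02", "Food", 2),
    ("2019-01-03", "Carbonated", 6), ("2019-01-03", "Food", 3),
    ("2019-01-04", "Carbonated", 8), ("2019-01-04", "Food", 4),
    ("2019-01-04", "Water", 1),
]

carbonated = daily_quantities(rows, "Carbonated")
food = daily_quantities(rows, "Food")
# Align both series on the same days; a day with no sales counts as 0.
days = sorted(set(carbonated) | set(food))
r = pearson([carbonated[d] for d in days], [food[d] for d in days])
print(round(r, 3))
```

A value of `r` close to 1 would suggest that the two categories tend to sell well on the same days, while a value close to 0 would suggest no linear relationship. A correlation of daily totals does not by itself show that the two products are bought together by the same customer.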

What Is the Difference Between a Data Scientist and a Data Engineer?

Data scientist and data engineer are both essential roles in the field of data analytics, but they have distinct responsibilities. According to Max Shron in “Thinking with Data: How to Turn Information into Insights,” “data science is more like a research project, while data engineering is more like a development project.” This means that while data scientists focus on analyzing data to extract insights and make predictions, data engineers are responsible for designing and maintaining the systems that enable data scientists to work with the data.

Andreas Müller and Sarah Guido echo this sentiment in “Introduction to Machine Learning with Python: A Guide for Data Scientists,” stating that “data scientists are concerned with asking the right questions and finding meaningful insights from data. Data engineers are responsible for designing and maintaining the systems that enable data scientists to work with the data.” DJ Patil and Hilary Mason similarly note in “Data Driven: Creating a Data Culture” that “data engineering involves building the infrastructure to support data science, while data science involves using that infrastructure to extract insights from data.”

Joel Grus adds in “Data Science from Scratch: First Principles with Python” that “data engineering involves building the infrastructure to support data science, while data science involves using that infrastructure to extract insights from data.” Finally, Martin Kleppmann sums it up in “Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems” by saying that “data science is about making sense of data, while data engineering is about making data make sense.”

In summary, data scientists focus on extracting insights from data, while data engineers focus on building the infrastructure to store and process that data. Although there may be some overlap between the roles, they have distinct responsibilities and focus on different aspects of working with data. Both roles are crucial in modern data-driven organizations, and they often work closely together to achieve common goals.