In the attached file, there is a list of users’ chats in Divar’s personal items category. Read them and write a list of your suggestions for providing the service on the Divar chat in order of priority, noting the frequency of repetition (send the tagged file along with the task answer).
Input Dataset
The attached file is a CSV file with two columns: conversation and type. Here is a sample screenshot of it:
Dataset Sample
Let’s talk about the suggested solution.
Solution
First of all, I implemented a script that counts the frequency of each word in the chats. Running it (python3 main.py) produces a sorted CSV file (output.csv) that shows how often each word is repeated. To improve the quality of the output, stop words are excluded from the count. I took the list of stop words from the dataset at this address.
Here is the script:
# -*- coding: utf-8 -*-
import csv
from collections import Counter

######################################################################
FILE_PATH = 'dataset.csv'
STOP_WORDS_FILE = 'stop_words.txt'
OUTPUT_FILE_PATH = 'output.csv'
######################################################################


def get_stopwords(file_path):
    # One stop word per line; UTF-8 is set explicitly because the chats are not ASCII.
    with open(file_path, 'r', encoding='utf-8') as stopfile:
        stopwords = stopfile.read().splitlines()
    return stopwords


def count_word_frequency(file_path, stopwords):
    word_frequency = Counter()
    with open(file_path, 'r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            conversation = row['conversation']
            # Split on whitespace, lowercase, and drop stop words.
            words = conversation.split()
            words = [word.lower() for word in words if word.lower() not in stopwords]
            word_frequency.update(words)
    return word_frequency


def write_to_csv(output_file, word_freq):
    with open(output_file, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['Word', 'Frequency'])
        for word, frequency in word_freq.items():
            writer.writerow([word, frequency])


######################################################################
stop_words_list = get_stopwords(STOP_WORDS_FILE)
word_freq = count_word_frequency(FILE_PATH, stop_words_list)
# Most frequent words first.
sorted_word_freq = dict(sorted(word_freq.items(), key=lambda item: item[1], reverse=True))
######################################################################
write_to_csv(OUTPUT_FILE_PATH, sorted_word_freq)
print(f"Word frequency data has been written to {OUTPUT_FILE_PATH}")
The output CSV file should be something like this:
Output Sample
Suggested Services
The following table shows the suggested services based on important and frequent keywords:
| Recommended Service Title | Keywords (translated to English) |
| Payment (or earnest money) | Price, Discount, Toman, Money, Deposit, Card, Cost, Earnest Money, How Much?, Bill |
| Navigation | Address, In-Person, Where?, City, Town, House, Passage |
| Photo Selection (OS API) | Photo, Gallery |
| Contact the seller or buyer | Number, Contact, Phone |
| Audio to text conversion | Voice |
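One way to make this tagging explicit is a mapping from each service to its keywords; summing the word counts per service then gives one total per service. The following is only a minimal sketch: the function name total_frequency_per_service is hypothetical, only a few of the services are included, and the sample counts are made up for illustration rather than taken from the real dataset.

```python
from collections import Counter

# Service -> keywords (English translations, lowercased; only a subset is shown).
SERVICE_KEYWORDS = {
    "Payment (or earnest money)": ["price", "discount", "toman", "money", "deposit"],
    "Contact the seller or buyer": ["number", "contact", "phone"],
    "Photo Selection (OS API)": ["photo", "gallery"],
}


def total_frequency_per_service(word_freq, service_keywords):
    """Sum the counts of each service's keywords; words that never occur count as 0."""
    totals = {
        service: sum(word_freq.get(keyword, 0) for keyword in keywords)
        for service, keywords in service_keywords.items()
    }
    # Highest total first.
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Hypothetical counts, for illustration only.
sample_word_freq = Counter(
    {"price": 40, "discount": 10, "phone": 25, "number": 30, "photo": 20}
)
service_totals = total_frequency_per_service(sample_word_freq, SERVICE_KEYWORDS)
for service, total in service_totals.items():
    print(f"{service}: {total}")
```

In practice the word counts would come from the frequency CSV rather than from a hard-coded Counter.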
Prioritization
The previous part was just a list of services. According to the task, we should also prioritize them, which is a challenging matter. I always use a simple formula to prioritize: I select N criteria, score each of them from 1 to 10 (some of them need to be normalized first), and then take the weighted average. Even if the result is not perfectly accurate, at least there is some logic behind it. It’s better than gut feeling!

Currently, the only variable we have here is the word frequency in the chats. Other variables, such as business impact, technical effort, dependencies between services, and the approximate cost of operationalization, could also be considered. However, each of them requires the involvement of a high-ranking stakeholder in the organization, which we don’t have here, so we cannot use them. So let’s look at how often the keywords of each service are repeated. The implementation priority of our services will then be based on the total frequency.

The output of the previous code has a defect: it counts similar words separately. For example, it gives one count for the word “buy” and separate counts for “bought” and “buying”, whereas we should treat these as one word and add their frequencies together. So I implemented a second script. Running merger.py creates an output file named output_merged.csv that contains the merged counts. I also defined a configurable threshold that drops words repeated fewer than 20 times. To improve the quality of this kind of output in practice, data scientists normally use specialized libraries such as NLTK, which I was not familiar with and did not use. I admit that there are some errors in the outputs, but they can be ignored within the scope of this report!
Here is the merging script:
# -*- coding: utf-8 -*-
import csv

######################################################################
INPUT_FILE_PATH = 'output.csv'
OUTPUT_FILE_PATH = 'output_merged.csv'
FREQUENCY_THRESHOLD = 20
######################################################################


def merge_similar_words(input_file_path):
    word_frequency = {}
    with open(input_file_path, 'r', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            word = row['Word']
            frequency = int(row['Frequency'])
            found = False
            # Fold the word into the first existing key that contains it
            # or is contained in it (plain substring match).
            for key in word_frequency:
                if word in key or key in word:
                    word_frequency[key] += frequency
                    found = True
                    break
            if not found:
                word_frequency[word] = frequency
    return word_frequency


def write_to_csv(output_file, word_freq):
    with open(output_file, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['Word', 'Frequency'])
        for word, frequency in word_freq.items():
            writer.writerow([word, frequency])


######################################################################
merged_word_freq = merge_similar_words(INPUT_FILE_PATH)
# Keep only words that occur at least FREQUENCY_THRESHOLD times.
filtered_word_freq = {word: freq for word, freq in merged_word_freq.items() if freq >= FREQUENCY_THRESHOLD}
sorted_word_freq = dict(sorted(filtered_word_freq.items(), key=lambda item: item[1], reverse=True))
######################################################################
write_to_csv(OUTPUT_FILE_PATH, sorted_word_freq)
print(f"Merged and sorted word frequency data has been written to {OUTPUT_FILE_PATH}")
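To see what this substring rule does and does not merge, here is a small sketch that applies the same rule to an in-memory dictionary. The helper name merge_by_substring and the English sample words are assumptions made for illustration; the real input is the frequency CSV produced by the first script.

```python
def merge_by_substring(word_freq):
    """Apply the merger's rule to a dict: fold each word into the first
    existing key that contains it or is contained in it."""
    merged = {}
    for word, frequency in word_freq.items():
        for key in merged:
            if word in key or key in word:
                merged[key] += frequency
                break
        else:
            # No existing key matched, so the word starts its own entry.
            merged[word] = frequency
    return merged


# Hypothetical counts, ordered by frequency like the sorted input CSV.
sample = {"buy": 50, "card": 30, "buying": 12, "bought": 8, "discard": 5}
merged_sample = merge_by_substring(sample)
print(merged_sample)
# "buying" is folded into "buy", but "bought" stays separate because it does
# not contain "buy"; "discard" is wrongly folded into "card".
```

This shows both kinds of error the substring approach can produce: missed merges for irregular forms and false merges for unrelated words that happen to share letters.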
The table below lists the suggested services, sorted by the total frequency of the keywords tagged to each service:
| Recommended Service Title | Keywords (translated to English) | Total Frequency |
| Payment (or earnest money) | Price, Discount, Toman, Money, Deposit, Card, Cost, Earnest Money, How Much?, Bill | 693 |
| Product Delivery | Delivery, Address, Post, Transmission, Order, City, Home, Fare | 414 |
| Contact the seller or buyer | Number, Contact, Phone | 410 |
| Photo Selection (OS API) | Photo, Gallery | 320 |
| Navigation | Address, In-Person, Where?, City, Town, House, Passage | 255 |
| Audio to text conversion | Voice | 21 |
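The scoring approach described earlier in this section (score each criterion from 1 to 10, then take a weighted average) can be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, min-max scaling is just one possible way to normalize, and keyword frequency is the only criterion, with a weight of 1. More criteria (for example business impact) could be added as extra entries in scores and weights once stakeholders provide them.

```python
def normalize_to_scale(values, low=1.0, high=10.0):
    """Min-max scale raw values onto the range [low, high]."""
    smallest, largest = min(values.values()), max(values.values())
    if largest == smallest:
        # All values are equal, so every item gets the top score.
        return {name: high for name in values}
    span = largest - smallest
    return {
        name: low + (high - low) * (value - smallest) / span
        for name, value in values.items()
    }


def weighted_average(scores_by_criterion, weights):
    """Combine per-criterion scores (each on a 1-10 scale) into one score per service."""
    total_weight = sum(weights.values())
    services = next(iter(scores_by_criterion.values())).keys()
    return {
        service: sum(weights[c] * scores_by_criterion[c][service] for c in weights)
        / total_weight
        for service in services
    }


# Total keyword frequencies from the table above.
frequency_totals = {
    "Payment (or earnest money)": 693,
    "Product Delivery": 414,
    "Contact the seller or buyer": 410,
    "Photo Selection (OS API)": 320,
    "Navigation": 255,
    "Audio to text conversion": 21,
}

scores = {"frequency": normalize_to_scale(frequency_totals)}
weights = {"frequency": 1.0}  # the only criterion available in this report
priority = weighted_average(scores, weights)
ranked = sorted(priority.items(), key=lambda item: item[1], reverse=True)
for service, score in ranked:
    print(f"{service}: {score:.2f}")
```

With a single criterion the ranking is the same as sorting by total frequency; the formula only starts to change the order once further criteria are added.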
Evaluation
After we have implemented, functionally tested, and finally released one or more of these suggested services, we have to monitor them. We can do both quantitative and qualitative work to evaluate feature adoption:
Quantitative work: to know whether the released service is working properly, we can dig into the data. This applies both from a technical point of view (for example, checking Sentry logs or Crashlytics to see whether the system is stable) and from a product and business point of view. For example, we can do some CRO (conversion rate optimization) to see how much drop-off there is at each funnel step in the journey(s) under consideration and, for the steps whose drop-off has no obvious explanation, find out the reason (this has to be combined with the qualitative work described below). A survey is also a good tool for the users who reached the end of the funnel. Of course, there is a challenge: here we are dealing with external services, and we don’t have any data about their funnels. As a result, we need continuous interaction with them.
Qualitative work: when the quantitative work shows that a funnel is not working well, the qualitative work comes in. We have to take a sample of the users who got stuck at each funnel step and find out what pain point stopped them. Likewise, for users who were shown the options but did not use them, we have to ask what the reason was.
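As a rough illustration of the funnel analysis mentioned above, the sketch below computes the drop-off rate between consecutive funnel steps. The function name funnel_drop_off, the step names, and all the counts are hypothetical; real numbers would come from the product's analytics.

```python
def funnel_drop_off(step_counts):
    """Return (from_step, to_step, drop_rate) for each pair of consecutive steps."""
    report = []
    for (prev_name, prev_count), (next_name, next_count) in zip(step_counts, step_counts[1:]):
        # Share of users who reached the previous step but not the next one.
        drop_rate = 1 - next_count / prev_count if prev_count else 0.0
        report.append((prev_name, next_name, drop_rate))
    return report


# Hypothetical funnel for a payment feature; names and numbers are illustrative only.
payment_funnel = [
    ("saw payment button", 1000),
    ("tapped payment button", 400),
    ("entered amount", 300),
    ("completed payment", 150),
]
drop_report = funnel_drop_off(payment_funnel)
for prev_name, next_name, drop_rate in drop_report:
    print(f"{prev_name} -> {next_name}: {drop_rate:.0%} drop")
```

The steps with the largest unexplained drop are the ones to sample users from for the qualitative follow-up.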
The evaluation output should turn into a series of actions that improve the product in the next iterations.