Category Archives: Data science
Abstractive Summarization with HuggingFace pre-trained models
Text summarization is a well-explored area in NLP. As shown in Figure 1, the field of text summarization can be split by input document type, output type, and purpose. By output type, text summarization divides into extractive and abstractive methods.
• Extractive: In extractive methods, a summarizer tries to find and combine the […]
Easy tutorial on Spark SQL and DataFrames
In this tutorial, you will learn how to load a DataFrame and perform basic operations on DataFrames with both the API and SQL. I’m using Colab to run the code. First, we need to download the required tools:
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz
!tar -xvf spark-3.1.1-bin-hadoop2.7.tgz
!pip install -q findspark
Then, […]
K-Nearest Neighbors Algorithm with Scikit-Learn
K-Nearest Neighbors (KNN) is a very simple, easy-to-understand supervised machine learning algorithm. A KNN classifier assigns new data to a class based on a similarity measure. How does KNN work? A new observation is classified by a majority vote of its neighbors. If K=1, then the observation is simply assigned to the class of […]
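The majority-vote idea in this excerpt can be sketched in plain Python. This is a minimal illustration, not the code from the post and not scikit-learn's implementation: the function name `knn_predict`, the toy points, and the choice of Euclidean distance as the similarity measure are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration; Euclidean distance is assumed
    as the similarity measure.
    """
    # Pair every training label with its distance to the query point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the labels of the k closest training points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among those neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (made up for this sketch): two small clusters with labels "a" and "b".
train_points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_labels = ["a", "a", "b", "b"]

# With k=1 the query simply takes the label of its single nearest neighbor.
prediction_k1 = knn_predict(train_points, train_labels, (1.2, 1.1), k=1)
# With k=3 the label is decided by a vote among the three nearest neighbors.
prediction_k3 = knn_predict(train_points, train_labels, (5.2, 5.1), k=3)
```

In practice a library classifier would normally replace a hand-written loop like this; the sketch only shows the voting logic.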
Data preprocessing key steps
Data preprocessing is a technique used to transform raw data into an understandable format. Raw data often contains errors, is incomplete (lacking attribute values or certain attributes, or containing only aggregate data), and lacks consistency (containing discrepancies in codes). This is where data preprocessing comes into the picture and provides a proven […]