Data science Archives

Category Archives: Data science

Abstractive Summarization with HuggingFace pre-trained models

Text summarization is a well-explored area in NLP. As shown in Figure 1, the field can be split by input document type, output type, and purpose. By output type, text summarization divides into extractive and abstractive methods. • Extractive: In extractive methods, a summarizer tries to find and combine the […]

June 14, 2021 hela

Data science, Spark

Easy tutorial on Spark SQL and DataFrames

In this tutorial, you will learn how to load a DataFrame and perform basic operations on it with both the DataFrame API and SQL. I’m using Colab to run the code. First, we need to download the required tools:

!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz
!tar -xvf spark-3.1.1-bin-hadoop2.7.tgz
!pip install -q findspark

Then, […]

June 1, 2021 hela

Data science, Python

K-Nearest Neighbors Algorithm with Scikit-Learn

K-Nearest Neighbors (KNN) is a simple, easy-to-understand supervised machine learning algorithm. A KNN classifier assigns new data to a class based on a similarity measure.

How does KNN work?

A new observation is classified by a majority vote of its neighbors. If K=1, the observation is simply assigned to the class of […]

April 8, 2021 hela
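
The majority-vote rule described in the KNN excerpt above can be sketched in a few lines of plain Python. This is a minimal illustration, not the scikit-learn implementation the post uses: the function name `knn_predict`, the Euclidean distance as the similarity measure, and the toy data are all assumptions made for this example.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with its label.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors (sorted by distance only).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two small clusters, made up for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.2, 1.4), k=3))
print(knn_predict(points, labels, (6.4, 6.2), k=1))
```

With K=1 the vote reduces to copying the label of the single closest point. An odd K is a common choice for two-class problems because it avoids tied votes.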

Data science

Data preprocessing key steps

Data preprocessing is a technique used to transform raw data into an understandable format. Raw data often contains numerous errors (missing attribute values, missing attributes, or only aggregate data) and lacks consistency (discrepancies in the code) and completeness. This is where data preprocessing comes into the picture and provides a proven […]

January 5, 2021 hela

Data Geek
