Microsoft Business Intelligence (Data Tools)|DataBricks - Change column names from CamelCase to Snake

Saturday, February 24, 2024

Databricks - Change column names from CamelCase to Snake_Case using Scala

In this tutorial, you will learn how to change column names from CamelCase to Snake_Case using Scala in Databricks.

💡Imagine we have an input DataFrame with CamelCase column names (as in the image). Our goal is to produce the output DataFrame with Snake_Case column names (also in the image).

Basically, you have to rename the columns as follows:

Age -> Age (unchanged, since it is a single word)

FirstName -> First_Name

CityName -> City_Name

CountryName -> Country_Name
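
The renaming rule above can be sketched as a small function that needs no Spark at all, which makes it easy to try out before applying it to a DataFrame. This is only a minimal sketch: the object name `ColumnNames` and the method name `toSnakeCase` are hypothetical and chosen for illustration.

```scala
object ColumnNames {
  // Zero-width lookahead: matches the empty position just before any uppercase letter.
  private val beforeUpper = "(?=[A-Z])"

  // Split the name at each uppercase boundary and join the pieces with underscores.
  // A single-word name such as "Age" yields one piece and is returned unchanged.
  def toSnakeCase(name: String): String =
    name.split(beforeUpper).mkString("_")

  def main(args: Array[String]): Unit =
    Seq("Age", "FirstName", "CityName", "CountryName")
      .foreach(n => println(s"$n -> ${toSnakeCase(n)}"))
}
```

Because the match is zero-width, no characters are consumed by the split; the uppercase letter stays at the start of the next piece.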

To create a DataFrame in Scala, you can use Apache Spark's DataFrame API.

In this example:
💎Import necessary Spark classes for Dataframe operations.
💎Create a SparkSession, which is the entry point to Spark SQL functionality.
💎Define a schema for our Dataframe using StructType and StructField.
💎Define the data as a sequence of rows, where each row represents a record in the Dataframe.
💎Create the Dataframe using the createDataFrame method of SparkSession, passing in the data and schema.
💎Display the Dataframe using the show() method.
💎Create a variable to store the regex pattern.
💎Create a variable to store the new Snake Case columns.
💎Create a new Dataframe with the Snake Case columns.
💎Finally, display the data from the Dataframe.

// Import libraries
import org.apache.spark.sql.{SparkSession, Row}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Create the Spark session (in a Databricks notebook, getOrCreate() returns the existing session)
val spark = SparkSession.builder().appName("CamelToSnakeCase").getOrCreate()

// Define the schema of the DataFrame
val schema = StructType(Array(
  StructField("Age", IntegerType, nullable = true),
  StructField("FirstName", StringType, nullable = true),
  StructField("CityName", StringType, nullable = true),
  StructField("CountryName", StringType, nullable = true)
))

// Define the data as a Seq of rows
val data = Seq(
  Row(30, "John Ramsay", "New York", "USA"),
  Row(32, "Alice Wang", "New York", "USA"),
  Row(41, "Bob Builder", "Los Angeles", "USA"),
  Row(25, "Ryan Arjun", "New York", "USA")
)

// Bind the data and schema into a DataFrame
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

// Display df
df.show()

// Show the schema of the DataFrame
df.printSchema()


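// Define the pattern: a zero-width lookahead that matches just before each uppercase letter
val regexPattern = "(?=[A-Z])".r

// Change the column names
val newColumns: Array[String] = for (colName <- df.columns) yield {
  // Split the column name at each uppercase letter, e.g. "FirstName" -> Array("First", "Name")
  val splitName = regexPattern.split(colName)
  // Join the parts with underscores; a single-word name such as "Age" is kept as-is
  if (splitName.length > 1) splitName.mkString("_")
  else splitName(0)
}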

// Print the new column names (printing the Array directly would only show its JVM reference)
println(newColumns.mkString(", "))

// Create a new DataFrame with the renamed columns
val newDF = df.toDF(newColumns: _*)

// Display records from the new DataFrame
newDF.show()
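
One thing to keep in mind: the pattern `(?=[A-Z])` splits before every uppercase letter, so a column name with consecutive capitals (for example a hypothetical `CustomerID`, which is not part of the sample data) would become `Customer_I_D`. The sketch below shows one possible way to keep such acronyms together. The object name `AcronymSafeNames`, both method names, and the boundary pattern are illustrative assumptions, not part of the original tutorial.

```scala
object AcronymSafeNames {
  // The tutorial's pattern: split before every uppercase letter.
  def naive(name: String): String =
    name.split("(?=[A-Z])").mkString("_")

  // Assumed alternative pattern with two kinds of word boundary:
  //  1. a lowercase letter or digit followed by an uppercase letter ("r|I" in "CustomerID")
  //  2. an uppercase letter followed by an uppercase-then-lowercase pair ("P|Re" in "HTTPResponse")
  private val boundary = "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])"

  def toSnakeCase(name: String): String =
    name.split(boundary).mkString("_")

  def main(args: Array[String]): Unit = {
    println(naive("CustomerID"))         // Customer_I_D
    println(toSnakeCase("CustomerID"))   // Customer_ID
    println(toSnakeCase("HTTPResponse")) // HTTP_Response
    println(toSnakeCase("FirstName"))    // First_Name
  }
}
```

For the four columns in this tutorial both versions give the same result, so the simpler pattern is sufficient here; the alternative only matters when column names contain acronyms.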



To learn more, please follow us:
🔊 Website: http://www.sql-datatools.com
🔊 YouTube: http://www.youtube.com/c/Sql-datatools
🔊 Instagram: https://www.instagram.com/asp.mukesh/
🔊 Twitter: https://twitter.com/macxima

Mukesh Singh

With over 17 years of experience in the Data Engineering stack across a variety of cloud and on-premises systems, I have successfully delivered more than ten complete business product solutions. My expertise lies in building robust infrastructure and architecture to support data engineering, data analytics, and machine learning processes. These solutions have significantly improved collaboration among cross-functional teams, including data scientists, business analysts, software engineers, and stakeholders.

Key Contributions

Data Modelling and Integration
• Data Modeling: Developed various data models to produce suitable data for business users, data analytics, data science, and data visualization teams.
• Legacy Systems and Cloud Technologies: Integrated legacy systems with modern cloud-based technologies (AWS, Azure, GCP), data lakes, and data warehouses.
• Streamlined Data Pipelines: Built efficient data pipelines, data warehouses, BI reports, and dashboards to streamline data access and insights.
