Friday, April 22, 2022

PySpark — Read All files from nested Folders/Directories

PySpark is the Python API for Apache Spark, an analytics engine for large-scale distributed data processing and machine learning.

Note: I'm using Jupyter Notebook for this walkthrough and assuming that you have already set up PySpark on it.
#import all the libraries of pyspark.sql
from pyspark.sql import *
#import SparkContext and SparkConf
from pyspark import SparkContext, SparkConf
#setup configuration property
#set the master URL
#set an application name
conf = SparkConf().setMaster("local").setAppName("sparkproject")
#start spark cluster
#if already started then get it else start it
sc = SparkContext.getOrCreate(conf=conf)
#initialize SQLContext from spark cluster
sqlContext = SQLContext(sc)
#first approach: list the files with a glob pattern, then read them one by one
#variable to hold the main directory path
dirPath='/content/PySparkProject/Datafiles'
#variable to store file path list from main directory
#note: wholeTextFiles loads the content of every file just to obtain its path
Filelists=sc.wholeTextFiles(dirPath+"/*/*.csv").map(lambda x: x[0]).collect()
#for loop to read each file into dataframe from Filelists
for filepath in Filelists:
    print(filepath)
    #read data into dataframe by using filepath
    df=sqlContext.read.csv(filepath, header=True)
    #show data from dataframe
    df.show()
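One thing to keep in mind with the loop above: the pattern `*/*.csv` only matches CSV files that sit exactly one folder below the main directory. Here is a small sketch of that behaviour using Python's built-in `glob` module as a stand-in. Spark resolves the pattern on its own through Hadoop, so treat this only as an illustration. The folder and file names (`2021`, `q1`, `jan.csv` and so on) and the helper functions are made up for the demo.

```python
import glob
import os
import tempfile


def build_sample_tree(root):
    """Create a small nested tree of CSV files under root (hypothetical layout)."""
    relative_paths = [
        "top.csv",                              # directly in the main directory
        os.path.join("2021", "jan.csv"),        # one level down
        os.path.join("2021", "q1", "feb.csv"),  # two levels down
    ]
    for rel in relative_paths:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w") as handle:
            handle.write("id,name\n1,a\n")


def match_one_level(root):
    """Return paths (relative to root) matched by the pattern <root>/*/*.csv."""
    pattern = os.path.join(root, "*", "*.csv")
    return sorted(os.path.relpath(path, root) for path in glob.glob(pattern))


with tempfile.TemporaryDirectory() as root:
    build_sample_tree(root)
    matched = match_one_level(root)
    # Only the file exactly one folder below root is listed
    print(matched)
```

In this sketch `top.csv` and the file under `2021/q1` are both skipped, so if your folders are nested at different depths you would need one pattern per depth with this approach.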
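#second approach: let Spark walk the nested directories in a single read
#set sparksession
sparkSession=SparkSession(sc)
#variable to hold the main directory path
dirPath='/content/PySparkProject/Datafiles'
#read files from nested directories at any depth (recursiveFileLookup requires Spark 3.0 or later)
df=sparkSession.read.option("recursiveFileLookup","true").option("header","true").csv(dirPath)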
#show data from data frame
df.show()
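A recursive lookup like the one above walks the whole directory tree, so it picks up files at every depth, and it does not care about the file extension. The sketch below uses Python's `pathlib` as a stand-in to show which files a recursive walk restricted to `*.csv` would visit. It is only an illustration and does not call Spark. The file names and the `list_csv_recursively` helper are hypothetical. In Spark, as far as I know, the `pathGlobFilter` read option plays the role of the `*.csv` filter used here.

```python
import tempfile
from pathlib import Path


def list_csv_recursively(root):
    """Return every *.csv file under root, at any depth, relative to root."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*.csv")
        if path.is_file()
    )


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: CSV files at three depths plus one non-CSV file
    for rel in ["top.csv", "2021/jan.csv", "2021/q1/feb.csv", "2021/q1/notes.txt"]:
        target = Path(root) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("id,name\n1,a\n")
    found = list_csv_recursively(root)
    # All three CSV files are listed; notes.txt is filtered out
    print(found)
```

Without an extension filter, a stray file such as `notes.txt` inside the tree would be read as CSV too, so it is worth checking what else lives in the folder before reading it recursively.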
