Converting Row into list RDD in PySpark

  • Last Updated : 18 Jul, 2021
In this article, we will convert an RDD of Row objects into an RDD of lists in PySpark.

Creating RDD from Row for demonstration:


# import Row and SparkSession
from pyspark.sql import SparkSession, Row
# create sparksession
spark = SparkSession.builder.appName('').getOrCreate()
# create student data with Row function
data = [Row(name="sravan kumar",
            subjects=["Java", "python", "C++"],
            state="AP"),
        Row(name="Ojaswi",
            lang=["Spark", "Java", "C++"],
            state="Telangana"),
        Row(name="rohith",
            subjects=["DS", "PHP", ".net"],
            state="AP"),
        Row(name="bobby",
            lang=["Python", "C", "sql"],
            state="Delhi"),
        Row(name="rohith",
            lang=["CSharp", "VB"],
            state="Telangana")]
rdd = spark.sparkContext.parallelize(data)
# display actual rdd
print(rdd.collect())

Output:

[Row(name='sravan kumar', subjects=['Java', 'python', 'C++'], state='AP'),
Row(name='Ojaswi', lang=['Spark', 'Java', 'C++'], state='Telangana'),
Row(name='rohith', subjects=['DS', 'PHP', '.net'], state='AP'),
Row(name='bobby', lang=['Python', 'C', 'sql'], state='Delhi'),
Row(name='rohith', lang=['CSharp', 'VB'], state='Telangana')]

Using the map() function, we can convert each Row in the RDD into a list, which gives a list RDD.

Syntax: rdd_data.map(list)

where rdd_data is the data of type RDD.

Finally, we can display the data in the list RDD by using the collect() method.


# convert rdd to list by using map() method
b = rdd.map(list)
# display the data in b with collect method
for i in b.collect():
    print(i)

Output:

['sravan kumar', ['Java', 'python', 'C++'], 'AP']
['Ojaswi', ['Spark', 'Java', 'C++'], 'Telangana']
['rohith', ['DS', 'PHP', '.net'], 'AP']
['bobby', ['Python', 'C', 'sql'], 'Delhi']
['rohith', ['CSharp', 'VB'], 'Telangana']

