Java Program to Extract Paragraphs From a Word Document

  • Difficulty Level : Medium
  • Last Updated : 19 Oct, 2021

This article demonstrates how to extract paragraphs from a Word document (.docx) using the getParagraphs() method of the XWPFDocument class from the Apache POI library. Apache POI is a project developed and maintained by the Apache Software Foundation that provides Java libraries for reading and writing Microsoft Office files.
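For context on what XWPFDocument parses: a .docx file is a ZIP package whose main text lives in the part word/document.xml. Each paragraph there is a w:p element, and its visible text is split across w:t elements inside runs. The JDK-only sketch below does not use POI and is not how POI is implemented. It only illustrates that structure. The class name DocxXmlParagraphs and the helper sampleDocx() are hypothetical. Unlike getParagraphs(), which returns only body-level paragraphs, this sketch collects every w:p in the part, including paragraphs inside tables.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustration only: reads paragraph text straight from WordprocessingML XML,
// without Apache POI. Class and helper names are hypothetical.
public class DocxXmlParagraphs {

    // Namespace bound to the w: prefix in word/document.xml
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the concatenated w:t text of every w:p element in word/document.xml
    public static List<String> paragraphs(InputStream docx) throws Exception {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals("word/document.xml")) {
                    continue;
                }
                // readAllBytes() stops at the end of the current ZIP entry
                byte[] xmlBytes = zip.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true); // needed for the ...NS lookups below
                Document xml = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
                NodeList ps = xml.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < ps.getLength(); i++) {
                    Element p = (Element) ps.item(i);
                    NodeList texts = p.getElementsByTagNameNS(W_NS, "t");
                    StringBuilder sb = new StringBuilder();
                    for (int j = 0; j < texts.getLength(); j++) {
                        sb.append(texts.item(j).getTextContent());
                    }
                    result.add(sb.toString());
                }
                break;
            }
        }
        return result;
    }

    // Builds a tiny in-memory ZIP holding only word/document.xml (not a complete .docx)
    public static byte[] sampleDocx() throws Exception {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
            + "<w:p><w:r><w:t>Hello</w:t></w:r>"
            + "<w:r><w:t xml:space=\"preserve\"> world</w:t></w:r></w:p>"
            + "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
            + "</w:body></w:document>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        for (String text : paragraphs(new ByteArrayInputStream(sampleDocx()))) {
            System.out.println(text); // prints "Hello world", then "Second paragraph"
        }
    }
}
```

The first paragraph shows why text must be joined across runs: Word may split one sentence into several w:r/w:t pieces, and XWPFParagraph.getText() performs that joining for you.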

To extract paragraphs from a Word file, the following Apache POI library must be on the classpath, along with the jars it depends on (such as the core poi jar):

poi-ooxml.jar
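If the program fails with a ClassNotFoundException or NoClassDefFoundError, the jars are usually missing from the classpath. A quick check can be sketched with Class.forName. The class name PoiClasspathCheck is hypothetical. The probed names are real POI classes: XWPFDocument from poi-ooxml and POIFSFileSystem from the core poi jar.

```java
// Hypothetical helper: reports whether the POI classes used in this article can be loaded
public class PoiClasspathCheck {

    // Loads a class without initializing it; returns false if it (or a class it needs) is missing
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, PoiClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.poi.xwpf.usermodel.XWPFDocument",      // from poi-ooxml
            "org.apache.poi.poifs.filesystem.POIFSFileSystem"  // from the core poi jar
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? " found" : " MISSING"));
        }
    }
}
```

Run it with the same classpath as the main program, for example the one your IDE or build tool sets up.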

Approach

  1. Formulate the path of the Word document.
  2. Create a FileInputStream and an XWPFDocument object for the Word document.
  3. Retrieve the list of paragraphs using the getParagraphs() method.
  4. Iterate through the list of paragraphs and print each one.
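Step 1 of the approach builds the path by string concatenation with File.separator. An equivalent can be sketched with java.nio.file. The class and method names below are hypothetical, and the existence check is an addition that turns a missing file into a clear error before POI is involved.

```java
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helpers for Step 1: locating the Word file in the working directory
public class WordFilePath {

    // Joins the working directory and the file name with the platform's separator
    public static Path inWorkingDir(String fileName) {
        return Paths.get(System.getProperty("user.dir"), fileName);
    }

    // Fails early with a descriptive exception if the file is not there
    public static Path requireExisting(Path path) throws NoSuchFileException {
        if (!Files.isRegularFile(path)) {
            throw new NoSuchFileException(path.toString());
        }
        return path;
    }

    public static void main(String[] args) throws Exception {
        Path path = inWorkingDir("WordFile.docx");
        System.out.println("Looking for: " + path);
        requireExisting(path); // throws NoSuchFileException when WordFile.docx is absent
    }
}
```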

Implementation

  • Step 1: Getting the path of the current working directory, where the Word document is located.
  • Step 2: Creating a FileInputStream for the file at that path.
  • Step 3: Creating an XWPFDocument object from the stream.
  • Step 4: Using the getParagraphs() method to retrieve the list of paragraphs from the Word file.
  • Step 5: Iterating through the list of paragraphs.
  • Step 6: Printing the text of each paragraph.
  • Step 7: Closing the document and the input stream.
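Step 7 matters because both the FileInputStream and the XWPFDocument hold resources. Declaring them in a try-with-resources statement closes them automatically, in reverse order of creation, even if an exception is thrown. The sketch below demonstrates that ordering with a hypothetical stand-in class (TrackedResource) instead of the POI classes, so it runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {

    // Records every close() call so the order can be inspected
    static final List<String> closed = new ArrayList<>();

    // Hypothetical stand-in for FileInputStream / XWPFDocument
    static final class TrackedResource implements AutoCloseable {
        private final String name;

        TrackedResource(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            closed.add(name);
        }
    }

    // Opens "stream" then "document"; optionally fails inside the try block
    public static List<String> run(boolean fail) {
        closed.clear();
        try (TrackedResource stream = new TrackedResource("stream");
             TrackedResource document = new TrackedResource("document")) {
            if (fail) {
                throw new IllegalStateException("simulated failure while reading");
            }
        } catch (IllegalStateException e) {
            // By the time this handler runs, both resources have already been closed
        }
        return new ArrayList<>(closed);
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [document, stream]
        System.out.println(run(true));  // [document, stream]
    }
}
```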

Sample Input

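The input is a Word document named WordFile.docx, located in the current working directory. Its content is as follows:

[Figure: contents of the sample Word document]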



Example




// Java program to extract paragraphs from a Word document

// Importing IO classes for basic file handling
import java.io.File;
import java.io.FileInputStream;
import java.util.List;
// Importing Apache POI classes
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Main class to extract paragraphs from a Word document
public class GFG {

    // Main driver method
    public static void main(String[] args) throws Exception
    {

        // Step 1: Getting the path of the current working
        // directory where the Word document is located
        String path = System.getProperty("user.dir")
                      + File.separator + "WordFile.docx";

        // Step 2: Creating a FileInputStream for the file.
        // Step 3: Creating a document object from the stream.
        // Both are declared in try-with-resources so that
        // they are closed automatically (Step 7), even if
        // an exception is thrown while reading.
        try (FileInputStream fin = new FileInputStream(path);
             XWPFDocument document = new XWPFDocument(fin)) {

            // Step 4: Using the getParagraphs() method to
            // retrieve the list of paragraphs from the Word
            // file
            List<XWPFParagraph> paragraphs
                = document.getParagraphs();

            // Step 5: Iterating through the list of paragraphs
            for (XWPFParagraph para : paragraphs) {

                // Step 6: Printing the text of each paragraph,
                // followed by a blank line
                System.out.println(para.getText() + "\n");
            }
        }
        // Step 7: The document and the input stream have
        // been closed by try-with-resources at this point
    }
}


Output




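The loop above prints every paragraph, including empty ones, which are common in Word documents; for those, getText() returns an empty string. One possible variation, sketched on plain strings that stand in for the values of para.getText(), skips blank paragraphs and numbers the rest. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphFilter {

    // Drops blank entries and prefixes the remaining ones with a 1-based index
    public static List<String> numberNonBlank(List<String> texts) {
        List<String> out = new ArrayList<>();
        int n = 0;
        for (String text : texts) {
            if (text == null || text.isBlank()) {
                continue; // skip empty or whitespace-only paragraphs
            }
            n++;
            out.add(n + ": " + text.strip());
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in values; in the article's program these would come from para.getText()
        List<String> texts = List.of("Introduction", "", "   ", "Body text");
        numberNonBlank(texts).forEach(System.out::println);
        // prints "1: Introduction" and "2: Body text"
    }
}
```

In the article's program, the strings could be gathered by adding para.getText() to a list inside the existing loop and passing that list to the helper.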