Java program to delete duplicate lines in text file
Prerequisite: PrintWriter, BufferedReader
Given a file input.txt, our task is to remove the duplicate lines from it and save the result in another file, say output.txt.
Naive Algorithm:
1. Create a PrintWriter object for output.txt.
2. Open a BufferedReader for input.txt.
3. Run a loop over each line of input.txt:
   3.1 Set flag = false.
   3.2 Open a BufferedReader for output.txt.
   3.3 Run a loop over each line of output.txt. If a line of output.txt is equal to the current line of input.txt, set flag = true and break out of the loop.
   3.4 Check flag. If it is false, write the current line of input.txt to output.txt and flush the PrintWriter stream.
4. Close resources.
To run the program below successfully, input.txt must exist in the same folder, or you must provide its full path.
// Java program to remove duplicates from input.txt
// and save the output to output.txt
import java.io.*;

public class FileOperation
{
    public static void main(String[] args) throws IOException
    {
        // PrintWriter object for output.txt
        PrintWriter pw = new PrintWriter("output.txt");

        // BufferedReader object for input.txt
        BufferedReader br1 = new BufferedReader(new FileReader("input.txt"));

        String line1 = br1.readLine();

        // loop for each line of input.txt
        while (line1 != null)
        {
            boolean flag = false;

            // BufferedReader object for output.txt
            BufferedReader br2 = new BufferedReader(new FileReader("output.txt"));

            String line2 = br2.readLine();

            // loop for each line of output.txt
            while (line2 != null)
            {
                if (line1.equals(line2))
                {
                    flag = true;
                    break;
                }

                line2 = br2.readLine();
            }

            // close the reader opened for this pass over output.txt
            br2.close();

            // if flag = false,
            // write line of input.txt to output.txt
            if (!flag)
            {
                pw.println(line1);

                // flushing is important here: the next pass rereads
                // output.txt and must see every line written so far
                pw.flush();
            }

            line1 = br1.readLine();
        }

        // closing resources
        br1.close();
        pw.close();

        System.out.println("File operation performed successfully");
    }
}
Output:
File operation performed successfully
Note: If output.txt already exists in the current working directory (cwd), the above program overwrites it; otherwise a new file is created.
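The naive approach rereads output.txt once for every line of input.txt, so the work grows quadratically with the number of lines. A better solution is to use a HashSet to store each line of input.txt. A set holds no duplicate values, and its add() method returns false when the element is already present, so the attempt to store a line also tells us whether it has been seen before. Write the line to output.txt only if it was not already present in the HashSet.
To run the program below successfully, input.txt must exist in the same folder, or you must provide its full path.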
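Before applying the idea to files, it may help to see the behaviour of add() in isolation. The following is a minimal sketch on an in-memory list; the class name HashSetAddDemo and the helper dedupe are hypothetical names chosen for illustration and are not part of the programs in this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, used only to illustrate HashSet.add()
public class HashSetAddDemo {

    // Returns the lines with repeated entries dropped,
    // keeping the first occurrence of each line in its original order
    static List<String> dedupe(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String line : lines) {
            // add() returns true only if the line was not already in the set
            if (seen.add(line)) {
                unique.add(line);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("apple", "banana", "apple", "cherry", "banana");
        System.out.println(dedupe(lines)); // prints [apple, banana, cherry]
    }
}
```

Because each line is emitted at the moment it is first seen, the output keeps the order of the input even though a HashSet itself has no defined iteration order.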
// Efficient Java program to remove duplicates from input.txt
// and save the output to output.txt
import java.io.*;
import java.util.HashSet;

public class FileOperation
{
    public static void main(String[] args) throws IOException
    {
        // PrintWriter object for output.txt
        PrintWriter pw = new PrintWriter("output.txt");

        // BufferedReader object for input.txt
        BufferedReader br = new BufferedReader(new FileReader("input.txt"));

        String line = br.readLine();

        // set to store unique values
        HashSet<String> hs = new HashSet<String>();

        // loop for each line of input.txt
        while (line != null)
        {
            // write only if not present in hashset
            if (hs.add(line))
                pw.println(line);

            line = br.readLine();
        }

        pw.flush();

        // closing resources
        br.close();
        pw.close();

        System.out.println("File operation performed successfully");
    }
}
Output:
File operation performed successfully
Note: As with the first program, an existing output.txt in the current working directory (cwd) is overwritten; otherwise a new file is created.
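The same HashSet idea can also be packaged as a reusable method. The sketch below is one possible variant, not the program above: it swaps in java.nio.file.Files and try-with-resources so that both files are closed even if an exception is thrown. The class name DedupeLines, the method removeDuplicateLines, and the choice of UTF-8 are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class name, chosen for this sketch
public class DedupeLines {

    // Copies source to target, skipping any line already seen.
    // Returns the number of lines written. Assumes UTF-8 text and
    // that source and target are different files.
    static int removeDuplicateLines(Path source, Path target) throws IOException {
        Set<String> seen = new HashSet<>();
        int written = 0;

        // try-with-resources closes the reader and writer automatically
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // add() returns false for a line that was seen before
                if (seen.add(line)) {
                    writer.write(line);
                    writer.newLine();
                    written++;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        int written = removeDuplicateLines(Paths.get("input.txt"), Paths.get("output.txt"));
        System.out.println(written + " unique lines written");
    }
}
```

With its default options, Files.newBufferedWriter creates the target file or truncates an existing one, so the overwrite behaviour matches the note above.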
This article is contributed by Gaurav Miglani.