
Is it possible to extract lines from a .docx document?

Alex
1#
Alex Published on 2017-11-14 21:57:11Z

Is it possible to extract lines from a .docx document?

I don't mean the lines the user creates by pressing Enter (i.e., '\n'); I mean the lines as they appear in the .docx when the file is opened (i.e., after soft text wrapping).

My preference would be to do this in Python (I am aware of the python-docx library, but I don't think it does the trick). However, any programming language is welcome as long as it gives me what I want.

Thanks a lot!

E-Aged
2#
E-Aged Replied on 2018-01-18 13:16:09Z

I'm not sure I understand your question properly, but I was dealing with the same problem a few weeks ago. I managed to parse a docx file into a txt file line by line, as it was. I hope this piece of code works for you too. Sorry for my limited English.

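import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class parseDocx {
    // Writes the text extracted from the .docx at src to the text file at dest, one line at a time.
    public static void parse(String src, String dest) {
        // try-with-resources closes the streams and the document even if an exception is thrown
        try (FileInputStream fis = new FileInputStream(src);
             XWPFDocument docx = new XWPFDocument(OPCPackage.open(fis));
             FileWriter fw = new FileWriter(dest)) {
            XWPFWordExtractor extractor = new XWPFWordExtractor(docx);
            // Split on runs of CR/LF; a '?' inside [...] is a literal character, so it is left out of the class
            String[] lineList = extractor.getText().split("[\\r\\n]+");
            for (String str : lineList) {
                fw.write(str + "\r\n");
            }
            System.out.println(extractor.getText());
        } catch (IOException | InvalidFormatException ex) {
            ex.printStackTrace(); // report the failure instead of silently ignoring it
        }
    }
}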

In the code above, you can delete the System.out.println line. src is the path of the source file and dest is the path of the destination file.
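The line-splitting step in the answer can be tried on its own, without Apache POI. The following is only a minimal sketch using the Java standard library; the class name LineSplitDemo and the method splitLines are hypothetical and not part of the original answer. It assumes Java 8 or later, where the regex `\R` matches any line terminator, and it shows that a character class such as `[\\r?\\n]+` also treats a literal '?' as a separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class LineSplitDemo {
    // Splits text on runs of line terminators (\n, \r\n, \r, ...) and skips empty pieces.
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        for (String piece : text.split("\\R+")) {
            if (!piece.isEmpty()) {
                lines.add(piece);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String sample = "Is this line 1?\r\nLine 2\n\nLine 3";
        // "\\R+" keeps the question mark inside the first line.
        System.out.println(splitLines(sample));
        // "[\\r?\\n]+" treats '?' as a separator, so the question mark is dropped.
        System.out.println(String.join("|", sample.split("[\\r?\\n]+")));
    }
}
```

With extracted document text, splitLines could stand in for the split call in parse; whether the resulting lines match what Word shows on screen still depends on what the extractor returns.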
