Reading a Hebrew text file line by line shows gibberish no matter which encoding I use. Why?

DJ5000, published 2018-01-12 07:10:30Z

I built a text-file-oriented site. The site is in Hebrew and uses Razor Pages on ASP.NET Core 2.
Environment: Visual Studio 2017 with all updates.

In the _Layout file I have:

<meta charset="utf-8" />
<meta lang="he" dir="rtl" />

Also, in site.css:

body {
    background-color:black;

    padding-top: 50px;
    padding-bottom: 20px;
    direction:rtl; /*right to left*/
    font-family: 'opensanshebrew'; /*defined above it*/
    font-size:16px;
}

In a Razor page called Poems, I simply want to show the first line of every .txt file in the "Poems" folder under wwwroot. It goes like this:

<div class="row">
    <div id="fileListArea" class="col-lg-8">
        <h2>רשימת השירים שכתבתי:</h2> @* "List of the poems I wrote:" *@

        @foreach (var p in Model.PoemsList)
        {
            <span>@p.Title</span><br />
        }
    </div>
</div>

[I'll put it on a grid later]

In the code-behind:

public void OnGet()
{
    string tpath = _env.WebRootPath + "\\Poems";
    Filelist = fileTools.GetFileList(tpath);
    PoemsList = new List<PoemCover>();
    foreach(string fn in Filelist)
    {
        PoemsList.Add(new PoemCover(fileTools.GetTitle(tpath + "\\" + fn, Encoding.ASCII), fn));
    }
}
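For comparison, here is a minimal sketch of the same listing built on `Directory.GetFiles` and `File.ReadLines` instead of the custom `fileTools` helpers. `PoemFolderReader` and `ReadFirstLines` are hypothetical names, and the `Encoding` passed in still has to match how the files were actually saved.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper (not part of the original fileTools): returns
// (file name, first line) for every .txt file in a folder.
public static class PoemFolderReader
{
    public static List<(string FileName, string FirstLine)> ReadFirstLines(string folder, Encoding encoding)
    {
        var result = new List<(string FileName, string FirstLine)>();

        // Ordinal sort keeps the order stable regardless of the server's culture.
        foreach (string path in Directory.GetFiles(folder, "*.txt").OrderBy(p => p, StringComparer.Ordinal))
        {
            // File.ReadLines is lazy: FirstOrDefault stops after the first line
            // and disposes the underlying reader. Empty files yield "".
            string first = File.ReadLines(path, encoding).FirstOrDefault() ?? string.Empty;
            result.Add((Path.GetFileName(path), first));
        }
        return result;
    }
}
```

In `OnGet` it might be called as `PoemFolderReader.ReadFirstLines(Path.Combine(_env.WebRootPath, "Poems"), Encoding.UTF8)`, assuming the poems were saved as UTF-8; `Path.Combine` also avoids hard-coding the Windows `\\` separator.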

In fileTools:

public static string GetTitle(string pathWfilename,Encoding encd)
{
    string rslt;

    try
    {
        using (StreamReader strm = new StreamReader(pathWfilename, encd))
        {
            string nextLine;
            rslt = strm.ReadLine();
            nextLine = strm.ReadLine();

            if (nextLine != null)
                if (nextLine.Length >= 2)
                {
                    int didx = NthOccurence(rslt, ' ', 3);
                    if (didx < 2)
                    { rslt = (rslt.Substring(0, rslt.Length - 1)) + "..."; }
                    else { rslt = (rslt.Substring(0, didx)) + "..."; }
                }
        }
    }
    catch(IOException ex)
    {
        rslt = "Error reading Title from - " + pathWfilename + " - " + ex.Message;
        Console.WriteLine("{0}", rslt);
    }

    return rslt;
}
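The question doesn't show `NthOccurence`. Here is a minimal sketch, assuming it returns the zero-based index of the n-th occurrence of a character, or -1 when there are fewer occurrences. Under that assumption, the `didx < 2` check in `GetTitle` also covers the "not found" case. `TextHelpers` is a hypothetical class name.

```csharp
// Hypothetical implementation of the helper GetTitle calls.
// Assumption: returns the zero-based index of the n-th occurrence of ch in s,
// or -1 if s is null/empty, n < 1, or s has fewer than n occurrences.
public static class TextHelpers
{
    public static int NthOccurence(string s, char ch, int n)
    {
        if (string.IsNullOrEmpty(s) || n < 1)
            return -1;

        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            // Count each match; stop as soon as the n-th one is reached.
            if (s[i] == ch && ++count == n)
                return i;
        }
        return -1; // fewer than n occurrences
    }
}
```

With this version, `NthOccurence("a b c d e", ' ', 3)` returns 5, the index of the third space.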

It works, but the lines come out as gibberish. I've tried:

fileTools.GetTitle(tpath + "\\" + fn, Encoding.ASCII)
fileTools.GetTitle(tpath + "\\" + fn, Encoding.Unicode) 
fileTools.GetTitle(tpath + "\\" + fn, Encoding.UTF8)
fileTools.GetTitle(tpath + "\\" + fn, Encoding.UTF7) 
fileTools.GetTitle(tpath + "\\" + fn, Encoding.UTF32) 
fileTools.GetTitle(tpath + "\\" + fn, Encoding.GetEncoding("Windows-1255")) 
// this one throws an error saying there is no such encoding

Some show gibberish, some show various kinds of question marks, and one shows some weird glyphs.
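To see why a mismatched `Encoding` produces gibberish or question marks, here is a small illustration. It assumes, for the sake of example, that the file was saved as UTF-8 (the question doesn't say). `DecodeDemo` is a hypothetical name.

```csharp
using System.Text;

// Illustration: the same Hebrew text, stored as UTF-8 bytes,
// decoded with different Encoding objects.
public static class DecodeDemo
{
    public static byte[] Utf8Bytes(string text) => Encoding.UTF8.GetBytes(text);

    // ASCII has no Hebrew letters: every byte >= 0x80 is replaced with '?'.
    public static string AsAscii(byte[] bytes) => Encoding.ASCII.GetString(bytes);

    // UTF-16 ("Unicode") pairs bytes into 16-bit units, producing unrelated characters.
    public static string AsUtf16(byte[] bytes) => Encoding.Unicode.GetString(bytes);

    // Only the encoding the bytes were actually written in round-trips.
    public static string AsUtf8(byte[] bytes) => Encoding.UTF8.GetString(bytes);
}
```

For "גליון", the five Hebrew letters take 10 bytes in UTF-8. ASCII turns each byte into `?`, UTF-16 pairs them into five unrelated characters, and only UTF-8 gives the original text back.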

How can I read Hebrew text files?

S.Fragkos, replied 2018-01-12 08:51:01Z

I don't know Hebrew, but I found a Hebrew string via Google to test with, so here goes:

TL;DR:

GetTitle(@"C:\dataUpload\test.txt", Encoding.GetEncoding("windows-1255"));

I used your GetTitle method, just simplified to suit my tests.

My test.txt file looks like this:

גליון_1

Note the "windows-1255" name I passed to GetEncoding. Encoding names are matched case-insensitively, so "Windows-1255" should resolve to the same encoding.
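The "no such encoding" error from the question is most likely not about casing. On .NET Core (which ASP.NET Core 2 typically runs on), legacy code pages such as windows-1255 are only available after `CodePagesEncodingProvider` has been registered. On .NET Core 2.x that type comes from the `System.Text.Encoding.CodePages` NuGet package. Here is a minimal sketch, with `HebrewEncoding` as a hypothetical wrapper:

```csharp
using System.Text;

// Hypothetical wrapper. On .NET Core the windows-1255 code page is not built in;
// it has to be registered through CodePagesEncodingProvider first
// (on .NET Core 2.x, add the System.Text.Encoding.CodePages NuGet package).
public static class HebrewEncoding
{
    // The static constructor runs once, before the first use of the class,
    // so the provider is registered before the GetEncoding lookup below.
    static HebrewEncoding()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encoding names are matched case-insensitively, so "windows-1255" and
    // "Windows-1255" resolve to the same code page (1255).
    public static Encoding Windows1255 => Encoding.GetEncoding("windows-1255");
}
```

With this in place, the call from the question would become something like `fileTools.GetTitle(Path.Combine(tpath, fn), HebrewEncoding.Windows1255)`. Calling `Encoding.RegisterProvider` once at startup (for example in `Program.Main`) would work just as well. The static constructor simply guarantees that registration happens before the first lookup.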

Good luck, and feel free to contact me if you need more information.

PS: I don't understand Hebrew, so if my answer is off, send me some Hebrew strings and the expected output to work with. Also check which encoding your .txt files were saved in: I re-saved my .txt file as UTF-8, and now Encoding.UTF8 works too.
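Following the advice to check how the files were saved: if some poems might be UTF-8 and others windows-1255, a rough heuristic is to try strict UTF-8 first and fall back to the legacy code page. This is a sketch under that assumption, not a reliable detector. For example, UTF-16 files are not recognized. `HebrewTextDecoder` is a hypothetical name. The fallback `Encoding` is passed in, for example `Encoding.GetEncoding(1255)` once the code-pages provider is registered.

```csharp
using System.IO;
using System.Text;

// Heuristic sketch (an assumption, not a guaranteed detector): prefer UTF-8,
// fall back to a legacy code page such as windows-1255 if the bytes are not valid UTF-8.
public static class HebrewTextDecoder
{
    public static string Decode(byte[] bytes, Encoding fallback)
    {
        // A UTF-8 byte order mark (EF BB BF) settles the question.
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
            return Encoding.UTF8.GetString(bytes, 3, bytes.Length - 3);

        // Strict UTF-8 throws on invalid byte sequences instead of inserting U+FFFD.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: assume the caller's legacy code page.
            return fallback.GetString(bytes);
        }
    }

    public static string ReadFirstLine(string path, Encoding fallback)
    {
        string text = Decode(File.ReadAllBytes(path), fallback);
        using (var reader = new StringReader(text))
            return reader.ReadLine() ?? string.Empty;
    }
}
```

Windows-1255 Hebrew letters occupy bytes 0xE0 to 0xFA, which rarely form valid UTF-8 sequences, so the strict UTF-8 attempt usually fails fast on such files. Short or unusual files can still be misclassified.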
