Scraping XML from a website with VB.NET and LINQ

Asked by user11583 on September 19, 2018, 4:03 am

I've been reading through the LINQ documentation and looking at some previous answers on Stack Overflow, but I'm still pretty confused about how LINQ works. I want to grab some data from a website, but I can't figure out how to get the XML to parse into strings. Here is what I have so far:

Public Class Form1
    'Dim xml As XDocument
    Dim ns As XNamespace
    Dim strXMLSource As String = "http://gd2.mlb.com/components/game/mlb/year_2018/month_03/day_29/gid_2018_03_29_anamlb_oakmlb_1/linescore.xml"

    Dim xml As XDocument = <?xml version="1.0" encoding="utf-16"?>
                           <game>
                               <id>
                               </id>
                               <venue>
                               </venue>
                           </game>


    Private Sub Form1_Load(sender As System.Object, e As System.EventArgs) Handles MyBase.Load
        txtXMLSource.Text = strXMLSource
    End Sub

    Private Sub cmdGetData_Click(sender As System.Object, e As System.EventArgs) Handles cmdGetData.Click
        ns = txtXMLSource.Text
        Dim strGame As XElement = xml.Descendants(ns + "game").First
        Dim strId As String = strGame.Descendants(ns + "id").First
        MessageBox.Show(strId)
    End Sub
End Class

So when the form loads it sets up an XNamespace as ns and an XDocument as xml. When I click the cmdGetData button on the form, it should load the website address into the XNamespace, grab the value of the first id element into the strId variable, and then show that value in a message box. I know I'm doing something wrong, but I have no idea how to fix it.
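In case it clarifies what I'm after, here is my untested guess at reading the id out of the hard-coded literal above (I'm assuming .Value is the way to turn an element into a String; this still never touches the website):

' Untested guess, run against the inline literal above rather than the website.
' Descendants("id") should find the <id> element; .Value should return its text.
Dim gameElement As XElement = xml.Descendants("game").First()
Dim idText As String = gameElement.Descendants("id").First().Value
MessageBox.Show(idText)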

  • As far as I can see, there is only one record there, so why do you need to query it? – Plutonix Jan 12 at 23:01
  • Because there are a lot of similar pages, one for each MLB game during the season. Once I figure out how to make this work, I will build some logic to loop through the dates, build the URLs, and fetch the one record for each game. – Michael T Jan 13 at 0:47
  • The code above is based on @Neolisk's answer here: stackoverflow.com/questions/21611098/… But now I'm thinking maybe it's not doing the web scraping part right? – Michael T Jan 13 at 1:56
  • It's totally unclear what you're doing in the code. In cmdGetData_Click you effectively assign the value of strXMLSource to ns, i.e. you assign the web address of the XML to the namespace variable. Total nonsense. – JohnyL Jan 13 at 7:45
  • Yes, I think I was confused about what the namespace is (still am). I just wanted to get the XML from a website and parse it. The second part seems to be well documented, but the first part (getting it from a web page) is not. All of the examples I have found online either use a file or type the XML right into the code, never a website (see the rough sketch just after these comments). – Michael T Jan 13 at 15:00
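
Pulling the comments together, a rough, untested sketch of what loading straight from the URL might look like is below. XDocument.Load accepts a URI string, so it can fetch the page itself; whether linescore.xml keeps its data in attributes on the root game element or in child elements is a guess here, so both reads are shown, and no XNamespace is involved unless the document actually declares one.

' Rough, untested sketch based on the discussion above.
' Requires LINQ to XML (System.Xml.Linq), which VB projects usually import by default.
Private Sub cmdGetData_Click(sender As System.Object, e As System.EventArgs) Handles cmdGetData.Click
    ' XDocument.Load accepts a URI, so this downloads and parses the page in one step.
    Dim doc As XDocument = XDocument.Load(txtXMLSource.Text)

    ' Guess 1: the data lives in attributes on the root <game> element.
    Dim idFromAttribute As String = doc.Root.Attribute("id")?.Value

    ' Guess 2: the data lives in child elements, as in the hard-coded literal.
    Dim idFromElement As String = doc.Root.Element("id")?.Value

    ' Show whichever guess found something.
    MessageBox.Show(If(idFromAttribute, idFromElement))
End Sub

' A namespace only comes into play if the document declares one, e.g. (hypothetical URI):
'   Dim ns As XNamespace = "http://example.com/schema"
'   Dim idValue As String = doc.Root.Element(ns + "id")?.Value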
