<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Immersive Intelligence Colleagues</title>
	<atom:link href="http://im-tel.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://im-tel.org</link>
	<description>...exploring collaborative virtual spaces to solve hard problems</description>
	<lastBuildDate>Sun, 08 Apr 2012 17:10:13 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Human Judgment versus Machine Learning</title>
		<link>http://im-tel.org/2012/04/07/human-judgment-versus-machine-learning/</link>
		<comments>http://im-tel.org/2012/04/07/human-judgment-versus-machine-learning/#comments</comments>
		<pubDate>Sun, 08 Apr 2012 04:07:45 +0000</pubDate>
		<dc:creator>richardh</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[BigData]]></category>
		<category><![CDATA[DataMining]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[MachineLearning]]></category>
		<category><![CDATA[VisualAnalytics]]></category>

		<guid isPermaLink="false">http://im-tel.fragileearthstudios.com/?p=656</guid>
		<description><![CDATA[This last week a nine-week online course entitled &#8220;Learning From Data&#8221; started, taught by Caltech Professor Yaser Abu-Mostafa. As they promoted&#8230; &#8220;A real Caltech course, not a watered-down version, broadcast live from the lecture hall at Caltech.&#8221; The course objective is &#8220;machine learning that covers the basic theory, algorithms, and applications, that enables computational systems [...]]]></description>
			<content:encoded><![CDATA[<p>This last week a <a href="http://work.caltech.edu/telecourse.html" target="_blank">nine-week online course entitled &#8220;Learning From Data&#8221;</a> started, taught by Caltech Professor Yaser Abu-Mostafa. As they promoted&#8230; &#8220;A real Caltech course, <span style="text-decoration: underline">not</span> a watered-down version, broadcast live from the lecture hall at Caltech.&#8221; The course objective is &#8220;machine learning that covers the basic theory, algorithms, and applications, that enables computational systems to adaptively improve their performance with experience accumulated from the observed data.&#8221; A <a href="http://www.amazon.com/Learning-From-Data-Yaser-Abu-Mostafa/dp/1600490069/" target="_blank">book by the same title</a> covering the same material is available.</p>
<p>I am attending (when schedule permits) because I believe that Machine Learning (ML) will become, or already has become, a basic analysis technique for any complex system. However, I was surprised by a recent <a href="http://www.kdnuggets.com/2012/04/sceptical-of-machine-learning.html" target="_blank">poll in KDnuggets</a> that asked: &#8220;Can Machine Learning on Big Data replace Domain Expertise?&#8221; The majority (55%) felt that &#8220;there are many domains where machine learning cannot beat domain expertise&#8221;. However, Gregory Piatetsky-Shapiro (newsletter editor) argued that there is a growing number of examples where ML on Big Data outperforms domain expertise. Many of the Knowledge Discovery (KD) competitions over the past ten years have confirmed this.</p>
<p>I believe that successful applications of ML will involve a synthesis of ML with human domain expertise. The ML component will provide hints and basic instrumentation. However, humans will provide judgment and insights based on their domain expertise. Could a naive domain expert use ML functionality to perform useful analyses? Of course. However, a savvy domain expert could leverage ML much more.</p>
]]></content:encoded>
			<wfw:commentRss>http://im-tel.org/2012/04/07/human-judgment-versus-machine-learning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Beginning of Interactive Data Visualization</title>
		<link>http://im-tel.org/2012/04/06/beginning-of-interactive-data-visualization/</link>
		<comments>http://im-tel.org/2012/04/06/beginning-of-interactive-data-visualization/#comments</comments>
		<pubDate>Sat, 07 Apr 2012 03:23:18 +0000</pubDate>
		<dc:creator>richardh</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[DataViz]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[VisualAnalytics]]></category>

		<guid isPermaLink="false">http://im-tel.fragileearthstudios.com/?p=658</guid>
		<description><![CDATA[I was poking around in Nathan Yau&#8217;s FlowingData blogs and found a historical gem. On January 1, 2008, Nathan wrote a blog on John Tukey, the pioneer in exploratory statistics. I did not realize that Tukey was also a pioneer in the early use of computers for data visualization! In 1972 using &#8220;32 buttons and [...]]]></description>
			<content:encoded><![CDATA[<p>I was poking around in Nathan Yau&#8217;s <a href="http://flowingdata.com/" target="_blank">FlowingData</a> blogs and found a historical gem. On January 1, 2008, Nathan wrote a blog on John Tukey, the pioneer in exploratory statistics. I did not realize that Tukey was also a pioneer in the early use of computers for data visualization!</p>
<p>In 1972 using &#8220;32 buttons and a lightpen&#8221; on &#8220;an Information Display&#8217;s IDIIOM refresh CRT driven by a <a href="http://en.wikipedia.org/wiki/Varian_Data_Machines" target="_blank">Varian 620/i minicomputer</a> linked to an IBM 360/91&#8221;, Tukey developed the PRIM-9 program to do multivariate analysis. It handled up to 9-dimensional data with the functions of &#8220;picturing, rotation, isolation and masking&#8221;. A <a href="http://books.google.com/books?hl=en&amp;lr=&amp;id=pZTIv3uq1KsC&amp;oi=fnd&amp;pg=PA91&amp;dq=Prim-9+J.W.+Tukey,+J.H.+Friedman+and+M.A.+Fisherkeller&amp;ots=4bpFO8HsIJ&amp;sig=V7Z2242N0yUoX_oO8htmm69TPxg#v=onepage&amp;q=Prim-9%20J.W.%20Tukey%2C%20J.H.%20Friedman%20and%20M.A.%20Fisherkeller&amp;f=false" target="_blank">paper in May of 1974</a> describes the operation of his program. Particularly insightful is the Discussion section at the end, in which Tukey gives his best practices for discovering meaningful relationships hidden within the 9 dimensions of the data. Nathan suggests that the <a href="http://ggobi.org/" target="_blank">GGobi</a> visualization software by Hadley Wickham et al. owes a bit of its heritage to Tukey&#8217;s PRIM-9.</p>
<p>The real treat is a <a href="http://stat-graphics.org/movies/prim9.html" target="_blank">25-minute video</a> from 1973. Take the time to view this! Despite the awkwardness of ancient computer equipment, Tukey teases out the patterns in what initially appear as random dots.</p>
<p>Toward the end of this video, I was struck by the implications of Tukey&#8217;s data visualization work for Immersive Intelligence. Here is one of the pioneers of modern data analysis showing us the value of 3-D visualizations&#8230;as an early approach to immersing oneself in the data. And this was forty years ago!</p>
]]></content:encoded>
			<wfw:commentRss>http://im-tel.org/2012/04/06/beginning-of-interactive-data-visualization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>VAST Challenge: What is a Healthy Network?</title>
		<link>http://im-tel.org/2012/03/23/vast-challenge-what-is-a-healthy-network/</link>
		<comments>http://im-tel.org/2012/03/23/vast-challenge-what-is-a-healthy-network/#comments</comments>
		<pubDate>Fri, 23 Mar 2012 19:55:48 +0000</pubDate>
		<dc:creator>richardh</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[DataMining]]></category>
		<category><![CDATA[DataViz]]></category>
		<category><![CDATA[VASTchallenge]]></category>
		<category><![CDATA[VisualAnalytics]]></category>

		<guid isPermaLink="false">http://im-tel.fragileearthstudios.com/?p=635</guid>
		<description><![CDATA[In the overview blog of the VAST Challenge, we described the background and focus of the challenge, along with available data. In this blog, let&#8217;s probe the criteria for a healthy network as defined in Mini-Challenge 1A: It seems that the criteria for network health are loosely defined. Any anomaly to the normal pattern could [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a title="VAST Challenge: Initial Look" href="http://im-tel.org/2012/03/18/vast-challenge-initial-look/" target="_blank">overview blog</a> of the <a href="http://www.vacommunity.org/VAST+Challenge+2012" target="_blank">VAST Challenge</a>, we described the background and focus of the challenge, along with available data. In this blog, let&#8217;s probe the criteria for a healthy network as defined in Mini-Challenge 1A:</p>
<span id="box_typeshadow_Create_a_visualization_of_the_health_and_policy_status_of_the_entire_bank_enterprise_as_of_2_pm_on_February_2._What_areas_of_concern_do_you_observebox_"><h4><em><div class='et-box et-shadow'>
					<div class='et-box-content'>Create a visualization of the health and policy status of the entire bank enterprise as of 2 pm on February 2. What areas of concern do you observe?</div></div> </em></h4></span>
<p>It seems that the criteria for network health are loosely defined. Any departure from the normal pattern could be an area of concern.</p>
<span id="Normal_Operation"><h3>Normal Operation</h3></span>
<p>Normal operation is fairly easy to identify according to the data definitions. A policy status of &#8220;1&#8221; is &#8220;machine is functioning normally and is healthy&#8221;. Likewise, an activity flag of &#8220;1&#8221; is &#8220;normal with only normal activity detected on the equipment&#8221;.</p>
<p>A new table &#8216;health&#8217; was created by joining &#8216;meta&#8217; with &#8216;windowOneSingle&#8217; on &#8216;ipaddr&#8217;. This generated health records for 809,216 devices in 51 business units and 206 facilities. For policy = &#8216;1&#8217; and activity = &#8216;1&#8217;, there were 646,127 (80%) devices that were in normal operation.</p>
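<p>The join and the count above can be sketched roughly as follows. This is only an illustration using Python&#8217;s built-in sqlite3 module rather than the MySQL instance used for the real load. The table names and the &#8216;ipaddr&#8217;, &#8216;policy&#8217; and &#8216;activity&#8217; columns follow the text; the &#8216;businessunit&#8217; and &#8216;facility&#8217; column names and all of the rows are made up for the example.</p>

```python
import sqlite3

# In-memory database with toy stand-ins for the two challenge tables.
# All rows are invented; 'businessunit' and 'facility' are hypothetical column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meta (ipaddr TEXT PRIMARY KEY, businessunit TEXT, facility TEXT);
    CREATE TABLE windowOneSingle (ipaddr TEXT, policy TEXT, activity TEXT);

    INSERT INTO meta VALUES
        ('10.0.0.1', 'region-1', 'branch-1'),
        ('10.0.0.2', 'region-1', 'branch-1'),
        ('10.0.0.3', 'region-1', 'branch-2'),
        ('10.0.0.4', 'region-2', 'branch-3'),
        ('10.0.0.5', 'region-2', 'branch-3');

    -- 10.0.0.5 has no status record, so the inner join drops it.
    INSERT INTO windowOneSingle VALUES
        ('10.0.0.1', '1', '1'),
        ('10.0.0.2', '1', '1'),
        ('10.0.0.3', '2', '1'),
        ('10.0.0.4', '1', '1');
""")

# Build 'health' by joining 'meta' with 'windowOneSingle' on 'ipaddr'.
conn.execute("""
    CREATE TABLE health AS
    SELECT m.ipaddr, m.businessunit, m.facility, w.policy, w.activity
    FROM meta AS m
    JOIN windowOneSingle AS w ON w.ipaddr = m.ipaddr
""")

# Count all joined devices, then those in normal operation.
total = conn.execute("SELECT COUNT(*) FROM health").fetchone()[0]
normal = conn.execute(
    "SELECT COUNT(*) FROM health WHERE policy = '1' AND activity = '1'"
).fetchone()[0]

print(f"{normal} of {total} toy devices in normal operation "
      f"({100 * normal / total:.0f}%)")

# The same arithmetic on the real counts reported in the text.
print(f"reported share: {100 * 646127 / 809216:.0f}%")
```

<p>Because this is an inner join, any address present in only one of the two tables is dropped from &#8216;health&#8217;. That could explain why the joined table holds fewer devices than the raw address count, although this was not checked here.</p>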
<span id="Normal_Business_Hours"><h3>Normal Business Hours</h3></span>
<p>We probably should add normal business hours (7am to 6pm Monday-Friday) to the criteria for &#8216;normal operation&#8217; &#8230;at least for workstations (but not for servers and ATMs). Workstations should be turned off outside of business hours.</p>
<p>The &#8216;health&#8217; table had records for eight timezones &#8211; BMT-4 to BMT-11. This data was a snapshot at BMT = 14:00 (2:00pm) on February 2, which is a Thursday. Hence, only timezones BMT-4 to BMT-7 are within normal business hours, implying that BMT-8 to BMT-11 are before normal business hours. There were records for 84,696 (10%) workstations that were in operation before normal business hours. NOTE: the criterion for whether a workstation is turned off is uncertain.</p>
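<p>The business-hours reasoning above can be checked with a few lines of Python. This is a minimal sketch under the stated assumptions: the snapshot is at 14:00 BMT, business hours run from 7am up to (but not including) 6pm, and the function names are made up for illustration.</p>

```python
SNAPSHOT_BMT_HOUR = 14         # 2:00pm BMT on February 2, per the challenge question
BUSINESS_HOURS = range(7, 18)  # 7am up to, not including, 6pm (boundary is an assumption)


def local_hour(bmt_offset, snapshot_hour=SNAPSHOT_BMT_HOUR):
    """Local hour of day in a timezone that is bmt_offset hours from BMT."""
    return (snapshot_hour + bmt_offset) % 24


def in_business_hours(bmt_offset):
    """True if the snapshot falls within business hours in that timezone."""
    return local_hour(bmt_offset) in BUSINESS_HOURS


# The eight timezones found in the 'health' table: BMT-4 down to BMT-11.
for offset in range(-4, -12, -1):
    status = "business hours" if in_business_hours(offset) else "before business hours"
    print(f"BMT{offset}: {local_hour(offset):02d}:00 local, {status}")
```

<p>With these assumptions, BMT-4 through BMT-7 land between 10:00 and 07:00 local time, and BMT-8 through BMT-11 land between 06:00 and 03:00, which matches the split described above.</p>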
<span id="Normal_Maintenance"><h3>Normal Maintenance</h3></span>
<p>Another criterion for &#8216;normal operation&#8217; could be whether a device is under maintenance and whether it was &#8216;planned on a regular schedule&#8217;.</p>
<p>Out of 809,216 devices, there were only 743 devices under maintenance (activity = &#8216;2&#8217;) across all timezones. This seems odd! Not much maintenance is being performed on this network. Further, there seems to be no indication as to whether this maintenance was planned or not.</p>
<p>More analysis on this &#8216;health&#8217; table will be performed using several data viz tools, like QlikView and Tableau, and reported in future blogs.</p>
]]></content:encoded>
			<wfw:commentRss>http://im-tel.org/2012/03/23/vast-challenge-what-is-a-healthy-network/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>VAST Challenge: Surveying the Geography</title>
		<link>http://im-tel.org/2012/03/20/vast-challenge-surveying-the-geography/</link>
		<comments>http://im-tel.org/2012/03/20/vast-challenge-surveying-the-geography/#comments</comments>
		<pubDate>Tue, 20 Mar 2012 15:23:29 +0000</pubDate>
		<dc:creator>richardh</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[DataViz]]></category>
		<category><![CDATA[VASTchallenge]]></category>

		<guid isPermaLink="false">http://im-tel.fragileearthstudios.com/?p=625</guid>
		<description><![CDATA[In the overview blog of the VAST Challenge, we described the background and focus of the challenge, along with available data. In this blog, let&#8217;s survey the geography of this weird planet called BankWorld. It is the same size as Earth, but consists of a single large land mass, about the size of Europe and [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a title="VAST Challenge: Initial Look" href="http://im-tel.org/2012/03/18/vast-challenge-initial-look/" target="_blank">overview blog</a> of the <a href="http://www.vacommunity.org/VAST+Challenge+2012" target="_blank">VAST Challenge</a>, we described the background and focus of the challenge, along with available data. In this blog, let&#8217;s survey the geography of this weird planet called BankWorld.</p>
<p>It is the same size as Earth, but consists of a single large land mass, about the size of Europe and Asia, situated over North America, the northern part of South America, and the Pacific Ocean out to the Hawaiian Islands.</p>
<p>The best way to visualize this geography is Google Earth, especially since the challenge designers gave a set of KML layers. So, bring up your copy of Google Earth, position it over Mexico City so that the whole globe is visible, and turn off all the layers in the Primary Database (lower left box).</p>
<p><a href="http://im-tel.org/files/2012/03/VAST-Global1.png"><img class="alignright size-medium wp-image-627" src="http://im-tel.org/files/2012/03/VAST-Global1-300x248.png" alt="" width="300" height="248" /></a>Ready? Find the folder &#8220;Mini-Challenge 1 Image and Google Earth Files&#8221; that you downloaded. I found that it was best to drag-drop each KML file individually onto the Earth &#8230;transforming it into BankWorld! Do it in this order: BankWorld, BankCenters, Large Regional Offices, Region Boundaries, Small Region Offices, Large Branch Offices, Small Branch Offices, or something like that&#8230;</p>
<p>Play with the check boxes in the upper left. You should see something like the image on the right. Click for the hi-res version. Get familiar with the geography and the icons. The yellow mailbox is global headquarters for the bank. The yellow &#8216;hairs&#8217; (actually pushpins if you zoom in) are the branch offices, each of which has lots of workstations, servers, ATMs, etc.</p>
<p>The important aspect is to note the timezones (not shown), which start at London (which does not exist in BankWorld) and proceed west in 15-degree intervals of longitude. Hence, a place with longitude of -70.5 would be -4 hours from the BankWorld Mean Time (BMT). In other words, Timezone = integer (Longitude / 15) &#8211; 1. Now you can determine the local time at a branch office, thus monitoring whether transactions are being performed during normal business hours.</p>
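<p>Here is a minimal sketch of the timezone formula exactly as written above, with one assumption: &#8220;integer&#8221; is taken to mean truncation toward zero, which is what Python&#8217;s int() does. The function name is made up for illustration.</p>

```python
def timezone_from_longitude(longitude):
    """Timezone offset from BMT using the formula as written in the text.

    Assumption: 'integer' means truncation toward zero (Python's int()).
    """
    return int(longitude / 15) - 1


# Worked example for the longitude mentioned in the text.
longitude = -70.5
print("longitude / 15        :", longitude / 15)
print("int(longitude / 15)   :", int(longitude / 15))
print("formula as written    :", timezone_from_longitude(longitude))
```

<p>Under that reading the formula gives -5 for a longitude of -70.5, while the truncated quotient alone gives the -4 quoted in the example. Which convention matches the timezone values in the challenge data is worth checking against the data itself before relying on either one.</p>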
<p>This is certainly a creative transformation of Earth&#8217;s geography for this challenge and a fun use of Google Earth (and other KML tools).</p>
]]></content:encoded>
			<wfw:commentRss>http://im-tel.org/2012/03/20/vast-challenge-surveying-the-geography/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>VAST Challenge: Initial Look</title>
		<link>http://im-tel.org/2012/03/18/vast-challenge-initial-look/</link>
		<comments>http://im-tel.org/2012/03/18/vast-challenge-initial-look/#comments</comments>
		<pubDate>Sun, 18 Mar 2012 19:49:17 +0000</pubDate>
		<dc:creator>richardh</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[DataMining]]></category>
		<category><![CDATA[DataViz]]></category>
		<category><![CDATA[Education]]></category>

		<guid isPermaLink="false">http://im-tel.fragileearthstudios.com/?p=610</guid>
		<description><![CDATA[The Visual Analytics Community released their VAST Challenge 2012. [By the way, VAST stands for "Visual Analytics Science and Technology".] This challenge has a ten-year lineage initiated by the Human-Computer Interaction Lab at the University of Maryland and archived at the Visual Analytics Benchmark Repository. The challenge will conclude on July 9 and become [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.vacommunity.org/tiki-index.php" target="_blank">Visual Analytics Community</a> released their <a href="http://www.vacommunity.org/VAST+Challenge+2012" target="_blank">VAST Challenge 2012</a>. [By the way, VAST stands for "Visual Analytics Science and Technology".] This challenge has a ten-year lineage initiated by the Human-Computer Interaction Lab at the University of Maryland and archived at the <a href="http://hcil.cs.umd.edu/localphp/hcil/vast/archive/viewbm.php" target="_blank">Visual Analytics Benchmark Repository</a>. The challenge will conclude on July 9 and become a session at <a href="http://visweek.org/" target="_blank">IEEE VisWeek</a>, which this year is in Seattle on October 14-19.</p>
<span id="What_is_the_challenge"><h3>What is the challenge?</h3></span>
<p>The challenge deals with &#8220;Big Data&#8221; although the total amount of data is less than 10 GB. The situation is cyber-security for a large bank with hundreds of branch offices spread across a fictitious world, complete with lat/long geographic coordinates and KML annotations.</p>
<p>There are two mini-challenges, only the first of which has been released. Mini-challenge #1 is to provide situation awareness of the cyber-health of the bank&#8217;s network. In their words, &#8220;how do you visualize data out of a network containing nearly <strong>a million computers</strong> in a way that you can perceive its state and identify problems?&#8221; Actually, the bank network consists of 895,025 IP addresses, as shown in the table at the right. <a href="http://im-tel.org/files/2012/03/BOM-Organization.png"><img class="alignright size-medium wp-image-614" src="http://im-tel.org/files/2012/03/BOM-Organization-300x105.png" alt="" width="300" height="105" /></a></p>
<p>Mini-challenge #1 requires two responses:</p>
<ul>
<li>A &#8211; Create a visualization of the health and policy status of the entire bank enterprise as of 2 pm on February 2. What areas of concern do you observe?</li>
<li>B &#8211; Use your visualization tools to look at how the network’s status changes over time. Highlight up to five potential anomalies in the network and provide a visualization of each. When did each anomaly begin and end? What might be an explanation of each anomaly?</li>
</ul>
<p>I downloaded the data sets, consisting of three tables:</p>
<ul>
<li>Meta data about the organization and location of network nodes (workstations, routers, servers) &#8211; 1.1 M rows as 63 MB CSV file</li>
<li>Health status data about each node through time &#8211; 80 M rows as 7.8 GB CSV file</li>
<li>Health status data for a time window &#8211; 0.9 M rows as 40 MB CSV file</li>
</ul>
<p>Here is an <a href="http://im-tel.org/files/2012/03/metaDB-20K.zip" target="_blank">Excel file</a> (2 MB) containing the first 20K rows for each of these three tables, along with <a href="http://im-tel.org/files/2012/03/BankWorld-Documentation.zip" target="_blank">two documentation files</a> containing an overview of the banking world and a table explanation.</p>
<p>Using <a href="http://www.wampserver.com/en/" target="_blank">WAMP</a>, the data was loaded into MySQL for profiling. The status data took about an hour to load, without any performance tuning. Still loading&#8230;</p>
<span id="Update_3192012_10:05pm"><h3>Update 3/19/2012 10:05pm</h3></span>
<p>Yesterday the load of the large health status table aborted with the error &#8220;multi-statement transaction required more than &#8216;max_binlog_cache_size&#8217; bytes of storage; increase this mysql variable and try again&#8221;. After consulting with my MySQL technical wizard Roland Bouman, I disabled the binary logging for replication in the &#8216;my.ini&#8217; configuration file and reran the load. It took almost two hours, but completed after loading 133 M rows. The stats on the tables are shown at the right&#8230;<a href="http://im-tel.org/files/2012/03/VAST-table-stats.png"><img class="alignright size-full wp-image-623" src="http://im-tel.org/files/2012/03/VAST-table-stats.png" alt="" width="429" height="178" /></a></p>
<span id="Why_is_VAST_Challenge_Relevant_to_Immersive_Intelligence"><h3>Why is VAST Challenge Relevant to Immersive Intelligence?</h3></span>
<p>David Burden of <a href="http://daden.co.uk/" target="_blank">Daden Limited</a> suggested that we form a team to conduct a workshop at the <a href="http://www.ndu.edu/icollege/fcvw/" target="_blank">Federal Consortium for Virtual Worlds</a> in May. As part of that workshop, we have been searching for a problem upon which to focus. So, we are discussing whether the VAST Challenge would be an appropriate context for the workshop. Please join us by commenting below, or by attending FCVW.</p>
]]></content:encoded>
			<wfw:commentRss>http://im-tel.org/2012/03/18/vast-challenge-initial-look/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Humanizing Big Data</title>
		<link>http://im-tel.org/2012/03/12/humanizing-big-data/</link>
		<comments>http://im-tel.org/2012/03/12/humanizing-big-data/#comments</comments>
		<pubDate>Tue, 13 Mar 2012 03:41:02 +0000</pubDate>
		<dc:creator>richardh</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[BigData]]></category>
		<category><![CDATA[BusinessIntelligence]]></category>
		<category><![CDATA[DataExplosion]]></category>
		<category><![CDATA[DataViz]]></category>

		<guid isPermaLink="false">http://im-tel.fragileearthstudios.com/?p=605</guid>
		<description><![CDATA[As suggested by Nathan Yau at FlowingData, I just watched a TED talk by Jer Thorp, who works for the NY Times as Data Artist in Residence. Amazing talk on the human element of Big Data. Excellent visualizations of Internet interactions. Watch it. It is worth the 17:29. As Nathan summarizes: People often miss this point [...]]]></description>
			<content:encoded><![CDATA[<p>As suggested by Nathan Yau at <a href="http://flowingdata.com/2012/03/06/data-in-a-human-context/" target="_blank">FlowingData</a>, I just watched a <a href="http://www.youtube.com/watch?feature=player_embedded&amp;v=Q9wcvFkWpsM" target="_blank">TED talk by Jer Thorp</a>, who works for the NY Times as Data Artist in Residence. Amazing talk on the human element of Big Data. Excellent visualizations of Internet interactions. Watch it. It is worth the 17:29.</p>
<p>As Nathan summarizes: <em>People often miss this point about data — that it&#8217;s a representation of the physical world — and because of that, things like uncertainty and complexity come attached to the numbers. There are also actual human beings associated with a lot of data. So while optimization, maximization, and efficiency are well and good, stories, ethics, and lessons are pretty good takeaways, too.</em></p>
<p>Jer tells us a critical aspect about the future directions for Data Science&#8230;as the intersection of science, design, and art. Further, the extent to which we connect humans &#8211; their stories, dreams, experiences &#8211; to the data will be the extent to which we can understand the complexity of our world. What an ideal for Immersive Intelligence!</p>
]]></content:encoded>
			<wfw:commentRss>http://im-tel.org/2012/03/12/humanizing-big-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Stanford Graduate Certificate in Mining Massive Data Sets</title>
		<link>http://im-tel.org/2012/02/29/stanford-graduate-certificate-in-mining-massive-data-sets/</link>
		<comments>http://im-tel.org/2012/02/29/stanford-graduate-certificate-in-mining-massive-data-sets/#comments</comments>
		<pubDate>Wed, 29 Feb 2012 17:26:39 +0000</pubDate>
		<dc:creator>richardh</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[DataMining]]></category>

		<guid isPermaLink="false">http://im-tel.fragileearthstudios.com/?p=594</guid>
		<description><![CDATA[This is not new, but this offering amazes me each time I read its description! The Stanford Center for Professional Development at Stanford University offers a &#8216;graduate certificate&#8217; in cutting-edge material about Big Data and Data Mining. This is a seriously tough sequence of four courses. The cost ranges from $14,000 to $17,000 and [...]]]></description>
			<content:encoded><![CDATA[<p>This is not new, but this offering amazes me each time I read its description! The <strong>Stanford Center for Professional Development</strong> at Stanford University offers a &#8216;<a href="http://scpd.stanford.edu/public/category/courseCategoryCertificateProfile.do?method=load&amp;certificateId=10555807" target="_blank">graduate certificate</a>&#8217; in cutting-edge material about Big Data and Data Mining. This is a seriously tough sequence of four courses. The cost ranges from $14,000 to $17,000 and will take two years to complete. As shown below, the four courses are taught online (with some presence on the Stanford campus).</p>
<ul>
<li><strong>Social and Information Network Analysis</strong> &#8211; how to analyze the structure and dynamics of large networks, how to model links, and how to design algorithms that work with such large networks</li>
<li><strong>Machine Learning</strong> &#8211; Design and development of algorithms and techniques that allow computers to &#8220;learn&#8221; by extracting information from data automatically</li>
<li><strong>Mining Massive Data Sets</strong> &#8211; data mining of distributed file systems: Hadoop, map-reduce; PageRank, topic-sensitive PageRank, spam detection, hubs-and-authorities; similarity search; etc.</li>
<li><strong>Information Retrieval and Web Search</strong> &#8211; efficient text indexing; Boolean and vector space retrieval models; evaluation and interface issues; Web search including crawling, etc.</li>
</ul>
<p>Combining this material with Immersive Intelligence would be awesome! Please contact me if you are enrolled in this program.</p>
]]></content:encoded>
			<wfw:commentRss>http://im-tel.org/2012/02/29/stanford-graduate-certificate-in-mining-massive-data-sets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Going Too Far with Predictive Analytics?</title>
		<link>http://im-tel.org/2012/02/28/going-too-far-with-predictive-analytics/</link>
		<comments>http://im-tel.org/2012/02/28/going-too-far-with-predictive-analytics/#comments</comments>
		<pubDate>Tue, 28 Feb 2012 23:56:12 +0000</pubDate>
		<dc:creator>richardh</dc:creator>
				<category><![CDATA[Blog]]></category>

		<guid isPermaLink="false">http://im-tel.fragileearthstudios.com/?p=586</guid>
		<description><![CDATA[The current issue of KD Nuggets has a poll on &#8220;Was Target wrong in using analytics to find pregnant women?&#8221;. The New York Times detailed Target&#8217;s successful data mining of customer buying patterns to identify pregnant women. There has been a strong negative reaction to this story, although there is considerable debate about where the Right/Wrong [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.kdnuggets.com/2012/02/new-poll-target-analytics-wrong-to-find-pregnant-women.html" target="_blank">current issue of KD Nuggets</a> has a poll on &#8220;Was Target wrong in using analytics to find pregnant women?&#8221;. The New York Times detailed Target&#8217;s successful data mining of customer buying patterns to identify pregnant women. There has been a strong negative reaction to this story, although there is considerable debate about where the Right/Wrong line should be in Target&#8217;s situation. Even Colbert <a href="http://www.colbertnation.com/the-colbert-report-videos/408981/february-22-2012/the-word---surrender-to-a-buyer-power" target="_blank">weighed in</a> on the controversy. In the KD Nuggets poll so far, 75% of about 250 professional data miners have felt that Target did nothing wrong. Watch as this poll unfolds, especially with the variety of comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://im-tel.org/2012/02/28/going-too-far-with-predictive-analytics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Innovation from Cross-Disciplinary Research</title>
		<link>http://im-tel.org/2012/01/16/innovation-from-cross-disciplinary-research/</link>
		<comments>http://im-tel.org/2012/01/16/innovation-from-cross-disciplinary-research/#comments</comments>
		<pubDate>Tue, 17 Jan 2012 01:23:08 +0000</pubDate>
		<dc:creator>richardh</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[DataViz]]></category>

		<guid isPermaLink="false">http://im-tel.fragileearthstudios.com/?p=580</guid>
		<description><![CDATA[From personal experience, I knew that innovative ideas within my discipline often come from research in quite dissimilar disciplines. Michelle Borkin of Harvard University hit that nail squarely, driving it through the 2&#215;4 with this TED talk. She relates medical imaging from MRI scans to astronomy data of distant nebulae. And, then she proceeds from [...]]]></description>
			<content:encoded><![CDATA[<p>From personal experience, I knew that innovative ideas within my discipline often come from research in quite dissimilar disciplines. Michelle Borkin of Harvard University hit that nail squarely, driving it through the 2&#215;4 with <a href="http://www.ted.com/talks/michelle_borkin_can_astronomers_help_doctors.html" target="_blank">this TED talk</a>. She relates medical imaging from MRI scans to astronomy data of distant nebulae. And, then she proceeds from there. Her parting comment is &#8220;You really never know where your next great idea is going to come from.&#8221;</p>
<p>Note the many ways that 3D data is gradually emerging from research in many disciplines. I feel that our current visualization tools are not providing a smooth transition to 3D data analysis from the traditional 2D visualization approaches. Perhaps cross-disciplinary exchanges will provide the necessary catalyst.</p>
]]></content:encoded>
			<wfw:commentRss>http://im-tel.org/2012/01/16/innovation-from-cross-disciplinary-research/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Finding Associations in Large Data Sets</title>
		<link>http://im-tel.org/2011/12/23/finding-associations-in-large-data-sets/</link>
		<comments>http://im-tel.org/2011/12/23/finding-associations-in-large-data-sets/#comments</comments>
		<pubDate>Fri, 23 Dec 2011 21:08:23 +0000</pubDate>
		<dc:creator>richardh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://im-tel.fragileearthstudios.com/?p=560</guid>
		<description><![CDATA[While browsing through the latest Scientific American blogs, I found an interesting item on &#8220;How to Find Meaning in a Maelstrom of Data&#8221;. Well, the article did not live up to the title, but it came close! The blog highlighted the team from MIT and Harvard who authored a research article in Science. An informative [...]]]></description>
			<content:encoded><![CDATA[<p>While browsing through the latest <a href="http://blogs.scientificamerican.com/" target="_blank">Scientific American blogs</a>, I found an interesting <a href="http://blogs.scientificamerican.com/observations/2011/12/16/how-to-find-meaning-in-a-maelstrom-of-data/" target="_blank">item </a>on &#8220;How to Find Meaning in a Maelstrom of Data&#8221;. Well, the article did not live up to the title, but it came close!<span id="more-560"></span></p>
<p>The blog post highlighted the team from MIT and Harvard who authored a <a href="http://www.sciencemag.org/content/334/6062/1518" target="_blank">research article</a> in <a href="http://www.sciencemag.org/" target="_blank">Science</a>. An informative <a href="http://www.broadinstitute.org/news-and-publications/mine-detecting-novel-associations-large-data-sets" target="_blank">video</a> (4:34) is a must-see! Note the short discussion of the detected patterns at around 2:00. Try analyzing those patterns with typical statistical methods!</p>
<p>The problem is scanning large amounts of data to find significant associations among the variables. The team proposed a new correlation statistic, the <em><strong>maximal information coefficient</strong></em> or MIC, that can find significant associations between two variables or attributes. Many statistical methods can find associations among variables and measure their strength, but each makes limiting assumptions and tends to have narrow applicability. The MIC statistic appears to make fewer assumptions and to apply more broadly.</p>
<p><a href="http://im-tel.org/files/2011/12/MIC-net.png"><img class="alignright size-medium wp-image-565" src="http://im-tel.org/files/2011/12/MIC-net-297x300.png" alt="" width="297" height="300" /></a>For example, the World Health Organization has collected 357 health-related variables from over 200 countries for many years. The MIC authors' website, <a href="http://www.exploredata.net/" target="_blank">http://www.exploredata.net/</a>, offers a download of the MIC program (in Java) along with their sample data sets. I downloaded the MIC program and the WHO data and ran the pairwise analysis. It worked&#8230;in about 10 minutes of crunching&#8230;and generated a large CSV file showing over 64,000 associations, sorted by descending MIC strength. Impressive!</p>
<p>The potential is to quickly map the network of variable associations for a large unexplored data set. An example from the article is shown in the sample at the right. A researcher can then start the discovery process where the patterns are the richest.</p>
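<p>To make the idea of pairwise screening concrete, here is a minimal Python sketch. It is not the MINE program and not the authors' algorithm: the real MIC searches over grid placements, whereas this simplified stand-in only tries equal-frequency grids and keeps the best normalized mutual information. The function names (equal_frequency_bins, mutual_information, mic_like_score, rank_pairs) and the cap of n to the power 0.6 on the number of grid cells are my own assumptions for illustration.</p>

```python
import math
from collections import Counter
from itertools import combinations


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts.

    Tied values may land in different bins; acceptable for a sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def mutual_information(bx, by):
    """Mutual information (in nats) of two discrete label sequences."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def mic_like_score(x, y, exponent=0.6):
    """Hypothetical MIC-style score, NOT the published MIC.

    Takes the maximum of normalized mutual information over small
    equal-frequency grids whose cell count stays under n ** exponent.
    """
    n = len(x)
    max_cells = max(4, int(n ** exponent))  # assumed cap on grid cells
    best = 0.0
    for kx in range(2, max_cells // 2 + 1):
        for ky in range(2, max_cells // kx + 1):
            mi = mutual_information(
                equal_frequency_bins(x, kx), equal_frequency_bins(y, ky)
            )
            best = max(best, mi / math.log(min(kx, ky)))
    return best


def rank_pairs(columns):
    """Score every pair of named columns, strongest association first."""
    scored = [
        (mic_like_score(columns[a], columns[b]), a, b)
        for a, b in combinations(sorted(columns), 2)
    ]
    return sorted(scored, reverse=True)
```

<p>Calling rank_pairs on a dictionary of equal-length numeric columns returns (score, name, name) tuples with the strongest pairs first, which is similar in spirit to the sorted CSV that the MIC program produced for the WHO data. A non-monotonic relationship such as a parabola should score high here even though its linear correlation is close to zero.</p>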
<p><a href="http://im-tel.org/files/2011/12/MIC-saves-lots-of-paper.png"><img class="alignleft size-full wp-image-573" src="http://im-tel.org/files/2011/12/MIC-saves-lots-of-paper.png" alt="" width="225" height="277" /></a>UPDATE 12/30/2011: <a href="http://www.kdnuggets.com/2011/12/broad-institute-software-finds-hidden-patterns-in-big-data.html?k11n31" target="_blank">KD Nuggets</a> collected additional commentary on the Science article on MIC. First, the <a href="http://www.technewsworld.com/story/74026.html" target="_blank">TechNewsWorld article</a> by John Mello (12/21/2011) noted the ability of MIC to cope with noise. He also noted that MIC is part of a suite of data analysis tools called MINE, for Maximal Information-based Nonparametric Exploration. Second, <a href="http://www.sciencedaily.com/releases/2011/12/111215141611.htm" target="_blank">Science Daily</a> reprinted an article from the Broad Institute of MIT and Harvard (&#8220;Tool detects patterns hidden in vast data sets.&#8221; <em>ScienceDaily</em>, 15 Dec. 2011. Web. 30 Dec. 2011). The article highlights that, if researchers printed each potential relationship among bacteria in the human gut, the stack of paper would reach a height of 1.4 miles, as shown in the figure on the left. Finally, here is a link to a <a href="http://www.youtube.com/watch?feature=player_embedded&amp;v=Onbn285lris" target="_blank">YouTube video</a> (4:46) that explains quite well how MIC can be used for exploratory analysis of large data sets.</p>
]]></content:encoded>
			<wfw:commentRss>http://im-tel.org/2011/12/23/finding-associations-in-large-data-sets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

