The code below addresses this item from the project list:
What is the probability that I belong to the Contacted group in the male (or female) category? (Clustering algorithm)
In other words, once customers create a profile, they can see whether they have any chance of being contacted. It's good for site transparency, but I am not sure it's good for the overall business model. Anyway, it answers a VERY CRUCIAL question for the customer.
// ============================
// Create Structure
// ============================
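// Drop the model first: dropping the structure also removes its models,
// so a later DROP MINING MODEL would fail. Skip both on the first run.
drop mining model contacteddetails_cl
drop mining structure contacteddetails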
CREATE MINING STRUCTURE contacteddetails
(
primkeyid long key,
contacted long discrete,
gender text discrete,
country_located text discrete,
country_origin text discrete,
nativelang text discrete,
ethnicity text discrete,
visastatus text discrete,
incomerange text discrete,
occupation text discrete,
education text discrete,
status_marital text discrete
)
// ============================
// Add Clustering Model
// ============================
ALTER MINING STRUCTURE contacteddetails
ADD MINING MODEL contacteddetails_cl
(
primkeyid,
contacted,
gender,
country_located,
country_origin,
nativelang,
ethnicity,
visastatus,
incomerange,
occupation,
education,
status_marital
) USING microsoft_clustering
// ============================
// Train Structure and Model
// ============================
INSERT INTO MINING STRUCTURE contacteddetails
(
primkeyid,
contacted,
gender,
country_located,
country_origin,
nativelang,
ethnicity,
visastatus,
incomerange,
occupation,
education,
status_marital
)
OPENQUERY([dsg],'
select
primkeyid,
contacted,
gender,
country_located,
country_origin,
nativelang,
ethnicity,
visastatus,
incomerange,
occupation,
education,
status_marital
from contactedprofiles
')
// ============================
// Predict: top 3 clusters for a sample profile
// ============================
select flattened
t.primkeyid,
topcount(predicthistogram(cluster()),$probability,3)
/* ,
cluster(),
clusterprobability('Cluster 1'),
clusterprobability()*/
from
contacteddetails_cl
NATURAL PREDICTION JOIN
(
select 3333 as primkeyid,
null as contacted,
'male' as gender,
'united-states' as country_located,
'russia' as country_origin,
'any' as ethnicity,
'russian' as nativelang,
'american citizenship' as visastatus,
'any' as incomerange,
'teacher' as occupation,
'bachelors degree' as education,
'never married' as status_marital
) AS t
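The commented-out columns in the query above can also be run on their own. Here is a minimal sketch, assuming the trained model kept the default cluster captions ('Cluster 1', 'Cluster 2', ...); the column aliases are only illustrative. The first statement lists the clusters so you can see which one holds the contacted profiles, and the second returns the most likely cluster for the same sample profile along with its probability. Run each statement separately.

```dmx
// Statement 1: list the model content (root node plus one row per cluster)
// to see how each cluster is characterized
select
  node_caption,
  node_support,
  node_description
from contacteddetails_cl.content

// Statement 2: most likely cluster for the sample profile
// 'Cluster 1' is an assumed caption; replace it with the cluster
// that statement 1 shows to be the contacted group
select
  t.primkeyid,
  cluster() as likely_cluster,
  clusterprobability() as likely_cluster_probability,
  clusterprobability('Cluster 1') as cluster1_probability
from
  contacteddetails_cl
NATURAL PREDICTION JOIN
(
  select 3333 as primkeyid,
  null as contacted,
  'male' as gender,
  'united-states' as country_located,
  'russia' as country_origin,
  'any' as ethnicity,
  'russian' as nativelang,
  'american citizenship' as visastatus,
  'any' as incomerange,
  'teacher' as occupation,
  'bachelors degree' as education,
  'never married' as status_marital
) AS t
```

Which cluster corresponds to the contacted group depends on the training data, so check the node_description output before hard-coding a cluster name.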
HTH..
ZULFIQAR SYED
Have you mined your data today?

Enjoying your blog. Keep it up. It was interesting to see which columns you choose to model in this post.
Posted by: furmangg | March 20, 2006 at 08:37 AM
Thank you for the encouragement... It really helps in this cold and lonely data mining world .. LOL...
Posted by: Zulfiqar Syed | March 20, 2006 at 09:41 AM
Hi,
I am using SSAS for the first time and I have been asked to capture events from a particular table. The data looks something like this.
Students Table (Data Source)
StudentID grade DOB Enrollltype........
100012 A2 02/85 N
I have been asked to use SSAS and the sequence clustering algorithm to capture events based on conditions. The following event sequences are the ones we want to capture from the student history table:
· Student Enrollment events:
o when studentID (j-1) <> studentID(j) & enroltype(j) ="N" --> enroll happens at "j"
o when studentID (j-1) <> studentID(j) & enroltype(j) ="EO" --> enroll happens at "j"
o when studentID (j-1) <> studentID(j) & enroltype(j) ="IT" --> enroll happens at "j"
· Student EXIT events:
o when studentID (j) = studentID(j+1)= studentID(j+2) & enroltype(j) = enroltype(j+1) = enroltype(j+2) ="A" --> Exit(Dropout) happens at "j"
o when studentID (j) <> studentID(j+1) & enroltype(j) = "OT" --> Exit(OutTransfer) happens at "j"
· Student Center switch events:
o when studentID (j-1) = studentID(j) & enroltype(j) ="IT" --> Center change happened at "j"
· Student break events:
o when studentID (j) = studentID(j+1) = studentID(j+2) & enroltype(j) = enroltype(j+1) ="A" & enroltype(j+2) ="R" --> Absence (2 months) happens at "j"
o when studentID (j) = studentID(j+1) & enroltype(j) ="A" & enroltype(j+1) ="R" --> Absence (1 month) happens at "j"
· Student level completion events:
o when studentID (j-n) = studentID(j-n+1) = ... = studentID(j-1) =studentID(j) &
KL(j-n) <> KL(j-n+1) &
KL(j-1) <> KL(j) &
KL(j-n+1) = KL(j-n+2) = ... = KL(j-1) --> Level completion of KL at "j-1"
o when studentID (j-n) = studentID(j-n+1) = ... = studentID(j-1) &
studentID(j-1) <> studentID(j) &
KL(j-n) <> KL(j-n+1) &
KL(j-n+1) = KL(j-n+2) = ... = KL(j-1) --> Level completion of KL at "j-1"
When above patterns are observed, the following data should be generated.
· Enrollment event
o StudentID,
o year/month
o enroltype
o starting school grade
o placement (KL and WS#)
· Enrollment event
Please help. Thanks in advance.
Posted by: DSH | June 17, 2009 at 11:43 AM